A Sample Bash Script: Reading, Transforming and Writing to Files

Data transformation is a common task in shell scripting. This article walks through a Bash script that reads an input file, transforms its contents with the sed utility, and writes the result to several output files.

Importance of data transformation

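On Linux, data transformation means converting data from one format to another so that it is more useful, for example by substituting, deleting, inserting, or rearranging text in files. Knowing how to perform these file and text manipulations from the command line helps you write more capable scripts.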

Real-world examples of data transformation using ‘sed’

Some real-world data transformations you can perform with sed include:

Replacing text

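To replace text with sed, use the s (substitute) command. The command below replaces the first occurrence of “apple” on each line of input.txt with “orange” and prints the result to standard output; input.txt itself is not modified. Add the g flag (s/apple/orange/g) to replace every occurrence on each line.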

Command: sed 's/apple/orange/' input.txt
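A minimal sketch contrasting the default substitution with the g flag. The sample text is made up for illustration, and it is piped in with printf instead of being read from input.txt so the example is self-contained.

```bash
#!/bin/bash
# Hypothetical sample input: two lines, each containing "apple" twice
input='apple pie and apple juice
green apple, red apple'

# Without the g flag, sed replaces only the first match on each line
first_only=$(printf '%s\n' "$input" | sed 's/apple/orange/')

# With the g flag, sed replaces every match on each line
all_matches=$(printf '%s\n' "$input" | sed 's/apple/orange/g')

echo "Without g:"
printf '%s\n' "$first_only"
echo "With g:"
printf '%s\n' "$all_matches"
```

The first version leaves the second “apple” on each line untouched; the second changes all four.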

Deleting lines

Sed can delete every line that matches a pattern. For example, sed '/delete/d' removes all lines containing the word “delete”. The general form below writes all non-matching lines of input-file to output-file:

Command: sed '/pattern/d' input-file > output-file
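A small sketch of the “delete” example, assuming throwaway files created with mktemp in place of input-file and output-file; the sample lines are hypothetical.

```bash
#!/bin/bash
# Create temporary input and output files with hypothetical sample lines
input_file=$(mktemp)
output_file=$(mktemp)
printf '%s\n' 'keep this line' 'please delete me' 'keep this one too' > "$input_file"

# /delete/ selects lines containing "delete"; d drops them from the output
sed '/delete/d' "$input_file" > "$output_file"

result=$(cat "$output_file")
printf '%s\n' "$result"

# Clean up the temporary files
rm -f "$input_file" "$output_file"
```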

Inserting text

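Sed can insert text into a file. The i command inserts a line of text before every line that matches a pattern. For example, to insert the line “new line” before each line containing the word “insert”, use insert as the pattern and new line as the text in the general form below: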

Command: sed '/pattern/i\text-to-insert' input-file > output-file
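The one-line i\text-to-insert form works with GNU sed, but other implementations (for example the BSD sed on macOS) may require i\ to be followed by a newline. A minimal sketch of that portable form, with made-up sample lines:

```bash
#!/bin/bash
# Hypothetical sample input
input='first line
please insert here
last line'

# Portable form of the i command: "i\" ends the line and the text follows.
# It inserts "new line" before every line that contains "insert".
result=$(printf '%s\n' "$input" | sed '/insert/i\
new line')

printf '%s\n' "$result"
```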

Rearranging text

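Sed can rearrange text by capturing parts of a line and writing them back in a different order. In the command below, the two \(.*\) groups capture the text before and after “, ”, and the replacement \2, \1 swaps them, so “Hello, World” becomes “World, Hello”: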

Command: echo "Hello, World" | sed 's/\(.*\), \(.*\)/\2, \1/'
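For comparison, a sketch of the same swap written with extended regular expressions (sed -E), where capture groups use plain parentheses, applied to a few hypothetical names stored as “Last, First”:

```bash
#!/bin/bash
# Hypothetical list of names in "Last, First" order
names='Lovelace, Ada
Turing, Alan'

# -E enables extended regexes, so ( ) form capture groups without backslashes;
# \2 and \1 write the captured groups back in swapped order
swapped=$(printf '%s\n' "$names" | sed -E 's/(.*), (.*)/\2, \1/')

printf '%s\n' "$swapped"
```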

Extracting data

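For extracting data, grep is often simpler than sed. The command below prints every email address found in input.txt on its own line: -E enables extended regular expressions, and -o prints only the matched text instead of the whole line.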

Command: grep -E -o '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b' input.txt
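To remove email addresses rather than extract them, sed can substitute the same pattern with nothing. A minimal sketch using sed -E; the regex is the simplified one from the grep command (without the \b word boundaries), not a complete email validator, and the sample text is made up.

```bash
#!/bin/bash
# Hypothetical text containing email addresses
text='Contact alice@example.com or bob.smith@mail.example.org for details.'

# Replace every email-like match with an empty string (g = all matches per line)
cleaned=$(printf '%s\n' "$text" | sed -E 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}//g')

printf '%s\n' "$cleaned"
```

The spaces around each removed address remain, so the result contains double spaces.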

Bash script: Reading input and writing transformed data to output files

The following example script uses sed to replace every occurrence of the string ‘foo’ with ‘bar’ (including inside longer words, so ‘food’ becomes ‘bard’) and writes the result to three files in an output directory.

#!/bin/bash
# Stop on the first error, unset variable, or failed pipeline stage
set -euo pipefail

# Read data from input file
input_file="input.txt"
data=$(cat "$input_file")

# Transform the data: replace every "foo" with "bar".
# Quoting "$data" and using printf preserves line breaks and spacing.
transformed_data=$(printf '%s\n' "$data" | sed 's/foo/bar/g')

# Write transformed data to multiple files
output_dir="output"
mkdir -p "$output_dir"

# Write transformed data to file1.txt
file1="$output_dir/file1.txt"
printf '%s\n' "$transformed_data" > "$file1"

# Write transformed data to file2.txt
file2="$output_dir/file2.txt"
printf '%s\n' "$transformed_data" > "$file2"

# Write transformed data to file3.txt
file3="$output_dir/file3.txt"
printf '%s\n' "$transformed_data" > "$file3"

echo "Data transformation and file writing completed!"

Step-by-step explanation

The script reads the content of input.txt into the data variable with the “cat” command. It then pipes that content through the “sed” command, which replaces every “foo” with “bar”, and stores the result in the “transformed_data” variable.
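Quoting matters at this step. A small sketch, with made-up sample text, of what happens if $data is passed to sed unquoted: the shell splits it into words, and echo joins them with single spaces, collapsing newlines and repeated spaces. Quoting the variable and printing it with printf '%s\n' keeps the original lines intact.

```bash
#!/bin/bash
# Hypothetical two-line input containing "foo"
data='foo one
foo   two'

# Unquoted: word splitting turns the newline and repeated spaces into single spaces
unquoted=$(echo $data | sed 's/foo/bar/g')

# Quoted: the original lines and spacing reach sed unchanged
quoted=$(printf '%s\n' "$data" | sed 's/foo/bar/g')

echo "Unquoted: $unquoted"
echo "Quoted:"
printf '%s\n' "$quoted"
```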

The output_dir variable sets the directory where the output files are stored. The mkdir -p command creates it if it doesn't exist and does nothing if it already does. The paths of the output files (file1, file2, and file3) are built inside output_dir.

The “>” operator redirects the transformed data into each output file, overwriting any existing content. Finally, the script prints a message confirming that the data transformation and file writing are complete.
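As an alternative to repeating the redirection for each file, the tee command can copy the same input into several files at once. A sketch assuming a temporary directory in place of output and made-up transformed data:

```bash
#!/bin/bash
# Hypothetical transformed data and a temporary stand-in for the output directory
transformed_data='bar baz
bar qux'
output_dir=$(mktemp -d)
# Remove the temporary directory when the script exits
trap 'rm -rf "$output_dir"' EXIT

# tee writes its standard input to every file named on its command line;
# its own standard output is discarded here
printf '%s\n' "$transformed_data" \
    | tee "$output_dir/file1.txt" "$output_dir/file2.txt" "$output_dir/file3.txt" > /dev/null

ls "$output_dir"
```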

Conclusion

You can adapt the script to your needs, for example by changing the sed expression used in the transformation step or the number and names of the output files. The sed commands shown earlier (substitution, deletion, insertion, and rearrangement) can be used as building blocks for that transformation step.
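As one way to make the script reusable, here is a sketch that wraps it in a function taking the input file, the sed expression, the output directory, and any number of output file names as arguments. The function name transform_to_files and its argument order are hypothetical choices, not part of the original script.

```bash
#!/bin/bash
# Hypothetical helper: transform_to_files INPUT SED_EXPR OUTPUT_DIR NAME...
# Applies SED_EXPR to INPUT and writes the result to each NAME inside OUTPUT_DIR.
transform_to_files() {
    local input_file=$1 sed_expr=$2 output_dir=$3 transformed_data name
    shift 3

    mkdir -p "$output_dir" || return 1
    # Transform once, then reuse the result for every output file
    transformed_data=$(sed "$sed_expr" "$input_file") || return 1

    for name in "$@"; do
        printf '%s\n' "$transformed_data" > "$output_dir/$name" || return 1
    done
}

# Usage example with a temporary working directory and made-up input
work_dir=$(mktemp -d)
trap 'rm -rf "$work_dir"' EXIT
printf '%s\n' 'foo and food' > "$work_dir/input.txt"

transform_to_files "$work_dir/input.txt" 's/foo/bar/g' "$work_dir/output" report.txt backup.txt
echo "Data transformation and file writing completed!"
```

Passing the file names as arguments means adding or renaming an output file no longer requires copying another block of code.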

Srini

Data Engineer with deep AI and Generative AI expertise, crafting high-performance data pipelines in PySpark, Databricks, and SQL. Skilled in Python, AWS, and Linux—building scalable, cloud-native solutions for smart applications.