1. Overview

This article shows several command-line tools that reach the same goal: appending a line to a file only if the file does not already contain that line. This is useful when, for example, we want to update configuration files with some general includes.

2. Our Example Files

Throughout this article, we’ll be working with a real-world example.

Let’s imagine we want to ensure that all our users’ .bashrc files contain a line to include our special project variables. Some users already have this include. They are power users who might have some personal overrides and definitely don’t want us to mess around with their .bashrc files.

So, regular users’ .bashrc would contain:

# regular.bashrc

VAR1=foo

Whereas the power users’ .bashrc would be:

# power-user.bashrc

OVERRIDE_VAR1=foo
source shared/projectx/project.bashrc

PROJECT_VAR=$OVERRIDE_VAR1

Finally, our project’s .bashrc contains the special variables we want to be set for all users:

# shared/projectx/project.bashrc
# Source this file in your .bashrc to include ProjectX's special variables

PROJECT_VAR=bar

Our goal is to add the line source shared/projectx/project.bashrc to regular.bashrc but not to power-user.bashrc from the command line.
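To follow along, the three example files can be recreated in a scratch directory. This is only a minimal sketch: the throwaway directory created with mktemp -d is our own assumption, chosen so that no real dotfiles are touched.

```shell
# Work in a throwaway directory (an assumption of this sketch)
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p shared/projectx

# A regular user's file: no include yet
cat > regular.bashrc <<'EOF'
# regular.bashrc

VAR1=foo
EOF

# A power user's file: already sources the project file
cat > power-user.bashrc <<'EOF'
# power-user.bashrc

OVERRIDE_VAR1=foo
source shared/projectx/project.bashrc

PROJECT_VAR=$OVERRIDE_VAR1
EOF

# The shared project file
cat > shared/projectx/project.bashrc <<'EOF'
# shared/projectx/project.bashrc
# Source this file in your .bashrc to include ProjectX's special variables

PROJECT_VAR=bar
EOF
```

Because the project file lives in a subdirectory, a *.bashrc glob in the scratch directory matches only the two user files.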

3. Using grep and echo

Our first tools at hand when using the Linux command line are grep and echo. We’ll use grep to search if the line exists in the file, and echo to write it to the file:

for bashrc_file in *.bashrc; do
    # Quote the variable so file names with spaces are handled correctly
    if ! grep -qF 'source shared/projectx/project.bashrc' "$bashrc_file"; then
        echo "source shared/projectx/project.bashrc" >> "$bashrc_file"
    fi
done

Here we combine two grep flags to check for the existence of our line:

  • -q tells grep to be quiet: it prints nothing, and we only check the exit status of the command
  • -F instructs grep to interpret our pattern as a fixed string, not as a regular expression
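The effect of the two flags can be seen in a small experiment. In this sketch, the file lookalike.bashrc and its content are made up for illustration: the line differs from ours only in the character where our string has a dot.

```shell
workdir=$(mktemp -d)
# Hypothetical lookalike line: "X" where our string has "."
printf 'source shared/projectx/projectXbashrc\n' > "$workdir/lookalike.bashrc"

# With -F the dot is literal, so the lookalike line does not match
fixed_status=0
grep -qF 'source shared/projectx/project.bashrc' "$workdir/lookalike.bashrc" || fixed_status=$?

# Without -F the dot is a regex wildcard and matches the "X"
regex_status=0
grep -q 'source shared/projectx/project.bashrc' "$workdir/lookalike.bashrc" || regex_status=$?

echo "fixed-string exit status: $fixed_status"   # 1 means no match
echo "regex exit status: $regex_status"          # 0 means match
```

Thanks to -q, neither call prints the matching line. The exit status is the only result, which is exactly what the if statement in our loop consumes.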

4. Using sed

sed, short for stream editor, is a non-interactive editor derived from the original Unix line editor ed.

A first idea is to search the file for our string, quit with the q command as soon as we find it, and append the line only if we never found it.

Unfortunately, that won’t work when editing in place: sed reads our file line by line and writes each line back, so quitting early stops sed from reading any further, and the rest of the file is effectively deleted.
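The truncation is easy to reproduce on standard output, without touching any file. A minimal sketch with three made-up input lines:

```shell
# sed prints lines until it reaches "two", then quits; "three" is never read
output=$(printf 'one\ntwo\nthree\n' | sed '/two/q')
printf '%s\n' "$output"
```

Only the lines up to and including the match are printed. Combined with in-place editing, everything after the match would be missing from the rewritten file.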

Therefore, we’ll use a clever little trick to test for the existence of our string instead.

When we find the string, we’ll copy the line to the hold space, after which we’ll continue to the end of the file.

At this point, we have the last line in our pattern space and the string we’re searching for in the hold space, or an empty hold space in case the string was not found.
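To make the hold space more tangible, here is a small sketch that runs only this first half of the trick on made-up input. It copies a matching line to the hold space with h and, on the last line, swaps the spaces with x and prints what was held.

```shell
# "two" is found, so the hold space contains it when we reach the last line
found=$(printf 'one\ntwo\nthree\n' | sed -n -e '/two/h' -e '${x;p;}')

# "four" never appears, so the hold space is still empty at the end
missing=$(printf 'one\ntwo\nthree\n' | sed -n -e '/four/h' -e '${x;p;}')

echo "hold space when found: '$found'"
echo "hold space when missing: '$missing'"
```

The -n flag suppresses sed's automatic printing, so the only output is the content of the hold space.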

Here comes the tricky part. sed has a test command, t, which branches when a substitution in the pattern space has succeeded. To use this functionality, we’ll first exchange the hold and pattern spaces and then substitute our string with an empty string, effectively erasing it.

Before we test, we bring our last line back from the hold space, since that is the line sed still has to print.

If the substitution succeeded, the string was found, and t branches to the end of the script, skipping the append command. If it failed, sed carries on to the next command, which appends the string.
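The behavior of the t command can be checked in isolation. In this sketch, with made-up input, the last expression plays the role of our append command: it only runs on lines where the earlier substitution did not succeed.

```shell
output=$(printf 'a\nb\n' | sed -e 's/a/X/' -e 't' -e 's/$/ (no substitution)/')
printf '%s\n' "$output"
# Line "a": the substitution succeeds, t jumps to the end of the script
# Line "b": the substitution fails, so the final expression is applied
```

A t without a label branches to the end of the script, which is why nothing after it runs for the first line.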

Let’s put it all together:

source_str='source shared/projectx/project.bashrc'
for bashrc_file in *.bashrc; do
    sed -i -e "\|$source_str|h;"      `# Search for the source string and copy it to the hold space` \
        -e "\${"                      `# Go to the end of the file and run the following commands` \
        -e "x;"                       `# Exchange the last line with the hold space` \
        -e "s|$source_str||;"         `# Erase the source string if it was actually found` \
        -e "{g;"                      `# Bring back the last line` \
        -e "t};"                      `# Test if the substitution succeeded (the source string was found)` \
        -e "a\\" -e "$source_str"     `# Append the source string if we didn't move to the next cycle` \
        -e "}" "$bashrc_file"
done

5. Using awk

The awk programming language was created to process large text files efficiently.

Let’s remember our goal: append a line in a file where the line does not exist, and leave it alone if it does.

Unlike sed, whose -i flag allows in-place modification of a file, standard awk has no option for writing to the same file we’re reading from. Instead, we’ll write the output to a temporary file created with the mktemp utility and then move it over the original.

We’ll start by printing the record with the print command. Next, we’ll set the found variable to 1 if we find the source string. Finally, once we reach the last record, if found is not set, we’ll print our string.

To make our shell variable available inside the awk program, we’ll pass it along with the awk command’s -v flag.

Let’s put it all together:

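source_str='source shared/projectx/project.bashrc'
for bashrc_file in *.bashrc; do
    tempfile=$(mktemp)   # fresh temporary file per iteration, since mv consumes it
    # index() searches literally, so the dot in source_str is not a regex wildcard
    awk -v source_str="$source_str" \
      '{print}; index($0, source_str) {found=1}; END {if (!found) {print source_str}}' \
      "$bashrc_file" > "$tempfile"
    mv "$tempfile" "$bashrc_file"
done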

6. Comparing Our Approaches

When comparing our approaches, we’d like to rate them according to clarity and performance.

In terms of clarity, the definite winner is the combination of grep and echo. It’s clear and concise, and apart from two grep flags, one of which is optional for our solution, there is nothing else to learn.

But what about performance?

We’ll compare two scenarios:

  1. A few of our users (10%) need the file to be edited
  2. Most of our users (90%) need the file to be edited

6.1. Creating Our Test Scenarios

We’ll start by creating 500 files of variable length with some random words:

for number in {1..500}; do
    # Pick between 1000 and 10000 random words from the dictionary
    shuf -n "$(shuf -i 1000-10000 -n 1)" /usr/share/dict/words > "file$number.bashrc"
done

We’ll use our short script with some slight modifications to insert our source line in some of the files:

for bashrc_file in $(shuf -n 50 -e *.bashrc); do
    if ! grep -qF 'source shared/projectx/project.bashrc' "$bashrc_file"; then
        echo "source shared/projectx/project.bashrc" >> "$bashrc_file"
        # Shuffle the lines so the source line ends up at a random position
        shuf -o "$bashrc_file" < "$bashrc_file"
    fi
done
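Before timing anything, it helps to confirm how many files already contain the line and how many still need editing. A minimal sketch on five made-up files, assuming a grep that supports -L (both GNU and BSD grep do, although it is not in POSIX):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
source_str='source shared/projectx/project.bashrc'

# Five hypothetical test files, two of which already contain the line
for number in 1 2 3 4 5; do
    echo "VAR$number=foo" > "file$number.bashrc"
done
echo "$source_str" >> file2.bashrc
echo "$source_str" >> file4.bashrc

# -l lists files that contain the line; -L lists files that do not
have_line=$(grep -lF "$source_str" *.bashrc | wc -l | tr -d ' ')
need_edit=$(grep -LF "$source_str" *.bashrc | wc -l | tr -d ' ')
echo "already contain the line: $have_line"
echo "still need editing: $need_edit"
```

Run against the 500 generated files, the same two commands show which of the two scenarios a given test directory represents.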

To time our results, we’ll use the time tool. Prefixing any Linux command with time reports three values for that command: real, user, and sys. We’ll look at the real value, which is also known as the wall-clock time.
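As an illustration, here is one way the grep and echo loop could be timed as a whole. This is a sketch under a few assumptions: it runs on a single made-up file, it calls bash -c so that bash's time keyword is used regardless of the invoking shell, and it sets the bash variable TIMEFORMAT to %R so that only the real value is printed.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'VAR1=foo\n' > regular.bashrc

# time writes its report to stderr, so the outer braces redirect it to stdout
elapsed=$(bash -c '
    TIMEFORMAT=%R
    {
        time {
            for bashrc_file in *.bashrc; do
                if ! grep -qF "source shared/projectx/project.bashrc" "$bashrc_file"; then
                    echo "source shared/projectx/project.bashrc" >> "$bashrc_file"
                fi
            done
        }
    } 2>&1
')
echo "real: ${elapsed}s"
```

Timing the whole loop, rather than each command, gives one number per approach that can be compared directly.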

6.2. Analyzing the Results

Let’s take a look at the results:

Scenario            grep+echo    awk          sed
Change 50 files     0.358 sec    2.047 sec    2.194 sec
Change 450 files    0.378 sec    2.171 sec    2.128 sec

The numbers for sed look counterintuitive at first: shouldn’t changing 450 files take more time than changing 50? The explanation is that sed does extra work when the line is already in the file: it copies the line to the hold space, and the substitution at the end succeeds.

When the substitution test passes (meaning the string was found), sed branches to the end of the script, emitting extra brk syscalls along the way. These extra syscalls invert the expected pattern for sed, so a run that updates more files takes slightly less time.

What if we modify our scripts a bit to run our tools in parallel for all files? In that case, the total time is roughly the time it takes to fix up our largest file:

Scenario            grep+echo    awk          sed
Change 50 files     0.101 sec    0.364 sec    0.378 sec
Change 450 files    0.107 sec    0.405 sec    0.381 sec
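The parallel versions of the scripts are not listed in this article. One possible shape, shown here for the grep and echo approach as an assumption rather than the exact script behind the measurements, is to start one background subshell per file and then wait for all of them:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three hypothetical test files standing in for the 500 generated ones
for number in 1 2 3; do
    echo "VAR$number=foo" > "file$number.bashrc"
done

source_str='source shared/projectx/project.bashrc'
for bashrc_file in *.bashrc; do
    # Each file is handled in its own background subshell
    (
        if ! grep -qF "$source_str" "$bashrc_file"; then
            echo "$source_str" >> "$bashrc_file"
        fi
    ) &
done
wait   # block until every background job has finished
```

Since each subshell touches a different file, the jobs do not interfere with one another, and the wall-clock time is dominated by the slowest single file.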

We can readily see that the grep solution is not only the clearest one but also the fastest. This makes sense, as awk and sed go through the complete file, whereas grep can exit as soon as the line has been found.

7. Conclusion

To conclude, we’ve shown three different ways to append a line to a file if the line does not exist in that file.

In the process, we learned a few handy uses of grep, echo, sed, and awk. We’ve also shown a performance comparison of three approaches that clearly shows that using a simple combination of grep and echo is faster than either sed or awk at achieving our goal.

We should definitely consider this result if we are dealing with large text files or a large number of files.