1. Overview
When we write shell scripts or work with the Linux command line, we often need to handle file path strings. Extracting the last directory or filename from a given path string is a pretty common operation.
For example, given the path string "/tmp/dir/target", we want to get "target" as the result.
This looks like a simple problem, and several solutions may already come to mind when we read the example above. However, it includes a few corner cases that can break those solutions.
In this tutorial, we’ll take a closer look at this problem and evaluate common solutions.
2. Discussion of Common Solutions
We know that Linux filesystems don’t allow a slash (/) to be a part of a filename or directory name.
Therefore, if we look at the input path string as slash-separated values, we can just take the last value to solve the problem.
If we take a look at our Linux command arsenal, many powerful weapons may help us to do the job, such as grep, sed, and awk:
$ sed 's#.*/##' <<< "/tmp/dir/target"
target
$ awk -F'/' '{print $NF}' <<< "/tmp/dir/target"
target
$ grep -o '[^/]*$' <<< "/tmp/dir/target"
target
Or we can use Bash's parameter expansion to solve the problem:
$ INPUT="/tmp/dir/target"
$ echo "${INPUT##*/}"
target
Of course, there are many more similar solutions using other command-line tools. However, are they really robust solutions to the problem?
In Linux, a directory path string often ends with a slash, such as "/tmp/dir/target/". If we take this path string as the input, all the approaches above fail, because nothing follows the last slash:
$ sed 's#.*/##' <<< "/tmp/dir/target/"
( empty output )
$ awk -F'/' '{print $NF}' <<< "/tmp/dir/target/"
( empty output )
$ grep -o '[^/]*$' <<< "/tmp/dir/target/"
( empty output )
$ INPUT="/tmp/dir/target/"
$ echo "${INPUT##*/}"
( empty output )
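To see why every approach returns nothing here, it can help to print the fields that awk actually sees. The following is a minimal sketch: show_fields is a hypothetical helper name introduced only for this illustration, and it pipes the input through printf instead of using a here-string so that it also runs in plain POSIX shells:

```shell
# Print every slash-separated field in brackets so that empty fields are visible.
# show_fields is a hypothetical helper, not a standard command.
show_fields() {
    printf '%s\n' "$1" | awk -F'/' '{ for (i = 1; i <= NF; i++) printf "[%s]", $i; print "" }'
}

show_fields "/tmp/dir/target"     # prints [][tmp][dir][target]
show_fields "/tmp/dir/target/"    # prints [][tmp][dir][target][]
```

With the trailing slash, the last field is an empty string, which is exactly what the solutions above print.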
We could fix the solutions above to cover the trailing-slash case. For example, we can change the awk one-liner slightly so that it falls back to the second-to-last field when the last one is empty:
$ awk -F'/' '{ a = length($NF) ? $NF : $(NF-1); print a }' <<< "/tmp/dir/target"
target
$ awk -F'/' '{ a = length($NF) ? $NF : $(NF-1); print a }' <<< "/tmp/dir/target/"
target
The fixed awk one-liner works for the most common inputs. However, there are still edge cases that may break it.
Next, let’s take a closer look at them.
3. Looking Into the Corner Cases
In the previous section, we learned that a Linux path string can end with a slash. Now, let's see if there are other possible patterns of a path string.
First, in Linux, the root directory is the parent of all other files and directories. So, the root directory "/" is a valid path string.
Additionally, most Linux filesystems allow filenames and directory names that consist only of spaces. Therefore, a path string is still valid if a file or a directory is named " ".
Now, let’s summarize all the possible patterns of a Linux path string (input) and our expected result (output):
| Input | Expected Output |
|---|---|
| "/tmp/dir/target" | "target" |
| "/tmp/dir/target/" | "target" |
| "/" | "/" |
| "/tmp/dir/ " | " " |
| "/tmp/dir/ /" | " " |
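The table above can also be turned into a small check that we run against any candidate solution. This is only a sketch under stated assumptions: candidate and check are hypothetical helper names, and the candidate shown here is the trailing-slash-aware awk one-liner from the previous section:

```shell
# Candidate under test: the trailing-slash-aware awk one-liner from Section 2.
# (candidate and check are hypothetical helper names, not standard commands.)
candidate() {
    printf '%s\n' "$1" | awk -F'/' '{ a = length($NF) ? $NF : $(NF-1); print a }'
}

# check INPUT EXPECTED: print PASS or FAIL; the brackets make spaces visible.
check() {
    actual=$(candidate "$1")
    if [ "$actual" = "$2" ]; then
        printf 'PASS [%s] -> [%s]\n' "$1" "$actual"
    else
        printf 'FAIL [%s] -> [%s] (expected [%s])\n' "$1" "$actual" "$2"
    fi
}

check "/tmp/dir/target"  "target"
check "/tmp/dir/target/" "target"
check "/"                "/"
check "/tmp/dir/ "       " "
check "/tmp/dir/ /"      " "
```

With this candidate, only the root directory case reports FAIL: for the input "/", both fields are empty, so the one-liner prints an empty line instead of "/".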
If we like, we can still extend the awk one-liner to cover all the cases. Similarly, a Bash function can do the job as well.
Here, we show an awk one-liner as an example:
$ awk -F'/' '$0==FS{ print $0; next }{ a = length($NF) ? $NF : $(NF-1); print a }' <<< "/tmp/dir/target"
target
$ awk -F'/' '$0==FS{ print $0; next }{ a = length($NF) ? $NF : $(NF-1); print a }' <<< "/tmp/dir/target/"
target
$ awk -F'/' '$0==FS{ print $0; next }{ a = length($NF) ? $NF : $(NF-1); print a }' <<< "/"
/
$ echo "^$( awk -F'/' '$0==FS{ print $0; next }{ a = length($NF) ? $NF : $(NF-1); print a }' <<< "/tmp/dir/ " )\$"
^ $
$ echo "^$( awk -F'/' '$0==FS{ print $0; next }{ a = length($NF) ? $NF : $(NF-1); print a }' <<< "/tmp/dir/ /" )\$"
^ $
Note that in the last two examples, we print the result between "^" and "$" so that we can more easily see that the expected whitespace-only result has been extracted.
As we can see, the awk one-liner works for all the cases. But if we compare it to the first version (awk -F'/' '{print $NF}'), it's pretty complex now.
In fact, the GNU Coreutils package provides a convenient command to solve our problem.
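For comparison, a shell function based on parameter expansion could cover the same five cases. The following is a minimal sketch rather than a definitive implementation: last_component is a hypothetical name, and the function assumes at most one trailing slash, so it doesn't handle inputs with repeated trailing slashes such as "dir//":

```shell
# Print the last component of the path string given as the first argument.
# last_component is a hypothetical helper name used only for illustration.
last_component() {
    if [ "$1" = "/" ]; then
        # The root directory is its own last component.
        printf '%s\n' "/"
    else
        # Drop one trailing slash (if present), then everything up to the last slash.
        local trimmed="${1%/}"
        printf '%s\n' "${trimmed##*/}"
    fi
}

last_component "/tmp/dir/target/"             # prints target
printf '^%s$\n' "$(last_component '/tmp/dir/ /')"   # prints ^ $
```

Quoting every expansion matters here, since an unquoted whitespace-only name would be lost to word splitting.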
4. Using the basename Command
As the name implies, the basename command strips the parent directories from a given path string.
Further, it's robust and covers all the corner cases we've listed. Next, let's do a test with different inputs:
$ basename "/tmp/dir/target"
target
$ basename "/tmp/dir/target/"
target
$ basename "/"
/
$ echo "^$(basename '/tmp/dir/ ')\$"
^ $
$ echo "^$(basename '/tmp/dir/ /')\$"
^ $
As the output above shows, the basename command handles every case in our table without any extra logic.
It's worth mentioning that the basename command has a sibling, dirname, which does the opposite: it strips the last component from the given path string:
$ dirname "/tmp/dir/target"
/tmp/dir
When we need to handle path strings, we can first consider if basename and/or dirname can solve the problem. Usually, solutions with these two commands are stable and easier to understand.
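As an illustration, a script might split a path into both parts with these two commands. This is a sketch only: split_path is a hypothetical helper name, and the "--" argument is included on the assumption that the installed basename and dirname accept it as the usual end-of-options marker, which protects names that start with a dash:

```shell
# Print the parent directory and the last component of a path string.
# split_path is a hypothetical helper name used only for illustration.
split_path() {
    # "--" ends option parsing, so a name such as "-n" is treated as an operand.
    printf 'dir=%s\nname=%s\n' "$(dirname -- "$1")" "$(basename -- "$1")"
}

split_path "/tmp/dir/target/"
# dir=/tmp/dir
# name=target
```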
awk is a powerful utility, and it can certainly solve the problem. However, we have to consider whether our awk implementation covers all the corner cases. Otherwise, our solution may produce an unexpected result, particularly if it's part of a script.
5. Conclusion
In this article, we've looked into the problem of extracting the last component from a given path string.
This simple problem has several corner cases: a trailing slash, the root directory, and whitespace-only names. We've seen an awk one-liner solution covering all those corner cases.
We've also seen a more straightforward solution: using the basename command.