1. Overview
The find command is a convenient utility for searching for files on the Linux command line. It provides various output options that cover most common requirements.
In this quick tutorial, we’ll explore how to tell find to only output found filenames without extensions.
2. Introduction to the Problem
To make the explanation clear, let’s first prepare an example directory:
$ tree myDir
myDir
├── camera_dump
│ ├── image_001.jpg
│ ├── image_002.jpg
│ ├── image_003.jpg
│ ├── image_004.jpg
│ └── image_005.jpg
├── picture_001.jpg
├── picture_002.jpg
├── picture_003.jpg
├── text01.txt
├── text02.txt
└── text03.txt
1 directory, 11 files
As the tree command’s output shows, myDir contains a subdirectory and some files. Since find searches recursively by default, we can easily find all image files (*.jpg) under myDir:
$ find myDir -name '*.jpg'
myDir/camera_dump/image_005.jpg
myDir/camera_dump/image_004.jpg
myDir/camera_dump/image_003.jpg
myDir/camera_dump/image_002.jpg
myDir/camera_dump/image_001.jpg
myDir/picture_003.jpg
myDir/picture_002.jpg
myDir/picture_001.jpg
The output above shows what find’s default output looks like. However, we only want the filenames without extensions. In other words, we need to strip each file’s path information and extension to achieve our goal:
...
image_002
image_001
picture_003
picture_002
...
The find command’s -printf action can output various metadata about our files, such as file size, modification time, the number of links, and so on. Therefore, our first idea would be to check whether -printf supports any format pattern to directly output the filenames without extensions.
The closest one is the “%f” format, which prints the file’s basename with leading path information removed:
$ find myDir -name '*.jpg' -printf "%f\n"
image_005.jpg
image_004.jpg
...
picture_002.jpg
picture_001.jpg
As we can see, the “.jpg” extensions are still there. Therefore, find cannot solve this problem on its own. Instead, we need support from other commands.
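To see how the related -printf directives compare, here is a minimal sketch. It assumes GNU find (-printf is a GNU extension) and builds a throwaway copy of one file from the example tree in a temporary directory; the “|” separator is only there to make the three fields easy to tell apart:

```shell
# Recreate one file from the example tree in a temporary directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/myDir/camera_dump"
touch "$tmp/myDir/camera_dump/image_001.jpg"

# %p = path as find reports it, %h = leading directories, %f = basename.
# None of these directives drops the extension.
out=$(cd "$tmp" && find myDir -name '*.jpg' -printf '%p|%h|%f\n')
printf '%s\n' "$out"

rm -rf "$tmp"
```

Each directive keeps the “.jpg” part, which is why we need a second tool for the extension.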
Next, let’s see how to solve our problem.
3. Using the basename Command
We often use the basename command to strip the parent directories from a given file path string. If we pass a suffix to the command using the “basename NAME SUFFIX” syntax, both the parent directories and the suffix will be chopped from the given path string:
$ basename "/a/very/nice/file.txt" ".txt"
file
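A short sketch of how the SUFFIX argument behaves may help here. The sample paths are made up for illustration; the point is that basename only removes the suffix if the name actually ends with it, and it removes nothing more than the suffix we give it:

```shell
# Suffix matches the end of the name, so it is removed.
with_suffix=$(basename "/a/very/nice/file.txt" ".txt")

# Suffix doesn't match, so the name is left untouched.
wrong_suffix=$(basename "/a/very/nice/file.txt" ".jpg")

# Only the given suffix is removed, not every extension.
double_ext=$(basename "/a/very/nice/archive.tar.gz" ".gz")

printf '%s\n' "$with_suffix" "$wrong_suffix" "$double_ext"
```

This prints file, file.txt, and archive.tar, one per line.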
Next, let’s solve the problem using find’s -exec action:
$ find myDir -name '*.jpg' -exec basename {} .jpg \;
image_005
image_004
image_003
image_002
image_001
picture_003
picture_002
picture_001
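The “\;” form starts one basename process per file. As a possible variation, assuming a basename that supports the -s SUFFIX option (GNU coreutils does), we can let find pass many paths to a single basename call with “{} +”. The sketch below uses a temporary directory with a few sample files, and sorts the result only to make the output order predictable:

```shell
# Recreate a small part of the example tree in a temporary directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/myDir/camera_dump"
touch "$tmp/myDir/camera_dump/image_001.jpg" \
      "$tmp/myDir/picture_001.jpg" \
      "$tmp/myDir/text01.txt"

# "{} +" appends as many paths as possible to one basename invocation.
# -s .jpg strips the suffix from every NAME argument (assumes GNU basename).
names=$(find "$tmp/myDir" -name '*.jpg' -exec basename -s .jpg {} + | sort)
printf '%s\n' "$names"

rm -rf "$tmp"
```

With these sample files, the command prints image_001 and picture_001.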
As the output above shows, we got the expected output. Alternatively, we can pipe find’s output to the xargs command:
$ find myDir -name '*.jpg' | xargs -I{} basename {} ".jpg"
image_005
image_004
image_003
image_002
image_001
picture_003
picture_002
picture_001
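One caveat with the pipe: by default, xargs interprets quotes and backslashes in its input, so unusual filenames can break the command. A hedged sketch of a more robust variant, assuming a find that supports -print0 and an xargs that supports -0 (both GNU and BSD versions do), uses NUL-separated names instead. The filename with an apostrophe is a made-up example:

```shell
# Sample files in a temporary directory, one with a quote and a space.
tmp=$(mktemp -d)
mkdir -p "$tmp/myDir"
touch "$tmp/myDir/bob's photo.jpg" "$tmp/myDir/picture_001.jpg"

# -print0 emits NUL-terminated paths; -0 makes xargs split on NUL only,
# so quotes and whitespace in names are passed through unchanged.
names=$(find "$tmp/myDir" -name '*.jpg' -print0 | xargs -0 -I{} basename {} .jpg | sort)
printf '%s\n' "$names"

rm -rf "$tmp"
```

Both names come through intact, including the one containing the apostrophe.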
4. Using the sed Command
sed and awk are two powerful text processing tools in Linux. We can pipe find’s output to them for post-processing and get the desired result.
In this section, let’s first look at how to use sed to extract the filenames:
$ find myDir -name '*.jpg' | sed 's#.*/##; s#[.][^.]*$##'
image_005
image_004
image_003
image_002
image_001
picture_003
picture_002
picture_001
As we can see, the sed command solves the problem. sed does the job through two substitutions:
- s#.*/## – Remove everything up to and including the last slash, which is the file’s parent directories
- s#[.][^.]*$## – Remove the last dot and everything after it, which is the file’s extension
It’s worth mentioning that we’ve used “#” (s#…#…#) instead of the usual “/” (s/…/…/) as the separator in sed’s “s” substitution. This is because path patterns often contain slashes (“/”). If we still used “/” as the separator, we’d have to escape the pattern’s slashes, which makes the code harder to read.
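To make the separator choice concrete, the following sketch runs the same two substitutions with both delimiters on a sample path, and then on a made-up name with two extensions to show that only the last one is removed:

```shell
path='myDir/camera_dump/image_001.jpg'

# The same substitutions, once with "#" and once with escaped "/".
with_hash=$(printf '%s\n' "$path" | sed 's#.*/##; s#[.][^.]*$##')
with_slash=$(printf '%s\n' "$path" | sed 's/.*\///; s/[.][^.]*$//')

# The "$" anchor means only the final extension is stripped.
multi=$(printf '%s\n' 'myDir/backup.tar.gz' | sed 's#.*/##; s#[.][^.]*$##')

printf '%s\n' "$with_hash" "$with_slash" "$multi"
```

Both delimiter styles print image_001, and the last command prints backup.tar. The “#” version is simply easier to read.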
So next, let’s see how awk solves the problem.
5. Using the awk Command
We can apply a similar substitution to solve the problem using awk:
$ find myDir -name '*.jpg' | awk -F '/' '{sub(/[.][^.]*$/,"",$NF); print $NF}'
image_005
image_004
image_003
image_002
image_001
picture_003
picture_002
picture_001
awk is good at processing column-based data. If we define “/” as the field separator (FS), the last field ($NF) contains the filename without the path information.
Next, we call awk’s sub() function to remove the file extension from $NF and print the result.
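As a small sketch of the field-splitting idea, we can feed awk two sample paths directly instead of going through find. The second path is a made-up name with two extensions. The regex here is anchored with “$” so that, like the sed version, it removes only the last extension:

```shell
paths='myDir/camera_dump/image_001.jpg
myDir/backup.tar.gz'

# -F '/' splits each line on slashes, so $NF is the bare filename.
# sub() then deletes the last dot and everything after it in $NF.
names=$(printf '%s\n' "$paths" | awk -F '/' '{ sub(/[.][^.]*$/, "", $NF); print $NF }')
printf '%s\n' "$names"
```

This prints image_001 and backup.tar.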
6. Conclusion
In this article, we’ve discussed how to process output from the find command and extract the filenames without extensions.
The find command alone cannot solve this problem. Instead, we’ve learned three different approaches through examples.