1. Introduction

Files are the essence of Linux, regardless of the exact data we deal with. Even though extensions are less important in a UNIX-like operating system (OS), they are nevertheless a common part of filenames. Because of this, knowing how to get the actual name alone can be important in different situations.

In this tutorial, we talk about ways to extract the name and extension of a file. First, we briefly refresh our knowledge about file extensions. After that, we see how to extract the filename from a path. Next, we look at a special feature of a common command that can process the name of a file. Further, we explore a basic way to split the extension from a filename via the shell, but also note some common pitfalls of the process. Finally, we work out a more robust solution.

We tested the code in this tutorial on Debian 12 (Bookworm) with GNU Bash 5.2.15. It should work in most POSIX-compliant environments unless otherwise specified.

2. File Extensions

A file extension consists of one or more groups of characters, each preceded by a dot (period), at the end of a filename. Typically, the extension of a file consists of one to four characters and hints at the nature of the file, its format, or other related characteristics.

The keyword here is hint, since several pitfalls come with the practice of adding a suffix to filenames:

  • not all period-separated suffixes are meant to be extensions
  • not all environments interpret extensions (properly)
  • some extensions, e.g., .tar.gz, are compound
  • different extensions can designate the same format
  • the same extension can be used for different formats
  • extensions lack proper standardization

In fact, an incorrect extension or a wrong interpretation of a suffix can lead to serious security issues, such as exploits and malware infections. Naturally, formats can be mixed maliciously even without extensions.

In an operating system like Microsoft Windows, the filename suffix can determine whether a file is treated as executable at all. Despite different attempts to standardize them, extensions are still only as valuable as developers and users make them.

Due to all the above, it’s often valuable to be able to separate the filename from the extension when processing data.

3. Extract Filename From Path

First, we need a filename to process its extension. Since files reside in paths, we usually have or need the latter in order to process the former.

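For example, if we have /dir/file as our target within the $XPATH variable, we can extract the file part via several approaches, such as the basename command or a shell parameter expansion that strips everything up to and including the last slash.

Such approaches work on the same principle: only NUL and the forward slash are disallowed when naming filesystem objects, so everything after the last slash in a path such as /dir/file is the filename.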
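
As a minimal sketch of that principle, the following compares two common ways to get the last path component. The sample value of $XPATH and the two result variable names are assumptions made only for illustration:

```shell
# Assumed sample value; XPATH is the variable name used in the text
XPATH='/dir/file'

# Option 1: the basename command (-- guards against paths that start with a dash)
name_via_basename="$(basename -- "$XPATH")"

# Option 2: parameter expansion that deletes everything up to the last slash
name_via_expansion="${XPATH##*/}"

printf '%s\n' "$name_via_basename" "$name_via_expansion"
```

Both lines of output should read file. The two options differ for paths with a trailing slash: basename ignores the trailing slash, while the expansion leaves an empty string, so the command is usually the safer choice when the input isn't under our control.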

Once we have the filename itself, let’s process it.

4. Using basename

In fact, basename has a second argument, which specifies an optional suffix to remove from the final output:

$ extension='.txt'
$ namenoext="$(basename "$XFILENAME" "$extension")"

Here, we assume our filename is in the $XFILENAME variable. We store the expected extension in $extension and get the name without that extension in $namenoext. This works similarly to implementations like the basename() function in Perl.

Of course, we’d have to know the expected extension beforehand. While this isn’t ideal, it’s rare to not have at least a basic expectation for the suffix of a file.
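
To see what happens when the expectation doesn't hold, we can try the suffix argument on a few names. This is only a sketch: the sample paths and the result variable names are hypothetical, and the behavior shown is that of GNU basename:

```shell
# Hypothetical sample paths; only the first one ends in the expected suffix
extension='.txt'

# Suffix matches: it's removed
with_match="$(basename -- '/dir/notes.txt' "$extension")"

# Suffix doesn't match: the name is left untouched
without_match="$(basename -- '/dir/notes.md' "$extension")"

# Name consists of the suffix alone: nothing is removed
only_suffix="$(basename -- '/dir/.txt' "$extension")"

printf '%s\n' "$with_match" "$without_match" "$only_suffix"
```

The three results should be notes, notes.md, and .txt. In other words, a wrong guess about the extension doesn't cause an error, so a script that depends on the stripped name may want to compare the result with the original filename.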

5. Basic Name and Extension Extraction

As usual for a Linux environment, we have multiple ways to get the same result.

Still, since Bash is the standard shell in many major distributions, we can employ it to process our filename:

$ extension="${XFILENAME##*.}"
$ namenoext="${XFILENAME%.*}"

In this case, we employ two parameter expansion features:

  • ## deletes the longest match of the pattern after ## from the beginning of the variable value
  • % deletes the shortest match of the pattern after % from the end of the variable value

Notably, we might want to use # and %% instead when it comes to extensions like .tar.gz.

In both cases above, we use the * asterisk wildcard also available when globbing.
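
To make the difference between the short and long operators concrete, here is a small sketch with a compound extension. The sample filename and the four result variable names are assumptions for illustration:

```shell
# Assumed sample filename with a compound extension
XFILENAME='archive.tar.gz'

# Longest match from the beginning and shortest match from the end:
# only the last suffix counts as the extension
last_ext="${XFILENAME##*.}"
name_short="${XFILENAME%.*}"

# Shortest match from the beginning and longest match from the end:
# the whole suffix chain counts as the extension
full_ext="${XFILENAME#*.}"
name_long="${XFILENAME%%.*}"

printf '%s + %s\n' "$name_short" "$last_ext" "$name_long" "$full_ext"
```

The first output line should be archive.tar + gz and the second archive + tar.gz. All four expansions are part of the POSIX shell language, so they aren't limited to Bash.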

6. Pitfalls

Although we can employ simple means to separate the extension from the rest of the filename, there are caveats:

+---------------------------------------------------------------------------+
| $XFILENAME     | ${XFILENAME##*.} | ${XFILENAME#*.}  | Expected  | Match? |
|----------------+------------------+------------------+-----------+--------|
| name.ext       | ext              | ext              | ext       | yes    |
| name           | name             | name             |           | no     |
| .name          | name             | name             |           | no     |
| name.ext1.ext2 | ext2             | ext1.ext2        | ext2      | /      |
+---------------------------------------------------------------------------+

In this case, we see that there is little overlap between what’s expected and what we get when it comes to special cases of names.

In fact, these apply to name extraction as well:

+---------------------------------------------------------------------------+
| $XFILENAME     | ${XFILENAME%.*}  | ${XFILENAME%%.*} | Expected  | Match? |
|----------------+------------------+------------------+-----------+--------|
| name.ext       | name             | name             | name      | yes    |
| name           | name             | name             | name      | yes    |
| .name          |                  |                  | .name     | no     |
| name.ext1.ext2 | name.ext1        | name             | name.ext1 | /      |
+---------------------------------------------------------------------------+
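
Both tables can be reproduced with a short loop over the same four sample names. This is just a sketch, and the rows variable and the output layout are choices made for this illustration:

```shell
# Collect one row per sample name:
# ext=[last suffix|suffix chain] name=[short strip|long strip]
rows="$(
  for XFILENAME in name.ext name .name name.ext1.ext2; do
    printf '%s: ext=[%s|%s] name=[%s|%s]\n' "$XFILENAME" \
      "${XFILENAME##*.}" "${XFILENAME#*.}" \
      "${XFILENAME%.*}" "${XFILENAME%%.*}"
  done
)"

printf '%s\n' "$rows"
```

For .name, the row should show name as the extension and an empty name on both sides, which is exactly the hidden-file pitfall from the tables.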

Further, it’s up to the user whether they want the longest or shortest suffix match. Because of this, a custom implementation might be beneficial.

7. Robust Extraction of Name and Extension From Filename

At this point, we can implement our own solution to split the extension from the name of a file in the form of a Bash function. In fact, this works similarly to the fileparse() function of the Perl File::Basename module.

So, let’s check the Bash function code:

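splitpath() {
  local cmp_dirname=
  local cmp_basename=
  local cmp_basename_root=
  local cmp_suffix=

  if [[ $# -eq 0 ]]; then
    >&2 echo "${FUNCNAME[0]}: ERROR: Expected at least one argument (path)."
    return 1
  fi

  # -- keeps paths that start with a dash from being parsed as options
  cmp_dirname=$(dirname -- "$1")
  cmp_basename=$(basename -- "$1")

  # Only treat a dot as a separator when at least one character precedes it,
  # so a hidden file such as .name keeps its full name and gets no suffix
  if [[ $cmp_basename == ?*.* ]]; then
    if [[ $# -gt 1 ]]; then
      # Any second argument selects the longest suffix (chain)
      cmp_suffix=${cmp_basename#?*.}
    else
      # Default: only the last suffix
      cmp_suffix=${cmp_basename##*.}
    fi
  fi

  if [[ "$cmp_basename" == "$cmp_suffix" ]]; then
    cmp_basename_root=$cmp_basename
    cmp_suffix=''
  else
    # Quote the suffix so it's matched literally instead of as a glob pattern
    cmp_basename_root=${cmp_basename%".$cmp_suffix"}
  fi

  # Keep the data out of the format string, so % and \ in names stay literal
  printf '%s\0%s\0%s\0%s' "$cmp_dirname" "$cmp_basename" "$cmp_basename_root" "$cmp_suffix"
  return 0
}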

First, we define local variables:

  • cmp_dirname holds the path of the containing directory
  • cmp_basename holds the filename
  • cmp_basename_root holds the filename without the extension
  • cmp_suffix holds the file extension

After checking we have at least one argument, which should be the full path, we extract the containing directory path via dirname and the filename via basename.

Next, we check whether we have a second argument. If so, we strip only the shortest leading match up to a dot, which leaves the longest suffix (chain). Otherwise, we strip the longest leading match and get just the last suffix.

Then, we drop the suffix part from the filename and keep the rest as the filename without an extension. Finally, we use printf to output all variables, separated by NUL (\0) characters. Since NUL can't appear in a path, this ensures that paths and filenames with any valid character can be processed.

So, let’s see the function in action:

$ splitpath /dir/subdir/file.ext1.ext2 | tr '\0' ','
/dir/subdir,file.ext1.ext2,file.ext1,ext2
$ splitpath /dir/subdir/file.ext1.ext2 maxext | tr '\0' ','
/dir/subdir,file.ext1.ext2,file,ext1.ext2

In this case, we add the tr command to make the NUL characters visible as commas. Notably, the first example only considers ext2 to be the extension (last value), while the second gets ext1.ext2, as expected.
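
In a script, the separate fields are usually more useful as variables than as comma-separated text. The sketch below shows one way to read NUL-separated output into a Bash array with mapfile, which needs Bash 4.4 or later for the -d option. To keep the example self-contained, the hypothetical emit_fields function stands in for a splitpath call and prints four fixed sample fields in the same format:

```shell
# Hypothetical stand-in for splitpath: four NUL-separated sample fields
emit_fields() {
  printf '%s\0%s\0%s\0%s' '/dir/sub dir' 'file.ext1.ext2' 'file.ext1' 'ext2'
}

# -d '' splits the input on NUL characters, -t drops the delimiter itself
mapfile -d '' -t parts < <(emit_fields)

dir=${parts[0]}
base=${parts[1]}
root=${parts[2]}
suffix=${parts[3]}

printf 'dir=[%s] base=[%s] root=[%s] suffix=[%s]\n' "$dir" "$base" "$root" "$suffix"
```

Because the data is split on NUL instead of whitespace, the space in the sample directory name survives intact. When the last field is empty, the array may end up with only three elements, in which case ${parts[3]} simply expands to an empty string.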

8. Summary

In this article, we explored ways to extract the extension from a filename.

In conclusion, although we have simple means to separate a filename, there are considerations that might necessitate a more complex solution to the problem.