1. Overview

Finding files is a daily task for many Linux users.

In this tutorial, we’ll learn how to search for files that aren’t listed in the .gitignore file on a Linux system. In brief, .gitignore keeps files out of a Git repository. First, we’ll look at what .gitignore is. After that, we’ll see how to list files and how to exclude the ignored ones from the results.

2. Understanding .gitignore

In essence, .gitignore is a file in a Git repository that contains a list of file and directory patterns. Git leaves untracked files that match these patterns out of commands such as git status and git add. When we commit files to a Git repository, they become part of the project’s history. Files that match .gitignore, however, aren’t staged or committed, so they never reach a remote repository. Notably, .gitignore has no effect on files that Git already tracks.

A Git project can have a lot of files:

  • source code
  • data files
  • cache files
  • log files

Not all of these files are commit-worthy. Some files are generated but don’t contribute to the project in any way. We can leave such files out of commits to the Git repository. By ignoring unnecessary files, we keep the repository as small as possible, which also speeds up uploads and downloads.
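
As a minimal sketch of this idea, the following commands create a throwaway repository and show that git status reports only the files that .gitignore doesn’t match. The file names (app.py, debug.log, cache/) and the patterns are hypothetical and chosen only for illustration:

```shell
# Create a scratch repository (all names here are hypothetical)
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# One source file plus generated files we don't want to commit
touch app.py debug.log
mkdir cache && touch cache/data.bin

# Each line in .gitignore is a pattern, not necessarily a literal file name
printf '%s\n' '*.log' 'cache/' > .gitignore

# Ignored files don't show up among the untracked files
status=$(git status --porcelain)
printf '%s\n' "$status"
```

Here, the listing should mention app.py and .gitignore itself, but neither debug.log nor anything under cache/.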

In the following section, we’ll see how we can find files on a system with specific paths or directories.

3. Finding All Files From a Git Project

Manually finding files on a system can be tedious because visiting every directory isn’t practical. However, if we know the path under which to look, we can use the Linux find command.

3.1. Using find

The find command is perhaps the most common Linux command for locating files and listing a directory hierarchy. It lists every file that matches its criteria, including the unnecessary ones:

$ find . -type f -path "*aws*" | head -n 20
./.local/bin/aws_completer
./.local/bin/aws_zsh_completer.sh
./.local/bin/aws
./.local/bin/aws_bash_completer
./.local/bin/aws.cmd
./.local/lib/python3.6/site-packages/google/auth/__pycache__/aws.cpython-36.pyc
./.local/lib/python3.6/site-packages/google/auth/aws.py
./.local/lib/python3.7/site-packages/botocore/__pycache__/awsrequest.cpython-37.pyc
./.local/lib/python3.7/site-packages/botocore/awsrequest.py
./.local/lib/python3.7/site-packages/awscli-1.31.11.dist-info/INSTALLER
./.local/lib/python3.7/site-packages/awscli-1.31.11.dist-info/top_level.txt
./.local/lib/python3.7/site-packages/awscli-1.31.11.dist-info/METADATA
./.local/lib/python3.7/site-packages/awscli-1.31.11.dist-info/LICENSE.txt
./.local/lib/python3.7/site-packages/awscli-1.31.11.dist-info/WHEEL
./.local/lib/python3.7/site-packages/awscli-1.31.11.dist-info/REQUESTED
./.local/lib/python3.7/site-packages/awscli-1.31.11.dist-info/RECORD
./.local/lib/python3.7/site-packages/awscli/commands.py
./.local/lib/python3.7/site-packages/awscli/argparser.py
./.local/lib/python3.7/site-packages/awscli/__main__.py
./.local/lib/python3.7/site-packages/awscli/__pycache__/help.cpython-37.pyc

The example above lists all files that have the string aws in their path. We limit the results to 20 lines with the head command to keep the listing short.

If we start the search at a broad location, such as the root directory, find traverses every directory it can reach, which increases the processing time. So, it’s good practice to start at a specific path when matching a pattern.
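
To illustrate the effect of the starting path, here’s a small sketch that builds a hypothetical directory tree and searches only one branch of it. The directory and file names are assumptions made up for this example:

```shell
# Build a small hypothetical directory tree to search
root=$(mktemp -d)
mkdir -p "$root/project/src" "$root/other"
touch "$root/project/src/aws_client.py" "$root/other/aws_notes.txt"

# Start the search at the project directory instead of the tree root,
# so find never descends into unrelated directories
found=$(find "$root/project" -type f -name "*aws*")
printf '%s\n' "$found"
```

Because the search starts at the project directory, the matching file under other/ is never visited.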

If we’re looking for specific Git project files, the next command proves to be helpful.

3.2. Using git ls-files

The command git ls-files lists the files that Git tracks. It also supports options that take .gitignore rules into account. Notably, the default output helps only if Git tracks the file we’re looking for. Let’s see an example:

$ git ls-files | head -n 20
.github/workflows/Build-test.yml
.gitignore
Berksfile
Berksfile.lock
CHEF_README.md
file
file.lock
README-registry-box.txt
README.md
Thorfile
cookbook_readme.md
cookbooks/.gitkeep
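
As a sketch of the options mentioned above, the --cached, --others, and --exclude-standard flags can be combined to list tracked files together with untracked files that .gitignore doesn’t exclude. The repository and file names below are hypothetical:

```shell
# Scratch repository with one tracked, one untracked, and one ignored file
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

touch tracked.txt untracked.txt build.log
printf '%s\n' '*.log' > .gitignore
git add tracked.txt .gitignore

# Tracked files only (the default behaviour)
tracked=$(git ls-files)

# Tracked files plus untracked files that .gitignore doesn't exclude
visible=$(git ls-files --cached --others --exclude-standard)
printf '%s\n' "$visible"
```

In this setup, build.log should appear in neither listing, while untracked.txt appears only in the second one.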

Now, let’s focus on how to filter out the files that match .gitignore.

4. How to Find Files Excluded From .gitignore

When we find and list the files from a system, it doesn’t make sense to list the unnecessary files, such as cache files, temporary files, and similar. To avoid this, we need to use some specific utilities.

4.1. Using git-check-ignore

Git provides the git check-ignore command to verify whether a file matches a pattern in .gitignore. It combines well with the find command:

$ find . -type f -path "*.git*" \
> -exec sh -c '
> for f do
>   git check-ignore -q "$f" ||
>   printf "%s\n" "$f"
> done
> ' find-sh {} +
./disk_usage/.git/info/exclude
./disk_usage/.git/hooks/pre-push.sample
./disk_usage/.git/hooks/pre-rebase.sample
./disk_usage/.git/hooks/pre-applypatch.sample
./disk_usage/.git/hooks/post-update.sample
./disk_usage/.git/hooks/commit-msg.sample
./disk_usage/.git/hooks/pre-commit.sample
./disk_usage/.git/HEAD
./disk_usage/.git/config
./disk_usage/.git/description
./roles/.gitkeep
./environments/

The snippet shows find selecting the files whose path contains .git and passing them to a small shell loop. For each file, git check-ignore -q exits with status 0 if the file is ignored, so the || operator prints only the files that aren’t ignored. In other words, the output excludes files that match .gitignore. The main downside is that the loop runs git check-ignore once per file, which can be slow for large directory trees.
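
One possible way to avoid the per-file invocation is to feed all paths to a single git check-ignore process through its --stdin option. The sketch below is an alternative under stated assumptions, not the approach shown above: it uses a hypothetical scratch repository, and it relies on the -v and -n options, which make git check-ignore print non-matching paths with empty source fields:

```shell
# Scratch repository with one regular file and one ignored file
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
touch app.py debug.log
printf '%s\n' '*.log' > .gitignore

# -v -n makes git check-ignore report every path; paths that match no
# ignore pattern are printed with empty source fields, i.e. "::<TAB>path".
# awk keeps only those records and prints the path column.
kept=$(find . -type f ! -path './.git/*' |
  git check-ignore --stdin -v -n |
  awk -F '\t' '$1 == "::" { print $2 }')
printf '%s\n' "$kept"
```

This sketch assumes file names without tabs, newlines, or other characters that Git would quote in its output.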

4.2. Using git-grep

Another command, git grep, works like the Linux grep command but searches the files in a Git repository. It prints only the lines that match the pattern, prefixed with the file name:

$ git grep linux
cookbooks/abcd/recipes/abcd.rb:execute "disable huge pages in linux" do
cookbooks/library/recipes/abcd-common.rb:execute "disable selinux" do
cookbooks/library/recipes/abcd-common.rb:cookbook_file '/etc/selinux/co

The example lists lines from files in the Git repository that contain the word linux. By default, git grep searches only tracked files, so it doesn’t look into untracked files that match .gitignore.
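
If we only need the file names, or also want to search untracked files that aren’t ignored, git grep offers the -l and --untracked options. Here’s a minimal sketch in a hypothetical scratch repository; the file names and contents are made up for illustration:

```shell
# Scratch repository: one tracked, one untracked, and one ignored file
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

echo 'runs on linux' > notes.txt   # will be tracked
echo 'linux draft'   > draft.txt   # untracked, not ignored
echo 'linux build'   > build.log   # ignored via .gitignore
printf '%s\n' '*.log' > .gitignore
git add notes.txt .gitignore

# -l prints only the names of matching files; tracked files only
tracked_hits=$(git grep -l linux)

# --untracked also searches untracked files, still honoring .gitignore
all_hits=$(git grep -l --untracked linux)
printf '%s\n' "$all_hits"
```

In both cases, build.log stays out of the results because it matches a pattern in .gitignore.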

5. Conclusion

In this article, we looked at a few effective ways to find particular files on Linux systems. Specifically, we went through ways of finding files while excluding those that match .gitignore: find, git ls-files, git check-ignore, and git grep. These commands help minimize the effort and time spent searching for files.