1. Overview
Git submodules are a powerful feature that enables us to include external repositories as subdirectories within the main Git project. In practice, this can be beneficial when we need to incorporate third-party libraries, shared components, or even other projects into the codebase. However, there may come a time when we need to remove a submodule, for example, when refactoring the project or deprecating a dependency.
Removing a submodule from a Git repository can be a multi-step process. Further, if not done correctly, it can lead to issues such as dangling references or conflicts.
In this tutorial, we’ll explore the step-by-step process of removing a submodule from both the parent and local repositories, thereby ensuring a clean and consistent state.
2. Why Use Git Submodules?
Before we look at the process of removing submodules, let’s understand the purpose of submodules and how we can create them.
The primary purpose of using submodules is to enable us, as developers, to include external code or libraries as dependencies within a project. This approach has several benefits:
- code reusability
- separation of concerns
- modular development practices
So, to understand this concept, let’s look at an example of creating a submodule.
2.1. Create a Submodule
Let’s say we’re developing a cloud-based application monitoring tool called WatchIt. One of its key features is the ability to visualize and analyze log data from various sources:
- web servers
- databases
- application servers
Consequently, we decide to incorporate an open-source log processing library called LogStash as a submodule within the WatchIt repository. In particular, LogStash has capabilities for collecting, parsing, and transforming log data.
To begin with, we run the add command to add the LogStash repository as a submodule in the WatchIt repository’s directory:
$ git submodule add https://github.com/elastic/logstash.git logstash
This command adds the LogStash repository as a submodule in the logstash directory inside our WatchIt repository. As a result, Git also registers the submodule in the local .git/config file. This entry stores information about the submodule:
- name
- URL
The path where the submodule lives within the main repository is recorded separately in the .gitmodules file.
When we perform operations like updating or removing the submodule, Git refers to this configuration to determine the appropriate actions to take.
2.2. The .gitmodules File
In addition to the entry Git created in the .git/config file, Git also creates a new entry in the special .gitmodules file. Notably, this file stores the configuration for all submodules we create, and unlike .git/config, it’s tracked and committed along with the rest of the project. Further, the .gitmodules file is an important component of submodule management as it contains the mapping between the submodule’s path and the URL of the external repository.
In our case, the .gitmodules file is fairly simple:
$ cat .gitmodules
[submodule "logstash"]
path = logstash
url = https://github.com/elastic/logstash.git
Next, we commit the changes to our WatchIt repository after adding the new submodule:
$ git add .gitmodules logstash
$ git commit -m "Add LogStash as a submodule"
$ git push
Thus, we ensure Git tracks the new submodule configuration and other collaborators can easily clone or pull the WatchIt repository with the correct submodule setup.
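To see how the two configuration locations from this section differ, we can query each of them with git config. The following is a minimal sketch that works on throwaway repositories in a temporary directory: the lib repository is a hypothetical local stand-in for LogStash, and the protocol.file.allow setting is only there because newer Git versions restrict cloning submodules from local paths.

```shell
set -e
work=$(mktemp -d)   # throwaway directory for the demo
cd "$work"

# Hypothetical local stand-in for the LogStash repository
git init -q lib
echo "demo" > lib/README
git -C lib add README
git -C lib -c user.name=Demo -c user.email=demo@example.com commit -q -m "Initial commit"

# Stand-in for the WatchIt repository
git init -q watchit
cd watchit
# protocol.file.allow=always permits cloning the submodule from a local path
git -c protocol.file.allow=always submodule add "$work/lib" logstash

# Local registration, private to this clone
git config --get submodule.logstash.url
# Tracked mapping, shared with collaborators through commits
git config -f .gitmodules --get submodule.logstash.path
git config -f .gitmodules --get submodule.logstash.url
```

The first query reads .git/config, while the -f .gitmodules queries read the tracked file that other collaborators receive when they clone or pull.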
2.3. Cloning a Repository Containing Submodules
If we’re cloning a repository that already contains submodules, we need to initialize and update the submodules after cloning the parent repository:
$ git clone https://github.com/ararar/watchit.git
$ cd watchit
$ git submodule init
$ git submodule update
In particular, the git submodule init command registers the submodules in the local .git/config file based on the entries in the .gitmodules file. Following that, the git submodule update command clones the submodule repositories and checks out the commits recorded in the parent repository.
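The two commands can also be combined. As a hedged sketch, the example below shows two equivalent shortcuts on throwaway local repositories: git clone with the --recurse-submodules option, and git submodule update with the --init option. The lib and watchit repositories and the clone-a and clone-b directory names are hypothetical and exist only for this demo.

```shell
set -e
work=$(mktemp -d)   # throwaway directory for the demo
cd "$work"

# Hypothetical local stand-in for the LogStash repository
git init -q lib
echo "demo" > lib/README
git -C lib add README
git -C lib -c user.name=Demo -c user.email=demo@example.com commit -q -m "Initial commit"

# Stand-in for the WatchIt repository, with the submodule committed
git init -q watchit
git -C watchit -c protocol.file.allow=always submodule add "$work/lib" logstash
git -C watchit -c user.name=Demo -c user.email=demo@example.com commit -q -m "Add submodule"

# Option 1: clone the parent and populate its submodules in one step
git -c protocol.file.allow=always clone -q --recurse-submodules "$work/watchit" clone-a

# Option 2: plain clone, then initialize and update in a single command
git clone -q "$work/watchit" clone-b
git -C clone-b -c protocol.file.allow=always submodule update --init

# Both clones now contain the submodule's files
ls clone-a/logstash clone-b/logstash
```

Either option leaves the submodule checked out at the commit recorded in the parent repository, so which one to use is mostly a matter of whether the parent has already been cloned.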
3. Removing a Submodule
Removing a submodule is a task that requires careful execution. If not done properly, it can lead to issues such as orphaned submodule files, broken build processes, and inconsistent repository states across different branches or local repositories of team members.
In this section, we’ll walk through the step-by-step process of removing a submodule.
3.1. Deinitialize Submodule
The first step in removing a Git submodule is to deinitialize it:
$ git submodule deinit -f logstash
Cleared directory 'logstash'
Submodule 'logstash' (https://github.com/elastic/logstash.git) unregistered for path 'logstash'
The deinit subcommand removes the submodule entry from the .git/config file, effectively deregistering the submodule from the repository. In effect, this is the reverse of init.
Further, we use the -f option to discard any local changes in the submodule directory. Without this option, the command may fail if there are any uncommitted changes in the submodule.
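As the Cleared directory message shows, deinit also empties the submodule’s working tree. However, the now-empty logstash directory, the entry in the .gitmodules file, and the submodule’s entry in the index are all still present after deinit.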
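To check what deinit changes and what it leaves behind, we can inspect the repository right after running it. This is a minimal sketch on throwaway repositories, where lib is a hypothetical local stand-in for LogStash:

```shell
set -e
work=$(mktemp -d)   # throwaway directory for the demo
cd "$work"

# Hypothetical local stand-in for the LogStash repository
git init -q lib
echo "demo" > lib/README
git -C lib add README
git -C lib -c user.name=Demo -c user.email=demo@example.com commit -q -m "Initial commit"

# Stand-in for the WatchIt repository, with the submodule committed
git init -q watchit
cd watchit
git -c protocol.file.allow=always submodule add "$work/lib" logstash
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "Add submodule"

git submodule deinit -f logstash

# The directory is kept but emptied, so this lists nothing
ls -A logstash
# The local registration is gone, so the lookup fails
git config --get submodule.logstash.url || echo "not registered in .git/config"
# The tracked mapping and the index entry are untouched
git config -f .gitmodules --get submodule.logstash.url
git ls-files -s logstash
```

The remaining steps in this section deal with exactly these leftovers: the stored Git directory, the .gitmodules section, and the index entry.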
3.2. Remove Submodule Git Directory
Next, we remove the submodule’s directory from the .git/modules directory:
$ rm -rf .git/modules/logstash
This command removes the submodule’s associated Git repository which was stored in the .git/modules/ directory. The -r option removes directories and their contents recursively while the -f option forces the removal without prompting for confirmation.
3.3. Remove From .gitmodules
Now, we remove the section corresponding to the logstash submodule in the .gitmodules file:
$ git config -f .gitmodules --remove-section submodule.logstash
By running this command, we remove the logstash submodule entry from the .gitmodules file. Also, the -f option specifies the configuration file where the change should be made.
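Since git config -f operates on any file in Git’s configuration format, we can try the removal on a scratch copy before touching a real repository. The sketch below builds a .gitmodules file like the one from our example in a temporary directory, lists its submodule settings, and removes the section:

```shell
set -e
work=$(mktemp -d)   # throwaway directory for the demo
cd "$work"

# Recreate a .gitmodules file like the one shown earlier
git config -f .gitmodules submodule.logstash.path logstash
git config -f .gitmodules submodule.logstash.url https://github.com/elastic/logstash.git

# List every submodule setting in the file
git config -f .gitmodules --get-regexp '^submodule\.'

# Remove the whole section for the logstash submodule
git config -f .gitmodules --remove-section submodule.logstash

# --get-regexp exits with a non-zero status when nothing matches
git config -f .gitmodules --get-regexp '^submodule\.' || echo "no submodule sections left"
```

Using git config instead of editing the file by hand avoids leaving a stray header or key behind when the file contains several submodules.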
3.4. Stage Changes to .gitmodules
The next step is to stage the changes we made to the .gitmodules file:
$ git add .gitmodules
By staging the changes to the .gitmodules file, we’re ensuring that the removal of the submodule’s configuration is properly recorded in our repository history.
Staging matters for the next step as well: if the .gitmodules file has unstaged modifications, git rm can refuse to remove the submodule entry and ask us to stage or stash those changes first.
3.5. Remove From Git Cache
Next, we remove the submodule’s entry from the Git cache:
$ git rm --cached logstash
rm 'logstash'
By running this command, we’re telling Git to remove the logstash submodule entry from the Git index, which is what this section’s title calls the cache. The index, also known as the staging area, is where Git records the changes we’ve made before committing them. Because of the --cached option, only the index entry is removed, and the working tree is left untouched.
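To confirm that the index entry is really gone, we can compare the output of git ls-files -s before and after the removal. A submodule appears in the index as a special entry with mode 160000. The following sketch replays the steps so far on throwaway repositories, where lib is a hypothetical local stand-in for LogStash:

```shell
set -e
work=$(mktemp -d)   # throwaway directory for the demo
cd "$work"

# Hypothetical local stand-in for the LogStash repository
git init -q lib
echo "demo" > lib/README
git -C lib add README
git -C lib -c user.name=Demo -c user.email=demo@example.com commit -q -m "Initial commit"

# Stand-in for the WatchIt repository, with the submodule committed
git init -q watchit
cd watchit
git -c protocol.file.allow=always submodule add "$work/lib" logstash
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "Add submodule"

# Steps from the previous subsections
git submodule deinit -f logstash
rm -rf .git/modules/logstash
git config -f .gitmodules --remove-section submodule.logstash
git add .gitmodules

# The submodule is still in the index as a mode 160000 entry
before=$(git ls-files -s logstash)
echo "before: $before"

git rm --cached logstash

# After the removal, the index has no entry for the path
after=$(git ls-files -s logstash)
echo "after: ${after:-<no entry>}"
```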
3.6. Commit and Push Changes
Finally, we commit and push the changes to make the submodule removal permanent:
$ git add .
$ git commit -m 'rm submodule: logstash'
[main f65cf1a] rm submodule: logstash
2 files changed, 4 deletions(-)
delete mode 160000 logstash
$ git push
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 4 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 286 bytes | 143.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
To https://github.com/ararar/watchit.git
a9b0d0f..f65cf1a main -> main
Any references or configurations related to the submodule are removed, and future clones or checkouts of the repository no longer include the submodule.
Also, removing a submodule doesn’t affect the external repository itself; it only removes the submodule reference from our project.
Since deinit already emptied it, the leftover logstash directory is just an empty, untracked folder, and we can delete it from the project working tree using the rm -rf logstash command.
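The steps from this section can be collected into a small helper. This is only a sketch under a few assumptions: remove_submodule is a hypothetical function name, it must run from the top level of the parent repository, the submodule’s name must equal its path (as with logstash in our example), and pushing is left to the caller. The demo below the function runs it against throwaway repositories, where lib is a hypothetical local stand-in for LogStash.

```shell
set -e

# Hypothetical helper: removes the submodule at the given path.
# Assumes the current directory is the top level of the parent repository
# and that the submodule's name is the same as its path.
remove_submodule() {
    sm_path=$1
    git submodule deinit -f "$sm_path"                       # deregister and empty the directory
    rm -rf ".git/modules/$sm_path"                           # drop the stored Git directory
    git config -f .gitmodules --remove-section "submodule.$sm_path"
    git add .gitmodules                                      # stage the updated mapping
    git rm --cached "$sm_path"                               # drop the index entry
    rm -rf "$sm_path"                                        # delete the leftover empty directory
    git commit -q -m "rm submodule: $sm_path"
}

# Demo on throwaway repositories
work=$(mktemp -d)
cd "$work"
git init -q lib
echo "demo" > lib/README
git -C lib add README
git -C lib -c user.name=Demo -c user.email=demo@example.com commit -q -m "Initial commit"

git init -q watchit
cd watchit
git config user.name Demo
git config user.email demo@example.com
git -c protocol.file.allow=always submodule add "$work/lib" logstash
git commit -q -m "Add submodule"

remove_submodule logstash
git log --oneline
```

Because the script uses set -e, it stops at the first failing step instead of continuing with a half-removed submodule.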
4. Conclusion
In this article, we’ve explored the steps required to properly add or remove a Git submodule within a project.
Specifically, we saw that removing a submodule is a process that must be followed carefully to ensure a clean and complete removal. The key steps involve deinitializing the submodule, removing its directory from Git’s internal storage, updating the relevant configuration files, and removing the submodule from the Git index.