1. Introduction
Git uses the underlying filesystem to organize and store internal structures such as commits, branches, and other refs (references). However, sometimes conflicts can arise between already existing objects and new ones.
In this tutorial, we explore how Git organizes its filesystem directory and ways to resolve potential conflicts. First, we briefly refresh our knowledge about the structure of a Git repository. After that, we look at ways that tampering with the main subdirectories of that structure can affect operations. Next, we create a sample repository and use it to show examples of ref (reference) issues in practice. Finally, we turn to potential solutions to unexpected refs.
We tested the code in this tutorial on Debian 12 (Bookworm) with GNU Bash 5.2.15 and Git 2.39.2. Unless otherwise specified, it should work in most POSIX-compliant environments.
2. .git Filesystem Structure
Git leverages the filesystem of its host to store and organize its data structures.
To demonstrate, let’s first create an empty repository:
$ git init
Now, we can check the contents of this supposedly empty Git project:
$ tree -d .git/
.git/
├── branches
├── hooks
├── info
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags
10 directories
$ tree .git/
.git/
├── branches
├── config
├── description
├── HEAD
├── hooks
│   ├── applypatch-msg.sample
│   ├── commit-msg.sample
│   ├── fsmonitor-watchman.sample
│   ├── post-update.sample
│   ├── pre-applypatch.sample
│   ├── pre-commit.sample
│   ├── pre-merge-commit.sample
│   ├── prepare-commit-msg.sample
│   ├── pre-push.sample
│   ├── pre-rebase.sample
│   ├── pre-receive.sample
│   ├── push-to-checkout.sample
│   └── update.sample
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags
10 directories, 17 files
Thus, we can already appreciate the different files and directories that make up an empty Git repository. This information is usually stored in a .git subdirectory, but bare repositories that don’t have a working tree employ the root directly.
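To illustrate the difference between the two layouts, here is a minimal sketch that creates one regular and one bare repository in scratch directories and asks Git where its internal data lives. The variable names and temporary paths are assumptions made for this example only:

```shell
# Scratch locations (hypothetical); nothing here touches an existing repository.
work=$(mktemp -d)
bare=$(mktemp -d)

git init -q "$work"
git init -q --bare "$bare"

# Ask Git for the location of its data, relative to each repository root.
work_gitdir=$(git -C "$work" rev-parse --git-dir)
bare_gitdir=$(git -C "$bare" rev-parse --git-dir)

echo "regular: $work_gitdir"
echo "bare:    $bare_gitdir"

# In the bare repository, HEAD, objects, and refs sit directly in the root.
ls "$bare"
```

For the regular repository, we should see .git, while the bare one should report the current directory (.) instead.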
First, let’s briefly explain the function of each subdirectory:
- logs: reflog, i.e., a record of ref updates with their timestamps (exists only for repositories with data)
- branches: shorthands for git fetch, git pull, and git push URL specification (deprecated)
- hooks: Git hooks used for automation of tasks around Git actions
- info: additional settings and special files
- objects: anything considered a Git object is stored here in a binary format
- refs: convenience pointers to branch and tag refs, both local and remote
As we can see, there are also many files in the hierarchy:
- config: main repository configuration
- description: metadata about the repository
- hooks/*.sample: sample hook example scripts
- info/exclude: high-level file exclusions
Further, other files track the repository state. Apart from HEAD, which exists even in an empty repository, they only appear once the repository has data or after specific operations:
- index: staging area, i.e., tracked paths with links to their blob objects
- HEAD: ref to the current branch or commit, e.g., ref: refs/heads/master
- ORIG_HEAD: previous HEAD value (backup)
- FETCH_HEAD: branch tips retrieved by the last fetch operation
- AUTO_MERGE: tree that preserves the merge state in case of conflicts
- MERGE_HEAD: commits being merged into the current branch
- COMMIT_EDITMSG: commit message editor file
- TAG_EDITMSG: annotated tag message editor file (temporary)
- MERGE_MSG: merge message editor file
Of course, this is a non-comprehensive list.
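Since these are regular files, we can inspect many of them directly. As a small sketch, assuming a scratch repository and the default files-based ref storage, we can compare the raw content of HEAD with what Git itself reports:

```shell
# Scratch repository (hypothetical path).
repo=$(mktemp -d)
git init -q "$repo"

# Point HEAD at master explicitly, so the result doesn't depend on init.defaultBranch.
git -C "$repo" symbolic-ref HEAD refs/heads/master

# Raw file content...
head_file=$(cat "$repo/.git/HEAD")
# ...and the same information as seen through Git.
head_ref=$(git -C "$repo" symbolic-ref HEAD)

echo "file: $head_file"
echo "git:  $head_ref"
```

Here, the file should hold a single ref: line that names the same branch that symbolic-ref returns.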
Notably, data in the branches, hooks, and info subdirectories can usually only be modified manually. Further, although many of these files get created and modified with Git commands, they are still regular files.
However, creating, removing, and changing files under .git or any of its subdirectories without considering the proper structure and format can have consequences.
3. Main Subdirectory Tamper Consequences
If we were to replace any of the main subdirectories under .git, it could result in different issues.
3.1. logs Directory
To begin with, if we lose the .git/logs/ directory, we won’t have a reflog available until the references within are restored:
$ git reflog
ebebca8 (HEAD -> branch1) HEAD@{0}: checkout: moving from master to branch1
8a8a026 (tag: v0.1, master) HEAD@{1}: checkout: moving from branch1 to master
ebebca8 (HEAD -> branch1) HEAD@{2}: commit: file1
8a8a026 (tag: v0.1, master) HEAD@{3}: checkout: moving from master to branch1
8a8a026 (tag: v0.1, master) HEAD@{4}: commit (initial): file
$ mv .git/logs .git/logs.bak
$ git reflog
$
Still, any following activities would get logged without issues as long as Git can create and write to the logs directory.
However, if we replace it with a file, we lose the reflog functionality for as long as that file is in place:
$ touch .git/logs
$ git checkout master
error: unable to append to '.git/logs/HEAD': Not a directory
Already on 'master'
Thus, we prevent log writes.
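The situation is reversible as long as we keep a backup. The following sketch, which uses a scratch repository and a placeholder identity as assumptions, blocks the logs directory with a file, then removes the file and restores the backup:

```shell
# Scratch repository with a single commit (path and identity are placeholders).
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

# Helper for this example only: count the visible reflog entries.
count_reflog() { git -C "$repo" reflog 2>/dev/null | wc -l; }

before=$(count_reflog)

# Move the real logs aside and block the path with a regular file.
mv "$repo/.git/logs" "$repo/.git/logs.bak"
touch "$repo/.git/logs"
blocked=$(count_reflog)

# Undo the damage: remove the blocker and restore the backup.
rm "$repo/.git/logs"
mv "$repo/.git/logs.bak" "$repo/.git/logs"
after=$(count_reflog)

echo "before=$before blocked=$blocked after=$after"
```

After the restore, the number of reflog entries should match the initial one.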
3.2. branches Directory
Since it’s deprecated, losing the .git/branches subdirectory shouldn’t have any impact on most repositories.
3.3. hooks Directory
Although the hooks subdirectory can be important if we have custom hooks implemented, it doesn’t have much of a function otherwise.
If we delete or replace the directory, we only lose the default hook templates in most cases.
3.4. info Directory
Similarly, if we don’t use the exclude or other special file features within, we can often freely dispose of the info subdirectory.
3.5. objects Directory
Effectively, objects holds the whole commit tree and the snapshots of every committed state of the working directory. This means that any file or directory lost within objects usually leads to general data loss.
Because of this, the objects subdirectory is critical.
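Before and after any manual work under objects, a consistency check can be helpful. As a sketch, assuming a scratch repository with a placeholder identity, we can run fsck and summarize the object store:

```shell
# Scratch repository with a single commit (path and identity are placeholders).
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

# Verify the validity and connectivity of the stored objects.
if git -C "$repo" fsck --full >/dev/null 2>&1; then
    fsck_status=healthy
else
    fsck_status=damaged
fi
echo "object store: $fsck_status"

# Summarize the number of loose and packed objects.
git -C "$repo" count-objects -v
```

A non-zero exit status from fsck suggests that something in the object store needs attention.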
3.6. refs Directory
Similar to objects, refs is very important. Unlike objects, however, refs can often be restored, since they don’t hold actual data, only metadata.
In fact, we can look at the refs subdirectory as a container for commit pointers, which enable Git to recognize certain commits by name rather than an identifier.
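To see this pointer role in practice, we can compare a loose ref file with the commit that Git resolves for the same name. This sketch assumes a scratch repository, a placeholder identity, and the default files-based ref storage with a ref that hasn’t been packed yet:

```shell
# Scratch repository with a single commit (path and identity are placeholders).
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

# The loose ref is a small text file that contains an object name...
from_file=$(cat "$repo/.git/refs/heads/master")
# ...which is what Git returns when it resolves the branch name.
from_git=$(git -C "$repo" rev-parse master)

echo "file: $from_file"
echo "git:  $from_git"
```

Both values should be identical, since the branch name is only a label for that commit identifier.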
4. Sample Repository
To begin with, we create several objects in a new repository:
- file
- commit
- branch
- tag
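One possible way to build such a repository is sketched below. The scratch path, the file content, and the identity are assumptions, while the names file, branch1, and v0.1 match the ones in this tutorial:

```shell
# Scratch repository (hypothetical path).
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master

# File and commit (content and identity are placeholders).
echo "content" > "$repo/file"
git -C "$repo" add file
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
    commit -q -m "file"

# Branch and lightweight tag, both pointing at the new commit.
git -C "$repo" branch branch1
git -C "$repo" tag v0.1

# List the resulting refs.
git -C "$repo" for-each-ref
```

The commit identifiers differ between runs and machines, so they won’t match the ones in the listings here.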
Let’s enumerate them via the log subcommand:
$ git log --all --decorate --oneline --graph
* 420061f (HEAD -> master, tag: v0.1, branch1) file
Now, we should also have the respective underlying files:
$ tree .git/
.git/
├── branches
├── COMMIT_EDITMSG
├── config
├── description
├── HEAD
├── hooks
[...]
├── index
├── info
│   └── exclude
├── logs
│   ├── HEAD
│   └── refs
│       └── heads
│           ├── branch1
│           └── master
├── objects
│   ├── 2a
│   │   └── cf5a5f0c860dd25f42a4dc326febb6a942baad
│   ├── 42
│   │   └── 0061ffe3926e2137129144dc4e9d2b545ab9e3
│   ├── 47
│   │   └── d83249b05cf06491633be38ea8637c5b356acc
│   ├── 8b
│   │   └── 137891791fe96927ad78e64b0aad7bded08bdc
│   ├── info
│   └── pack
├── packed-refs
└── refs
    ├── heads
    │   ├── branch1
    │   └── master
    └── tags
        └── v0.1
17 directories, 30 files
As confirmed by the existence of an index file, the four objects under objects, and the heads subdirectory, the repository now has content.
5. Ref Tamper Consequences
Part of the functionality that Git offers includes synchronization between the local filesystem structure, local repository metadata, and remote information. Despite this, we have the ability to change any of the files that it stores.
Let’s see an example of how this might work within the sample repository we already created.
5.1. Manually Create Fake Ref File and Directory
First, we create the empty filerefbranch file under .git/refs/heads:
$ touch .git/refs/heads/filerefbranch
Further, we make a directory named dirrefbranch in the same path:
$ mkdir .git/refs/heads/dirrefbranch
This fake metadata can cause issues for Git.
5.2. Attempt Branch Creation Over Manual Entries
Now, we can try to actually create the filerefbranch branch:
$ git branch filerefbranch
fatal: cannot lock ref 'refs/heads/filerefbranch': unable to resolve reference 'refs/heads/filerefbranch': reference broken
Since Git can’t lock or resolve the ref, it errors out. Although this is a fairly forced problem, we can end up having it in more mundane situations:
- bad permissions
- repository merges
- incorrect filesystem setup
Now, let’s see how dirrefbranch behaves as the name of a new branch:
$ git branch dirrefbranch
$
In this case, we experience no issues, because Git can use and modify any path under refs that doesn’t contain files at any level as long as it has permissions to do so.
Yet, if the directory or any of its subdirectories contains a file, we would see an error:
$ git branch --delete dirrefbranch
Deleted branch dirrefbranch (was 1b859e5).
$ mkdir .git/refs/heads/dirrefbranch/ && touch .git/refs/heads/dirrefbranch/file
$ git branch dirrefbranch
fatal: cannot lock ref 'refs/heads/dirrefbranch': 'refs/heads/dirrefbranch/file' exists; cannot create 'refs/heads/dirrefbranch'
Due to the existence of file under dirrefbranch, Git can’t remove and recreate the directory as a reference.
5.3. Attempt Branch Creation Over Proper Branch Path
Similarly, should a branch already exist with a given name, we can’t create a branch with an upper-level component that has that name:
$ git branch branch1/subbranch1
fatal: cannot lock ref 'refs/heads/branch1/subbranch1': 'refs/heads/branch1' exists; cannot create 'refs/heads/branch1/subbranch1'
In this case, Git can’t replace branch1 with a directory element in the path since it’s already the name of a branch. In other words, if a branch x exists, x/anyname can’t be created. Similarly, this blocks branches further down the path like x/anyname/orpath.
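The rule can be reproduced with the x and x/anyname names from above. In this sketch, the scratch repository and the identity are assumptions, and we record whether Git accepts the nested name before and after branch x is removed:

```shell
# Scratch repository with a single commit (path and identity are placeholders).
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

# Branch x occupies refs/heads/x as a file.
git -C "$repo" branch x

# While x exists, the nested name needs x to be a directory.
if git -C "$repo" branch x/anyname 2>/dev/null; then
    while_x_exists=created
else
    while_x_exists=refused
fi

# Once x is gone, the path component is free again.
git -C "$repo" branch -q -d x
if git -C "$repo" branch x/anyname 2>/dev/null; then
    after_x_removed=created
else
    after_x_removed=refused
fi

echo "while x exists:  $while_x_exists"
echo "after x removed: $after_x_removed"
```

So, renaming or deleting the conflicting branch is usually enough to free the path for nested branch names.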
5.4. Attempt Pull
Even if we attempt a pull, Git can’t override a bad ref file:
$ git pull
[...]
fatal: bad object refs/heads/filerefbranch
error: ../remote/ did not send all necessary objects
Let’s check how the directory behaves:
$ git pull
From ../remote
* [new branch] dirrefbranch -> origin/dirrefbranch
* [new branch] manualbranch -> origin/manualbranch
* [new branch] master -> origin/master
[...]
Again, a directory isn’t a problem as long as it’s empty.
6. Correct Ref Issues
In many cases, we can perform a comparison with a new repository to see which filesystem objects should be of what type. However, different approaches can correct the situation in case of issues.
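Before trying any fix, it can help to locate suspicious entries. A valid loose ref file holds either an object name or a symbolic ref, so an empty file under refs can’t be valid. As a minimal sketch, assuming a scratch repository with a placeholder identity and the fake filerefbranch entry from section 5.1, we can search for such files:

```shell
# Scratch repository with a single commit (path and identity are placeholders).
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

# Recreate the fake empty ref file.
touch "$repo/.git/refs/heads/filerefbranch"

# List regular files under refs that have no content at all.
suspects=$(find "$repo/.git/refs" -type f -size 0)
echo "$suspects"
```

This check only covers empty files. Files with invalid content or unexpected directories still need a separate inspection.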
6.1. gc Garbage Collection
Git can resolve many problems with the current .git structure on its own. From partial pulls and pushes to incomplete merges, Git is often able to see which references don’t lead to commits and which commits don’t need to remain.
Although it usually happens automatically, we can invoke the garbage collector manually as well:
$ git gc
After this operation, we can recheck whether the previously-failing operation is now successful. Further, we can run gc after other operations to ensure the consistent state of the Git repository as much as possible.
6.2. prune
Similar to garbage collection, pruning removes data that’s no longer needed. In particular, remote prune deletes stale remote-tracking refs:
$ git remote prune origin
In this case, we run a prune operation on the origin, which removes the data about remote branches that no longer exist.
Similarly, we can perform a --prune during a fetch and repeat a failing pull, for instance:
$ git fetch --prune origin && git pull
This approach usually removes stale remote-tracking refs and brings local copies of remote branches in sync with their counterparts.
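To observe the effect in isolation, we can simulate a remote with a second local repository. In this sketch, the scratch paths, the topic branch name, and the identity are assumptions made for the example:

```shell
# Two scratch repositories: one acts as the remote, the other is its clone.
remote=$(mktemp -d)
local_repo=$(mktemp -d)

git init -q "$remote"
git -C "$remote" symbolic-ref HEAD refs/heads/master
git -C "$remote" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"
git -C "$remote" branch topic

# The clone receives origin/master and origin/topic.
git clone -q "$remote" "$local_repo"

# The branch disappears on the remote side only.
git -C "$remote" branch -q -D topic

before=$(git -C "$local_repo" branch -r)
git -C "$local_repo" fetch -q --prune origin
after=$(git -C "$local_repo" branch -r)

echo "before:"; echo "$before"
echo "after:";  echo "$after"
```

Before the fetch, the clone still lists origin/topic. Afterward, only the remote-tracking refs that still exist on the remote should remain.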
6.3. update-ref
The update-ref subcommand safely updates the object names stored in refs. We can use it to repoint existing refs, create new ones, and delete them.
In this case, we can use the command to [-d]elete a ref after verification:
$ git update-ref --no-deref -d <REF_PATH>
Notably, we use --no-deref so that a symbolic ref itself gets removed instead of the ref it points to.
Of course, if the provided REF_PATH isn’t a ref, update-ref won’t be able to help.
To know whether a given filesystem object is a ref, we can use rev-parse with its path.
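Both steps can be combined. The sketch below defines a hypothetical delete_if_ref helper, which isn’t part of Git, and applies it to a real branch and to the fake filerefbranch entry from section 5.1. The scratch repository, the stale branch name, and the identity are assumptions as well:

```shell
# Scratch repository with a single commit (path and identity are placeholders).
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

git -C "$repo" branch stale                    # a real ref
touch "$repo/.git/refs/heads/filerefbranch"    # a fake, empty ref file

# Hypothetical helper: delete a ref only if Git can resolve it to an object.
delete_if_ref() {
    if git -C "$repo" rev-parse --verify --quiet "$1" >/dev/null 2>&1; then
        git -C "$repo" update-ref --no-deref -d "$1"
    else
        echo "not a valid ref, inspect manually: $1" >&2
        return 1
    fi
}

delete_if_ref refs/heads/stale && stale_result=deleted || stale_result=kept
delete_if_ref refs/heads/filerefbranch && fake_result=deleted || fake_result=kept

echo "stale: $stale_result, filerefbranch: $fake_result"
```

This way, the helper only hands verified refs to update-ref and leaves everything else for manual inspection.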
7. Summary
In this article, we discussed the basic skeletal structure of a Git repository and ways we can tamper with it.
In conclusion, manual edits of .git are possible and sometimes necessary, but we should perform them with extra caution, even though there are different ways to recover from issues.