Git Repository Structure and Fixing "cannot lock ref" and Similar Issues

1. Introduction

Git uses the underlying filesystem to organize and store internal structures such as commits, branches, and other refs (references). However, sometimes conflicts can arise between already existing objects and new ones.

In this tutorial, we explore how Git organizes its filesystem directory and ways to resolve potential conflicts. First, we briefly refresh our knowledge about the structure of a Git repository. After that, we look at ways that tampering with the main subdirectories of that structure can affect operations. Next, we create a sample repository and use it to show examples of ref (reference) issues in practice. Finally, we turn to potential solutions to unexpected refs.

We tested the code in this tutorial on Debian 12 (Bookworm) with GNU Bash 5.2.15 and Git 2.39.2. Unless otherwise specified, it should work in most POSIX-compliant environments.

2. .git Filesystem Structure

Git doesn’t rely on a separate database. Instead, it stores and organizes its data structures as regular files and directories on the underlying filesystem.

To demonstrate, let’s first create an empty repository:

$ git init

Now, we can check the contents of this supposedly empty Git project:

$ tree -d .git/
.git/
├── branches
├── hooks
├── info
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

10 directories
$ tree .git/
.git/
├── branches
├── config
├── description
├── HEAD
├── hooks
│   ├── applypatch-msg.sample
│   ├── commit-msg.sample
│   ├── fsmonitor-watchman.sample
│   ├── post-update.sample
│   ├── pre-applypatch.sample
│   ├── pre-commit.sample
│   ├── pre-merge-commit.sample
│   ├── prepare-commit-msg.sample
│   ├── pre-push.sample
│   ├── pre-rebase.sample
│   ├── pre-receive.sample
│   ├── push-to-checkout.sample
│   └── update.sample
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

10 directories, 17 files

Thus, we can already appreciate the different files and directories that make up an empty Git repository. This information is usually stored in a .git subdirectory, but bare repositories that don’t have a working tree employ the root directly.
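
As a minimal sketch of this difference, we can ask Git where it keeps its data for a regular and for a bare repository. The temporary directory and the repository names (regular, bare.git) are arbitrary choices for illustration:

```shell
# Sketch: compare where a regular and a bare repository keep their Git data.
# The directory names below are illustrative, not part of the original setup.
tmp=$(mktemp -d)

git init -q "$tmp/regular"
git init -q --bare "$tmp/bare.git"

# A regular repository keeps its data in a .git subdirectory
regular_dir=$(git -C "$tmp/regular" rev-parse --git-dir)
# A bare repository uses its root directory directly
bare_dir=$(git -C "$tmp/bare.git" rev-parse --git-dir)

echo "regular: $regular_dir (bare: $(git -C "$tmp/regular" rev-parse --is-bare-repository))"
echo "bare:    $bare_dir (bare: $(git -C "$tmp/bare.git" rev-parse --is-bare-repository))"
```

Here, rev-parse --git-dir reports the path relative to the directory we run Git in, so the bare repository simply shows its own root.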

First, let’s briefly explain the function of each subdirectory:

logs: reflog, i.e., a timestamped record of updates to HEAD and other refs (exists only for repositories with data)
branches: shorthands for git fetch, git pull, and git push URL specification (deprecated)
hooks: Git hooks used for automation of tasks around Git actions
info: additional settings and special files
objects: anything considered a Git object is stored here in a binary format
refs: convenience pointers to branch and tag refs, both local and remote

As we can see, there are also many files in the hierarchy:

config: main repository configuration
description: metadata about the repository
hooks/*.sample: sample hook example scripts
info/exclude: high-level file exclusions

Further, several files track the current state of the repository, and most of them only appear once it has data:

index: staging area, i.e., the list of tracked paths with links to their blob objects
HEAD: ref to current HEAD, e.g., ref: refs/heads/master (already created by git init)
ORIG_HEAD: last HEAD value (backup)
FETCH_HEAD: ref to current FETCH_HEAD, in case of fetch operations
AUTO_MERGE: preserve tree in case of conflicts
MERGE_HEAD: hold commits being merged
COMMIT_EDITMSG: commit message editor file
TAG_EDITMSG: annotated tag message editor file (temporary)
MERGE_MSG: merge message editor file

Of course, this is a non-comprehensive list.
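
To illustrate that these are regular files, the following sketch reads HEAD both directly and through Git. It assumes a throwaway repository in a temporary directory and points HEAD at master explicitly, since the default branch name depends on the local configuration:

```shell
# Sketch: HEAD is a plain text file that Git also exposes via symbolic-ref.
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo" || exit 1

# Assumption: we force the branch name so the result doesn't depend on init.defaultBranch
git symbolic-ref HEAD refs/heads/master

head_file=$(cat .git/HEAD)          # raw file contents
head_ref=$(git symbolic-ref HEAD)   # the same information, resolved by Git

echo "$head_file"
echo "$head_ref"
```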

Notably, data in the branches, hooks, and info subdirectories can usually only be modified manually. Further, although many of these files get created and modified with Git commands, they are still regular files.

However, creating, removing, and changing files under .git or any of its subdirectories without considering the proper structure and format can have consequences.

3. Main Subdirectory Tamper Consequences

If we were to replace any of the main subdirectories under .git, it could result in different issues.

3.1. logs Directory

To begin with, if we lose the .git/logs/ directory, the reflog comes up empty until we restore the log files within:

$ git reflog
ebebca8 (HEAD -> branch1) HEAD@{0}: checkout: moving from master to branch1
8a8a026 (tag: v0.1, master) HEAD@{1}: checkout: moving from branch1 to master
ebebca8 (HEAD -> branch1) HEAD@{2}: commit: file1
8a8a026 (tag: v0.1, master) HEAD@{3}: checkout: moving from master to branch1
8a8a026 (tag: v0.1, master) HEAD@{4}: commit (initial): file
$ mv .git/logs .git/logs.bak
$ git reflog
$

Still, any following activities would get logged without issues as long as Git can create and write to the logs directory.
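
A rough way to confirm this behavior is to count reflog entries before and after a new operation. This sketch assumes a throwaway repository with default settings, and the identity values are placeholders:

```shell
# Sketch: Git recreates .git/logs on the next ref update after it goes missing.
# Placeholder identity so the commit works in a clean environment
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo" || exit 1
git symbolic-ref HEAD refs/heads/master
echo data > file
git add file
git commit -q -m file

# Move the logs away and count the remaining reflog entries
mv .git/logs .git/logs.bak
before=$(git reflog | wc -l)

# Any operation that moves HEAD gets logged again
git checkout -q -b branch1
after=$(git reflog | wc -l)

echo "entries before: $before, after: $after"
```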

However, if we replace it with a file, we lose the reflog functionality for as long as that file stays in the way:

$ touch .git/logs
$ git checkout master
error: unable to append to '.git/logs/HEAD': Not a directory
Already on 'master'

Thus, we prevent log writes.

3.2. branches Directory

Since it’s fairly deprecated, losing the .git/branches subdirectory shouldn’t have any impact on most repositories.

3.3. hooks Directory

Although the hooks subdirectory can be important if we have custom hooks implemented, it doesn’t have much of a function otherwise.

If we delete or replace the directory, we only lose the default hook templates in most cases.

3.4. info Directory

Similarly, if we don’t use the exclude or other special file features within, we can often freely dispose of the info subdirectory.

3.5. objects Directory

Effectively, objects holds the whole commit history along with a snapshot of the project for every commit. This means that any file or directory lost within objects usually leads to general data loss.

Because of this, the objects subdirectory is critical.

3.6. refs Directory

Similar to objects, refs is very important. Unlike objects, there are ways to restore refs, since they don’t hold actual data, only metadata.

In fact, we can look at the refs subdirectory as a container for commit pointers, which enable Git to recognize certain commits by name rather than an identifier.
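
To make the pointer idea concrete, this sketch compares the contents of a loose ref file with the commit that Git resolves for the same name. It assumes a fresh throwaway repository where the ref hasn’t been packed yet, and the identity values are placeholders:

```shell
# Sketch: a loose ref is a text file that holds a commit identifier.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo" || exit 1
git symbolic-ref HEAD refs/heads/master
echo data > file
git add file
git commit -q -m file

ref_file=$(cat .git/refs/heads/master)      # raw contents of the ref file
ref_oid=$(git rev-parse refs/heads/master)  # what Git resolves the name to

echo "file:      $ref_file"
echo "rev-parse: $ref_oid"
```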

4. Sample Repository

To begin with, we create several objects in a new repository:

file
commit
branch
tag
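
One possible way to create these four items is sketched below. The file content, the commit message, and the placeholder identity are assumptions, while the names file, branch1, and v0.1 match the listings in this section:

```shell
# Sketch: create a file, a commit, a branch, and a tag in a new repository.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo" || exit 1
git symbolic-ref HEAD refs/heads/master

echo content > file      # file
git add file
git commit -q -m file    # commit
git branch branch1       # branch
git tag v0.1             # (lightweight) tag

git log --all --decorate --oneline --graph
```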

Let’s enumerate them via the log subcommand:

$ git log --all --decorate --oneline --graph
* 420061f (HEAD -> master, tag: v0.1, branch1) file

Now, we should also have the respective underlying files:

$ tree .git/
.git/
├── branches
├── COMMIT_EDITMSG
├── config
├── description
├── HEAD
├── hooks
[...]
├── index
├── info
│   └── exclude
├── logs
│   ├── HEAD
│   └── refs
│       └── heads
│           ├── branch1
│           └── master
├── objects
│   ├── 2a
│   │   └── cf5a5f0c860dd25f42a4dc326febb6a942baad
│   ├── 42
│   │   └── 0061ffe3926e2137129144dc4e9d2b545ab9e3
│   ├── 47
│   │   └── d83249b05cf06491633be38ea8637c5b356acc
│   ├── 8b
│   │   └── 137891791fe96927ad78e64b0aad7bded08bdc
│   ├── info
│   └── pack
├── packed-refs
└── refs
    ├── heads
    │   ├── branch1
    │   └── master
    └── tags
        └── v0.1

17 directories, 30 files

As confirmed by the existence of an index file, the four objects under objects, and the heads subdirectory, the repository now has content.

5. Ref Tamper Consequences

Part of the functionality that Git offers includes synchronization between the local filesystem structure, local repository metadata, and remote information. Despite this, we have the ability to change any of the files that it stores.

Let’s see an example of how this might work within the sample repository we already created.

5.1. Manually Create Fake Ref File and Directory

First, we create the empty filerefbranch file under .git/refs/heads:

$ touch .git/refs/heads/filerefbranch

Further, we make a directory named dirrefbranch in the same path:

$ mkdir .git/refs/heads/dirrefbranch

This fake metadata can cause issues for Git.

5.2. Attempt Branch Creation Over Manual Entries

Now, we can try to actually create the filerefbranch branch:

$ git branch filerefbranch
fatal: cannot lock ref 'refs/heads/filerefbranch': unable to resolve reference 'refs/heads/filerefbranch': reference broken

Since Git can’t lock or resolve the ref, it errors out. Although this is a fairly contrived problem, we can run into it in more mundane situations:

- bad permissions
- repository merges
- incorrect filesystem setup
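
Whatever the cause, one hedged way to spot such entries is to search for zero-length files under refs, since a valid loose ref always has content. This sketch recreates the fake ref in a throwaway repository, and the identity values are placeholders:

```shell
# Sketch: list empty files under .git/refs, which can't be valid loose refs.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo" || exit 1
git symbolic-ref HEAD refs/heads/master
echo data > file
git add file
git commit -q -m file

# Recreate the fake ref from the example above
touch .git/refs/heads/filerefbranch

# -size 0 only matches files without any content
broken=$(find .git/refs -type f -size 0)
echo "$broken"
```

This check only covers empty files. Refs with garbage content need a closer look.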

Now, let’s see how dirrefbranch behaves as the name of a new branch:

$ git branch dirrefbranch
$

In this case, we experience no issues: as long as it has the permissions to do so, Git removes an empty directory under refs that stands in the way of a new ref. The same applies to nested directories that don’t contain files at any level.

Yet, if the directory or any of its subdirectories contains a file, we would see an error:

$ git branch --delete dirrefbranch
Deleted branch dirrefbranch (was 1b859e5).
$ mkdir .git/refs/heads/dirrefbranch/ && touch .git/refs/heads/dirrefbranch/file
$ git branch dirrefbranch
fatal: cannot lock ref 'refs/heads/dirrefbranch': 'refs/heads/dirrefbranch/file' exists; cannot create 'refs/heads/dirrefbranch'

Due to the existence of file under dirrefbranch, Git can’t remove and recreate the directory as a reference.

5.3. Attempt Branch Creation Over Proper Branch Path

Similarly, if a branch with a given name already exists, we can’t create another branch that uses that name as an upper-level path component:

$ git branch branch1/subbranch1
fatal: cannot lock ref 'refs/heads/branch1/subbranch1': 'refs/heads/branch1' exists; cannot create 'refs/heads/branch1/subbranch1'

In this case, Git can’t replace branch1 with a directory element in the path since it’s already the name of a branch. In other words, if a branch x exists, x/anyname can’t be created. Similarly, this blocks branches further down the path like x/anyname/orpath.
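
The rule also works in the opposite direction. As a sketch with the hypothetical branch names x and y in a throwaway repository, we can record the exit status of both conflicting attempts:

```shell
# Sketch: a branch name can't be both a ref and a directory of refs.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo" || exit 1
git symbolic-ref HEAD refs/heads/master
echo data > file
git add file
git commit -q -m file

# x exists, so x/anyname can't be created
git branch x
git branch x/anyname
nested_status=$?

# y/anyname exists, so y can't be created either
git branch y/anyname
parent_status=$?
git branch y
parent_status=$?

echo "x/anyname: exit $nested_status, y: exit $parent_status"
```

Both failing commands print a cannot lock ref message and return a non-zero exit status.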

5.4. Attempt Pull

Even if we attempt a pull, Git can’t override a bad ref file:

$ git pull
[...]
fatal: bad object refs/heads/filerefbranch
error: ../remote/ did not send all necessary objects

Let’s check how the directory behaves:

$ git pull
From ../remote
 * [new branch]      dirrefbranch -> origin/dirrefbranch
 * [new branch]      manualbranch -> origin/manualbranch
 * [new branch]      master       -> origin/master
[...]

Again, a directory isn’t a problem as long as it’s empty.

6. Correct Ref Issues

In many cases, we can perform a comparison with a new repository to see which filesystem objects should be of what type. However, different approaches can correct the situation in case of issues.

6.1. gc Garbage Collection

Git can often resolve problems with the current .git structure on its own. From partial pulls and pushes to incomplete merges, Git is often able to see which refs don’t lead to commits and which commits don’t need to remain.

Although it usually happens automatically, we can invoke the garbage collector manually as well:

$ git gc

After this operation, we can recheck whether the previously-failing operation is now successful. Further, we can run gc after other operations to ensure the consistent state of the Git repository as much as possible.
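
As a small sketch of what gc does to refs, we can check that a branch still resolves to the same commit afterward. With default settings, gc also packs loose refs into the packed-refs file, but this depends on the local configuration. The throwaway repository and placeholder identity are assumptions:

```shell
# Sketch: refs keep resolving after gc, which by default packs them into packed-refs.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo" || exit 1
git symbolic-ref HEAD refs/heads/master
echo data > file
git add file
git commit -q -m file

before=$(git rev-parse refs/heads/master)
git gc -q
after=$(git rev-parse refs/heads/master)

# Show the packed entry for the branch (assumes default gc.packRefs behavior)
grep refs/heads/master .git/packed-refs
echo "before: $before"
echo "after:  $after"
```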

6.2. prune

Similar to garbage collection, a prune removes stale data. For a remote, this means remote-tracking refs that no longer have a counterpart:

$ git remote prune origin

In this case, we run a prune operation on the origin, which removes the data about remote branches that no longer exist.

Similarly, we can perform a --prune during a fetch and repeat a failing pull, for instance:

$ git fetch --prune origin && git pull

This approach usually removes stale remote-tracking refs and brings local copies of remote branches in sync with their counterparts.
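
To see pruning in action without a network, the following sketch uses two throwaway repositories on the local filesystem. The names remote, local, and stale are illustrative, and the identity values are placeholders:

```shell
# Sketch: prune a remote-tracking ref after its branch disappears from the remote.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

tmp=$(mktemp -d)
git init -q "$tmp/remote"
git -C "$tmp/remote" symbolic-ref HEAD refs/heads/master
echo data > "$tmp/remote/file"
git -C "$tmp/remote" add file
git -C "$tmp/remote" commit -q -m file
git -C "$tmp/remote" branch stale

# The clone gets remote-tracking refs for both master and stale
git clone -q "$tmp/remote" "$tmp/local"

# Delete the branch on the remote side
git -C "$tmp/remote" branch -q -D stale

# Preview what would be pruned, then prune during a fetch
git -C "$tmp/local" remote prune --dry-run origin
git -C "$tmp/local" fetch -q --prune origin

git -C "$tmp/local" branch -r
```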

6.3. update-ref

The update-ref subcommand is used to safely update object names within refs. It usually relinks bad refs, creates new ones, and generally synchronizes the Git structure.

In this case, we can use the command to [-d]elete a ref after verification:

$ git update-ref --no-deref -d <REF_PATH>

Notably, we use --no-deref so that update-ref acts on the ref itself instead of following a symbolic ref to its target.
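
As a sketch with a concrete path in place of REF_PATH, we can delete a hypothetical branch named stale in a throwaway repository. Passing the expected old value as a last argument is optional and makes update-ref refuse the deletion if the ref points elsewhere:

```shell
# Sketch: delete a ref with update-ref after checking what it points to.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo" || exit 1
git symbolic-ref HEAD refs/heads/master
echo data > file
git add file
git commit -q -m file
git branch stale

# Verification step: record the commit that the ref currently points to
oid=$(git rev-parse --verify refs/heads/stale)

# Delete the ref only if it still holds that value
git update-ref --no-deref -d refs/heads/stale "$oid"

git branch --list
```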

Of course, if the provided REF_PATH isn’t a ref, update-ref won’t be able to help.

To know whether a given filesystem object is a ref, we can use rev-parse with its path.
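
For instance, a small helper along these lines could wrap the check. The is_ref function name is our own, and the sketch again uses a throwaway repository with the empty filerefbranch file from earlier:

```shell
# Sketch: use rev-parse --verify to tell a valid ref from a broken one.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo" || exit 1
git symbolic-ref HEAD refs/heads/master
echo data > file
git add file
git commit -q -m file
touch .git/refs/heads/filerefbranch

# Hypothetical helper: succeeds only if the path resolves to an object
is_ref() {
  git rev-parse --verify -q "$1" >/dev/null 2>&1
}

is_ref refs/heads/master
good=$?
is_ref refs/heads/filerefbranch
bad=$?

echo "master: exit $good, filerefbranch: exit $bad"
```

An exit status of zero means that Git can resolve the path, so update-ref can work with it.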

7. Summary

In this article, we discussed the basic skeletal structure of a Git repository and ways we can tamper with it.

In conclusion, manual edits of .git are possible and sometimes necessary, but should be performed with extra caution, despite different ways to recover from issues.
