Link Search Menu Expand Document

Core Concepts

Table of Contents

Repositories: Distributed Version Control System

Commands like push and pull (which is really a shortcut to fetch, then merge) copy files between local and remote repositories.

Remotes, like the one in Azure DevOps, are not special. Internally they are exactly like your local clone. You can git init --bare a new repository on a shared drive, add it as a remote using the file path and push to it. You could even push directly into your coworker’s local remote if you had file access.

You can build any kind of file sharing workflow. Git is just a tool to copy files. Often centralized workflow is sufficient, but many projects like the Linux kernel use other workflows. The image below is the “Integration Manager” workflow described in the article “Distributed Workflows”. Each box is a different repository and the arrows show the direction files move. The developers work in their local private repos and push to their public repos. The manager pulls from the public dev repos into their private integration repo. The manager pushes completed integrations to the public “blessed repository”. Finally the other developers pull coworkers’ changes into their private repos to begin new work.

git-integration-manager-workflow.png

Git has no file locks. Instead, two changes to the same file are merged. The built in merge tools only work on text files, but you can install custom ones. If there are conflicts, you must resolve them. If changes are not planned well, conflicts can become extremely difficult to resolve. Your knowledge of what kind of merges work well, and communication with team members, is critical. Remember git is a tool and it takes work. If used well it delivers a lot of value, if used poorly it can cause serious issues.

File Areas: Worktree, Index, and Repository

The commands add, commit, checkout, and reset copy files between the “working”, “staging”, and “repository” (i.e. commit object) areas.

FileAreas.png

Notice in this image how the “Work Tree” and “Index” areas are not the same as the “Commit In Repository” area. If files in your work tree have not been added and committed, they are called untracked and are not protected by git’s immutable data structure. Many commands modify the work tree and may remove files without warning. Leaving untracked files in your work tree is the most common way to lose work in git. If you’re not ready to commit a file, use the stash command to save it, or make a temporary commit and tag it to remind you of what you need it for.

The Index

The index is commonly misunderstood, and one of the big things almost every git GUI does not properly support.

The --patch option works with several commands like add. It lets you put only some of the changes you are working on within a single file to the index, so you can make commits without mixing up unrelated work. For example, if you are working on a large change in Main.xaml, but notice a small bug to fix, and you can add and commit only the small part of Main.xaml related to the bug separately before your large change is completed.

The --cached option works with several commands like rm. It lets you operate on the index in cases that by default operate on the work tree.

Commit Level vs File Level Operations

Reset Demystified is an article in the official Pro Git book that was key to my understanding of git references and essential commands. I recommend that you read it in full. I often reference the table at the end. It’s so good, I even copied it below with clarifications.

Read the next section in about Refs and Objects for a brief introduction to refs like HEAD and branches.

  Moves HEAD or branch? Copies to Index? Copies to Work Dir? Work Dir Safe?
Commit Level        
reset --soft <commit> HEAD and branch No No Yes
reset <commit> HEAD and branch Yes No Yes
reset --hard <commit> HEAD and branch Yes Yes No
checkout <commit> HEAD only Yes Yes Yes
File Level        
reset <commit> <paths> Neither Yes No Yes
checkout <commit> <paths> Neither Yes Yes No

Refs and Objects: HEAD, Branch, Commit, Tree, and Blob

Refs are HEAD (not pictured), branches (orange), and tags. Objects are commits (yellow), trees (green), and blobs (gray). In a command they can be referred to by a ref that points at them, or their own specific hash number. Not pictured: HEAD points to a branch.

git-data-model-4.png

Data Structure

Internally, a git repository is just a bag of objects connected by pointers.

HEAD Points to a Branch

Many git commands let you omit what branch to operate on, defaulting to the branch HEAD points to.

Sometimes git can be in a “detached HEAD” state, where HEAD actually points directly to a commit hash instead of a branch. To fix this, just checkout a branch. Similarly, when resolving a merge conflict or editing during a rebase, HEAD does not point to the final branch until you have finished.

By default a repository has one Working Directory and one HEAD. Use Git Worktrees to checkout multiple branches at once, each with their own HEAD.

Branches are Just Pointers

Branches, tags, and other refs are simple pointers to a single commit, not like folders of commits.

Git doesn’t remember which branch a commit was made on, instead it is said that a commit is “reachable” from a branch by following the commit’s pointers to its parents. After merging, a commit is often “reachable” from multiple branches. The internal file in the .git folder that “stores” a branch only contains the hash of the commit it points to. That’s it!

In the image, “first commit” (hash fdf4fc) is reachable by both master and test branches.

Commits are Just Pointers with Metadata

Commits don’t contain the files themselves, only pointers to other commits, a pointer to a tree, and metadata like the author and message. The pointers between commits form a Merkle tree that you see when in the log graph when you use git log.

Trees are… You Guessed It, More Pointers

Finally, trees contain the pointers to the blobs that store the files. When there are files that remain unchanged over several commits git does not duplicate them, instead the new tree points to the same blob.

Trees are not often dealt with directly, so most users don’t think about them. However, ls-tree is a useful command you can use to view the folders and files in a commit without having to load it into a worktree with checkout. Try the following command to see only the folder structure of a specific commit in a repository. git ls-tree -r -d --name-only <branch>

Blobs

Blobs are the objects that store file content.

Commands like reset and checkout copy the content into the index or worktree depending on the form used.

Sometimes it is useful to view or retrieve only one file from a commit without having to overwrite your other file areas. The show command can be used with the branch:path syntax to specify the blob. To quickly view a text file in “Git Bash”, pipe the output of show to the less command: git show branch/name:path/to/file.txt | less Or, to output the content to a different file name, use a bash redirection operator: git show branch/name:path/to/file.txt > path/to/file-OtherBranchVersion.txt

Three Layers of Abstraction

Change

A commit is often conceptually described as a “change”. When we use git rebase, we talk about “applying” the changes to a different part of the repository. However this is only the first and highest layer of abstraction, and does not fully describe how git works.

Object

The image above shows the second layer of abstraction, where we can see that each commit stores all files in full size.

At first this seems inefficient. But there is space savings due to git’s design as a “content addressable” system where files that do not have changes have the same hash code, so multiple trees can safely point to the same file.

Pack

Finally, objects are only temporarily actually stored in full size. In the third and lowest level of abstraction, git moves older objects into Packfiles where they are efficiently stored using delta compression.

Tip: Commit Small Changes

Immutability Gives You Safety to Make Changes

All objects are “immutable”. They are never changed, git only creates new objects. Objects are not deleted during normal operations, only during garbage collection. For objects to be deleted, they must be (by default) 2 weeks old and not reachable from any ref or reflog entry (which expire when they are 30 days old).

When a commit becomes unreachable from any branches, thus disappearing from your log graph, it still exists. One command you can use to find lost commits is reflog.

For example:

  • You’re working on master and commit your latest work with: git commit -m "My brand new feature." Your log graph shows:
* 4d2476f (master) My brand new feature.
* 6fe53e1 Work in progress.
* 2486da5 Work in progress.
* 496d993 Create repository.
  • You change your mind about something, and want to fix it so you run git reset --hard HEAD~ to undo the commit. Oops! You shouldn’t have used the --hard flag, you really wanted the changes to stay in your working tree so you could edit them and commit again, but now your working tree has only the old files from before your change. In your log graph, it looks like you lost your work:
* 6fe53e1 (master) Work in progress.
* 2486da5 Work in progress.
* 496d993 Create repository.
  • To get it back, you can look at the output of git reflog master:
6fe53e1 (master) master@{0}: branch: Reset to 6fe53e1
4d2476f master@{1}: commit: My brand new feature.
6fe53e1 master@{2}: commit: Work in progress.
2486da5 master@{3}: commit: Work in progress.
2486da5 master@{4}: branch: Created from HEAD
  • To keep this example short, we’ll simply add a branch there again. You can use the abbreviated hash: git branch recoveredWork 4d2476f Or use the RefLog Shortnames syntax: git branch recoveredWork master@{1} The graph now shows both branches.
* 4d2476f (recoveredWork) My brand new feature.
* 6fe53e1 (master) Work in progress.
* 2486da5 Work in progress.
* 496d993 Create repository.

The reflog records entries of every type of action including merge, rebase, reset, and branch -f (force a branch to move).

Don’t be afraid to try something, it’s easy to go back if it doesn’t work out.

Tip: Immutability Gives You Safety But Don’t Make These Mistakes

Read More

You can read more in the Git Internals chapter of the Pro Git book.

Next

clone.sh