Git : Overcome your Fears

Finally understand how git works, stop using it blindly, and learn the most important commands to become productive right away.

Have you ever found yourself in this situation ?

You have contributed a few changes to a repository that somebody told you to clone, and now comes the time to pull and merge remote modifications, before you can push your code to the world. (If you don't understand this git babbling, please bear with me, you will very soon enjoy confusing others with such sentences as well)

But now you start sweating. Are you going to execute this pull command? Your hand shakes as you're about to hit the return key.

I've been there. That was about fifteen years ago, but I still remember this feeling.

At the same time, we can't do without a version control system, can we?

And git is simply the most widely used and the best of all. I don't want to start a flame war, it's just a fact. Why is that, you might ask?

Well, for at least four reasons :

It's extremely powerful, lightweight, and lightning fast
It's decentralized: you don't need to connect to a server to use it, just a filesystem.
It's actually really simple when you know what you're doing
It will save you when you think all is lost (if you know what you're doing)

But git is also a rather complex tool, and it can be quite difficult to learn. Especially if you follow the official documentation and try to learn all of git right away. Or if you rightly bail out of the documentation and try to just learn a few commands, hoping it will be enough.

In this article, I will do my best to teach you git the right way.

You will learn:

the few essential commands. If I don't use a command at least once per week, I promise I won't tell you about it, and you'll discover it when time is right for you.
that the internals of git are rather simple, which should demystify this tool.

For now, we will focus on a local git repo that we're going to create from scratch.

And in the next post, we will see how to use remote repositories to save our work and collaborate with other people.

To get started, you just need access to a terminal on a computer with git installed. This tutorial is written for *nix users, but I'm sure you can figure it out if you're using Windows (or get yourself a proper computer ;-) )

I encourage you to follow the instructions and type all commands by yourself, instead of just reading this article. It's going to stick better.

Creating a git repository from scratch

You don't need to clone a remote repository to use git.

In fact, you can initiate a brand new local git repository in any directory to start tracking your changes. And that's something I actually do very often, because I just feel uncomfortable when I edit files in a directory that is not under version control. The risk is too big : if I lose my work, I will have to redo it, and I hate that.

So let's create a test repository, which we're going to use in the whole tutorial.

mkdir test_repo
cd test_repo
git init 
ls -a

You get:

.    ..   .git

The git init command initialized git in the test_repo directory. And it created the .git hidden directory.

It's always nice to look into hidden directories:

ls -l .git

total 24
-rw-r--r--   1 cbernet  staff   23 Mar 31 22:03 HEAD
-rw-r--r--   1 cbernet  staff  137 Mar 31 22:03 config
-rw-r--r--   1 cbernet  staff   73 Mar 31 22:03 description
drwxr-xr-x  14 cbernet  staff  448 Mar 31 22:03 hooks
drwxr-xr-x   3 cbernet  staff   96 Mar 31 22:03 info
drwxr-xr-x   4 cbernet  staff  128 Mar 31 22:03 objects
drwxr-xr-x   4 cbernet  staff  128 Mar 31 22:03 refs

All the information about your repo will be stored here. There is no remote server involved, just this simple directory.

An interesting consequence is that if you delete .git or its parent, test_repo, you will lose your entire history! The solution to this potentially dramatic and rather probable event is to push your changes to a remote repository. We will come back to that in the next article.

For now, let's start using our repo. First, let's see what the status of the repo is:

git status

On branch master

No commits yet

nothing to commit (create/copy files and use "git add" to track)

That's not too interesting for now. Still, note that git always tries to help you by giving you hints about what to do next. Often, these hints are enough. Let's do what the hint says and create a file for ...

Our first git commit

We start by creating a simple file with a single line. This is easily done from the command line:

echo 'hello world' > file.txt
cat file.txt

hello world

We check the status again:

git status

On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        file.txt

nothing added to commit but untracked files present (use "git add" to track)

The file appears as "untracked": git sees it in the working directory, but does not record its changes yet.

We keep following the instructions and add the file to git:

git add file.txt
git status

On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   file.txt

Aha! The file is now staged, meaning it is set to be committed. As the hint explains, we could unstage it with git rm --cached. But don't bother remembering this, you'll get the hint to help you when you need it.

So now we can commit, without forgetting to provide a useful commit message:

git commit -m 'initial version'

[master (root-commit) ab86e28] initial version
 1 file changed, 1 insertion(+)
 create mode 100644 file.txt

If this is the first time you're using git on this computer, you get a message like this instead:

[master (root-commit) 7e58b70] test
 Committer: Colin Bernet <cbernet@lyocms23.lan>
Your name and email address were configured automatically based
on your username and hostname. Please check that they are accurate.
You can suppress this message by setting them explicitly. Run the
following command and follow the instructions in your editor to edit
your configuration file:

    git config --global --edit

After doing this, you may fix the identity used for this commit with:

    git commit --amend --reset-author

 1 file changed, 1 insertion(+)
 create mode 100644 file.txt

No worries, but please follow these instructions now, before going any further.
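If you prefer the command line to the editor, the identity can also be set with two git config calls. Here is a minimal sketch in a throwaway repository; the name and email are placeholders, and the temporary directory is only there so that nothing else on your machine is touched. Without --global, the settings apply to this repository only; add --global to both calls to set them for all your repositories.

```shell
# Throwaway repository created in a temporary directory.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Placeholder identity: replace with your own name and email.
git config user.name "Ada Lovelace"
git config user.email "ada@example.com"

echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial version'

# Print the author recorded in the commit we just made.
git log -1 --format='%an <%ae>'
```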

And we check the status again with git status, which gives:

On branch master
nothing to commit, working tree clean

Here is a summary about what we did, and a bit of terminology:

First, we created a file in the working directory, also called the working tree. This is just the directory in which you are working.
Then, we added the file to the staging area. This is where you put files to prepare the next commit. Typically, you're going to add several files to the staging area for a single commit, that is going to correspond to a feature, like "fix stupid bug".
Finally, we committed all the changes in the staging area to the repository. After that, the working directory is clean again, which means that it's in sync with the repository.
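You can watch a file travel through these three places with git status --porcelain, a compact and script-friendly variant of git status. This is only a sketch: the temporary directory and the "Test User" identity are placeholders I use so that the example runs anywhere without touching test_repo.

```shell
# Throwaway repository, so that test_repo is left untouched.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity, local to this repository only.
git config user.name "Test User"
git config user.email "test@example.com"

echo 'hello world' > file.txt
untracked=$(git status --porcelain)   # "?? file.txt": only in the working directory

git add file.txt
staged=$(git status --porcelain)      # "A  file.txt": now in the staging area

git commit -q -m 'initial version'
clean=$(git status --porcelain)       # empty: working directory in sync with the repository

printf '%s\n' "working directory: $untracked" "staging area: $staged" "after commit: [$clean]"
```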

Anatomy of a commit

To get information about your last commit, do:

git show

commit ab86e28862528ffdda5737f94242989cd9ef1f51 (HEAD -> master)
Author: Colin Bernet <colin.bernet@cern.ch>
Date:   Wed Mar 31 22:19:15 2021 +0200

    initial version

diff --git a/file.txt b/file.txt
new file mode 100644
index 0000000..3b18e51
--- /dev/null
+++ b/file.txt
@@ -0,0 +1 @@
+hello world

There is a lot of information here:

First, we see a very long hexadecimal number starting with ab86e288. This is the commit ID. The commit ID uniquely identifies this commit. When referring to a commit, you don't need to type in the whole number, just the first few characters. Git will recognize it anyway.
Then, we see HEAD -> master. This means that we are on the branch master (don't worry, we will discuss branches just a bit later).
After that, we have some meta information about the commit : author, commit date and time, commit message.
And finally, we see the changes introduced by the commit: we created a new file, and we added a line to that file, hello world. This last part is a "diff", similar to what you would get with the diff command.
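To convince yourself that a short prefix of the commit ID is enough, you can ask git to resolve it with git rev-parse. A small sketch, again in a throwaway repository with a placeholder identity (your commit IDs will differ from mine):

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"          # placeholder identity
git config user.email "test@example.com"
echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial version'

full=$(git rev-parse HEAD)             # the full commit ID
short=$(git rev-parse --short HEAD)    # an abbreviated, still unambiguous prefix
resolved=$(git rev-parse "$short")     # git expands the prefix back to the full ID

echo "full:     $full"
echo "short:    $short"
echo "resolved: $resolved"

# Any command that takes a commit accepts the short form as well.
git show --no-patch --format='%h %s' "$short"
```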

Now let's focus on the last part of this printout, which shows the changes introduced by the commit.

This part has a very specific format : it's a patch.

You know, sometimes when you upgrade a piece of software on your computer, you download and apply a patch. This patch is simply the difference between the new version and the old version of the software. If you apply the patch to the old version, you get the new version.

Patches are very convenient, because they make it possible to upgrade software without having to download the full new version, only the difference between the new version and the one you have, which represents a much smaller amount of data.

If you really want to understand git, you need to understand patching.

So let's create and apply a patch manually.

First, we create a new file:

echo "hello $USER" > user.txt
cat user.txt

Then create a patch file:

diff -u file.txt user.txt > patch.txt
cat patch.txt

We see that patch.txt now contains the changes between file.txt and user.txt , in the exact same format as in a git commit:

--- file.txt    2021-03-31 22:17:44.000000000 +0200
+++ user.txt    2021-04-02 08:20:05.000000000 +0200
@@ -1 +1 @@
-hello world
+hello cbernet

We can now apply this patch to file.txt to change its contents to what is in user.txt :

patch -u file.txt patch.txt 
cat file.txt

Now file.txt contains the line hello cbernet, and no longer hello world.

We can patch a single file as we did above, and we can also patch a whole directory in one go.
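Here is a sketch of what patching a whole directory could look like, using the same diff and patch tools. The directory and file names (old, new, a.txt, b.txt) are made up for the example.

```shell
work=$(mktemp -d)
cd "$work"
mkdir old new
echo 'hello world' > old/a.txt
echo 'one'         > old/b.txt
echo 'hello you'   > new/a.txt
echo 'two'         > new/b.txt

# -r compares the two directories recursively, -u produces the unified format.
# diff exits with status 1 when it finds differences, which is expected here.
diff -ru old new > dir.patch || true
cat dir.patch

# -p1 strips the leading "old/" or "new/" from the file names in the patch,
# -d old applies the patch from inside the old directory, -s keeps patch quiet.
patch -s -p1 -d old < dir.patch

# No output from diff means that both directories are now identical.
diff -r old new && echo 'old now matches new'
```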

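Conceptually, you can think of a git commit as a patch, plus a reference to its parent commits. (Under the hood, git actually stores a full snapshot of your files for each commit and computes the patch on demand, but the patch picture is the right mental model for what follows.)

For example, just consider a sequence of three commits: 1, then 2, then 3, each one being the parent of the next.

When you decide to go to the version corresponding to commit 3, here is what happens in this picture:

git follows the references to the parent commits to find the sequence of commits that leads to commit 3.
then, the patches of each commit are applied in the right order, starting from scratch: 1, then 2, and finally 3.

At this point, your working directory is in sync with the version of commit 3.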
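If you are curious, you can look at the raw content of a commit to see the reference to its parent. The sketch below uses git cat-file, a low-level inspection command that you will not need in daily work; the repository and identity are placeholders. In the output, the parent line holds the ID of the previous commit, and the tree line points to the files recorded for this commit.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"          # placeholder identity
git config user.email "test@example.com"

echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'commit 1'
echo 'hello again' >> file.txt
git commit -q -am 'commit 2'

parent=$(git rev-parse HEAD~1)   # ID of the commit just before HEAD
echo "expected parent: $parent"

# Raw commit: tree, parent, author, committer, and message.
git cat-file -p HEAD

# The patch between the parent and this commit is computed on demand.
git diff "$parent" HEAD
```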

What is a git branch ?

A git branch is a simple pointer to a commit! Nothing more.

Let's check the current status of our repository with git status:

On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   file.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        patch.txt
        user.txt

We have modified file.txt with our manual patch above.

Now, check the commit history with git log:

commit ab86e28862528ffdda5737f94242989cd9ef1f51 (HEAD -> master)
Author: Colin Bernet <colin.bernet@cern.ch>
Date:   Wed Mar 31 22:19:15 2021 +0200

    initial version

Currently, our history only contains one commit. Let's commit our modifications to file.txt:

git commit -am 'greeting user'

In this command, we have used two options:

-m for the commit message
-a which means "add all modified files to the staging area before committing". It is just a convenient shortcut that saves us a call to git add.
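One thing to keep in mind is that -a only picks up files that git already tracks. Untracked files, like patch.txt and user.txt in our case, are left alone. A small sketch to check this, in a throwaway repository with a placeholder identity and a made-up notes.txt file:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"          # placeholder identity
git config user.email "test@example.com"
echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial version'

echo 'hello user' > file.txt    # modify a tracked file
echo 'scratch' > notes.txt      # create a new, untracked file

git commit -q -am 'greeting user'

# The tracked file was committed, the untracked one is still reported.
git show HEAD:file.txt
git status --porcelain          # prints "?? notes.txt"
```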

Now check the history again:

commit 1aa0c80b03e172526f5dfc1b9e89c2265810cc3d (HEAD -> master)
Author: Colin Bernet <colin.bernet@cern.ch>
Date:   Fri Apr 2 09:58:28 2021 +0200

    greeting user

commit ab86e28862528ffdda5737f94242989cd9ef1f51
Author: Colin Bernet <colin.bernet@cern.ch>
Date:   Wed Mar 31 22:19:15 2021 +0200

    initial version

We see that a new commit has indeed appeared.

Most importantly, note that the branch master has moved from commit ab86e28 to commit 1aa0c80.

Here is what happened during the commit:

commit 1aa0c80 was created
branch master, which is just a pointer, was moved to this new commit.

And what is HEAD ? just a reference to the branch currently in use.
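To see the pointer move, you can record where master points before and after a commit, and then ask git what HEAD refers to. This is a sketch in a throwaway repository with a placeholder identity. The git symbolic-ref call right after git init is only there to name the first branch master as in this tutorial, since some git setups use another default name.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Test User"          # placeholder identity
git config user.email "test@example.com"

echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial version'
before=$(git rev-parse master)    # commit that master points to now

echo 'hello user' > file.txt
git commit -q -am 'greeting user'
after=$(git rev-parse master)     # master has moved to the new commit

echo "master before: $before"
echo "master after:  $after"

# HEAD is a reference to the current branch.
git symbolic-ref --short HEAD     # prints: master
cat .git/HEAD                     # prints: ref: refs/heads/master
```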

Managing git branches

Since master is just a pointer, we can delete it without destroying any commit. It really doesn't play any special role despite its glorious name.

I'd like to show you that, but we can't delete the branch we're currently sitting on.

So we start by creating and moving to a new branch:

git checkout -b tmp

Then we check history again with git log:

commit 1aa0c80b03e172526f5dfc1b9e89c2265810cc3d (HEAD -> tmp, master)
Author: Colin Bernet <colin.bernet@cern.ch>
Date:   Fri Apr 2 09:58:28 2021 +0200

    greeting user

commit ab86e28862528ffdda5737f94242989cd9ef1f51
Author: Colin Bernet <colin.bernet@cern.ch>
Date:   Wed Mar 31 22:19:15 2021 +0200

    initial version

We see that the new tmp branch points to the same commit as master, and that we are now sitting on tmp, since HEAD points to tmp.

Another way to look at your branches is to use:

git branch

This gives the list of all branches and indicates the current branch with a star:

master
* tmp

Now you can delete master with:

git branch -d master

Check that master was indeed deleted with git branch or git log , and then recreate and move back to master :

git checkout -b master

See? nothing bad happened.

To conclude this section, let's see how git keeps track of branches.

They are stored in .git/refs/heads :

ls .git/refs/heads

master   tmp

There is a file for each branch in this directory.

If we look into the master file, we see that it only contains the commit ID that master is pointing to :

1aa0c80b03e172526f5dfc1b9e89c2265810cc3d
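The whole section can be replayed in a few lines. The sketch below, in a throwaway repository with a placeholder identity, checks that the branch file contains nothing but a commit ID, and that deleting a branch removes this file while leaving the commits in place. I assume a freshly created repository here, in which each branch still has its own file under .git/refs/heads.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master, as in the tutorial
git config user.name "Test User"          # placeholder identity
git config user.email "test@example.com"

echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial version'
echo 'hello user' > file.txt
git commit -q -am 'greeting user'

git checkout -q -b tmp                    # second pointer to the same commit

ref_file=$(cat .git/refs/heads/master)    # content of the branch file
echo "file content:   $ref_file"
echo "master commit:  $(git rev-parse master)"

git branch -q -d master                   # delete the pointer...
ls .git/refs/heads                        # ...only tmp is left
commits=$(git rev-list --count HEAD)      # ...and both commits are still there
echo "commits in history: $commits"
```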

Git configuration

So far, we have only used basic git commands. They served our needs, but most commands can be greatly improved with a few options.

Fifteen years ago, my colleague at CERN Giulio Eulisse passed me down his git config.

It has been immensely useful and now, it is my turn to pass it down to you. I hope you'll make good use of it.

Your git configuration file is in your home directory, in the file ~/.gitconfig .

Here is a portion of mine:

[core]
        excludesfile = /Users/cbernet/.gitignore_global
        editor = nano
[user]
        name = Colin Bernet
        email = colin.bernet@cern.ch
        github = cbernet
[color]
        ui = true
[color "status"]
        added = yellow
        changed = green
        untracked = cyan
[alias]
        co = checkout
        b = branch -vv
        l = log --graph --all --abbrev-commit --date=relative --format=format:'%C(bold blue)%h%C(reset) %C(green)%ar %C(blue)%an%C(bold blue)%d%C(reset)%n        %C(white)%s%n'
        lt = log --graph --abbrev-commit --date=relative --format=format:'%C(bold blue)%h%C(reset) %C(green)%ar %C(blue)%an%C(bold blue)%d%C(reset)%n        %C(white)%s%n'
        l1 = log --pretty=oneline --decorate
        s = status

There are several sections in this file:

core :
- define a global gitignore file, which will be discussed below
- define which editor I want to use with git
user :
- my identity (name and email), which git records in every commit
color :
- make things prettier
color "status"
- define colors for the status command
alias: command aliases. I define these aliases to make git easier to use :
- co : just a shortcut for git checkout, which is used very often and takes too long to type
- b : shortcut to git branch, with more information
- l : nice history log with a tree view, for the whole repo
- lt : same, but only for the current branch
- l1 : one-liner log
- s : shortcut for status

Now, I suggest you merge this config with yours, and try the aliases. The command git l, in particular, will make it much easier for you to understand what comes next.
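If you'd rather try an alias before touching ~/.gitconfig, you can define it for a single repository with git config. A minimal sketch, in a throwaway repository with a placeholder identity; once you're happy, adding --global to the same git config calls writes the alias to ~/.gitconfig instead.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"          # placeholder identity
git config user.email "test@example.com"
echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial version'

# Aliases stored in this repository's own config file (.git/config).
git config alias.s status
git config alias.l1 'log --pretty=oneline --decorate'

git config --get alias.s    # prints: status
git s                       # same as git status
git l1                      # same as git log --pretty=oneline --decorate
```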

If you like these aliases, feel free to go and thank Giulio on twitter!

The gitignore file

In my git config, I specify a global gitignore file for all my git repositories. It contains the following lines:

*.pyc
*~
.idea
secrets

This file tells git to ignore all files and directories matching one of these patterns:

*.pyc : ignore compiled python files
*~ : ignore temporary save files
.idea : ignore the configuration data from PyCharm, my usual IDE
secrets : ignore the secrets directories in which I typically put the access keys of my applications. These should never be committed to a version control system.

You could define a global .gitignore that suits your needs, or put a .gitignore file in each repository. I typically do both.
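Here is a sketch of the per-repository variant, with the same four patterns. The file names (main.py, main.pyc, notes.txt~, secrets/api_key, .idea/workspace.xml) are made up for the example. The git check-ignore command is a handy way to ask git which pattern is responsible for ignoring a given file.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# A .gitignore at the root of the repository, one pattern per line.
printf '%s\n' '*.pyc' '*~' '.idea' 'secrets' > .gitignore

# Some files that should be ignored, and one that should not.
mkdir secrets .idea
touch main.py main.pyc 'notes.txt~' secrets/api_key .idea/workspace.xml

# Only .gitignore itself and main.py show up as untracked.
git status --porcelain

# Ask git which pattern matches each of these paths.
git check-ignore -v main.pyc secrets/api_key
```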

The git repository tree

Until now, we built a rather simple tree with a single branch of only two commits.

Real-life projects are more complicated, with different features being developed at the same time on different branches.

In this section, we're going to make things a bit more complex.

First, let me remind you of the current state of our repository (`git l`):

* 1aa0c80 8 hours ago Colin Bernet (HEAD -> master, tmp)
|         greeting user
| 
* ab86e28 2 days ago Colin Bernet
          initial version

Let's assume that you want to start developing a new feature based on the first commit. We start by creating a new branch on this commit:

git co -b new_feature ab86e28

As expected, you're now sitting on branch new_feature, which points to the commit on which you want to base your development:

* 1aa0c80 8 hours ago Colin Bernet (tmp, master)
|         greeting user
| 
* ab86e28 2 days ago Colin Bernet (HEAD -> new_feature)
          initial version

Now let's do some development. We simply add a new line to file.txt:

echo 'hello Joe' >> file.txt
cat file.txt

This gives:

hello world
hello Joe

Check the status of your repository with git s , and commit with

git commit -am 'greeting Joe'

Then look at your tree with git l:

* 824db2c 3 seconds ago Colin Bernet (HEAD -> new_feature)
|         greeting Joe
|   
| * 1aa0c80 8 hours ago Colin Bernet (tmp, master)
|/          greeting user
| 
* ab86e28 2 days ago Colin Bernet
          initial version

With Giulio's alias, we get a very clear view of what's going on.

The commits are ordered by commit time, and we see that the new_feature and master branches have diverged.

Now, do the following :

go to the master branch with git co master , and check the contents of file.txt
come back to new_feature, check again the contents of this file.
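If you want to replay this exercise from scratch, here is a sketch that rebuilds the two diverging branches in a throwaway repository and prints the file on each of them. The identity is a placeholder, and I write 'hello user' instead of your actual user name to keep the output predictable.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master, as in the tutorial
git config user.name "Test User"          # placeholder identity
git config user.email "test@example.com"

echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial version'
first=$(git rev-parse HEAD)               # remember the first commit

echo 'hello user' > file.txt
git commit -q -am 'greeting user'         # master moves on

git checkout -q -b new_feature "$first"   # new branch starting at the first commit
echo 'hello Joe' >> file.txt
git commit -q -am 'greeting Joe'

git checkout -q master
on_master=$(cat file.txt)                 # version of the file on master
git checkout -q new_feature
on_feature=$(cat file.txt)                # version of the file on new_feature

printf '%s\n' '--- on master:' "$on_master" '--- on new_feature:' "$on_feature"
```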

Don't be afraid to use branches:

you should create a separate branch for each development objective you may have
you can switch branches easily if you want to work on several features at the same time

After you've developed your new feature, you will want to integrate your changes into master. Again, this branch is just a normal branch. But by convention, we take it as the "official" development path.

Let's do this.

Git merge (and your first conflict)

To integrate the changes of new_feature into master, we need to merge new_feature into master . To do this :

move to the master branch
merge new_feature into the current branch

git co master
git merge new_feature

Argh! we get a conflict !! :-)

Auto-merging file.txt
CONFLICT (content): Merge conflict in file.txt
Automatic merge failed; fix conflicts and then commit the result.

Don't worry, conflicts are perfectly normal. Let's see how to solve this one. The first step is to check the status with `git s`:

On branch master
You have unmerged paths.
  (fix conflicts and run "git commit")
  (use "git merge --abort" to abort the merge)

Unmerged paths:
  (use "git add <file>..." to mark resolution)
        both modified:   file.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        patch.txt
        user.txt

no changes added to commit (use "git add" and/or "git commit -a")

Again, git gives us fairly precise instructions. We could abort the merge, or resolve the conflict by hand. The files in conflict are indicated (only file.txt in this case, but you could get conflicts in several files).

We're going to resolve the conflict. For this, we open the file in conflict with our favorite editor, and we see:

<<<<<<< HEAD
hello cbernet
=======
hello world
hello Joe
>>>>>>> new_feature

Above the ==== , we see what we have on HEAD, meaning the branch we're sitting on, that is master. Under the separator, we see what's on the new_feature branch.

Now it's up to you. You decide what you want to keep based on what you know. In any case, you need to remove the conflict markers. For example, you could edit the file to:

hello cbernet
hello Joe

That's it, you resolved the conflict. Then, just follow the instructions provided by git:

git add file.txt
git s

On branch master
All conflicts fixed but you are still merging.
  (use "git commit" to conclude merge)

Changes to be committed:
        modified:   file.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        patch.txt
        user.txt

git commit

And you're done !

Conflicts are normal. Just keep cool.

Finally, we can check the history with git l:

*   f1ca7e2 3 seconds ago Colin Bernet (HEAD -> master)
|\          Merge branch 'new_feature'
| | 
| * 824db2c 2 hours ago Colin Bernet (new_feature)
| |         greeting Joe
| | 
* | 1aa0c80 9 hours ago Colin Bernet (tmp)
|/          greeting user
| 
* ab86e28 2 days ago Colin Bernet
          initial version

The new merge commit (f1ca7e2) has two parents, so its history contains all the commits of both lines of development.
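The whole merge can be replayed non-interactively. The sketch below rebuilds the two branches in a throwaway repository (placeholder identity, 'hello user' instead of your user name), triggers the conflict, resolves it by overwriting the file, and finally prints the merge commit followed by its two parents. The --no-edit option simply accepts the default merge message instead of opening the editor.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master, as in the tutorial
git config user.name "Test User"          # placeholder identity
git config user.email "test@example.com"

echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial version'
first=$(git rev-parse HEAD)
echo 'hello user' > file.txt
git commit -q -am 'greeting user'
git checkout -q -b new_feature "$first"
echo 'hello Joe' >> file.txt
git commit -q -am 'greeting Joe'

git checkout -q master
# git merge exits with a non-zero status when it stops on a conflict.
git merge new_feature || echo 'merge stopped on a conflict'
git diff --name-only --diff-filter=U      # list the files still in conflict

# Resolve by writing the content we want, without conflict markers.
printf '%s\n' 'hello user' 'hello Joe' > file.txt
git add file.txt
git commit -q --no-edit

# Merge commit ID, followed by the IDs of its two parents.
git rev-list --parents -n 1 HEAD
```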

As we said before, a branch is simply a pointer to a commit. So if you delete the branch, the commit still exists.

Now that we have merged, we can get rid of obsolete branches, which is good practice.

git b -d new_feature tmp 
git b

* master f1ca7e2 Merge branch 'new_feature'

Git tags

A git tag is a pointer to a commit, just like git branches.

But unlike branches, tags don't move.

They are a way to bookmark important states in the development of a project. People use tags to mark the commit corresponding to a release of the software.

Here is what our repository tree currently looks like:

*   f1ca7e2 14 hours ago Colin Bernet (HEAD -> master)
|\          Merge branch 'new_feature'
| | 
| * 824db2c 16 hours ago Colin Bernet
| |         greeting Joe
| | 
* | 1aa0c80 24 hours ago Colin Bernet
|/          greeting user
| 
* ab86e28 3 days ago Colin Bernet
          initial version

Let's assume that you want to release your software at the version corresponding to commit ab86e28. You could tag it like this (using any name you want):

git tag v0.0.1 ab86e28
git l

*   f1ca7e2 15 hours ago Colin Bernet (HEAD -> master)
|\          Merge branch 'new_feature'
| | 
| * 824db2c 16 hours ago Colin Bernet
| |         greeting Joe
| | 
* | 1aa0c80 24 hours ago Colin Bernet
|/          greeting user
| 
* ab86e28 3 days ago Colin Bernet (tag: v0.0.1)
          initial version

Just like for branches, git keeps track of tags with simple files containing only the corresponding commit ID. You can find these files in .git/refs/tags/.

To remove a tag, do:

git tag -d v0.0.1
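To check that a tag really stays put while the branch moves on, you can replay the life of a tag in a throwaway repository. This is a sketch with a placeholder identity, assuming a simple tag created with git tag <name> <commit> as above, in a fresh repository where each tag has its own file under .git/refs/tags.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master, as in the tutorial
git config user.name "Test User"          # placeholder identity
git config user.email "test@example.com"

echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial version'
first=$(git rev-parse HEAD)

git tag v0.0.1 "$first"
tag_file=$(cat .git/refs/tags/v0.0.1)     # the tag file only contains a commit ID

echo 'hello user' > file.txt
git commit -q -am 'greeting user'         # master moves to the new commit...
tag_after=$(git rev-parse v0.0.1)         # ...but the tag still points to the first one

echo "first commit: $first"
echo "tag file:     $tag_file"
echo "tag now:      $tag_after"
echo "master now:   $(git rev-parse master)"

git tag -d v0.0.1                         # remove the tag, the commit stays
```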

Recap: essential git commands and concepts

Congratulations, you're at the end of the first part of my git tutorial.

So far, you have learnt how to work with a local git repository.

In particular, you learnt about these essential commands. Make sure to remember them:

Basics

git init : initialize git repository
git status or git s : look at the current status of your repository
git diff : check the diffs between your working directory and your repository
git add <file> : add a file to the staging area, for the next commit
git commit -m <message> : commit
git show : look at the current commit
git log : print commit history
- git l : get a nice tree view of the history
- git lt : same, but only for current branch
- git l1 : history with a single line per commit

Branches & tags

git checkout -b <branch> : create a branch and move to this branch
git branch or git b : print all branches and show current branch
git branch -d <branch> : delete branch
git merge <branch_to_merge> : merge branch into current branch
git tag <tag> [commit] : create a tag at current commit, or at specified commit.

In the next posts, we will discuss :

Git remote : How to Collaborate
Solutions for the usual git mess ups (like undoing a merge, the detached head state, rewriting history, and so on)

Please let me know what you think in the comments! I’ll try and answer all questions.

And if you liked this article, you can subscribe to my mailing list to be notified of new posts (no more than one mail per week I promise.)
