Have you ever found yourself in this situation?
You have contributed a few changes to a repository that somebody told you to clone, and now comes the time to pull and merge remote modifications before you can push your code to the world. (If you don't understand this git babbling, please bear with me: you will very soon enjoy confusing others with such sentences as well.)
But now you start sweating. Are you going to execute this pull command? Your hand shakes as you're about to hit the return key.
I've been there. That was about fifteen years ago, but I still remember this feeling.
At the same time, we can't do without a version control system, can we?
And git is simply the most widely used and the best of all. I don't want to start a flame war; it's just a fact. Why is that, you might ask?
Well, for at least four reasons:
But git is also a rather complex tool, and it can be quite difficult to learn. Especially if you follow the official documentation and try to learn all of git right away. Or if you rightly bail out of the documentation and try to just learn a few commands, hoping it will be enough.
In this article, I will do my best to teach you git the right way.
You will learn:
For now, we will focus on a local git repo that we're going to create from scratch.
And in the next post, we will see how to use remote repositories to save our work and collaborate with other people.
To get started, you just need access to a terminal on a computer with git installed. This tutorial is written for *nix users, but I'm sure you can figure it out if you're using Windows (or get yourself a proper computer ;-) )
I encourage you to follow the instructions and type all commands by yourself, instead of just reading this article. It's going to stick better.
You don't need to clone a remote repository to use git.
In fact, you can initialize a brand new local git repository in any directory to start tracking your changes. And that's something I actually do very often, because I just feel uncomfortable when I edit files in a directory that is not under version control. The risk is too big: if I lose my work, I will have to redo it, and I hate that.
So let's create a test repository, which we're going to use throughout the tutorial.
mkdir test_repo
cd test_repo
git init
ls -a
. .. .git
The `git init` command initialized git in the `test_repo` directory, and created the hidden `.git` directory.
It's always nice to look into hidden directories:
ls -l .git
total 24
-rw-r--r--   1 cbernet  staff   23 Mar 31 22:03 HEAD
-rw-r--r--   1 cbernet  staff  137 Mar 31 22:03 config
-rw-r--r--   1 cbernet  staff   73 Mar 31 22:03 description
drwxr-xr-x  14 cbernet  staff  448 Mar 31 22:03 hooks
drwxr-xr-x   3 cbernet  staff   96 Mar 31 22:03 info
drwxr-xr-x   4 cbernet  staff  128 Mar 31 22:03 objects
drwxr-xr-x   4 cbernet  staff  128 Mar 31 22:03 refs
All the information about your repo will be stored here. There is no remote server involved, just this simple directory.
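If you want to check this claim yourself, here is a minimal sketch to paste into a terminal. It runs in a throwaway directory created with `mktemp -d` (my choice for the demo, so that nothing in your real projects is touched) and asks git where the repository lives:

```shell
set -e
cd "$(mktemp -d)"         # throwaway directory for the experiment
git init -q               # -q only silences the "Initialized empty Git repository" message
ls -a                     # .  ..  .git
git rev-parse --git-dir   # .git: everything git knows about this repo is in there
cat .git/HEAD             # e.g. "ref: refs/heads/master"; the branch name depends on your git version and settings
```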
An interesting consequence is that if you delete `.git` or its parent, `test_repo`, you will lose your entire history! The solution to this potentially dramatic and rather probable event is to push your changes to a remote repository. We will come back to that in the next article.
For now, let's start using our repo. First, let's check the status of the repo with `git status`:
On branch master
No commits yet
nothing to commit (create/copy files and use "git add" to track)
That's not too interesting for now. Still, note that git always tries to help you by giving you hints about what to do next. Often, these hints are enough. Let's do what the hint says and create a file for ...
We start by creating a simple file with a single line. This is easily done from the command line:
echo 'hello world' > file.txt
cat file.txt
We check the status again:
On branch master
No commits yet
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        file.txt
nothing added to commit but untracked files present (use "git add" to track)
The file appears as "untracked", which means that it's not currently tracked by git.
We keep following the instructions and add the file to git:
git add file.txt
git status
On branch master
No commits yet
Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   file.txt
Aha! The file is now set to be committed, or "staged". As explained, we could unstage it by just following the hint. But don't bother remembering this: you'll get the hint when you need it.
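A compact way to watch a file move between these states is `git status --short`, which prints one line per file: `??` for an untracked file, and `A` in the first column for a new file that is staged. Here is a sketch in a throwaway repository (the `mktemp -d` directory is an assumption of the demo), including the unstaging command from the hint:

```shell
set -e
cd "$(mktemp -d)"            # throwaway directory
git init -q
echo 'hello world' > file.txt
git status --short           # ?? file.txt   (untracked)
git add file.txt
git status --short           # A  file.txt   (staged for the next commit)
git rm -q --cached file.txt  # unstage, as the hint suggests; the file itself is kept
git status --short           # ?? file.txt   (untracked again)
```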
So now we can commit, without forgetting to provide a useful commit message:
git commit -m 'initial version'
[master (root-commit) ab86e28] initial version
 1 file changed, 1 insertion(+)
 create mode 100644 file.txt
If that's the first time that you're setting up git, you get a message like this:
[master (root-commit) 7e58b70] test
 Committer: Colin Bernet <firstname.lastname@example.org>
Your name and email address were configured automatically based
on your username and hostname. Please check that they are accurate.
You can suppress this message by setting them explicitly. Run the
following command and follow the instructions in your editor to edit
your configuration file:
    git config --global --edit
After doing this, you may fix the identity used for this commit with:
    git commit --amend --reset-author
 1 file changed, 1 insertion(+)
 create mode 100644 file.txt
No worries, but please follow these instructions now, before going any further.
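Here is a sketch of what these instructions amount to, in a throwaway repository. It first makes a commit with a deliberately wrong identity (passed with `git -c`), then sets the identity and fixes the commit with the `--reset-author` option from the hint. The names and email addresses are placeholders. For your real setup, use `git config --global` so that the identity applies to all your repositories; the sketch sets it without `--global` to leave your own configuration untouched.

```shell
set -e
cd "$(mktemp -d)"                 # throwaway directory
git init -q
echo 'hello world' > file.txt
git add file.txt
# Simulate a commit made with a wrong identity (placeholder values)
git -c user.name='Wrong Name' -c user.email='wrong@example.com' commit -q -m 'initial version'
# Set the identity; use "git config --global ..." to set it for all your repositories
git config user.name 'Your Name'
git config user.email 'you@example.com'
# Rewrite the last commit with the new identity, keeping its message
git commit -q --amend --reset-author --no-edit
git log -1 --format='%an <%ae>'   # Your Name <you@example.com>
```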
And we check the status again with `git status`, which gives:
On branch master
nothing to commit, working tree clean
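Here is a summary of what we did, with a bit of terminology: we created `file.txt` in the working directory, where it was untracked; `git add` put it in the staging area; and `git commit` recorded the staged changes as a new commit in the repository.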
To get information about your last commit, run `git show`:
commit ab86e28862528ffdda5737f94242989cd9ef1f51 (HEAD -> master)
Author: Colin Bernet <email@example.com>
Date:   Wed Mar 31 22:19:15 2021 +0200
    initial version
diff --git a/file.txt b/file.txt
new file mode 100644
index 0000000..3b18e51
--- /dev/null
+++ b/file.txt
@@ -0,0 +1 @@
+hello world
There is a lot of information here:
ab86e28862528ffdda5737f94242989cd9ef1f51: this is the commit ID, which uniquely identifies this commit. When referring to a commit, you don't need to type the whole ID, just the first few characters. Git will recognize it anyway.
HEAD -> master: this means that we are on the branch `master` (don't worry, we will discuss branches a bit later).
From `diff --git` down to `+hello world`: this last part is a "diff", similar to what you would get with the `diff` command.
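To convince yourself that a prefix is enough, you can ask git to expand one. In this sketch (throwaway repository, placeholder identity), `git rev-parse --short` produces an abbreviated ID, and `git show` accepts it in place of the full one:

```shell
set -e
cd "$(mktemp -d)"                         # throwaway directory
git init -q
git config user.name 'Example User'       # placeholder identity for this demo repo
git config user.email 'user@example.com'
echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial version'
full=$(git rev-parse HEAD)                # the full commit ID
short=$(git rev-parse --short HEAD)       # a shorter prefix that is still unambiguous
echo "$full"
echo "$short"
git show --no-patch --format='%H %s' "$short"   # git expands the prefix to the full ID
```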
Now let's focus on the last part of this printout, which shows the changes introduced by the commit.
This part has a very specific format: it's a patch.
You know, sometimes when you upgrade a piece of software on your computer, you download and apply a patch. This patch is simply the difference between the new version and the old version of the software. If you apply the patch to the old version, you get the new version.
Patches are very convenient, because they make it possible to upgrade software without having to download the full new version, only the difference between the new version and the one you have, which represents a much smaller amount of data.
If you really want to understand git, you need to understand patching.
So let's create and apply a patch manually.
First, we create a new file:
echo "hello $USER" > user.txt
cat user.txt
Then create a patch file:
diff -u file.txt user.txt > patch.txt
cat patch.txt
We see that `patch.txt` now contains the changes between `file.txt` and `user.txt`, in the exact same format as in a git commit:
--- file.txt    2021-03-31 22:17:44.000000000 +0200
+++ user.txt    2021-04-02 08:20:05.000000000 +0200
@@ -1 +1 @@
-hello world
+hello cbernet
We can now apply this patch to `file.txt` to change its contents to what is in `user.txt`:
patch -u file.txt patch.txt
cat file.txt
`file.txt` now contains the line `hello cbernet` instead of `hello world`.
We can patch a single file as we did above, and we can also patch a whole directory in one go.
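Here is a sketch of patching a whole directory, with the same `diff` and `patch` tools as above. The directory and file names (`old`, `new`, `a.txt`, `b.txt`) are made up for the demo. `diff -ru` compares the two directories recursively, and `patch -p1` strips the first path component (`old/` or `new/`) from the file names in the patch, since we apply it from inside `old`:

```shell
set -e
cd "$(mktemp -d)"                    # throwaway directory
mkdir old new
printf 'hello world\n' > old/a.txt
printf 'first line\n' > old/b.txt
printf 'hello world\nhello Joe\n' > new/a.txt
printf 'first line, edited\n' > new/b.txt
# -r: recurse into directories, -u: unified format, as in git
# diff exits with status 1 when the directories differ, which is expected here
diff -ru old new > dir.patch || [ $? -eq 1 ]
cat dir.patch
# Apply the patch from inside old/; -p1 strips the leading "old/" or "new/" from file names
(cd old && patch -s -p1 < ../dir.patch)
diff -r old new && echo 'old/ now matches new/'
```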
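In git, you can think of a commit as a patch plus a reference to its parent commits. (Internally, git stores a snapshot of the files for each commit and computes patches when needed, but the patch view is the one that makes git easy to reason about.)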
For example, just consider this sequence of three commits:
When you decide to go to the version corresponding to commit 3, here is what happens:
At this point, your working directory is in sync with the version of commit 3.
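You can watch this happen with a quick experiment in a throwaway repository: three commits, each adding a line, then a jump back to the first one. The file name and commit messages are made up for the demo. `git log --format='%h %p %s'` prints each commit's short ID, its parent's short ID and its message, and `git checkout -` returns to the branch you came from:

```shell
set -e
cd "$(mktemp -d)"                         # throwaway directory
git init -q
git config user.name 'Example User'       # placeholder identity for this demo repo
git config user.email 'user@example.com'
for n in 1 2 3; do
  echo "line $n" >> file.txt
  git add file.txt
  git commit -q -m "commit $n"
done
git log --format='%h %p %s'               # each commit lists its parent (the first one has none)
first=$(git rev-list --max-parents=0 HEAD)   # the root commit, i.e. commit 1
git checkout -q "$first"                  # go to the version of commit 1 ("detached HEAD")
cat file.txt                              # line 1
git checkout -q -                         # back to the branch we came from
cat file.txt                              # line 1, line 2, line 3
```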
A git branch is simply a pointer to a commit, nothing more.
Let's check the current status of our repository with `git status`:
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   file.txt
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        patch.txt
        user.txt
We have modified `file.txt` with our manual patch above.
Now, check the commit history with `git log`:
commit ab86e28862528ffdda5737f94242989cd9ef1f51 (HEAD -> master)
Author: Colin Bernet <firstname.lastname@example.org>
Date:   Wed Mar 31 22:19:15 2021 +0200
    initial version
Currently, our history only contains one commit. Let's commit our modifications to `file.txt`:
git commit -am 'greeting user'
In this command, we have used two options:
-m for the commit message
-a, which means "add all modified tracked files to the staging area before committing". It is just a convenient shortcut that saves us a call to `git add`.
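One detail worth knowing: `-a` only picks up files that git already tracks, so new, untracked files still need an explicit `git add`. A quick sketch that shows this, with a throwaway repository, a placeholder identity and made-up file names:

```shell
set -e
cd "$(mktemp -d)"                         # throwaway directory
git init -q
git config user.name 'Example User'       # placeholder identity for this demo repo
git config user.email 'user@example.com'
echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial version'
echo 'hello again' > file.txt             # modify a tracked file
echo 'scratch notes' > notes.txt          # create a new, untracked file
git commit -q -am 'update greeting'
git status --short                        # ?? notes.txt  (-a skipped the untracked file)
git show HEAD:file.txt                    # hello again   (the tracked change was committed)
```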
Now check the history again:
commit cfc738b3bea930d4f14d6d1b93ff3af9d4880a37 (HEAD -> master)
Author: Colin Bernet <email@example.com>
Date:   Fri Apr 2 09:20:24 2021 +0200
    greeting user
commit ab86e28862528ffdda5737f94242989cd9ef1f51
Author: Colin Bernet <firstname.lastname@example.org>
Date:   Wed Mar 31 22:19:15 2021 +0200
    initial version
We see that a new commit has indeed appeared.
Most importantly, note that the branch `master` has moved from commit `ab86e28862` to commit `cfc738b3be`.
Here is what happened during the commit: git created a new commit whose parent is the previous commit, and `master`, which is just a pointer, was moved to this new commit.
And what is `HEAD`? Just a reference to the branch currently in use.
Since `master` is just a pointer, we can delete it without destroying any commit. It really doesn't play any special role, despite its glorious name.
I'd like to show you that, but we can't delete the branch we're currently sitting on.
So we start by creating and moving to a new branch:
git checkout -b tmp
Then we check the history again with `git log`:
commit 1aa0c80b03e172526f5dfc1b9e89c2265810cc3d (HEAD -> tmp, master)
Author: Colin Bernet <email@example.com>
Date:   Fri Apr 2 09:58:28 2021 +0200
    greeting user
commit ab86e28862528ffdda5737f94242989cd9ef1f51
Author: Colin Bernet <firstname.lastname@example.org>
Date:   Wed Mar 31 22:19:15 2021 +0200
    initial version
We see that the new `tmp` branch points to the same commit as `master`, and that we are now sitting on `tmp`, since `HEAD` points to it.
Another way to look at your branches is to use `git branch`.
This gives the list of all branches and indicates the current branch with a star:
  master
* tmp
Now you can delete `master` with:
git branch -d master
Check that `master` was indeed deleted with `git branch` or `git log`, and then recreate it and move back to it with:
git checkout -b master
See? Nothing bad happened.
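Here is the whole experiment as one sketch, in a throwaway repository with a placeholder identity. `git branch -M master` renames the first branch to `master` in case your git uses another default name, and `git symbolic-ref --short HEAD` prints the branch that `HEAD` currently refers to:

```shell
set -e
cd "$(mktemp -d)"                         # throwaway directory
git init -q
git config user.name 'Example User'       # placeholder identity for this demo repo
git config user.email 'user@example.com'
echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial version'
git branch -M master                      # name the first branch master, whatever your git default is
git checkout -q -b tmp                    # create tmp and move to it
git symbolic-ref --short HEAD             # tmp: HEAD now refers to tmp
git branch -d master                      # allowed, since we are not sitting on master
git log --oneline                         # the commit is still there, reachable from tmp
git checkout -q -b master                 # recreate master on the same commit and move to it
git branch                                # * master, then tmp
```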
To conclude this section, let's see how git keeps track of branches.
They are stored in `.git/refs/heads`:
ls .git/refs/heads
master  tmp
There is a file for each branch in this directory.
If we look into the `master` file, we see that it only contains the ID of the commit that `master` is pointing to.
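A quick sketch to check this, assuming git's default "files" storage for references: refs can also be packed into `.git/packed-refs` (for instance by `git gc`), in which case the per-branch file may be missing. The sketch reads the current branch name with `git symbolic-ref --short HEAD`, since it may be `master` or `main` depending on your git defaults, and compares the file with what `git rev-parse` reports:

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory
git init -q
git config user.name 'Example User'        # placeholder identity for this demo repo
git config user.email 'user@example.com'
echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial version'
current=$(git symbolic-ref --short HEAD)   # master or main, depending on your git defaults
git branch tmp                             # a second branch on the same commit
ls .git/refs/heads                         # one file per branch
cat ".git/refs/heads/$current"             # just a commit ID
git rev-parse HEAD                         # the same ID, as reported by git
```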
So far, we have only used basic git commands. They served our needs, but most commands can be improved a lot with a few options.
Fifteen years ago, my colleague at CERN Giulio Eulisse passed me down his git config.
It has been immensely useful and now, it is my turn to pass it down to you. I hope you'll make good use of it.
Your git configuration file is in your home directory, in the file `~/.gitconfig`.
Here is a portion of mine:
[core]
    excludesfile = /Users/cbernet/.gitignore_global
    editor = nano
[user]
    name = Colin Bernet
    email = email@example.com
    github = cbernet
[color]
    ui = true
[color "status"]
    added = yellow
    changed = green
    untracked = cyan
[alias]
    co = checkout
    b = branch -vv
    l = log --graph --all --abbrev-commit --date=relative --format=format:'%C(bold blue)%h%C(reset) %C(green)%ar %C(blue)%an%C(bold blue)%d%C(reset)%n %C(white)%s%n'
    lt = log --graph --abbrev-commit --date=relative --format=format:'%C(bold blue)%h%C(reset) %C(green)%ar %C(blue)%an%C(bold blue)%d%C(reset)%n %C(white)%s%n'
    l1 = log --pretty=oneline --decorate
    s = status
There are several sections in this file: `[core]`, `[user]`, `[color]`, `[color "status"]` and `[alias]`. The `[alias]` section defines these shortcuts:
`co`: just a shortcut for `git checkout`, which is used very often and takes too long to type
`b`: shortcut for `git branch`, with more information
`l`: nice history log with a tree view, for the whole repo
`lt`: same, but only for the current branch
`l1`: one-liner log
`s`: shortcut for `git status`
Now, I suggest that you merge this config with yours, and try the aliases. The command `git l`, in particular, will make it much easier for you to understand what comes next.
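If you prefer not to edit the configuration file by hand, `git config` can also add aliases one at a time. Here is a sketch in a throwaway repository, where the aliases are stored in the repository's own config; in real life you would add `--global` so that they end up in your `~/.gitconfig`:

```shell
set -e
cd "$(mktemp -d)"                            # throwaway directory
git init -q
# Repository-local aliases; add --global to store them in ~/.gitconfig instead
git config alias.s status
git config alias.co checkout
git config alias.l1 'log --pretty=oneline --decorate'
git config --get-regexp '^alias\.'           # list the aliases git now knows about
git s --short                                # same as "git status --short"
```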
If you like these aliases, feel free to go and thank Giulio on twitter!
In my git config, I specify a global gitignore file for all my git repositories. It contains the following lines:
*.pyc
*~
.idea
secrets
This file tells git to ignore all files and directories matching one of these patterns.
You could define a global .gitignore that suits your needs, or put a .gitignore file in each repository. I typically do both.
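To check which patterns actually apply, `git check-ignore -v` prints the matching rule for each path. Here is a sketch with a per-repository `.gitignore` containing the same four patterns; the file names are made up for the demo:

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory
git init -q
# A per-repository .gitignore with the same patterns as the global file above
printf '%s\n' '*.pyc' '*~' '.idea' 'secrets' > .gitignore
mkdir .idea secrets
touch file.txt module.pyc notes.txt~ .idea/workspace.xml secrets/api_key.txt
git status --short                         # only .gitignore and file.txt are listed
git check-ignore -v module.pyc notes.txt~ secrets   # which rule ignores each path
```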
So far, we have built a rather simple tree with a single branch of only two commits.
Real-life projects are more complicated, with different features being developed at the same time on different branches.
In this section, we're going to make things a bit more complex.
First, I would like to remind you of the current state of our repository (`git l`):
* 1aa0c80 8 hours ago Colin Bernet (HEAD -> master, tmp)
|  greeting user
|
* ab86e28 2 days ago Colin Bernet
   initial version
Let's assume that you want to start developing a new feature based on the first commit. We start by creating a new branch on this commit:
git co -b new_feature ab86e28
As expected, you're now sitting on branch `new_feature`, which points to the commit on which you want to base your development:
* 1aa0c80 8 hours ago Colin Bernet (tmp, master)
|  greeting user
|
* ab86e28 2 days ago Colin Bernet (HEAD -> new_feature)
   initial version
Now let's do some development. We simply add a new line to `file.txt`:
echo 'hello Joe' >> file.txt
cat file.txt
hello world
hello Joe
Check the status of your repository with `git s`, and commit with:
git commit -am 'greeting Joe'
Then look at your tree with `git l`:
* 824db2c 3 seconds ago Colin Bernet (HEAD -> new_feature)
|  greeting Joe
|
| * 1aa0c80 8 hours ago Colin Bernet (tmp, master)
|/   greeting user
|
* ab86e28 2 days ago Colin Bernet
   initial version
With Giulio's alias, we get a very clear view of what's going on.
The commits are ordered by commit time, and we see that the `new_feature` and `master` branches have diverged.
Now, do the following:
`git co master`, and check the contents of `file.txt`;
`git co new_feature`, and check the contents of this file again.
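If you want to replay this situation in a scratch repository, here is a sketch that rebuilds the same history and then does the two checkouts. It uses a placeholder identity, and `git log --graph --oneline --all` instead of the `git l` alias so that it works without the config above:

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory
git init -q
git config user.name 'Example User'        # placeholder identity for this demo repo
git config user.email 'user@example.com'
echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial version'
git branch -M master                       # name the first branch master, whatever your git default is
base=$(git rev-parse HEAD)
echo 'hello cbernet' > file.txt            # on master: replace the greeting
git commit -q -am 'greeting user'
git checkout -q -b new_feature "$base"     # new branch starting from the first commit
echo 'hello Joe' >> file.txt               # on new_feature: add a line
git commit -q -am 'greeting Joe'
git log --graph --oneline --all            # two lines of development from the same commit
git checkout -q master
cat file.txt                               # hello cbernet
git checkout -q new_feature
cat file.txt                               # hello world, then hello Joe
```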
Don't be afraid to use branches:
After you've developed your new feature, you will want to integrate your changes into
master. Again, this branch is just a normal branch. But by convention, we take it as the "official" development path.
Let's do this.
To integrate the changes of `new_feature`, we need to merge this branch into `master`. To do this:
git co master
git merge new_feature
Argh! We get a conflict! :-)
Auto-merging file.txt
CONFLICT (content): Merge conflict in file.txt
Automatic merge failed; fix conflicts and then commit the result.
Don't worry, conflicts are perfectly normal. Let's see how to solve this one. The first step is to check the status with `git s`:
On branch master
You have unmerged paths.
  (fix conflicts and run "git commit")
  (use "git merge --abort" to abort the merge)
Unmerged paths:
  (use "git add <file>..." to mark resolution)
        both modified:   file.txt
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        patch.txt
        user.txt
no changes added to commit (use "git add" and/or "git commit -a")
Again, git gives us fairly precise instructions. We could abort the merge, or resolve the conflict by hand. The files in conflict are listed (only `file.txt` in this case, but you could get conflicts in several files).
We're going to resolve the conflict. For this, we open the file in conflict with our favorite editor, and we see:
<<<<<<< HEAD
hello cbernet
=======
hello world
hello Joe
>>>>>>> new_feature
Above the separator `=======`, we see what we have on `HEAD`, meaning the branch we're sitting on, that is `master`. Under the separator, we see what's on the `new_feature` branch.
Now it's up to you. You decide what you want to keep based on what you know. In any case, you need to remove the conflict markers. For example, you could edit the file to:
hello cbernet
hello Joe
That's it, you resolved the conflict. Then, just follow the instructions provided by git:
git add file.txt
git s
On branch master
All conflicts fixed but you are still merging.
  (use "git commit" to conclude merge)
Changes to be committed:
        modified:   file.txt
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        patch.txt
        user.txt
Conclude the merge with `git commit`, as suggested, and you're done!
Conflicts are normal. Just keep cool.
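For reference, here is the whole conflict scenario as one script, in a scratch repository with a placeholder identity. It rebuilds the two diverging branches, runs the merge (which exits with a non-zero status because of the conflict), writes the resolved content, and concludes the merge with `git commit --no-edit`, which keeps git's default merge message:

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory
git init -q
git config user.name 'Example User'        # placeholder identity for this demo repo
git config user.email 'user@example.com'
echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial version'
git branch -M master
base=$(git rev-parse HEAD)
echo 'hello cbernet' > file.txt
git commit -q -am 'greeting user'
git checkout -q -b new_feature "$base"
echo 'hello Joe' >> file.txt
git commit -q -am 'greeting Joe'
git checkout -q master
git merge new_feature || true              # stops on the conflict and exits non-zero
git status --short                         # UU file.txt: modified on both sides
printf 'hello cbernet\nhello Joe\n' > file.txt   # our resolution, without conflict markers
git add file.txt                           # mark the conflict as resolved
git commit -q --no-edit                    # conclude the merge with the default message
git log --graph --oneline
```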
Finally, we can check the history with `git l`:
*   f1ca7e2 3 seconds ago Colin Bernet (HEAD -> master)
|\    Merge branch 'new_feature'
| |
| * 824db2c 2 hours ago Colin Bernet (new_feature)
| |   greeting Joe
| |
* | 1aa0c80 9 hours ago Colin Bernet (tmp)
|/    greeting user
|
* ab86e28 2 days ago Colin Bernet
   initial version
The new merge commit (f1ca7e2) has two parents, and its history contains the commits of both lines of development.
As we said before, a branch is simply a pointer to a commit. So if you delete the branch, the commit still exists.
Now that we have merged, we can get rid of obsolete branches, which is good practice.
git b -d new_feature tmp
git b
* master f1ca7e2 Merge branch 'new_feature'
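`git branch -d` is the cautious variant: it refuses to delete a branch whose commits are not merged into the current branch, and `-D` forces the deletion. Here is a sketch showing both cases in a scratch repository, with made-up branch names and a placeholder identity, plus a check that the merged commit survives the deletion of its branch:

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory
git init -q
git config user.name 'Example User'        # placeholder identity for this demo repo
git config user.email 'user@example.com'
echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial version'
git branch -M master
git checkout -q -b merged_feature
echo 'hello Joe' >> file.txt
git commit -q -am 'greeting Joe'
joe=$(git rev-parse HEAD)
git checkout -q master
git merge -q merged_feature                # fast-forward: master now contains the commit
git checkout -q -b unmerged_feature
echo 'draft' > draft.txt
git add draft.txt
git commit -q -m 'work in progress'
git checkout -q master
git branch -d merged_feature               # fine: its commit is part of master
git branch -d unmerged_feature || echo 'refused: not fully merged'
git branch -D unmerged_feature             # -D deletes it anyway
git merge-base --is-ancestor "$joe" master && echo 'the merged commit is still in master'
```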
A git tag is a pointer to a commit, just like git branches.
But unlike branches, tags don't move.
They are a way to bookmark important states in the development of a project. People use tags to mark the commit corresponding to a release of the software.
Here is what our repository tree currently looks like:
*   f1ca7e2 14 hours ago Colin Bernet (HEAD -> master)
|\    Merge branch 'new_feature'
| |
| * 824db2c 16 hours ago Colin Bernet
| |   greeting Joe
| |
* | 1aa0c80 24 hours ago Colin Bernet
|/    greeting user
|
* ab86e28 3 days ago Colin Bernet
   initial version
Let's assume that you want to release your software at the version corresponding to commit `ab86e28`. You could tag it like this (using any name you want):
git tag v0.0.1 ab86e28
git l
*   f1ca7e2 15 hours ago Colin Bernet (HEAD -> master)
|\    Merge branch 'new_feature'
| |
| * 824db2c 16 hours ago Colin Bernet
| |   greeting Joe
| |
* | 1aa0c80 24 hours ago Colin Bernet
|/    greeting user
|
* ab86e28 3 days ago Colin Bernet (tag: v0.0.1)
   initial version
Just like for branches, git keeps track of tags with simple files containing only the corresponding commit ID. You can find these files in `.git/refs/tags`.
To remove a tag, do:
git tag -d v0.0.1
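A small sketch tying this together, in a scratch repository with a placeholder identity: the tag stays on the first commit while the branch moves forward, and deleting the tag leaves the commits alone. It uses a lightweight tag (what `git tag <name>` creates), which, like a branch, is just a reference to a commit ID; annotated tags created with `git tag -a` point to a separate tag object instead.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory
git init -q
git config user.name 'Example User'        # placeholder identity for this demo repo
git config user.email 'user@example.com'
echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial version'
first=$(git rev-parse HEAD)
git tag v0.0.1 "$first"                    # lightweight tag on the first commit
echo 'hello Joe' >> file.txt
git commit -q -am 'greeting Joe'           # the current branch moves forward...
tagged=$(git rev-parse v0.0.1)             # ...but the tag still points to the first commit
echo "$first"
echo "$tagged"
git tag -d v0.0.1                          # delete the tag; the commit itself is untouched
git log --oneline                          # both commits are still there
```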
Congratulations, you're at the end of the first part of my git tutorial.
So far, you have learnt how to work with a local git repository.
In particular, you learnt about these essential commands, make sure to remember them:
git init: initialize a git repository
git s: look at the current status of your repository
git diff: check the differences between your working directory and the staging area
git add <file>: add a file to the staging area, for the next commit
git commit -m <message>: commit
git show: look at the current commit
git log: print commit history
git l: get a nice tree view of the history
git lt: same, but only for current branch
git l1: history with a single line per commit
Branches & tags
git checkout -b <branch>: create a branch and move to this branch
git b: print all branches and show current branch
git branch -d <branch>: delete branch
git merge <branch_to_merge>: merge branch into current branch
git tag <tag> [commit]: create a tag at current commit, or at specified commit.
In the next posts, we will discuss:
Please let me know what you think in the comments! I’ll try and answer all questions.
And if you liked this article, you can subscribe to my newsletter to be notified of new posts (no more than one mail per week I promise.)