Git remote : How to Collaborate

Introduction

Collaborating with git can be quite intimidating at first...

There are several reasons for this:

  • Git is decentralized. To collaborate with others on a single project, you need to interact with several repositories, most often three: your local repository, your remote, and the official repository.  It's already difficult to follow what's going on in a single repository, and it's even more difficult with three repositories, especially if you don't fully control them.
  • It's easy to mess up your local version of the code when integrating the work of others, if you don't know what you're doing.
  • Software collaboration is governed by a few basic rules, which unfortunately often remain unspoken. You need to know them.

After reading this guide, you will have a clearer picture of all this, and you will be able to collaborate smoothly with fellow developers on great projects!

This article is the second part of my series about Git. If you haven't read the first part, Git : Overcome your Fears, please take a look now to make sure you understand the basic concepts of Git.

For this tutorial, I have chosen GitHub as a remote git platform because it is accessible to everybody, and because most open-source projects are on GitHub.

But if you're using a different platform such as GitLab, don't worry, all platforms are very similar.

Here is the outline:

  • What are remote repositories?
  • Forks and pull requests
  • Set up your GitHub account
  • Forking and cloning a GitHub repository
  • Adding a remote repository
  • Updating your version of the code
  • Pushing your modifications to your remote
  • Contributing with a pull request
  • Guidelines for a successful pull request

What are remote repositories?

There's no magic here: a remote repository is simply another git repository.

The main difference with your local repository is that the remote is hosted somewhere on the internet, and not on your local machine. Most often, remotes are either hosted on github.com or on a private GitLab server belonging to an organization.

Apart from that, remotes are standard git repositories: they have commits, branches, tags, and so on.

Remotes are needed to share code.

For example, let's assume that Guido and Bjarne want to collaborate on some new software project called nohtyp. They have set up a remote repository on GitHub that would host the official version of the code.

[Figure: git workflow with one remote]

To provide a new feature, Guido has:

  • implemented the feature on his local machine;
  • committed the code to his local repository, maybe in several successive commits;
  • "pushed" the new commits to the official repository.

At the same time, Bjarne has implemented another new feature. To contribute, he needs to:

  • fetch the new commits from Guido from the official repository, which appeared while Bjarne was developing;
  • merge them into his code, which results in a new merge commit. This is important, so that his version now contains the new code from Guido as well;
  • push the new commits to the official repository.

With this simple workflow, Bjarne and Guido can exchange code and build their project together, without ever connecting to the other person's machine. Also, each of them is still in full control of his own local repository.

But this workflow should never be used, because it is not safe. Indeed, it would require giving all developers the right to push to the official repository. And with the right to push, any developer could wreak havoc in the repository, e.g. by deleting important branches.

In the next section, we will see how this security issue is avoided by using forks and pull requests.

Forks and Pull Requests 

If we forbid people from pushing to the official repository, how can they contribute?

With forks and pull requests!

Here is a modified git workflow:

[Figure: standard git collaboration workflow]

This time, each developer has his own remote repository on GitHub, to which he is free to push.

Here is how these remote repositories were created:

When a new developer wants to start contributing to the project, he starts by "forking" the official repo to his GitHub account. Essentially, this just means that a copy of the official repo is created on his account, and that he takes full ownership of the copy. In this process, GitHub keeps track of the link between the fork and the mother repository for later.

After the copy is done, the fork and the official repo can (and will) diverge.

Please note that the GitHub repository URL indicates the name of the repository, and the GitHub account that owns the repository.

The official repository is typically owned either by an individual or a GitHub organization. Here, the organization is called "bg". The official repository is administered by people with extended rights (and maybe Guido and Bjarne are administrators).

With this architecture, the development workflow is the following.

Guido:

  • pushes the new commits to his remote repo, just like before (1)
  • from the remote repo, he sends a pull request (PR) to the official repo (2). We'll discuss this new operation below.

Official repository administrators:

  • review and test the pull request, to make sure everything is OK. They might request additional changes from Guido before they can accept the PR
  • merge the PR to upgrade the official version of the software

Bjarne:

  • directly fetches the official version from the official repo (3)
  • merges this version with his own code
  • pushes the result of the merge to his remote repo (4)
  • sends a pull request to the official repo (5), that will be reviewed and merged by the administrators.

So what is a pull request ?

Instead of pushing his new commits from his remote to the official repository, Guido uses the PR to ask the administrators of the official repository to integrate these commits into the repo.

This is a much less intrusive operation, since the administrators can review and refuse the PR if they so wish.

The details will become clear below when you send your first pull request.

But before that, let's get our tools ready.

Set up your GitHub account 

Create your GitHub account if you don't already have one.

To interact with your remote repositories, you will need to connect to github with ssh from your local machine.

Follow the official ssh instructions from GitHub to set this up. This is going to take you a bit of time now, but it's really needed. 

Test your ssh connection to GitHub:

$ ssh git@github.com
PTY allocation request failed on channel 0
Hi cbernet! You've successfully authenticated, but GitHub does not provide shell access.
Connection to github.com closed.

Finally, check your git configuration, which is in the file .gitconfig in your home directory.

You should make sure that your name, your email, and your github username are correct:

[user]
    name = Colin Bernet
    email = contact@thedatafrog.com
    github = cbernet

It's important to do that, because your name and your email are recorded in every commit you make, so that your collaborators can see that a commit is from you.
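
If you'd like to check this programmatically, here is a minimal Python sketch. It parses a sample text that mimics the .gitconfig shown above and reports whether the name and the email are set. The function name read_identity and the sample text are made up for this illustration. Python's configparser only understands simple files like this one, not the full git configuration syntax, so treat the output of git config --list as the reference.

```python
import configparser

# Sample content mimicking ~/.gitconfig (an assumption for this sketch);
# replace it with the text of your own file.
SAMPLE_GITCONFIG = """\
[user]
    name = Colin Bernet
    email = contact@thedatafrog.com
    github = cbernet
"""


def read_identity(text):
    """Return the [user] section of a simple git-style config as a dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("user"):
        return {}
    return dict(parser["user"])


identity = read_identity(SAMPLE_GITCONFIG)
# name and email are the two fields that end up in your commits
missing = [key for key in ("name", "email") if not identity.get(key)]
print(identity)
print("missing:", missing)
```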

Forking and cloning a GitHub repository

For this exercise, I've provided a test github repository, which will serve as your "official" repository. And I'll be the repository administrator!

In this way you can exercise with your first pull requests in a safe and friendly environment 😉

I actually hope this guide won't attract too many people ... or it will be a lot of work for me.

First, log in to your github account.

Then, go to https://github.com/cbernet/datafrog_git_test

Now, fork the repository by clicking on the fork button at the top right:

[Screenshot: fork.png]

This will create your own copy of this repository and bring you to its page.

Note how the page url changed from https://github.com/cbernet/datafrog_git_test to https://github.com/<your_github_username>/datafrog_git_test.
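
The account and the repository name can be read directly from such a URL. Here is a small Python sketch that does it. The name split_github_url is made up for this illustration, and the pattern only covers the two URL shapes used in this guide (ssh and https), not every form that git accepts.

```python
import re

# Matches the two URL shapes used in this guide:
#   git@github.com:<account>/<repo>.git      (ssh)
#   https://github.com/<account>/<repo>      (https, ".git" optional)
_GITHUB_URL = re.compile(
    r"^(?:git@github\.com:|https://github\.com/)"
    r"(?P<account>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?/?$"
)


def split_github_url(url):
    """Return (account, repository) for a GitHub URL, or raise ValueError."""
    match = _GITHUB_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a recognised GitHub URL: {url!r}")
    return match.group("account"), match.group("repo")


official = split_github_url("git@github.com:cbernet/datafrog_git_test.git")
print(official)  # ('cbernet', 'datafrog_git_test')
```

After forking, only the account part differs between the official repository and your fork; the repository name stays the same.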

Now that this is done, you will clone your fork to your local machine. This will:

  • create your local repository from your fork
  • establish the connection between the local repository and your fork as a remote

To do this, click on the Code button and select ssh. Then copy the URL and use it to clone the repo like this:

git clone git@github.com:<your_github_username>/datafrog_git_test.git

Finally, enter the working directory of your local repository:

cd datafrog_git_test/

In this directory, you will find a simple script, hello.py, that you're going to modify later:

def great(people):
    for p in people:
        print(f'hello {p}')


if __name__ == '__main__':

    everybody = [
        'colin'
    ]
    great(everybody)

Ok, I should have written greet and not great :-) it would be a pain in the neck to correct this at this stage, so I'll leave it like this. Apologies!

Run the script:

python hello.py

Many people are going to collaborate on this script, so don't be surprised if it has evolved. The code shown above is the original version, from commit bdfe1a4.

Now check the history with git l (the alias that we defined in Git : Overcome your Fears):

git l
* bdfe1a4 17 seconds ago Colin Bernet (HEAD -> main, origin/main, origin/HEAD)
|        add colin
|
* b4fb697 17 minutes ago Colin Bernet
          Initial commit

Adding a remote repository

List the remotes connected to your local repository:

$ git remote -v
origin    git@github.com:<your_github_username>/datafrog_git_test.git (fetch)
origin    git@github.com:<your_github_username>/datafrog_git_test.git (push)

We see that git@github.com:<your_github_username>/datafrog_git_test.git is connected as a remote, with local name origin.

This was done automatically by git clone. 

You can also establish remote connections easily after the fact.

Add the official repository as another remote, with name colin:

git remote add colin git@github.com:cbernet/datafrog_git_test.git

If you print your remotes again, you now see:

colin    git@github.com:cbernet/datafrog_git_test.git (fetch)
colin    git@github.com:cbernet/datafrog_git_test.git (push)
origin    git@github.com:<your_github_username>/datafrog_git_test.git (fetch)
origin    git@github.com:<your_github_username>/datafrog_git_test.git (push)

You could remove this remote with git remote rm colin, but don't do it, we will need it.

And there is nothing special about the origin remote, you could remove it as well, and re-add it with a different name if you prefer, as long as you know its url (you can pick it up on its GitHub page).
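
In other words, a remote is just a local name attached to a URL. To illustrate, here is a small Python sketch that turns the text printed by git remote -v into a dictionary. The function name parse_remotes is made up for this example, and the listing is pasted in as a string rather than read from a real repository.

```python
def parse_remotes(listing):
    """Map each remote name to its fetch and push URLs, from `git remote -v` text."""
    remotes = {}
    for line in listing.splitlines():
        if not line.strip():
            continue
        # each line reads: <name> <url> (fetch) or (push)
        name, url, kind = line.split()
        remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes


listing = """\
colin    git@github.com:cbernet/datafrog_git_test.git (fetch)
colin    git@github.com:cbernet/datafrog_git_test.git (push)
origin    git@github.com:<your_github_username>/datafrog_git_test.git (fetch)
origin    git@github.com:<your_github_username>/datafrog_git_test.git (push)
"""
remotes = parse_remotes(listing)
print(sorted(remotes))  # ['colin', 'origin']
```

Both names are on the same footing: origin is only the default name chosen by git clone.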

Updating your version of the code

Before starting to work on a new feature, I suggest that you always update to the latest version of the official code.

Since you just forked and cloned, your version of the code is probably identical to the version of the official repository. But you can't be sure, so let's go through the update process.

First, we fetch the state of the official repository:

git fetch colin

This command retrieves:

  • all the remote branches. Locally, the name of a remote branch is prefixed with the name of its remote, as in colin/main
  • all commits that are part of the history of these branches, so that the version at the tip of the branch can be reconstructed locally
  • all tags that point to one of these commits

Here is the git fetch documentation.


Important note:

git fetch does not modify your local branches or your working directory. It only downloads remote information and updates the remote-tracking branches such as colin/main. So this command is completely safe. Don't be afraid to use it often, so that you know what's going on remotely!
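
The difference between fetching and merging can be illustrated with a toy model in Python. This is only a sketch of the idea, not how git is implemented: the dictionaries, the function names fetch and fast_forward, and the commit id c0ffee1 are all made up for this example.

```python
# Toy model of the references in a repository; commit ids are plain strings.
local_repo = {
    "branches": {"main": "bdfe1a4"},                # your own branches
    "remote_tracking": {"origin/main": "bdfe1a4"},  # last known state of remotes
}

# Hypothetical state of the official repository, one commit ahead.
official_repo = {"branches": {"main": "c0ffee1"}}


def fetch(repo, remote_name, remote):
    """Record the remote's branches as <remote_name>/<branch>; touch nothing else."""
    for branch, commit in remote["branches"].items():
        repo["remote_tracking"][f"{remote_name}/{branch}"] = commit


def fast_forward(repo, branch, tracking_name):
    """Move a local branch to a fetched commit (the simplest kind of merge)."""
    repo["branches"][branch] = repo["remote_tracking"][tracking_name]


fetch(local_repo, "colin", official_repo)
print(local_repo["branches"])         # main has not moved
print(local_repo["remote_tracking"])  # colin/main is now known locally

fast_forward(local_repo, "main", "colin/main")
print(local_repo["branches"])         # main now points to the fetched commit
```

Only the second step changes your own branch, which is why you can fetch as often as you like.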


Before doing anything, check the history with git l to see what you're going to merge into your version.

Now that you have all the necessary information from the remote, you can merge the remote branch into your current branch:

git merge colin/main

After that, your current branch contains all commits of colin/main in its history. This is good, you can now start building your new feature on top of the official version.

You could also start developing your new feature before merging; the merge then combines your new commits with the official ones.


Important note 2:

some people use git pull, which is the equivalent of a fetch followed by a merge. For example: 

git pull colin main

does the following under the hood:

git fetch colin main        # fetch only the main branch, with its commits and tags
git merge colin/main        # merge into current branch

I only do that if I know exactly what's on the remote branch. Usually, this is the case if I pull from my own remote. When I want to get code from another remote that I don't control, I always do a fetch, and I only merge when I know what I'm going to be merging.
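
If you want to script this cautious habit, here is a minimal Python sketch of "fetch, look, then merge". The helper names run_git and careful_update, and the confirm callback, are made up for this illustration. The git commands themselves (fetch, log --oneline HEAD..colin/main, merge) are standard. The run parameter is only there so that the logic can be tried without touching a real repository.

```python
import subprocess


def run_git(args):
    """Run one git command in the current directory and return its output."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def careful_update(remote, branch, confirm, run=run_git):
    """Fetch, show what would be merged, and merge only if confirm() agrees."""
    run(["fetch", remote])
    # Commits reachable from the remote branch but not from HEAD.
    incoming = run(["log", "--oneline", f"HEAD..{remote}/{branch}"])
    if not incoming.strip():
        return "already up to date"
    if not confirm(incoming):
        return "merge skipped"
    run(["merge", f"{remote}/{branch}"])
    return "merged"


# Possible use inside a real repository (not executed here):
# careful_update("colin", "main",
#                confirm=lambda log: input(log + "Merge? [y/N] ") == "y")
```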


After merging, test the code again. If the package contains unit tests (and it should), run them. If there are executable scripts, run them. Here, we can run our small script:

python hello.py

We're now ready to start developing a new feature.

Pushing your modifications to your remote 

In this section, you will modify hello.py to add your name.

Open this file with your editor, and add your name to the list everybody, as shown below:

    everybody = [
        'colin', 
        'your_name'
    ]
    great(everybody)

Test your changes by running the script. If the package features unit tests, run them.

Advice: if you can, make sure that the tests pass before each commit.
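
The test repository is tiny and, as far as this guide shows, has no test suite. To give you an idea of what a unit test for hello.py could look like, here is a hypothetical one. The test name is made up, and the function great is copied here so that the sketch runs on its own. In a real project you would import it from hello instead and run the test with a tool such as pytest.

```python
import contextlib
import io


def great(people):
    # Copy of the function from hello.py, repeated so the sketch is self-contained.
    for p in people:
        print(f'hello {p}')


def test_great_says_hello_to_everybody():
    # Capture what great() prints and compare it with the expected greetings.
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        great(['colin', 'your_name'])
    assert captured.getvalue() == 'hello colin\nhello your_name\n'


test_great_says_hello_to_everybody()
print('test passed')
```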

If the tests succeed, you can commit:

git commit -am 'added my name'

Check your commit history with git l :

* 45ae389 2 seconds ago First Last (HEAD -> main)
|        added my name
|
* bdfe1a4 4 days ago Colin Bernet (origin/main, origin/HEAD, colin/main)
|        add colin
|
* b4fb697 4 days ago Colin Bernet
          Initial commit

We see that the last commit is one commit ahead of origin/main, and one commit ahead of colin/main.

Push the main branch to your fork (the origin remote):

git push origin main

And use git l again to check the history:

* 45ae389 31 minutes ago First Last (HEAD -> main, origin/main, origin/HEAD)
|        added my name
|
* bdfe1a4 4 days ago Colin Bernet (colin/main)
|        add colin
|
* b4fb697 4 days ago Colin Bernet
          Initial commit

The push moved the main branch of your fork, visible locally as origin/main, to the same commit as the local branch main. In the process, the needed commit (45ae389) was copied to your fork.

At this point, the code of origin/main in the remote is exactly the same as that of the local branch main, since both branches point to the same commit.

But the code of the official repository (colin/main) did not change.

We want to make our changes official, so it's time for a pull request to the official repository!

Contributing with a pull request

Go to your fork on github. You should see something like this:

[Screenshot: pr1.png]

In particular you can see the status of the main branch in the fork with respect to cbernet/main, which is the same branch in the official repository, the mother of your fork.

Click on Contribute and open a pull request.

This opens a new window:

[Screenshot: pr2_compare.png]

Take the time to review carefully all the information on this page:

  • we see that we are doing a pull request of the main branch of thedatafrog/datafrog_git_test (the source branch) to the main branch of cbernet/datafrog_git_test (the destination branch). You could change all this, for example if you wanted to do a PR to the fork of a colleague who is also working on this project.
  • the list of commits included in the PR is printed.
  • the changes that will be introduced are shown.

Then click on "Create pull request".

This brings you to a new form in which you must give a title to the pull request and provide some information as shown below. Then click on create pull request.

[Screenshot: pr3_details.png]

At this point:

  • you're brought to the page of the pull request in the official repository. This is where the pull request will be discussed, and finally either merged or closed by the administrators
  • the administrators of the official repo get a notification that a new pull request appeared. They will review it and let you know on the PR page.

That's it, you're done!

The administrators might ask you to make changes to your pull request. To do this, you can simply update the source branch in your fork by:

  • committing changes locally to the local main branch
  • pushing this branch to your fork to update the source branch

When you do that, your new commits will be added to the PR automatically.

Guidelines for a successful pull request

You should always remember that repository administrators are people like you and me.

They are certainly quite busy, and they often maintain open source packages in their spare time, as a contribution to the community. Also, it could be that they don't even know you, let alone trust you.

So the first thing to do is to make sure that your PR is even wanted.

In any case, check the official repository to see if there are any instructions explaining how to contribute. If there are, follow them, or your PR will be ignored.

In general, I suggest doing the following:

  • check the issues in the official repo. It could be that your point is already being discussed and addressed. If that's the case, you can enter the discussion and offer your help.
  • If necessary, open a new issue to discuss your point with the community, and possibly offer to help with a PR.
  • if that's accepted, you can start developing for your PR.

When you're done, and before you submit the PR, merge the latest changes from the official repo into your local repo, and check carefully that everything is ok:

  • the software must compile
  • it must run
  • if there are unit tests:
    • they should all pass. And you're not allowed to modify existing tests, unless you have a very good reason!
    • you should make sure that your new code is covered by the tests. Write new tests if necessary. If you don't know how to do that, check my post Fighting Bugs in Python.
  • if you're coding in python, make sure that you follow PEP8. To check this, you could use flake8 or rely on the style checker of your IDE.

After submitting the PR, be kind to the admins, and answer their questions as well as you can. Be responsive.

Conclusion and outlook

In this article, you have learnt how to collaborate with others using git remotes. You have:

  • set up your github account,
  • created your first pull request,
  • seen how to ensure the best chances of success for your pull requests.

Although we used GitHub as a remote git platform, the process is essentially the same on other platforms such as GitLab. What changes is mostly the web interface and some vocabulary: GitLab, for instance, calls pull requests merge requests.

In the next article, I will tell you about a few advanced features of git, stay tuned!


