Git tutorial: Get started with Git version control

Learn how Git manages versions and how to install the necessary software to access Git servers where your software project will be stored


This article introduces you to Git, including how to install the necessary software to access Git servers where your software project will be stored.

Version control concepts

To understand Git and the concept of version control, it helps to look at version control from a historical perspective. There have been three generations of version control software.

The first generation

The first generation was very simple. Developers worked on the same physical system and “checked out” one file at a time.

This generation of version control software made use of a technique called file locking. When a developer checked out a file, it was locked so no other developer could edit the file. 

Examples of first-generation version control software include Revision Control System (RCS) and Source Code Control System (SCCS).

The second generation

The problems with the first generation included the following:

  • Only one developer could work on a file at a time. This resulted in a bottleneck in the development process.

  • Developers had to log in directly to the system that contained the version control software.

These problems were solved in the second generation of version control software. In the second generation, files are stored on a centralized server in a repository. Developers can check out separate copies of a file. When the developer completes work on a file, the file is checked in to the repository. 

If two developers check out the same version of a file, then the potential for issues exists. This is handled by a process called a merge.

What is a merge? Suppose two developers, Bob and Sue, check out version 5 of a file named abc.txt. After Bob completes his work, he checks the file back in. Typically, this results in a new version of the file, version 6.

Sometime later, Sue checks in her file. This new file must incorporate her changes and Bob’s changes. This is accomplished through the process of a merge.

Depending on the version control software that you use, there could be different ways to handle this merge. In some cases, such as when Bob and Sue have worked on completely different parts of the file, the merge process is very simple. However, in cases in which Sue and Bob worked on the same lines of code in the file, the merge process can be more complex. In those cases, Sue will have to make decisions, such as whether Bob’s code or her code will be in the new version of the file.

After the merge process completes, the process of committing the file to the repository takes place. To commit a file essentially means to create a new version in the repository; in this case, version 7 of the file.
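The simple case described above, where Bob and Sue change different parts of abc.txt, can be sketched with a three-way merge. This is only an illustration: it uses the git merge-file command (available once Git is installed, as covered later in this article) as a stand-in for whatever merge tool a second-generation system provides, and the scratch directory and the abc_v5.txt, abc_bob.txt, and abc_sue.txt file names are hypothetical.

```shell
# Hypothetical scratch directory and file names, used only for this sketch.
merge_dir=$(mktemp -d)
cd "$merge_dir"

# Version 5 of the file: the copy that both Bob and Sue checked out.
printf 'line one\nline two\nline three\nline four\nline five\n' > abc_v5.txt

# Bob edits the first line; his check-in becomes version 6.
printf 'line one (Bob)\nline two\nline three\nline four\nline five\n' > abc_bob.txt

# Sue edits the last line of her own copy of version 5.
printf 'line one\nline two\nline three\nline four\nline five (Sue)\n' > abc_sue.txt

# Three-way merge: fold Bob's changes (relative to version 5) into Sue's copy.
# git merge-file rewrites its first argument in place and exits with 0
# when the merge produces no conflicts.
git merge-file abc_sue.txt abc_v5.txt abc_bob.txt
merge_status=$?

# Sue's copy now contains both sets of changes and could be committed.
cat abc_sue.txt
```

If Bob and Sue had edited the same line instead, git merge-file would exit with the number of conflicts and leave conflict markers in abc_sue.txt, which is the point at which Sue has to decide whose code goes into the new version.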

Examples of second-generation version control software include Concurrent Versions System (CVS) and Subversion.

The third generation

The third generation is referred to as distributed version control systems (DVCSs). As with the second generation, a central repository server contains all of the files for the project. However, developers don’t check out individual files from the repository. Instead, the entire project is checked out, allowing the developer to work on the complete set of files rather than just individual files. 

Another (very big) difference between the second and third generation of version control software has to do with how the merge and commit process works. As previously mentioned, the steps in the second generation are to perform a merge and then commit the new version to the repository.

With third-generation version control software, files are checked in and then they are merged. 

To see the difference, first consider the second-generation approach of merging before committing. Suppose two developers check out a file that is based on version 3. If one developer checks that file in, resulting in version 4 of the file, the second developer must first merge the changes from his checked-out copy with the changes of version 4 (and, potentially, other versions). After the merge is complete, the new version can be committed to the repository as version 5.

If you focus on what is in the repository, you see a very straight line of development (ver1, ver2, ver3, ver4, ver5, and so on). This simple approach to software development poses some potential problems:

  • Requiring a developer to merge before committing often results in developers’ not wanting to commit their changes on a regular basis. The merge process can be a pain and developers might decide to just wait until later and do one merge rather than a bunch of regular merges. This has a negative impact on software development as suddenly huge chunks of code are added to a file. Additionally, you want to encourage developers to commit changes to the repository, just like you want to encourage someone who is writing a document to save on a regular basis.
  • Very important: Version 5 in this example is not necessarily the work that the developer originally completed. During the merging process, the developer might discard some of his work to complete the merge process. This isn’t ideal because it results in the loss of potentially good code.

A better, although arguably more complex, technique can be used. It models the version history as a directed acyclic graph (DAG).

Picture the same scenario as above, where two developers check out version 3 of a file. Here, if one developer checks that file in, it still results in a version 4 of the file. However, the second check-in process results in a version 5 file that is not based on version 4, but rather independent of version 4. In the next stage of the process, versions 4 and 5 of the file are merged to create a version 6.

Although this process is more complex (and, potentially, much more complex if you have a large number of developers), it does provide some advantages over a single line of development:

  • Developers can commit their changes on a regular basis and not have to worry about merging until a later time.
  • The merging process could be delegated to a specific developer who has a better idea of the entire project or code than the other developers have.
  • At any time, the project manager can go back and see exactly what work each individual developer created.

Certainly an argument exists for both methods. However, keep in mind that this article focuses on Git, which uses the directed acyclic graph method of third-generation version control systems.
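The commit-first, merge-later scenario described above can be reproduced in a throwaway repository once Git is installed. The sketch below is a minimal illustration under assumptions: the scratch directory, the dev1 and dev2 branch names, the file names, and the "Example Developer" identity are all hypothetical, and the "ver" labels in the commit messages only mirror the numbering used in this article (Git itself identifies commits by hash, not by version number).

```shell
# Hypothetical scratch repository for this sketch.
dag_dir=$(mktemp -d)
cd "$dag_dir"
git init -q .
git config user.name "Example Developer"
git config user.email "dev@example.com"

# ver3: the common starting point for both developers.
echo "shared work" > abc.txt
git add abc.txt
git commit -q -m "ver3: common starting point"

# ver4: the first developer commits on top of ver3.
git checkout -q -b dev1
echo "first developer's change" > dev1.txt
git add dev1.txt
git commit -q -m "ver4: first developer"

# ver5: the second developer also starts from ver3 (HEAD~1 here),
# so this commit is independent of ver4.
git checkout -q -b dev2 HEAD~1
echo "second developer's change" > dev2.txt
git add dev2.txt
git commit -q -m "ver5: second developer, not based on ver4"

# ver6: merge the two independent lines of development.
git merge -q -m "ver6: merge of ver4 and ver5" dev1

# Draw the history as a graph; ver6 has two parents, ver4 and ver5.
git log --graph --oneline

# Count the parents of the merge commit (fields after the commit's own hash).
parent_count=$(git rev-list --parents -n 1 HEAD | awk '{ print NF - 1 }')
echo "ver6 has $parent_count parents"
```

Both developers were able to commit without merging first, and neither commit was altered by the merge, so the work each developer originally completed remains visible in the history.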

Installing Git

You might already have Git on your system because it is sometimes installed by default (or another administrator might have installed it). If you have access to the system as a regular user, you can execute the following command to determine whether you have Git installed:

ocs@ubuntu:~$ which git
/usr/bin/git

If Git is installed, then the path to the git command is provided, as shown in the preceding command. If it isn’t installed, then you either get no output or an error like the following:

[ocs@centos ~]# which git
/usr/bin/which: no git in (/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:/root/bin)
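Because the output of which differs between systems, a script that needs to check for Git might prefer the shell's built-in command -v, which behaves the same way in any POSIX shell. The following is a small sketch of that alternative, not something the original examples use; the git_path variable name is my own.

```shell
# command -v prints the path of the command and succeeds if it is found.
if command -v git >/dev/null 2>&1; then
    git_path=$(command -v git)
    echo "Git found at $git_path"
    git --version
else
    git_path=""
    echo "Git does not appear to be installed"
fi
```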

As an administrator on a Debian-based system, you can use the dpkg command to determine whether the Git package has been installed:

root@ubuntu:~# dpkg -l git
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name     Version       Architecture  Description
+++-========-=============-=============-========================================
ii  git      1:1.9.1-1ubun amd64         fast, scalable, distributed revision con

As an administrator on a Red Hat–based system, you could use the rpm command to determine whether the git package has been installed:

[root@centos ~]# rpm -q git
git-1.8.3.1-6.el7_2.1.x86_64

If Git isn’t installed on your system, you must either log in as the root user or use sudo or su to install the software. If you are logged in as the root user on a Debian-based system, you can use the following command to install Git:

apt-get install git

If you are logged in as the root user on a Red Hat–based system, you can use the following command to install Git:

yum install git
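If you are not logged in as root, the same commands can be prefixed with sudo. The sketch below only works out which of the two commands applies to the current system and prints it; it does not install anything. The install_cmd variable name is hypothetical, and systems that use a different package manager fall through to the final message.

```shell
# Pick the install command based on which package manager is present.
if command -v apt-get >/dev/null 2>&1; then
    install_cmd="sudo apt-get install git"      # Debian-based systems
elif command -v yum >/dev/null 2>&1; then
    install_cmd="sudo yum install git"          # Red Hat-based systems
else
    install_cmd=""
fi

if [ -n "$install_cmd" ]; then
    echo "To install Git, run: $install_cmd"
else
    echo "Neither apt-get nor yum was found; check your system's documentation."
fi
```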

Git concepts and features

One of the challenges of using Git is understanding the concepts behind it. If you don’t understand the concepts, then all the commands just seem like some sort of black magic. This section focuses on the critical Git concepts and introduces you to some of the basic commands.

Git stages

It is very important to remember that you check out an entire project and that most of the work you do will be local to the system that you are working on. The files that you check out will be placed in a new directory under the directory where you run the clone command; the examples in this article use your home directory.

To get a copy of a project from a Git repository, you use a process called cloning. Cloning doesn’t just create a copy of all the files from the repository; it actually performs three primary functions:

  • Creates a local repository of the project under the project_name/.git directory in your home directory. The files of the project in this location are considered to be checked out from the central repository.
  • Creates a directory where you can directly see the files. This is called the working area. Changes made in the working area are not immediately version controlled.
  • Creates a staging area. The staging area is designed to store changes to files before you commit them to the local repository.

This means that if you were to clone a project called Jacumba, the entire project would be stored in the Jacumba/.git directory under your home directory. You should not try to modify the files in that directory directly. Instead, look in the ~/Jacumba directory to see the files from the project. These are the files that you should change.

Suppose you have made a change to a file, but you have to work on some other files before you are ready to commit changes to the local repository. In that case, you would stage the file that you have finished working on. This prepares it to be committed to the local repository.

After you make all changes and stage all files, then you commit them to the local repository. 

Realize that committing the staged files only sends them to the local repository. This means that only you have access to the changes that have been made. The process of checking in the new versions to the central repository is called a push.
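The path a change takes through the working area, the staging area, the local repository, and finally the central repository can be sketched as follows. This is a minimal illustration under assumptions: a bare repository in a scratch directory stands in for the central server, and the central.git path, the notes.txt file, and the "Example Developer" identity are hypothetical. Only the Jacumba project name comes from the example above. Git may warn that you have cloned an empty repository, which is expected here.

```shell
# A bare repository in a scratch directory stands in for the central server.
work=$(mktemp -d)
git init -q --bare "$work/central.git"

# Cloning creates the local repository (Jacumba/.git), the working area,
# and the staging area.
git clone -q "$work/central.git" "$work/Jacumba"
cd "$work/Jacumba"
git config user.name "Example Developer"
git config user.email "dev@example.com"

# Working area: the new file exists but is not yet version controlled.
echo "hello" > notes.txt
before_stage=$(git status --short)
echo "$before_stage"

# Staging area: the file is now queued for the next commit.
git add notes.txt
after_stage=$(git status --short)
echo "$after_stage"

# Local repository: the commit is recorded, but only on this system.
git commit -q -m "Add notes.txt"

# Central repository: the push makes the commit available to others.
git push -q origin HEAD
central_after=$(git --git-dir="$work/central.git" rev-list --count --all)
echo "Commits in the central repository: $central_after"
```

In the short status output, a leading ?? marks a file that Git is not tracking yet, and a leading A marks a new file that has been staged.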

Choosing your Git repository host

First, the good news: Many organizations provide Git hosting—at the time of this writing, there are more than two dozen choices. This means you have many options to choose from. That’s the good news … and the bad news.

It is only bad news because it means you really need to spend some time researching the pros and cons of the hosting organizations. For example, most don’t charge for basic hosting but do charge for large-scale projects. Some only provide public repositories (anyone can see your repository) whereas others let you create private repositories. There are many other features to consider.

One feature that might be high on your list is a web interface. Although you can do just about all repository operations locally on your system, being able to perform some operations via a web interface can be very useful. Explore the interface that is provided before making your choice.

At the very least, I recommend considering the following:

Note that I chose Gitlab.com for the examples below. Any of the hosts in the preceding list would have worked just as well; I chose Gitlab.com simply because it happened to be the one I used on my last Git project.

Configuring Git

Now that you have gotten through all the theory, it is time to actually do something with Git. This next section assumes the following:

  • You have installed the git or git-all software package on your system.
  • You have created an account on a Git hosting service.

The first thing you want to do is perform some basic setup. Whenever you perform a commit operation, your name and email address will be included in the metadata. To set this information, execute the following commands:

ocs@ubuntu:~$ git config --global user.name "Bo Rothwell"
ocs@ubuntu:~$ git config --global user.email "bo@onecoursesource.com"
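To confirm what was stored, you can read a value back with git config --global --get user.name. The sketch below shows the same write-then-read round trip, but against a scratch file selected with --file, so that trying it out does not change your real settings; the scratch_cfg, stored_name, and stored_email names are hypothetical.

```shell
# Use a scratch configuration file instead of the real global one.
scratch_cfg=$(mktemp)

# Write the two identity settings, as the --global commands do.
git config --file "$scratch_cfg" user.name "Bo Rothwell"
git config --file "$scratch_cfg" user.email "bo@onecoursesource.com"

# Read the values back to confirm that they were stored.
stored_name=$(git config --file "$scratch_cfg" --get user.name)
stored_email=$(git config --file "$scratch_cfg" --get user.email)
echo "Commits would be recorded as: $stored_name <$stored_email>"

rm -f "$scratch_cfg"
```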

Replace "Bo Rothwell" with your name and "bo@onecoursesource.com" with your email address. The next step is to clone your project from the Git hosting service. Note that before cloning, only one file is in the user's home directory:

ocs@ubuntu:~$ ls
first.sh

The following command clones a project named ocs:

ocs@ubuntu:~$ git clone https://gitlab.com/borothwell/ocs.gi
Cloning into 'ocs'...
Username for 'https://gitlab.com': borothwell
Password for 'https://borothwell@gitlab.com':
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
Checking connectivity... done.

After successful execution, notice a new directory in the user’s home directory:

ocs@ubuntu:~$ ls
first.sh  ocs

If you switch to the new directory, you can see what was cloned from the repository (only one file so far exists in the repository):

ocs@ubuntu:~$ cd ocs
ocs@ubuntu:~/ocs$ ls
README.md
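A clone contains more than the visible files. The sketch below shows one way to inspect a fresh clone: the hidden .git directory that holds the local repository, the address the clone came from, and the history that came with it. To stay self-contained it clones from a small local repository that stands in for the hosted project; the hosted directory, the "Example Developer" identity, and the README.md contents are assumptions made for this sketch.

```shell
clone_dir=$(mktemp -d)
cd "$clone_dir"

# Stand-in for the hosted project: a small repository with one README.md.
git init -q hosted
cd hosted
git config user.name "Example Developer"
git config user.email "dev@example.com"
echo "# ocs" > README.md
git add README.md
git commit -q -m "Initial commit"
cd ..

# Clone it into a directory named ocs and look inside.
git clone -q hosted ocs
cd ocs

# ls -a also lists the hidden .git directory (the local repository).
ls -a

# Git records where the clone came from under the remote name "origin".
origin_url=$(git config --get remote.origin.url)
echo "Cloned from: $origin_url"

# The clone includes the project's history, not just the current files.
commit_count=$(git rev-list --count HEAD)
git log --oneline
```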