How to use git as a client for the CVS server

Toon Verstraelen Toon.Ver... at UGent.be
Thu Oct 28 16:10:21 UTC 2010


Hi all,

I've explained this subject to a few colleagues so far, and it may be of 
interest to others. This (long) mail is also an attempt to avoid explaining 
the same thing over and over again. Have fun.

cheers,

Toon




1. Summary
==========


Disclaimer
----------

This is not an attempt to convince the entire CP2K community to ditch CVS 
and use git instead. This mail just explains how to avoid using CVS, and is 
written for those who share the belief that CVS is a crappy program to keep 
track of a source code history. I've included a few benchmarks as a light form 
of advocacy, but in the end it is all up to you.


Problem
-------

CVS is slow for large projects like CP2K, and its few features for working 
with branches and experimental versions are impractical.


Solution
--------

Maintain a local git mirror of the CVS history. This is also a convenient 
place to keep track of your own patches before they go into the official CVS. 
Once you have a set of patches for a stable implementation of a new feature, 
use 'git rebase' to apply these patches to the latest revision from CVS. Then 
send them to someone who has write access to the CVS repository. (Preferably 
that person can also do some reviewing.)

There are many alternatives to CVS. Git is the fastest and has good support 
for branches and distributed development. It is designed to manage large 
projects such as the Linux kernel.


Links
-----

Git homepage: http://git-scm.com/
Git web interface: https://git.wiki.kernel.org/index.php/Gitweb
Bare-bones GUI: http://www.kernel.org/pub/software/scm/git/docs/gitk.html
Free public git hosting: http://github.com/ http://gitorious.org/



2. Details
==========

I'll discuss the cvs-to-git conversion at the end. For now, we'll just start 
from my public cp2k mirror on github.com. I assume you know how to install git 
and gitk on your OS, and I also assume that OS is a Unix.


Cloning a repository
--------------------

Download the entire history of the CP2K source code:

git clone git://github.com/tovrstra/cp2k.git

Some notes:
- A directory cp2k is created with the latest master branch.
- You also get the history with all patches. This is stored in cp2k/.git/.
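
To see for yourself that a clone carries the complete history, here is a 
minimal sketch that needs no network. The 'upstream' repository, the file 
notes.txt and the identity are made up for the illustration; a local path is 
cloned in the same way as a git:// URL.

```shell
# Throwaway "upstream" repository with three commits (stand-in for a real mirror).
work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"
for i in 1 2 3; do
    echo "revision $i" > notes.txt
    git add notes.txt
    git commit -q -m "Revision $i"
done

# Clone it into a new directory called 'mirror'.
cd "$work"
git clone -q upstream mirror

# The clone contains every commit; the history lives in mirror/.git/.
commits=$(git -C mirror rev-list --count HEAD)
echo "commits in clone: $commits"
ls -d mirror/.git
```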


Fire up gitk
------------

cd cp2k
gitk &

Some notes:
- The graphical interface is very light and suitable for remote X.
- With the '--all' option one sees all branches.
- There are some menu items in gitk to prepare commits etc, but I recommend 
using the conventional command-line interface of git instead.


Switch branches
---------------

My cp2k repository also has a branch where some CVS-specific parts in the 
Makefile are replaced by their git counterparts. One switches the working 
directory to this branch as follows:

git checkout cp2k-git

Compilation is done as usual. One gets an overview of all branches as follows:

git branch
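
If you want to try this without touching the real repository, the following 
sketch plays it through in a throwaway repository. Only the branch name 
cp2k-git is taken from the text; the README file and the identity are 
placeholders.

```shell
# Scratch repository with a single commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"
echo "hello" > README
git add README
git commit -q -m "Initial commit"

# Create a second branch and switch the working directory to it.
git branch cp2k-git
git checkout -q cp2k-git

# List all local branches; the current one is marked with '*'.
git branch

# Print only the name of the current branch.
current=$(git rev-parse --abbrev-ref HEAD)
echo "current branch: $current"
```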


Include branch name in the shell prompt
---------------------------------------

Playing with different branches quickly becomes confusing. One may use the 
following PS1 variable to include the branch name in the shell prompt.

GITPS1='$(__git_ps1 ":%s")'
export PS1="\u@\h \w${GITPS1}> "

or with fancy colors (designed for a dark background)

GITPS1='$(__git_ps1 ":%s")'
GREEN="\[\033[1;32m\]"
BLUE="\[\033[1;34m\]"
YELLOW="\[\033[1;33m\]"
RS="\[\033[00m\]"
export PS1="${GREEN}\u@\h${RS} ${BLUE}\w${RS}${YELLOW}${GITPS1}${BLUE}>${RS} "

The prompt will look like this:

toon@molmod49 ~/cp2k:cp2k-git>

This is not mandatory, but it makes working with branches a lot easier.
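
The __git_ps1 function comes from the shell helper scripts shipped with git 
and is not defined in every shell. As a sketch of a fallback, assuming bash, 
one could define a small function of one's own. The name git_branch_suffix is 
made up for this example and is not part of git.

```shell
# Hypothetical fallback: print ":<branch>" inside a git work tree and
# nothing elsewhere (also nothing when HEAD is detached).
git_branch_suffix() {
    local branch
    branch=$(git symbolic-ref --short -q HEAD 2>/dev/null) || return 0
    printf ':%s' "$branch"
}

# Use the fallback in the prompt only when __git_ps1 is not available.
if ! type __git_ps1 >/dev/null 2>&1; then
    GITPS1='$(git_branch_suffix)'
    export PS1="\u@\h \w${GITPS1}> "
fi
```

The single quotes around the command substitution matter: they delay the 
call until the prompt is drawn, so the branch name is refreshed after every 
command.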


Configure git
-------------

Add these sections to ~/.gitconfig (with your own personal info).

[user]
     name = Toon Verstraelen
     email = Toon.Ver... at UGent.be

[color]
     diff = always
     status = always
     interactive = always
     branch = always

This is not mandatory, but again very convenient.
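
The same settings can also be written with 'git config' instead of a text 
editor. The sketch below writes to a scratch file so that it does not touch 
your real configuration; replace '--file "$cfg"' with '--global' to edit 
~/.gitconfig itself. The name and address are placeholders.

```shell
# Scratch file standing in for ~/.gitconfig.
cfg=$(mktemp)

# [user] section (placeholder values).
git config --file "$cfg" user.name "Your Name"
git config --file "$cfg" user.email "you@example.org"

# [color] section, same keys as in the text.
for key in diff status interactive branch; do
    git config --file "$cfg" "color.$key" always
done

# Show the resulting file and read one value back.
cat "$cfg"
git config --file "$cfg" --get color.diff
```

Note that 'always' also emits color codes when the output goes to a pipe or a 
file. The value 'auto' colors only output that goes to a terminal, which may 
be the safer choice if you often redirect git output.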


Create a new branch to store your patches
-----------------------------------------

Instead of adding patches to the cp2k-git branch, it is safer to add them to a 
new private branch. All commits remain local anyway until you run 'git push 
some-repo'. The new branch initially only exists in your local copy of the 
cp2k repository.

toon@molmod49 ~/cp2k:cp2k-git> git branch myhack
toon@molmod49 ~/cp2k:cp2k-git> git checkout myhack
toon@molmod49 ~/cp2k:myhack>

This can also be done in one step.

toon@molmod49 ~/cp2k:cp2k-git> git checkout -b myhack
toon@molmod49 ~/cp2k:myhack>
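
The following sketch shows why a private branch is safe: commits made on 
myhack leave cp2k-git untouched. It runs in a throwaway repository; the 
README file and the identity are placeholders.

```shell
# Scratch repository with one commit and a cp2k-git branch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git checkout -q -b cp2k-git

# One-step form: create myhack from the current branch and switch to it.
git checkout -q -b myhack
echo "experimental change" >> README
git commit -q -a -m "Experiment on myhack"

# Count the commits that are on one branch but not on the other.
ahead=$(git rev-list --count cp2k-git..myhack)
behind=$(git rev-list --count myhack..cp2k-git)
echo "myhack is $ahead commit(s) ahead of and $behind behind cp2k-git"
```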


Add a patch
-----------

The example here is just a fix for a trivial typo in the input documentation.
On line 323 of the file input_cp2k_mm.F a space is missing at the end of the 
first part of the string. It is also convenient to wrap lines at 80 characters 
because this is the default width of a text terminal. After making the 
changes, they can be reviewed:

toon@molmod49 ~/cp2k:myhack> git diff
diff --git a/src/input_cp2k_mm.F b/src/input_cp2k_mm.F
index 26b37f5..200e000 100644
--- a/src/input_cp2k_mm.F
+++ b/src/input_cp2k_mm.F
@@ -320,8 +320,8 @@ CONTAINS
         CALL keyword_release(keyword,error=error)
         !Universal scattering potential at very short distances
         CALL keyword_create(keyword, name="ZBL_SCATTERING",&
-            description="A short range repulsive potential is added, to simulate"//&
-            "collisions and scattering.",&
+            description="A short range repulsive potential is added, to "//&
+            "simulate collisions and scattering.",&
              usage="ZBL_SCATTERING T",default_l_val=.FALSE.,lone_keyword_l_val=.TRUE.,&
              error=error)
         CALL section_add_keyword(section,keyword,error=error)

The minus-lines are colored in red and the plus-lines in green. After testing 
the patch -- a simple compilation is sufficient for this -- one can commit the 
changes to the repository. This is typically done in two stages. One first 
adds the changes to an intermediate stage, called the index. Once the index is 
OK, it is actually committed. This two-step approach is convenient when 
working with more complex patches.

Add the file to the index:

toon@molmod49 ~/cp2k:myhack> git add src/input_cp2k_mm.F

Commit it:

toon@molmod49 ~/cp2k:myhack> git commit

An editor will open in which you write a few notes. The first line is a short 
summary, optionally followed by an empty line and a longer discussion.

The two steps can be done in one command if there are only modifications to 
existing files:

toon@molmod49 ~/cp2k:myhack> git commit src/input_cp2k_mm.F

or

toon@molmod49 ~/cp2k:myhack> git commit -a

If there is only one line in the commit message, it can be given on the 
command line:

toon@molmod49 ~/cp2k:myhack> git commit -a -m 'Fixed typo'

Keep repeating this with all the things you want to change. Keep commits as 
small as possible and test them. One can look back at the commit history with 
gitk or 'git log'.
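
To illustrate what the index buys you, here is a small sketch in a throwaway 
repository: two files are modified, but only one of them is staged and 
committed. The file names a.txt and b.txt and the identity are made up for 
the example.

```shell
# Scratch repository with two tracked files.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"
printf 'one\n' > a.txt
printf 'two\n' > b.txt
git add a.txt b.txt
git commit -q -m "Initial commit"

# Modify both files, but add only a.txt to the index.
printf 'one, fixed\n' > a.txt
printf 'two, unfinished\n' > b.txt
git add a.txt

# 'git diff --cached' shows what is staged, plain 'git diff' what is not.
staged=$(git diff --cached --name-only)
unstaged=$(git diff --name-only)
echo "staged:     $staged"
echo "not staged: $unstaged"

# The commit takes only the index; b.txt stays modified in the working directory.
git commit -q -m "Fix a.txt"
remaining=$(git diff --name-only)
echo "still modified after commit: $remaining"
```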


Rebase patches to the latest master
-----------------------------------

In practice it takes some time before a set of patches is finished, and the 
CVS master branch often evolves in the meantime. I occasionally synchronize my 
git repo and apply the cp2k-git patch on top. It is recommended to apply your 
patches to the latest version too. This is typically a painful job, but with 
'git rebase' it becomes trivial.

First update your local mirror of the repository:

toon@molmod49 ~/cp2k:myhack> git checkout cp2k-git
toon@molmod49 ~/cp2k:cp2k-git> git pull origin cp2k-git:cp2k-git
(some progress output)

Some notes:
- origin refers to the GitHub repository. It is the default shorthand for the 
repository that was used with 'git clone'.
- cp2k-git:cp2k-git is optional. It means that the remote cp2k-git branch is 
used to update the local cp2k-git branch.

Then rebase your patches:

toon@molmod49 ~/cp2k:cp2k-git> git checkout myhack
toon@molmod49 ~/cp2k:myhack> git rebase cp2k-git

In this case the patch is so small that you will probably not have to 
intervene manually, unless somebody changed exactly the same two lines or the 
six surrounding lines. In more complex cases 'git rebase' will stop when 
it encounters a doubtful situation. Some instructions are given such that you 
can easily modify the problematic patch and continue the rebase process.
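
The whole rebase cycle can be tried out in a throwaway repository. In this 
sketch the update of cp2k-git is simulated with a local commit instead of 
'git pull'; the file names and the identity are made up, only the branch names 
follow the text.

```shell
# Scratch repository with a cp2k-git branch and a private myhack branch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"
echo "base" > base.txt
git add base.txt
git commit -q -m "Base revision"
git checkout -q -b cp2k-git
git checkout -q -b myhack
echo "my patch" > hack.txt
git add hack.txt
git commit -q -m "Add hack"

# Meanwhile cp2k-git moves on (stand-in for the 'git pull' step).
git checkout -q cp2k-git
echo "upstream change" > upstream.txt
git add upstream.txt
git commit -q -m "Upstream change"

# Replay the myhack commits on top of the new cp2k-git tip.
git checkout -q myhack
git rebase -q cp2k-git

# After the rebase, the tip of cp2k-git is an ancestor of myhack.
base=$(git merge-base cp2k-git myhack)
tip=$(git rev-parse cp2k-git)
git log --oneline
```

When a rebase does stop on a conflict, the usual way out is to edit the 
conflicting files, 'git add' them and run 'git rebase --continue'. With 
'git rebase --abort' the branch goes back to the state before the rebase.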


Sending patches by email
------------------------

Once a set of patches is ready, they can be prepared for email as follows:

toon@molmod49 ~/cp2k:myhack> git format-patch -1
0001-Fixed-typo.patch

The -1 option selects the number of most recent commits to export; one patch 
file is created per commit. Put these files in a compressed archive and send 
the archive to somebody with CVS write access. They'll know what to do with it.
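
For a set of several patches, the export and the archive step could look like 
the sketch below. It also shows one way the receiver might apply the patches 
to a git checkout with 'git am'; how they end up in CVS is up to the person 
with write access. All paths, file names and identities are made up for the 
example.

```shell
# Developer side: scratch repository with two commits on myhack.
work=$(mktemp -d)
git init -q "$work/dev"
cd "$work/dev"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"
echo "base" > file.txt
git add file.txt
git commit -q -m "Base"
git checkout -q -b myhack
echo "fix 1" >> file.txt
git commit -q -a -m "First fix"
echo "fix 2" >> file.txt
git commit -q -a -m "Second fix"

# Export the last two commits as numbered patch files and bundle them.
git format-patch -q -2 -o "$work/patches"
tar -czf "$work/patches.tar.gz" -C "$work" patches
ls "$work/patches"

# Receiver side: start from the base revision and apply the patches in order.
git clone -q "$work/dev" "$work/reviewer"
cd "$work/reviewer"
git config user.name "Example Reviewer"    # placeholder identity
git config user.email "reviewer@example.org"
git checkout -q -b review HEAD~2
git am -q "$work"/patches/*.patch
applied=$(git rev-list --count HEAD~2..HEAD)
echo "applied commits: $applied"
```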


A few benchmarks
----------------

diffs
^^^^^

The diff is executed after making the changes in the above example.

time git diff &> /dev/null

real	0m0.017s
user	0m0.000s
sys	0m0.010s

time cvs diff &> /dev/null

real	0m1.484s
user	0m0.040s
sys	0m0.040s

This is just a small patch, but with large patches the difference becomes more 
dramatic. Because most CP2K developers care about speed, I guess a speedup of 
two orders of magnitude will be appreciated. Similar speedups can be found 
with other commands that git and CVS have in common.

Cloning
^^^^^^^

This is a special benchmark. A complete repository clone is not really 
supported in CVS. Therefore I compare a 'git clone' with a CVS checkout 
instead. The latter is a much lighter operation.

time cvs -z3 -d:pserver:anonymous@cvs.cp2k.berlios.de:/cvsroot/cp2k co cp2k
(lots of output)
real	0m13.987s
user	0m2.040s
sys	0m1.480s

time git clone git://github.com/tovrstra/cp2k.git
(some output)
real	0m20.315s
user	0m8.910s
sys	0m1.240s

The comparison is a bit difficult as it is mainly determined by the hosting 
server. Note that git downloads all revisions, while CVS only gives you the 
latest version. This is just to show that there is no practical problem with 
cloning entire repositories in git. The storage is also remarkably compact:

du -sh cp2k/.git/
59M

This is less than the tar file with the CVSROOT.


CVS to git migration
--------------------

This does not always go smoothly. It turns out that CVS does not keep an 
accurate history of all patches and metadata, and that it may be difficult to 
convert all this information to a revision system with a proper storage 
backend. The tigris community has developed a complex batch script that tries 
to make the best of it. More info can be found here:

http://cvs2svn.tigris.org/

They also have a cvs2git script. In case of CP2K it is used as follows:

wget http://download.berlios.de/cvstarballs/cp2k-cvsroot.tar.gz
tar -xvzf cp2k-cvsroot.tar.gz
mkdir cvs
mv cp2k cvs
mkdir git2
cd git2
cvs2git ../cvs/cp2k --blobfile=cp2kblob --dumpfile=cp2kdump --username=fubar
mkdir cp2k
cd cp2k
git init
cat ../cp2kblob ../cp2kdump | git fast-import


CVS to git migration bis
------------------------

One can also use the 'git cvsimport' script. It is somewhat simpler and can 
also update a git mirror with the latest changes in a CVS repository. The 
first time one has to do a full conversion:

wget http://download.berlios.de/cvstarballs/cp2k-cvsroot.tar.gz
tar -xvzf cp2k-cvsroot.tar.gz
mkdir cvs
mkdir git
mv cp2k cvs
cd git
git cvsimport -d ${PWD}/../cvs/cp2k cp2k

Later on, one can do updates:

git cvsimport -d:pserver:anonymous@cvs.cp2k.berlios.de:/cvsroot/cp2k cp2k

In rare cases the update step fails. It is therefore recommended to update in 
a dedicated working directory and to use 'git push' to send the new patches to 
a separate mirror. There seems to be little general interest in this update 
feature. Most people convert just once and never look back.
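
The push to a separate mirror could be set up roughly as in the sketch below. 
It uses a local bare repository as the mirror and an ordinary commit as a 
stand-in for what 'git cvsimport' would add; the paths, the remote name 
'mirror' and the identity are assumptions made for the example.

```shell
work=$(mktemp -d)

# Bare repository acting as the separate mirror (hypothetical local path).
git init -q --bare "$work/mirror.git"

# Dedicated working repository; in real use 'git cvsimport' adds the commits here.
git init -q "$work/import"
cd "$work/import"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"
echo "imported revision" > file.txt
git add file.txt
git commit -q -m "Imported from CVS (stand-in)"

# Register the mirror and push the current branch to it under the same name.
git remote add mirror "$work/mirror.git"
branch=$(git symbolic-ref --short HEAD)
git push -q mirror "$branch"

# Both repositories now point at the same commit.
pushed=$(git -C "$work/mirror.git" rev-parse "refs/heads/$branch")
local_tip=$(git rev-parse HEAD)
echo "mirror: $pushed"
echo "local:  $local_tip"
```

If an update ever goes wrong, the mirror is unaffected until you push, so the 
working directory can simply be thrown away and recreated.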


A downside of git
-----------------

Git uses SHA-1 hashes to label commits (and other things). This has many 
technical advantages, but it is not as intuitive as the simple version 
numbers that CVS or SVN use. Good one-line summaries and proper use of 'git 
tag' mostly solve this issue.
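
As a sketch of how tags help, the example below tags a commit in a throwaway 
repository and then uses 'git describe' to get a readable name for a later 
commit. The tag name v1.0, the README file and the identity are made up for 
the illustration.

```shell
# Scratch repository with one commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"
echo "hello" > README
git add README
git commit -q -m "Initial commit"

# An annotated tag gives this commit a readable name.
git tag -a -m "First stable version" v1.0

# One more commit on top of the tagged one.
echo "more" >> README
git commit -q -a -m "Extend README"

# 'git describe' names a commit relative to the nearest annotated tag:
# <tag>-<number of commits since the tag>-g<abbreviated hash>
name=$(git describe)
echo "$name"

# One-line summaries together with abbreviated hashes.
git log --oneline
```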




