How to use git as a client for the CVS server
Toon Verstraelen
Toon.Ver... at UGent.be
Thu Oct 28 16:10:21 UTC 2010
Hi all,
I've explained the subject to a few colleagues so far, and it may be of
interest to others. This (long) mail is also an attempt to avoid explaining
the same thing over and over again. Have fun.
cheers,
Toon
1. Summary
==========
Disclaimer
----------
This is not an attempt to convince the entire CP2K community to ditch CVS
and use git instead. This mail just explains how to avoid using CVS, and is
written for those who share the belief that CVS is a crappy program to keep
track of a source code history. I've included a few benchmarks as a light form
of advocacy, but in the end it is all up to you.
Problem
-------
CVS is slow for large projects like CP2K and has few, impractical
features for working with branches and experimental versions.
Solution
--------
Maintain a local git mirror of the CVS history. This is also a convenient
place to keep track of your own patches before they go into the official CVS.
Once you have a set of patches for a stable implementation of a new feature,
use 'git rebase' to apply these patches to the latest revision from CVS. Then
send them to someone who has write access to the CVS repository. (Preferably
that person can also do some reviewing.)
There are many alternatives to CVS. Git is the fastest and has good support
for branches and distributed development. It is designed to manage large
projects such as the Linux kernel.
Links
-----
Git homepage: http://git-scm.com/
Git web interface: https://git.wiki.kernel.org/index.php/Gitweb
Bare-bones GUI: http://www.kernel.org/pub/software/scm/git/docs/gitk.html
Free public git hosting: http://github.com/ http://gitorious.org/
2. Details
==========
I'll discuss the cvs-to-git conversion at the end. For now, we'll just start
from my public cp2k mirror on github.com. I assume you know how to install git
and gitk on your OS and I also assume that OS is a unix.
Cloning a repository
--------------------
Download the entire history of the CP2K source code:
git clone git://github.com/tovrstra/cp2k.git
Some notes:
- A directory cp2k is created with the latest master branch.
- You also get the history with all patches. This is stored in cp2k/.git/.
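To convince yourself that a clone really carries the whole history, one can
count the commits and look at the size of .git. The sketch below uses a tiny
scratch repository instead of the real CP2K mirror, so it runs without network
access. The names scratch-origin and scratch-clone and the user info are made
up for the illustration.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the upstream repository, with three small commits.
git init -q scratch-origin
cd scratch-origin
git config user.name "Example User"
git config user.email "user@example.org"
for i in 1 2 3; do
    echo "revision $i" > notes.txt
    git add notes.txt
    git commit -q -m "Revision $i"
done
cd ..

# Clone it; the clone receives every commit, not just the latest files.
git clone -q scratch-origin scratch-clone
cd scratch-clone
ncommits=$(git rev-list --count HEAD)
echo "commits in clone: $ncommits"
du -sh .git
```

The same two commands ('git rev-list --count HEAD' and 'du -sh .git') can be
run inside the cp2k directory after the real clone.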
Fire up gitk
------------
cd cp2k
gitk &
Some notes:
- The graphical interface is very light and suitable for remote X.
- With the '--all' option one sees all branches.
- There are some menu items in gitk to prepare commits etc, but I recommend
using the conventional command-line interface of git instead.
Switch branches
---------------
My cp2k repository also has a branch where some CVS-specific parts in the
Makefile are replaced by their git counterparts. One switches the working
directory to this branch as follows:
git checkout cp2k-git
Compilation is done as usual. One gets an overview of all branches as follows:
git branch
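In a fresh clone, only the default branch exists locally. The other branches
are present as remote-tracking branches (origin/...) until they are checked
out for the first time. The sketch below shows this on a scratch repository;
the directory names are made up and only the branch name cp2k-git is taken
from the text above.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the upstream repository with an extra cp2k-git branch.
git init -q scratch-origin
cd scratch-origin
git config user.name "Example User"
git config user.email "user@example.org"
echo "base" > file.txt
git add file.txt
git commit -q -m "Base"
git branch cp2k-git
cd ..

git clone -q scratch-origin scratch-clone
cd scratch-clone

# -a also lists the remote-tracking branches such as remotes/origin/cp2k-git.
git branch -a

# The first checkout creates a local branch that tracks origin/cp2k-git.
git checkout -q cp2k-git
current=$(git rev-parse --abbrev-ref HEAD)
echo "now on: $current"
```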
Include branch name in the shell prompt
---------------------------------------
Playing with different branches quickly becomes confusing. One may use the
following PS1 variable to include the branch name in the shell prompt.
GITPS1='$(__git_ps1 ":%s")'
export PS1="\u@\h \w${GITPS1}> "
or with fancy colors (designed for a dark background)
GITPS1='$(__git_ps1 ":%s")'
GREEN="\[\033[1;32m\]"
BLUE="\[\033[1;34m\]"
YELLOW="\[\033[1;33m\]"
RS="\[\033[00m\]"
export PS1="${GREEN}\u@\h${RS} ${BLUE}\w${RS}${YELLOW}${GITPS1}${BLUE}>${RS} "
The prompt will look like this:
toon@molmod49 ~/cp2k:cp2k-git>
This is not mandatory, but it makes working with branches a lot easier.
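The __git_ps1 function comes with the shell completion scripts of git and is
not available on every system. If your shell complains that it is undefined, a
small replacement can be written by hand. The function name git_branch_suffix
below is made up; this is only a minimal sketch that shows the branch name and
none of the extras that __git_ps1 can display.

```shell
# Hypothetical fallback for shells where __git_ps1 is not defined.
# Prints ":<branch>" inside a git working tree and nothing elsewhere.
git_branch_suffix() {
    branch=$(git rev-parse --abbrev-ref HEAD 2>/dev/null) || return 0
    printf ':%s' "$branch"
}

# Same prompt layout as above, using the fallback function.
PS1='\u@\h \w$(git_branch_suffix)> '

# Try it out in a scratch repository.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"
git checkout -q -b myhack
echo "test" > file.txt
git add file.txt
git commit -q -m "First commit"
suffix=$(git_branch_suffix)
echo "prompt suffix: $suffix"
```

With a detached HEAD this sketch simply prints ":HEAD".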
Configure git
-------------
Add these sections to ~/.gitconfig (with your own personal info).
[user]
name = Toon Verstraelen
email = Toon.Ver... at UGent.be
[color]
diff = always
status = always
interactive = always
branch = always
This is not mandatory, but again very convenient.
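The same settings can also be written with 'git config' instead of a text
editor. The sketch below writes to a scratch file so that it does not touch
your real configuration; replace '--file "$cfg"' by '--global' to change
~/.gitconfig itself. The name and email address are placeholders.

```shell
set -e
# Scratch file standing in for ~/.gitconfig.
cfg=$(mktemp)

git config --file "$cfg" user.name "Your Name"
git config --file "$cfg" user.email "you@example.org"

# One color.<key> entry per line of the [color] section above.
for key in diff status interactive branch; do
    git config --file "$cfg" "color.$key" always
done

# Show the resulting file.
cat "$cfg"
```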
Create a new branch to store your patches
-----------------------------------------
Instead of adding patches to the cp2k-git branch, it is safer to add them to a
new private branch. All commits remain local anyway until you run 'git push
some-repo'. The new branch initially only exists in your local copy of the
cp2k repository.
toon@molmod49 ~/cp2k:cp2k-git> git branch myhack
toon@molmod49 ~/cp2k:cp2k-git> git checkout myhack
toon@molmod49 ~/cp2k:myhack>
This can also be done in one step.
toon@molmod49 ~/cp2k:cp2k-git> git checkout -b myhack
toon@molmod49 ~/cp2k:myhack>
Add a patch
-----------
The example here is just a fix for a trivial typo in the input documentation.
On line 232 of the file input_cp2k_mm.F a space is missing at the
end of the string. It is also convenient to wrap lines at 80 characters
because this is the default width of a text terminal. After making the
changes, they can be reviewed:
toon@molmod49 ~/cp2k:myhack> git diff
diff --git a/src/input_cp2k_mm.F b/src/input_cp2k_mm.F
index 26b37f5..200e000 100644
--- a/src/input_cp2k_mm.F
+++ b/src/input_cp2k_mm.F
@@ -320,8 +320,8 @@ CONTAINS
CALL keyword_release(keyword,error=error)
!Universal scattering potential at very short distances
CALL keyword_create(keyword, name="ZBL_SCATTERING",&
- description="A short range repulsive potential is added, to simulate"//&
- "collisions and scattering.",&
+ description="A short range repulsive potential is added, to "//&
+ "simulate collisions and scattering.",&
usage="ZBL_SCATTERING T",default_l_val=.FALSE.,lone_keyword_l_val=.TRUE.,&
error=error)
CALL section_add_keyword(section,keyword,error=error)
The minus-lines are colored in red and the plus-lines in green. After testing
the patch -- a simple compilation is sufficient for this -- one can commit the
changes to the repository. This is typically done in two stages. One first
adds the changes to an intermediate stage, called the index. Once the index is
OK, it is actually committed. This two-step approach is convenient when
working with more complex patches.
Add the file to the index:
toon@molmod49 ~/cp2k:myhack> git add src/input_cp2k_mm.F
Commit it:
toon@molmod49 ~/cp2k:myhack> git commit
An editor will appear in which one writes a few notes. The first line is a short
summary, optionally followed by an empty line and a longer discussion.
The two steps can be done in one command if there are only modifications to
existing files:
toon@molmod49 ~/cp2k:myhack> git commit src/input_cp2k_mm.F
or
toon@molmod49 ~/cp2k:myhack> git commit -a
If there is only one line in the commit message, it can be given on the
command line:
toon@molmod49 ~/cp2k:myhack> git commit -a -m 'Fixed typo'
Keep repeating this with all the things you want to change. Keep commits as
small as possible and test them. One can look back at the commit history with
gitk or 'git log'.
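The role of the index becomes clearer with two modified files of which only
one should go into the next commit. The sketch below runs in a scratch
repository; the file names a.txt and b.txt are made up.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"

echo "one" > a.txt
echo "one" > b.txt
git add a.txt b.txt
git commit -q -m "Initial files"

# Change both files, but add only a.txt to the index.
echo "two" >> a.txt
echo "two" >> b.txt
git add a.txt

# 'git diff --cached' shows what the next commit will contain,
# plain 'git diff' shows what is left behind in the working directory.
staged=$(git diff --cached --name-only)
unstaged=$(git diff --name-only)
echo "staged:     $staged"
echo "not staged: $unstaged"

# Only the staged change is committed; b.txt is still listed as modified.
git commit -q -m "Update a.txt only"
git status --short
```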
Rebase patches to the latest master
-----------------------------------
In practice it takes some time before a set of patches is finished and often
the CVS master branch evolves in the meantime. I occasionally synchronize my
git repo and apply the cp2kgit patch on top. It is recommended to apply your
patches to the latest version too. This is typically a painful job, but with
'git rebase' it becomes trivial.
First update your local mirror of the repository:
toon@molmod49 ~/cp2k:myhack> git checkout cp2k-git
toon@molmod49 ~/cp2k:cp2k-git> git pull origin cp2k-git:cp2k-git
(some progress output)
Some notes:
- origin refers to the GitHub repository. It is the default shorthand for the
repository that was used with 'git clone'.
- cp2k-git:cp2k-git is optional. It means that the remote cp2k-git branch is
used to update the local cp2k-git branch.
Then rebase your patches:
toon@molmod49 ~/cp2k:cp2k-git> git checkout myhack
toon@molmod49 ~/cp2k:myhack> git rebase cp2k-git
In this case the patch is so small that you will probably not have to
intervene manually, unless somebody changed exactly the same two lines or the
six surrounding lines. In more complex cases 'git rebase' will stop when
it encounters a doubtful situation. Some instructions are given such that you
can easily modify the problematic patch and continue the rebase process.
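The whole cycle can be tried safely in a scratch repository before doing it on
real work. In the sketch below the branch names are those used above, while
the files and commit messages are made up. When a rebase does stop on a
conflict, one typically edits the files, runs 'git add' on them and then
'git rebase --continue'; 'git rebase --abort' returns to the state before the
rebase.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"

# Stand-in for the mirrored upstream branch.
git checkout -q -b cp2k-git
echo "upstream 1" > upstream.txt
git add upstream.txt
git commit -q -m "Upstream revision 1"

# Private branch with one patch.
git checkout -q -b myhack
echo "my change" > hack.txt
git add hack.txt
git commit -q -m "My patch"

# Meanwhile the upstream branch moves on.
git checkout -q cp2k-git
echo "upstream 2" >> upstream.txt
git commit -q -a -m "Upstream revision 2"

# Replay the private patch on top of the new upstream tip.
git checkout -q myhack
git rebase -q cp2k-git
git log --oneline

parent=$(git rev-parse myhack^)
tip=$(git rev-parse cp2k-git)
```

After the rebase, the parent of the private commit is the latest upstream
commit, so the history is linear again.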
Sending patches by email
------------------------
Once a set of patches is ready, they can be prepared for email as follows:
toon@molmod49 ~/cp2k:myhack> git format-patch -1
0001-Fixed-typo.patch
The -1 option selects the most recent commit only; in general, -<n> creates
one patch file for each of the last n commits. Put these files in a compressed
archive and send the archive to somebody with CVS write access. They'll know
what to do with it.
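With more than one commit it is easier to name the range instead of counting:
'cp2k-git..myhack' means every commit that is on myhack but not on cp2k-git.
The sketch below does this in a scratch repository and packs the result. The
directory names sender and patches and the second commit message are made up.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q sender
cd sender
git config user.name "Example User"
git config user.email "user@example.org"

git checkout -q -b cp2k-git
echo "base" > file.txt
git add file.txt
git commit -q -m "Base"

# Two small commits on the private branch.
git checkout -q -b myhack
echo "fix 1" >> file.txt
git commit -q -a -m "Fixed typo"
echo "fix 2" >> file.txt
git commit -q -a -m "Wrapped long lines"

# One numbered patch file per commit, written to ../patches.
git format-patch -o ../patches cp2k-git..myhack

# Pack them for sending.
tar -czf ../patches.tar.gz -C .. patches
npatches=$(ls ../patches | wc -l | tr -d ' ')
echo "patch files: $npatches"
```

Each file is a plain unified diff with a mail header on top, so the receiver
can also apply it outside git, for example with 'patch -p1'.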
A few benchmarks
----------------
diffs
^^^^^
The diff is executed after making the changes in the above example.
time git diff &> /dev/null
real 0m0.017s
user 0m0.000s
sys 0m0.010s
time cvs diff &> /dev/null
real 0m1.484s
user 0m0.040s
sys 0m0.040s
This is just a small patch, but with large patches the benchmarks become more
dramatic. Because most CP2K developers dig speed, I guess two orders of
magnitude will be appreciated. Similar speedups can be found with other
commands that git and CVS have in common.
Cloning
^^^^^^^
This is a special benchmark. A complete repository clone is not really
supported in CVS. Therefore I compare a 'git clone' with a CVS checkout
instead. The latter is a much lighter operation.
time cvs -z3 -d:pserver:anonymous@cvs.cp2k.berlios.de:/cvsroot/cp2k co cp2k
(lots of output)
real 0m13.987s
user 0m2.040s
sys 0m1.480s
time git clone git://github.com/tovrstra/cp2k.git
(some output)
real 0m20.315s
user 0m8.910s
sys 0m1.240s
The comparison is a bit difficult as it is mainly determined by the hosting
server. Note that git downloads all revisions, while CVS only gives you the
latest version. This is just to show that there is no practical problem with
cloning entire repositories in git. The storage is also remarkably compact:
du -sh cp2k/.git/
59M
This is less than the tar file with the CVSROOT.
CVS to git migration
--------------------
This does not always go smoothly. It turns out that CVS does not keep an
accurate history of all patches and metadata, and that it may be difficult to
convert all this information to a revision system with a proper storage
backend. The tigris community has developed a complex batch script that tries
to make the best out of it. More info can be found here:
http://cvs2svn.tigris.org/
They also have a cvs2git script. In case of CP2K it is used as follows:
wget http://download.berlios.de/cvstarballs/cp2k-cvsroot.tar.gz
tar -xvzf cp2k-cvsroot.tar.gz
mkdir cvs
mkdir git2
mv cp2k cvs
cd git2
cvs2git ../cvs/cp2k --blobfile=cp2kblob --dumpfile=cp2kdump --username=fubar
mkdir cp2k
cd cp2k
git init
cat ../cp2kblob ../cp2kdump | git fast-import
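To see what the last step does, here is a tiny hand-written stream in the
format that 'git fast-import' reads. The real cp2kblob and cp2kdump files are
generated by cvs2git and are much larger. I assume here that the first one
holds the file contents and the second one the commits that refer to them. The
file names blob.txt and dump.txt, the branch name imported and the committer
are made up for this sketch.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# File contents: one blob of 6 bytes ("hello" plus newline), labelled :1.
cat > blob.txt <<'EOF'
blob
mark :1
data 6
hello
EOF

# History: one commit on the branch "imported" that stores blob :1 as
# hello.txt. "data 8" is the length of the message "initial" plus newline.
cat > dump.txt <<'EOF'
commit refs/heads/imported
mark :2
committer Example User <user@example.org> 1288282221 +0000
data 8
initial
M 100644 :1 hello.txt

EOF

git init -q repo
cd repo
cat ../blob.txt ../dump.txt | git fast-import --quiet

# The import only fills the repository; a checkout creates the files.
git checkout -q imported
content=$(cat hello.txt)
ncommits=$(git rev-list --count imported)
echo "hello.txt contains: $content"
```

The same holds after the real conversion: the working directory stays empty
until a branch is checked out.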
CVS to git migration bis
------------------------
One can also use the 'git cvsimport' script. It is somewhat simpler and can
also update a git mirror with the latest changes in a CVS repository. The
first time one has to do a full conversion:
wget http://download.berlios.de/cvstarballs/cp2k-cvsroot.tar.gz
tar -xvzf cp2k-cvsroot.tar.gz
mkdir cvs
mkdir git
mv cp2k cvs
cd git
git cvsimport -d ${PWD}/../cvs/cp2k cp2k
Later on, one can do updates:
git cvsimport -d:pserver:anonymous@cvs.cp2k.berlios.de:/cvsroot/cp2k cp2k
The update step may fail in rare cases. It is recommended to update in a
dedicated working directory and to use 'git push' to send the new patches to a
separate mirror. There seems to be little general interest in this update
feature.
Most people convert just once and never look back.
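The push from the dedicated import directory to a separate mirror can be
sketched as follows. A bare repository on the local disk stands in for the
public mirror, and an ordinary commit stands in for the output of
'git cvsimport'. The names mirror.git, import and the remote name mirror are
made up.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the public mirror (a bare repository has no working files).
git init -q --bare mirror.git

# Stand-in for the dedicated directory in which 'git cvsimport' is run.
git init -q import
cd import
git config user.name "Example User"
git config user.email "user@example.org"
echo "imported revision" > file.txt
git add file.txt
git commit -q -m "Imported from CVS"

# Register the mirror once, then push the current branch after each update.
git remote add mirror ../mirror.git
git push -q mirror HEAD

local_tip=$(git rev-parse HEAD)
mirrored=$(git ls-remote mirror | head -n 1 | cut -f1)
echo "local:  $local_tip"
echo "mirror: $mirrored"
```

If an update goes wrong, the import directory can be thrown away and cloned
again from the mirror.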
A downside of git
-----------------
Git uses SHA-1 hashes to label commits (and other things). This has many
technical advantages, but it is not as intuitive as the simple version
numbers that CVS or SVN use. Good one-line summaries and proper use of 'git
tag' mostly solve this issue.
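A tag combined with 'git describe' gives labels that are easier to read than a
bare hash. The sketch below runs in a scratch repository; the tag name
cvs-2010-10-28 is only an example.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"

echo "1" > file.txt
git add file.txt
git commit -q -m "First revision"

# Annotated tag on the first commit.
git tag -a cvs-2010-10-28 -m "State of CVS on 2010-10-28"

echo "2" >> file.txt
git commit -q -a -m "Fixed typo"

# Label of the form <tag>-<commits since tag>-g<abbreviated hash>.
label=$(git describe)
echo "$label"
```

The label still contains a short hash, so it identifies the commit uniquely,
but the tag and the counter make it readable.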