If you have been following me, you know I’ve mentioned a piece of software called Git numerous times.
You’ll find more information about what it is on its web page, mentioned above, or, if you prefer, in its Wikipedia article. Simply put, Git is a distributed version control system.
Since mid-2008, I have been using Git to manage my work-related data. I’m the sole user of the repository I use and manage, but even used that way, I find it very useful.
Why Is Git So Cool?
I have been using Git to manage everything from 1KB of text data to a couple hundred megabytes of submitted video game build files, and it has been very reliable and fast. Subversion handled this reasonably well, but it wasn’t scalable for my use (I will explain the scalability part later), and Mercurial just didn’t work very well once large files were introduced into the repository (at least on Windows). Git lets me check in any of those files with ease.
It’s Reliable and Scalable!
I started using Git on a single machine, then cloned the repository for backup, and later cloned it again for off-site backup. With Git (and many other distributed version control systems), no repository is superior to any other. If you clone a repository, its entire history is copied locally as well. It takes just a simple command, and it can be done securely over SSH. This is the real beauty of a distributed version control system: a crash on any participating system is a non-event. If I have to replace, say, the hard drive on one machine, I just replace the drive, clone from one of the existing machines, and I’m back in business. Because I work at home as well, at least four copies of the repository are available at any time, two of which act as “hub” repositories, which makes the setup very reliable. (For me to lose data under this system, not only would the data on all four machines have to be destroyed, but so would all the backups of those repositories, which are created routinely on each hub.)
Git also gives me high reliability in terms of data integrity. If data in the repository is corrupted, I will know right away, because everything stored in the repository is checked against its SHA-1 hash. As with many version control systems, this also helps save bandwidth, because only what needs to be updated is transferred.
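To give a feel for how that SHA-1 check works, here is a minimal Python sketch. It assumes Git's default SHA-1 object format, in which a file's ID is the hash of a short header followed by the file's bytes. The function names git_blob_id and is_intact are my own, for illustration only; they are not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 ID Git would assign to a file (blob) with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def is_intact(content: bytes, expected_id: str) -> bool:
    """True if the bytes still hash to the ID they were stored under."""
    return git_blob_id(content) == expected_id


original = b"hello world\n"
stored_id = git_blob_id(original)

print(stored_id)
print(is_intact(original, stored_id))          # unchanged bytes
print(is_intact(b"hello w0rld\n", stored_id))  # one corrupted byte
```

Because the ID is derived from the content itself, even a single flipped byte produces a different hash, which is roughly why a damaged object cannot pass itself off as the original.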
Because I deal with different kinds of data coming in from a lot of different places, I put each of them into its own branch. With this system, anything submitted externally is placed on its own branch, then merged into my personal branch. This way, I can keep a healthy separation between my history and everyone else’s. Since a lot of the files I deal with are binaries, merging of file contents rarely comes up for me. But being able to work with branches very casually (in many of the other systems I’ve tried, including Subversion and CVS, branching is a very messy process that I didn’t want to deal with…) lets me organize a great deal of information with ease. Today, for example, I dealt with three different branches. The gitk program that comes with Git displays a really nice and satisfying tree of my project, too. (It displays a graph resembling a subway map; in a way it is a subway map, with every line leading to the same destination and each commit being a station.)
Well, that is my pitch for Git, and to anyone looking for a version control solution, I’d highly recommend it. Even if you’d be the sole user of the repository like me, it will still be useful. I used to lose quite a bit of data every time I had a problem and had to reinstall the OS, but since I started using Git, I have lost none of my work data across rebuilds of my working environment. That’s because Git makes checking in to other machines (and hence making backups) simple and flexible. (Recovery is very quick, too; even with five months of bulk data, the repository is only about 2.8GB. It is also possible to make a “shallow” clone of the repository if quick access is needed.)
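For the curious, recovering a machine comes down to a single git clone, and the “shallow” variant just adds the --depth option. Below is a small Python sketch of how that could be scripted. The helper names clone_command and restore, the hub address, and the target directory are all hypothetical placeholders of mine, not my actual setup.

```python
import subprocess
from typing import List, Optional


def clone_command(source: str, dest: str, depth: Optional[int] = None) -> List[str]:
    """Build the argument list for `git clone`, optionally as a shallow clone."""
    cmd = ["git", "clone"]
    if depth is not None:
        # --depth N truncates the copied history to the most recent N commits.
        cmd += ["--depth", str(depth)]
    return cmd + [source, dest]


def restore(source: str, dest: str, depth: Optional[int] = None) -> None:
    """Run the clone; raises CalledProcessError if git reports a failure."""
    subprocess.run(clone_command(source, dest, depth), check=True)


# Hypothetical hub address and target directory, for illustration only.
HUB = "ssh://user@hub.example.com/srv/git/work.git"

print(" ".join(clone_command(HUB, "work")))           # full history
print(" ".join(clone_command(HUB, "work", depth=1)))  # latest commit only
```

The script only prints the two commands; calling restore(HUB, "work") would actually run the clone. A shallow clone trades history for speed, so it suits quick access rather than serving as a full backup.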