VCS for non-developers

The following is an explanation of Version Control Systems (VCS) that I first wrote to explain the background and reasons for Version Control to my brother. It aims to show how we got to where we are now, and why version control is a good idea for even simple work.

It is important to note that two similar but different concepts are used here:

Versioning: recording what changes have been made to a file
Backup: keeping a separate copy that lets you recover your work if it is deleted

The distinction is important - one does not necessarily imply the other (although all of the systems discussed support both in some way).

Version Control systems are common in software development (at least in software development that has any value, and even in low-value development that is done properly) but are less common outside it. They still have their uses there, though, because people who are not software developers occasionally write scripts and utilities to make their lives easier. In these cases, version control is a good idea.

The History of Version Control

Once upon a time, dinosaurs walked the Earth and people didn't do any version control (or they did it by copying the folder and calling it "project-old", then "project-newer", etc). Projects were lost, time was wasted, and everything was generally out of control.

Cave men then developed stone tools, iron tools, and eventually a "proper" version control system. One of the fairly early ones was CVS. It was hideous and lacking, but it did the job better than copying folders. It also ran on a server. That gave it the advantage of doing both versioning (so you could go back on a change, or see who made what change at what time) and backup (because it was on another machine). Its biggest failing was that each file was individually versioned, so the "latest" version of your code might have one file at v1.3 and another might be at v1.6. This made it more difficult to reliably get all of the files to match.

Having files from different times in the history of the work might not seem bad, but it is like having the first draft of chapters one, three and nine from a book and then having the rest of the book made of the final, edited and reworked chapters - nothing ties up and it doesn't always make sense!

Next came a group of people who wanted to write "a better CVS" and so came Subversion. It was better, but it was like building a better chocolate teapot - what you really want is a better teapot, not a better chocolate teapot! It had sensible ideas, like "global version numbers" that meant you could get "revision 29" and get all files from the same point, but it was still centralised.
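The difference between per-file versions and a global revision number is easier to see in a toy model. The sketch below is only an illustration: the class names, method names and file names are all made up for this example, and neither CVS nor Subversion is implemented this way.

```python
# Toy comparison of per-file revisions (CVS-style) and a single global
# revision number (Subversion-style). Every name here is hypothetical.


class PerFileHistory:
    """Each file keeps its own independent list of versions."""

    def __init__(self):
        self.versions = {}  # file name -> list of contents, oldest first

    def save(self, name, content):
        self.versions.setdefault(name, []).append(content)

    def revision_of(self, name):
        # The "revision" is just how many times this one file was saved.
        return len(self.versions[name])


class GlobalHistory:
    """Every commit snapshots the whole project under one shared number."""

    def __init__(self):
        self.snapshots = []  # index i holds the project at revision i + 1
        self.working = {}    # current state of all files

    def commit(self, changes):
        self.working.update(changes)
        self.snapshots.append(dict(self.working))
        return len(self.snapshots)  # the new global revision number

    def checkout(self, revision):
        return dict(self.snapshots[revision - 1])


per_file = PerFileHistory()
per_file.save("chapter1.txt", "first draft")
per_file.save("chapter2.txt", "first draft")
per_file.save("chapter2.txt", "edited")
per_file.save("chapter2.txt", "final")
# "Latest" is a different number for each file.
latest = {name: per_file.revision_of(name) for name in per_file.versions}

whole = GlobalHistory()
whole.commit({"chapter1.txt": "first draft", "chapter2.txt": "first draft"})
whole.commit({"chapter2.txt": "edited"})
whole.commit({"chapter2.txt": "final"})
# One number identifies a matching set of files.
revision_2 = whole.checkout(2)
```

In the first model you have to remember a separate number for every file; in the second, asking for "revision 2" gives you every chapter as it was at that moment.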

As people increasingly worked from different locations, centralised servers meant that:

  1. working offline/away from the office meant working without version control
  2. working on something experimental ended up with one of:
    • no version control
    • its own version control that was completely separate from the original history (so you couldn't look back before the point you started experimenting)
    • its own branch that lived forever in the master repository

People eventually realised that centralising version control wasn't necessary and that decentralising it, so that everyone had their own local repository with its own local history, was good. Thus were born Distributed Version Control Systems (DVCS) such as Git and Mercurial.

The now - Distributed Version Control Systems

In the DVCS world, there is no "master server" as there was with CVS/Subversion. By default, you only have a local copy. That gives you versioning, but not backup. If you make a change that goes wrong and you can't work out how to undo it, you can recover by going back in your history and getting the previous version. If you delete a file by accident, you can recover it by going back in your history and getting the last version you stored. If you want to know what you changed and when, you can look back in the history.
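To make "going back in your history" a little more concrete, here is a minimal sketch of a local history that stores a snapshot on every commit. It is a toy under stated assumptions, not how Mercurial or Git store anything: the LocalHistory class, its methods and the file name are invented for this example.

```python
# Toy local history: each commit stores a message and a full copy of the
# files at that moment. All names are hypothetical.


class LocalHistory:
    def __init__(self):
        self.commits = []  # list of (message, snapshot), oldest first

    def commit(self, files, message):
        # Copy the dict so later edits don't alter stored history.
        self.commits.append((message, dict(files)))

    def log(self):
        # What did I do, and in what order?
        return [message for message, _ in self.commits]

    def last_version(self, name):
        # Walk back from the newest commit to find the file.
        for _, snapshot in reversed(self.commits):
            if name in snapshot:
                return snapshot[name]
        raise KeyError(name)


history = LocalHistory()
files = {"notes.txt": "first attempt"}
history.commit(files, "Add notes")

files["notes.txt"] = "second attempt"
history.commit(files, "Rewrite notes")

del files["notes.txt"]  # deleted by accident
files["notes.txt"] = history.last_version("notes.txt")  # recovered
```

Everything in this sketch lives in memory on one machine, which is exactly the point made above: it gives you versioning, but if the machine goes then so does the history.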

All of that history is stored with your files - Mercurial stores it all in a folder called ".hg", which is in the top folder of your code (wherever you "initialised" the project). That gives you the full history of all changes that you made, but if you delete the .hg folder then all is lost. You need another copy of the code for a backup.
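Because the ".hg" folder marks the top of a project, a script can work out which project a file belongs to by walking upwards until it finds one. The sketch below assumes only that convention; find_repository_root is a hypothetical helper written for this example, not part of Mercurial, and the folder names are made up.

```python
import tempfile
from pathlib import Path


def find_repository_root(start):
    """Return the nearest folder at or above `start` that contains a
    `.hg` folder, or None if there isn't one."""
    current = Path(start).resolve()
    for folder in [current, *current.parents]:
        if (folder / ".hg").is_dir():
            return folder
    return None


# Demonstration in a throwaway folder, so no real project is touched.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp).resolve() / "project"
    (project / ".hg").mkdir(parents=True)  # stand-in for a real repository
    nested = project / "drafts" / "chapter1"
    nested.mkdir(parents=True)

    found = find_repository_root(nested)
    found_matches = found == project
```

The same convention explains why copying the whole top folder, ".hg" included, to another disk or machine gives you a backup of the history as well as the files.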

As there isn't a single "master" server, any of the copies of the repository can be treated as the Master. In many instances, people do put the master repository on a server (like all of my projects do) but that is entirely arbitrary - I could put copies somewhere else and declare that those other copies are the Master at any point and people could adapt.

The other big advantage (mainly for teams) is that DVCS forced merging to get better. Because everyone can make their own changes locally, a DVCS has to combine your changes with someone else's reliably when you pull in work from their repository. Working alone, you're less likely to need it, but it is there when you do.
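One common way to merge is a "three-way merge": compare your copy and the other person's copy against the version you both started from. The sketch below does this at the level of whole files, which is a simplification I have chosen for brevity; real tools such as Mercurial and Git work line by line within each file. The merge function and the file names are hypothetical.

```python
def merge(base, mine, theirs):
    """Three-way merge of {file name: content} snapshots.

    Returns (merged, conflicts). Files listed in `conflicts` are left
    out of `merged` because a person has to decide what to keep.
    """
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(name), mine.get(name), theirs.get(name)
        if m == t:
            result = m  # both sides agree (or both deleted it)
        elif m == b:
            result = t  # only they changed it, so take theirs
        elif t == b:
            result = m  # only I changed it, so keep mine
        else:
            conflicts.append(name)  # both changed it in different ways
            continue
        if result is not None:  # None means the file was deleted
            merged[name] = result
    return merged, conflicts


base = {"intro.txt": "v1", "recipe.txt": "v1", "index.txt": "v1"}
mine = {"intro.txt": "my edit", "recipe.txt": "v1", "index.txt": "my index"}
theirs = {"intro.txt": "v1", "recipe.txt": "their edit",
          "index.txt": "their index", "glossary.txt": "new file"}

merged, conflicts = merge(base, mine, theirs)
```

Here my change to intro.txt, their change to recipe.txt and their new glossary.txt are all combined automatically, and only index.txt, which we both changed differently, is reported for a person to sort out.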

Conclusion

So, the next time you start a small coding project, ask yourself: Do I actually care about this work? If yes, put it in a DVCS and (ideally) make a remote "backup" copy. If no, why are you doing it?

With the right tools (such as TortoiseHg for Windows users), using version control can be really simple, and it'll pay you back when you want to know when something broke, when you want to know how you got to where you are now, or when things go wrong and you have to undo changes (including deleting files!).

Further reading

  • HgInit - a Mercurial tutorial that uses a recipe rather than source code as an example
  • Mercurial: The Definitive Guide - a full O'Reilly book, available for free (legally) from the author
