Next: Getting started
Up: Darcs User Manual
Previous: Contents
Darcs is a revision control system, along the lines of CVS or arch. That
means that it keeps track of various revisions and branches of your
project, and allows changes to propagate from one branch to another. Darcs
is intended to be an ``advanced'' revision control system. Darcs has two
particularly distinctive features that set it apart from other revision
control systems: 1) each copy of the source is a fully functional branch,
and 2) underlying darcs is a consistent and powerful theory of patches.
The primary simplifying notion of darcs is that every copy of your
source code is a full repository. This is dramatically different from CVS,
in which the normal usage is for there to be one central repository from
which source code will be checked out. It is closer to the notion of arch,
since the `normal' use of arch is for each developer to create his own
repository. However, darcs makes it even easier, since simply checking out
the code is all it takes to create a new repository. This has several
advantages, since you can harness the full power of darcs in any scratch
copy of your code, without committing your possibly destabilizing changes to
a central repository.
The development of a simplified theory of patches is what originally
motivated me to create darcs. This patch formalism means that darcs patches
have a set of properties, which make possible manipulations that couldn't be
done in other revision control systems. First, every patch is invertible.
Secondly, sequential patches (i.e. patches that are created in sequence, one
after the other) can be reordered, although this reordering can fail, which
means the second patch is dependent on the first. Thirdly, patches which are
in parallel (i.e. both patches were created by modifying identical trees)
can be merged, and the result of a set of merges is independent of the order
in which the merges are performed. This last property is critical to darcs'
philosophy, as it means that a particular version of a source tree is fully
defined by the list of patches that are in it, i.e. there is no issue
regarding the order in which merges are performed. For a more thorough
discussion of darcs' theory of patches, see the Appendix.
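As a rough illustration of the first two properties, here is a minimal Python sketch. It is not darcs' actual patch representation: the `SetFile` patch type, the `commute` function, and the rule that two patches depend on each other exactly when they touch the same file are all simplifying assumptions made for this example. Merging of parallel patches is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SetFile:
    """Toy patch: change the content of `path` from `old` to `new`.

    None stands for "the file does not exist", so adding and removing
    a file are special cases of the same patch type.
    """
    path: str
    old: Optional[str]
    new: Optional[str]

    def apply(self, tree: dict) -> dict:
        # A patch only applies to a tree in the state it expects.
        if tree.get(self.path) != self.old:
            raise ValueError(f"patch does not apply to {self.path}")
        result = dict(tree)
        if self.new is None:
            result.pop(self.path, None)
        else:
            result[self.path] = self.new
        return result

    def invert(self) -> "SetFile":
        # Every patch is invertible: swap the old and new states.
        return SetFile(self.path, self.new, self.old)


def commute(first, second):
    """Reorder two sequential patches.

    Returns the pair in its new order, or None when the reordering
    fails, which means `second` depends on `first`. In this toy model
    (an assumption) patches depend on each other iff they touch the
    same file.
    """
    if first.path == second.path:
        return None
    return second, first


add_readme = SetFile("README", None, "hello\n")
add_main = SetFile("main.c", None, "int main(void) { return 0; }\n")
edit_readme = SetFile("README", "hello\n", "hello, world\n")

tree = add_main.apply(add_readme.apply({}))

# Invertibility: applying the inverse undoes the patch.
assert add_main.invert().apply(tree) == add_readme.apply({})

# Reordering: independent patches give the same tree in either order.
new_first, new_second = commute(add_readme, add_main)
assert new_second.apply(new_first.apply({})) == tree

# Dependency: the edit cannot be moved before the patch that adds the file.
assert commute(add_readme, edit_readme) is None
```

In this sketch commuting leaves the patches themselves unchanged. With finer-grained patches (for example, changes to line ranges within one file) reordering would generally have to adjust the patches as well.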
Besides being ``advanced'' as discussed above, darcs is actually also quite
simple. Versioning tools can be seen as three layers. At the foundation is
the ability to manipulate changes. On top of that must be placed some kind
of database system to keep track of the changes. Finally, at the very top is
some sort of distribution system for getting changes from one place to
another.
Really, only the first of these three layers is of particular interest to
me, so the other two are done as simply as possible. At the database
layer, darcs just has an ordered list of patches along with the patches
themselves, each stored as an individual file. Darcs' distribution system
is strongly inspired by that of arch. Like arch, darcs uses a dumb server,
typically apache or just a local or network file system, when pulling
patches. Darcs has built-in support for using ssh to write to a remote file
system. A darcs executable is called on the remote system to apply the
patches. Arbitrary other transport protocols are supported through an
environment variable describing a command that will run darcs on the remote
system. See the documentation for DARCS_APPLY_FOO for details.
The recommended method is to send patches through gpg-signed email
messages, which has the advantage of being mostly asynchronous.
In the last paragraph, I explained revision control systems in terms of
three layers. One can also look at them as having two distinct uses. One
is to provide a history of previous versions. The other is to keep track
of changes that are made to the repository, and to allow these changes to
be merged and moved from one repository to another. These two uses are
distinct, and almost orthogonal, in the sense that a tool can support one
of the two uses optimally while providing no support for the other. Darcs
is not intended to maintain a history of versions, although it is possible
to kludge together such a revision history, either by making each new patch
depend on all previous patches, or by tagging regularly. In a sense, this
is what the tag feature is for, but the intention is that tagging will be
used only to mark particularly notable versions (e.g. released versions, or
perhaps versions that pass a time consuming test suite).
Other revision control systems are centered upon the job of keeping track
of a history of versions, with the ability to merge changes being added as
it was seen that this would be desirable. But the fundamental object
remained the versions themselves.
In such a system, a patch (I am using patch here to mean an encapsulated
set of changes) is uniquely determined by two trees. Merging changes that
are in two trees consists of finding a common parent tree, computing the
diffs of each tree with their parent, and then cleverly combining those two
diffs and applying the combined diff to the parent tree, possibly at some
point in the process allowing human intervention, to allow for fixing up
problems in the merge such as conflicts.
In the world of darcs, the source tree is not the fundamental
object, but rather the patch is the fundamental object. Rather than a
patch being defined in terms of the difference between two trees, a tree is
defined as the result of applying a given set of patches to an empty tree.
Moreover, these patches may be reordered (unless there are dependencies
between the patches involved) without changing the tree. As a result,
there is no need to find a common parent when performing a merge. Or, if
you like, their common parent is defined by the set of common patches, and
may not correspond to any version in the version history.
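The idea that a repository is characterized by its set of patches can be sketched in a few lines of Python. This is only an illustration: the patch names are made up, patches are represented by their names alone, and the sketch assumes that all the patches involved are independent, so it skips the commutation that a real merge would need.

```python
def missing_from(ours, theirs):
    """Patches present in `theirs` but not in `ours`, in their order."""
    have = set(ours)
    return [name for name in theirs if name not in have]


def pull(ours, theirs):
    """Append to `ours` every patch that only `theirs` has."""
    return list(ours) + missing_from(ours, theirs)


def common_patches(ours, theirs):
    """The "common parent": simply the patches both sides share."""
    have = set(theirs)
    return [name for name in ours if name in have]


# Hypothetical patch names, assumed to be mutually independent.
alice = ["initial import", "fix typo", "add parser"]
bob = ["initial import", "add parser", "add tests"]

alice_merged = pull(alice, bob)
bob_merged = pull(bob, alice)

# The two repositories hold their patches in different orders...
assert alice_merged != bob_merged
# ...but contain the same set of patches, and so describe the same tree.
assert set(alice_merged) == set(bob_merged)

# The common parent never existed as a version in alice's history,
# which went from "initial import" straight to "fix typo".
assert common_patches(alice, bob) == ["initial import", "add parser"]
```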
One useful consequence of darcs' patch-oriented philosophy is that since a
patch need not be uniquely defined by a pair of trees (old and new), we can
have several ways of representing the same change, which differ only in how
they commute and what the result of merging them is. Of course, creating
such a patch will require some sort of user input. This is a Good Thing,
since the user creating the patch should be the one forced to think
about what he really wants to change, rather than the users merging the
patch. An example of this is the token replace patch (see the
description of ``darcs replace''). This feature makes it possible to create a
patch, for example, which changes every instance of the variable
``stupidly_named_var'' to ``better_var_name'', while leaving
``other_stupidly_named_var'' untouched. When this patch is merged with
any other patch involving the ``stupidly_named_var'', that instance will
also be modified to ``better_var_name''. This is in contrast to a more
conventional merging method which would not only fail to change new
instances of the variable, but would also involve conflicts when merging
with any patch that modifies lines containing the variable. By using
additional information about the programmer's intent, darcs is thus able to
make the process of changing a variable name the trivial task that it
really is: a simple search and replace, modulo tokenizing the code
appropriately.
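The effect of a token replace on text can be sketched in Python as follows. This illustrates the idea only and is not how darcs implements it: the `replace_token` function and its default set of token characters are assumptions made for this example.

```python
import re


def replace_token(text, old, new, token_chars="A-Za-z_0-9"):
    """Replace `old` with `new` wherever `old` stands as a whole token.

    A token is a maximal run of characters from `token_chars`, so an
    occurrence of `old` embedded in a longer token is left alone.
    """
    pattern = re.compile(
        rf"(?<![{token_chars}]){re.escape(old)}(?![{token_chars}])"
    )
    # Use a function so that `new` is inserted literally.
    return pattern.sub(lambda match: new, text)


source = "stupidly_named_var = other_stupidly_named_var + 1\n"
renamed = replace_token(source, "stupidly_named_var", "better_var_name")
assert renamed == "better_var_name = other_stupidly_named_var + 1\n"

# A line written in parallel by someone who still used the old name is
# renamed too once the replace is applied to it, instead of conflicting.
parallel_line = "print(stupidly_named_var)\n"
assert (
    replace_token(parallel_line, "stupidly_named_var", "better_var_name")
    == "print(better_var_name)\n"
)
```

Because the patch records the intent (rename this token) rather than the particular lines that changed, it can be applied to text it has never seen, which is what a line-based diff cannot do.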
The patch formalism discussed in the Appendix is what makes darcs'
approach possible. In order for a tree to consist of a set of patches,
there must be a deterministic merge of any set of patches, regardless of the
order in which they are merged. This requires that one be able to
reorder patches. While I don't know that the patches are required to be
invertible as well, my implementation certainly requires invertibility. In
particular, invertibility is required to make use of a theorem from the
patch formalism which is used extensively in the manipulation of merges.
In darcs, the equivalent of a cvs ``commit'' is called record, because it
doesn't put the change into any remote or centralized repository. Changes
are always recorded locally, meaning no net access is required in order to
work on your project and record changes as you make them. Moreover, this
means that there is no need for a separate ``disconnected operation'' mode.
You can choose to perform an interactive record, in which case darcs will
prompt you for each change you have made and ask if you wish to record it.
Of course, you can tell darcs to record all the changes in a given file, or
to skip all the changes in a given file, or go back to a previous change,
or whatever. There is also an experimental graphical interface, which
allows you to view and choose changes even more easily, and in whichever
order you like.
As a corollary to the ``local'' nature of the record operation, if a change
hasn't yet been published to the world--that is, if the local repository
isn't accessible by others--you can safely unrecord a change (even if it
wasn't the most recently recorded change) and then re-record it
differently, for example if you forgot to add a file, introduced a bug or
realized that what you recorded as a single change was really two separate
changes.
Most darcs commands support an interactive interface. The ``revert''
command, for example, which undoes unrecorded changes, has the same
interface as record, so you can easily revert just a single change. Pull,
push, send and apply all allow you to view and interactively select which
changes you wish to pull, push, send or apply.
Darcs has support for integrating a test suite with a repository. If you
choose to use this, you can define a test command (e.g. ``make check'') and
have darcs run that command on a clean copy of the project either prior to
recording a change or prior to applying changes--and to reject changes
that cause the test to fail.
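The general shape of such a test gate can be sketched in Python. This is a generic illustration and not darcs' own mechanism: the `passes_test` and `record_if_tests_pass` helpers are hypothetical, and the test command here is a small Python one-liner standing in for something like ``make check''.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_test(tree_dir, test_command):
    """Run `test_command` (an argument list) in a scratch copy of `tree_dir`.

    The test runs on a copy so that stray build products in the working
    directory cannot influence the result.
    """
    with tempfile.TemporaryDirectory() as scratch:
        clean_copy = Path(scratch) / "copy"
        shutil.copytree(tree_dir, clean_copy)
        result = subprocess.run(test_command, cwd=clean_copy)
        return result.returncode == 0


def record_if_tests_pass(tree_dir, test_command, record):
    """Call `record()` only when the test succeeds; report what happened."""
    if not passes_test(tree_dir, test_command):
        return False
    record()
    return True


with tempfile.TemporaryDirectory() as project:
    Path(project, "ok.txt").write_text("fine\n")
    # Stand-in test command: succeeds only if ok.txt exists in the copy.
    check = [
        sys.executable,
        "-c",
        "import os, sys; sys.exit(0 if os.path.exists('ok.txt') else 1)",
    ]
    recorded = []
    accepted = record_if_tests_pass(
        project, check, lambda: recorded.append("change")
    )
    assert accepted and recorded == ["change"]
```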
Darcs does not require a specialized server in order to make a repository
available for read access. You can use http, ftp, or even just a plain old
ssh server to access your darcs repository.
Darcs doesn't try to manage write access. That's your business. Supported
push methods include direct ssh access (if you're willing to give
direct ssh access away), using sudo to allow users who already have shell
access to only apply changes to the repository, or verification of
gpg-signed changes sent by email against a list of allowed keys. In
addition, there is good support for submission of patches by email that
are not automatically applied, but can easily be applied with a shell escape
from a mail reader (this is how I deal with contributions to darcs).
Every darcs repository is created equal (well, with the exception of a
``partial'' repository, which doesn't contain a full history...), and every
working directory has an associated repository. As a result, there is a
symmetry between ``uploading'' and ``downloading'' changes--you can use
the same commands (push or pull) for either purpose.
Darcs has a CGI script that allows browsing of the repositories.
Darcs runs on UNIX (or UNIX-like) systems (which includes Mac OS X) as well
as on Microsoft Windows.
Renames or moves of files and directories are, of course, handled properly,
so when you rename a file or move it to a different directory, its history
is unbroken, and merges with repositories that don't have the file renamed
will work as expected.
You can use the ``darcs replace'' command to modify all occurrences of a
particular token (defined by a configurable set of characters that are
allowed in ``tokens'') in a file. This has the advantage that merges with
changes that introduce new copies of the old token will have the effect of
changing it to the new token--which comes in handy when changing a
variable or function name that is used throughout a project.
You can easily configure the default flags passed to any command on a
per-repository basis, a per-user basis, or a combination of the two.