Sunday, November 08, 2009

svn: BAD

svn is just broken.

Google for git commit: the first link points to the current documentation. Do the same for svn commit: the first hits point to outdated versions of the manual (usually 1.0, at most 1.5).

Check out a module. Unfortunately, the URL you check out usually ends in trunk, so the default sandbox directory name isn't really helpful. Likewise, it took until 1.6 for them to find out that typing the repository URL in many commands is a waste, and to allow ^/ repository-relative URLs not only in svn:externals but also in commands.

Just for kicks, try an svn ls http://... over GPRS. It takes on the order of half a minute, because the GPRS round-trip time is about a second and svn ls makes a good dozen HTTP requests to the server, all serialized and without reusing the connection. It is actually a good idea to run such commands via ssh on another machine: the ssh login is faster, and the svn command then runs over fast links. (That doesn't work for commands involving a sandbox, of course.)
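The half-minute figure survives a rough sanity check. The sketch below assumes, without having measured it, that each request without connection reuse pays one round trip for the TCP handshake plus one for the request itself, and that transfer time is negligible; the class and method names are made up for illustration.

```java
// Rough latency model for serialized HTTP requests on a high-latency link.
// Assumptions (not measured): a fresh TCP connection costs one round trip
// for the handshake, each request/response costs one more, and transfer
// time is negligible compared to latency.
class LatencyEstimate {
    static double seconds(int requests, double rttSeconds, boolean reuseConnection) {
        // Without reuse, every request pays handshake + request round trips;
        // with reuse, only the first request pays for the handshake.
        double roundTrips = reuseConnection ? requests + 1 : 2.0 * requests;
        return roundTrips * rttSeconds;
    }

    public static void main(String[] args) {
        // "A good dozen" requests at about one second GPRS round-trip time.
        System.out.println(seconds(12, 1.0, false)); // 24.0
        System.out.println(seconds(12, 1.0, true));  // 13.0
    }
}
```

Under these assumptions a dozen requests come to roughly 24 seconds, close to the observed half minute; reusing the connection would roughly halve that, and running the command on a well-connected host sidesteps the GPRS round trips entirely.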

svn import can only import a given tree into a new location within the svn repo; it can't do updates. Those are needed to import newer versions of vendor software while keeping the file identity that future merges need in order to go smoothly. There is a script in contrib that can do that, but it never got integrated into the import command proper.
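To make concrete what an updating import has to do, here is a hypothetical sketch (invented names, not the contrib script) that compares the file lists of the old and the new vendor release and classifies each path. Files present in both are updated in place, which is what preserves their identity. Renamed files simply show up as a delete plus an add here; a real tool would have to deal with those as well.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: classify paths for importing a new vendor release
// on top of the previous one, instead of importing a fresh tree.
class VendorDrop {
    static final class Plan {
        final SortedSet<String> add = new TreeSet<>();    // new in this release
        final SortedSet<String> delete = new TreeSet<>(); // dropped by the vendor
        final SortedSet<String> update = new TreeSet<>(); // kept: overwrite contents in place
    }

    static Plan plan(Set<String> oldFiles, Set<String> newFiles) {
        Plan p = new Plan();
        for (String f : newFiles) {
            if (oldFiles.contains(f)) p.update.add(f); else p.add.add(f);
        }
        for (String f : oldFiles) {
            if (!newFiles.contains(f)) p.delete.add(f);
        }
        return p;
    }

    public static void main(String[] args) {
        Plan p = plan(Set.of("README", "src/a.c", "src/b.c"),
                      Set.of("README", "src/a.c", "src/c.c"));
        System.out.println("add " + p.add);       // add [src/c.c]
        System.out.println("delete " + p.delete); // delete [src/b.c]
        System.out.println("update " + p.update); // update [README, src/a.c]
    }
}
```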

Likewise, don't even expect a command that can create an archive (tar/zip) from the versioned files in the current directory (git archive), or that can delete all unversioned files in the sandbox (git clean). Or that tells you a human-readable name for the current revision based on the last tag (git describe).

And, of course, there is the big thing: the hyped merge support of svn 1.5 is broken by design. It isn't fixed in 1.6, and probably can't be fixed without changing the repository format. (I suspect the whole mergeinfo property approach is just wrong.) Meanwhile, in a much shorter time than it took from the announcement of svn merge support to its eventual delivery, git appeared and just did it right.

I don't like that. We had been waiting for svn 1.5 because of the merge support, and then started to switch. At the same time git appeared on my radar; I could use git cvsimport and git filter-branch for our conversion from cvs to svn, in a way that also allows pulling future changes from cvs into svn, without manual intervention. Unfortunately I lost all liking for svn in the process, because git turned out to be much better tooled, more capable, and more flexible. It is also developing faster; its version number is bound to pass svn's soon.

Thursday, October 22, 2009

The awesomeness of no rename

git does not store renames. All it does is store each version of the whole tree of the managed project. It is the diff and merge tools that look at the trees and file contents and notice that a file has been renamed.

For example, I had a project where a program was converted to C++ piecewise. One file was originally foo.c; then foo.cpp was created by copying and fixing the original C source. foo.c was not removed until later, so for a few commits both existed simultaneously. This was in fact in CVS, and I had just imported the stuff into git to work on newer stuff there.

Now it was time to merge the line originating in CVS into the work branch. git merge just looked at the changes that needed to be merged all at once, and saw that foo.c was gone and foo.cpp was new over that stretch, and that they were 95% similar, so it assumed this was a rename, and properly merged the rename into the work branch.

That is the power of not doing or saving renames.
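A toy version of that content-based pairing, for illustration only: nothing about renames is recorded, and a deleted file is paired with an added file when their contents are similar enough. git's default rename threshold is 50% similarity (tunable, e.g. with git diff's -M option), but the similarity measure used here, shared lines over the longer file's line count, is a simplification assumed for this sketch, not git's actual scoring. The class and method names are invented.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Simplified rename detection: compare contents, never consult rename records.
class RenameGuess {
    // Fraction of lines shared (as a multiset) relative to the longer file.
    static double similarity(List<String> a, List<String> b) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : a) counts.merge(line, 1, Integer::sum);
        int shared = 0;
        for (String line : b) {
            Integer c = counts.get(line);
            if (c != null && c > 0) {
                shared++;
                counts.put(line, c - 1);
            }
        }
        int larger = Math.max(a.size(), b.size());
        return larger == 0 ? 1.0 : (double) shared / larger;
    }

    // Pair each deleted path with the most similar unclaimed added path.
    static Map<String, String> detect(Map<String, List<String>> deleted,
                                      Map<String, List<String>> added,
                                      double threshold) {
        Map<String, String> renames = new TreeMap<>();
        Set<String> taken = new HashSet<>();
        for (Map.Entry<String, List<String>> d : deleted.entrySet()) {
            String best = null;
            double bestScore = threshold;
            for (Map.Entry<String, List<String>> a : added.entrySet()) {
                if (taken.contains(a.getKey())) continue;
                double s = similarity(d.getValue(), a.getValue());
                if (s >= bestScore) { best = a.getKey(); bestScore = s; }
            }
            if (best != null) { renames.put(d.getKey(), best); taken.add(best); }
        }
        return renames;
    }

    public static void main(String[] args) {
        List<String> fooC = List.of("#include <stdio.h>", "int foo(void)", "{", "  return 42;", "}");
        List<String> fooCpp = List.of("#include <cstdio>", "int foo(void)", "{", "  return 42;", "}");
        // Over the whole merged stretch, foo.c disappears and foo.cpp appears.
        System.out.println(detect(Map.of("foo.c", fooC), Map.of("foo.cpp", fooCpp), 0.5));
        // prints {foo.c=foo.cpp}
    }
}
```

Because the pairing only looks at the two end states, the intermediate commits where both files coexisted do not matter, which is exactly the situation described above.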

Monday, September 28, 2009

git and cvs: Coexistence

There is a relatively trivial way to use git on trees managed by cvs, which I often use to carry cvs projects around for new work and to enjoy the distributed nature (which comes in quite handy on a train). Recipe: check out the target tree with cvs, go into the root, git init, and cvs-files | xargs git add. One git commit -m initial and you are ready to go.

Now you can clone around as you like, do cvs updates in the gitted sandbox, commit those into git, and back. The only catch is that on the way back to cvs you need to track file additions and deletions manually.

The script cvs-files looks like:

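#!/bin/sh
# List every file recorded in a CVS/Entries file below the current
# directory that also exists in the working copy.
find * -name CVS -type d | while read -r cdir
do
    dir=`dirname "$cdir"`
    if test "X$dir" = X. ; then
        dir=""
    else
        dir="$dir/"
    fi
    # File entries look like /name/revision/...; print just the name.
    sed -ne 's:^/\([^/][^/]*\)/.*$:\1:p' /dev/null "${dir}"CVS/Entries* | \
    sort -u | while read -r name
    do
        if test -r "$dir$name"; then
            echo "$dir$name"
        fi
    done
done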

It just goes through the CVS/Entries files to find out what's under CVS control to initially put those into git.
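For readers who would rather not decode the sed expression, here is a rough Java equivalent of the same idea. It is a sketch, not a drop-in replacement: it reads only CVS/Entries (not the other Entries* files the shell glob picks up), and Files.walk also descends into hidden directories, which find * does not do at the top level. The class name is invented.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CvsFiles {
    // File entries look like "/name/revision/timestamp/options/tagdate";
    // lines starting with "D" describe directories and are skipped,
    // just as the sed pattern ^/name/ ignores them.
    static SortedSet<String> namesFromEntries(List<String> lines) {
        SortedSet<String> names = new TreeSet<>();
        for (String line : lines) {
            if (!line.startsWith("/")) continue;
            int end = line.indexOf('/', 1);
            if (end > 1) names.add(line.substring(1, end)); // non-empty name only
        }
        return names;
    }

    // Find every CVS directory and collect the listed files that exist.
    static List<String> versionedFiles(Path root) throws IOException {
        List<Path> cvsDirs;
        try (Stream<Path> paths = Files.walk(root)) {
            cvsDirs = paths
                .filter(p -> Files.isDirectory(p) && p.getFileName() != null
                             && p.getFileName().toString().equals("CVS"))
                .sorted()
                .collect(Collectors.toList());
        }
        List<String> result = new ArrayList<>();
        for (Path cvsDir : cvsDirs) {
            Path entries = cvsDir.resolve("Entries");
            if (!Files.isReadable(entries)) continue;
            Path dir = cvsDir.getParent();
            for (String name : namesFromEntries(Files.readAllLines(entries))) {
                Path file = dir.resolve(name);
                if (Files.isReadable(file)) result.add(root.relativize(file).toString());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        versionedFiles(Path.of(args.length > 0 ? args[0] : ".")).forEach(System.out::println);
    }
}
```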

Tuesday, April 07, 2009

There isn't enough coffee in the universe

EVS is me going "Screw it, I'm not playing any more" and writing a system that can talk to anything, given enough time and coffee.


And this mail came after a loooong day filled with nothing, actually, unless you count trying in vain to get some GeForce running with openSuSE as work. (Which would have been fun if the driver had not only supported portrait mode but also drawn correctly and used the panel's native resolution. eclipse looks quite good on a portrait screen.)

Anyway, how can one get the idea that all version control systems are basically created equal, and that this is enough to write a universal server that speaks every protocol equally and fully? Even with cvs you can trivially create a repository that can't be converted to, say, git, and vice versa you can create a repository in git that can't be represented in cvs. In the latter case the revisions would all be there, but the merge information would be missing.

If you take out all the edge cases, or sufficiently many of them, then it may be made to work. But not many existing repositories would meet those requirements, nor would the resulting functionality be very interesting.

Wednesday, February 25, 2009

MacOS surprise: ssh-agent

I'm new to my macbook. Since leopard, ssh asks for passphrases in a dialog instead of on the command line. This is not necessarily a bad idea; for example, a git push from within git gui has no good place to ask.

On the other hand, using ssh-agent for that is even more convenient. Since the dialog lets you store the passphrase in the keychain, I assumed that this was the preferred way and that no one would bother to start an ssh agent for the whole leopard GUI session. Turns out I was wrong, and my workaround of using xterms started from another xterm with an agent running in it was quite unnecessary.

Thursday, January 01, 2009

ant: BAD

Those who don't know history are condemned to repeat it. Or, as ant shows, they don't even manage that. In my opinion, ant started with a bad idea and then made it worse. In other words, it is Broken As Designed.

The quotes are from the aforementioned page, section 'Apache Ant'.

Apache Ant is a Java-based build tool. In theory, it is kind of like Make, but without Make's wrinkles.

Yeah, the wrinkles have been replaced by pointy brackets, and all the deep problems haven't even been addressed at all.

Why another build tool when there is already make, gnumake, nmake, jam, and others? Because all those tools have limitations that Ant's original author couldn't live with when developing software across multiple platforms.

Well, I for one don't want to live with ant, or as little as possible. Granted, make isn't usable for building java projects either, but neither is ant, without working around the buggy javac task.

Make-like tools are inherently shell-based -- they evaluate a set of dependencies, then execute commands not unlike what you would issue in a shell.

That's the point! Why invent another scripting language when you can have one for free? Instead ant goes along inventing a new scripting language (which may not even be turing complete), and it takes them years to come to a point where ant can at least do what the good old shell always could.

And still, to do a simple grep Whatever | sed -e s/XX/YY/ you need to write java programs, compile them, feed the jars to ant, and after a few hours actually get it working. Not to mention the fact that those classes can't easily be part of your project, because you need them before starting ant. (Ok, that's not actually true, but the way around it is ugly.)
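To put the one-liner in perspective, this is roughly what its logic looks like as plain Java, assuming literal patterns (the shell version treats them as regular expressions). Wrapping it as an ant task, compiling it, and registering it with ant would still come on top; none of that is shown, and the class name is invented.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough Java counterpart of:  grep Whatever | sed -e s/XX/YY/
class GrepSed {
    static List<String> grepSed(List<String> lines, String match, String from, String to) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (line.contains(match)) {
                // sed's s/// without the g flag replaces only the first match
                out.add(line.replaceFirst(Pattern.quote(from), Matcher.quoteReplacement(to)));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<>();
        for (String l; (l = in.readLine()) != null; ) lines.add(l);
        grepSed(lines, "Whatever", "XX", "YY").forEach(System.out::println);
    }
}
```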

This means that you can easily extend these tools by using or writing any program for the OS that you are working on. However, this also means that you limit yourself to the OS, or at least the OS type such as Unix, that you are working on.

Gladly so. Nowadays you can easily get VMs or cygwin if you really need to work elsewhere.

At least it's much better than making everything complicated, everywhere. The average trivial build.xml contains about 60 lines, of which three do actually vary with the project, and the rest is always the same. Don't repeat yourself, huh? At least we hope it's the same, and there aren't any copy&paste bugs lurking.

Makefiles are inherently evil as well. Anybody who has worked on them for any time has run into the dreaded tab problem. "Is my command not executing because I have a space in front of my tab!!!" said the original author of Ant way too many times.

The original author of ant obviously had the wrong editor, or was missing a little script to check for that. I've written my share of makefiles, and I can't remember ever running into that problem.

If only the original author of ant had run into the real problems that make poses, ant wouldn't have become such a terrible misdesign.

Ant is different. Instead of a model where it is extended with shell-based commands, Ant is extended using Java classes. Instead of writing shell commands, the configuration files are XML-based, calling out a target tree where various tasks get executed. Each task is run by an object that implements a particular Task interface.

So, instead of writing little one-liners I need to write code in a programming language that is itself known to be verbose, write additional build targets to get that code compiled, and finally knit it all together in XML, yet another source of bloat. (And for the nitpickers: ant uses two extra syntactic elements, comma-separated lists and property value replacements. Guess why they don't do those as XML elements.) ant has a serious Dr. No syndrome.

Granted, this removes some of the expressive power that is inherent by being able to construct a shell command such as `find . -name foo -exec rm {}`, but it gives you the ability to be cross platform -- to work anywhere and everywhere.

...and making that a lot of work. Ant really does not fall into the category 'make the easy things easy and the hard things possible'.

And hey, if you really need to execute a shell command, Ant has an <exec> task that allows different commands to be executed based on the OS that it is executing on.

Now that's a cop-out. If our uber-tool happens not to support what we want, we can still go platform-specific. Thanks; out of general disgust with XML I wrote a little shell script (of all things) that does everything I need to do in the projects I need to manage, including compiling and collecting used libraries of my own. /bin/sh isn't exactly the best tool for the job, but the whole thing is just five times larger than the (broken) build.xml I got for those projects (and shorter than this blog entry), and the actual build script for one of those condenses to

#!/bin/sh
. "`dirname "$0"`"/../proj-a/tools/buildtool.sh
depdir ../proj-a
javacomp
mkjar proj-b

and is also a shell script.

Now what?



Ok, more details on how ant is misdesigned. The javac task is broken: it only compiles java files that have no corresponding class file or whose class file is older than the java file. It does not erase class files for which there is no longer a source file, nor does it recompile classes that depend on the ones that changed. Especially the latter makes for interesting bugs. Workaround: always clean before compiling, either manually or via the 'dependencies'.
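The up-to-date check just described, and what it misses, can be modeled with plain timestamp maps. This is an illustrative sketch with invented names, not ant's source code; keys are class names, values are modification times.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Model of a timestamp-only staleness check, as described for the javac task.
class JavacStaleness {
    // A source is recompiled only if its class file is missing or older.
    static SortedSet<String> toRecompile(Map<String, Long> sources, Map<String, Long> classes) {
        SortedSet<String> stale = new TreeSet<>();
        for (Map.Entry<String, Long> e : sources.entrySet()) {
            Long cls = classes.get(e.getKey());
            if (cls == null || cls < e.getValue()) stale.add(e.getKey());
        }
        return stale;
    }

    // What such a check never reports: class files whose source is gone.
    static SortedSet<String> orphans(Map<String, Long> sources, Map<String, Long> classes) {
        SortedSet<String> result = new TreeSet<>(classes.keySet());
        result.removeAll(sources.keySet());
        return result;
    }

    public static void main(String[] args) {
        // B.java was edited at t=30; A depends on B but its source did not change.
        Map<String, Long> src = Map.of("A", 10L, "B", 30L);
        Map<String, Long> cls = Map.of("A", 20L, "B", 20L, "Old", 5L);
        System.out.println(toRecompile(src, cls)); // [B]   A is not rebuilt
        System.out.println(orphans(src, cls));     // [Old] left behind
    }
}
```

A stays compiled against the old B, and Old.class lingers; hence the habit of cleaning first.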

And these 'dependencies'. They aren't, really. The individual tasks sometimes execute conditionally depending on whether the destination is newer than the source, but you need to state dependencies explicitly, no matter whether ant could deduce them itself. For example, when a javac task produces what a jar task consumes, ant won't make them dependent automatically. Conversely, when you put a target into the dependency list of another, it is executed, no matter what. As opposed to the following make fragment

prog : main.c gen.c
	cc -o prog main.c gen.c
tool : tool.c
	cc -o tool tool.c
gen.c : tool gen.in
	./tool gen.c

which tells that gen.c needs to be generated by tool which in turn needs to be compiled from tool.c. The point here is that when you call make, the tool won't be recompiled unless you modified its source, and gen.c won't be generated unless you modified either the generator or its input file.
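The rule make applies here fits in a few lines. This is a toy model under stated assumptions, not make's implementation: timestamps are plain numbers, "running a recipe" just records the target and stamps it with a fresh time, and every source file is assumed to exist. The class and method names are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy make: a target is remade if it is missing or any prerequisite is newer,
// after its prerequisites have been brought up to date themselves.
class MiniMake {
    private final Map<String, List<String>> rules = new HashMap<>();
    final Map<String, Long> mtime = new HashMap<>();
    private final Set<String> done = new HashSet<>();
    private final List<String> remade = new ArrayList<>();
    private long clock = 100;

    void rule(String target, String... prereqs) { rules.put(target, List.of(prereqs)); }

    // Runs one "make target" and returns the targets that were remade.
    List<String> make(String target) {
        done.clear();
        remade.clear();
        build(target);
        return new ArrayList<>(remade);
    }

    private void build(String target) {
        if (!done.add(target)) return;        // visit each node once per run
        List<String> prereqs = rules.get(target);
        if (prereqs == null) return;          // plain source file, assumed to exist
        boolean outOfDate = !mtime.containsKey(target);
        for (String p : prereqs) {
            build(p);                         // prerequisites first
            if (!outOfDate && mtime.get(p) > mtime.get(target)) outOfDate = true;
        }
        if (outOfDate) {
            remade.add(target);               // stands in for running the recipe
            mtime.put(target, ++clock);
        }
    }

    public static void main(String[] args) {
        MiniMake m = new MiniMake();
        m.rule("prog", "main.c", "gen.c");
        m.rule("tool", "tool.c");
        m.rule("gen.c", "tool", "gen.in");
        m.mtime.putAll(Map.of("main.c", 1L, "tool.c", 2L, "gen.in", 3L,
                              "tool", 10L, "gen.c", 11L, "prog", 12L));
        System.out.println(m.make("prog")); // []  everything is up to date
        m.mtime.put("gen.in", 50L);         // edit only the generator input
        System.out.println(m.make("prog")); // [gen.c, prog]  tool is not recompiled
    }
}
```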

Ant stays blissfully unaware of any of this dependency management, and thus degenerates to the fixed execution of a number of scripts (called targets) with a number of commands (called tasks) each. If you want your task not to do work when none is needed, don't expect support from ant. Ant is not really a build tool but a simple script executor, however much its creators talk about declarative operation and the ant way, which seems pretty long-winded to me.

The 'declarative way' does not keep people from relying on the fact that the dependencies of a target are executed in the order they are specified. A depends="clean, compile" to work around the javac problem is far from uncommon, and it will clearly break if ant ever decides to run the dependencies in reverse order.

To be fair, the dependency problem isn't exactly trivial, especially without support from the actual compiler. make doesn't do a good job for java either. But on the other hand, shouldn't we expect an industrial-strength build system to have invested more than a little thought?

Then ant completely ignored the lesson of the X windowing system. Those guys actually improved on make by using the C preprocessor. Their Imakefile system inspired another (proprietary) system that could just say

CProgramFromSources (prog) {
    CSource (gen)
    CSource (main)
}
CProgram (tool)
gen.c : tool gen.in
	./tool gen.c

with one important difference: the macros expand so that not only does make prog do the expected thing, but make clean also removes all the temporaries, except for gen.c, which was produced by a plain make rule. We need to add

clean ::
	rm -f gen.c

to make that work, too. This also shows another overlooked make feature: you can build up one target from several separate rules, each contributing its own commands and dependencies. It's just not possible to have multiple clean targets in ant, which makes it even more error-prone to get the cleanup right.

Ant simply aims too low by one or two levels of abstraction.

Then what?



Unfortunately ant has gotten a lot of traction in the java community, which proves again that you need to be the first, not the best. This is compounded by the fact that most programmers don't care about the build system any more than needed to make it apparently work. And everything and the kitchen sink is available as an ant task, so it's not just programmers who would have to be turned around.

And I'm not exactly in a position to get a lot of traction for a change. The most promising way is to fight the system from within; perhaps by actually having some preprocessor (again) generating a tmp/build.xml to then be included.

The most depressing thing is that ant will make the majority of people think that this is the state of the art in build system design. Far from it.

Wednesday, December 31, 2008

Factoring out helper classes

The job is to get a source and a destination path for a java compile. Because it's mostly the same, we want defaults for source and class directory. Because we occasionally have more than one separate hierarchy to compile, we want to provide a base path on which the defaults are applied. Thus this is the wet (aka non-dry) code:

AttrList al = new AttrList (l.getParam ());

String base = al.pull ("dir");
if (base == null) base = "";
else base += "/";

String src = al.pull ("src");
if (src == null) src = "src";
src = base + src;

String cls = al.pull ("classes");
if (cls == null) cls = "classes";
cls = base + cls;

We first get the base path and fiddle with it a bit, especially when it is not given. Then we obtain source and destination, set each to its default when absent, and prepend the base (which we fiddled with so that we can always do so, even when it does not change the path).

The repetition looks ugly. So: Factor it out. We can't purely do it with a helper function, so we need a class (this is java, everything needs to be done in a class):

class Basifier {
    String base;
    public Basifier (String param) {
        if (param == null) base = "";
        else base = param + "/";
    }
    public String basify (String arg, String def) {
        return base + (arg != null ? arg : def);
    }
}

and use it:

AttrList al = new AttrList (l.getParam ());
final Basifier base = new Basifier (al.pull ("dir"));
final String src = base.basify (al.pull ("src"), "src");
final String cls = base.basify (al.pull ("classes"), "classes");

Not too bad, except that we need to define a helper class and instantiate it for each invocation of ours; and we note that the useful lifetime of the object is tied exactly to the lifetime of the invocation of our function. In my opinion the latter is a sign that we are doing something wrong: if two things always have the same lifetime, they should really be parts of one thing.
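For anyone who wants to try the Basifier version, here is a self-contained sketch. The author's AttrList is not shown in the post, so a minimal stand-in is assumed: pull() removes and returns a named attribute, or returns null if it is absent. The wrapper class and the paths() method are likewise invented for the demo.

```java
import java.util.HashMap;
import java.util.Map;

class BasifierDemo {
    // Hypothetical stand-in for the author's AttrList.
    static class AttrList {
        private final Map<String, String> attrs;
        AttrList(Map<String, String> attrs) { this.attrs = new HashMap<>(attrs); }
        String pull(String key) { return attrs.remove(key); }
    }

    static class Basifier {
        private final String base;
        Basifier(String param) { base = (param == null) ? "" : param + "/"; }
        String basify(String arg, String def) { return base + (arg != null ? arg : def); }
    }

    // Returns {src, classes} for one compile invocation.
    static String[] paths(Map<String, String> params) {
        AttrList al = new AttrList(params);
        final Basifier base = new Basifier(al.pull("dir"));
        final String src = base.basify(al.pull("src"), "src");
        final String cls = base.basify(al.pull("classes"), "classes");
        return new String[] { src, cls };
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", paths(Map.of())));                           // src classes
        System.out.println(String.join(" ", paths(Map.of("dir", "sub"))));              // sub/src sub/classes
        System.out.println(String.join(" ", paths(Map.of("dir", "sub", "src", "java")))); // sub/java sub/classes
    }
}
```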

If java had local functions and closures, we could spare the separate class:

String base = al.pull ("dir");
if (base == null) base = "";
else base += "/";

public String basify (String arg, String def) {
    return base + (arg != null ? arg : def);
}

final String src = basify (al.pull ("src"), "src");
final String cls = basify (al.pull ("classes"), "classes");

We save the extra object, and the need to reference it. Arguably the helper class has the advantage of encapsulating the basification, but it does not encapsulate just that: the default value handling for the other parameters happens in there too. Besides, the basification may also be useful in other places.
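Java did later (version 8) gain something close to this: a lambda stored in a local variable can capture effectively final locals. A sketch, with a plain Map standing in for the author's AttrList (remove() in place of pull(), an assumption); the class and method names are invented. Note that the lambda is still an object behind the scenes; the language just no longer makes you declare a class for it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

class LocalFunctionDemo {
    static String[] paths(Map<String, String> params) {
        // A plain Map stands in for AttrList; remove() plays the role of pull().
        Map<String, String> al = new HashMap<>(params);
        String dir = al.remove("dir");
        final String base = (dir == null) ? "" : dir + "/"; // must be effectively final
        // The closest thing to a local function: a lambda capturing base.
        BinaryOperator<String> basify = (arg, def) -> base + (arg != null ? arg : def);
        final String src = basify.apply(al.remove("src"), "src");
        final String cls = basify.apply(al.remove("classes"), "classes");
        return new String[] { src, cls };
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", paths(Map.of("dir", "sub", "src", "java")))); // sub/java sub/classes
    }
}
```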

One interesting sidepoint is that the basifier is almost, but not quite, implementable as an anonymous class:
    
final Whatever base = new Object () {
    String base;
    {
        base = al.pull ("dir");
        if (base == null) base = "";
        else base += "/";
    }
    public String basify (String arg, String def) {
        return base + (arg != null ? arg : def);
    }
};
final String src = base.basify (al.pull ("src"), "src");
final String cls = base.basify (al.pull ("classes"), "classes");

The only thing that does not work is that we cannot name the type of the anonymous class for declaring base, and we need that type to be able to invoke basify on that object. In scala that actually works by type inference; the variable is just declared as a variable, and the type is assumed to be the type of the initializing expression.
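Java later caught up here as well: local variable type inference with var (Java 10) can hold exactly this kind of non-denotable anonymous class type, much as in the Scala case, so basify() becomes callable. A sketch under that feature; the field is renamed to prefix to avoid confusion with the variable, a plain Map stands in for AttrList (remove() in place of pull(), an assumption), and the class and method names are invented.

```java
import java.util.HashMap;
import java.util.Map;

class AnonBasifierDemo {
    static String[] paths(Map<String, String> params) {
        final Map<String, String> al = new HashMap<>(params); // stand-in for AttrList
        // var infers the anonymous class type, so basify() is visible.
        var base = new Object() {
            final String prefix;
            {
                String dir = al.remove("dir");
                prefix = (dir == null) ? "" : dir + "/";
            }
            String basify(String arg, String def) {
                return prefix + (arg != null ? arg : def);
            }
        };
        final String src = base.basify(al.remove("src"), "src");
        final String cls = base.basify(al.remove("classes"), "classes");
        return new String[] { src, cls };
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", paths(Map.of("dir", "sub")))); // sub/src sub/classes
    }
}
```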

Is the class Basifier worthy of a separate existence, or is it just a little bit of factored-out code that is forced to become a separate class because of the scope of the data it works with? It could also be done as a static nested class; only the option of a local function is out.

Basically, in the latter case the function invocation serves as a kind of object; we could view the local function as a member function of the invocation. If so, we might also view the function invocation as some kind of class, and then we might actually make it extend another class (or implement an interface):

public void methodAsObject (String p, String q) extends Basifier {
    super (p);
    final String here = basify (q, "def");
    ..
}

It just doesn't work because we don't have the constructor parameter at that time.

Over the fence



By the way, other languages just do it differently. In C++ you decide whether a local variable is just a reference to an object

Basifier *base = new Basifier (param);

or whether you actually want the object itself to be a local variable:

Basifier base (param);

In the latter case the lifetime of the object is directly tied to the function invocation. This can't work in safe garbage-collected languages, because you can't remove the object before you know there are no more pointers to it, and this is incompatible with stack allocation.

Oops



So, this was planned as a rant against excessive objectification, and it turned out to motivate me to actually use the basifier class. It will eventually show up here; look for "javac".