Wednesday, December 31, 2008

Factoring out helper classes

The job is to get a source and a destination path for a Java compile. Because they are mostly the same, we want defaults for the source and class directories. Because we occasionally have more than one separate hierarchy to compile, we want to provide a base path to which the defaults are applied. Thus this is the wet (aka non-dry) code:

AttrList al = new AttrList (l.getParam ());

String base = al.pull ("dir");
if (base == null) base = "";
else base += "/";

String src = al.pull ("src");
if (src == null) src = "src";
src = base + src;

String cls = al.pull ("classes");
if (cls == null) cls = "classes";
cls = base + cls;

We first get the base path and fiddle with it a bit, especially when it is not given. Then we obtain source and destination, set each to its default when it is not there, and prepend the base (which we fiddled with so we can always do so, even when it does not change the path).

The repetition looks ugly. So: Factor it out. We can't purely do it with a helper function, so we need a class (this is java, everything needs to be done in a class):

class Basifier {
    private final String base;
    public Basifier (String param) {
        if (param == null) base = "";
        else base = param + "/";
    }
    public String basify (String arg, String def) {
        return base + (arg != null ? arg : def);
    }
}

and use it:

AttrList al = new AttrList (l.getParam ());
final Basifier base = new Basifier (al.pull ("dir"));
final String src = base.basify (al.pull ("src"), "src");
final String cls = base.basify (al.pull ("classes"), "classes");

Not too bad, except that we need to define a helper class and instantiate it for each invocation of ours; and we note that the useful lifetime of the object is tied exactly to the lifetime of the invocation of our function. In my opinion the latter is a sign that we are doing something wrong: If two things always have the same lifetime, they should really be parts of one thing.

If java had local functions and closures, we could spare the separate class:

String base = al.pull ("dir");
if (base == null) base = "";
else base += "/";

public String basify (String arg, String def) {
    return base + (arg != null ? arg : def);
}

final String src = basify (al.pull ("src"), "src");
final String cls = basify (al.pull ("classes"), "classes");

We save the need for the extra object, and the need to reference it. Arguably the helper class has the advantage that the basification is encapsulated there, but it does more than that: the default value handling for the other parameters is done in there too. Besides, the basification may also be usable in other locations.
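Java 8 and later come close to the local function wished for above: a lambda can capture an effectively final local variable. A minimal sketch, assuming a hypothetical class name LocalBasify and plain string parameters standing in for the al.pull (...) calls:

```java
import java.util.function.BinaryOperator;

class LocalBasify {
    // dir, src and cls stand in for al.pull ("dir") etc.; any of them may be null.
    static String[] paths (String dir, String src, String cls) {
        // A captured local must be effectively final, hence the single assignment.
        final String base = (dir == null) ? "" : dir + "/";
        // The lambda closes over base; no named helper class is needed.
        BinaryOperator<String> basify = (arg, def) -> base + (arg != null ? arg : def);
        return new String[] { basify.apply (src, "src"), basify.apply (cls, "classes") };
    }
}
```

The lambda still has to be invoked through apply, so it is not quite a local function, but the separate class and its object are gone from the source.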

One interesting side point is that the basifier is almost, but not quite, implementable as an anonymous class:
final Whatever base = new Object () {
    String base;
    {
        base = al.pull ("dir");
        if (base == null) base = "";
        else base += "/";
    }
    public String basify (String arg, String def) {
        return base + (arg != null ? arg : def);
    }
};
final String src = base.basify (al.pull ("src"), "src");
final String cls = base.basify (al.pull ("classes"), "classes");

The only thing that does not work is that we cannot name the type of the anonymous class for declaring base, and we need that type to be able to invoke basify on that object. In scala that actually works by type inference; the variable is just declared as a variable, and the type is assumed to be the type of the initializing expression.
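For what it is worth, Java 10 and later infer the type of a local declared with var, and that inferred type may be an anonymous class. A minimal sketch under that assumption; the class name AnonymousBasifier, the field name prefix and the string parameters replacing al.pull (...) are made up for illustration:

```java
class AnonymousBasifier {
    // dir, src and cls stand in for al.pull ("dir") etc.; any of them may be null.
    static String[] paths (String dir, String src, String cls) {
        // var infers the anonymous class type, so basify is visible on the variable.
        var base = new Object () {
            final String prefix = (dir == null) ? "" : dir + "/";
            String basify (String arg, String def) {
                return prefix + (arg != null ? arg : def);
            }
        };
        return new String[] { base.basify (src, "src"), base.basify (cls, "classes") };
    }
}
```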

Is the class Basifier something that is worthy of a separate existence, or is it just a little bit of code that gets factored out and, because of the needed scope of the data it works with, is forced to become a separate class? It may also be done as a static inner class; only the option of a local function is out.

Basically in the latter case the function invocation serves as a kind of object; we could view the local function as a member function of the invocation. If that is so, we might also view the function invocation as some kind of a class, and then we might actually make it to extend another class (or implement an interface):

public void methodAsObject (String p, String q) extends Basifier {
    super (p);
    final String here = basify (q, "def");
}

It only does not work because we don't have the constructor parameter at that time.

Over the fence

By the way, other languages just do it differently. In C++ you decide whether a local variable is just a reference to an object

Basifier *base = new Basifier (param);

or whether you actually want the object itself to be a local variable:

Basifier base (param);

In the latter case the lifetime of the object is directly tied to the function invocation. This can't work in safe garbage-collected languages because you can't remove the object before you know there are no more pointers to it, and this is incompatible with stack allocation.


So, this was planned to be a rant against excessive objectification and turns out to motivate me to actually use the basifier class. Will eventually show up here; look for "javac".

Tuesday, December 30, 2008

svn: BAD

Executive summary: Read about svn merge --reintegrate and weep.

We waited long for subversion 1.5 to come out. It was hyped to have proper merge support and thus be a real step forward for cvs users. Actually we were using cvsnt which already had some merge support until they broke part of it, and we kept using a patched version.

Anyway, 1.5 finally came out this summer, and since we did not only want to keep having merge support but the history-preserving renaming of subversion as well, the step was done. (We were not migrating but only using svn for new projects.)

Then I started reading the updated svnbook, and wondered: Why on earth is there an option --reintegrate? It turns out that you can open a feature branch and merge repeatedly into it, to keep it in sync with new stuff on the trunk. But you can merge back into the trunk, or wherever you came from, only once, and then the feature branch must be killed because further merging won't work.
And then it turns out that svn 1.5 was released with this half-working merge support on purpose because it was overdue already, and that they managed to represent the merge info in the repository in a way that makes a correct implementation impossible, and thus it won't happen before 1.6 or so.

And in nearly the same time frame, another star appeared and ran circles around svn in terms of ease of use (just count the number of times you need to type absolute urls in svn commands), possible workflows, and breadth of utilities. And it's a nice svn client, too.

Tuesday, December 23, 2008

git svn without root

git svn is my favorite way of working with svn. Uses less space than a native svn sandbox, still contains the complete bloody history, and doesn't fool grep-find with the pristine copies.

Installing both the right way isn't quite easy. For git you just need the standard install, but you need to install svn including its perl bindings. And you can control the installation location of svn with the usual --prefix=$HOME/local to configure, but this does not change the place where make install-swig-pl wants to place the shared libraries and other perl binding code. You can only change that by doing some steps manually:

tar xzf subversion-deps-1.5.4.tar.gz
cd subversion-1.5.4
CFLAGS=-fPIC ./configure --prefix=$HOME/local
make install
make swig-pl-lib
make install-swig-pl-lib
cd subversion/bindings/swig/perl/native
perl Makefile.PL PREFIX=$HOME/local
make install

Basic point: You need to invoke the perl make separately, and tell it where to install. Seems not to be so easy to include an option for that into the toplevel configure?

Also note the need to manually say CFLAGS=-fPIC even though the target system (Novell SuSE) isn't exactly unusual, and configure should figure this out.

Anyway, the perl library path (PERL5LIB) seems to already include $HOME/local, so after the above commands, git svn works. If you use a different location you need to fix PERL5LIB.

Saturday, December 20, 2008

git on svn

git can push onto webdav-enabled http servers.

svn is a webdav-enabled http server.

Unfortunately, it does not quite work. Cloning via http://svnhost/whatever.git/ works, but I can't push back there. It says error: Error: no DAV locking support, and I tend to believe that. svn can lock, but apparently not over http and svn urls, and equally apparently locking is not enabled on our server. And I can't change that, and setting up my own svn server just to test that is out of the question.

Not to mention the usefulness of this exercise. Operating a git repository versioned in svn is...strange. It just would enable one to really use git and still claim to work with svn.

Tuesday, December 16, 2008

Under attack

I have a DSL link home, and by now I operate a regular unix system for the router, and it's the first system I have with an internet-facing sshd. Now, every once in a while some bot comes along and seems to try out a lot of account/password combinations, notwithstanding the fact that I only enabled public key authentication.

Idea: Whenever massive login attempts are detected, just reflect further incoming connection attempts from the same address back to that address. Thus the bot just attacks its own machine.

Question: How to do this without hacking the sshd itself? Possibly temporary static nat rules?

Saturday, November 08, 2008

git usage

cvs was easy to take on, even though I didn't know any version control system beforehand.

git was harder, even though I did know some systems by then.

The difference: cvs has basically one way to work with it, while with git you are much freer in how you want to work. With freedom comes the problem of choice and the need for experience. For example git practically doesn't let you destroy any history, but sometimes it can be tough to find out how to recover from a particular mistake.

Also in the mix: When I started cvs, I had nothing. When I started with git, I had quite some history to import and to deal with.

As long as you don't do branches (and with plain cvs, you'd better not), cvs is just a way of sharing a common tree without risking accidental overwrites. With git there is a wealth of things you can do, and of things you can do nearly right. I don't think that I am yet in a position to foist git on a bunch of developers and to give them enough training so that they can get along without big mishaps or frustration. Especially not for windows guys.

So mostly I still use git-svn, live with the company decision to use svn, and sometimes use git hackery for special effects, without anybody else needing to know that git was even involved.

Friday, November 07, 2008

Comment notification

Blogger bites too. I just noticed that there are some comments here, and I remember turning on email notifications for comments, and relied on these.

Consequence: Only today did I even notice the existence of any comments on this blog, for which I do want to thank you, dear readers!

(Note to self: Rework the post labels.)

Priorities and signs

This was almost going to be titled 'signs are hard'. I had a problem: a simple timer queue implementation did not quite work. Some timers seemed to vanish from the java.util.PriorityQueue, or were in the wrong order. For lack of other ideas, I tried changing the sign of the compare method, to see if I got that wrong. Letting the handler thread wait for and pick the latest timer is not a good idea. But alas, that wasn't it, and there goes the title.

Google ("java PriorityQueue remove"): Somewhat unexpectedly, the first result is not the documentation of the queue but rather exactly the report for the bug that is biting me: remove does remove the first element it finds that compares equal to its argument, not that exact object: Voila: wrong element removed; that timer never fires, I'm not getting #ticks.

Good. So now I know what's wrong and can finally implement a strategy to do fewer removes in the first place. Many timers here serve as a timeout and leave the queue not at the front but by remove. And the timeout is constant. So if a timer is set to a later time I can just leave it in the queue and reposition it when it finally appears up front.
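The strategy of leaving postponed timers in the queue can be sketched as follows. This is a minimal illustration, not the actual implementation; the names LazyTimerQueue, Timer, due, queuedAt, postpone, cancel and pollDue are all made up here:

```java
import java.util.PriorityQueue;

class LazyTimerQueue {
    static final class Timer {
        final String name;
        long due;          // when the timer should really fire; may be moved later
        long queuedAt;     // sort key; only changed while the timer is outside the queue
        boolean cancelled;
        Timer (String name, long due) { this.name = name; this.due = due; this.queuedAt = due; }
    }

    private final PriorityQueue<Timer> queue =
        new PriorityQueue<Timer> ((a, b) -> Long.compare (a.queuedAt, b.queuedAt));

    Timer add (String name, long due) {
        Timer t = new Timer (name, due);
        queue.add (t);
        return t;
    }

    // Neither operation calls queue.remove (Object); both only mark the timer.
    // postpone only moves a timer later, which is the case described above.
    void postpone (Timer t, long newDue) { if (newDue > t.due) t.due = newDue; }
    void cancel (Timer t) { t.cancelled = true; }

    // Returns the next timer that is due at time now, or null if there is none.
    Timer pollDue (long now) {
        while (!queue.isEmpty () && queue.peek ().queuedAt <= now) {
            Timer t = queue.poll ();
            if (t.cancelled) continue;      // dropped when it reaches the front
            if (t.due > t.queuedAt) {       // was postponed: reposition it now
                t.queuedAt = t.due;
                queue.add (t);
                continue;
            }
            return t;
        }
        return null;
    }
}
```

The sort key is kept separate from the due time so that an element's ordering never changes while it sits inside the queue.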

Also there is a decision to be made whether I leave the workaround in. It's fixed somewhere in 1.6 (b51?), and I can't rely on the JVM always being that new.

Wednesday, September 24, 2008

cvs to svn via git

It actually works. The problem: I wanted to take the history of a partial cvs repository into svn. Directly using the tools was out for two reasons: First, one needs access to the svn repository itself (at least I think so, and didn't bother to check because of the second reason). Second, I wanted to patch the paths of the java packages contained therein. Apparently eclipse doesn't quite want to deal with, so it needs to be converted to even though the company only owns the former domain.

Anyway, I did not figure out yet how to use git-cvsimport and git-svn both on the same repository sensibly; and I think git-svn rewriting of the commits doesn't quite make that feasible. But the problem already starts with getting both histories to have a common ancestor. Unfortunately git does not have a null commit as the universal base.

Ok, final approach: Create project base directories (the ttb) in svn, and do a git svn clone on that. Separately, use git-cvsimport to get the history from cvs into git. (Caveat: The approach only handles a single linear history well.) Use git format-patch --root commit to get a series of patches, and run those through sed -e (which patches directory names as well as imports and package declarations). Then apply (git-am) the resulting commits in the git-svn repository, and git-svn dcommit them. Done.

Except that my link to the svn repository isn't exactly fast at the moment (roundtrip at about a second), and the 60 commits took two hours. svn does not seem to like slow links; a simple tag operation (svn cp trunk tag/some) on a small thingy took me more than a minute over GPRS once.

Here are the complete commands:
mkdir myproj-cvs
cd myproj-cvs
git-cvsimport -p x -v -k -a -d :pserver:krey@localhost:/opt/cvs mystuff/myproj
git-format-patch -o out --root `cat .git/refs/heads/master`
mkdir mod
cd out
for i in *.patch; do sed -e s:company_name:companyname:g $i >../mod/$i; done
cd ../..
svn mkdir http://localhost:4080/repos/mystuff/myproj -m 'base dir'
svn mkdir http://localhost:4080/repos/mystuff/myproj/trunk -m 'trunk dir'
git-svn clone -s http://localhost:4080/repos/mystuff/myproj
cd myproj
git am ~/myproj-cvs/mod/*
git svn dcommit

Monday, September 22, 2008

Inband signalling is evil

In telephony, inband signalling is, or rather was, the control of circuit setup and teardown by tone signals within the speech band that is also transmitted from speaker to listener. It had the unfortunate effect that you could use a specific whistle on the phone and lose the connection. It soon got exploited in more creative ways.

Anyway, in C++'s std::string often the same thing happens. When in C, you have a char*, and you can null it to mark the no value case. Not so in C++. You need a special value, and usually this is the empty string "". Fine and dandy as long as you can be sure that that value will never actually occur. Once this happens all the code will already be riddled with if (val != "") and you never find them all. Bad luck, just like with Y2K.

Neither do you have Maybe, the Haskell way of avoiding null pointer exceptions completely.

Friday, February 15, 2008

Dr. No vs. anonymous functions

You may remember the first James Bond movie where the hero gets into contaminated areas and thus gets the decontamination treatment. Now, this doesn't just happen but is shown in quite some detail. As if to show off 'hey, look, we know all this stuff' instead of simply assuming it.

Now there is a similar thing with the syntax of anonymous functions. In some languages it is like function (x) return 2 * x; end while others say the same as \x{2 * x}. The shorter syntax stands less in the way and makes it possible to write, say, list.filter (\x{ == quest}).map (\x{x.zipcode}) in one line, while the former notation will cause massive keyword clutter hiding the actual operations.

A case can be made that the longer syntax was deliberately chosen to make clear that the use of anonymous functions (plus closures) is not encouraged in this language. I don't assume that it is a case of show-off, as in Cobol which needs to make clear that it can DISPLAY or COMPUTE something, as if this wasn't to be expected of a computer language. But the question remains whether, in any specific language, the form of anonymous functions wasn't just chosen because of its similarity to named functions. In named functions this is no big deal as their opening has a line to itself anyway. Anonymous functions need to be unobtrusive to be really useful, while the more verbose forms make many uses unwieldy. The question is whether the language designers really wanted to discourage that use or whether they didn't think quite that far in that direction, not having actively used functional style themselves.
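For comparison, the filter and map chain mentioned above fits on a few short lines with the lambda syntax that Java gained in version 8. A sketch only: the Address type and its city field are invented for illustration (only zipcode appears in the text), as is the class name ZipcodeLookup:

```java
import java.util.List;
import java.util.stream.Collectors;

class ZipcodeLookup {
    // Hypothetical record type; the city field is an assumption.
    static final class Address {
        final String city;
        final String zipcode;
        Address (String city, String zipcode) { this.city = city; this.zipcode = zipcode; }
    }

    // Keeps the addresses whose city equals quest and returns their zipcodes.
    static List<String> zipcodesFor (List<Address> list, String quest) {
        return list.stream ()
                   .filter (x -> x.city.equals (quest))
                   .map (x -> x.zipcode)
                   .collect (Collectors.toList ());
    }
}
```

The lambdas themselves are short; the remaining clutter comes from stream () and collect (), not from the function syntax.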

Monday, February 04, 2008

The other ternary operator

There is an interesting stain on OO languages like C++ and Java, namely with member function invocation. Normally, a complex expression can be taken apart, and the intermediate value stored in a variable. int a = 2 * (3 +1); can equally be written int helper = 3 +1; int a = 2 * helper;. However, this breaks with obj.meth (). When you try instead to say helper = obj.meth; helper ();, you will run into a problem: What is the type of helper? It should be a function pointer/reference. Java does not even have functions; in C++ you are equally at a loss for a proper type. (I'm not going into the fact that the parameter types, aka the signature, actually form a part of the method name.)

Indeed the compilers simply treat the combination a.b(c) as a single operator. In Java the method and member name spaces are actually separate; a method can have the same name as a member. (Does that make it a Java-2?)

The only language that makes this explicit is Lua. Member invocation is obj:meth(arg), which is the same as obj.meth(obj,arg) (except that obj is only evaluated once).

I don't know of any language that does it properly. Even though a member function exists as code only once, logically there is a distinct function for each object. The member selection needs to bind the function code address and the object pointer together, into something that is quite similar to a closure.
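Java 8 later added something that comes close to this binding: a method reference on an object, written obj::meth, packages the method and the receiver into one value. A minimal sketch; the Counter class and the names BoundMethod and demo are made up for illustration:

```java
import java.util.function.Supplier;

class BoundMethod {
    static final class Counter {
        private int n;
        int next () { return ++n; }
    }

    static int demo () {
        Counter obj = new Counter ();
        // obj::next binds the code of next and the object obj into one value.
        Supplier<Integer> helper = obj::next;
        helper.get ();          // same effect as obj.next ()
        return helper.get ();   // second call, on the same bound object
    }
}
```

The type of helper is a functional interface rather than a true function type, and the call is spelled helper.get (), so the decomposition is still not quite symmetrical with the int example.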

Custom syntax: Macros and closures

I'm very interested in getting custom syntax for this and that, where I get annoyed by C++ or especially Java simply not letting me do any. For example, in Java you need to write all these EJB-style accessors yourself. In ruby you just say attr_reader :attr1, :attr2, and when reading other people's code you can be sure that there isn't a typo that lets getAttr1 return attr2 by accident.

Just as I was embracing the idea that macros are absolutely necessary for much stuff, ruby came along and brought block arguments. And tada, you can do map and everything without macros, just as regular functions. Some guys even ported the idea to java. So I became unsure whether macros are really necessary. Ok, basically I still believe that because (programmatic) macros can do anything; I think they are the best way to integrate custom code generators.

And then came along Raganwald and wanted something like ph = find_whatever()&&.author&&.phone: Call find_whatever, and if it returns something, get the author, and if there is one, get his phone number. Otherwise the whole expression is nil.

Now it is amazing that you can get ruby to do something like that at all, even though it cheats on two points: It does not use the intended syntax (obj&&.name) but (note the conspicuous second dot), and it does not actually take the shortcut but instead goes through the rest with a dummy object.

But he does this not just to avoid the helper variables, but also to get a left-to-right reading. You just can't do that in lisp, for instance. (Ok, it's going to be the inverse.)
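As a library rather than a syntax solution, the same left-to-right chain can be approximated in Java 8 and later with java.util.Optional. A sketch under assumptions: the Book and Author types and the method phoneOf are hypothetical stand-ins for whatever find_whatever () returns:

```java
import java.util.Optional;

class SafeNavigation {
    // Hypothetical types standing in for the result of find_whatever ().
    static final class Author {
        final String phone;
        Author (String phone) { this.phone = phone; }
    }
    static final class Book {
        final Author author;
        Book (Author author) { this.author = author; }
    }

    // Returns the phone number, or null if any link in the chain is missing.
    static String phoneOf (Book found) {
        return Optional.ofNullable (found)
                       .map (b -> b.author)   // a null result makes the Optional empty
                       .map (a -> a.phone)    // not called on an empty Optional
                       .orElse (null);
    }
}
```

Like the ruby trick, this still walks through every step of the chain with an empty wrapper instead of jumping out of the expression.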

But why this hit me: While I have some idea how I would bring macros into a language with syntax, I have no idea yet how I would make &&. definable. The problems here:
  • When you get to the &&. you are already in the middle of the expression the macro would need to replace, so there would be a need to allow operators to be a kind of macro

  • Member selection is not a binary, but a postfix operator! While it looks like one, the right hand side is not an expression, but a compile-time constant. (In a sense there are many different operators, one for each possible member name.)

I guess I need to think about that some to enable user-defined operators not only as functions, but also as macros.

Saturday, February 02, 2008

Burned by override: Limits of refactoring by compiler

In OO programming (here: java) there is a difference between redefining a method in a subclass, and adding a new method in a subclass. It is also quite easy to accidentally add one instead of overriding one by having a subtly different method signature.

Then there is 'refactoring by compiler' as I call it. Just do the intended interface change at one place and be guided by the compiler errors to all the places that need adaptations to that. In sufficiently static languages that does actually work (and it is a bit hard to get a program that compiles but does not work).

Anyway in this case I changed, compiled, fixed, and the program failed in very strange ways. Well, it was the case from above: I started my refactoring by changing the signature of an overridden method, thereby making it into a new one that never got invoked. The macro did no longer think it was one, it went downhill.

Good thing scala requires you to declare whether a method is an override. Java offers this too, as the optional @Override annotation, but my personal java-fu is from times too old to actively use it (or to sprinkle code with finals, for that matter).
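The trap and the guard can be shown side by side. A minimal sketch with invented names (OverrideDemo, Base, Broken, Fixed, handle, call); it is not the code from the refactoring above:

```java
class OverrideDemo {
    static class Base {
        String handle (Object arg) { return "base"; }
    }

    static class Broken extends Base {
        // The signature differs (String instead of Object): this is a new overload,
        // not an override. Adding @Override here would be a compile error.
        String handle (String arg) { return "broken"; }
    }

    static class Fixed extends Base {
        @Override
        String handle (Object arg) { return "fixed"; }
    }

    static String call (Base b) {
        // The static type is Base, so only handle (Object) is considered here.
        return b.handle ("x");
    }
}
```

Calling call (new Broken ()) silently runs the base implementation, which is the kind of failure described above; with the annotation in place the compiler reports the mismatch instead.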

Back to coding...