Saturday, February 11, 2012

method_missing, C style

Given: We need to interface to communication in S-expressions. In C. We are given a library that does the communication and S-expr representation in some C data structure. We expect commands like

(get "file" 2500 250)

that is, read from a given file at given position and length, and return the data.

Now, doing this based on elementary functions like sexprIsCons() and sexprGetCar () is pretty unwieldy. It would be much nicer to simply do

const char *file;
int pos, len;
if (fs_ps_get_SII (sx, file, pos, len)) {
    unsigned char *bp;
    int blen;
    get_data (file, pos, len, &bp, &blen); // TODO: error handling
    return fs_mk_SBy (file, bp, blen);
}

We'd like to have one function that parses the incoming S-expression into some variables, and another to compose the reply. (The fs_ prefix is used since this is C, where a name prefix is the usual method of namespacing.)

Now, we could go along and implement those functions by hand. But since we even encoded the exact data types we expect into the function names, we could just as well generate them. The namespace prefix and the _mk_ (make) and _ps_ (parse) prefixes suffice as a discriminator against catching other names, and thus we can just scan the whole source file(s) for these patterns and generate the corresponding functions. Voilà, auto-generation of missing functions.
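The scanning step could be sketched roughly as follows. This is only an illustration, not the generator actually used: the names fs_scan_next and struct fs_name and the buffer sizes are made up here, and the sketch merely finds the fs_ps_/fs_mk_ identifiers and splits off the trailing type letters (SII, SBy); emitting the C code for each is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One discovered name, split into parts (hypothetical layout). */
struct fs_name {
    char kind[3];    /* "ps" (parse) or "mk" (make) */
    char rest[64];   /* everything after the prefix, e.g. "get_SII" or "SBy" */
    const char *sig; /* points into rest: the type letters after the last '_' */
};

static int is_ident_char(int c) { return isalnum(c) || c == '_'; }

/* Find the next fs_ps_* or fs_mk_* identifier at or after src.
   Returns a pointer just past it, or NULL when there is none left. */
const char *fs_scan_next(const char *src, struct fs_name *out)
{
    assert(src != NULL && out != NULL);
    for (const char *p = src; *p != '\0'; p++) {
        /* Only match at the start of an identifier. */
        if (p != src && is_ident_char((unsigned char)p[-1]))
            continue;
        if (strncmp(p, "fs_ps_", 6) != 0 && strncmp(p, "fs_mk_", 6) != 0)
            continue;
        const char *q = p + 6;
        size_t n = 0;
        while (is_ident_char((unsigned char)q[n]))
            n++;
        if (n == 0 || n >= sizeof out->rest)
            continue;
        memcpy(out->kind, p + 3, 2);   /* the "ps" or "mk" part */
        out->kind[2] = '\0';
        memcpy(out->rest, q, n);
        out->rest[n] = '\0';
        const char *us = strrchr(out->rest, '_');
        out->sig = us ? us + 1 : out->rest;
        return q + n;
    }
    return NULL;
}
```

Calling fs_scan_next repeatedly, each time with the pointer it returned, walks through all such names in a source text; a generator would then turn each type-letter string into the matching parse or make code.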

Note: The _ps_ functions are actually macros; otherwise we'd need to pass in the addresses instead of the variables themselves.

Exercise for the reader: Also implement command collection so that all functions named fs_cmd_get_XXX are directly put into a command table. Then a server would need only the command implementations itself and the global setup code.
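As a hint for the exercise, the generated command table might look something like the sketch below. Everything in it is an assumption for illustration: the handler signature, the stub bodies, the second command (stat), and the names fs_cmd_table and fs_cmd_lookup are invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler signature; a real one would take the parsed
   S-expression and return the reply. */
typedef int (*fs_cmd_fn)(const char *request);

struct fs_cmd_entry {
    const char *name;
    fs_cmd_fn   fn;
};

/* Hand-written command implementations (stubs for this sketch). */
static int fs_cmd_get_SII(const char *request) { (void)request; return 1; }
static int fs_cmd_stat_S(const char *request)  { (void)request; return 2; }

/* The part a generator would emit after scanning for fs_cmd_ names. */
static const struct fs_cmd_entry fs_cmd_table[] = {
    { "get",  fs_cmd_get_SII },
    { "stat", fs_cmd_stat_S  },
};

/* Look up a command by name; returns NULL for unknown commands. */
fs_cmd_fn fs_cmd_lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof fs_cmd_table / sizeof fs_cmd_table[0]; i++)
        if (strcmp(fs_cmd_table[i].name, name) == 0)
            return fs_cmd_table[i].fn;
    return NULL;
}
```

The server's main loop would then only need to take the first symbol of each incoming S-expression, look it up, and call the handler.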

Sunday, March 06, 2011

Skewed distribution

Distributing items (IP packets, phone calls, http requests) randomly onto a set of machines (say, four) is easy (in C here):

int slot = random_nonnegative () % 4;

The number space returned by the random number generator is big enough that the items are essentially evenly distributed even if the number of slots isn't a divisor of the size of the random number space.

I want a skewed distribution, however. The slots shall be loaded unevenly. Thus:

int slot = random_nonnegative () % 4;
slot = random_nonnegative () % (slot + 1);

That is, if slot is 3 after the first line, then we have an even chance to get to any of the slots in the second line. If it is 0 we can only end up there. In essence there are four ways to get to slot 0, three to slot 1, two to slot 2, and one to slot 3. Zeroth-order intuition would thus say that four out of ten (1+2+3+4) items would end up in slot zero.

Except that the experimental results don't match, and for a reason. When slot is 3 after the first line, a fourth of the items go to each slot; when it is 2, a third go to each of the first three slots; and so on. Thus slot 3 receives a quarter of a quarter, or 1/16th of all items (and not one tenth); slot 2 receives a quarter of a quarter likewise, plus a third of a quarter via slot being 2 on the first line; slot 0 finally gets 1/16+1/12+1/8+1/4, which is 3/48+4/48+6/48+12/48, which is 25/48, or about 52%. Which is a bit more than the 40% of our first guess.
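The same sum can be written down for any number of slots. The function below is a small sketch (the name skewed_slot_probability is made up here) that computes the exact share of each slot, assuming both random draws are perfectly uniform.

```c
#include <assert.h>

/* Probability that the two-stage draw
       slot = r1 % n;  slot = r2 % (slot + 1);
   ends in slot k, assuming both draws are exactly uniform. */
double skewed_slot_probability(int n, int k)
{
    assert(n > 0 && k >= 0 && k < n);
    double p = 0.0;
    /* The first draw yields j with probability 1/n; the second then
       picks uniformly from 0..j, which can hit k only when j >= k. */
    for (int j = k; j < n; j++)
        p += 1.0 / n / (j + 1);
    return p;
}
```

For n = 4 this gives 25/48, 13/48, 7/48 and 3/48 for slots 0 through 3, which add up to 1.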


char *target = targetarray [
    random_nonnegative () % (
        random_nonnegative () % (sizeof (targetarray) /
                                 sizeof (targetarray [0]))
        + 1)];

(That is the same two-stage draw folded into a single expression, for the golfing old-style C hacker.)

Wednesday, December 23, 2009

svn needs IDE integration

It's funny; all the world craves svn integration for their IDE and doesn't want git because of lack of such integration. On the other hand I am quite happy without the integration, but then I was raised on a command line.

And then I noticed that, for the usual IDEs (java dev, that is), you need IDE integration for svn for a very simple reason: refactoring. If you rename or repackage a class, the source file changes location, and svn needs to record that as a file rename, and you need to tell it so. Now, if the IDE has moved the file, you can't do the move again with svn, and that would be the only way to make svn know about the move. (You could stop the IDE, move the file back and then with svn move it forth again, but...shudder.)

Thus the IDE must be able to talk directly with svn. It's not just a question of avoiding a suboptimal command line, it's a necessity. With git, on the other hand, there is no such need as it does not directly track renames anyway and is much smarter about them and merging.

Sunday, November 15, 2009

svn: more bad

Ranting about tortoisesvn and it committing new files with CR/LF in them, I finally found that there is apparently no way to make svn behave the right way (in our sense) of doing svn:eol-style=native automatically. You need to do a setup like this. Yet another misdesign. (And if you work on two projects where these defaults differ, then you are out of luck.)

By the way, somehow nobody thought that a line-by-line diff of svn:externals would be a good thing. We usually have one directory with six or so externals, and when you change one of them svn diff just shows the whole property text as changed, and not individual lines. Tortoisesvn is smarter here (as in other places), reinforcing my claim that with svn you need GUI support because the command line interface isn't exactly pretty.

Sunday, November 08, 2009

svn: BAD

svn is just broken.

Google for git commit. First link points to the current documentation. Ditto for svn commit. First hits point to outdated versions of the manual (usually 1.0, maximum 1.5).

Check out a module. Unfortunately, the URL to check out usually ends in trunk so the default sandbox directory name isn't really helpful. Likewise, it took 'til 1.6 that they found out that typing the repository URL in many commands is a waste, and allowed ^/repo-relative paths not only in svn:externals but also in commands.

Just for kicks try an svn ls http://... operation over GPRS. Will take on the order of half a minute because GPRS round trip time is about a second, and svn ls does a good dozen http requests to the server, all serialized, but not reusing the connection. It is actually a good idea to prefix commands with ssh elsewhere; the ssh login is faster, and the svn command is then done on fast links. (Doesn't work for commands involving a sandbox, of course.)

svn import can only import a given tree into a new location within the svn repo; it can't do updates, as needed to import newer versions of vendor software while keeping the file identity required for smooth future merges. There is a script in contrib that can do that, but it never got integrated into the import command proper.

Likewise, don't even expect a command that can create an archive (tar/zip) from the versioned files in the current directory (git archive), or that can delete all unversioned files in the sandbox (git clean). Or that tells you a human-readable name for the current revision based on the last tag (git describe).

And, of course, there is the big thing: the hyped merge support of svn 1.5 is broken by design. Isn't fixed in 1.6, and probably can't be fixed without changing the repository format. (I suspect the whole mergeinfo property stuff is just wrong.) In the meanwhile, and in a much shorter timespan than from the announcement of svn merge support to its eventual functional delivery, git appeared, and just did it right.

I don't like that. We've been waiting for svn 1.5 because of the merge support, and then started to switch. At the same time git appeared on my radar; I could use git cvsimport and git filter-branch to do our conversion from cvs to svn, in a way that allows pulling future changes from cvs to svn as well, and without manual intervention. Unfortunately I lost all liking for svn in the process, because git turned out to be just much better tooled, capable, and flexible. And faster in its development; its version number is bound to pass that of svn soon.

Thursday, October 22, 2009

The awesomeness of no rename

git does not store renames. All it does is store each version of the whole tree of the managed project. It is the diff and merge tools that look at the trees and file contents and notice that a file has been renamed.

For example, I had a project where a program was converted to C++, piecewise. One file was originally foo.c, then foo.cpp was created by copying and fixing the original C source. foo.c was not removed until later, so for a few commits both existed simultaneously. Indeed this was in CVS, and I just imported the stuff to git, to work on newer stuff there.

Now it was time to merge the line originating in CVS into the work branch. git merge just looked at the changes that needed to be merged all at once, and saw that foo.c was gone and foo.cpp was new over that stretch, and that they were 95% similar, so it assumed this was a rename, and properly merged the rename into the work branch.

That is the power of not doing or saving renames.

Monday, September 28, 2009

git and cvs: Coexistence

There is a relatively trivial way to use git on trees managed by cvs, which I often use to carry around cvs projects for new work and enjoy the distributed nature (which comes in quite handy when on a train). Recipe: Check out the target tree with cvs, go into the root, git init, and cvs-files | xargs git add. One git commit -m initial and you are ready to go.

Now you can clone around as you like, and do cvs updates in the gitted sandbox, and commit those into git, and back. Only thing is that on the way back to cvs you manually need to track file additions/deletions.

The script cvs-files looks like:

find * -name CVS -type d | while read cdir; do
    dir=`dirname $cdir`
    # Top-level CVS directory: no prefix; otherwise use "subdir/".
    if test X$dir = X. ; then dir=; else dir=$dir/; fi
    sed -ne 's:^/\([^/][^/]*\)/.*$:\1:p' /dev/null ${dir}CVS/Entries* | \
    sort -u | while read name; do
        if test -r $dir$name; then
            echo $dir$name
        fi
    done
done

It just goes through the CVS/Entries files to find out what's under CVS control to initially put those into git.