Side Notes: 2011

Wednesday, November 23, 2011

Cygwin git-svn messed up

I upgraded my Cygwin from version 1.5 to 1.7 (finally), and found that the git-svn command was broken:

$ git svn rebase
Can't locate SVN/Core.pm in @INC (@INC contains:
/usr/lib/perl5/site_perl/5.10
/usr/lib/perl5/site_perl/5.14/i686-cygwin
/usr/lib/perl5/site_perl/5.14
/usr/lib/perl5/vendor_perl/5.14/i686-cygwin
/usr/lib/perl5/vendor_perl/5.14
/usr/lib/perl5/5.14/i686-cygwin
/usr/lib/perl5/5.14
/usr/lib/perl5/vendor_perl/5.10
/usr/lib/perl5/site_perl/5.8 .) at /usr/lib/git-core/git-svn line 42.

Usually this error indicates that the subversion-perl package is not installed, but that was not the case, as the cygcheck output shows:

$ cygcheck -c subversion-perl
Cygwin Package Information
Package              Version        Status
subversion-perl      1.7.1-1        OK

As it turned out, the problem was in the package dependency management. The @INC list in the error above shows that perl 5.14 looks for architecture-specific modules such as SVN/Core.pm in the 5.14 directories, for example /usr/lib/perl5/vendor_perl/5.14/i686-cygwin. But the latest subversion-perl installed it in a different place:

$ cygcheck -l subversion-perl
...
/usr/lib/perl5/vendor_perl/5.10/i686-cygwin/SVN/Core.pm
...

The problem was solved by downgrading perl from 5.14.x-x to 5.10.x-x.

If you have the same problem, check which version of the perl package you have installed:

$ cygcheck -f /usr/bin/perl
perl-5.10.1-5

This Perl version must match the one subversion-perl was built for, which you can read from the vendor_perl/5.x path listed by cygcheck -l subversion-perl.

Saturday, November 05, 2011

Ford Marbles

I found these marvelous renderings of Ford circles on Flickr, and I can't help but share them here.

Like Thomae's function, Ford circles are another visual representation of the rational numbers. You can explore them in an interactive Wolfram demonstration.
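For reference, the Ford circle of a reduced fraction p/q has radius 1/(2q²) and touches the x-axis at p/q. The circles of p/q and r/s are tangent exactly when |ps − qr| = 1, and are otherwise disjoint. The minimal Java sketch below (class and method names are hypothetical) prints the circles for small denominators. It checks the tangency rule both from the determinant and geometrically.

```java
// Sketch: Ford circles of reduced fractions p/q and the tangency rule.
final class FordCircles {
    // Radius (and height of the center) of the Ford circle at p/q: 1 / (2 q^2).
    static double radius(int q) {
        return 1.0 / (2.0 * q * q);
    }

    // Combinatorial test: circles at p/q and r/s touch iff |ps - qr| = 1.
    static boolean tangentByDeterminant(int p, int q, int r, int s) {
        return Math.abs((long) p * s - (long) q * r) == 1;
    }

    // Geometric test: distance between centers equals the sum of the radii.
    static boolean tangentGeometrically(int p, int q, int r, int s) {
        double dx = (double) p / q - (double) r / s;
        double dy = radius(q) - radius(s);
        double distance = Math.hypot(dx, dy);
        return Math.abs(distance - (radius(q) + radius(s))) < 1e-12;
    }

    static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    public static void main(String[] args) {
        // All reduced fractions in (0, 1) with denominator at most 5.
        for (int q = 2; q <= 5; q++) {
            for (int p = 1; p < q; p++) {
                if (gcd(p, q) == 1) {
                    System.out.printf("%d/%d: center=(%.4f, %.4f) r=%.4f%n",
                            p, q, (double) p / q, radius(q), radius(q));
                }
            }
        }
        System.out.println(tangentByDeterminant(1, 2, 1, 3)); // 1/2 and 1/3 touch
    }
}
```

The geometric check works because the squared distance between the centers equals the squared sum of the radii exactly when (p/q − r/s)² = 1/(q²s²).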

Friday, September 16, 2011

Modulo who?

When a programmer and a mathematician talk about modulus or modulo, there is often confusion about what the term means. For a programmer, modulo is an operator that finds the remainder of the division of one number by another, e.g. 5 mod 2 = 1. For a mathematician, modulo describes a congruence relation between two numbers: a and b are said to be congruent modulo n, written a ≡ b (mod n), if their difference a − b is an integer multiple of n.

These two definitions are not equivalent. The former is a special case of the latter: if b mod n = a, then a ≡ b (mod n). The converse is not true in general. For example, 5 mod 2 = 1, and 1 ≡ 5 (mod 2) because 1 − 5 = −4 is an integer multiple of 2. Likewise, 5 ≡ 1 (mod 2) because 5 − 1 = 4 is evenly divisible by 2, but 1 mod 2 = 1, not 5.
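The difference is easy to see in Java, whose % operator is the programmer's remainder. This minimal sketch uses hypothetical helper names, remainder and congruent. It reproduces the examples above and adds one case where the two notions visibly disagree. For negative operands, % keeps the sign of the dividend, so −5 % 2 is −1 even though −5 ≡ 1 (mod 2). Math.floorMod (Java 8+) returns the non-negative residue instead.

```java
// Sketch: the programmer's remainder versus the mathematician's congruence.
final class ModuloDemo {
    // Remainder operator: the sign of the result follows the dividend a.
    static int remainder(int a, int n) {
        return a % n;
    }

    // Congruence relation: a and b are congruent mod n iff n divides a - b.
    static boolean congruent(int a, int b, int n) {
        return (a - b) % n == 0;
    }

    public static void main(String[] args) {
        System.out.println(remainder(5, 2));      // 1
        System.out.println(congruent(1, 5, 2));   // true: 1 - 5 = -4
        System.out.println(congruent(5, 1, 2));   // true: 5 - 1 = 4
        System.out.println(remainder(1, 2));      // 1, not 5
        System.out.println(-5 % 2);               // -1, although -5 and 1 are congruent mod 2
        System.out.println(Math.floorMod(-5, 2)); // 1, the least non-negative residue
    }
}
```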

The biggest confusion happens when a programmer and a mathematician start arguing about Gauss's famous golden theorem (the law of quadratic reciprocity), where both meanings of modulo can be used.

Saturday, August 06, 2011

Thomae's function

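Thomae's function (a.k.a. Riemann function) is defined on the interval (0, 1) as follows:

$$ f(x) = \begin{cases} \frac{1}{q}, & x = \frac{p}{q} \text{ with } p, q \text{ coprime positive integers}, \\ 0, & x \text{ irrational}. \end{cases} $$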

Here is the graph of this function, with some points highlighted as plus symbols for better visibility.

This function has an interesting property: it is continuous at every irrational number (and discontinuous at every rational one). It's easy to see why once you notice that for any positive ε only finitely many points lie above the line y = ε, namely the fractions p/q with 1/q > ε. So for any irrational number x₀ you can always choose a δ-neighbourhood of x₀ that contains none of those points. Inside that neighbourhood f(x) ≤ ε, while f(x₀) = 0. Since ε is arbitrary, f is continuous at x₀.
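The continuity argument can be made concrete. For a given ε, the minimal Java sketch below (names are hypothetical) lists the finitely many points above y = ε. It then computes the largest δ for which (x₀ − δ, x₀ + δ) avoids all of them. The example irrational x₀ = √2/2 is only approximated by a double, so this illustrates the argument rather than proving anything. For ε = 0.1 there are 27 such points: the reduced fractions with denominators 2 to 9.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the finite set of Thomae points above y = epsilon, and a delta for x0.
final class ThomaeContinuity {
    static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    // Points (p/q, 1/q) strictly above the line y = epsilon; requires epsilon > 0.
    static List<double[]> pointsAbove(double epsilon) {
        List<double[]> points = new ArrayList<>();
        for (int q = 2; 1.0 / q > epsilon; q++) {
            for (int p = 1; p < q; p++) {
                if (gcd(p, q) == 1) {
                    points.add(new double[] {(double) p / q, 1.0 / q});
                }
            }
        }
        return points;
    }

    // Distance from x0 to the nearest such point: the open interval of this
    // radius around x0 contains none of them, so f(x) <= epsilon inside it.
    static double delta(double x0, double epsilon) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] point : pointsAbove(epsilon)) {
            best = Math.min(best, Math.abs(point[0] - x0));
        }
        return best;
    }

    public static void main(String[] args) {
        double x0 = Math.sqrt(2) / 2; // an irrational number in (0, 1)
        System.out.println(pointsAbove(0.1).size());
        System.out.println(delta(x0, 0.1));
    }
}
```

Smaller values of ε add more points, so δ shrinks, but it stays positive for every ε > 0.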

To generate the data file with the point coordinates I used the following Common Lisp program:

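;; Collect all distinct rationals p/q in (0, 1) with q <= max-denominator.
(defun rational-numbers (max-denominator)
  (let ((result (list)))
    (loop for q from 2 to max-denominator do
      (loop for p from 1 to (1- q) do
        ;; Lisp reduces p/q automatically; keeping only fractions already in
        ;; lowest terms pushes each rational exactly once (no pushnew scan).
        (when (= 1 (gcd p q))
          (push (/ p q) result))))
    result))

;; Map each rational x = p/q to the point (x, 1/q) of Thomae's function.
(defun thomae-rational-points (abscissae)
  (mapcar (lambda (x) (list x (/ 1 (denominator x)))) abscissae))

;; Write the points to thomae.dat, one "x y" pair per line,
;; overwriting the file if it already exists.
(defun thomae (max-denominator)
  (let ((points (thomae-rational-points (rational-numbers max-denominator))))
    (with-open-file (stream "thomae.dat" :direction :output
                                         :if-exists :supersede)
      (loop for point in points do
        (format stream "~4$ ~4$~%" (first point) (second point))))))

(thomae 500)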

To create the images I used gnuplot commands:

plot "thomae.dat" using 1:2 with dots
plot "thomae.dat" using 1:2 with points

and Photoshop.

Thursday, June 30, 2011

Rod Johnson on Entrepreneurialism

One thing I think that you really need to be careful of as well, particularly if you, like me, are a programmer, is don’t get carried away writing code. Typically in my experience anyone who is a good programmer is pretty passionate about it, love writing code, get addicted to the process of writing code, feel pretty good about their code basis. As soon as you get down that path you are not thinking straight anymore and now you are increasing your emotional investment, you are having lots of fun writing interesting code and you are no longer in a place mentally where you are going to be trying to find some reason that you shouldn’t write that code. That has been a big lesson for me that the quicker I get to coding, the longer it takes me to ask the kind of questions I should ask upfront.
...
It is really, really hard to decide not to do things. One of the biggest killers of companies is trying to do too much. If you try to take on too many things you will assuredly fail, even if every one of those things is a good thing to do. It is incredibly hard to realize that a particular thing is a good idea, but you are not going to do it.
...
I think the biggest way to decide frankly if you are trading off business priorities is do the boring stuff like, look at the total addressable market, go and talk to customers, figure out what they will pay for. You really need to be guided by what the revenue is likely to be, and make sure you don’t just do something just because it’s cool.

Monday, June 27, 2011

Math and Physics of Benderama

The most recent episode of Futurama involves an interesting formula. The entire plot is based on the Professor's latest invention, the Banach-Tarski Dupla-Shrinker: a machine that produces two copies of any object at 60% scale. It was just a matter of time before Bender found a good use for this machine: replicating himself. Then the two small copies of Bender replicated themselves, making four smaller copies, and so forth. At some point the Professor horrified the crew by explaining that unless they stopped this unlimited growth, the total mass of all Benders would eventually become so big that the entire Earth would be consumed during the replication. As proof, he showed this formula for the mass of all generations of Bender, where M₀ is the mass of the original Bender:

$$ M = \sum_{n=1}^{\infty} \frac{M_0}{n} = M_0 \left( 1 + \frac{1}{2} + \frac{1}{3} + \cdots \right) $$

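This is a perfect toy for a science geek. The first obvious question it raises is whether this formula is mathematically correct. As it turns out, it is not. Generation n consists of 2ⁿ copies, each scaled by 0.6ⁿ in every linear dimension. Volume depends cubically on linear dimension and all copies have the same density, so each copy has mass 0.6³ⁿ M₀. The formula should therefore be the following:

$$ M = \sum_{n=0}^{\infty} 2^n \left(0.6^3\right)^n M_0 = \sum_{n=0}^{\infty} 0.432^n M_0 = \frac{M_0}{1 - 0.432} \approx 1.76\, M_0 $$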

As you can see, the total mass of an infinite number of Benders actually converges to approximately 1.76 M₀, so from the mathematical perspective there is nothing to worry about. But what if our assumption of constant density is invalid? Would it be a problem from the physical perspective? Let's see.
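A quick numeric check of the corrected series follows, as a minimal Java sketch. It assumes exactly the model above: 2ⁿ copies in generation n, each with mass 0.6³ⁿ M₀. The partial sums settle at 1/(1 − 0.432) ≈ 1.7606 M₀ within a few dozen generations.

```java
// Sketch: partial sums of the corrected series, in units of M0.
final class BenderSeries {
    // Mass ratio between consecutive generations: twice as many copies,
    // each with 0.6^3 of the previous mass (constant density assumed).
    static final double RATIO = 2 * Math.pow(0.6, 3); // 0.432

    // Total mass of generations 0..n.
    static double partialSum(int n) {
        double term = 1.0;
        double sum = 0.0;
        for (int k = 0; k <= n; k++) {
            sum += term;
            term *= RATIO;
        }
        return sum;
    }

    // Closed form of the infinite geometric series.
    static double limit() {
        return 1.0 / (1.0 - RATIO);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 5, 10, 50}) {
            System.out.printf("generations 0..%d: %.6f M0%n", n, partialSum(n));
        }
        System.out.printf("limit: %.6f M0%n", limit());
    }
}
```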

Since every new copy is 0.6 the size of the original it was made from, the size of Bender in the nth generation is

$$ L_n = 0.6^n L_0 $$

where L₀ is the size of the original Bender.

This exponential function becomes very small very quickly. In the 154th generation it already reaches the Planck length, after which further replication is physically impossible. Using the Professor's formula with M₀ = 238 kg, the total mass of 154 generations of Bender is H(154) × 238 kg ≈ 1,337 kg, where H(n) = 1 + 1/2 + … + 1/n is the nth harmonic number. That is nothing compared to the mass of the Earth.
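The Professor's formula truncated at 154 generations can be checked the same way. This Java sketch takes the values from the text (154 generations, M₀ = 238 kg) as given. It does not derive the Planck-length cutoff, which depends on Bender's assumed height.

```java
// Sketch: the Professor's harmonic-series formula, truncated at generation 154.
final class ProfessorFormula {
    // n-th harmonic number H(n) = 1 + 1/2 + ... + 1/n.
    static double harmonic(int n) {
        double sum = 0.0;
        for (int k = 1; k <= n; k++) {
            sum += 1.0 / k;
        }
        return sum;
    }

    public static void main(String[] args) {
        double m0 = 238.0; // kg, the value used in the text
        System.out.printf("H(154) = %.4f%n", harmonic(154));
        System.out.printf("total  = %.2f kg%n", harmonic(154) * m0);
    }
}
```

The total comes out at roughly 1,337 kg, about 5.6 times Bender's own mass.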

So we have to admit that, from both the mathematical and the physical perspective, the Professor was wrong, and there was no real threat to the Earth.

Although the Professor's formula doesn't describe the replication process adequately, it's still a beautiful piece of math, because it is the harmonic series. If you want to know why the harmonic series is beautiful and which real processes it describes, read this nice article by John H. Webb.

And don't miss the next episode of Futurama this Thursday :-)

Wednesday, June 08, 2011

Functional Groovy switch statement

In the previous post I showed how to replace chained if-else statements in Groovy with one concise switch. That worked for the special case of an if-statement in which every branch was evaluated using the same condition function. Today I want to generalize the technique to allow different conditions.

Suppose your code looks like this:

if (param % 2 == 0) {
    'even'
} else if (param % 3 == 0) {
    'threeven'
} else if (0 < param) {
    'positive'
} else {
    'negative'
}

As long as every condition tests the same parameter, you can replace the entire chain with a switch. In this scenario param becomes the switch value, and the conditions become case values of type Closure. All we need is a Closure.isCase() method that calls the closure with the switch value. Groovy's Closure class already implements isCase() this way, but we can make the behaviour explicit in the same way as in the previous post. The safest way to do it is to create a category class:

class CaseCategory {
    static boolean isCase(Closure casePredicate, Object switchParameter) {
        casePredicate.call switchParameter
    }
}

Now we can replace the if-statement with the following switch:

use (CaseCategory) {
    switch (param) {
        case { it % 2 == 0 } : return 'even'
        case { it % 3 == 0 } : return 'threeven'
        case { 0 < it }      : return 'positive'
        default              : return 'negative'
    }
}

We can go further and extract the inline closures into variables:

def even = {
    it % 2 == 0
}
def threeven = {
    it % 3 == 0
}
def positive = {
    0 < it
}

After which the code becomes even more readable:

use (CaseCategory) {
    switch (param) {
        case even     : return 'even'
        case threeven : return 'threeven'
        case positive : return 'positive'
        default       : return 'negative'
    }
}
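Java has no closure case labels, but the same ordered, first-match semantics (essentially Lisp's cond) can be sketched with a list of predicate/result pairs. This is a minimal sketch; the Branch class and classify method are hypothetical names, and IntPredicate is the standard java.util.function interface.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Sketch: a cond-like chain of named predicates, tried in order.
final class CondDemo {
    // One branch: a test and the value it selects.
    static final class Branch {
        final IntPredicate test;
        final String result;
        Branch(IntPredicate test, String result) { this.test = test; this.result = result; }
    }

    static final IntPredicate EVEN = it -> it % 2 == 0;
    static final IntPredicate THREEVEN = it -> it % 3 == 0;
    static final IntPredicate POSITIVE = it -> 0 < it;

    // Order matters, exactly like the case labels of the Groovy switch.
    static final List<Branch> BRANCHES = Arrays.asList(
            new Branch(EVEN, "even"),
            new Branch(THREEVEN, "threeven"),
            new Branch(POSITIVE, "positive"));

    static String classify(int param) {
        for (Branch branch : BRANCHES) {
            if (branch.test.test(param)) {
                return branch.result;
            }
        }
        return "negative"; // the default branch
    }

    public static void main(String[] args) {
        for (int param : new int[] {4, 9, 7, -7}) {
            System.out.println(param + " -> " + classify(param));
        }
    }
}
```

Keeping the predicates as named constants mirrors the extracted even, threeven and positive closures.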

Tuesday, June 07, 2011

Nothing new under the sun

Every generation of software developers needs its own fad. For my generation it was Agile, for the generation before it was OOP, and before that it was another big thing. Gerald Weinberg, one of the most influential people in our industry, blogged yesterday about this issue. With over 50 years of experience in software development, he knows what he is talking about. Read his blog post; he has a very good point.

P.S. I wonder what the next big thing will be. Will it be Cloud or Big Data?

Sunday, June 05, 2011

Multimethods in Groovy

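Every time I switch from Groovy to Java I have to remind myself that some things that seem so natural and work as expected in Groovy don't work in Java. One such difference is method dispatch. Groovy supports multiple dispatch: it chooses an overloaded method by the runtime types of the arguments. Java chooses the overload at compile time from the declared types of the arguments, and dispatches dynamically only on the receiver. Therefore the following code works differently in Groovy and Java: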

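// A and B are not public, so all three classes can live in one Main.java
// (javac requires a public class to be in a file of the same name).
class A {
    public void foo(A a) { System.out.println("A/A"); }
    public void foo(B b) { System.out.println("A/B"); }
}
class B extends A {
    public void foo(A a) { System.out.println("B/A"); }
    public void foo(B b) { System.out.println("B/B"); }
}
public class Main {
    public static void main(String[] args) {
        A a = new A();
        A b = new B(); // declared type A, runtime type B
        a.foo(a);
        b.foo(b);      // Java picks foo(A) from the declared type; Groovy picks foo(B) at runtime
    }
}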

$ java Main
A/A
B/A

$ groovy Main.groovy
A/A
B/B

Wednesday, June 01, 2011

Reversing Groovy switch statement

Recently I've been working on Groovy code that had many methods with long multibranch conditionals like this:

def parse(message, options) {
    if (options.contains('A')) {
        parseARule message
    } else if (options.contains(2)) {
        parseSmallDigitRule message
    ...
    } else if (options.contains(something)) {
        parseSomeRule message
    } else {
        parseSomeOtherRule message
    }
}

Although this code works, it is hard to see which branch is called under which condition. It would be much better if we could replace it with something like Lisp's cond macro. The best candidate for such a task in Groovy is the switch statement. If we could refactor the code above into something like the following, it would significantly improve readability:

def parse(message, options) {
    switch (options) {
        case 'A' : return parseARule(message)
        case 2   : return parseSmallDigitRule(message)
        ...
        case ... : return parseSomeRule(message)
        default  : return parseSomeOtherRule(message)
    }
}

Unfortunately, this code doesn't work out of the box in Groovy, but it works if we do some metaprogramming.

The switch statement works a bit differently in Groovy than in Java. Instead of equals(), it uses the isCase() method of the case value to match it against the switch value. The default implementation of isCase() falls back to equals(), but some classes, including Collection, override this behaviour. That's why in Groovy you can do things like this:

switch (value) {
    case ['A','E','I','O','U'] : return 'vowel'
    case 0..9                  : return 'digit'
    case Date                  : return 'date'
    default                    : return 'something else'
}
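The rough Java emulation below makes these matching rules concrete by testing a case value against a switch value. It is an illustration only, not Groovy's actual implementation. It covers just the three cases used above: collection membership, class membership, and the equals() fallback. The range 0..9 is modelled as a list of the integers 0 to 9.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Date;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Sketch: emulating caseValue.isCase(switchValue) for three kinds of case values.
final class IsCaseDemo {
    static final List<String> VOWELS = Arrays.asList("A", "E", "I", "O", "U");
    static final List<Integer> DIGITS =
            IntStream.rangeClosed(0, 9).boxed().collect(Collectors.toList()); // like 0..9

    static boolean isCase(Object caseValue, Object switchValue) {
        if (caseValue instanceof Collection) {
            return ((Collection<?>) caseValue).contains(switchValue); // membership
        }
        if (caseValue instanceof Class) {
            return ((Class<?>) caseValue).isInstance(switchValue);    // type check
        }
        return caseValue.equals(switchValue);                         // fallback
    }

    // The case labels of the Groovy example, tried in order.
    static String classify(Object value) {
        if (isCase(VOWELS, value)) return "vowel";
        if (isCase(DIGITS, value)) return "digit";
        if (isCase(Date.class, value)) return "date";
        return "something else";
    }

    public static void main(String[] args) {
        for (Object value : new Object[] {"E", 7, new Date(), "x"}) {
            System.out.println(value + " -> " + classify(value));
        }
    }
}
```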

For our purposes we need a kind of reverse switch, where a collection is used as the switch value, and Strings and Integers are used as case values. To do this we need to override the default implementation of the isCase() method on the String and Integer classes. That's impossible in Java, but very easy in Groovy. You can change a method implementation globally by replacing it in the corresponding meta class, or locally with the help of categories. Let's create a category that swaps the object and subject of the isCase() method:

class CaseCategory {
    static boolean isCase(String string, Collection col) {
        reverseCase(string, col)
    }
    static boolean isCase(Integer integer, Collection col) {
        reverseCase(integer, col)
    }
    // Add more overloaded methods here if needed

    private static boolean reverseCase(left, right) {
        right.isCase(left)
    }
}

Now we can use this category to achieve the goal we stated at the beginning of this post:

def parse(message, options) {
    use (CaseCategory) {
        switch (options) {
            case 'A' : return parseARule(message)
            case 2   : return parseSmallDigitRule(message)
            ...
            case ... : return parseSomeRule(message)
            default  : return parseSomeOtherRule(message)
        }
    }
}
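The reversed matching can also be sketched in Java, where switch cannot do it at all. Each case value is tested with options.contains(...) in declaration order, and the first hit wins. In this minimal sketch the string results stand in for the hypothetical parseARule and parseSmallDigitRule methods. The elided cases of the original stay elided.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch: a "reverse switch" where the switch value is a collection.
final class ReverseSwitch {
    static String parse(String message, Collection<?> options) {
        // LinkedHashMap keeps insertion order, which plays the role of case order.
        Map<Object, Function<String, String>> cases = new LinkedHashMap<>();
        cases.put("A", m -> "ARule(" + m + ")");
        cases.put(2, m -> "SmallDigitRule(" + m + ")");
        for (Map.Entry<Object, Function<String, String>> c : cases.entrySet()) {
            // Reversed isCase: ask the collection whether it contains the case value.
            if (options.contains(c.getKey())) {
                return c.getValue().apply(message);
            }
        }
        return "SomeOtherRule(" + message + ")"; // default branch
    }

    public static void main(String[] args) {
        System.out.println(parse("msg", Arrays.asList("A", 2)));
        System.out.println(parse("msg", Arrays.asList(2)));
        System.out.println(parse("msg", Collections.emptyList()));
    }
}
```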

If you are comfortable with global method replacement, you can amend the String and Integer meta classes instead. In that case you don't need to wrap the switch statement in a use block.

Anyway, with or without a category, the final code looks better than the original noisy if-else chain, and you have learned the technique of reversing a switch statement.

Thursday, February 03, 2011

Lazy lists in Groovy

I like lazy evaluation, and it's one of the reasons I like the Haskell language so much. Although from an engineering perspective lazy evaluation is probably not the most needed feature, it's definitely very useful for solving some mathematical problems.

Most languages don't have lazy evaluation out of the box, but you can implement it using some other language features. This is an interesting task, and I use it as a code kata which I practice every time I learn a new strict language.

So how do you implement lazy lists in a strict language? It's very simple if the language has functional capabilities: you build the lazy list recursively by wrapping a strict list in a function. Here, for example, is the strict empty list in Groovy:

[]

If we wrap it in a closure, it becomes a lazy empty list:

{-> [] }

If we need a list with one element, we prepend (or, in Lisp terminology, 'cons') an element to the lazy empty list, and make the result lazy again:

{-> [ element, {-> [] } ] }

To add more elements we continue the same process until all elements are lazily consed. Here is, for example, a lazy list with three elements a, b and c:

{-> [a, {-> [b, {-> [ c, {-> [] } ] } ] } ] }

Now that you have an idea of how to build lazy lists, let's build them the Groovy way. We start by creating a class:

class LazyList {
    private Closure list

    private LazyList(list) {
        this.list = list
    }
}

The field list holds the closure that wraps the list. We just need to expose some methods that construct lists using the procedure described above:

    static LazyList nil() {
        new LazyList( {-> []} )
    }

    LazyList cons(head) {
        new LazyList( {-> [head, list]} )
    }

Now we can construct lists by consing elements to empty list:

def lazylist = LazyList.nil().cons(4).cons(3).cons(2).cons(1)

To access elements of the list we implement two standard functions, car and cdr, that return head and tail of the list respectively.

    def car() {
        def lst = list.call()
        lst ? lst[0] : null
    }

    def cdr() {
        def lst = list.call()
        lst ? new LazyList(lst.tail()[0]) : nil()
    }

Here is how you use these functions to get the first and second elements of the list constructed above:

assert lazylist.car() == 1
assert lazylist.cdr().car() == 2
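The same closure-wrapping idea carries over to Java 8+, where java.util.function.Supplier plays the role of the Groovy closure. In this minimal sketch, a thunk that returns null stands for the empty list. Otherwise the thunk returns a cell holding the head and the still-unevaluated tail. The Cell class is a hypothetical helper.

```java
import java.util.function.Supplier;

// Sketch: a lazy list as a Supplier of cons cells.
final class LazyList<T> {
    static final class Cell<T> {
        final T head;
        final LazyList<T> tail;
        Cell(T head, LazyList<T> tail) { this.head = head; this.tail = tail; }
    }

    private final Supplier<Cell<T>> thunk; // yields null for the empty list

    private LazyList(Supplier<Cell<T>> thunk) { this.thunk = thunk; }

    static <T> LazyList<T> nil() {
        return new LazyList<T>(() -> null);
    }

    // Nothing is evaluated here: the new cell is built only when forced.
    LazyList<T> cons(T head) {
        return new LazyList<T>(() -> new Cell<T>(head, this));
    }

    T car() {
        Cell<T> cell = thunk.get();
        return cell == null ? null : cell.head;
    }

    LazyList<T> cdr() {
        Cell<T> cell = thunk.get();
        return cell == null ? LazyList.<T>nil() : cell.tail;
    }

    public static void main(String[] args) {
        LazyList<Integer> list = LazyList.<Integer>nil().cons(4).cons(3).cons(2).cons(1);
        System.out.println(list.car());       // 1
        System.out.println(list.cdr().car()); // 2
    }
}
```

Like the Groovy version, this does not memoize: every car() or cdr() call re-runs the thunk.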

In Lisp there are built-in functions for various compositions of car and cdr. For example, the previous assertion would be equivalent to the function cadr. Instead of implementing all possible combinations, let's use Groovy metaprogramming to achieve the same goal:

    def methodMissing(String name, args) {
        // Anchored, so only whole names such as cadr or caddr are handled
        def matcher = name =~ /^c([ad])([ad]+)r$/
        if (matcher) {
            // Apply the inner letters right to left, then the first letter
            matcher[0][2].reverse().toList().inject(this) {
                del, index -> del."c${index}r"()
            }."c${matcher[0][1]}r"()
        } else {
            throw new MissingMethodException(name, this.class, args)
        }
    }

It might look complicated, but it's fairly simple if you are familiar with Groovy regular expressions and functional programming. It's easiest to explain by example. If name is "caddr", the first group captures "a" and the second captures "dd". The method therefore builds the chain of method calls .cdr().cdr().car() and applies it to our LazyList object, which is the initial value passed to inject.

With this method in place we can call car/cdr functions with arbitrary depth.

assert lazylist.caddr() == 3
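Java has no methodMissing, but the same decoding of c[ad]+r names can be sketched with a hypothetical cxr(name, list) helper. The helper applies the letters from right to left, so caddr means car(cdr(cdr(x))). In this minimal sketch the cons cell is an Object[] pair, so nested lists can be elements. String.matches checks the whole name, like the anchored regex.

```java
import java.util.function.Supplier;

// Sketch: interpreting c[ad]+r names against a lazy list.
final class CxrDemo {
    // Minimal untyped lazy list: the thunk yields null for nil, else {head, tail}.
    static final class Lazy {
        final Supplier<Object[]> thunk;
        Lazy(Supplier<Object[]> thunk) { this.thunk = thunk; }
        static Lazy nil() { return new Lazy(() -> null); }
        Lazy cons(Object head) { return new Lazy(() -> new Object[] {head, this}); }
        Object car() { Object[] c = thunk.get(); return c == null ? null : c[0]; }
        Lazy cdr() { Object[] c = thunk.get(); return c == null ? nil() : (Lazy) c[1]; }
    }

    static Object cxr(String name, Lazy list) {
        if (!name.matches("c[ad]+r")) {
            throw new IllegalArgumentException("not a c[ad]+r name: " + name);
        }
        Object current = list;
        // Skip the leading 'c' and trailing 'r'; apply letters right to left.
        for (int i = name.length() - 2; i >= 1; i--) {
            Lazy node = (Lazy) current; // a nested list is an element too
            current = name.charAt(i) == 'a' ? node.car() : node.cdr();
        }
        return current;
    }

    public static void main(String[] args) {
        Lazy numbers = Lazy.nil().cons(4).cons(3).cons(2).cons(1);
        System.out.println(cxr("caddr", numbers)); // 3
        Lazy lmn = Lazy.nil().cons("N").cons("M").cons("L");
        Lazy almnz = Lazy.nil().cons("Z").cons(lmn).cons("A");
        System.out.println(cxr("cadadr", almnz)); // M
    }
}
```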

If you create nested lazy lists, you can access any element of any nested list with this dynamic method.

def lmn = LazyList.nil().cons('N').cons('M').cons('L')
def almnz = LazyList.nil().cons('Z').cons(lmn).cons('A')
assert almnz.cadadr() == 'M'

With so many cons calls it's hard to see the structure of the list. Let's implement a lazy method on the ArrayList class that converts a strict list to a lazy one. Again, we will use metaprogramming and functional techniques:

ArrayList.metaClass.lazy = {
    -> delegate.reverse().inject(LazyList.nil()) {list, item -> list.cons(item)}
}

Now we can rewrite the previous example as follows

def lazyfied = ['A', ['L','M','N'].lazy(), 'Z'].lazy()
assert lazyfied.cadadr() == 'M'

What have we accomplished so far? We learned how to build lazy lists from scratch and from strict lists. We know how to add elements to lazy lists, and how to access them. The next step is to implement the fold function. fold is a fundamental operation in functional languages, so our lazy lists must provide it:

    boolean isEmpty() {
        list.call() == []
    }

    def fold(n, acc, f) {
        n == 0 || isEmpty() ? acc : cdr().fold(n-1, f.call(acc, car()), f)
    }

    def foldAll(acc, f) {
        isEmpty() ? acc : cdr().foldAll(f.call(acc, car()), f)
    }

The only difference between this fold function and the standard one is the additional parameter n, the maximum number of elements to fold. We will need it later when we implement infinite lists. foldAll on lazy lists is the same as the standard fold on strict lists.

assert [1,2,3,4,5].lazy().foldAll(0){ acc, i -> acc + i } == 15
assert [1,2,3,4,5].lazy().fold(3, 1){ acc, i -> acc * i } == 6

First example calculates the sum of all elements of the list, second calculates the product of first three elements.

Once you have fold functions, you can easily implement take functions:

    def take(n) {
        fold(n, []) {acc, item -> acc << item}
    }

    def takeAll() {
        foldAll([]) {acc, item -> acc << item}
    }
   
    def toList() {
        takeAll()
    }

takeAll is the inverse of lazy, and take returns only the first n elements:

assert [1,2,3,4,5].lazy().takeAll() == [1,2,3,4,5]
assert [1,2,3,4,5].lazy().take(3) == [1,2,3]
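Here are fold, foldAll, take and takeAll in Java, on a Supplier-based lazy list. Java has no tail-call elimination, so this minimal sketch writes the recursion of the Groovy code as loops. The of() factory is a hypothetical stand-in for the lazy() helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Supplier;

// Sketch: bounded and unbounded folds over a lazy list.
final class Lazy<T> {
    private static final class Cell<T> {
        final T head;
        final Lazy<T> tail;
        Cell(T head, Lazy<T> tail) { this.head = head; this.tail = tail; }
    }

    private final Supplier<Cell<T>> thunk; // yields null for the empty list

    private Lazy(Supplier<Cell<T>> thunk) { this.thunk = thunk; }

    static <T> Lazy<T> nil() { return new Lazy<T>(() -> null); }

    Lazy<T> cons(T head) { return new Lazy<T>(() -> new Cell<T>(head, this)); }

    boolean isEmpty() { return thunk.get() == null; }

    T car() { Cell<T> c = thunk.get(); return c == null ? null : c.head; }

    Lazy<T> cdr() { Cell<T> c = thunk.get(); return c == null ? Lazy.<T>nil() : c.tail; }

    // Hypothetical stand-in for lazy(): cons the items from right to left.
    @SafeVarargs
    static <T> Lazy<T> of(T... items) {
        Lazy<T> list = nil();
        for (int i = items.length - 1; i >= 0; i--) list = list.cons(items[i]);
        return list;
    }

    // Fold at most n elements; the bound is what keeps infinite lists safe.
    <A> A fold(int n, A acc, BiFunction<A, T, A> f) {
        for (Lazy<T> cur = this; n > 0 && !cur.isEmpty(); cur = cur.cdr(), n--) {
            acc = f.apply(acc, cur.car());
        }
        return acc;
    }

    // Fold every element; terminates only on finite lists.
    <A> A foldAll(A acc, BiFunction<A, T, A> f) {
        for (Lazy<T> cur = this; !cur.isEmpty(); cur = cur.cdr()) {
            acc = f.apply(acc, cur.car());
        }
        return acc;
    }

    List<T> take(int n) {
        return fold(n, new ArrayList<T>(), (list, item) -> { list.add(item); return list; });
    }

    List<T> takeAll() {
        return foldAll(new ArrayList<T>(), (list, item) -> { list.add(item); return list; });
    }

    public static void main(String[] args) {
        int sum = of(1, 2, 3, 4, 5).foldAll(0, (acc, i) -> acc + i);
        int product = of(1, 2, 3, 4, 5).fold(3, 1, (acc, i) -> acc * i);
        System.out.println(sum);                      // 15
        System.out.println(product);                  // 6
        System.out.println(of(1, 2, 3, 4, 5).take(3)); // [1, 2, 3]
    }
}
```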

Our next goal is a map function on lazy lists. Ideally I would like the implementation to look like this:

    def map(f) {
        isEmpty() ? nil() : cdr().map(f).cons(f.call(car()))
    }

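This version is not lazy, though. Groovy evaluates the receiver and arguments of a method call eagerly, so cdr().map(f) runs before cons() is ever called. The recursion therefore walks the entire list, and it never finishes on an infinite one. Instead, I have to build the tail directly inside a closure: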

    def map(f) {
        isEmpty() ? nil() : new LazyList( {-> [f.call(car()), cdr().map(f).list]} )
    }

Unlike fold, lazy map is used exactly like strict map:

assert [1,2,3,4,5].lazy().map{ 2 * it }.take(3) == [2,4,6]

The following example shows one of the benefits of laziness

assert [1,2,3,0,6].lazy().map{ 6 / it }.take(3) == [6,3,2]

map didn't evaluate the entire list, so 6 / 0 was never computed and no exception was thrown. If you evaluate the expression for all elements, the exception is thrown:

try {
    [1,2,3,0,6].lazy().map{ 6 / it }.takeAll()
    assert false : 'ArithmeticException expected'
}
catch (ArithmeticException e) {
    // expected: 6 / 0 is evaluated once every element is forced
}

For strict lists this is the default behaviour of map, because every element is evaluated.
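Java has the same problem as Groovy, because it also evaluates method arguments before the call. In this minimal sketch of a lazy map on a Supplier-based list, both f.apply(...) and the mapped tail are created inside the thunk. The Cell, of and take helpers are hypothetical, as in the earlier sketches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Sketch: a lazy map that applies f only to the cells that are forced.
final class Lazy<T> {
    private static final class Cell<T> {
        final T head;
        final Lazy<T> tail;
        Cell(T head, Lazy<T> tail) { this.head = head; this.tail = tail; }
    }

    private final Supplier<Cell<T>> thunk; // yields null for the empty list

    private Lazy(Supplier<Cell<T>> thunk) { this.thunk = thunk; }

    static <T> Lazy<T> nil() { return new Lazy<T>(() -> null); }

    Lazy<T> cons(T head) { return new Lazy<T>(() -> new Cell<T>(head, this)); }

    boolean isEmpty() { return thunk.get() == null; }

    T car() { Cell<T> c = thunk.get(); return c == null ? null : c.head; }

    Lazy<T> cdr() { Cell<T> c = thunk.get(); return c == null ? Lazy.<T>nil() : c.tail; }

    @SafeVarargs
    static <T> Lazy<T> of(T... items) {
        Lazy<T> list = nil();
        for (int i = items.length - 1; i >= 0; i--) list = list.cons(items[i]);
        return list;
    }

    // Writing cdr().map(f).cons(f.apply(car())) would recurse through the whole
    // list at once; here the recursive map call happens only when a cell is forced.
    <R> Lazy<R> map(Function<? super T, ? extends R> f) {
        return new Lazy<R>(() -> {
            Cell<T> c = thunk.get();
            return c == null ? null : new Cell<R>(f.apply(c.head), c.tail.<R>map(f));
        });
    }

    List<T> take(int n) {
        List<T> out = new ArrayList<>();
        for (Lazy<T> cur = this; n > 0 && !cur.isEmpty(); cur = cur.cdr(), n--) {
            out.add(cur.car());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> firstThree = of(1, 2, 3, 0, 6).map(x -> 6 / x).take(3);
        System.out.println(firstThree); // [6, 3, 2]: 6 / 0 is never evaluated
        try {
            of(1, 2, 3, 0, 6).map(x -> 6 / x).take(5);
        } catch (ArithmeticException e) {
            System.out.println("forcing all five elements fails: " + e.getMessage());
        }
    }
}
```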

The last function I want to implement is filter

    def filter(p) {
        isEmpty() ? nil() :
            p.call(car()) ? new LazyList( {-> [car(), cdr().filter(p).list]} ) :
                cdr().filter(p)
    }

In the following example we find first two elements greater than 2

assert [1,2,3,4,5].lazy().filter{ 2 < it }.take(2) == [3,4]
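Here is the same lazy filter in Java, again on a Supplier-based list. The skipping loop runs only when the filtered list is forced. That is why it also works on the infinite list produced by the hypothetical integers(n) helper. As in the Groovy version, it would loop forever if no further element ever matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Sketch: a lazy filter that skips non-matching elements on demand.
final class Lazy<T> {
    private static final class Cell<T> {
        final T head;
        final Lazy<T> tail;
        Cell(T head, Lazy<T> tail) { this.head = head; this.tail = tail; }
    }

    private final Supplier<Cell<T>> thunk; // yields null for the empty list

    private Lazy(Supplier<Cell<T>> thunk) { this.thunk = thunk; }

    static <T> Lazy<T> nil() { return new Lazy<T>(() -> null); }

    Lazy<T> cons(T head) { return new Lazy<T>(() -> new Cell<T>(head, this)); }

    boolean isEmpty() { return thunk.get() == null; }

    T car() { Cell<T> c = thunk.get(); return c == null ? null : c.head; }

    Lazy<T> cdr() { Cell<T> c = thunk.get(); return c == null ? Lazy.<T>nil() : c.tail; }

    @SafeVarargs
    static <T> Lazy<T> of(T... items) {
        Lazy<T> list = nil();
        for (int i = items.length - 1; i >= 0; i--) list = list.cons(items[i]);
        return list;
    }

    // Infinite list n, n + 1, ...; integers(n + 1) runs only when forced.
    static Lazy<Integer> integers(int n) {
        return new Lazy<Integer>(() -> new Cell<Integer>(n, integers(n + 1)));
    }

    Lazy<T> filter(Predicate<? super T> p) {
        return new Lazy<T>(() -> {
            Lazy<T> cur = this;
            while (!cur.isEmpty() && !p.test(cur.car())) {
                cur = cur.cdr(); // skip elements that fail the predicate
            }
            return cur.isEmpty() ? null : new Cell<T>(cur.car(), cur.cdr().filter(p));
        });
    }

    List<T> take(int n) {
        List<T> out = new ArrayList<>();
        for (Lazy<T> cur = this; n > 0 && !cur.isEmpty(); cur = cur.cdr(), n--) {
            out.add(cur.car());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(of(1, 2, 3, 4, 5).filter(x -> 2 < x).take(2));     // [3, 4]
        System.out.println(integers(1).filter(x -> x % 2 == 1).take(3));      // [1, 3, 5]
    }
}
```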

With the help of car/cdr, fold, map and filter you can implement most other functions on lazy lists yourself. Here, for example, is an implementation of the zipWith function:

    static def zipWith(alist, blist, f) {
        alist.isEmpty() || blist.isEmpty() ? nil() :
            new LazyList( {-> [
                f.call(alist.car(), blist.car()),
                zipWith(alist.cdr(), blist.cdr(), f).list
            ]} )
    }
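Here is the same zipWith in Java, as a minimal sketch on a Supplier-based lazy list. The recursive call sits inside the thunk, and the result stops at the end of the shorter list. The helper names besides zipWith are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Supplier;

// Sketch: a lazy zipWith over two lazy lists.
final class Lazy<T> {
    private static final class Cell<T> {
        final T head;
        final Lazy<T> tail;
        Cell(T head, Lazy<T> tail) { this.head = head; this.tail = tail; }
    }

    private final Supplier<Cell<T>> thunk; // yields null for the empty list

    private Lazy(Supplier<Cell<T>> thunk) { this.thunk = thunk; }

    static <T> Lazy<T> nil() { return new Lazy<T>(() -> null); }

    Lazy<T> cons(T head) { return new Lazy<T>(() -> new Cell<T>(head, this)); }

    boolean isEmpty() { return thunk.get() == null; }

    T car() { Cell<T> c = thunk.get(); return c == null ? null : c.head; }

    Lazy<T> cdr() { Cell<T> c = thunk.get(); return c == null ? Lazy.<T>nil() : c.tail; }

    @SafeVarargs
    static <T> Lazy<T> of(T... items) {
        Lazy<T> list = nil();
        for (int i = items.length - 1; i >= 0; i--) list = list.cons(items[i]);
        return list;
    }

    // Combine elements pairwise; nothing is computed until the result is forced.
    static <A, B, C> Lazy<C> zipWith(Lazy<A> xs, Lazy<B> ys,
                                     BiFunction<? super A, ? super B, ? extends C> f) {
        return new Lazy<C>(() -> xs.isEmpty() || ys.isEmpty()
                ? null
                : new Cell<C>(f.apply(xs.car(), ys.car()),
                              Lazy.<A, B, C>zipWith(xs.cdr(), ys.cdr(), f)));
    }

    List<T> take(int n) {
        List<T> out = new ArrayList<>();
        for (Lazy<T> cur = this; n > 0 && !cur.isEmpty(); cur = cur.cdr(), n--) {
            out.add(cur.car());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> products = zipWith(of(2, 4, 6, 8), of(1, 3, 5, 7), (x, y) -> x * y).take(4);
        System.out.println(products); // [2, 12, 30, 56]
    }
}
```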

Now that we have implemented all the lazy functions we need, let's define infinite lists:

    private static sequence(int n) {
        {-> [n, sequence(n+1)]}
    }

    static LazyList integers(int n) {
        new LazyList(sequence(n))
    }

    static LazyList naturals() {
        integers(1)
    }

Infinite lists are, from my point of view, the most useful application of lazy lists:

def naturals = LazyList.naturals()
assert naturals.take(3) == [1,2,3]

def evens = naturals.map { 2 * it }
assert evens.take(3) == [2,4,6]

def odds = naturals.filter { it % 2 == 1 }
assert odds.take(3) == [1,3,5]

assert naturals.cadddddddddr() == 10

def nonnegatives = naturals.cons(0)
assert nonnegatives.cadr() == 1

assert LazyList.zipWith(evens, odds){ x, y -> x * y }.take(4) == [2,12,30,56]
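In Java the same trick gives infinite lists. The recursive call integers(n + 1) sits inside the Supplier, so it runs only when the tail is actually forced. This minimal sketch reproduces several of the assertions above. nth() is a hypothetical helper that does the work of a long cad…dr chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Sketch: infinite lazy lists of integers.
final class Lazy<T> {
    private static final class Cell<T> {
        final T head;
        final Lazy<T> tail;
        Cell(T head, Lazy<T> tail) { this.head = head; this.tail = tail; }
    }

    private final Supplier<Cell<T>> thunk; // yields null for the empty list

    private Lazy(Supplier<Cell<T>> thunk) { this.thunk = thunk; }

    Lazy<T> cons(T head) { return new Lazy<T>(() -> new Cell<T>(head, this)); }

    boolean isEmpty() { return thunk.get() == null; }

    T car() { Cell<T> c = thunk.get(); return c == null ? null : c.head; }

    Lazy<T> cdr() {
        Cell<T> c = thunk.get();
        return c == null ? new Lazy<T>(() -> null) : c.tail;
    }

    // n, n + 1, n + 2, ...: each tail is produced only when it is forced.
    static Lazy<Integer> integers(int n) {
        return new Lazy<Integer>(() -> new Cell<Integer>(n, integers(n + 1)));
    }

    static Lazy<Integer> naturals() { return integers(1); }

    List<T> take(int n) {
        List<T> out = new ArrayList<>();
        for (Lazy<T> cur = this; n > 0 && !cur.isEmpty(); cur = cur.cdr(), n--) {
            out.add(cur.car());
        }
        return out;
    }

    // Zero-based element, forcing only the cells in front of it.
    T nth(int index) {
        Lazy<T> cur = this;
        for (int i = 0; i < index; i++) cur = cur.cdr();
        return cur.car();
    }

    public static void main(String[] args) {
        System.out.println(naturals().take(3));        // [1, 2, 3]
        System.out.println(naturals().nth(9));         // 10, the tenth natural number
        System.out.println(naturals().cons(0).nth(1)); // 1, like cadr on nonnegatives
    }
}
```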

At this point you have all basic functionality implemented, and you should be able to extend this model to whatever you need in regards to lazy (infinite) lists. Happy lazy programming!

Resources and links

• Source code for this blog

• Lazy list implementation in Erlang

• Lazy list implementation in Lisp

Saturday, January 29, 2011

Counting modifications in Git repository

Recently Michael Feathers wrote a blog post about the Open-Closed Principle, in which he described a simple technique for measuring the closure of code. I created a Groovy script that implements this technique for Git repositories. If you run it from the root of your Git project, it produces a CSV file with statistics on how many times each file has been modified. Feel free to use this script to find hot spots in your Git repository.
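The following minimal Java sketch implements the same idea. It is not the Groovy script mentioned above. It assumes that `git log --pretty=format: --name-only` is an acceptable source of per-commit file lists: that command prints the paths touched by each commit, one per line. The class name GitHotSpots and the CSV header are hypothetical. Renames are counted under the new path, and paths containing commas are not escaped.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: count how many commits touched each file and emit CSV rows.
final class GitHotSpots {
    // Each non-blank line of the log output is one modification of that path.
    static Map<String, Integer> countModifications(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            String path = line.trim();
            if (!path.isEmpty()) {
                counts.merge(path, 1, Integer::sum);
            }
        }
        return counts;
    }

    // "path,count" rows, most frequently modified files first (stable for ties).
    static List<String> toCsv(Map<String, Integer> counts) {
        List<String> rows = new ArrayList<>();
        rows.add("file,modifications");
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .forEach(e -> rows.add(e.getKey() + "," + e.getValue()));
        return rows;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Run from the root of a Git working tree.
        Process git = new ProcessBuilder("git", "log", "--pretty=format:", "--name-only")
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        List<String> lines;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(git.getInputStream(), StandardCharsets.UTF_8))) {
            lines = reader.lines().collect(Collectors.toList());
        }
        git.waitFor();
        toCsv(countModifications(lines)).forEach(System.out::println);
    }
}
```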

Monday, January 24, 2011

Git + Maven

When I first started working with Git in my Maven projects (three years ago), it was very awkward. Half of the release commands didn't work at all. The other half worked, but only with an ugly workaround via a fake remote repository, which violated the entire Git philosophy.

Since then most of the issues have been resolved, including the following three which I mostly needed:

• support for local Git repositories;

• separation of git-commit and git-push commands in Maven release plugin;

• critical bug fixes in Maven release and scm plugins.

I created a cheat sheet describing the way I typically set up and manage Git-Maven projects. Feel free to use it in your projects as well.
