Creating MS Word reports with Java

If your application stores a lots of user content, it seems to be an obvious idea to offer some template mechanism, which can be used by your customers to create reports. So, the user can get a printable, styled and easy to process version of the interesting information out of your system, designed by himself and filled with data with just one click. Of course, creating and styling those templates should be easy enough for your customer to do it by himself… so, whats about using the well-known MS Word application for it? The user creates a document with his beloved office suite from Redmond, containing formatted text, images and charts and finally adds also some placeholders. Afterwards he passes this document to your application and, whenever the user wants to create a report, you magically fill the placeholders with the actual data from your system. Reporting can be so easy … if you can find some way to manipulate the word document the correct way with the tools your platform offers!

Continue reading

Advertisements

Why findAll on null returns empty lists in Groovy

Recently while debugging our grails application I saw something like this:

variable.findAll { it.condition }

Since the debugger told me that this variable was null, I happily thought I might have found my little bug and moved on waiting for the big crash and a nice NullPointerException. Instead I got an empty list.

Wait WAT?

That seemed rather bizarre to me, so I checked it inside a groovy shell.

null.findAll { true } ==> []
null.findAll { false } ==> []
null.findAll { it } ==> []
null.findAll {} ==> []
null.findAll() ==> []

Ok, what’s going on here? First thing to note is that Groovie’s null is an object.

null.getClass().name ==> org.codehaus.groovy.runtime.NullObject

From the Java perspective kind of surprising but since I’m familiar with Ruby it was somehow expectable. So far no magic, but where does that findAll method come from?

null.getMetaClass().methods*.name
==> [equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait, getMetaClass, getProperty, invokeMethod, setMetaClass, setProperty, asBoolean, asType, clone, equals, getNullObject, getProperty, hashCode, invokeMethod, is, iterator, plus, setProperty, toString]

Not there… so Groovy-Voodoo. Luckily there are some developers more experienced with Groovy than me in our company (even one who contributed to it some time ago), so I could ask someone else than Google. We got the source code (Groovy 1.8 in our case) and dug into it. The place where a lot of those magical methods dwell is DefaultGroovyMethods.java. According to the documentation the static methods in this class will be available in the class of each method’s first parameter. So here we found the following:

public static Collection findAll(Object self, Closure closure) {
    List answer = new ArrayList();
    Iterator iter = InvokerHelper.asIterator(self);
    return findAll(closure, answer, iter);
}

Which at least explains why that is available for null. Furthermore it shows us that findAll should work on any Object, too. A quick check in the console confirms this.

new Object().findAll { true } ==> [java.lang.Object@79ad86e9]

However it does not explain how the invocation works and why the result is []. So what’s happening here? The asIterator method simply invokes a method named iterator on self. Groovie’s NullObject defines this particular method in the following way:

public Iterator iterator() {
    return Collections.EMPTY_LIST.iterator();
}

This clearly explains why we get an empty list from our findAll call. In the case of an arbitrary GroovyObject we again find (after an unsuccessful lookup in GroovyObject.java) the iterator method for objects in the DefaultGroovyMethods class simply putting the object into a collection and iterating over it.

public static Iterator iterator(Object o) {
    return DefaultTypeTransformation.asCollection(o).iterator();
}

What is still missing to a full understanding of this phenomenon is how those default groovy methods get invoked. Covering this would be way beyond the scope of this blog post. If you browse around a little in the source all this meta stuff can get kind of overwhelming. What we can take out of this so far (beside getting confused by Groovie’s method invocation mechanisms) is some more awareness to the fact that anything and everything can happen in languages such as Groovy even when it all starts with an innocent null object…

The weak performance of the Groovy minus operator

During a recent hunt for performance problems within our grails application we’ve discovered that using the minus operator to subtract one list from another one seems to perform badly for list items which doesn’t implement the Comparable interface. This post will show the supposed reason for this bottleneck.

Have a look at the following simple microbenchmark which shows the problem:

We have a very simple pogo class which holds some data about books:

class Book {
    final String name
    final String author

    public Book(String  name, String author) {
        this.name = name
        this.author = author
    }

    public String toString() {
        return "Title: ${name} Author: ${author}"
    }
}

This Book class is used in an abstract unit-test to create one list of books, consisting of a configurable number of tech books and novels. The test cases will subtract the novels from the list: one time via an old school Java “removeAll” call, the second time via the Groovy minus-operator.

abstract class ListPerformanceTest extends TestCase {
    protected List<Book> novels
    protected List<Book> techBooks

    protected void setUp() {
        novels = createNovelsForFixture()
        techBooks = createTechBooksForFixture()
    }

    public void testGroovyMinus() {
        List<Book> allBooks = novels + techBooks
        measureTimeInMS("Subtracting ${novels.size()} novels from the list of ${allBooks.size()} books") {
            allBooks = allBooks - novels
        }
        assertEquals(techBooks.size(), allBooks.size())
    }

    public void testJavaRemoveAll() {
        List<Book> allBooks = novels + techBooks
        measureTimeInMS("Removing ${novels.size()} novels from the list of ${allBooks.size()} books") {
            allBooks.removeAll(novels)
        }
        assertEquals(techBooks.size(), allBooks.size())
    }

    public long measureTimeInMS(String description, Closure toMeasure) {
        long nanoTimeStart = System.nanoTime()
        toMeasure.call()
        long nanoTimeEnd = System.nanoTime()
        long result = (nanoTimeEnd - nanoTimeStart) * 1E-6
        println description + " took ${result} ms"
        return result
    }

    protected abstract List<Book> createNovelsForFixture()
    protected abstract List<Book> createTechBooksForFixture()
}

We created three subclasses: a SmallTestCase class (subtracting 1000 novels from a 2000 books’ list), a MediumTestCase (subtracting 2000 novels from a 4000 books’ list) and finally a LargeTestCase (subtracting 4000 novels from the list containing 8000 books).

Following you can find the results of the test runs:
Statistic Groovy minus

We performed the tests within the following environment:
Win-7 64 bit, JDK 1.6.0.33 (32 bit), Intel i5-2500 (4 cores, 3.3GHz per core), 8GB RAM

You could argue that it is not fair to compare both methods: the Groovy way will return a new list, the removeAll-method of Java uses an in-place-removement, so the original list will be lost. But just the fact that Groovy is creating a new list can not explain why the minus operator is so expensive: for the small list with 2000 books Groovy is about 1700 times slower than Javas’ removeAll, for the medium list even over 3000 times. And remember: all of this will only happen when your item class doesn’t implement Comparable.

That’s why we decided to do some more research to get an explanation. After some runs with JProfiler and the medium-sized test-case we found out that one of the biggest hotspots was the toString()-method of the Book class. According to the profiler it was called nearly 12 million times while performing the minus operation and burnt about 15% of the overall time. Because the profiler did not tell us anything about other hotspots we started with this trace and decided to take a look at the implementation of the minus operator within the Groovy source code under “Source release” of version 1.8.8.

We found the following implementation:

public class DefaultGroovyMethods extends DefaultGroovyMethodsSupport {
...
   public static <T> List<T> minus(List<T> self, Collection<?> removeMe) {
	...
        Comparator<T> numberComparator = new NumberAwareComparator<T>();
        if (nlgnSort && (self.get(0) instanceof Comparable)) {
		// Not interesting in our case, because Book doesn't implement Comparable
		...
        } else {
            //n*n version
            List<T> tmpAnswer = new LinkedList<T>(self);
            for (Iterator<T> iter = tmpAnswer.iterator(); iter.hasNext();) {
                T element = iter.next();
                boolean elementRemoved = false;
                for (Iterator<?> iterator = removeMe.iterator(); iterator.hasNext() && !elementRemoved;) {
                    Object elt = iterator.next();
                    if (numberComparator.compare(element, (T)elt) == 0) {
                        iter.remove();
                        elementRemoved = true;
                    }
                }
            }
            ...
        }
    }
...
}

At the first glance we did not find anything special about the code: it has a worst case scenario of O(n*m) and uses the so called “NumberAwareComparator” to decide, if two elements of the list are equal or not. So, let’s proceed with this Comparator:

public class NumberAwareComparator<T> implements Comparator<T> {
    public int compare(T o1, T o2) {
        try {
            return DefaultTypeTransformation.compareTo(o1, o2);
        } catch (ClassCastException cce) {
            /* ignore */
        } catch (GroovyRuntimeException gre) {
            /* ignore */
        }
        // since the object does not have a valid compareTo method
        // we compare using the hashcodes. null cases are handled by
        // DefaultTypeTransformation.compareTo
        int x1 = o1.hashCode();
        int x2 = o2.hashCode();
        if (x1 == x2) return 0;
        if (x1 < x2) return -1;
        return 1;
    }
}

You can find here a typical code smell in form of a misuse of the exception concept (see this refactoring catalogue for details): the comparator doesn’t check if the two instances can be compared at all by the DefaultTypeTransformation class before calling its compareTo method. Instead, the code relies upon the fact that DefaultTypeTransformation#compareTo will throw an exception whenever the list items can not be compared. In our case this will happen for nearly all Book items (except those pointing to one and the same reference). This leads to the following approximation of thrown exceptions:
Number of exceptions thrown by minus

So, in our example we will have about 6 million exceptions which will be thrown and afterwards just be ignored. Because the profiler ignores the overhead produced by the exceptions we did not have a chance to see this: the only trace for this problem were the 12 million calls of toString.

But everything we said up to now doesn’t explain them. To understand why we call toString so often, you just have to take a short look how the exception thrown by DefaultTypeTransformation#compareTo is constructed:

public class DefaultTypeTransformation {
...
    private static int compareToWithEqualityCheck(Object left, Object right, boolean equalityCheckOnly) {
        if (left == right) {
            return 0;
        }
	...
        throw new GroovyRuntimeException("Cannot compare " + left.getClass().getName() + " with value '" +
                left + "' and " + right.getClass().getName() + " with value '" + right + "'");
    }
}

As you can see here Groovys’ minus doesn’t only throw 6 million exceptions in our case but also use a string concatenation of the left and right element to construct the exception message, i.e. for each exception we will call toString() of the Book class twice.

Conclusion

You shouldn’t use the minus operator if you have lists with more than some hundred elements which don’t implement Comparable. We’ve tested this only with Groovy 1.8.8, because this is the version shipped with the current Grails version 2.1.1. But after a short look at the current Groovy 2.0.5 code it seems that the performance leak should still exist. And according to this jira issue we are not the only one having problems with the collection performance of Groovy. To avoid the minus operator just for performance reasons is a real pitty, because the minus syntax is not only much more readable than the method call but feels also “cleaner” in terms of having no side effects on the list it is called at.