Tuesday, January 01, 2008

It's In The Sorting, Dude!

I just finished consulting on an interesting project: this client's inhouse web application had been working great until recently when it seemed to take longer and longer to respond to requests. Being it is an SOA application that uses AJAX all over, speed is of the essence for an overall satisfactory user experience. A couple of times, the IT guys had upgraded the servers running the webapp (hardware), but the problem lingered. So obviously the webapp itself was to blame - but where would the issue be in an application that's been relatively bug free since its inception?
Long story short, the sorting algorithms used were to blame. I couldn't identify which one was being used (proprietary or more advanced than I have learned?), but magic happened when I replaced the sorter with the well-known mergesort algorithm. In preliminary testing, the problem is pretty much gone for the size of live data the company typically works with. Amazing!

My real excitement though is in the fact that I got to use (so soon) knowledge from a previous school class I had taken (Design and Analysis of Algorthms, and Data Structures). I didn't expect to be using ideas from the class in a real-world environment yet, but I found myself doing real analysis of the client's code like we had done in assignments, and making a decision based solely on the results - and it worked. Sometimes class materials don't make complete sense until you have to use them practically.

Mergesort is a divide-and-conquer comparison-based algorithm that runs in ϑ(n log n) computational time. Its only drawback is that it requires significantly more memory to do its job than other algorithms. Here's the Java implementation of mergesort:


public class MergesortListSorter implements ListSorter {
private final Comparator comp;

public MergesortListSorter(Comparator comp) {
assert (comp!=null): "Comparator cannot be null.";
this.comp = comp;
}

public List sort(List list) {
assert (list!=null): "List cannot be null.";
return mergesort(list, 0, list.size()-1);
}

private List merge(List left, List right) {
List result = new LinkedList();
Iterator l = left.iterator();
Iterator r = right.iterator();
l.first();
r.first();
while(!(l.isDone() && r.isDone())) {
if(l.isDone()) {
result.add(r.current());
r.next();
} else if (r.isDone()) {
result.add(l.current());
l.next();
} else if (comp.compare(l.current(), r.current()) <= 0) {
result.add(l.current());
l.next();
} else {
result.add(r.current());
r.next();
}
}
return result;
}

private List mergesort(List list, int s, int e) {
if(s == e) {
List result = new LinkedList();
result.add(list.get(s));
return result;
}
int split = s + (e - s)/2;
List left = mergesort(list, s, split);
List right = mergesort(list, split+1, e);
return merge(left, right);
}
}


There's probably better implementations out there than this, but this variety worked well. The actual solution was a blend of mergesort and shellsort because both perform well in average and worst case scenarios. Java helps with the problem of memory because object references are used in collections (as opposed to objects themselves), so the overhead in memory is in the additional references that are created.