Long story short, the sorting algorithms used were to blame. I couldn't identify which one was being used (proprietary or more advanced than I have learned?), but magic happened when I replaced the sorter with the well-known mergesort algorithm. In preliminary testing, the problem is pretty much gone for the size of live data the company typically works with. Amazing!
My real excitement though is in the fact that I got to use (so soon) knowledge from a previous school class I had taken (Design and Analysis of Algorthms, and Data Structures). I didn't expect to be using ideas from the class in a real-world environment yet, but I found myself doing real analysis of the client's code like we had done in assignments, and making a decision based solely on the results - and it worked. Sometimes class materials don't make complete sense until you have to use them practically.
Mergesort is a divide-and-conquer comparison-based algorithm that runs in ϑ(n log n) computational time. Its only drawback is that it requires significantly more memory to do its job than other algorithms. Here's the Java implementation of mergesort:
public class MergesortListSorter implements ListSorter {
private final Comparator comp;
public MergesortListSorter(Comparator comp) {
assert (comp!=null): "Comparator cannot be null.";
this.comp = comp;
}
public List sort(List list) {
assert (list!=null): "List cannot be null.";
return mergesort(list, 0, list.size()-1);
}
private List merge(List left, List right) {
List result = new LinkedList();
Iterator l = left.iterator();
Iterator r = right.iterator();
l.first();
r.first();
while(!(l.isDone() && r.isDone())) {
if(l.isDone()) {
result.add(r.current());
r.next();
} else if (r.isDone()) {
result.add(l.current());
l.next();
} else if (comp.compare(l.current(), r.current()) <= 0) {
result.add(l.current());
l.next();
} else {
result.add(r.current());
r.next();
}
}
return result;
}
private List mergesort(List list, int s, int e) {
if(s == e) {
List result = new LinkedList();
result.add(list.get(s));
return result;
}
int split = s + (e - s)/2;
List left = mergesort(list, s, split);
List right = mergesort(list, split+1, e);
return merge(left, right);
}
}
There's probably better implementations out there than this, but this variety worked well. The actual solution was a blend of mergesort and shellsort because both perform well in average and worst case scenarios. Java helps with the problem of memory because object references are used in collections (as opposed to objects themselves), so the overhead in memory is in the additional references that are created.