If you have never heard about such a topic, here is the chance, otherwise it's the hundredth time you see such title, and you are probably thinking "Another article about this topic, I'll probably not learn anything new, but anyway I'm bored so I'm gonna read it ...". And you are probably 90% right, nothing really new here, but I promise you a couple of colorful and beautiful charts that we don't have the opportunity to see everyday (and the ability to create your own) .
The first time I started wondering about collection performances was when I started working with some > 100 000 elements collections. At that time, I heard some bad jokes such as "I just understood why the Java logo is a cup of coffee, because Java collections are so slow that when manipulating them, you have the time to go and grab some coffee before they do the job ... Just kidding' !".
At that time, I wanting to use am implementation of a java.util.List that would have good performances on all the common methods provided by the interface (let's say get(index), add(Object), remove(Object), remove(index), contains(Object), iterator(), add other methods that you like), without wondering about memory usage (I mean, even this List would take 4 times the size of a LinkedList it wouldn't be a big deal).
In other words, some List that would not be instantiated a million times in my application, but a couple of times, and each instance will have great performances. For example, the model of a GUI Table, or some other GUI component, which data will evolve frequently, usually in batch of operation (the user add 100 000 elements, remove 50 000, etc.) , and which performances will sometimes be critical.
First of all, what do I mean by good performances ? Usually good performances are equivalent to an average complexity of O(1). But this one is impossible to get all the time, so at first let's study current JDK List/Collection implementations and see what we have.
Well, we already know so basic information about Collection methods complexity, such as the fact that a HashSet has an average O(1) complexity for its contains(o) method, or that ArrayList is also in O(1) for its method get(int index), etc.
For more details, check this post of Lalit Pant which did a nice summary on this subject.
Let's do some basic benchmark on some collections to see concreatly what they are capable of ...
For that purpose, I wrote this very simple benchmark code, that is not very elegant I have to admit, but do the trick. Here is the core of the benchmark :
private void execute(BenchRunnable run, int loop, String taskName) { System.out.print(taskName + " ... "); // set default context collection.clear(); collection.addAll(defaultCtx); // warmup warmUp(); isTimeout = false; // timeout timer Timer timer = new Timer((int) timeout, new ActionListener() { @Override public void actionPerformed(ActionEvent e) { isTimeout = true; // to raise a ConcurrentModificationException or a // NoSuchElementException to interrupt internal work in the List collection.clear(); } }); timer.setRepeats(false); timer.start(); long startTime = System.nanoTime(); int i; for (i = 0; i < loop && !isTimeout; i++) { try { run.run(i); } catch (Exception e) { // on purpose so ignore it } } timer.stop(); long time = isTimeout ? timeout * 1000000 : System.nanoTime() - startTime; System.out.println((isTimeout ? "Timeout (>" + time + "ns) after " + i + " loop(s)" : time + "ns")); // restore default context // the collection instance might have been corrupted by the timeout, // create a new instance try { Constructor<? extends Collection> constructor = collection.getClass().getDeclaredConstructor( (Class<?>[]) null); constructor.setAccessible(true); collection = constructor.newInstance(); collection = collection.getClass().newInstance(); // update the reference if (collection instanceof List) { list = (List<String>) collection; } } catch (Exception e1) { e1.printStackTrace(); } // gc to clean all the stuff System.gc(); }
where the BenchRunnable is just like a Runnable, but which can use the current loop index :
private interface BenchRunnable { public void run(int loopIndex); }
and the warmUp() method will manipulate a little the collection, to be sure that the internal structure is allocated.
A timer is used to timeout too long method benchmark, indeed my purpose is not to have a complete benchmark, but just a global idea of what implementations of Collection are efficient for a given method.
For example, I will call this method this way :
int nb = 1000000; final List<String> list = new LinkedList<String>(); execute(new BenchRunnable() { @Override public void run(int i) { list.add(Integer.toString(i % 100)); } }, nb, "add " + nb + " distinct elements : ");
I will finally display the results in a JFreeChart component, because the console output is not very friendly to compare results.
The complete source of the Benchmark class can be found here, feel free to play with it ! (To those who will think "You should have used a proper tool to do this stuff", I would answer "Yes, you're totally right, but the fact is that when I started it, I didn't mean it to be so big in the end. Anyway, I like the press-button execution ...").
I first launched it on a couple of famous lists, from the JDK, from Javolution, from Apache Commons, I also added a HashSet, just for the records. Here is what I got (the colorful charts that I promised you) (Note that every bench is done with an instance of the given collection populated with 100 000 elements) :
We can see some interesting things here, we confirm what we already knew about some collections, the CopyOnWriteArray is significantly slow on data modification, etc, we also can see, between ArrayList and LinkedList some significant differences (OK, the benchmark does not cover all the different cases, and maybe we are in the particular case were ArrayList is better than LinkedList, anyway it gives us an interesting overall).
The point here is to talk about List and especially fast implementations, but just for the records, I post the results that I got for some Sets :
- A CombinedList
By this I mean a CombinedList which will use internally a HashSet, an ArrayList and a TreeList, and in each of its method, will use the data structure the most efficient to do the job, and so my CombinedList will be almost as fast as the fastest collection for all of its methods.
By almost I mean their will be a little delay involved by the fact that we will sometimes need to synchronize the data in the internal collections. But it's not a big deal, because the clear() and addAll(collection) methods are quite fast, as we saw in our first charts, and they are definitely faster than some other O(n) or O(n x n) operations on some collections. So it's faster to recreate from scratch the all collection than applying a remove(Object) on an ArrayList (once again, here I don't care about Memory and GC).
So here we go, here are the attributes of my CombinedList
/** HashSet */ private HashSet<E> hashSet; /** ArrayList */ private ArrayList<E> arrayList; /** LinkedList */ private Tree linkedList; /** If hashSet is up-to-date */ private boolean isHashSetUpToDate = true; /** If arrayList is up-to-date */ private boolean isArrayListUpToDate = true; /** If linkedList is up-to-date */ private boolean isLinkedListUpToDate = true;
After that, in each method I will use the fastest collection, for example, the contains(o) has an average O(1) in the HashSet implemetation, so :
/** * @see java.util.Collection#contains(java.lang.Object) */ @Override public boolean contains(Object o) { // most efficient is HashSet updateHashSetIfOutOfDate(); return hashSet.contains(o); }
where the updateHashSetIfOutOfDate() will be coded like this :
/** * Update the hashSet if out of date */ private void updateHashSetIfOutOfDate() { // one of the two collection arrayList or linkedList is necessarily up-to-date if (!isHashSetUpToDate) { hashSet.clear(); if (isArrayListUpToDate) { hashSet.addAll(arrayList); } else { hashSet.addAll(linkedList); } isHashSetUpToDate = true; } }
Concerning data modification, we will do a similar trick :
/** * @see java.util.List#set(int, java.lang.Object) */ @Override public E set(int index, E element) { modCount++; // most efficient is ArrayList updateArrayListIfOutOfDate(); E old = arrayList.set(index, element); setOtherThanArrayListOutOfDate(); return old; }
where setOtherThanArrayListOutOfDate() will look like :
private void setOtherThanArrayListOutOfDate() { isHashSetUpToDate = false; isLinkedListUpToDate = false; }
We'll extend <strong>java.util.AbstractList</strong>, and don't forget to increment modCount when changing internal data, so that the ConcurrentModificationException mechanism is inherited from it.
Finally, we'll do some custom optimization, on the retainAll(Collection) as mentioned on this Java bug.
And we are done, the complete source of the CombinedList class can be found here.
And here are the benchmark results :
We are quite good on contains(Object) and containsAll(Collection) thanks to the HashSet, and we should statistically be a better than ArrayList on remove(Object) and removeAll(Collection) thanks to LinkedList. We could eventually replace the LinkedList by a TreeList to be better on the remove (TreeList is in O(log n) for the remove). We could also parametrized the use of an internal collection with the size of the data, or the index of the data regarding the size of the list etc.
- Memory usage of Collections
For that purpose, and as I still don't want to use Yourkit profiler today (which is an excellent tool by the way), I just added some simple method to measure our collection size :
gc(); long usedMemory = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed(); Constructor<? extends Collection> constructor = clazz.getDeclaredConstructor((Class<?>[]) null); constructor.setAccessible(true); // do the test on 100 objects, to be more accurate for (int i = 0; i < 100; i++) { this.collection = (Collection<String>) constructor.newInstance(); // polulate collection.addAll(defaultCtx); // do some operation to be sure that the inernal structure is allocated collection.remove(0); if (collection instanceof List) { ((List) collection).indexOf(((List) collection).get(collection.size() - 1)); } collection.iterator(); collection.toArray(); } gc(); // measure size long objectSize = (long) ((ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed() - usedMemory) / 100f); System.out.println(clazz.getCanonicalName() + " Object size : " + objectSize + " bytes"); collection.clear(); collection = null;
And display the results in the same JFreeChart component :
Ok, we have what we were expecting, that is to say our CombinedList size a LinkedList + an ArrayList + a HashSet.
Also, this means that this List will generate a lot of GC, and thus make it not suitable for massive random operations. But for my first purpose of a model for my JTable, it's a pretty good deal !
But the fact is that we execute several (thousand of ) times the same method, so I consider the JIT compilation time is negligible.
Also, what about Heap allocation and status and its impact on the test ? For this purpose, I carefully free all objects at the end of each collection benchmark, and call a heavyGc() to be sure to minimize this matter.
private void heavyGc() { try { System.gc(); Thread.sleep(200); System.runFinalization(); Thread.sleep(200); System.gc(); Thread.sleep(200); System.runFinalization(); Thread.sleep(1000); System.gc(); Thread.sleep(200); System.runFinalization(); Thread.sleep(200); System.gc(); } catch (InterruptedException ex) { ex.printStackTrace(); } }
This is most informative and also this post most user friendly and super navigation to all posts... Thank you so much for giving this information to me..
ReplyDeleteAWS Training in Bangalore
AWS training in sholinganallur
AWS training in Tambaram
AWS training in Velachery
Attend The Python Training in Bangalore From ExcelR. Practical Python Training in Bangalore Sessions With Assured Placement Support From Experienced Faculty. ExcelR Offers The Python Training in Bangalore.
ReplyDeleteI liked this post. It was really helpful and informative. Keep blogging and sharing with us. Thanks!! machine learning course bangalore
ReplyDeleteThank you for your post. This is excellent information. It is amazing and wonderful to visit your site.Prathima Infotech training center bangalore
ReplyDeleteSAP APO Training in Bangalore
sap gts training in bangalore
sap testing training in bangalore
sap fiori training in bangalore
I like that article is very awesome content. thanking you.
ReplyDeletePython Training in Chennai
Python Training in Training
Python Training in Bangalore
Python Hyderabad
Python Training in Coimbatore
I just got to this amazing site not long ago. I was actually captured with the piece of resources you have got here. Big thumbs up for making such wonderful blog page!
ReplyDeleteangular js training in chennai
angular js training in tambaram
full stack training in chennai
full stack training in tambaram
php training in chennai
php training in tambaram
photoshop training in chennai
photoshop training in tambaram
Very nice post here and thanks for it .I always like and such a super contents of these post.Excellent and very cool idea and great content of different kinds of the valuable information's.
ReplyDeleteoracle training in chennai
oracle training in omr
oracle dba training in chennai
oracle dba training in omr
ccna training in chennai
ccna training in omr
seo training in chennai
seo training in omr
Good Post! , it was so good to read and useful to improve my knowledge as an updated one, keep blogging. After seeing your article I want to say that also a well-written article with some very good information which is very useful for the readers....thanks for sharing it and do share more posts like this.
ReplyDeleteSAP Online Training
thanks for sharing this article.its very informative.Angular training in Chennai
ReplyDeletevery intersting to read your blog and it makes the viewers to visit your blog and keep on updating.
ReplyDeleteTally Training in Bangalore
Tally Course in Hyderabad
Such a good post .thanks for sharing
ReplyDeleteGerman Classes in Porur
German Classes in chennai