Benchmarks in general and Hibernate benchmarks in particular

A few weeks ago I was asked about Hibernate's overhead compared to using direct SQL/JDBC.
The questioner expected a numeric answer that would help him decide between the two. Unfortunately, there is no short answer. In this post I will try to cover the benchmark concept in general and Hibernate benchmarks in particular.


Benchmarks in general:

There is a saying attributed to Benjamin Disraeli (1804-1881): "There are three kinds of lies: lies, damned lies, and statistics."
I will focus on the third kind - statistics.

A few years ago benchmarks were very popular, but today people realize that test results are affected by dozens of parameters such as JVM flags, the number of cores, the hard disk drive and others. Also, each framework is at its best in one specific, distinguished scenario running on one specific environment. Eventually the popularity of benchmarks went down. Once in a while I encounter benchmarks which demonstrate gaps of more than 1000% between different frameworks, and then the gap vanishes by changing a single factor like the heap size. Furthermore, some frameworks specialize in complex models, while others specialize in flat models, so the same test case will not bring out the best in all frameworks. For these reasons and others, it is very easy to create misleading benchmarks.

Should a benchmark be the deciding factor?

The short answer - no.
The fact that framework A is three times faster than framework B is no guarantee that we will get the same results in our application.
There are several possible reasons for that:
1. The benchmark didn't use a cache, while our application does.
2. It could be that in our application framework A contributes only one percent of the total time, so replacing it with a framework that works three times faster will have a negligible impact on our application's performance.
3. It could be that framework B gains its speed by creating lots of objects, so if our application suffers from frequent GC or has limited memory, integrating framework B instead of framework A will actually reduce performance.



Summary of the problem so far:

1. There are endless test cases; each of them can lead to a different result.
2. Each framework needs a different configuration in order to get the best out of it. Furthermore, different test-case scenarios will need different configurations for a given framework.
3. A faster benchmark doesn't mean our application will be faster.
4. Speed is not the only factor.

Benchmarks and Hibernate:

Hibernate performance can be affected by dozens of factors, such as memory, the DB (local or remote), the model complexity, etc.
The database is ranked as one of the main bottlenecks, and Hibernate works with a database. This makes Hibernate a popular player in application-performance discussions.
Almost every time a new post on Hibernate performance pops up, the Hibernate development team gets angry and responds that the benchmark was not done well. For example:
Look at Gavin King's response there:
“Carl, nobody gets upset about competition, people get upset over FUD benchmarks.

Take it offline, guys.”
The guy who posted the above didn't take Hibernate's unpleasant response for granted, and published an angry post about Hibernate's performance and unpleasant responses on his blog:
“I posted some questions to the Hibernate forum to validate the approach I had taken and I received the rather terse and unhelpful reply “This is called 'A useless micro benchmark' and has been addressed a thousand times.” from a member of the Hibernate team. I did some reading of the Hibernate Performance FAQ and found some interesting statements in there, such as "We claim that Hibernate performs well" and "Many people try to benchmark Hibernate. All public benchmarks we have seen so far had (and most still have) serious flaws." It almost seems like it is taboo to question Hibernate performance. I don't understand this defensive stance since most developers would recognize that there is a tradeoff between ease of use and performance.”


So, where is the truth - what is the overhead of using Hibernate?

If you have read the post so far, you already know that you will not get a simple numeric answer from me. As explained above, it depends on many factors.
So how should one decide whether to use Hibernate or not?
The simple answer - if it works for others, it will probably work for you.
This answer is not a silver bullet, because small differences between applications can have a huge performance impact; however, if your application has common, mainstream behavior like the other applications which use Hibernate, then it will probably work for you too.

As a child I did stupid things many times (a habit I never rehabilitated from). When word reached my dad that I had done something stupid, he used to ask me - "why did you do this?" - and I always had the same answer - "because everyone else did". Then my father would reply - "and if everyone jumped off the roof, would you jump too?"
The analogy: the fact that everyone uses Hibernate is not a strong and sufficient case on its own, but it does carry some weight. If everyone succeeds in using it, then the burden of proving there are problems in Hibernate passes to you. In most cases you will find out that the problems which popped up came from using the framework incorrectly, and not because Hibernate has bugs or performance issues.

Stop blathering and give us some numeric answers

The Hibernate FAQ states that the overhead of using the framework is under 10%:
http://simoes.org/docs/hibernate-2.1/15.html “We claim that Hibernate performs well, in the sense that its performance is limited by the underlying JDBC driver / relational database combination. (Generally the overhead is much less than 10% of the JDBC calls.)”
“Most of the performance problems we have come up against have been solved not by code optimizations, but by adding new functionality. It turns out that the overhead of Hibernate itself, compared to equivalent direct SQL/JDBC, is almost always so small as to be irrelevant. So we concentrate our effort upon producing more efficient SQL/JDBC. The bottleneck is always the database itself.”

However, all over the web you will find people claiming the overhead is greater, meaning worse performance. The truth is that most of them never ran any benchmark. Some ran benchmarks, but did not run them well: they did not take into consideration that a good benchmark should clear the DB cache, warm up the JVM, and so on.
Here is a link to a post about the JVM-level considerations needed to create a good benchmark:
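To make the JVM-warm-up point concrete, here is a minimal sketch of a measurement loop. It is plain Java; the in-memory workload and all names are placeholders for whatever Hibernate or JDBC call is actually under test:

```java
public class WarmupBenchmark {
    // Placeholder workload; in a real benchmark this would be the
    // Hibernate or JDBC operation being measured.
    static long workload(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i * 31L;
        return sum;
    }

    // Runs the workload `iterations` times and returns the elapsed milliseconds.
    static double measureMillis(int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) sink += workload(10_000);
        long elapsed = System.nanoTime() - start;
        if (sink == 42) System.out.println("never"); // keep the JIT from dropping the loop
        return elapsed / 1_000_000.0;
    }

    public static void main(String[] args) {
        measureMillis(20_000);                // warm-up: let the JIT compile the hot path
        double timed = measureMillis(20_000); // only the second run is reported
        System.out.println("measured: " + timed + " ms");
    }
}
```

The same idea applies at the database level: run the query set once before the timed run, so the DB cache is in a known state.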
I chose to cover three benchmarks here.

The Ugly:

The ugly benchmark is so ugly that it made Gavin King angry enough to invest time in publishing a post about it. Why is it ugly?
A company named Software Tree released a new version of their ORM product, JDX. In addition, they came equipped with STORM, the Software Tree Object/Relational Mapping benchmark.
Since this is a commercial product, I expected them to invest at least minimal time and effort before publishing benchmarks, but they didn't. Instead, the code they wrote for benchmarking Hibernate was worse than a newcomer's Hibernate code.
It is worth reading why Gavin doesn't like such benchmarks:
“Now, having spent a fair amount of time performance profiling database access technologies, I knew for a fact that these kinds of benchmarks are usually completely misleading. With such small datasets, accessed repeatedly, the database is able to completely cache results in memory; the benchmark never actually involves any real disk access (watch your hard drive while STORM runs). We never get to see what happens once the dataset is too large to fit in memory, or is being updated by another transaction. We never get to see what happens when the database is under load. In fact, this benchmark involves no concurrency at all! We never get to see any joins or any of those other things that happen in realistic use cases. Furthermore, these kinds of benchmarks are often run against a local database, which gives results that are absolutely meaningless once the database is installed on a physically separate machine. What this means is that
1. Any overhead added by the ORM is massively exaggerated compared to production scenarios
2. We cannot observe how the system scales

The interesting part of ORM performance comes when you start to investigate caching and especially the flexibility of association fetching strategies. In a nontrivial object model with associations, it is usually association fetching that limits performance. ORM tools must provide flexible ways of choosing between lazy fetching plus process-level caching and eager fetching using outer joins.”
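Gavin's point about fetch flexibility is exactly the kind of thing these benchmarks never exercise. As an illustration only - hypothetical Order/Item entities, Hibernate 3 API assumed on the classpath, items mapped lazy by default - the same association can be fetched differently per use case:

```java
import java.util.List;
import org.hibernate.Session;

public class FetchingSketch {
    // Lazy default: the items collection is loaded only on first access,
    // ideally served from the process-level (second-level) cache.
    static Order loadHeaderOnly(Session session, Long id) {
        return (Order) session.get(Order.class, id); // no items SQL issued yet
    }

    // Eager alternative for use cases that always need the items:
    // a single outer-join select instead of 1+N statements.
    @SuppressWarnings("unchecked")
    static List<Order> loadWithItems(Session session, String customer) {
        return session.createQuery(
                "from Order o left join fetch o.items where o.customer = :c")
            .setParameter("c", customer)
            .list();
    }
}
```

A benchmark that always fetches the same way measures neither option well.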

Gavin took their benchmark and played with it a little.
They started with 2 minutes for Hibernate versus 12 seconds for JDX.
After fixing some basic Hibernate usage, the Hibernate time was reduced to 21 seconds.
JDX doesn't use prepared statements, which means that in small tests its results can look better. Gavin played with the test some more; in some cases Hibernate beats JDX, and in others the results are the same.
The above post is recommended reading.
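For readers unfamiliar with the prepared-statement point: reusing one parameterized statement lets the driver and database parse and plan the SQL once and execute it many times. A sketch, assuming an open java.sql.Connection and a hypothetical table t:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PreparedStatementSketch {
    // One parse/plan, many executions. Hibernate uses prepared
    // statements by default; per the post above, JDX did not.
    static void insertAll(Connection con, int[] values) throws SQLException {
        try (PreparedStatement ps =
                 con.prepareStatement("insert into t (v) values (?)")) {
            for (int v : values) {
                ps.setInt(1, v);
                ps.executeUpdate();
            }
        }
    }
}
```

In a tiny test the one-time preparation cost can dominate, which is one more way small benchmarks mislead.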

The Interesting:

Another interesting benchmark came from Italy.
Why is it interesting? Because, unlike other benchmarks that compare the time it takes each tool to do X, this one compares the productivity of the frameworks in a given time (30 minutes). In addition, unlike others that compare a flat, simple, single class in an almost-empty DB, it uses the MySQL employees sample database. That database was developed by Patrick Crews and Giuseppe Maxia and provides a large base of data (approximately 160MB) spread over six separate tables and consisting of 4 million records in total. The benchmark compares different JPA implementations, but unfortunately it did not add plain JDBC to the comparison.
Results

These were the results per JPA implementation. Notice that the time frame is fixed - each framework ran for 30 minutes - and the table shows what each framework managed to do in that time.

Framework            Queries+inserts   Queries   Inserts   Max mem during test (MB)   Mem after test (MB)
OpenJPA              3928              3530      398       96                         61
Hibernate            12687             3080      9607      130                        79
Toplink Essentials   5720              3740      1980      55                         25
Eclipselink          5874              3735      2139      57                         25

In the original post, red represents a bad result and bold represents the top result.
P.S. - Hibernate consumed more memory, but it also did more work than the others, so that is reasonable.
First, we should be grateful for the investment; still, this benchmark suffers from several issues. It cleans the DB and avoids the basic mistakes, but it has other problems that need to be taken into consideration when you do a benchmark. The author got a lot of responses, a few of them dealing with the way he performed the benchmark. He also published a reference on TheServerSide and got a lot of responses there.
It is worth taking a look at John Stecher's interesting comment before jumping to conclusions:
http://www.theserverside.com/news/thread.tss?thread_id=53142

.NET and Java

This benchmark covers both .NET and Java, including LINQ and Hibernate.
The test suite consists of:
A simple query returning 1000 records.
A simple query returning 1 record.
A simple insert of 1 record.

His Conclusion:
“These tests demonstrate raw crunching speed of returning row data, if you are using complex queries where you have slow result times (greater than 10ms) from the database, all of these timings for the connection layer become negligible.
One solution does not fit all, as with most programming practices. Use your best judgment and try to keep your code simple, but not so simple that it creates an efficiency bottleneck. Real world is somewhere between the 1000 query result and the 1 query result set. Use your discursion on where your code resides within these bounds. These results lend me to emphasize that you use object oriented best practices around all your code, but if necessary on pure raw atomic crunching queries, use your judgment on when to jump the fence to the raw crunching power of ADO.NET or JDBC. Make sure your code is well commented.
JavaHibernateSQL/NHibernateToSQL(.NET) and compiled Linq queries appears to perform approximately 2 times slower than raw JDBC/ADO.NET at worst. Overhead as expected with an ORM”

But, after looking at his code, I was disappointed to find that many of the benchmark considerations explained above, like caching and DB size, were not taken into account here.

My favorite (which still needs a lot of improvement):

This benchmark was published by PolePosition. PolePosition is a benchmark test suite for comparing database engines and object-relational mapping technology. As of today it is by no means complete; database vendors and open-source database projects are invited to improve the test implementations and to contribute further disciplines ("circuits"). The PolePosition framework source code enables implementing new tests quickly, helps with time measurement, and outputs the results as number series and graphic visualizations.
They have several test suites, for example "Barcelona", which writes, reads, queries and deletes objects with a 5-level inheritance structure.

The read results (t in ms; each column is 100 selects over a growing number of objects):

                  objects:1000   objects:3000   objects:10000   objects:30000
Hibernate         150            433            1317            4068
(not identified)  574            1300           4255            12377
(not identified)  331            990            3368            10530
JDBC on HSQLDB    19             43             162             483

(The original chart distinguishes the rows by color; the Hibernate and plain-JDBC rows are identified from the overhead discussion below, while the two middle rows are not named in the text.)


But the tests were written by people who admit they are not familiar with Hibernate, and many of the benchmark considerations explained above were not taken into account.
Anyway - they did great work and invested in the code, but some additional work is still needed for it to become an excellent reference.
This benchmark was posted on the Hibernate forum and got the usual Hibernate-team answer:
“Since this "benchmark" has been written by someone who admits that he doesn't know anything about Hibernate, you won't get much usable information out of it. This particular thing has been debunked internally by a member of our team in 15 minutes - with now Hibernate being "100x faster". There is just no value in publishing this, since it would a) encourage others to waste our time again with similar stuff and b) cost more of our time than it is worth.

We get many benchmarks like this from competing vendors (note that this is particular benchmark very likely encouraged and/or sponsored by an object database vendor) and we don't have the time to debunk them all. We encourage you to create your own benchmarks if you want objective results. Some trivial micro-benchmarks that show the real JDBC overhead for unrealistic non-concurrent data access are available in the Hibernate distribution, as the "ant perftest" target.”

Overhead - is it the best metric for comparison?
Look at the benchmark above. If you build your conclusion on the HSQLDB results, you can come to the misleading conclusion that Hibernate's overhead is about 700% compared to JDBC, because as you can see above the HSQLDB JDBC time is 19 ms and the Hibernate time is 150 ms. Overhead as a single metric is misleading in this case: when dealing with an in-memory DB, a local DB, a cached DB, small tables or a simple model, any overhead added by the ORM is massively exaggerated compared to production scenarios. So overhead is not always the best tool.
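The arithmetic is worth spelling out. Using the numbers above (19 ms for JDBC, 150 ms for Hibernate on in-memory HSQLDB), the same fixed cost looks dramatic against a tiny baseline and modest against a realistic one (the 1000 ms remote-DB figure below is a hypothetical illustration, not a measurement):

```java
public class OverheadMath {
    // Overhead expressed as a percentage of the baseline time.
    static double overheadPercent(double ormMillis, double jdbcMillis) {
        return (ormMillis - jdbcMillis) / jdbcMillis * 100.0;
    }

    public static void main(String[] args) {
        // In-memory HSQLDB numbers from the table above:
        System.out.println(overheadPercent(150, 19));    // ~689%
        // The same ~131 ms fixed cost on top of a hypothetical 1000 ms
        // remote-DB use case would be ~13% -- the absolute added time
        // matters, not the ratio.
        System.out.println(overheadPercent(1131, 1000)); // ~13%
    }
}
```

This is why the fixed time added per use case, not the overhead percentage, is the number to care about.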

What about Hibernate's own tests?

Hibernate has a minimal and simple performance test, located under the Hibernate source:
hibernate-distribution-3.5.1-Final\project\testsuite\src\test\perf\org\hibernate\test\perf\NewerPerformanceTest.java
The model is simple and the DB doesn't contain many records, so the test suffers from exaggeration of the ORM part. Even so, I ran it against a local MySQL and the result was about 100% overhead. Against a production DB (contention on the DB, complex model, remote access...) the overhead would be much smaller, but I didn't test it. In a post I published on the Hibernate forum I encouraged the Hibernate team to publish their tests and provide more performance visibility. When examining Hibernate we don't need to focus on overhead only; we need to examine how much time it adds to our specific application use cases. When you evaluate, take into consideration the first- and second-level caches, lazy loading, the merge algorithm, optimistic-lock handling, etc., which can significantly reduce the performance impact. In addition, don't forget the Pareto rule: for the 20% of hard problems that don't get a full solution from Hibernate, you can always fall back to direct JDBC, so when weighing your options it is very important to focus on the 80%.
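As an example of the knobs mentioned above, in Hibernate 3 the second-level and query caches are switched on via configuration properties. This is a sketch only; the EhCache provider shown is one common choice, not a recommendation:

```properties
# Enable the process-level (second-level) cache and the query cache.
hibernate.cache.use_second_level_cache=true
hibernate.cache.use_query_cache=true
# A cache provider must be chosen; EhCache is a common one for Hibernate 3.
hibernate.cache.provider_class=org.hibernate.cache.EhCacheProvider
# Collect hit/miss statistics so the cache's effect can actually be measured.
hibernate.generate_statistics=true
```

A benchmark run with and without these settings will usually produce very different numbers, which is the whole point: configuration, not raw overhead, dominates.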

Conclusions:

1. Fact #1 - Hibernate works. Fact #2 - in the average enterprise application it can perform well. These two facts are confirmed by the thousands of applications shipped with Hibernate.
2. Don't blindly believe any benchmark you see on the internet.
3. Try to create your own benchmark based on your application's needs and architecture (remote or local DB, cache policy...). Run the benchmark on a semi-production environment.
4. When you create a benchmark, focus on the 80%: test the average scenario. The other 20% can get an alternative solution. Don't choose a solution that covers 100% but requires a lot of investment and has a steep learning curve.
5. Overhead percentages can be misleading. The real factor is the fixed time added to the use case.
6. Hibernate is not suited to all applications. There are lots of alternatives out there, like iBATIS, Spring JDBC...
7. Creating a good benchmark is not easy; many factors are involved.
8. It would be great if the Hibernate team improved their performance visibility. Cooperation with PolePosition would help.

Thanks to Yair Fine for the English review. I invented a new language when I wrote this post, and thanks to Yair's help it can now be read by brave English readers instead of only by "Avihai" speakers; however, all remaining syntax mistakes belong to me - Yair didn't have time to cover them all.



Comment 1:

  1. It looks like comments are disabled on the blog, or the blog platform is having issues,
    so I am adding some thoughts here:

    1) I liked the historical insight a lot; it was nice to see this collection of references - a good amount of direction about what everybody should consider.

    2) I don't consider it meaningful to measure the number of SQL statements performed per unit of time; as mentioned in many places by experienced people, the ORM "overhead" will hardly be a bottleneck in any real-world scenario. What really affects a real-world scenario is the choice of statements, the workings of caching, the configuration flexibility, and the option to fall back to JDBC in case of need.
    It is easy to understand that producing a different kind of JOIN statement could heavily affect system performance; also, in some cases a clever JPA implementation might use fewer statements than another to perform the business logic and perform better because of this, while others might perform better because they instead generate more statements that are simpler for the database to optimize, or have a better cache hit ratio and so perform much better.

    It would be great to see a benchmark using a big data set that measures not the number of statements but the performance of "business methods" using JPA - how many blog posts can I insert per second using Hibernate? So, as everybody should do already: stress-test your final application with your real-world data to see what performs better in your specific case.

    3) Comparing JPA to JDBC has another issue: they are two different things. I hardly believe the "business layer" is going to interact with ResultSets - the maintenance cost would become very high - so in the JDBC use case, to be fair, you should still introduce a minimal mapping to minimal DTOs, and think about fairly complete and flexible code. It is not worth trying to create such a benchmark, as every line of code you introduce is hardly similar to what others would do.
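    To illustrate the commenter's point: even a "fair" JDBC baseline needs hand-written mapping code like the sketch below (a hypothetical DTO over the employees sample table; this is roughly the code an ORM generates and maintains for you):

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class EmployeeDao {
    // Hypothetical DTO matching the MySQL "employees" sample table.
    public static final class EmployeeDto {
        final int empNo;
        final String firstName;
        final String lastName;
        EmployeeDto(int empNo, String firstName, String lastName) {
            this.empNo = empNo;
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Hand-written row mapping: the per-entity code a JDBC-vs-ORM
    // benchmark has to include, write and maintain to be comparable.
    static List<EmployeeDto> mapAll(ResultSet rs) throws SQLException {
        List<EmployeeDto> result = new ArrayList<>();
        while (rs.next()) {
            result.add(new EmployeeDto(
                rs.getInt("emp_no"),
                rs.getString("first_name"),
                rs.getString("last_name")));
        }
        return result;
    }
}
```

    Multiply this by every table and every query shape, and the "raw JDBC" baseline stops being free.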

    4) The business point of view: performance/$$$.
    The best performance benchmark should consider that producing software has a cost: it takes time, there is a risk factor associated with custom-built code, and there are maintenance costs.
    Let's say you have a budget of X$ to build a new application with a performance requirement. Assuming that with Hibernate you'll have more productive developers and less costly overall maintenance, you're saving some part of X compared to the cost of building the same app using direct JDBC. This money can likely buy you more hardware and largely compensate for any possible performance issues - taking into consideration that for the 20% of borderline critical performance issues you might have when using an ORM, you can always fall back to plain SQL, and you can still reuse the resource pools and object mappers provided by Hibernate to minimize the effort, the risks in your code, and the amount of things to maintain and test.
    Not to mention that simpler-to-maintain software will give you a better time-to-market for future requirement changes and new features!
    Of course, all that is hard to "benchmark scientifically"; the point is just that using an ORM can win compared to JDBC regardless of what you measure in any benchmark.

    A completely different thing is to find changes we could apply to Hibernate to improve it even more; that is always interesting to discuss, and well-proven improvements are very likely to be accepted.
