Whose Hadoop is Bigger? Really…

Actually, Datameer contributed the most code to the Hadoop ecosystem.

After weeks of hard work we made a surprising discovery: 99% of the Hadoop ecosystem code was actually written by Datameer engineers.

We hired an external, independent, respected analyst firm that had five PhDs working on a new generation of algorithms that analyze the Hadoop ecosystem source code, JIRA posts, emails and ideas contributed in verbal conversations.

The breakthrough was that we analyzed the object inheritance and the call stack to weight the importance of each line of code. We also took the mental stability of contributors into account. BTW, if you are still wondering what was in my coffee this morning, you don’t get my German sense of sarcasm.


When I joined the Nutch project in the early 2000s, I was known for communicating my strong points of view very loudly in the community. I guess I have lost some steam over the years: I have not even published a blog post in the last few years, and my Hadoop & Co mailing lists are on a read-only subscription.

But I felt I had to speak up about all this commotion around “my Hadoop is bigger than yours” currently lighting up the community.

I tried to take some wind out of this conversation over the last few months by using our product to analyze the Hadoop source code and present, in a very fun way, some Hadoop source code insights here. These analytics uncovered the longest email conversation for the smallest code change, the longest commit comment for the shortest change, and so on.

So now we find our partners and friends sparring over whose contribution is bigger. Frankly, this is all surprising to me, since we have so much more work to do to move Hadoop forward. Don’t get me wrong, we love Hadoop for what it is, but we can all agree that the code is still a work in progress: it is monolithic and difficult to test, and concepts like inversion of control do not exist… I could go on for a while.

So actually I’m happy to announce that our own awesome engineering team is not responsible for this, but is instead focused on building a great analytics product on Hadoop that brings great value to our customers.

Here at Datameer we work hard but also make sure we have a good time, which includes sharing a laugh over even the most stressful situations.

In that spirit we would love to contribute a laugh to the ongoing “civil war” in the Hadoop ecosystem. To commemorate this epic discussion, we have designed a special t-shirt that we would love to share for free with the community.

OK, people, now back to work. Let’s build some great technology instead of arguing about lines of code.

P.S. We have some customers using DAS; their Hadoop is for sure bigger than yours. :)

Stefan Groschupf

