
Datameer Blog

Hadoop Standardization Is Essential For Industry Growth

By Stefan Groschupf on July 27, 2015

**This post first appeared on TechCrunch**

There’s no doubt that as data grows, Hadoop has proven to be the platform of choice for big data. According to Allied Market Research, the Hadoop market is worth an estimated $3 billion, and will surpass $50 billion by the end of the decade. Yet its long-term success depends on implementing standardization and a more structured process for innovation.

As one of the most widely adopted big data technologies, Hadoop needs more standardization so that whatever software a company develops or buys, it can be confident that software will work on the platform year after year. There is commercial pressure for vendors to differentiate themselves to gain market share, but the current approach (with no specifications or standard APIs) comes at the cost of incompatibility.

This incompatibility stifles innovation in the application layer above the platform, and further splinters the infrastructure landscape, which slows innovation there, as well. We only need to look to Java’s success and UNIX’s demise to realize the importance of standardization.

With a coordinated industry standards process, vendors can add their own bells and whistles on top of a standard API, or optimize behaviors such as performance, without affecting compatibility. The resulting consistency and ease for developers and vendors up the stack will encourage more widespread adoption of Hadoop-based technologies, and, ultimately, the rising tide will lift all boats.
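To make the idea concrete, here is a minimal, hypothetical sketch in Java (all names here are invented for illustration and are not part of any Hadoop specification): applications code only against a standard interface, while vendors differentiate underneath it without breaking compatibility.

```java
// The "specification": applications code only against this interface.
interface RecordStore {
    void put(String key, String value);
    String get(String key);
}

// Vendor A's implementation: a plain in-memory map.
class VendorAStore implements RecordStore {
    private final java.util.Map<String, String> data = new java.util.HashMap<>();
    public void put(String key, String value) { data.put(key, value); }
    public String get(String key) { return data.get(key); }
}

// Vendor B adds a "bell and whistle" (a read counter for monitoring)
// without changing the standard contract.
class VendorBStore implements RecordStore {
    private final java.util.Map<String, String> data = new java.util.HashMap<>();
    long reads = 0;
    public void put(String key, String value) { data.put(key, value); }
    public String get(String key) { reads++; return data.get(key); }
}

public class Compat {
    // Application code: runs unchanged on any conforming implementation.
    static String roundTrip(RecordStore store) {
        store.put("user", "alice");
        return store.get("user");
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new VendorAStore())); // prints "alice"
        System.out.println(roundTrip(new VendorBStore())); // prints "alice"
    }
}
```

The application's `roundTrip` logic never mentions a vendor class; as long as each vendor honors the interface, their internal optimizations stay invisible to code up the stack.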

The Hadoop Market Misses The Mark On Unifying Standards

Currently, no standards process exists, because the Apache Software Foundation has not built the concept of standardization into its ethos or operating model. While Apache has a great model for joint contribution to open source, so that organizations can collaborate, there is no similar process for product standardization or the development of formal specifications.

As new versions of Hadoop are released, there is nothing to stop vendors from modifying a release and changing how it behaves. If developers modify a release to fix a problem for one customer, they could potentially break applications and force other developers and vendors to spend time and resources fixing and working around each change.

Today, vendors and developers are bogged down fixing applications and testing them against multiple versions of Hadoop after every release, and slowed in migrating custom-built apps to each new version of Hadoop. Additionally, this complexity creates a Swiss-cheese matrix of platform support among vendors, leaving customers forced to choose one tool or another, each built to work around different bugs or limitations.

The goal should be to make Hadoop-based tools as easy to use as possible, but the current infighting among platforms is creating a more chaotic marketplace. Standardization is better for customers, better for ISVs (Independent Software Vendors) up the stack and, ultimately, better for all parties involved.

Implementing Standardization Eliminates Risks

So how can Hadoop implement governance? The Java Enterprise Edition (JEE) platform and its community process for standardization, the Java Community Process (JCP), offer a prime example of how we could successfully standardize Hadoop. Vendors can submit a Java Specification Request (JSR) that a committee of thought leaders reviews and decides upon.

The “working group” can then provide a reference implementation of that standard to illustrate what it would look like if you built it, giving a complete demonstration and blueprint of how to use the standard in the right way, what is and is not allowed and how ISVs could build a product on top of that. Ultimately, it protects companies’ investments in Java-based technologies, encouraging more adoption.

Those standards ultimately became part of all the Java-based middleware and application infrastructure products brought to market. All of those products were built on JEE as a foundation and were required to adhere to the standards.

By creating reliable standards, the Java community successfully pulled the software forward and drove demand by providing a stable foundation for an application ecosystem. On the flip side, UNIX failed to create standards in the 1970s and 1980s, and ultimately lost its market share to Linux.

As application vendors across verticals increasingly build applications based on Hadoop, there will also be an increasing momentum toward standardization. We are seeing more Hadoop-native applications being conceived and built, and companies are being formed and funded purely to deliver Hadoop-based applications.

As the Hadoop community continues to grow, it will become more difficult to accept the pain points that come with a lack of standardization.


Stefan Groschupf

Stefan Groschupf is a big data veteran and serial entrepreneur with strong roots in the open source community. He was one of the very few early contributors to Nutch, the open source project that spun out Hadoop, which, 10 years later, is considered a $20 billion business. Open source technologies designed and coded by Stefan can be found running in all of the Fortune 20 companies, and innovative open source technologies like Kafka, Storm, Katta and Spark all rely on technology Stefan designed more than half a decade ago. In 2003, Groschupf was named one of the most innovative Germans under 30 by Stern Magazine. In 2013, Fast Company named Datameer one of the most innovative companies in the world. Stefan is currently CEO and Chairman of Datameer, the company he co-founded in 2009 after several years of architecting and implementing distributed big data analytic systems for companies like Apple, EMI Music, Hoffmann La Roche, AT&T, the European Union, and others. After two years in the market, Datameer was commercially deployed in more than 30 percent of the Fortune 20. Stefan is a frequent conference speaker, contributes to industry publications and books, holds patents, and advises a set of startups on product, scale and operations. When not working, Stefan is backpacking, sea kayaking, kite boarding or mountain biking. He lives in San Francisco, California.