**This post first appeared on TechCrunch**
There’s no doubt that as data grows, Hadoop has become the platform of choice for big data. According to Allied Market Research, the Hadoop market is worth an estimated $3 billion and will surpass $50 billion by the end of the decade. Yet its long-term success depends on standardization and a more structured process for innovation.
As one of the most widely adopted big data technologies, Hadoop needs more standardization so that companies can be confident that whatever software they develop or buy will work on the platform year after year. There is commercial pressure for vendors to differentiate themselves to gain market share, but the current approach — with no specifications or standard APIs — comes at the cost of incompatibility.
This incompatibility stifles innovation in the application layer above the platform, and further splinters the infrastructure landscape, which slows innovation there, as well. We only need to look to Java’s success and UNIX’s demise to realize the importance of standardization.
With a coordinated industry standards process, vendors can add their own bells and whistles on top of a standard API, or optimize behaviors such as performance, without affecting compatibility. The consistency and ease this affords developers and vendors up the stack will encourage more widespread adoption of Hadoop-based technologies, and, ultimately, the rising tide will lift all boats.
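The idea can be sketched in a few lines of Java. All of the names here (`RecordStore`, `VendorAStore`, `VendorBStore`) are hypothetical — this is not a real Hadoop API — but it shows the mechanism: application code written against a standard interface runs unchanged on any vendor’s implementation, even one that adds its own optimizations.

```java
import java.util.HashMap;
import java.util.Map;

// A hypothetical "standard" API that a specification would define.
interface RecordStore {
    void put(String key, String value);
    String get(String key);
}

// Vendor A: a straightforward implementation of the standard.
class VendorAStore implements RecordStore {
    private final Map<String, String> data = new HashMap<>();
    public void put(String key, String value) { data.put(key, value); }
    public String get(String key) { return data.get(key); }
}

// Vendor B: adds a bell-and-whistle (a last-lookup cache) behind the
// same contract, so client code is unaffected by the difference.
class VendorBStore implements RecordStore {
    private final Map<String, String> data = new HashMap<>();
    private String lastKey, lastValue; // vendor-specific optimization
    public void put(String key, String value) { data.put(key, value); }
    public String get(String key) {
        if (key.equals(lastKey)) return lastValue;
        lastValue = data.get(key);
        lastKey = key;
        return lastValue;
    }
}

public class Demo {
    // Application code depends only on the standard interface,
    // not on either vendor's concrete class.
    public static String lookup(RecordStore store) {
        store.put("cluster", "ready");
        return store.get("cluster");
    }

    public static void main(String[] args) {
        System.out.println(lookup(new VendorAStore())); // prints "ready"
        System.out.println(lookup(new VendorBStore())); // prints "ready"
    }
}
```

The point of a specification is to fix the `RecordStore` contract so that swapping `VendorAStore` for `VendorBStore` never breaks the caller — differentiation happens inside the implementation, not in the API.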
Currently there is no standards process. It doesn’t yet exist because the Apache Software Foundation has not built the concept of standardization into its ethos or operating model. While Apache has a great model for joint contribution to open source, so organizations can collaborate, there is no similar process for product standardization or the development of formal specifications.
As new versions of Hadoop are released, there is nothing to stop vendors from modifying a release and changing how it behaves. If developers modify a release to fix a problem for one customer, they could potentially break applications and force other developers and vendors to spend time and resources fixing and working around each change.
Today, vendors and developers are bogged down fixing applications and testing them against multiple versions of Hadoop after every release, and slowed in migrating custom-built apps to each new version. This complexity also creates a Swiss-cheese matrix of platform support among vendors, leaving customers forced to choose between one tool and another, each built to work around different bugs or limitations.
The goal should be to make Hadoop-based tools as easy to use as possible, but the current infighting among platforms is creating a more chaotic marketplace. Standardization is better for customers, better for independent software vendors (ISVs) up the stack and, ultimately, better for all parties involved.
So how can Hadoop implement governance? The Java Enterprise Edition (JEE) platform and the Java Community Process (JCP) that standardizes it are a prime example of how Hadoop could be standardized successfully. Vendors submit a Java Specification Request (JSR), which a committee of thought leaders reviews and decides upon.
The “working group” can then provide a reference implementation of that standard — a complete demonstration and blueprint showing how to use the standard correctly, what is and is not allowed, and how ISVs could build a product on top of it. Ultimately, this protects companies’ investments in Java-based technologies, encouraging more adoption.
Those standards ultimately underpinned all of the Java-based middleware and application-infrastructure products brought to market: every one was built on JEE as a foundation and required to adhere to the specifications.
By creating reliable standards, the Java community pulled the software forward and drove demand by providing a stable foundation for an application ecosystem. On the flip side, UNIX failed to standardize in the 1970s and 1980s, and ultimately lost its market share to Linux.
As application vendors across verticals increasingly build applications based on Hadoop, there will also be an increasing momentum toward standardization. We are seeing more Hadoop-native applications being conceived and built, and companies are being formed and funded purely to deliver Hadoop-based applications.
As the Hadoop community continues to grow, it will become more difficult to accept the pain points that come with a lack of standardization.