Big Data & Brews: Monte Zweben on the Value of HBase
by Datameer on Dec 08, 2015
In episode 2 with Monte, we dive into why Splice Machine runs on Apache HBase and how scale-out architectures for enterprises can mean the difference between hundreds of thousands of dollars for a large-scale database versus millions. He even gives a great example with their customer Harte Hanks. Watch to get all of the details.
Andrew: There’s lots of stuff out there that calls itself SQL and Hadoop.
Monte: There is.
Andrew: You talked about what that broadly means: having SQL query capabilities on the one hand, and running on the Hadoop stack as the underpinning on the other. It goes deeper, right? There are some databases that just give you a SQL syntax but they’re not fully relational. Even if they are fully relational, they’re not transactional, as you were discussing before.
Then there’s different parts of the Hadoop stack that you could run on. You’re actually running on HBase, which is a NoSQL database, so you are a full-scale relational database running on a NoSQL database.
Monte: That’s exactly right. I couldn’t have said it better.
Andrew: It’s all well and good that I can identify this, but why did you guys decide to go that way? Doesn’t that lead to some architectural conflicts, and how do you overcome those? Did you even need to?
Monte: We knew when we started we wanted to build an operational database that handled both OLTP and OLAP workloads. That meant that we had to be able to do real-time updates.
Andrew: Can I be the acronym police? OLTP is the transactional, operational stuff, and OLAP means different things. It’s a category of products, but that’s not really what you mean. You just mean analytical processing.
The T is transactional. The A is analytical.
Monte: That’s exactly right. We knew that we wanted to be able to handle the transactional piece in particular and that was going to be new. Hadoop isn’t well architected for that, or at least, traditionally, people didn’t think of it as being well architected for that. What the Hadoop community did was build a key-value store that allows you to do real-time updates. That key-value store was modeled on Google’s Bigtable, and that was HBase.
It was a nice building block for us. Otherwise, we’d have to build all that. Yes, we built on HBase. It provides us that transactional capability for changing individual key-value records in a table, but then we built around that. It was a good choice for us and we’re seeing it scale nicely.
Andrew: I guess if even the incumbent relational databases are really just built on a file system, then arguably building on a NoSQL database is starting with a more structured kind of fabric to begin with.
Monte: Right, it’s an API that makes sense for operations. You can get a record, you could scan a range of records. You can write a record and delete a record. It’s a very simple API that we can base all of our transactional activity upon.
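The four operations Monte lists (get, scan, write, delete) can be sketched with a toy in-memory store. This is only an illustration of the shape of such an API, not HBase's actual client interface and not Splice Machine's code; the class name `SortedKeyValueStore` and its method names are hypothetical, chosen for this example.

```python
import bisect


class SortedKeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only).

    Keys are kept in sorted order so that a range scan can locate its
    start and stop positions with a binary search.
    """

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value

    def put(self, key, value):
        """Write a record, overwriting any existing value for the key."""
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        """Return the value for a key, or None if the key is absent."""
        return self._values.get(key)

    def delete(self, key):
        """Remove a record; deleting a missing key is a no-op."""
        if key in self._values:
            del self._values[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def scan(self, start, stop):
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


store = SortedKeyValueStore()
store.put(b"row1", b"a")
store.put(b"row3", b"c")
store.put(b"row2", b"b")
in_range = list(store.scan(b"row1", b"row3"))  # row1 and row2, in key order
store.delete(b"row1")
```

Keeping keys sorted is what makes the range scan cheap. A relational layer built on top of such a store would have to map rows and indexes onto keys and add its own multi-row transaction logic, which this sketch does not attempt.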
Andrew: Alright, and you get the same kind of commodity hardware, maybe more important, commodity storage that anything with Hadoop involves and, as you were saying, you’re building stuff based on adding inexpensive nodes instead of adding expensive resources to a single machine.
Monte: That’s right. For example, our first customer is a company by the name of Harte Hanks. Harte Hanks is a marketing agency and marketing software company. They have an omni-channel campaign management solution, not unlike the things we built back in the day at Blue Martini, with which they service many different retailers, financial services providers, and automotive providers. They had a stack of software that was a state-of-the-art campaign management stack. It used tried and true software. It used IBM’s Unica application. It used Cognos for reporting. It had Ab Initio for ETL processing, and then Harte Hanks has a division called Trillium, which basically de-dupes and does householding types of transactional work.
All of this was powered by Oracle, and Harte Hanks was out in the marketplace looking to solve a particular problem: their Oracle instance powering these campaign management and campaign marketing solutions was really grinding to a halt. They were literally thinking about throwing away the whole software stack and writing it all in NoSQL, which is germane to our conversation about SQL versus NoSQL. We said to them, “Don’t do it”, and we said, “You can keep your entire stack of software, but just replace the Oracle rack system with Splice Machine’s relational database system”. They said, “There’s no way that can really work”. We said, “We’ll prove it to you”.
The reason I tell you this story is that it gives you a feel for the cost comparison between the different architectures. Just in this case, on the queries they gave us, we were able to show almost a sevenfold performance improvement at a quarter of the cost of what they were using today, let alone what they’d have to scale up to in order to handle the kinds of volumes that they wanted to get to. I think, literally, when you compare the scale-out architecture, and it’s not just Splice Machine but scale-out in general, you see the difference between hundreds of thousands of dollars for a really scaled database versus many millions. It’s a fundamental difference for the enterprise.
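To put the two figures Monte quotes together: a speedup at a fraction of the cost multiplies into a performance-per-dollar gain. The short calculation below is a back-of-the-envelope sketch that treats "almost seven times" as an upper bound of 7 and "a quarter of the cost" as 0.25; the function name is hypothetical and the result is an approximation, not a figure stated in the interview.

```python
def price_performance_gain(speedup, cost_fraction):
    """Return how many times more performance each dollar buys.

    speedup:       new performance divided by old performance
    cost_fraction: new cost divided by old cost
    """
    return speedup / cost_fraction


# Assumed inputs: "almost seven times" taken as 7.0, "a quarter" as 0.25.
gain = price_performance_gain(speedup=7.0, cost_fraction=0.25)
print(f"Roughly {gain:.0f}x more performance per dollar")
```

Under those assumptions the gain comes out at about 28x, which is in line with the order-of-magnitude gap Monte describes between hundreds of thousands of dollars and many millions.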