Datameer has introduced its new Visual Explorer component, and in my opinion, big data analysis has undergone a sea change because of it. I know that sounds hyperbolic, especially since I am not a disinterested party – but I am in earnest. Visual Explorer combines a technological breakthrough with a unique user experience, and it changes the landscape for exploratory work on raw data at high volume. In this post, I’d like to explain why it’s so fundamentally transformational.
The Historical Need for Speed
I come from an enterprise technology background. At first, in the 80s and early 90s, that meant working with desktop database engines that were – despite their now decades-old vintage – super fast. Just a few years later, industry momentum shifted to client-server technology, where the database ran on a remote server. Things got slower, but the server architecture made possible much more powerful multi-user applications with a lot of intelligence built into the database itself. The trade-off was worth it and the performance was still acceptable.
BI technologies came next and they cried out for speed. How could you use a system designed to help you derive rich information from mere transactional data and have to wait for it? BI systems needed to offer a query language that made complex questions relatively easy to express and database engines that could get the answer back quickly. While early BI systems may have fallen a bit short of that goal, they at least provided for optimizations that allowed skilled practitioners to coax performance up to that level. And subsequent advances in columnar storage and parallel processing eventually provided that level of performance out of the box.
That status quo lasted a long time – more than a decade. But then, under the radar, the nature of data started changing and its volumes started increasing substantially. At the same time, the need to analyze it immediately, without having to model it first, also increased. And, to be frank, this is where things got bad – the technology wasn’t ready to handle those requirements and keep speeds where they were previously. The good companies in the big data space acknowledged this state of affairs and have worked to mitigate the difficulties, but those difficulties have been absolutely genuine.
A Foundational Flaw
What are the roots of these challenges? The very technologies designed to work with big data have mostly been based on approaches antithetical to ad hoc analysis. Technologies like Hadoop were based on batch processing – running complex transformations, based on imperative programming, on huge volumes of data. Batch operations involve processing whole jobs. Meanwhile, the “run a specific query, get a quick answer” workflow of ad hoc analysis is not job-oriented at all; it’s iterative and granular – almost conversational.
Data warehouses and BI platforms could handle that workflow, but they’ve required that the data be modeled first. Big data platforms, on the other hand, devour raw data, dispensing with the formality of modeling. But historically, they have not offered interactive performance. So, when it comes to performing exploratory analysis with big data, there has been an entrenched, almost unacceptable choice: model the data first so that analysis is responsive, or work on the raw data directly and accept slow processing.
The Art of Compromise
What’s a poor agility-seeking big data analyst to do? Until now, she had to make compromises. For example, she could reduce her data volumes dramatically and then use distributed, in-memory processing platforms like Apache Spark. Another option has been to explore a sample of the data interactively, then run a batch job on the full data set. This approach isn’t bad – in fact, before Visual Explorer, it was the only one Datameer used.
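To make the sample-then-batch compromise concrete, here is a minimal Python sketch. It is not Datameer’s implementation; the record layout, the `region` field and the function names are all assumptions made up for illustration. The idea is simply that the interactive step summarizes a small random sample, while the batch step repeats the same aggregation over every record.

```python
import random
from collections import Counter


def explore_sample(records, sample_size, seed=0):
    """Interactive step: summarize a small random sample (hypothetical helper)."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    return Counter(r["region"] for r in sample)


def run_batch(records):
    """Batch step: the same aggregation, but over every record."""
    return Counter(r["region"] for r in records)


# Made-up data: every fourth record is "west", the rest are "east".
records = [
    {"region": "west" if i % 4 == 0 else "east", "amount": i}
    for i in range(100_000)
]

preview = explore_sample(records, sample_size=1_000)  # quick, approximate
full = run_batch(records)                             # slow, exact
```

The preview is only an estimate of the distribution; the exact counts arrive only once the full job has run, which is precisely the wait this approach asks the analyst to accept.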
Another option has existed and it’s a compelling one: index the data on specific attributes, then use query technology that can take advantage of the indexes to perform ad hoc analysis. This is a nice technique and can work well in combination with the other approaches. But indexing has had downsides: building the indexes is itself a lengthy process and indexes require lots of storage space, so analysts have had to choose carefully which indexes to create. Effectively, that made indexing comparable to modeling: a lengthy preprocessing step that requires formal design rather than a more casual, experimental approach.
If there were only a way to make index creation lightning-fast and a way to eliminate its storage overhead, then the preprocessing delays and formality would be dispensed with and we’d still get the benefit of indexed query performance. Most of us in the industry have been trained to believe that such a prospect is a bit like having your cake and eating it too. The industry has essentially viewed the problem as zero-sum. And so the industry has been stuck.
Change the Calculus
Stuck, that is, until now. I say that because Datameer has audaciously confronted and solved this problem with Visual Explorer. That’s why I say there’s a sea change afoot here.
Simply put, Datameer has thought about this problem differently. It shed the zero-sum attitude that has hamstrung the industry and dared to imagine an indexing approach that emancipates analysts from the waiting, the planning and the resultant long, horrible detours and roadblocks in the journey to data insight.
Datameer rethought indexing. Instead of being built in advance and stored, Visual Explorer’s indexes are created on demand, in a few seconds, and they live in memory, not on disk. This eliminates almost all the processing and storage overhead of indexing and – by making the indexes temporary and disposable – it removes the formality as well. By then combining this technology with a simple and intuitive data visualization front-end, Datameer has created the ultimate environment for exploring raw big data.
Index what you want on a whim, a hunch or experience. The first time you need an index, building it will take a few seconds; subsequent uses of that index will incur virtually no latency. The visual interface will let you drill down and across, and show you what your data looks like. From there, you’ll either start to derive the insight you’re looking for or you’ll proverbially fail fast, moving on to index the data differently and explore it some more.
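The general pattern described here, an index that is built the first time it is needed, kept in memory and thrown away when no longer useful, can be sketched in a few lines of Python. This is only an illustration of the idea under simple assumptions (records as dictionaries, exact-match lookups); it says nothing about how Visual Explorer is actually implemented, and the class and method names are hypothetical.

```python
from collections import defaultdict


class OnDemandIndex:
    """Hypothetical helper: builds an in-memory index per attribute on first use."""

    def __init__(self, records):
        self._records = records
        self._indexes = {}  # attribute name -> {value: [row positions]}
        self.builds = 0     # how many indexes have actually been built

    def _index_for(self, attribute):
        # Build the index only if this attribute has not been indexed yet.
        if attribute not in self._indexes:
            index = defaultdict(list)
            for pos, rec in enumerate(self._records):
                index[rec.get(attribute)].append(pos)
            self._indexes[attribute] = index
            self.builds += 1
        return self._indexes[attribute]

    def lookup(self, attribute, value):
        """Return all records whose attribute equals value."""
        positions = self._index_for(attribute).get(value, [])
        return [self._records[p] for p in positions]

    def drop(self, attribute):
        """Discard an index; it lives only in memory, so nothing else to clean up."""
        self._indexes.pop(attribute, None)


# Made-up records for illustration.
records = [
    {"country": "DE", "device": "mobile"},
    {"country": "US", "device": "desktop"},
    {"country": "DE", "device": "desktop"},
]

explorer = OnDemandIndex(records)
german = explorer.lookup("country", "DE")    # first use: builds the index
american = explorer.lookup("country", "US")  # reuses the index already in memory
```

Only the first lookup on an attribute pays the cost of scanning the data. Because the index is disposable, trying a different attribute is cheap, which is what makes the fail-fast style of exploration practical.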
From the Beginning
Perhaps most interesting, Visual Explorer is not meant to be a replacement for the tools you may use to perform visual data analysis on your processed and curated data sets. Instead, Datameer sees this facility as an aid to data preparation, early analytics and transformation. It all happens in Datameer’s workbook environment, enabling you to shuttle between the visual exploration and the cell-based formula approach Datameer is so well-known for.
Think of Visual Explorer as a film editor’s console in the hands of an enthusiastic moviegoer. Rather than waiting months for the fully edited movie to come out, he can pore over early footage, get a preview of the story and form early ideas on how best to edit it.
The Importance of Being Exploratory
That’s quite radical. So much so that it may at first be counterintuitive. As data workers, we’ve been trained – indoctrinated, really – to think we can and should visualize only data that has been edited. Reversing that order – visualizing data before it’s been prepared and edited – hasn’t been feasible. Even thinking about doing that has been dogmatically discouraged.
But Datameer believes that you shouldn’t just transform your data in order to visualize it. You should visualize data to accelerate the transformation process and to help you devise strategies for carrying it out. Visualization and ad hoc data work aren’t just the end goal in big data. They’re part of the journey – and Datameer believes they are a critical part. Visual Explorer is the realization of that belief and philosophy.
Raw data isn’t just raw material; it’s the lifeblood of the analytic process and the essential ingredient of a data-driven culture. But most of all, the ability to work with raw data comfortably and fluently is the only route to genuine digital transformation. That’s why Datameer built Visual Explorer. And that’s why we believe every data-savvy organization needs to have it, now.