Datameer Blog post
It’s Time for a New Model for Data Exploration – Meet Datameer Visual Explorer
by John Morrell on Mar 05, 2018
At the heart of digital transformation is the creation of new data assets that drive the processes that fuel digital business. Optimizing and operationalizing the data asset creation process is at the core of the next iteration of the data lake, in which the data lake becomes the production line for the digital business. Companies that succeed at optimizing these production lines will ultimately have a tremendous leg up over their competition in the digital economy.
Today, companies are racing to lead that transformation. And because it is a race, companies trying to get there fastest are mistakenly falling back on old paradigms, believing these will get them where they need to be.
We live in a world where we often try to innovate by applying old world ideas to new world problems. Sometimes this works. But in the new era of digital transformation and big data, the old world model of the EDW stack simply does not work.
Why? Because in the digital economy, the world and your business change every minute. New competitors arise. New interaction channels and models come to life. New suppliers can offer better products at better prices. New risks emerge.
If you are to succeed with digital transformation, your analytic data models need to have the flexibility to turn on a dime. Your approach to analytics needs to adapt at every turn. Applying old world EDW techniques such as building a multi-product stack and materializing OLAP cubes wipes out the flexibility and speed your business requires to win in the digital economy.
The Old Model in the New World
Enterprise Data Warehouses (EDWs), along with the tools and architectures that surrounded them, served a great purpose. They enabled business reporting on a scale never seen before. And they gave us the early underpinnings for analysis, albeit simplistic, highly structured analysis. However, the emergence of new tools that simulate the EDW stack on the data lake raises the question – is this appropriate to fuel a new model of data asset creation? This approach combines various products to move data through a pipeline to feed your traditional BI tools (see figure).
Rather than foster innovation, recreating the EDW stack on a data lake, albeit with some newer tools, creates inefficiencies in the process. Certain tools in the stack may be better than their EDW predecessors (e.g. data preparation is more flexible than cleansing inside the ETL tools). However, problems still abound:
- The lengthy stack creates “latency” in the end-to-end pipeline
- Data is duplicated, consuming additional system and storage resources
- Data governance and security become complex with extra copies of data and each tool having its own model
- Each tool has different execution and processing models, making operationalizing pipelines practically impossible
At the top of the stack is OLAP on Hadoop, which is not really an innovation but a port of traditional OLAP to Hadoop, and it runs counter to the data exploration paradigm a data lake is trying to foster.
OLAP on Hadoop makes a trade-off between performance and data exploration. In order to gain interactive performance on a traditionally non-interactive platform – Hadoop – OLAP on Hadoop engines require a pre-defined model and pre-build their indexes and aggregations – in other words they materialize the cubes.
This limits data exploration to the “paths” in the pre-defined model – the dimensional hierarchy, values and metrics. Not only does this limit what the analyst can explore, it could also lead the analyst to a non-optimal or even wrong answer, based on attributes, values or results that in fact have little to no bearing on the true answer.
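The limitation described above can be illustrated with a minimal Python sketch. It is not how any particular OLAP-on-Hadoop engine is implemented; the rows, the column names and the `Cube` class are all hypothetical and exist only to show that a materialized cube can answer questions solely along the dimensions chosen when it was built.

```python
from collections import defaultdict

# Hypothetical raw rows; the column names are invented for illustration.
ROWS = [
    {"region": "EU", "product": "A", "channel": "web", "revenue": 100},
    {"region": "EU", "product": "B", "channel": "store", "revenue": 50},
    {"region": "US", "product": "A", "channel": "web", "revenue": 70},
    {"region": "US", "product": "A", "channel": "store", "revenue": 30},
]


class Cube:
    """A toy pre-materialized cube over a fixed set of dimensions."""

    def __init__(self, rows, dimensions, metric):
        self.dimensions = tuple(dimensions)
        # Pre-aggregate once, up front; the raw rows are not kept.
        self.cells = defaultdict(int)
        for row in rows:
            key = tuple(row[d] for d in self.dimensions)
            self.cells[key] += row[metric]

    def rollup(self, dimension):
        """Total the metric by one dimension of the pre-defined model."""
        if dimension not in self.dimensions:
            raise KeyError(f"{dimension!r} is not in the pre-defined model")
        idx = self.dimensions.index(dimension)
        totals = defaultdict(int)
        for key, value in self.cells.items():
            totals[key[idx]] += value
        return dict(totals)


cube = Cube(ROWS, ["region", "product"], "revenue")
by_region = cube.rollup("region")
```

Here `cube.rollup("region")` works because `region` was part of the model, while `cube.rollup("channel")` raises `KeyError`: the channel attribute was aggregated away when the cube was built, so the analyst would have to wait for a new cube to be defined and materialized before exploring it.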
A New Model Emerges
When it comes to data exploration at massive scale, architecture matters. Traditionally there has been an architectural trade-off of performance versus flexibility. To gain performance, OLAP architectures needed to materialize their indexes and data. Why? Because they could not scan the data fast enough to get the interactive response times analysts wanted.
But this traded off flexibility – the ability to explore the data in a variety of ways. Pre-built models limited the exploration to pre-defined dimensions, values and metrics.
To deliver on the true promise of big data exploration, an architecture needs to support three key tenets:
- Free-form exploration without the constraints of pre-fixed models
- Interactive exploration of datasets in the billions of rows
- Sub-second response time to explore at the speed of thought
And this needs to be integrated onto a converged platform offering governance, security and scale.
Introducing Datameer Visual Explorer
Today, we are proud to unveil a major technical innovation that finally allows businesses to explore their big data on a platform that affords them both performance AND flexibility. Datameer Visual Explorer is the first and only solution for instant, interactive visual exploration of entire data sets of any size. It combines free-form exploration, scale and interactivity using a patent-pending schema-less architecture that frees analysts from the complex modeling, rigid multi-dimensional schemas, and pre-aggregation processing of OLAP solutions. This lets your business analysts focus on what’s important – the data and the process of extracting real value from it for the business.
The ability to explore your data in any direction, at will, and at any scale is an unmatched combination. The free-form exploration effectively fits the metaphor of working with big data. Drill-down and –up. Drill-across and –back. Change to any metric. Change to any attribute. Drill-down to any new attribute. Filter on any value. There is literally no shortage of ways in which Visual Explorer lets you look at your data.
Dynamic Indexing with Rapid Micro-Scans
The dynamic indexing architecture of Visual Explorer delivers the scale, speed and efficiency needed to enable a large number of concurrent users to explore large datasets and drill into the details. Highly focused indexes are generated on the fly based on the data being explored for maximum efficiency and rapid sub-second response.
The secret to the speed, efficiency and free-form exploration capabilities of Visual Explorer comes from a highly integrated combination of:
- A highly tuned distributed search technology that focuses on optimizing the generation of search indexes on data
- The use of optimized columnar data management that makes it extremely fast and efficient to scan the data in real-time, even at scale
The two features complement each other. The distributed search technology enables highly focused, easily navigated indexes that can be generated on the fly as the user explores certain areas of the data. Because the “queries” are highly focused, the engine can then do “micro-scans” of the data, grabbing and aggregating only the data needed for that exploration path.
These two extremely efficient approaches enable the Visual Explorer Dynamic Indexing Server to deliver results with sub-second response times on extremely large datasets. They also contribute to free-form exploration: on-the-fly indexing and extremely fast micro-scans give analysts the agility to explore in any direction, on any attribute, value and metric in the dataset.
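To make the general idea concrete, here is a small Python sketch of indexing on the fly combined with focused scans over columnar data. It is a conceptual illustration only and does not represent Datameer's actual implementation; the `DynamicIndex` class, its method names and the sample columns are hypothetical.

```python
from collections import defaultdict

# Hypothetical columnar layout: one list per column, aligned by row position.
COLUMNS = {
    "region": ["EU", "EU", "US", "US", "EU"],
    "channel": ["web", "store", "web", "store", "web"],
    "revenue": [100, 50, 70, 30, 20],
}


class DynamicIndex:
    """Builds a value -> row-positions index for a column on first use."""

    def __init__(self, columns):
        self.columns = columns
        self._indexes = {}  # populated lazily, one entry per explored column

    def positions(self, column, value):
        """Return the row positions where `column` equals `value`."""
        if column not in self._indexes:
            index = defaultdict(list)
            for pos, v in enumerate(self.columns[column]):
                index[v].append(pos)
            self._indexes[column] = index
        return self._indexes[column].get(value, [])

    def micro_scan(self, filter_column, filter_value, group_column, metric):
        """Aggregate `metric` by `group_column`, touching only matching rows."""
        groups = self.columns[group_column]
        values = self.columns[metric]
        totals = defaultdict(int)
        for pos in self.positions(filter_column, filter_value):
            totals[groups[pos]] += values[pos]
        return dict(totals)


explorer = DynamicIndex(COLUMNS)
eu_by_channel = explorer.micro_scan("region", "EU", "channel", "revenue")
web_by_region = explorer.micro_scan("channel", "web", "region", "revenue")
```

No index exists until a column is first filtered on, and each query reads only the row positions that match the filter, from only the two columns it needs. Any column can serve as the filter, the grouping attribute or the metric without a model being defined in advance, which is the flexibility the text contrasts with pre-built cubes.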
While Visual Explorer is a breakthrough in data exploration, the converged Datameer platform provides the ability to manage the entire data lifecycle and help organizations produce analytic data assets faster and more efficiently. The single model for governance, security and operationalization reduces the time and cost to create and manage data asset pipelines.
Visual Explorer combines with the remainder of the Datameer platform for the first ever collaborative data curation and refinement process that brings together the data engineer and emerging power analyst – business analysts tasked with exploring deeper digital transformation questions. For the first time, the business analyst can comfortably use the data lake to explore large-scale datasets in a free-form manner, using a familiar visual metaphor.
Get More Value Today
Getting more value from the data lake requires more consumption from the emerging power analyst persona in the organization. These power analysts are the leaders who are exploring the bigger questions that can help drive digital transformation at an organization.
Datameer Visual Explorer takes data exploration to an entirely new level, bringing the power analyst to the data lake so they can create and consume data assets faster. The unique new architectural approach delivers free-form exploration at unprecedented scale, allowing power analysts to create more data assets faster and more efficiently, and ultimately to fuel their organization's digital transformation.