Datameer Blog post
Five Rules of Data Exploration
by John Morrell on Apr 06, 2018
One always learns something new at the Gartner Data and Analytics Summit (the 2018 North America edition was held last week in Grapevine, Texas). I attended a fascinating session with two of Gartner’s most knowledgeable analysts – Mark Beyer and Adam Ronthal – on modern data architectures. In this case, it was not so much learning something new as being reminded of a concept I had used before.
During the session, the presenters showed the following data management infrastructure model from an October 2017 research report titled, Solve Your Data Challenges With the Data Management Infrastructure Model:
According to the Gartner report, “the model represents how different infrastructure components align with specific use-case characteristics and how to evolve those use cases as the data and the desired outcomes become better understood. The model is easily understandable and broadly applicable.”
Seeing this chart brought up a strong sense of déjà vu, as Datameer has used a very similar chart in presentations explaining the type of problems we solve for customers. We help people in the upper right quadrant – exploring unknown data to answer a range of unknown questions – by getting more data to business analysts in a timely manner.
Drilling down to the next level, it is important to understand that data exploration and discovery is both an art and a science. There is the science of digging into data and coaxing answers from it, and there is the art of knowing where to look and working with your data-driven colleagues.
With that context, let’s take a look at five key rules to successful data exploration and discovery that take into account both the art and science sides.
1. Explore Data AND Questions
This may seem obvious from the Gartner chart, but to get the most out of data exploration, you need to both dig into the data and ask a lot of questions, regardless of the answer. If you don’t explore the data, you don’t necessarily know if it is the right data to answer the question. Or, looked at another way, you want to let the data answer the questions, not force the questions on the data.
This also brings up the much-discussed concept of “fail fast”, which refers to the ability to quickly explore and discard unsuccessful paths until you find the right answer. A good data exploration interface will allow you to explore at the speed of thought, failing fast across a number of different dimensions, metrics, and attributes until you find the right combinations.
In addition, a good answer will almost always lead to another question that needs to be answered. This is part of the art of exploring. Swiftly and continuously ask questions – whether you fail or find answers – to determine the best actions to take.
2. Use More, and More Diverse, Data
To answer difficult digital age questions, data diversity is critical. Using single sources or small amounts of data tends to lead toward bias in your answers. This is especially true when generating data sets to feed into data science (predictive, AI and ML) models.
There is another trap you can fall into when not using more diverse and voluminous data – settling for the less optimal answer. One can find answers in simpler datasets, but the answer may not be the best one to create the right outcomes you seek (we will talk more about outcomes below). Using greater volumes and more diverse data lets you explore a greater variety of options to find the BEST answer, not simply AN answer.
3. Don’t Turn Your Back on Time
According to an older but still relevant research report from Nucleus Research, business data has a half-life of usefulness, just like radioactive materials. The report puts the half-life of data for strategic decision makers at an average of 56 hours, with 70 percent of the data still viable.
In the data and analytics world this is called data perishability – data that loses its initial value over time and must be acted on swiftly to yield any benefit. This increases the need to get data curated and consumable more swiftly (self-service access) and to explore the data at the speed of thought.
Data perishability is also very relevant to data gravity and the need to process and explore data in the cloud, if that is where it lands. Taking the time and resources to move data increases the likelihood that the data loses its value before anyone can act on it.
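To make the half-life idea concrete, here is a minimal Python sketch. It assumes the value of data decays exponentially, which is the usual reading of "half-life" but is an assumption here, not something stated in the Nucleus Research report. The function names are hypothetical, and the only figure taken from the text above is the 56-hour half-life.

```python
import math

# Half-life quoted above for strategic decision makers, in hours.
HALF_LIFE_HOURS = 56.0


def remaining_value(hours_elapsed: float, half_life_hours: float = HALF_LIFE_HOURS) -> float:
    """Fraction of the data's initial value left after hours_elapsed.

    Assumes simple exponential decay (an illustrative model only).
    """
    if hours_elapsed < 0:
        raise ValueError("hours_elapsed must be non-negative")
    return 0.5 ** (hours_elapsed / half_life_hours)


def hours_until(fraction: float, half_life_hours: float = HALF_LIFE_HOURS) -> float:
    """Hours until the data's value falls to the given fraction (0 < fraction <= 1)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return half_life_hours * math.log(fraction) / math.log(0.5)


if __name__ == "__main__":
    # Compare a same-day delivery, a two-day delay, and a one-week delay.
    for delay in (8, 48, 168):
        print(f"{delay:>3} h delay -> {remaining_value(delay):.0%} of initial value left")
```

Under this model, every hour spent moving or preparing data comes straight out of the window in which it is still useful, which is the argument for self-service access and for exploring data where it lands.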
4. Look for Outcomes
In the digital economy, we are exploring data to drive action, not simply to make decisions. Many of the insights derived from exploring unknown data and questions will be used to determine what to do next, such as streamlining customer experiences, determining the next best action for a customer, or determining better shipping routes for goods.
When formulating your data for exploration, you need to include information about what type or form of outcome occurred, such as whether a customer churned or grew their lifetime value. You need to explore the data on either side of this equation to understand what happened and to identify the key items and attributes to use in actions that drive the needed outcomes.
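As a rough illustration of exploring "either side" of an outcome, the sketch below groups records by an outcome field and compares attribute averages between the groups. The customer records, field names, and the `profile_by_outcome` helper are all made up for this example. A real exploration would use far more data and attributes.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customer records; fields and values are invented for illustration.
customers = [
    {"support_tickets": 5, "monthly_logins": 2, "churned": True},
    {"support_tickets": 4, "monthly_logins": 3, "churned": True},
    {"support_tickets": 1, "monthly_logins": 12, "churned": False},
    {"support_tickets": 0, "monthly_logins": 9, "churned": False},
]


def profile_by_outcome(records, outcome_field, attributes):
    """Average each attribute separately for every value of the outcome field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[outcome_field]].append(record)
    return {
        outcome: {attr: mean(row[attr] for row in rows) for attr in attributes}
        for outcome, rows in groups.items()
    }


profile = profile_by_outcome(customers, "churned", ["support_tickets", "monthly_logins"])

if __name__ == "__main__":
    for outcome, averages in profile.items():
        print(f"churned={outcome}: {averages}")
```

Attributes whose averages differ sharply between the two groups are candidates for closer exploration. A difference in averages alone does not establish that the attribute caused the outcome.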
5. Tell Them About It
Pure, low-level metadata doesn’t say much about the data. If you’re data knowledgeable, then good but simple metadata can provide clues for navigating datasets in search of interesting insights and patterns. But we are in the realm of unknown data, and we are trying to enable a broader array of data consumers to explore questions.
This is where the data catalog comes into play. Describe what you found out about a dataset inside the catalog to better inform your colleagues about what it can reveal, how it can be used, the potential outcomes it can drive, and more. This helps guide other analysts to explore and find their own insights more quickly and drives re-use of data assets across your organization.
In our modern digital economy, data is the new currency. Faster delivery of more data to business teams lets their analysts explore this unknown data and ask previously unknown questions, finding angles to leapfrog competitors and create better relationships and higher lifetime value with customers.
Does your company have a forward-thinking data strategy? Are your business analysts empowered to explore unknown data and questions to discover insights that create faster actions and better outcomes? Learn more about how to create an insight-driven organization and scalable data strategy here at Datameer.com, and follow our five rules of data exploration to be successful.