What Is Dark Data?
- Justin Reynolds
- February 8, 2020
If you’re like most businesses, you’re generating and capturing more data with each passing day. But where is all of that data going? And are you actually benefiting from most of it? This is where dark data comes in.
As it turns out, you’re probably using only a small portion of the data you collect. Chances are the bulk of it is sitting unused in various backend storage systems, where it will never see the light of day. Much of it is unstructured data, and without the right tools in place, it’s effectively undiscoverable.
That’s because this data exists in formats that are harder for a computer to categorize and analyze. Think PDFs, video files, logs, and audio files, which don’t fit nicely into a spreadsheet. Frequently, businesses lack the resources to convert unstructured data into a structured format they can then leverage. In many cases, they don’t even bother trying to look for it in the first place.
The data you collect but don’t use is called dark data, and it’s a problem for almost every company today. By some estimates, more than 90% of all business data is dark data.
Dark Data Defined
Dark data is information that companies collect, process, and store during regular business activities but never use for any other purpose, such as analytics. It’s typically generated by everyday digital business activities, some of which you may not think of as producing data at all.
To give you a better idea of what dark data looks like, here are some common examples:
- Financial statements
- Survey data
- Old emails
- Video footage
- SMS messages
- Previous employee files
- Employee weblogs
As you can see, just about every digital process leaves behind a trail of data. Think about all of the digital processes your company relies on, and you may be astounded at how much data is going to waste.
Consider how many people you emailed or texted just today, and scale that figure across your entire company. All of that data is sitting somewhere in your organization — along with your recent call logs and much more.
This raises an important question: Does dark data have any value at all, or is it just a drain on an enterprise? Let’s take a closer look.
A Risk or an Asset?
According to Gartner, companies primarily store dark data for compliance purposes. As a result, storing and securing it usually carries more expense and risk than value, especially if the data contains sensitive information that could cause reputational harm or business loss if exposed.
For example, imagine if your sales team’s SMS messages surfaced in a hack, or if your company’s pay stubs were exposed in a breach. This is information your organization should find and understand before the general public gets its hands on it.
Yet, despite the inherent risks associated with keeping dark data, most companies choose to hang onto it instead of deleting it from their systems.
Retaining data is often important for recordkeeping and litigation. Consider a case where a company needs to fire an employee for wrongdoing. The company may use that employee’s data trail as evidence that they broke company protocol in one or more areas. Deleting this information could put the company at a disadvantage during a hearing.
At the same time, dark data often contains useful insights that could benefit an organization. For example, a company could study dark data to analyze employee behavior. Imagine analyzing 10 years of expense reports to see how employees spend company funds on trips, and using that data to identify waste or abuse. You might discover that client dinners do little to help close deals and decide to limit what sales associates can spend on trips. Or you may find the opposite is true and optimize in that direction.
Similarly, you might analyze customer support logs to determine which channels customers use most often to contact your business, and then invest additional resources in the most popular one (e.g., live chat).
As you can see, dark data can be both a risk and an asset, depending on how it’s used. Ultimately, you won’t know which it is for you until you dive in and examine the information hiding across your enterprise.
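To make the support-log example concrete, here is a minimal Python sketch that tallies contacts per channel. It assumes the logs have already been exported as CSV with one row per contact and a `channel` column; that layout, the column names, the sample rows, and the `contacts_by_channel` helper are all hypothetical rather than the format of any particular support tool.

```python
import csv
import io
from collections import Counter
from typing import List, Tuple


def contacts_by_channel(csv_text: str) -> List[Tuple[str, int]]:
    """Count support contacts per channel, most frequent first.

    Assumes a hypothetical CSV export with one row per contact and a
    'channel' column (e.g. 'email', 'phone', 'live chat').
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Normalize case and whitespace so "Live Chat" and "live chat" count together.
    counts = Counter(row["channel"].strip().lower() for row in reader)
    return counts.most_common()


# Illustrative sample rows, not real data.
sample = """ticket_id,channel
1,Live Chat
2,email
3,live chat
4,phone
5,live chat
"""

print(contacts_by_channel(sample))
```

`Counter.most_common()` orders channels from most to least frequent, so the first entry is the channel that might justify additional investment.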
What to Do About Dark Data
In the past, dark data was difficult to discover and analyze. Today, advances in technologies like robotic process automation (RPA), artificial intelligence (AI), optical character recognition (OCR), and business intelligence software are making it easier to extract value from it. These cognitive technologies make it possible to bring dark data to light so it can be extracted, analyzed, processed, and put to use, or discarded.
If you’re thinking about taking control of your company’s dark data, here’s a three-step process to follow.
1. Discover Your Data
You can’t use dark data if you don’t know whether it exists in the first place.
Using a tool like Datameer Spotlight, you can discover, access, model, and share your dark data. Spotlight enables analysts to search for data in any location, including the cloud, data lakes, and applications, from a single, unified platform.
For example, you could use Spotlight to search for every image stored across all of your accounts and systems. You’ll find images hiding in emails, applications, websites, and more. With the help of Spotlight, you can bring them all together in one central repository for easy access.
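Spotlight’s own interface isn’t shown here. As a generic stand-in for the discovery step, the Python sketch below walks a single local directory tree and lists files whose extensions look like images. The extension list and the `find_images` helper are illustrative assumptions; a real discovery effort would also need to reach cloud storage, mailboxes, and applications, which this sketch does not attempt.

```python
from pathlib import Path
from typing import List

# Illustrative set of extensions treated as images; adjust for your environment.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tif", ".tiff"}


def find_images(root: str) -> List[Path]:
    """Recursively list files under `root` whose extension looks like an image.

    Extensions are compared case-insensitively, so "photo.PNG" is included.
    """
    return sorted(
        p
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Calling `find_images` on a shared drive returns a sorted list of matching paths, which could then be copied or indexed into a central location. Matching on extensions is a deliberately cheap heuristic; it misses images embedded inside other files, such as email attachments stored in a mailbox archive.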
2. Classify Your Data
Once you’ve found dark data, the next step is analyzing and classifying the information you’ve discovered.
For example, if you find financial data, you might categorize it as invoices, purchase orders, and 1099 information. Organizing these files ensures that employees can quickly find the information they need whenever they need it.
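One simple way to start classifying is rule-based keyword matching. The Python sketch below assumes each document’s text has already been extracted (for example, via OCR) and assigns it to the first matching category. The categories mirror the example above, but the regular expressions and the `classify` helper are hypothetical; production classification would more likely combine extracted text with a trained model.

```python
import re
from typing import List, Pattern, Tuple

# Hypothetical rules, checked in order: more specific categories come first.
RULES: List[Tuple[str, Pattern[str]]] = [
    ("1099", re.compile(r"\bform\s+1099\b|\b1099-(?:nec|misc|int|div)\b", re.I)),
    ("purchase_order", re.compile(r"\bpurchase\s+order\b|\bP\.?O\.?\s*(?:#|no\.?|number)", re.I)),
    ("invoice", re.compile(r"\binvoice\b", re.I)),
]


def classify(text: str) -> str:
    """Return the label of the first rule that matches `text`, or 'unclassified'."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "unclassified"
```

Because rules are checked in order, a purchase order that also mentions an invoice number is still labeled `purchase_order`. Anything left `unclassified` is a natural queue for manual review.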
3. Delete Unnecessary Information to Reduce Risk
After classifying your dark data, delete any data that no longer serves the organization.
For example, a PDF search may turn up many old files that no longer hold any value. After reviewing them, you could delete them to save money, free up storage space, and reduce risk.
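Because deletion should follow review, the Python sketch below only reports candidates: it lists PDFs under a directory whose last-modified time is older than a chosen threshold, and its delete helper defaults to a dry run. The age threshold, the use of modification time as a proxy for “no longer holds value,” and both helper names are assumptions for illustration; retention and litigation requirements still need to be checked before anything is removed.

```python
import time
from pathlib import Path
from typing import Iterable, List, Optional

SECONDS_PER_DAY = 86_400


def stale_pdfs(root: str, max_age_days: int, now: Optional[float] = None) -> List[Path]:
    """Return PDFs under `root` last modified more than `max_age_days` ago.

    This only lists candidates; it never deletes anything.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        p
        for p in Path(root).rglob("*")
        if p.is_file()
        and p.suffix.lower() == ".pdf"  # match ".pdf" and ".PDF" alike
        and p.stat().st_mtime < cutoff
    )


def delete_reviewed(paths: Iterable[Path], dry_run: bool = True) -> None:
    """Delete files a reviewer has approved; by default, only print what would go."""
    for p in paths:
        if dry_run:
            print(f"would delete: {p}")
        else:
            p.unlink()
```

Keeping listing and deletion in separate steps leaves room for a person to prune the candidate list before calling `delete_reviewed(..., dry_run=False)`.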
In this light, a dark data initiative can be an excellent strategy for reducing data clutter and improving data hygiene.
Ready to Take Control of Your Dark Data?
No matter what industry you operate in, your company has volumes of dark data waiting to be discovered. Some of it is valuable, some of it is risky, and some of it is just taking up space. Unfortunately, you won’t know which is which until you take a deep dive and discover it for yourself.
Instead of letting dark data pile up unnoticed, it’s time to take control of all of your organization’s data. This will only become more important as your company continues to invest in big data and accelerate digital transformation initiatives. As you take in more and more data, it’s vital to stay organized and in command of your assets. Otherwise, you might be sitting on a goldmine of data without even knowing it.
Data transformation is critical to taking advantage of your dark data. Dark data typically consists of complex, diverse datasets that need to be cleansed, blended, and shaped into final form. This often requires close collaboration between data engineering and analytics teams, as well as rich data documentation to back up compliance processes.
Datameer is a powerful SaaS data transformation platform that runs in Snowflake, your modern, scalable cloud data warehouse. Together, they provide a highly scalable and flexible environment for transforming your data into meaningful analytics. With Datameer, you can:
- Let non-technical analytics team members work with complex data without writing code, using Datameer’s no-code and low-code data transformation interfaces,
- Enable technical and non-technical team members to collaborate on building data models and the data transformation flows that fulfill them, each contributing their own skills and knowledge,
- Fully enrich analytics datasets to add even more flavor to your analysis using the diverse array of graphical formulas and functions,
- Generate rich documentation and add user-supplied attributes, comments, tags, and more to share searchable knowledge about your data across the entire analytics community,
- Use the catalog-like documentation features to crowd-source your data governance processes for greater data democratization and data literacy,
- Maintain full audit trails of how data is transformed and used by the community to further enable your governance and compliance processes,
- Deploy and execute data transformation models directly in Snowflake to gain the scalability you need over large volumes of data while keeping compute and storage costs low.
Learn more about our innovative SaaS data transformation solution by scheduling a personalized demo today!