Disrupting the traditional central data warehouse model
- Press Release
- December 1, 2020
The new flagship product from Datameer upends a three-decade-old approach to data analytics
SAN FRANCISCO, California, December 1st, 2020 –
Datameer announces the introduction of a breakthrough platform, Datameer Spotlight, that flips the traditional central data warehouse paradigm on its head and enables organizations to run analytics at scale in any environment and across data silos at a fraction of the cost.
The approach to leveraging data for analytics has remained unchanged for almost three decades: Organizations pipe all enterprise data into a centralized data warehouse or data lake in an expensive, time-consuming process.
Despite company after company failing at this elusive data centralization quest, companies such as Oracle, Teradata, and Informatica — then, later on, AWS, Google Cloud, Azure, Snowflake, Talend, and Fivetran — have thrived under this three-decade-old model.
Promoting the vision of a single source of truth that delivers a 360-degree view of the customer, vendors have been competing to store a copy of your data in their data centers using their tools. The advent of the cloud data warehouse brought incremental improvements, sparing organizations from having to plan for excess storage and compute on-premises. However, it has not changed the fundamental issue of duplicating and centrally storing data.
This approach is unwieldy, costly, and leaves enterprises struggling to leverage their data’s full value.
Data Replication is Costly & Wasteful
Data replication is not free. On-premises or in the cloud, data replication requires storage, tools, and highly skilled, highly specialized data engineers to code and maintain complex ETL scripts. Those engineers are increasingly scarce and expensive: demand for data engineers has grown 50%, and salaries have increased by 10% year over year, according to Dice.
According to IDC, data replication also has a non-negligible impact on the environment. Nearly 10 million data centers were built in the last decade. Now, data centers have the same carbon footprint as the entire aviation industry pre-pandemic.
Data Pipelines are Lengthy & Unwieldy
Business users need instant access to data to make real-time business decisions. Current batch ETL processes for moving data don’t give users the instant access they need. Making matters worse, the initial setup of a data pipeline takes days, weeks, and sometimes months. Pipeline specifications can also get lost in translation between the business domain experts and the data engineers who build them, complicating things further.
What’s more, business users don’t always know in advance what transformations, cleansing, and manipulation they’ll want to apply to the data. Going back and forth with data engineers makes the discovery process very cumbersome. Hadoop addressed this issue with schema-on-read, but the complexity of the technology, combined with the still-monolithic data lake model, doomed this ecosystem.
Central Data Warehouse Model Backfired with Governance & Security Risks
Replicating data via data pipelines comes with its own regulatory, compliance, and security risks. The centralized data approach gave IT teams the illusion of tighter control and data governance. However, this approach backfired. Because the centralized data sets never met business needs, different teams began to set up their own data marts, and the proliferation of these only exacerbated data governance issues.
Sunk Costs & Throwing Good Money after Bad
Over the years, organizations have made significant investments to build their version of the enterprise data warehouse. Despite these projects falling short of their promises, organizations have been falling for the sunk cost fallacy, throwing more money at them in an attempt to fix them (e.g., recruiting more specialized engineers and buying more tools) rather than looking for alternative approaches and starting anew.
For example, enterprises move some of their data to the cloud on AWS, Azure, Google, or Snowflake, expecting faster, cheaper, more user-friendly analytics. Migration projects are rarely 100% successful and often result in more fragmented data architectures that make it harder to perform analytics in hybrid or multi-cloud environments. Businesses might then purchase Alteryx to enable domain experts to transform data locally on their laptops, which contributes to more data chaos and the proliferation of ungoverned data sets. After that, they purchase a data catalog to index that data and help business users find it. Finally, the IT team will want to invest in tools that add a layer of governance for peace of mind. Data stacks often end up thrown together like the Winchester Mystery House, becoming a money pit for enterprises.
And yet despite these massive investments:
- 60% of executives are not very confident in their data and analytics insights (Forrester)
- 73% of business users’ analytical time is still spent searching, accessing, and prepping data (IDC)
- More than 60% of enterprise data will not be used for analytics (Forrester)
Datameer Solves These Data Challenges
With over 200 connectors and counting, Datameer Spotlight gives business end users virtual access to any on-prem or cloud data source, including data warehouses, data lakes, and applications, and lets them combine and create new virtual data sets specific to their needs via a visual interface (or a SQL code editor for more advanced users), with no need for data replication. Data is left in place at the source. This new approach solves for:
– Data governance: data remains at the source, and no data replication is needed.
– Cost: with no need for ETL tools, a central data warehouse, data cataloging, a data prep tool, data engineering, or a middleman between the data and the end user, the solution comes in at a fraction of the cost of traditional approaches.
– Speed and agility: connecting Datameer Spotlight to a new data source takes only as long as entering your credentials for that data source. Once connected, business users can create new datasets across data sources in a few clicks.
– Data discoverability: it virtualizes your data landscape by indexing every data source’s metadata, then creating a searchable inventory of assets that analysts and data scientists can easily mine, all without moving any data.
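The general idea of a virtual data set that spans sources without replicating them can be illustrated with a small, generic sketch. This is not Datameer Spotlight's implementation or API; it uses Python's built-in sqlite3 module, and the file names, table names, and sample rows are all hypothetical. Two separate database files stand in for two silos, and a temporary view joins them where they sit.

```python
import os
import sqlite3
import tempfile


def build_sources(directory):
    """Create two independent SQLite files standing in for separate data silos.

    The schemas and rows are made up purely for illustration.
    """
    crm_path = os.path.join(directory, "crm.db")
    billing_path = os.path.join(directory, "billing.db")

    crm = sqlite3.connect(crm_path)
    crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    crm.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
    )
    crm.commit()
    crm.close()

    billing = sqlite3.connect(billing_path)
    billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
    billing.executemany(
        "INSERT INTO invoices VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 75.0)]
    )
    billing.commit()
    billing.close()
    return crm_path, billing_path


def revenue_by_customer(crm_path, billing_path):
    """Join both sources through a view; no rows are copied into a third store."""
    hub = sqlite3.connect(":memory:")
    # ATTACH makes each source queryable in place under its own schema name.
    hub.execute("ATTACH DATABASE ? AS crm", (crm_path,))
    hub.execute("ATTACH DATABASE ? AS billing", (billing_path,))
    # A TEMP view may reference tables in any attached database. It stores
    # only the query definition, not the data.
    hub.execute(
        """
        CREATE TEMP VIEW revenue AS
        SELECT c.name AS name, SUM(i.amount) AS total
        FROM crm.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.name
        """
    )
    rows = hub.execute("SELECT name, total FROM revenue ORDER BY name").fetchall()
    hub.close()
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        sources = build_sources(workdir)
        for name, total in revenue_by_customer(*sources):
            print(name, total)
```

The view holds only a query definition, so each read goes back to the source files. A production virtualization layer would additionally need connectors, credentials, and query pushdown, none of which this sketch attempts.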
Ready to give Datameer Spotlight a test drive? Try it for free here.