What is Data Blending?

Data blending is the act of combining two or more datasets.  It is not limited to tabular data: it should apply across any data format and source, including databases, CSV files, XML, JSON, text, and a variety of others.

The most common places where you will see data blending are:

  • SQL Queries, where a user will JOIN or UNION two or more tables, either from a single database or a virtual one, within a single query,
  • BI tools, where a user will combine two or more datasets in a query that will fill a single report, visualization, or dashboard,
  • ETL, where the transformation part of the process will combine multiple datasets together,
  • Data transformation, where a transformation step is performed after an extract and load (ELT) that combines multiple datasets together,
  • Data preparation, where a user will combine multiple datasets together as part of a process to fully prepare data for analysis.
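The SQL case in the list above can be sketched with Python's built-in sqlite3 module.  This is only a minimal illustration: the table names, column names, and values are hypothetical, and UNION ALL is used rather than UNION so that duplicate rows are kept.

```python
import sqlite3

# Hypothetical tables and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE web_orders (customer_id INTEGER, amount REAL);
    CREATE TABLE store_orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO web_orders VALUES (1, 20.0), (2, 35.5);
    INSERT INTO store_orders VALUES (1, 12.0);
""")

# UNION ALL stacks rows from two tables that share the same columns;
# JOIN then attaches customer attributes by matching on customer_id.
blended = conn.execute("""
    SELECT c.name, o.channel, o.amount
    FROM customers AS c
    JOIN (
        SELECT customer_id, 'web' AS channel, amount FROM web_orders
        UNION ALL
        SELECT customer_id, 'store' AS channel, amount FROM store_orders
    ) AS o ON o.customer_id = c.customer_id
    ORDER BY c.name, o.channel
""").fetchall()
conn.close()

for row in blended:
    print(row)
```

A single query here blends three tables: two are stacked row-wise and the result is widened with columns from the third.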

Data blending is often considered synonymous with data integration, although the term data integration has taken on a much broader meaning, covering larger ETL and ELT processes that feed data warehouses or data marts.

Using Data Blending in Data Pipelines

A major use case for data blending is within data pipelines that feed downstream analytics.  Data blending within a data pipeline is done in one of three ways:

  • As a middle or intermediate step within an ETL data pipeline before the data is loaded,
  • As one of the final steps within an ELT data pipeline after the source data has been loaded, or
  • As part of a larger set of data preparation steps within a data pipeline.
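The difference between the first two options is where the blend runs.  The sketch below is a rough illustration only, using an in-memory SQLite database as a stand-in for the target warehouse; the function names, table names, and data are all hypothetical.

```python
import sqlite3

# Hypothetical source extracts, for illustration only.
orders = [(1, 20.0), (2, 35.5), (1, 12.0)]   # (customer_id, amount)
customers = {1: "Ada", 2: "Grace"}           # customer_id -> name

def blend_then_load(conn):
    """ETL style: blend in the pipeline, then load the finished rows."""
    blended = [(customers[cid], amount) for cid, amount in orders]
    conn.execute("CREATE TABLE etl_result (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO etl_result VALUES (?, ?)", blended)

def load_then_blend(conn):
    """ELT style: load the raw data first, then blend inside the target."""
    conn.execute("CREATE TABLE raw_orders (customer_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE raw_customers (customer_id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", orders)
    conn.executemany("INSERT INTO raw_customers VALUES (?, ?)", customers.items())
    conn.execute("""
        CREATE TABLE elt_result AS
        SELECT c.name, o.amount
        FROM raw_orders AS o
        JOIN raw_customers AS c ON c.customer_id = o.customer_id
    """)

conn = sqlite3.connect(":memory:")
blend_then_load(conn)
load_then_blend(conn)
etl_rows = sorted(conn.execute("SELECT name, amount FROM etl_result").fetchall())
elt_rows = sorted(conn.execute("SELECT name, amount FROM elt_result").fetchall())
conn.close()
```

Both paths end with the same blended rows; what changes is whether the pipeline or the target system does the combining.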

The final result of the data blending (and other transformation and/or preparation steps) is either stored in a data warehouse, cloud data warehouse, or data mart, or fed directly to BI or data science tools (via files and/or native formats).

Types of Data Blending

Regardless of where it is performed, data blending is used to create a larger dataset that offers a more complete, in-depth view.  This view can then be used for information purposes within an application, analytics in a report, visualization or dashboard, or data science to feed AI, ML, or other types of models.

A prime example of data blending is Customer 360, where multiple sets of information about a customer are combined to give a comprehensive view of the customer’s activity, actions, and behavior.  This data can then be fed into a CRM or customer service application, used for various forms of customer analytics including customer behavior, or fed into data science models that predict behavior and/or make recommendations.

There are five uses for data blending:

  • Combining highly related datasets with similar keys from different sources to gain a single view on a subject, such as all customer activity,
  • Filtering datasets by combining datasets and keeping only the intersection of these to find commonalities,
  • Enriching data by combining master data about a specific subject with other data to add dimensions for analysis,
  • Enriching data by combining somewhat disparate datasets to add additional aspects to an analysis or data science model,
  • Data cleansing and verification to fill in missing or incomplete datasets, or verify that datasets are accurate.
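Two of these uses, filtering by intersection and cleansing by filling gaps, can be sketched in a few lines of plain Python.  The records, field names, and email addresses below are hypothetical and only illustrate the idea.

```python
# Hypothetical records keyed by customer_id, for illustration only.
crm = {
    1: {"name": "Ada", "email": None},
    2: {"name": "Grace", "email": "grace@example.com"},
    3: {"name": "Edsger", "email": None},
}
support = {
    1: {"email": "ada@example.com"},
    3: {"email": None},
    4: {"email": "alan@example.com"},
}

# Filtering: keep only the keys present in both datasets (the intersection).
common_ids = sorted(crm.keys() & support.keys())

# Cleansing: fill a missing CRM email from the support dataset when it has one.
cleansed = {}
for cid, record in crm.items():
    fallback = support.get(cid, {}).get("email")
    cleansed[cid] = {**record, "email": record["email"] or fallback}

# Verification: list the keys whose email is still missing after blending.
still_missing = [cid for cid, rec in cleansed.items() if rec["email"] is None]

print(common_ids)
print(still_missing)
```

The second dataset is used here only to fill gaps; an existing CRM value is never overwritten, which is one possible policy among several.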

Data Blending for Data Enrichment

While combining related datasets was one of the original uses for data blending, its use for data enrichment has grown substantially as analytics have become more sophisticated and more precisely targeted.  Data enrichment is also essential for data science.

Data enrichment comes in two forms:

  • Blending master data – in this case, master data about a subject, such as customers, products, or assets, is combined with transactional, event, or activity data to provide additional dimensions for analysis or features to feed a data science model.  Master data on a specific subject is often maintained in a system of record, such as a customer master in a CRM system or a product master in an ERP system.
  • Combining somewhat unrelated data – here, outside data is combined with transaction, event, or activity data to add more context to an analysis and provide more dimensions.  Prime examples are combining weather data with activity data to determine the weather's impact on events or activity and make predictions, or adding spatial data to explore location-specific nuances in events or activity.
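Both forms of enrichment amount to looking up extra columns by a shared key.  The following is a minimal sketch with hypothetical product, weather, and sales data; a real pipeline would usually do this with joins in a database or data preparation tool.

```python
# Hypothetical data, for illustration only.
products = {"P1": {"category": "umbrellas"}, "P2": {"category": "sunscreen"}}  # master data
weather = {"2024-05-01": "rain", "2024-05-02": "sun"}                          # outside data
sales = [
    {"date": "2024-05-01", "product_id": "P1", "units": 9},
    {"date": "2024-05-02", "product_id": "P2", "units": 7},
    {"date": "2024-05-03", "product_id": "P1", "units": 2},
]

def enrich(sale):
    """Return a copy of the sale with master-data and weather columns added."""
    master = products.get(sale["product_id"], {})
    return {
        **sale,
        "category": master.get("category"),    # dimension from master data
        "weather": weather.get(sale["date"]),  # dimension from outside data; None if unknown
    }

enriched = [enrich(s) for s in sales]

for row in enriched:
    print(row)
```

Rows with no matching weather record are kept with a missing value instead of being dropped, which mirrors a left join.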

Early uses of data enrichment were very personalized, with analysts or data scientists combining data on their own.  However, with the advent of more comprehensive data integration platforms that offer rich data preparation capabilities, such as Datameer, data enrichment processes can be standardized in data pipelines for greater use across an enterprise and governed more effectively.

Data Blending in Datameer

Datameer offers a comprehensive set of capabilities for data blending as part of the over 300 graphical functions for almost any form of data transformation and as part of data imports.  This includes:

  • The ability to perform SQL-based data blending as part of the data extraction process to combine data within the same data source, push down the query processing to the original source, and reduce network data traffic.
  • Numerous wizard- and recommendation-driven methods to combine datasets, including simple joins over one or multiple columns, ranged joins, self-joins, and unions.
  • SQL worksheets where SQL-savvy users can create or even migrate existing SQL scripts that combine data into Datameer.
  • Schema-free mappings that do not require users to define schemas – Datameer automatically understands the underlying data structure and maps to the desired output.
  • File uploads that allow users to upload local or external data in files and integrate it with enterprise data to enrich the data and add greater detail or context.
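For readers unfamiliar with the join types named above, the sketch below shows what a ranged join and a self-join look like in generic SQL, run through Python's built-in sqlite3 module.  It does not represent Datameer's functions or syntax; the tables and values are hypothetical.

```python
import sqlite3

# Hypothetical tables and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, amount REAL);
    CREATE TABLE tiers (tier TEXT, min_amount REAL, max_amount REAL);
    CREATE TABLE employees (emp_id INTEGER, name TEXT, manager_id INTEGER);
    INSERT INTO orders VALUES (1, 15.0), (2, 250.0);
    INSERT INTO tiers VALUES ('small', 0, 100), ('large', 100, 1000);
    INSERT INTO employees VALUES (1, 'Ada', NULL), (2, 'Grace', 1);
""")

# Ranged join: match each order to the tier whose half-open range
# [min_amount, max_amount) contains the order amount.
ranged = conn.execute("""
    SELECT o.order_id, t.tier
    FROM orders AS o
    JOIN tiers AS t
      ON o.amount >= t.min_amount AND o.amount < t.max_amount
    ORDER BY o.order_id
""").fetchall()

# Self-join: join a table to itself to pair each employee with a manager.
self_joined = conn.execute("""
    SELECT e.name, m.name
    FROM employees AS e
    JOIN employees AS m ON m.emp_id = e.manager_id
""").fetchall()
conn.close()
```

The ranged join matches on an interval condition instead of key equality, and the self-join uses two aliases of the same table.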

A Datameer Customer Using Data Blending

A leading provider of title insurance and property and mortgage-related services had highly complex and diverse datasets, including data coming from services partners.  The diverse data required a heavy dose of coding to normalize and enrich it for analytics.  Each dataset also needed to be classified in multiple ways and enriched with calculated values as it came in.

The customer turned to Datameer to eliminate their dependence on time-consuming, manual SQL coding and share data curation processes between the data engineering teams and analyst community.  They were able to take advantage of the rich array of graphical Datameer functions to have data engineering teams normalize and classify data, and have analysts enrich data on their own to their specific analytics needs.

Learn More

Learn more about Datameer’s data blending capabilities, as well as the remainder of our over 300 comprehensive graphical functions for various forms of data transformation, cleansing, enrichment, and preparation, by scheduling a personalized demo.
