Blending Data from Multiple Data Lake Sources

Because the formats and volumes of data in any data-driven scenario keep growing, it is hard, if not almost impossible, to think of a use case where all the data comes from a single source. Blending data from multiple sources is necessary to enhance the meaning and value that the data provides to the enterprise.



In this tutorial, we will show you how to use Spotlight’s semantic layer capabilities to blend and enrich data from multiple sources, whether in the cloud, on-premises, or even in local files. You are more than welcome to follow along with this tutorial using the Spotlight virtual lab. We also created a video overview if you would rather sit back and watch.

Previewing the Datasets

While in Spotlight, you can work with any of the data sources supported by Datameer. In this tutorial, we will focus on Product data that we have stored in Amazon S3 and Customer data that we have stored in Snowflake.

Creating a Workspace

First, we need to create a workspace that we will use to model the data. Click on + Add New and then select Workspace.

Give the workspace a name and then click OK.

The newly created workspace will show up in the list of available assets. Click on it to open it.

Click on the plus sign to start adding datasets to it.

From the Customer connection, select the Marketo dataset and then click on Add to Workspace.

Next, click on +Add… in the top left corner of the workspace and then select Data.

Then, from the Product Dataset 2021 connection, select the Sales opportunities dataset and click on Add to Workspace.

Now, with both datasets in the workspace, click on Open Workbench.

Modeling the Dataset

Spotlight’s Workbench allows you to view data from the references and datasets in your Workspace, then use that data to create new datasets. New datasets can be edited with operations, used elsewhere in Spotlight, or opened in external tools like Tableau or Jupyter for further analysis.

Once we are in the Workbench, we are going to create a new dataset and then use it to blend both datasets (Marketo and Sales opportunities).

Click on the plus sign on the selected dataset.

Now we need to add an operation. Click Add Operation.

Then select Blend.

From the blending settings, select one of the data sources in the Workbench, and then select the blend mode that fits your business needs.

Click on Use Suggested Columns.

Then create the blend. The resulting blend should look like this:

You can repeat this process as many times as needed with other data assets in your Spotlight environment.
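
For readers who think in code, a blend can be pictured as a relational join between two tables. The Python sketch below is only an analogy under stated assumptions: it is not Spotlight's implementation, the source does not list Spotlight's blend modes, and the names `blend`, `suggest_key_columns`, `email`, `lead_source`, and `amount` are hypothetical, as is the sample data. It assumes that a blend mode corresponds to a join type (here only "inner" and "left") and that a key suggestion could be as simple as looking for column names shared by both datasets.

```python
from typing import Dict, List

Row = Dict[str, object]


def suggest_key_columns(left: List[Row], right: List[Row]) -> List[str]:
    """Naive key suggestion: column names that appear in both datasets.

    Hypothetical heuristic for illustration, not Spotlight's logic.
    """
    if not left or not right:
        return []
    right_columns = set(right[0])
    return [column for column in left[0] if column in right_columns]


def blend(left: List[Row], right: List[Row], key: str, mode: str = "inner") -> List[Row]:
    """Join two lists of row dicts on `key`.

    mode="inner": keep only left rows that have a match on the right.
    mode="left":  keep every left row; unmatched rows get None in right columns.
    """
    if mode not in ("inner", "left"):
        raise ValueError(f"unsupported mode: {mode!r}")

    # Index the right-hand rows by key so each lookup is a dictionary access.
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    # Columns contributed by the right side (the key is already on the left).
    right_columns = [c for c in (right[0] if right else {}) if c != key]

    result: List[Row] = []
    for left_row in left:
        matches = index.get(left_row[key], [])
        if matches:
            # One output row per matching right-hand row.
            for right_row in matches:
                merged = dict(left_row)
                merged.update({c: right_row[c] for c in right_columns})
                result.append(merged)
        elif mode == "left":
            merged = dict(left_row)
            merged.update({c: None for c in right_columns})
            result.append(merged)
    return result


# Hypothetical stand-ins for the Marketo and Sales opportunities datasets.
leads = [
    {"email": "ana@example.com", "lead_source": "webinar"},
    {"email": "bo@example.com", "lead_source": "ad"},
]
opportunities = [
    {"email": "ana@example.com", "amount": 1200},
    {"email": "ana@example.com", "amount": 300},
    {"email": "cy@example.com", "amount": 50},
]

keys = suggest_key_columns(leads, opportunities)
inner_rows = blend(leads, opportunities, keys[0], mode="inner")
left_rows = blend(leads, opportunities, keys[0], mode="left")
print(keys, len(inner_rows), len(left_rows))
```

In this sketch the inner mode drops the lead with no opportunity, while the left mode keeps it with an empty `amount`. That difference is the kind of trade-off to weigh when choosing a blend mode.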

Visualizing the Data

At this point, you are ready to visualize the dataset using any of the Spotlight-supported BI or data science tools. To do so, select any of the tools available in the Workbench, create the connection between Spotlight and your tool, and start visualizing the dataset you just created.

Wrapping Up

The steps that we took in this tutorial allowed us to connect and blend data from Snowflake and Amazon S3. You can use the same process to work with any data, regardless of its location. Using the multiple operations available in Spotlight, you can model your data to fit the business needs of your use case.

To learn more about the modeling operations included in Spotlight, please check out our documentation. As always, we look forward to your feedback. Please get in touch if you have any questions, comments, or other ideas.