Data Preparation

Data preparation is the process of cleaning, structuring, and enriching raw data, including unstructured or big data. The results are consumable data assets used for business analysis projects.
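To make the three activities concrete, here is a minimal sketch in plain Python. The records, field names, and helper functions (`clean`, `structure`, `enrich`, `prepare`) are hypothetical and chosen only for illustration; they are not taken from any particular data preparation product.

```python
from datetime import date

# Hypothetical raw records, as they might arrive from an export.
RAW_ROWS = [
    {"customer": "  Alice ", "amount": "120.50", "order_date": "2023-01-15"},
    {"customer": "BOB", "amount": "n/a", "order_date": "2023-02-03"},
    {"customer": "alice", "amount": "80", "order_date": "2023-02-20"},
]


def clean(row):
    """Cleaning: normalize names and reject rows with an unparseable amount."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None  # signal that the row should be dropped
    return {**row, "customer": row["customer"].strip().title(), "amount": amount}


def structure(row):
    """Structuring: turn the ISO date string into a typed date value."""
    return {**row, "order_date": date.fromisoformat(row["order_date"])}


def enrich(row):
    """Enriching: derive a reporting quarter from the order date."""
    quarter = (row["order_date"].month - 1) // 3 + 1
    return {**row, "quarter": f"{row['order_date'].year}-Q{quarter}"}


def prepare(rows):
    """Run the three stages in order and return the consumable rows."""
    cleaned = (clean(r) for r in rows)
    return [enrich(structure(r)) for r in cleaned if r is not None]


PREPARED = prepare(RAW_ROWS)
```

In this sketch the row with an unparseable amount is dropped. A real pipeline might instead route such rows to a review queue, depending on business rules.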
In the data science community, data preparation is often called feature engineering. Although the two terms are used interchangeably, feature engineering relies more heavily on domain-specific knowledge than the standard data prep process does. Feature engineering creates “features” for specific machine learning algorithms, while data prep is used to disseminate data for mass consumption.
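The distinction can be illustrated with a small sketch, again in plain Python. It assumes hypothetical order records that have already been prepared, and it derives per-customer features of the kind a model might consume. The feature names and the `customer_features` helper are illustrative assumptions, not a prescribed method.

```python
from collections import defaultdict
from datetime import date

# Hypothetical prepared rows (already cleaned and typed).
ORDERS = [
    {"customer": "Alice", "amount": 120.5, "order_date": date(2023, 1, 15)},
    {"customer": "Alice", "amount": 80.0, "order_date": date(2023, 2, 20)},
    {"customer": "Bob", "amount": 40.0, "order_date": date(2023, 2, 3)},
]


def customer_features(orders, as_of):
    """Aggregate order rows into one numeric feature set per customer."""
    grouped = defaultdict(list)
    for order in orders:
        grouped[order["customer"]].append(order)

    features = {}
    for customer, rows in grouped.items():
        amounts = [r["amount"] for r in rows]
        last_order = max(r["order_date"] for r in rows)
        features[customer] = {
            "order_count": len(rows),
            "avg_amount": sum(amounts) / len(amounts),
            # Recency is a domain-driven choice: it only helps if how
            # recently a customer ordered matters for the prediction task.
            "days_since_last_order": (as_of - last_order).days,
        }
    return features


FEATURES = customer_features(ORDERS, as_of=date(2023, 3, 1))
```

Choosing recency as a feature is where domain knowledge enters. The general-purpose preparation steps stay the same regardless of which model, if any, consumes the data.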
Data preparation and feature engineering are among the most time-consuming and vital processes in data mining. Correctly prepared data improves the accuracy of the outcomes. However, data preparation activities tend to be routine and tedious.
A host of tools on the market today provide data preparation capabilities. They are typically applications meant to streamline and operationalize the data preparation process. These tools are usually found in centralized IT departments, are used by data engineers, and are designed to batch and schedule data pipelines rather than to explore and discover new analytics assets.
Stand-alone data prep products, such as Datameer X, Datameer Spectrum, and Alteryx, form the foundation of this software market. These applications are designed to transform complex data into consumable datasets for analytics and then create data pipelines that produce those datasets consistently.
Data preparation tools are well suited to IT teams that need to make centralized, complex data consumable on a scheduled basis via data pipelines. Datameer Spotlight, by contrast, is designed for exploring and discovering new analytics assets within the lines of business, drawing not only on those centralized data pipelines but also on data that resides everywhere else.
Datameer Spotlight allows analytics teams to find, create, collaborate, and then publish trusted analytics assets in complex hybrid landscapes. Spotlight provides unified access across analytics silos, increases the use of analytics assets, and furthers data knowledge.
Spotlight is built for ad-hoc analytics and includes key data prep capabilities, so analytics professionals can quickly enrich most assets themselves rather than relying on the centralized data pipelines and procedures used in data preparation tools. With Spotlight, professionals can directly:
Spotlight builds trust in analytics assets through a community of experts. It works interactively with data prep outputs through virtual queries, allowing analysts to discover, access, and use those datasets with ease. Data preparation tools can continue to be used for data engineering purposes, producing data pipelines and robust datasets for the enterprise. End users build on the hard work of the data engineering team by tagging, publishing, and sharing these datasets in real time, which promotes greater use of these assets, all in a SaaS solution.
Spotlight allows analytics teams to find, create, collaborate, and publish trusted ad-hoc analytics in complex hybrid landscapes.
A cooperative environment between Datameer Spotlight and data preparation tools provides customers with many benefits: