Regardless of whether the enterprise selects the platform or the standalone approach to self-service data prep, maximum compatibility with data sources, both on-premises and in the cloud, should be a key priority when selecting a product.
The self-service analytics era demands ready access to prepared data that can be fed into self-service visualization and BI tools. The traditional BI approach to data preparation, which used ETL tools to populate centralized data warehouses, was never designed to serve today's number of analytics end users. It was also controlled by IT, which made IT the bottleneck in preparing data for analysis. As the number of analytics end users has grown and users have come to expect ad hoc access to data, they need tools to prepare and blend data themselves.
Self-service data prep functionality has evolved to give general business users guided tools to transform and prepare data sets without relying entirely on IT, allowing the data prep process to scale across the organization. But as more users access growing volumes of diverse data to prepare for analysis, the process needs governance. To address this need, many data prep environments are building in native data catalog functionality or providing integrations with best-of-breed standalone data catalog tools, so that users can browse and find the data they need.
Contents
Catalyst; Ovum View; Key findings
Inclusion criteria; Exclusion criteria; Methodology; Ovum ratings; Ovum Interactive Decision Matrix
Market leaders, challengers, and followers; Market Technology, Execution, and Impact
Alteryx Designer; ClearStory Data; Datameer Enterprise; Datawatch Monarch; IBM Data Refinery; Oracle Analytics Cloud; Trifacta Wrangler; Unifi Data Platform
The self-service data prep market, paradoxically, is mature in some respects and still evolving rapidly in others. Across Ovum's technology assessment categories, products scored very closely on the core data prep functions covered by the data manipulation category, such as joins, transformations, merging, cutting, and replacing values. Similarly, most products scored highly in the administration category, which covers deployment, processing, and architecture; the overwhelming majority of self-service data prep providers, for instance, already support deployment on all three major public cloud providers. The battleground in this Ovum Decision Matrix therefore came down to the categories in which vendor responses varied more widely.
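To make the data manipulation category more concrete, the following is a minimal Python sketch of three of the operations named above (replacing values, a transformation, and a join) applied to small in-memory tables. The records, field names, and helper functions (`replace_values`, `transform`, `left_join`) are hypothetical illustrations; they do not represent how any of the assessed products implements these functions.

```python
# Illustrative sketch only: hypothetical records and helpers, not any vendor's API.

orders = [
    {"order_id": 1, "customer_id": "C1", "amount": "120.50", "region": "N/A"},
    {"order_id": 2, "customer_id": "C2", "amount": "80.00", "region": "EMEA"},
    {"order_id": 3, "customer_id": "C9", "amount": "15.25", "region": "APAC"},
]
customers = [
    {"customer_id": "C1", "name": "Acme"},
    {"customer_id": "C2", "name": "Globex"},
]


def replace_values(rows, field, mapping):
    """Replace values in one field, e.g. turning placeholder strings into None."""
    return [{**row, field: mapping.get(row[field], row[field])} for row in rows]


def transform(rows, field, fn):
    """Apply a per-value transformation to one field, e.g. casting text to float."""
    return [{**row, field: fn(row[field])} for row in rows]


def left_join(left, right, key):
    """Keep every left row; add right-side fields where the key matches, else None."""
    index = {row[key]: row for row in right}
    right_fields = {f for row in right for f in row if f != key}
    joined = []
    for row in left:
        match = index.get(row[key], {})
        # Right-side values overwrite left-side fields that share a name.
        joined.append({**row, **{f: match.get(f) for f in right_fields}})
    return joined


prepared = left_join(
    transform(replace_values(orders, "region", {"N/A": None}), "amount", float),
    customers,
    "customer_id",
)
for row in prepared:
    print(row)
```

Order 3 references a customer with no match, so the left join keeps the order and fills the missing `name` with `None` instead of dropping the row.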
These categories, such as data governance and the collaboration and machine learning category, represent areas of rapid development across the enterprise software market. The capabilities they cover are not unique to self-service data prep products; they reflect broader trends that these environments have had to incorporate. They are emerging as key areas of differentiation between products, and vendors draw on their diverse product backgrounds to build out this functionality in distinct ways.
Market leaders in this Ovum Decision Matrix are ClearStory Data, Datameer, Trifacta, and Unifi. Regardless of their architectural approach, the leading vendors are typically notable for high scores in the data governance category and the collaboration and machine learning category of the technology assessment. They also edged out competitors in technology categories where scores were closely clustered, such as data manipulation. In the execution categories, market leaders' solutions tended to score better on maturity and deployment.
Datameer is a broad and complex platform, originally built for Hadoop implementations but now cloud-native, that focuses on building and managing the data pipelines that feed data into any analytics tool. The current platform handles ingestion, integration, prep, enrichment, exploration, and some visualization. It is well suited to complex, high-scale IT ecosystems. Connectors to more than 70 data sources beyond the Hadoop ecosystem provide considerable flexibility and connectivity, allowing it to serve as a central hub for data prep and staging before data is sent to an analytics tool.