Informatica offers an end-to-end data integration platform with an extensive set of capabilities. The company's portfolio spans data integration and cloud data integration products, as well as related areas such as data engineering, data cataloging, data quality, data governance, and master data management.
Informatica’s legacy data integration product, PowerCenter, has been on the market since the 1990s and was designed and optimized for on-premises deployments. Only recently (in 2018) did Informatica move its data integration products to the cloud, both on its own Informatica Cloud and on public clouds (AWS, Azure, and GCP).
Informatica’s product family supports two main forms of data integration: ETL and point-to-point synchronization. The main data integration products work with the ETL data flow style model. The “hub” products (Cloud Integration Hub, Data Integration Hub, etc.) support the publish and subscribe data synchronization model, more appropriate for application integration, and are not advisable for data integration for analytics.
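To make the publish and subscribe model concrete, here is a minimal sketch of a hub in plain Python. It is not Informatica's API; the `IntegrationHub` class, the topic name, and the record fields are all hypothetical and exist only for illustration. The point it shows is that the hub passes records through unchanged, which suits application synchronization but leaves no place to shape data for analytics.

```python
from collections import defaultdict


class IntegrationHub:
    """Toy in-memory hub: publishers and subscribers only share topic names."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # Register a callable that receives every record published to `topic`.
        self._subscribers[topic].append(handler)

    def publish(self, topic, record):
        # Deliver the record unchanged to every subscriber of the topic.
        handlers = self._subscribers[topic]
        for handler in handlers:
            handler(record)
        return len(handlers)


# Two hypothetical applications keep their own copy of customer records.
crm_copy, billing_copy = [], []
hub = IntegrationHub()
hub.subscribe("customer.updated", crm_copy.append)
hub.subscribe("customer.updated", billing_copy.append)
delivered = hub.publish("customer.updated", {"id": 7, "email": "a@example.com"})
```

Each subscriber ends up with the same record the publisher sent; any transformation would have to happen somewhere else.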
Datameer Spectrum combines the full power and reliability of fully-featured ETL with wizard-driven simplicity for faster, easier self-service data integration without writing one line of code. Once ready, Spectrum’s complete operationalization and governance features enable reliable, automated, and secure data pipelines to ensure a consistent data flow.
Spectrum has its roots in Datameer’s premier on-premises solution, Datameer X, giving it enterprise features that are proven at large enterprises, providing you with the best ETL data integration functionality. What Spectrum adds is ease of use, full data preparation, elasticity, and manageability in a single package, allowing you to make your data ready for analytics 10 to 20 times faster at a fraction of the cost.
Spectrum provides a hybrid ETL and ELT platform, giving you the flexibility to support both forms of data integration on the same scalable platform. Spectrum is cloud-native on all three major cloud platforms (AWS, Azure, GCP) and carries with it the elasticity and cost economics you would expect from the cloud. Spectrum also bridges any data sources you have regardless of type, format, and location (cloud or on-premises).
Informatica and Datameer Spectrum have several common attributes and also many differences. Let's examine how Informatica and Spectrum compare in these seven key areas:
In both Informatica PowerCenter (the legacy data integration product) and Cloud Data Integration, the approach is a very traditional ETL one. Tasks are pieced together into a workflow, or a dataflow, to connect to and extract from the source data, transform the data and map it to a target schema, then load the data into the destination. Any number of tasks can be added to create highly sophisticated dataflows.
Spectrum supports a more general approach to data integration freeing designers to piece together dataflows that best support their needs. This includes both ETL and ELT (extract, load, and transform) approaches, or just general-purpose orchestrated data pipelines.
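The difference between ETL and ELT is mostly a matter of where the transform step runs. The sketch below illustrates this with Python's built-in `sqlite3` module standing in for the destination; it is not how either product is implemented, and the table names and sample rows are assumptions made up for the example.

```python
import sqlite3

# Hypothetical source rows; names and values are illustrative only.
SOURCE_ROWS = [("alice", "  42 "), ("bob", "17"), ("carol", " 8")]


def run_etl(conn):
    """ETL: transform in the pipeline, then load the cleaned rows."""
    cleaned = [(name.title(), int(amount.strip())) for name, amount in SOURCE_ROWS]
    conn.execute("CREATE TABLE etl_target (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO etl_target VALUES (?, ?)", cleaned)


def run_elt(conn):
    """ELT: load the raw rows first, then transform inside the destination."""
    conn.execute("CREATE TABLE raw_stage (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_stage VALUES (?, ?)", SOURCE_ROWS)
    conn.execute(
        "CREATE TABLE elt_target AS "
        "SELECT upper(substr(name, 1, 1)) || substr(name, 2) AS name, "
        "CAST(trim(amount) AS INTEGER) AS amount FROM raw_stage"
    )


def fetch(conn, table):
    """Return the rows of a target table in a stable order."""
    return conn.execute(f"SELECT name, amount FROM {table} ORDER BY name").fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
```

Both target tables end up with the same cleaned rows. In the ELT variant the raw data also remains available in `raw_stage`, and the transformation work is done by the destination engine rather than by the pipeline.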
Informatica and Spectrum contrast starkly in user experience and in how the designer tools work. Informatica’s user interface is workflow/dataflow oriented, where the user adds individual tasks to extract, transform, and load the data. Tasks can also be added for other items such as monitoring and logging.
Tasks can be very minute actions, especially in the transformation phase, such as applying a single function to transform one field. For more sophisticated dataflows, this can be too cumbersome and make flows hard to read. Also, there is no interactivity with the data as tasks are added, making debugging difficult.
Spectrum’s wizard-led extraction and loading, easy spreadsheet-style interface, and over 300 powerful single-click functions make it faster and easier to design ETL flows. Users fully interact with the data as they build the flow, seeing changes to the resulting data as each function is applied. This allows users to immediately see how the data is being shaped and spot potential errors. Operations can be reverted (undo) at the click of a button.
Spectrum Visual Explorer allows users to graphically explore data at any stage in the flow to further understand the data and see places where data might be cleansed or shaped. As users refine and drill down into the data along different dimensions, transformation workbooks can be generated with a single click.
Informatica is designed to support standard data transformation needs specific to data integration. It contains 110+ basic transformation functions and another 30 higher-level tasks such as aggregations, sorts, or parsing. Informatica has a separate data preparation product that is not integrated into its data integration platform and has minimal functionality for data lakes.
Datameer Spectrum has integrated data preparation capabilities and a library of over 300 functions to enable complete data preparation and transformation capability within data pipelines. Many Spectrum functions perform wizard-led sophisticated transformations that would typically take multiple functions to piece together, such as deduplication, pivoting, applying algorithms, and encoding data for AI and machine learning. Automated parsers for complex file formats eliminate the need to transform the data inside dataflows.
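For readers unfamiliar with the transformations named above, here is a rough sketch of what deduplication and pivoting do to a dataset, written in plain Python. It does not reflect Spectrum's functions or interface; the helper names `deduplicate` and `pivot_sum` and the sample sales rows are hypothetical.

```python
from collections import defaultdict


def deduplicate(rows, key_fields):
    """Keep the first row seen for each combination of key_fields."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def pivot_sum(rows, index, column, value):
    """Turn distinct `column` values into fields, summing `value` per `index`."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[index]][row[column]] += row[value]
    return {key: dict(cells) for key, cells in table.items()}


# Illustrative input with one repeated record.
sales = [
    {"id": 1, "region": "east", "quarter": "Q1", "revenue": 100},
    {"id": 1, "region": "east", "quarter": "Q1", "revenue": 100},  # duplicate
    {"id": 2, "region": "east", "quarter": "Q2", "revenue": 150},
    {"id": 3, "region": "west", "quarter": "Q1", "revenue": 80},
]
unique_sales = deduplicate(sales, ["id"])
pivoted = pivot_sum(unique_sales, "region", "quarter", "revenue")
```

Running the pivot before removing the duplicate would double-count the first sale, which is why the order of preparation steps matters in a pipeline.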
Spectrum also includes inline visual data profiling of any field at any stage in the flow. This allows users another graphical view of the data’s contents and enables single-click data filtering, cleansing, and more.
Informatica and Spectrum provide similar folder and component asset organization and management of data integration components. Security and access controls are applied at various levels in this hierarchy.
Spectrum takes this model a step further than Informatica. First, it automates and persists managed datasets within the flow. This is an additional component that users can access and share within a project.
Spectrum also provides some data cataloging features, including allowing users to annotate and tag worksheets and annotate fields within a worksheet, giving them further technical or business context. Users can also search for worksheets and datasets based on many items, including names, descriptions, annotations, and tags.
Both Spectrum and Informatica take a metadata-driven approach to job execution. They read metadata about a job and use that information to execute the job. In the case of Spectrum, highly optimized Java code is generated and executed.
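A metadata-driven engine can be pictured as a generic runner that reads a job description and decides what to do from it. The following sketch is an assumption-laden illustration, not either vendor's design: the `JOB` structure, the step names, and the helper functions are invented for the example, and the sketch interprets the metadata directly rather than generating code from it.

```python
# Hypothetical job metadata; the keys and step names are illustrative only.
JOB = {
    "name": "orders_daily",
    "steps": [
        {"op": "filter", "field": "status", "equals": "paid"},
        {"op": "rename", "from": "amt", "to": "amount"},
        {"op": "select", "fields": ["order_id", "amount"]},
    ],
}


def apply_filter(rows, step):
    return [r for r in rows if r.get(step["field"]) == step["equals"]]


def apply_rename(rows, step):
    return [
        {(step["to"] if k == step["from"] else k): v for k, v in r.items()}
        for r in rows
    ]


def apply_select(rows, step):
    return [{f: r[f] for f in step["fields"]} for r in rows]


# Maps each operation name in the metadata to the function that performs it.
OPERATIONS = {"filter": apply_filter, "rename": apply_rename, "select": apply_select}


def run_job(job, rows):
    """Execute the steps listed in the job metadata, in order."""
    for step in job["steps"]:
        rows = OPERATIONS[step["op"]](rows, step)
    return rows


orders = [
    {"order_id": 1, "status": "paid", "amt": 30},
    {"order_id": 2, "status": "open", "amt": 12},
]
result = run_job(JOB, orders)
```

Changing the job means editing the metadata, not the engine. A code-generating engine would instead translate the same description into executable code before running it.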
Informatica’s standard job execution engine is their proprietary Blaze engine. Blaze is the only available engine for on-premises deployments. In Cloud Data Integration, there is an add-on option to use elastic Spark clusters for the engine, which carries additional costs.
Spectrum is specifically designed to use scalable Spark clusters as the job execution engine. Highly optimized code is generated to execute jobs efficiently. Clusters are elastic and auto-scaled to the performance needs of running jobs.
Both Spectrum and Informatica provide enterprise-class security and governance capabilities. The critical difference is that Spectrum offers integrated security and governance, while Informatica requires add-ons for advanced security and an entirely separate product for data governance.
Informatica is cagey about its pricing, offering limited transparency. Its on-premises pricing is expensive. Running a minimal (only three connectors) Data Integration Essentials on the public cloud (AWS) or the Informatica Cloud costs $72,000 annually. But this leaves out many of the enterprise features we discussed earlier, which are either add-ons or separate products.
Also, some capabilities such as Data Preparation are entirely separate products that offer no integration with Cloud Data Integration.
Datameer Spectrum provides all these capabilities and more, integrated into a single platform with transparent, per user pricing.
Datameer Spectrum offers the same critical enterprise-grade ETL data integration and pipeline capabilities as fully featured data integration platforms such as Informatica. But it does so with an easy-to-use, extremely agile dataflow definition user experience, integrated packaging on an elastic cloud-native infrastructure, and the transparent pricing one expects from cloud data integration.
|Datameer Spectrum||Informatica (PowerCenter and Cloud Data Integration)|
|Supports multiple forms of integration including ETL, data integration, data preparation, and data engineering.||Supports only ETL and point-to-point integration (no transformations).|
|Supports ETL, ELT, data preparation, data engineering and data science pipeline jobs.||Supports ETL jobs.|
|Has an easy-to-use, spreadsheet-style, wizard-driven user experience that is code-free; Includes a large library of over 300 functions and tasks; Offers interactive data previews, visual data profiling and visual data exploration at any stage; Full data preparation capabilities are included.||Offers a more complex user experience that is dataflow oriented and code-free; Includes a limited library of over 140 transformation functions and tasks; Does not offer data previews, data profiling or data exploration; No data preparation available.|
|Project and folder style asset management with reusable and shareable components; Supports both asset tagging and annotations, and full search to discover assets.||Project and folder style asset management with reusable and shareable components; Only supports asset tagging and has no asset search.|
|Integrated enterprise security.||Enterprise security at an extra cost.|
|Integrated data governance.||Data governance in a separate, additional cost product.|
|Uses fully elastic Spark cluster to execute jobs (included).||Uses proprietary engine for jobs (included). Elastic cluster available at an extra cost.|
|Offers a simple pricing model, at competitive per-user pricing in a single package.||Offers a complex pricing model that can be expensive, with many add-on costs or additional products for enterprise features.|