The Top 5 ETL Tools in 2021
- John Morrell
- April 15, 2021
ETL Tools Market Trends
The ETL tools market continues to grow at a strong pace, reaching $8.5 billion in 2019, and is expected to grow at a CAGR of 13.9% to reach $22.3 billion by 2027. The market is also quite mature: Informatica, one of the long-time independent suppliers, was founded in 1993.
But as the ETL market moves into its fourth decade in the 2020s, a number of new trends are driving further growth:
- Cloud – the popularity of cloud platforms and cloud data warehouses has driven strong growth in the cloud data integration segment of this market.
- More data sources – new forms of data, SaaS applications, and cloud services have further complicated already dispersed data landscapes.
- ELT – with the availability of inexpensive compute resources in cloud data warehouses, a new model of data integration has emerged – Extract, Load, and Transform.
- Speed and simplicity – with data engineering resources already stretched, new tools aim to let less technical staff build data pipelines with wizard-driven simplicity.
- Discovery – with data pipelines creating a growing number of analysis-ready datasets, analytics teams need interfaces to search and discover the right data for the job.
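To make the ETL versus ELT distinction concrete, here is a minimal Python sketch. It assumes Python's built-in sqlite3 module as a stand-in for a cloud data warehouse. The sample order rows and the table names (orders_etl, orders_raw, orders_elt) are hypothetical. In the ETL path, the data is cleaned in transit before it is loaded. In the ELT path, the raw data is loaded first and the same cleanup runs as SQL on the warehouse's own compute. This illustrates the two patterns only. It is not how any of the vendors discussed below implements them.

```python
import sqlite3

# Hypothetical raw records extracted from a source system (e.g. a SaaS export).
RAW_ORDERS = [
    ("o1", "  Alice ", "19.99"),
    ("o2", "BOB", "5.00"),
    ("o3", "alice", "12.50"),
]

SUMMARY_SQL = "SELECT customer, SUM(amount_cents) FROM {table} GROUP BY customer ORDER BY customer"


def etl(conn):
    """ETL: transform in transit (here, in Python), then load only the cleaned rows."""
    cleaned = [
        (order_id, name.strip().lower(), round(float(amount) * 100))
        for order_id, name, amount in RAW_ORDERS
    ]
    conn.execute("CREATE TABLE orders_etl (id TEXT, customer TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)
    return conn.execute(SUMMARY_SQL.format(table="orders_etl")).fetchall()


def elt(conn):
    """ELT: load the raw rows as-is, then transform with SQL inside the 'warehouse'."""
    conn.execute("CREATE TABLE orders_raw (id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW_ORDERS)
    # The transformation step runs on the warehouse's compute (sqlite3 stands in here).
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT id,
               LOWER(TRIM(customer)) AS customer,
               CAST(ROUND(CAST(amount AS REAL) * 100) AS INTEGER) AS amount_cents
        FROM orders_raw
        """
    )
    return conn.execute(SUMMARY_SQL.format(table="orders_elt")).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(etl(conn))  # [('alice', 3249), ('bob', 500)]
    print(elt(conn))  # [('alice', 3249), ('bob', 500)]
```

Both paths produce the same result. The difference is where the transformation runs, and therefore whose compute resources it consumes.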
What Comes with My Cloud?
The three major cloud platforms each offer their own ETL tool: AWS Glue, Azure Data Factory, and Google Cloud Data Fusion. Each is unique, but all three have limited functionality for defining data pipelines, with poor dataflow designers that often force users to fall back on writing ETL code. In addition, these cloud tools have gaps in enterprise security and governance and are not well suited to bridging on-premises and cloud data sources.
A recent GigaOm white paper outlined many of the functionality gaps in the cloud vendors' ETL tools and recommended using third-party tools.
The ETL Tool Leaders
Of the top 5 ETL vendors in 2021, two are more traditional suppliers, Informatica and Talend, while two others, Fivetran and Matillion, are considered modern cloud ETL vendors. The fifth, Datameer, supplies a modern ETL++ platform. Let's explore and quickly compare these top 5 vendors.
Datameer Spectrum is a fully-featured ETL++ data integration platform with a broad range of capabilities for extracting, exploring, integrating, preparing, delivering, and governing data for scalable, secure data pipelines. Once integration dataflows are ready, Spectrum’s enterprise-grade operationalization, security, and governance features enable reliable, automated, and secure data pipelines to ensure a consistent data flow.
Spectrum offers a comprehensive suite for data integration, supporting analyst self-service data preparation, data science, and data engineering use cases, thereby enabling a single hub for all data pipelines across an enterprise. Its point-and-click simplicity makes it easy for analysts and data scientists, and even non-programmers, to create data integration pipelines of any level of sophistication, allowing you to make your data analytics-ready 10 to 20 times faster at a fraction of the cost.
Spectrum provides a hybrid ETL and ELT platform for flexibility to support both data integration styles on the same scalable platform. Spectrum is cloud-native on all three major cloud platforms (AWS, Azure, GCP) and carries with it the elasticity and cost economics you would expect from the cloud. Spectrum also bridges any data sources you have regardless of type, format, and location (cloud or on-premises).
Informatica offers an end-to-end data integration platform with an extensive set of capabilities. The company's portfolio includes data integration and cloud data integration products, plus products in related areas such as data engineering, data cataloging, data quality, data governance, and master data management.
Informatica's legacy data integration product, PowerCenter, was designed and optimized for on-premises deployments. Only in 2018 did Informatica move its data integration products to the cloud, both to its own Informatica Cloud and to the public clouds (AWS, Azure, and GCP). The main data integration products follow the ETL dataflow model. Many of the enterprise features listed above are available only as add-ons or separate products.
When comparing Spectrum and Informatica, there are major differences between the two:
- Spectrum supports a more general approach to data integration, freeing designers to piece together dataflows that best support their needs, while Informatica has a very rigid ETL approach.
- Spectrum offers wizard-led data extraction, an easy spreadsheet-style interface, and over 300 single-click functions that make it faster and easier to design ETL flows, while Informatica's rigid dataflow UI complicates the design of ETL processes.
- Spectrum has integrated data preparation capabilities and a library of over 300 functions, while Informatica offers limited data preparation with only 100+ functions.
- Spectrum offers integrated security and governance with enterprise-class capabilities, while Informatica requires add-ons for advanced security and a separate product for data governance.
- Spectrum provides all features integrated into a single platform with transparent, consumable cloud-based pricing, while Informatica’s pricing has limited transparency and requires additional products for many of the enterprise features.
Read our more detailed comparison of Spectrum and Informatica.
Talend offers a comprehensive integration platform covering a full range of integration scenarios. Talend's roots are in an open-source data integration platform. On the data integration side, it offers a core data integration platform (on-premises or in the cloud), a specialized data replication product for the cloud, Stitch, and related products for data cataloging, data preparation, and data stewardship.
When comparing Spectrum and Talend, users can immediately see the differences:
- Spectrum offers a single integrated platform and toolset that provides a unified data integration hub for any of your use cases, while the Talend platform consists of a complex set of tools designed for specific purposes that are not well integrated.
- Spectrum provides a single, seamless user experience that supports all aspects of data pipeline creation and management without any coding, while Talend requires using multiple tools, each with specific functionality.
- Spectrum has an extensive array of over 80 connectors designed to work with different sources and many formats – structured, semi-structured, and unstructured, while Talend has a limited set of connectors requiring specific functions.
- Spectrum's platform has the same functionality whether it runs on-premises or in the cloud, while Talend's on-premises and cloud offerings differ in functionality.
- With Spectrum, data preparation is a central part of the platform and a critical piece of the user experience, while Talend has a very limited, unintegrated data preparation tool.
- Spectrum offers a simple, cost-effective pricing model based on the number of users and the compute resources required for data pipelines, while Talend has an old-fashioned, costly pricing model and requires you to purchase add-ons for additional enterprise features.
Read our more detailed comparison of Spectrum and Talend.
Fivetran is a cloud-based ELT data integration platform that offers a simple, reliable way to replicate and synchronize data into your cloud data warehouse (CDW). It is a basic service that lets you set up "connections" between your data sources (primarily SaaS applications, cloud services, and cloud databases) and your cloud data warehouse. Transformations require writing SQL or using dbt, an add-on open-source package.
When comparing Spectrum and Fivetran, the following differences can be seen:
- Spectrum supports a wide range of data integration patterns – ETL, ELT, data preparation, and data science pipelines – while Fivetran supports one simple data integration pattern – ELT.
- Spectrum has a scalable, flexible job execution system for operationalizing jobs to run reliably and is extremely easy to operate, while Fivetran only supports a basic operationalization model around each connection.
- Spectrum provides enterprise-class security and governance features, while Fivetran offers standard security and no governance features.
- Spectrum offers the ability to integrate and bridge on-premises data sources into the cloud, while Fivetran lacks this ability.
- Spectrum includes a powerful yet easy-to-use data preparation capability that allows analysts and data scientists to shape data to their needs without any coding, while Fivetran’s transformation capabilities require writing sophisticated SQL code or using open source dbt.
- Using Spectrum’s ETL model, all transformations are performed in-transit within Spectrum, using its compute infrastructure, while Fivetran’s ELT model transforms in the CDW, creating extra, hidden costs on your CDW bill.
Read our more detailed comparison of Spectrum and Fivetran.
Matillion is one of the younger cloud-based ETL solutions on the market. It consists of three components: the underlying platform, a graphical data orchestration tool, and a management tool. Matillion has no storage or execution engine of its own, so all data processed in a dataflow is stored in intermediate form in tables in your cloud data warehouse.
When comparing Spectrum and Matillion, the following differences are immediately observable:
- Spectrum's spreadsheet-style UI makes it easy to interactively assemble the transformation operations needed, while Matillion's data orchestration tool forces users to string many components together into a flow, resulting in overly complex dataflows and inefficient pipelines.
- With Spectrum, data preparation and transformation is a first-class part of the product with a deep library of close to 300 functions, each applicable graphically, while Matillion offers a limited set of 25 components for data transformation.
- Spectrum uses secure protocols, enterprise security controls, and encryption to integrate and bridge on-premises sources to the cloud securely, while Matillion offers no capabilities to reach and integrate on-premises sources.
- Spectrum provides enterprise-class security and governance features, while Matillion offers limited, standard security and no data governance.
- Spectrum offers a single integrated platform that provides all the capabilities for many different use cases, acting as a versatile, unified data integration hub, while Matillion is only suitable for integrating cloud and SaaS data sources into a cloud data warehouse.
- Spectrum runs its own elastic Spark-based compute cluster to give jobs scale and performance and uses a patented Smart Execution™ optimizer to execute jobs efficiently, while Matillion relies on the CDW for processing and has no intelligence to optimize jobs.
- With Spectrum, all transformations are performed in transit using its own compute infrastructure, while Matillion performs transformations in the CDW, creating extra, hidden costs on your CDW bill.
Read our more detailed comparison of Spectrum and Matillion.
While the ETL market is highly mature, it continues to grow rapidly and innovate. The cloud, a growing number of data sources, new processing models, and the demand for greater speed and simplicity are driving new capabilities in the ETL market.
But why settle for old-fashioned ETL tools that were not made for the cloud and offer disjointed, pricey services, or for so-called modern ETL tools that offer extremely limited capabilities? Use the modern ETL++ capabilities of Datameer Spectrum, which gives you the best of both worlds (the enterprise capabilities of traditional ETL tools and the speed and simplicity of modern ETL tools) at easily consumable cloud-based pricing.