The Leading No-code Airflow Alternative

Airflow & Datameer

Airflow is a general-purpose, open-source workflow tool that is often used for data orchestration, to define and coordinate analytics data pipelines.  Airflow shares some objectives with Datameer: the target is often a cloud data warehouse such as Snowflake, and both let you apply software engineering best practices when processing the data.  The similarities end there.

For data transformation, Datameer offers an easier, hybrid no-code/SQL-code user experience usable by all your personas – data engineer, analytics engineer, data analyst, and data scientist.  The catalog-like data documentation, collaboration, easy data enrichment, deep data profiling, and Google-like search and discovery make Datameer a superior choice for your data transformation needs.

What is Airflow?


Apache Airflow is an open-source workflow management platform.  It allows teams to define, manage, execute, and monitor workflows programmatically.  A workflow contains any number of tasks, each of which typically connects to a back-end service or system and runs its work there, while Airflow coordinates the end-to-end workflow.

Airflow is lightweight: it pushes the actual work down into the underlying services and systems, and it can distribute tasks to workers through a message queue (for example, with the Celery executor) to gain parallelism and reliability.  Airflow is designed to be scalable, dynamic, and extensible.  Workflows and tasks can be templated (via Jinja) to facilitate reuse and dynamic execution.

Airflow is also highly general-purpose.  Through its provider packages, it connects to almost 100 different services spanning a wide variety of applications.  Workflow tasks can retrieve data, process data, insert data, and more within these various back-end services.

Besides the open-source package, Airflow is embedded in several more specialized offerings, including:

  • Astronomer – a SaaS-based Airflow data workflow/orchestration platform
  • Google Cloud Composer – general-purpose orchestration tool to coordinate jobs across multiple GCP services
  • AWS – Amazon Managed Workflows for Apache Airflow (MWAA), a managed Airflow service to coordinate jobs across multiple AWS services

How Airflow Works


Users define workflows in Airflow using Python and orchestrate them via Directed Acyclic Graphs (DAGs) of tasks.  A workflow will contain any number of these tasks, and tasks may have interdependencies.  Connections to outside services/systems that tasks will interact with are defined and managed independently.

DAGs representing the overall workflow are declared and defined in Python.  A DAG contains any number of tasks, which are also defined in Python.  Each task can interact with a back-end service through that service’s API, again from Python.
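To make the idea of a DAG of dependent tasks concrete, here is a minimal, stdlib-only sketch of how dependencies determine both run order and which tasks can run in parallel. It is not Airflow's API: in Airflow itself the same dependencies would be declared on operator objects (for example with the `>>` operator). The task names are hypothetical, and the sketch uses Python's `graphlib.TopologicalSorter` (Python 3.9+) to group tasks into "waves" whose members have no dependencies on one another.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key lists the upstream tasks it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join_orders_customers": {"extract_orders", "extract_customers"},
    "aggregate_revenue": {"join_orders_customers"},
    "publish_report": {"aggregate_revenue"},
}


def execution_waves(graph):
    """Group tasks into waves; tasks in the same wave do not depend on each
    other, so a scheduler could run them in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose upstreams are done
        waves.append(ready)
        sorter.done(*ready)  # mark them finished to unlock downstream tasks
    return waves


for number, wave in enumerate(execution_waves(dependencies), start=1):
    print(number, wave)
```

The two extract tasks land in the first wave because neither depends on the other; everything downstream waits for its upstream tasks, which is the same ordering guarantee a DAG gives an orchestrator.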

Tasks have relationships and dependencies, which define both the order in which the workflow runs and which tasks can run in parallel.  Airflow records the state of every task in its metadata database, can pass small pieces of data between tasks (XComs), and, with queue-based executors, uses a message queue to hand tasks to workers.  Because task state is recorded, a failed workflow can be restarted at the point of failure rather than from the beginning.  Airflow also contains “sensors” that wait for an event before starting or continuing the execution of a workflow or task.
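A sensor essentially polls ("pokes") for a condition until it holds or a time limit expires. The sketch below illustrates that pattern with the standard library only; it is not Airflow's implementation. Airflow's own sensors expose comparable settings such as `poke_interval` and `timeout`, but the `wait_for` function and the `file_has_landed` event here are hypothetical stand-ins.

```python
import time


def wait_for(condition, poke_interval=1.0, timeout=60.0):
    """Call `condition` every `poke_interval` seconds until it returns True.

    Raises TimeoutError if the condition is still False after `timeout` seconds.
    A stdlib-only illustration of the sensor idea, not Airflow's implementation.
    """
    deadline = time.monotonic() + timeout
    while not condition():
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met before the timeout")
        time.sleep(poke_interval)
    return True


# Hypothetical event: an upstream file "lands" on the third check.
checks = {"count": 0}


def file_has_landed():
    checks["count"] += 1
    return checks["count"] >= 3


wait_for(file_has_landed, poke_interval=0.01, timeout=5.0)
print(f"event detected after {checks['count']} checks; downstream tasks may start")
```

Only once `wait_for` returns would downstream tasks be released; if the event never arrives, the timeout turns the wait into a visible failure instead of a silent hang.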

Workflows are run by the Airflow scheduler and its executor.  Jobs can run on a schedule or wait on external events (see sensors above).  Pools, each with a fixed number of slots, can be created and tasks assigned to them to limit how many of those tasks run at once.  Jobs can be run on compute clusters.
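The effect of a pool can be approximated with a counting semaphore: each running task holds one slot, and further tasks wait until a slot frees up. This is a minimal sketch under that assumption, using only the standard library; the pool size, task names, and simulated work are hypothetical, and Airflow's real pools are configured through Airflow rather than in task code.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SLOTS = 2  # hypothetical pool size: at most two tasks may run at once

pool_slots = threading.BoundedSemaphore(POOL_SLOTS)
stats_lock = threading.Lock()
stats = {"running": 0, "peak": 0}


def run_task(name):
    """Hold one pool slot while the simulated task runs."""
    with pool_slots:  # blocks until a slot is free
        with stats_lock:
            stats["running"] += 1
            stats["peak"] = max(stats["peak"], stats["running"])
        time.sleep(0.05)  # stand-in for real work in a back-end system
        with stats_lock:
            stats["running"] -= 1
    return name


# Five worker threads but only two slots: the pool, not the thread count, caps concurrency.
task_names = [f"task_{i}" for i in range(6)]
with ThreadPoolExecutor(max_workers=5) as executor:
    finished = list(executor.map(run_task, task_names))

print(finished, "peak concurrency:", stats["peak"])
```

The design point is that parallelism is limited by the shared pool rather than by how many workers exist, which is how a pool protects a downstream system from being overloaded.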

Jinja can be used to template both DAGs and individual tasks.  This lets entire DAGs be parameterized, so the same DAG can execute differently depending on its parameters, and lets tasks be parameterized and reused across multiple DAGs.
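The sketch below shows the reuse idea behind templating: one task definition rendered with different parameters. Airflow itself uses Jinja syntax (for example `{{ ds }}` for the run date); to keep the example stdlib-only, it substitutes Python's `string.Template` with `$` placeholders, so the template syntax is an assumption, not Airflow's. The table names and the `render_task` helper are hypothetical.

```python
from string import Template

# Hypothetical task template; $-placeholders stand in for Jinja's {{ ... }} here.
DAILY_LOAD_SQL = Template(
    "INSERT INTO $target SELECT * FROM $source WHERE order_date = '$run_date'"
)


def render_task(target, source, run_date):
    """Render one concrete task from the shared template.

    substitute() raises KeyError if any placeholder is left unfilled.
    """
    return DAILY_LOAD_SQL.substitute(target=target, source=source, run_date=run_date)


# The same template reused with different parameters, e.g. in two different DAGs.
print(render_task("analytics.orders_daily", "raw.orders", "2024-01-01"))
print(render_task("sandbox.orders_daily", "raw.orders", "2024-01-02"))
```

Plain string substitution is used here only for illustration; production code should pass data values as bound query parameters rather than splicing them into SQL text.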

How Do Organizations Use Airflow?


Airflow is a general-purpose workflow platform that can be used for any type of job.  A common use case for Airflow is to orchestrate data pipelines that include data transformation.  For this comparison, we will look at Datameer versus Airflow for orchestrating data transformation workflows.

What is Datameer?

Datameer is a powerful SaaS data transformation platform that runs in Snowflake, your modern, scalable cloud data warehouse.  Together, they provide a highly scalable and flexible environment for transforming your data into meaningful analytics.  With Datameer, you can:

  • Let your non-technical analytics team members work with complex data without writing code, using Datameer’s no-code and low-code data transformation interfaces,
  • Collaborate across technical and non-technical team members to build data models and the data transformation flows that fulfill them, each contributing their own skills and knowledge,
  • Fully enrich analytics datasets to add even more flavor to your analysis using the diverse array of graphical formulas and functions,
  • Generate rich documentation and add user-supplied attributes, comments, tags, and more to share searchable knowledge about your data across the entire analytics community,
  • Use the catalog-like documentation features to crowd-source your data governance processes for greater data democratization and data literacy,
  • Maintain full audit trails of how data is transformed and used by the community to further enable your governance and compliance processes,
  • Deploy and execute data transformation models directly in Snowflake to gain the scalability you need over your large volumes of data while keeping compute and storage costs low.

Datameer provides a number of key benefits for your modern data stack and cloud analytics, including:

  • Creating a highly efficient data stack that reduces your data and analytics engineering costs,
  • Allowing you to share the data transformation workload across your broader data and analytics team,
  • Fostering collaboration among the data and analytics team to produce faster, error-free projects,
  • Efficiently using your Snowflake analytics engine for cost-effective data transformation processing,
  • Enabling you to crowd-source your data governance for more effective and efficient governance processes, and
  • Improving data literacy to expand knowledge and effective use of your data.

Comparison

At the surface level, it might seem obvious that there is no comparison between Datameer and Airflow.  The two offerings focus on different problems and take very different approaches to solving them.

Airflow takes a highly programmatic approach to workflows, including data orchestration workflows.  Users need substantial Python experience and strong knowledge of the APIs of the underlying services their tasks use.

Yet some organizations use Airflow to orchestrate data pipelines that include data transformation, or use Airflow specifically for data transformation.  The data transformation tasks either (a) transform the data directly in Python, or (b) load the data into database schemas, transform it with SQL statements issued from Python, and then write the results into new schemas.  Both approaches are highly programmatic.
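To show what option (b) looks like in practice, here is a minimal sketch of a single transformation step written by hand in Python. An in-memory SQLite database stands in for the warehouse (such as Snowflake), and the table names, columns, and sample rows are hypothetical; inside an Airflow DAG, a function like this would form the body of one task.

```python
import sqlite3

# Hypothetical raw rows; in practice an upstream task would pull these from a source system.
raw_orders = [
    (1, "alice", 120.0),
    (2, "bob", 80.0),
    (3, "alice", 45.5),
]


def load_and_transform(conn):
    """Option (b): load rows into a staging table, transform them with SQL
    issued from Python, and write the result into a new table."""
    cur = conn.cursor()
    # Load: the schema must be spelled out explicitly in code.
    cur.execute("CREATE TABLE staging_orders (id INTEGER, customer TEXT, amount REAL)")
    cur.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", raw_orders)
    # Transform: a SQL statement embedded in Python creates the output table.
    cur.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer, SUM(amount) AS revenue, COUNT(*) AS orders
        FROM staging_orders
        GROUP BY customer
        """
    )
    conn.commit()
    return cur.execute(
        "SELECT customer, revenue, orders FROM customer_revenue ORDER BY customer"
    ).fetchall()


with sqlite3.connect(":memory:") as conn:
    print(load_and_transform(conn))
```

Even this small step requires the author to manage connections, schemas, and SQL text by hand, which is the programming burden the comparison refers to.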

If data architects are coordinating complex dataflows across multiple systems that have transactional properties, Airflow is a good tool and platform.  In this use case, data architects will require the precision and control that Airflow offers.  Data architects will also have strong Python skills and an understanding of the underlying APIs they use.

For data pipelines and data transformation, Airflow’s complexity and sophistication make it overkill.  Modern ELT data pipelines can easily be defined and managed using a combination of no-code EL tools such as Fivetran and no-code/low-code data transformation tools such as Datameer, without writing ANY code.  This lets the non-programmers in your analytics community get involved in the analytics engineering process and increases the speed and adoption of your analytics.

| Datameer | Airflow |
| --- | --- |
| Purpose-built tool and platform for data transformation and modeling | General-purpose workflow and orchestration tool |
| No-code, low-code, and SQL-code interfaces for data modeling and transformation; no Python or Jinja needed | Python and Jinja interfaces requiring strong programming knowledge |
| Abstracts the user from the underlying services and system (Snowflake) | Requires a strong understanding of how to use the underlying data services and interfaces |
| Schemas and models are automatically carried forward between steps, requiring no coding | Data elements need to be redefined within each task if carried between tasks |

Why Datameer?

Datameer is explicitly designed and optimized for in-Snowflake data transformations and hits a home run for this use case.  For data transformation, Datameer offers many advantages:

  • Hybrid SQL/low-code/no-code UI – Datameer’s hybrid user experience with three different user interfaces enables your entire team to get involved in the analytics engineering process, sharing the workload and enabling self-service in the analytics community.
  • Easier data enrichment – Datameer’s spreadsheet-like UI and its ability to easily add file-based data make for a much easier path to enrich data.  Adding new, enriched columns is guided by an easy, wizard-driven formula builder.
  • Rich, catalog-like data documentation – Datameer automatically documents system-level metadata and properties, and users can further enrich the information with wiki-style descriptions, custom properties and attributes, tags, and comments.
  • Collaboration – Teams can use shared workspaces to share, reuse, and collaborate around models to speed projects. Different model types can be mixed and matched into larger dataflows.  The rich data documentation facilitates knowledge sharing.
  • Data profiling – Datameer maintains a deep data profile that is expressed visually to users so they can see the full shape and contents of the data as they transform it.  This also allows users to identify invalid, missing, or outlying fields and values.
  • Google-like faceted search and discovery – Datameer offers a Google-like faceted search that allows users to discover data models and datasets.  The search covers all the data documentation and information to find just the right data model.
  • In-Snowflake execution – Datameer uses the scalable, cost-efficient Snowflake engine you already own for data transformations and keeps your data safe and secure within Snowflake as you transform it.

Conclusion

Airflow is very good for coordinating complex dataflows across multiple systems that have transactional properties.  It gives data architects precision and control and lets them use their strong Python skills and API knowledge.  But for data pipelines and data transformation, Airflow’s complexity and sophistication make it overkill.

Datameer’s explicit focus on in-Snowflake data transformation makes it much more applicable to ELT data pipelines.  It offers a much more inclusive and easier user experience that supports multiple personas, collaboration among team members, and a much deeper set of searchable, catalog-like data documentation, and it transforms data directly in Snowflake, using Snowflake’s powerful engine and keeping data and models secure.

Are you interested in seeing Datameer in action?  Contact our team to request a personalized product demonstration.

Comparison Table

| Datameer | Airflow |
| --- | --- |
| Data transformation | General-purpose data workflow |
| Runs in the cloud data warehouse | Uses the engines of the underlying services called within tasks |
| Three distinct UIs for code (SQL), low-code (spreadsheet-like), and no-code (graphical) | Programmatic, via code written in an integrated development environment (IDE) |
| UI/UX that supports all your personas: data engineer, analytics engineer, data analyst, and data scientist | Only supports strong programming personas |
| Easy, no-code data enrichment via a wizard-driven formula builder in the spreadsheet UI | Programmatic via Python |
| Shared workspaces, model reuse, mix-and-match of model types, and shared catalog-like data documentation facilitate collaboration | None |
| Maintains a deep, visual data profile that easily allows users to identify invalid, missing, or outlying fields and values, as well as the overall shape of the data | None |
| A rich set of catalog-like auto-generated and user-created data documentation, including system-level metadata and properties, wiki-style descriptions, custom properties and attributes, tags, and comments | None |
| Google-like faceted search across all information captured on the data, including system-level metadata and properties, descriptions, custom properties and attributes, tags, and comments | None |

See How Quickly Datameer Can Transform Your Data in Snowflake.
