Get all the capabilities of a SQL query engine like Amazon Athena, with the ability to easily federate data over many more distributed sources, model data with self-service tools, and collaborate using an integrated data catalog. Datameer Spotlight lets your analytics teams discover, model, federate, query, and govern ANY data they require on their own, without the need for IT. The result is faster, trusted insights and immediate time to value.
Amazon Athena is an interactive query engine on AWS that attempts to make it easy to analyze data sitting in S3 data lakes or files using standard SQL. Athena’s query engine is based on Presto, an open-source distributed SQL query engine. Presto was initially developed at Facebook, then open-sourced.
The Athena service is serverless – meaning there is no need to configure and deploy AWS machine instances (and carry the cost of these instances). The service simply runs using the required compute and storage resources. Athena integrates with other AWS services to fill in functionality gaps, including Glue for data cataloging, Lambda for federated queries, IAM for security, and KMS for encryption.
Amazon Athena looks and acts like a virtual database on top of data files stored in Amazon S3. Like a database, it offers a SQL query engine and mechanisms to define data structures using SQL DDL. If you use the Amazon Glue integration, Athena can query Glue objects as if they were tables/views.
Tables/views in Athena (or from Glue) will map down to data files sitting in S3. As mentioned, Athena is serverless, requiring no predefined compute resources (machine instances). Athena will use its query optimizer to parallelize query execution as much as possible to get the best possible processing and response time.
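As a rough illustration of what defining a data structure over S3 files with SQL DDL looks like, the sketch below assembles a Hive-style `CREATE EXTERNAL TABLE` statement of the kind Athena accepts for delimited text files. The helper name `athena_external_table_ddl`, the database and table names, the columns, and the bucket path are all hypothetical placeholders, not anything taken from a real environment.

```python
def athena_external_table_ddl(database, table, columns, s3_location, delimiter=","):
    """Build a CREATE EXTERNAL TABLE statement for delimited text files in S3.

    All names passed in are illustrative; nothing here talks to AWS.
    """
    if not s3_location.startswith("s3://"):
        raise ValueError("s3_location must be an s3:// URI")
    if not columns:
        raise ValueError("at least one column is required")

    # One "`name` type" entry per column, indented for readability.
    column_defs = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns)

    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {column_defs}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical example: an "orders" table mapped onto CSV files in a bucket.
ddl = athena_external_table_ddl(
    database="sales_db",
    table="orders",
    columns=[("order_id", "bigint"), ("customer", "string"), ("amount", "double")],
    s3_location="s3://example-bucket/orders/",
)
print(ddl)
```

The statement only describes where the files live and how to read them; no data is copied. Submitting it would be a separate step through the console, a JDBC/ODBC client, or an SDK.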
Athena integrates with other AWS services to round out its full range of services: Glue for data cataloging, Lambda for federated queries, IAM for security, and KMS for encryption.
On top of the platform, Athena provides standardized JDBC and ODBC interfaces and the Athena Console, which also contains a command-line interface. Athena supports a standard SQL language interface.
Datameer Spotlight is a virtual data management platform with a distributed query engine and optimizer, self-service tools, and a collaborative data catalog that gives analytics teams easy access to all enterprise data assets—regardless of type or location. Spotlight flips the analytics data model on its head, eliminating the need for costly ETL and data replication for analytics and wasted time waiting for data.
Spotlight lets analysts quickly discover, create, share, and collaborate on data assets, building knowledge and trust along the way. It provides a single place where analytics teams can quickly discover all these analytics assets and understand which best solve their problem to produce actionable results promptly. It provides an environment where teams can share and reuse assets, collaborate to form new assets, and increase knowledge using familiar social media-like features and AI-augmented information about asset utilization.
Under the covers, Spotlight provides a scalable, performant virtual data query and access environment that brings together all the data analysts need without the need to ETL or replicate data. Spotlight is a SaaS managed service that does not require IT administration and uses patent-pending optimization techniques and an elastic compute architecture to maintain performance and scale.
Spotlight increases the ROI on your data, BI, and analytics investments. It works with any data source you may have – databases, data warehouses, data lakes, files, and applications – and any BI, analytics, and data science tool used. Best of all, the virtual query engine eliminates the need for ETL, allowing you to lower your data integration costs.
Amazon Athena was designed to help make it easier to query data sitting in files on S3 using SQL and eliminate the need to ETL these files into a database or cloud data warehouse. The data residing in S3 is typically part of a cloud data lake.
Both Spotlight and Amazon Athena take a no-ETL approach to allow users to query data where it resides – Athena with S3 data files and Spotlight with any data. They both also have performant query engines that can parallelize query tasks for rapid response times.
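To make the idea of querying data where it resides more concrete, here is a small, self-contained analogy using Python's built-in `sqlite3` module. It is not how Athena or Spotlight is implemented; it only shows the general pattern of one SQL statement joining tables that live in two separate stores, without first copying one into the other. The file names, table names, and sample rows are invented for the example.

```python
import os
import sqlite3
import tempfile


def _create_source(path, ddl, insert_sql, rows):
    """Create one stand-alone database file holding a single table."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(ddl)
        conn.executemany(insert_sql, rows)
        conn.commit()
    finally:
        conn.close()


def build_demo_sources(directory):
    """Write two independent 'sources': a CRM store and an orders store."""
    crm_path = os.path.join(directory, "crm.db")
    orders_path = os.path.join(directory, "orders.db")
    _create_source(
        crm_path,
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    _create_source(
        orders_path,
        "CREATE TABLE orders (customer_id INTEGER, amount INTEGER)",
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 100), (1, 50), (2, 75)],
    )
    return crm_path, orders_path


def federated_customer_totals(crm_path, orders_path):
    """Join across both files in one query; no data is moved beforehand."""
    conn = sqlite3.connect(crm_path)
    try:
        # ATTACH exposes the second file under its own schema name.
        conn.execute("ATTACH DATABASE ? AS orders_src", (orders_path,))
        return conn.execute(
            """
            SELECT c.name, SUM(o.amount) AS total
            FROM main.customers AS c
            JOIN orders_src.orders AS o ON o.customer_id = c.id
            GROUP BY c.name
            ORDER BY c.name
            """
        ).fetchall()
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    crm, orders = build_demo_sources(tmp)
    print(federated_customer_totals(crm, orders))  # [('Acme', 150), ('Globex', 75)]
```

A real federated engine adds connectors for remote systems, pushes work down to each source, and parallelizes execution, but the user-facing shape is the same: one query, several sources.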
Athena was never intended to be a general-purpose, performant distributed query engine like Presto and Spotlight. Athena uses Presto as its core query engine but does not use several connectors, optimization techniques, and other services Presto provides for highly distributed queries and data virtualization.
Spotlight provides a high-performance distributed query engine similar to Presto, with all the same bells and whistles, making distributed data look like a single database to users and ensuring high-performance queries on large datasets. But Spotlight also offers several differentiated capabilities that Athena lacks, including federating data across a larger number of sources, data cataloging, easy data discovery, and analyst collaboration, making it a more complete platform for managing data for analytics.
Spotlight offers an integrated architecture and set of components to discover, collaborate, and model data for analytics on top of managing, querying, and consuming it. It is a well-rounded offering that provides all the critical services to make analysts more productive and give them access to the data they need to facilitate faster analytics the business can trust.
So why would you consider Spotlight over Amazon Athena? Let’s examine the reasons.
Everything In One – Spotlight combines multiple services to let users discover, model, collaborate, query, and federate any data to create a seamless, self-service experience for the data and business analyst. Athena forces users to jump around between various external services and use different technical skills along the way.
Work with and Federate More Data – Spotlight has connectors to over 200 different data sources of different types and locations (on-premises, cloud, SaaS). It allows the easy creation of federated data models and queries across these sources.
Self-service Data Modeling – Spotlight gives analysts easy-to-use, point-and-click modeling tools to create their own federated data models without coding. Athena forces you to write SQL DDL to define basic data structures and complex SQL queries to join datasets.
Collaborative Data Catalog – Spotlight offers an integrated data catalog with a rich set of information about the data beyond the physical metadata, including tags, descriptions, comments, certification, business level metadata, and custom properties. Users can collaborate around data with shared workspaces. Athena offers integration with AWS Glue (an external service), which only holds technical metadata and offers no collaboration.
Easy Discoverability – Spotlight allows users to perform faceted search across any information in the data catalog, provides a detailed data preview, and shows who else is using a dataset, so users can quickly find the right dataset for the job. Athena is hard-wired to the underlying data assets. Users must know what assets are in the system and their structure.
Integrated Data Governance – Spotlight provides a rich suite of data governance features to provide the proper access to, manage the privacy of, and monitor the use of data. Athena offers no data governance capabilities.
Integration with the Same Services – Spotlight also integrates with the same AWS external services that Amazon Athena uses: Glue, IAM, and KMS. This creates a cooperative metadata management, security, and encryption environment between Spotlight and your AWS cloud.
Spotlight delivers all the capabilities and benefits of a highly optimized distributed query engine that covers far more data sources and combines it with true self-service tools and packaging, and a rich, collaborative data catalog. The entirely self-service environment lets analytics teams determine their destiny, reduces the need for IT and data team involvement, and eliminates risky upfront data design projects.
|Datameer Spotlight|Amazon Athena|
|---|---|
|Spotlight is designed to be a complete data environment for analytics teams, hiding all the technical complexity and providing easy-to-use, self-service tools.|Athena simply provides a SQL query engine, with no self-service tools, and is designed to work with just S3 data files.|
|With connectors to over 200 different sources, Spotlight lets teams work with ANY data without the need for ETL.|Athena only has out-of-the-box integration with S3 data files and requires technical coding and services to reach additional data sources.|
|Spotlight provides a self-service visual data modeling environment where analysts can organize the data they need how they need it (including federated models). It is entirely graphical and requires no coding.|Data experts need to create objects in Athena via SQL DDL and use complex SQL to create federated queries at execution time.|
|Spotlight contains a user-friendly data catalog and semantic layer with physical metadata, tagging, descriptions, comments, custom properties, business-level metadata, and usage information.|Athena does not have a data catalog, only a physical data meta-store. AWS Glue, an external service, can be integrated with Athena to work with technical metadata only.|
|Spotlight allows users to quickly discover and explore assets using faceted search across names, descriptions, tags, custom properties, and any item in the catalog. It also allows users to see who is using assets and how they are used to determine fit for their project.|Athena has no search and discovery features. Users must know what assets are in the system or write SQL queries against the meta-store.|
|Spotlight allows users to collaborate in shared workspaces, add knowledge (tags, properties, etc.), and create additional shared assets. It also supports social media-like features around assets.|Athena has no collaboration features.|
|To maintain good governance, Spotlight maintains multiple forms of metadata about assets, full lineage, and usage auditing. It maintains several system-level and user-set properties, such as status, which can define an asset’s state.|Athena has no data governance features.|