Go beyond data lakes! Datameer Spotlight lets you efficiently manage, catalog, and query ANY data from across your enterprise to perform ANY form of analytics. Your analytics community can quickly discover, model, consume, and govern data for analytics on an automated SaaS-based service that delivers faster analytics, trusted assets, and immediate time to value.
Dremio is a data lake engine that allows you to organize data dispersed in your data lake and perform much faster queries on it. Data lakes have historically been very disorganized and suffered from poor query performance. Dremio attempts to alleviate both pain points with a self-service semantic layer that maps the underlying data and a robust in-memory query engine.
Dremio has three major components:
Data analysts and data engineers work in the toolset to create the right analytics-ready datasets based on the raw data. At the core of each virtual dataset is a SQL query that defines the structure and physical datasets (raw data).
Dremio can use data from two major areas: data lakes and databases. Data lake data sources include files, cloud object stores (AWS S3, Azure ADLS, etc.), and Hadoop data stores (Hive, HBase, etc.). It can also connect to and query from typical databases like Oracle, Teradata, Microsoft SQL Server, MySQL, Amazon Redshift, etc.
The semantic layer is the repository of datasets that analysts can query for their analytics. Users can see metadata about the datasets and derived semantic information such as transformations and data lineage. They can also add user-provided “Wiki-style” descriptions for datasets and spaces (collections of datasets) and tags to datasets. There are limited search facilities allowing simple search on metadata and tags.
The query engine facilitates SQL-based queries on the data, based on the datasets in the semantic layer. It is based on the Apache Arrow open source project. The query engine uses data reflections (materialized views), in-memory caching, and pipelining to accelerate queries and performance.
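The idea behind a materialized view can be sketched in a few lines. This is a generic illustration only, using Python's built-in SQLite module as a stand-in; it does not use Dremio's engine or API, and the table and function names (`events`, `clicks_by_region`, `refresh_materialization`, `clicks_for`) are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for raw data (assumption, not Dremio).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (region TEXT, clicks INTEGER);
    INSERT INTO events VALUES ('eu', 3), ('eu', 4), ('us', 10);
""")


def refresh_materialization(connection):
    """Recompute and store the aggregate; this plays the role of a refresh job."""
    connection.executescript("""
        DROP TABLE IF EXISTS clicks_by_region;
        CREATE TABLE clicks_by_region AS
            SELECT region, SUM(clicks) AS clicks
            FROM events
            GROUP BY region;
    """)


def clicks_for(connection, region):
    """Answer from the stored aggregate instead of scanning the raw events."""
    row = connection.execute(
        "SELECT clicks FROM clicks_by_region WHERE region = ?", (region,)
    ).fetchone()
    return row[0] if row else 0


refresh_materialization(conn)
```

Queries against `clicks_by_region` are cheap because the aggregation has already been done, but the stored result goes stale as soon as new rows land in `events` and stays stale until the refresh runs again. That trade-off is why materializations of this kind come with refresh jobs to define and manage.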
Datameer Spotlight is a virtual data management platform and data catalog that gives analytics teams easy access to all enterprise data assets—regardless of type or location. Spotlight flips the analytics data model on its head, eliminating the need for costly ETL and data replication for analytics and wasted time waiting for data.
Spotlight lets analysts quickly discover, create, share, and collaborate on data assets, building knowledge and trust along the way. It provides a single place where analytics teams can quickly discover all these analytics assets and understand which best solve their problem to produce actionable results promptly. It provides an environment where teams can share and reuse assets, collaborate to form new assets, and increase knowledge using familiar social media-like features and AI-augmented information about asset utilization.
Under the covers, Spotlight provides a scalable, performant virtual data query and access environment that brings together all the data analysts need without the need to ETL or replicate data. Spotlight is a SaaS-managed service that does not require IT administration and uses patent-pending optimization techniques and elastic compute architecture to maintain performance and scale.
Spotlight increases the ROI on your data, BI, and analytics investments. It works with any data source you may have – databases, data warehouses, data lakes, files, and applications – and any BI, analytics, and data science tool used. Best of all, the virtual query engine eliminates the need for ETL, allowing you to lower your data integration costs.
At its core, Dremio is a query acceleration engine for cloud data lakes, with a self-service modeling and semantic layer. Spotlight is purpose-built to accelerate any analytics (not just data lakes) with a highly optimized virtual data management server, a broad suite of connectivity to any data, and a collaborative catalog for easy data discovery.
Spotlight and Dremio have a few things in common:
Beyond this, Spotlight offers several key differentiated capabilities versus Dremio that allow it to facilitate faster analytics of any kind:
Dremio has a minimal set of data connectors that are extremely focused on data lakes (files, cloud object stores, and Hadoop) and databases (seven in total, including Redshift, Oracle, Teradata, DB2, and others). The main objective is to facilitate analytics on your data lake and combine it with supporting data from data marts and warehouses.
Spotlight has over 200 connectors to a wide variety of data sources: databases, data warehouses, cloud data warehouses, analytical data sources, SaaS applications, cloud services, and more. Spotlight’s objective is to facilitate cloud-based analytics across ANY and ALL of your data, supporting analytics of any form.
At the core of each dataset in the Dremio semantic layer is a SQL query. Dremio does provide some easy menu-based operations for JOINs and basic transformations and offers visual data lineage. But essentially, creating most virtual datasets requires SQL coding.
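To make "a dataset defined by a SQL query" concrete, here is a minimal sketch that uses a plain SQL view in Python's built-in SQLite module as a stand-in for a virtual dataset. It is not Dremio syntax or tooling; the `orders` table, the `shipped_revenue` view, and the helper function are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for a physical data source (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL, status TEXT);
    INSERT INTO orders VALUES
        (1, 'acme',   120.0, 'shipped'),
        (2, 'acme',    80.0, 'cancelled'),
        (3, 'globex', 200.0, 'shipped');

    -- The "virtual dataset": no data is copied, only the query is stored.
    CREATE VIEW shipped_revenue AS
        SELECT customer, SUM(amount) AS revenue
        FROM orders
        WHERE status = 'shipped'
        GROUP BY customer;
""")


def shipped_revenue(connection):
    """Query the view like a table and return a customer -> revenue mapping."""
    rows = connection.execute(
        "SELECT customer, revenue FROM shipped_revenue ORDER BY customer"
    ).fetchall()
    return dict(rows)
```

Analysts who consume `shipped_revenue` never see the filter or the aggregation, but whoever creates it has to write them. That is the SQL skill a query-centric modeling layer assumes.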
Spotlight has a codeless, visual approach to modeling through its intuitive point-and-click interface. Spotlight introspects and catalogs the objects from your sources, lets you search and discover the right assets for your analysis, and has AI-driven recommendations to guide the modeling process.
Dremio has a catalog view that allows users to see Wiki content, tags, and fields for all the datasets. Related datasets can be organized into spaces to make them easier to find. Users can search for datasets based on field and table names or find all datasets with a specific tag. To fully explore a dataset, users must query it from an external tool.
Spotlight has a rich catalog and allows users to easily search across names, descriptions, tags, custom properties, and any item in the catalog. Search results can also be filtered by who is using a dataset (owners and collaborators) and other usage information. Spotlight provides a detailed data preview, and users can open a dataset in their favorite BI tool from within Spotlight to explore it visually.
Dremio has an easy-to-use but limited data catalog. For each dataset, the Dremio semantic layer contains three items: the physical metadata (including lineage and transformations), a Wiki-like description, and ad-hoc tags.
Spotlight contains a very rich data catalog and semantic layer. Beyond the physical metadata, users can provide information about the data, including tags, descriptions, and comments. They can also certify assets, provide custom properties, and add business-level metadata. Spotlight supplements this by capturing information on where an asset is referenced, who is using it, and how often it is used. Users can search for assets across technical metadata and all of the added knowledge.
With Dremio, users can share, reuse, and chain virtual datasets. Spaces can be used as an area where users can collect related datasets and perform rudimentary collaboration on a project.
Spotlight allows users to work together in shared workspaces to collaborate, add knowledge (tags, properties, etc.), and create additional shared assets. It also supports social media-like features around assets. The owner can add collaborators, and users can request to follow or fully collaborate on an asset. Once added, followers will receive notifications in their activities inbox. Collaborators can fully exchange comments and notifications on assets.
Dremio provides its own user-, group-, and role-based security. It can integrate with LDAP and SSO for enterprise security and access rights and supports Personal Access Tokens (PATs). At the data level, users can have Edit rights (ability to modify a dataset), Query rights (ability to query/use a dataset), or no rights (will not see the dataset). Dremio also supports encryption on the wire.
Dremio DOES NOT validate user access rights to data objects with the originating data source, forcing data stewards to re-implement data-level access controls in Dremio and creating potential security holes. A user with access to a physical dataset can create a virtual dataset containing the physical data, then grant access to that virtual dataset to another user who may not have access to the physical data, creating a security loophole.
Spotlight provides a deep set of security capabilities, including:
Spotlight is intentionally designed NOT to replicate the access control mechanisms already in place for the data. Metadata visibility controls in Spotlight and data access controls are independent of each other. The data source maintains access control to the data. The Spotlight user’s security credentials are passed down to the source at query execution time, eliminating potential conflicts and loopholes. Even when it caches datasets, Spotlight always re-authenticates with the originating data sources before permitting access, maintaining consistent security across all data.
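The difference between the two access models can be sketched as a toy example. This is a simplified illustration under stated assumptions, not either product's actual implementation: `Source`, `CatalogOnlyLayer`, `PassThroughLayer`, and the service-account mechanism are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical data source that owns its own access-control list."""
    allowed_users: set
    rows: list

    def read(self, user):
        if user not in self.allowed_users:
            raise PermissionError(f"{user} may not read the source")
        return self.rows


@dataclass
class CatalogOnlyLayer:
    """Toy virtual layer whose grants are never re-checked against the source.

    Assumption: the layer reads the source with one shared service account.
    """
    source: Source
    service_account: str
    view_grants: set = field(default_factory=set)

    def query_view(self, user):
        if user not in self.view_grants:
            raise PermissionError(f"{user} has no grant on the view")
        # The end user's identity is dropped here, so the source cannot object.
        return self.source.read(self.service_account)


@dataclass
class PassThroughLayer:
    """Toy virtual layer that hands the end user's identity to the source."""
    source: Source
    view_grants: set = field(default_factory=set)

    def query_view(self, user):
        if user not in self.view_grants:
            raise PermissionError(f"{user} has no grant on the view")
        # The source makes the final decision on every query.
        return self.source.read(user)
```

If a user who lacks source access is granted the view, the first layer returns the rows anyway, while the second raises `PermissionError` because the source still refuses that user. A grant made in the virtual layer can therefore narrow access in the pass-through model but never widen it.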
Data governance goes beyond security, allowing organizations to understand what data assets are made up of, their meaning, and how they are being used. In many organizations, strong data governance is needed for regulatory requirements.
Dremio provides only physical metadata and data lineage for governance. Spotlight contains several features to maintain governance, including:
Dremio works like a piece of data infrastructure and, as such, requires a great deal of administration to scale and maintain performance. To gain performance via caching, data reflections and associated refresh jobs need to be defined and managed. Its “elastic engines” are also a bit of a misnomer: the engines require a great deal of setup and maintenance, use a predefined compute resource, and do not auto-scale to the needs of queries or jobs.
Spotlight is a SaaS-managed service that requires no operational administration, particularly for performance and scale. Under the covers are managed Spark clusters that are elastic and can auto-scale to your environment’s needs to maintain high performance and fast response time.
Spotlight lets you simplify and scale out your data management for any form of analytics, not just cloud data lakes. It virtually connects directly to over 200 different data sources, offering much broader access to data. Spotlight also provides visual, code-free data modeling and a much richer data catalog and semantic layer, facilitating faster discovery, knowledge-sharing and collaboration, and better data governance. And the auto-scaling elastic service dramatically reduces the administrative overhead that Dremio places on your team.
|Datameer Spotlight||Dremio Enterprise|
|With connectors to over 200 different sources, Spotlight lets teams work with any data for any type of analytics.||Dremio only has connectors to data lakes and databases, focusing only on data lake analytics.|
|Spotlight provides an entirely code-free, visual data modeling environment.||Dremio forces you to perform modeling in SQL, with a few point-and-click elements.|
|Spotlight has a rich catalog and allows users to easily search across names, descriptions, tags, custom properties, and any item in the catalog.||Dremio has a simple catalog view allowing users to browse content for datasets with a simple search based on names and tags.|
|Spotlight contains a rich data catalog and semantic layer with physical metadata, tagging, descriptions, comments, custom properties, business-level metadata, and usage information.||Dremio has an easy-to-use but limited semantic layer with physical metadata, a Wiki-like description, and ad-hoc tags.|
|Spotlight allows users to work together in shared workspaces to collaborate, add knowledge (tags, properties, etc.), and create additional shared assets. It also supports social media-like features around assets.||With Dremio, users can share, reuse, and chain virtual datasets. Spaces can be used as an area where users can collect related datasets and perform rudimentary collaboration on a project.|
|Spotlight provides complete, end-to-end enterprise security and pushes down data access controls to maintain data source security integrity.||Dremio has user- and role-based security that requires you to replicate security controls and can create potential data security holes.|
|To maintain good governance, Spotlight maintains multiple forms of metadata about assets, full lineage, and usage auditing. It maintains multiple system-level and user-set properties such as status, which can define an asset's state.||Dremio has limited data governance features (physical metadata and lineage).|
|Spotlight is a SaaS-managed service that requires no operational administration, particularly for performance and scale. Under the covers are managed Spark clusters that are elastic and can auto-scale to your environment's needs.||Dremio requires a great deal of administration to define and maintain data reflections, associated refresh jobs, and "elastic engines."|