Apache™ Hadoop™
Datameer is natively built on Hadoop to provide big data analytics, integration and visualization.
-
What's a Hadoop?
Apache™ Hadoop™ is an open source implementation of the data storage and processing infrastructure pioneered at Google. It uses a programming model called MapReduce, based on the map and reduce operations that were already in use in the 1970s in functional programming languages like LISP, and applies it to data sets that cannot all be loaded into memory.
MapReduce is easy to parallelize, scales close to linearly as machines are added, and is highly optimized for analytical workloads.
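As a rough illustration of the model, here is a single-process Python sketch of a word count. It is not Hadoop's actual API: the function names `map_phase`, `shuffle`, and `reduce_phase` are made up for this example, and a real cluster would run the map and reduce steps on many machines.

```python
from collections import defaultdict

def map_phase(line):
    # Emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, as a framework would between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data on Hadoop", "Hadoop stores big data"])
```

Each call to `map_phase` looks at one line only, and each call to `reduce_phase` at one key only, which is what makes both steps straightforward to spread across machines.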
Hadoop became very popular after companies like Yahoo and Facebook used it to analyze user interaction data.
-
Why Hadoop?
Scalable, Economical, Compute, Storage
Hadoop brings a new way to store and analyze data. Because it scales linearly on low-cost commodity hardware, it removes storage and compute limits from the data analytics equation. Instead of pre-optimizing data as in the traditional ETL, data warehouse, and BI architecture, Hadoop stores all of the raw data and applies whatever transformations and analytics are needed on demand. Think of a traditional, static-schema database as a cache that, thanks to Hadoop, we no longer need.
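The idea of keeping raw data and transforming it only when a question is asked can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions: the sample events, field names, and function names are hypothetical, and no Hadoop component is involved.

```python
import json

# Raw events kept exactly as they arrived (hypothetical sample data).
RAW_EVENTS = [
    '{"user": "ann", "action": "click", "ms": 120}',
    '{"user": "bob", "action": "view", "ms": 300}',
    '{"user": "ann", "action": "view", "ms": 80}',
]

def read_events(raw_lines):
    # Parse at query time; nothing was transformed when the data was stored.
    for line in raw_lines:
        yield json.loads(line)

def total_ms_by_user(raw_lines):
    # First question: total time per user.
    totals = {}
    for event in read_events(raw_lines):
        totals[event["user"]] = totals.get(event["user"], 0) + event["ms"]
    return totals

def actions_seen(raw_lines):
    # A second, unplanned question answered from the same raw data.
    return sorted({event["action"] for event in read_events(raw_lines)})

totals = total_ms_by_user(RAW_EVENTS)
actions = actions_seen(RAW_EVENTS)
```

Neither question had to be anticipated when the events were stored, which is the contrast with a schema designed and loaded ahead of time.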
-
Hadoop: optimized for analytics
New Architecture, New Use Cases
Apache Hadoop is optimized for analytical workloads. The MapReduce programming model is designed for analytics, and the Hadoop file system is optimized for sequential data access. Traditional relational databases (RDBMS), on the other hand, are purpose-built and optimized for record storage and retrieval with random read and write access. As a result, Hadoop can be orders of magnitude faster for analytical workloads, such as joins and aggregations, that need to scan through all of the data.
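To make the scan-based approach concrete, here is a small Python sketch of a join plus aggregation done in the MapReduce style, often called a reduce-side join. Both inputs are read once from start to finish, tagged with their source, and grouped by key. The data sets and function names are hypothetical, and the code runs in one process rather than on a cluster.

```python
from collections import defaultdict

# Hypothetical sample inputs: two data sets sharing a user id.
USERS = [(1, "ann"), (2, "bob")]
ORDERS = [(1, 25.0), (2, 10.0), (1, 5.0)]

def map_tagged(users, orders):
    # Scan both inputs once, tagging each record with its source.
    for user_id, name in users:
        yield user_id, ("user", name)
    for user_id, amount in orders:
        yield user_id, ("order", amount)

def join_and_sum(users, orders):
    # Group tagged records by key, then join and aggregate per key.
    groups = defaultdict(list)
    for key, tagged in map_tagged(users, orders):
        groups[key].append(tagged)
    result = {}
    for key, records in groups.items():
        names = [value for tag, value in records if tag == "user"]
        amounts = [value for tag, value in records if tag == "order"]
        for name in names:
            result[name] = sum(amounts)
    return result

spend = join_and_sum(USERS, ORDERS)
```

No index lookups are needed: every record is touched exactly once, in the order it is stored.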
-
Comparing random vs. sequential
Small and Big Data
Hadoop's low-level optimization for analytic workloads makes it a powerful platform on individual computers as well as on clusters of machines. Optimized sequential data access is faster not only on conventional hard drives but also on newer solid-state drives, and it can even outperform random access to data in memory. Using Hadoop for small data sets on your desktop therefore makes sense: your data may grow, or you may develop analytics that will one day run against a larger data set on a cluster.
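A quick way to get a feel for the two access patterns is to read the same file once in order and once in shuffled order. The Python sketch below does that with a temporary file. It is only an illustration: the block size and file size are arbitrary assumptions, the operating system's file cache can hide most of the difference on a file this small, and it says nothing about Hadoop's own performance.

```python
import os
import random
import tempfile
import time

BLOCK = 4096   # bytes per read (assumed block size)
BLOCKS = 2048  # 8 MiB test file, small enough for a quick run

def make_file():
    # Create a temporary file filled with random bytes.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(BLOCK * BLOCKS))
    return path

def read_blocks(path, order):
    # Read every block once in the given order; return bytes read and seconds taken.
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        for index in order:
            f.seek(index * BLOCK)
            total += len(f.read(BLOCK))
    return total, time.perf_counter() - start

path = make_file()
sequential = list(range(BLOCKS))
shuffled = random.sample(sequential, len(sequential))
seq_bytes, seq_seconds = read_blocks(path, sequential)
rnd_bytes, rnd_seconds = read_blocks(path, shuffled)
os.remove(path)
print(f"sequential: {seq_seconds:.4f}s  random: {rnd_seconds:.4f}s")
```

Both passes read exactly the same bytes; only the order differs, so any timing gap comes from the access pattern.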
-
Choose the best Hadoop for you
Hadoop Distributions
Hadoop has become very popular, and several vendors now offer distributions, each with its own set of optimizations and features. Datameer is committed to supporting all of the Hadoop distributions and allows easy migration from one to another. Datameer isolates the end user from the lower-level technical details and provides a simple yet powerful web-based application on top that abstracts all interactions with Hadoop.
- Apache
- Amazon
- Cloudera
- EMC
- Hortonworks
- IBM
- MapR
- Microsoft
logo are trademarks of the Apache Software Foundation. Other names may be trademarks of their respective owners.
