Apache™ Hadoop™

Datameer is natively built on Hadoop to provide big data analytics, integration and visualization.

  • Hadoop
  • What's a Hadoop?

    Apache™ Hadoop™ is the open source implementation of the Google data storage and processing infrastructure. It uses a programming model called Map and Reduce that was already in use in the 70s in functional programing languages like LISP where all data could not be loaded into memory.

    MapReduce is easily parallelizable, scales linearly and is highly optimized for analytical workloads.

    Hadoop became very popular after companies like Yahoo and Facebook used it to analyze user interaction data.

  • Why Hadoop?

    Why Hadoop

    Scalable, Economical, Compute, Storage

    Hadoop brings a new way to store and analyze data. Since it is linear scalable on low cost commodity hardware, it removes the limitation of storage and compute from the data analytics equation. Instead of pre-optimizing data in the traditional ETL, data warehouse, and BI architecture, Hadoop stores all of the raw data and applies all transformation and analytics that it might be done on demand. Think of a traditional, static schema database as cache, that thanks to Hadoop, we don't need anymore.

  • Why Hadoop Why Hadoop
  • Hadoop: optimized for analytics

    New Architecture, New Use Cases

    Apache Hadoop is optimized for analytical workloads. The MapReduce programming model is designed for analytics and the Hadoop file system is optimized for sequential data access. On the other hand, traditional RDMS databases are purpose built and optimized for record storage and retrieval with random read and write access. Thus, Hadoop is magnitudes faster for analytics workloads that need to scan through all the data like joins and aggregations.

    Download the Why Hadoop PDF

  • Comparing random vs. sequential

    Small and Big Data

    Hadoop's low-level optimization for analytic workloads makes it a powerful platform on individual computers as well as clusters of machines. Optimized sequential data access is not only faster on normal hard drives but also on new Solid State Drives and even outperforms in-memory random data access. Therefore, using Hadoop for small datasets on your desktop makes sense since your data may grow or you may develop data analytics that one day will run against a larger data set on a cluster.

  • Random vs. Sequential
  • Choose the best Hadoop for you

    Hadoop Distributions

    Hadoop recently became very popular with several different vendors offering distributions with a set of optimizations and features. Datameer is committed to supporting all of the Hadoop distributions and allows easy migration from one to other. Datameer isolates the end user from the lower level technical details and provides an simple though powerful web bases application on top that abstracts all interactions with Hadoop.

  •  

    • Apache
    • Amazon
    • Cloudera
    • EMC
    • Hortonworks
    • IBM
    • MapR
    • Microsoft
  • +1.800.874.0569