What is Big Data?

Big Data is the combination of data volume, velocity, and variety.

  • Big Data analytics provides opportunities to discover deeper, more complete business insights through the analysis and visualization of significant volumes of rapidly changing structured and unstructured data.  Traditional business intelligence systems struggle to analyze big data because they were designed for the analysis of relatively small amounts of structured data and simply cannot easily analyze unstructured data, large data sets or data sources that change quickly.

    Big Data analytics originated with web search companies who had the need to index and analyze large volumes of web pages.  Their work eventually led to the creation of Hadoop.

  • big data processing


  • Hadoop Analytics
  • Apache™ Hadoop™

    Datameer is natively built on Hadoop to provide big data analytics, integration and visualization.

    Apache™ Hadoop™ is the open source implementation of the Google data storage and processing infrastructure. It uses a programming model called Map and Reduce that was already in use in the 70s in functional programing languages like LISP where all data could not be loaded into memory.

    MapReduce is easily parallelizable, scales linearly and is highly optimized for analytical workloads.

    Hadoop became very popular after companies like Yahoo and Facebook used it to analyze user interaction data.

  • Why Hadoop?

    Scalable, Economical, Compute, Storage

    Hadoop brings a new way to store and analyze data. Since it is linear scalable on low cost commodity hardware, it removes the limitation of storage and compute from the data analytics equation. Instead of pre-optimizing data in the traditional ETL, data warehouse, and BI architecture, Hadoop stores all of the raw data and applies all transformation and analytics that it might be done on demand. Think of a traditional, static schema database as cache, that thanks to Hadoop, we don't need anymore.

  •  

    processing on big data
  • Analytics on Hadoop What is Big Data Processing
  • Hadoop: optimized for analytics

    New Architecture, New Use Cases

    Apache Hadoop is optimized for analytical workloads. The MapReduce programming model is designed for analytics and the Hadoop file system is optimized for sequential data access. On the other hand, traditional RDMS databases are purpose built and optimized for record storage and retrieval with random read and write access. Thus, Hadoop is magnitudes faster for analytics workloads that need to scan through all the data like joins and aggregations.

    Download the Why Hadoop PDF

  • Comparing random vs. sequential

    Small and Big Data

    Hadoop's low-level optimization for analytic workloads makes it a powerful platform on individual computers as well as clusters of machines. Optimized sequential data access is not only faster on normal hard drives but also on new Solid State Drives and even outperforms in-memory random data access. Therefore, using Hadoop for small datasets on your desktop makes sense since your data may grow or you may develop data analytics that one day will run against a larger data set on a cluster.

  • Hadoop Analytics; Random vs. Sequential
  • Choose the best Hadoop for you

    Hadoop Distributions

    Hadoop recently became very popular with several different vendors offering distributions with a set of optimizations and features. Datameer is committed to supporting all of the Hadoop distributions and allows easy migration from one to other. Datameer isolates the end user from the lower level technical details and provides an simple though powerful web based application on top that abstracts all interactions with Hadoop.

  •  

     

    • Apache
    • Amazon
    • Cloudera
    • EMC
    • Hortonworks
    • IBM
    • MapR
    • Microsoft
  • +1.800.874.0569