What is Big Data?
Big Data is commonly characterized by the combination of data volume, velocity, and variety.
Big Data analytics provides opportunities to discover deeper, more complete business insights through the analysis and visualization of large volumes of rapidly changing structured and unstructured data. Traditional business intelligence systems struggle with big data because they were designed to analyze relatively small amounts of structured data; they cannot easily handle unstructured data, large data sets, or data sources that change quickly.
Big Data analytics originated with web search companies that needed to index and analyze large volumes of web pages. Their work eventually led to the creation of Hadoop.
Datameer is natively built on Hadoop to provide big data analytics, integration and visualization.
Apache™ Hadoop™ is an open source implementation of Google's data storage and processing infrastructure. It uses a programming model called MapReduce, whose map and reduce primitives were already in use in the 1970s in functional programming languages such as LISP, including in settings where all the data could not be loaded into memory at once.
MapReduce is easily parallelizable, scales linearly and is highly optimized for analytical workloads.
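The MapReduce model can be illustrated with the classic word-count example. The sketch below runs the map, shuffle, and reduce phases in a single process for clarity; on a Hadoop cluster these same phases are distributed across many machines, with each map and reduce task running independently (which is what makes the model easy to parallelize).

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data", "big analytics"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
# counts == {"big": 2, "data": 1, "analytics": 1}
```

Because each key's values are reduced independently, the reduce work for different words can run on different nodes, which is why the model scales linearly with the size of the cluster.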
Hadoop became very popular after companies like Yahoo and Facebook used it to analyze user interaction data.
Scalable, Economical, Compute, Storage
Hadoop brings a new way to store and analyze data. Because it scales linearly on low-cost commodity hardware, it removes storage and compute limitations from the data analytics equation. Instead of pre-optimizing data in the traditional ETL, data warehouse, and BI architecture, Hadoop stores all of the raw data and applies transformations and analytics on demand. Think of a traditional, static-schema database as a cache that, thanks to Hadoop, is no longer needed.
Hadoop: optimized for analytics
New Architecture, New Use Cases
Apache Hadoop is optimized for analytical workloads. The MapReduce programming model is designed for analytics, and the Hadoop file system is optimized for sequential data access. Traditional RDBMS databases, in contrast, are purpose-built and optimized for record storage and retrieval with random read and write access. As a result, Hadoop is orders of magnitude faster for analytics workloads, such as joins and aggregations, that need to scan through all the data.
Comparing random vs. sequential
Small and Big Data
Hadoop's low-level optimization for analytic workloads makes it a powerful platform on individual computers as well as on clusters of machines. Optimized sequential data access is faster not only on conventional hard drives but also on newer solid state drives, and can even outperform in-memory random data access. Using Hadoop for small datasets on your desktop therefore makes sense: your data may grow, or you may develop analytics that will one day run against a larger data set on a cluster.
Choose the best Hadoop for you
Hadoop has recently become very popular, with several vendors offering distributions that bundle their own optimizations and features. Datameer is committed to supporting all of the Hadoop distributions and allows easy migration from one to another. Datameer isolates the end user from the lower-level technical details and provides a simple yet powerful web-based application on top that abstracts all interactions with Hadoop.