Hadoop provides scalable data storage using the Hadoop Distributed File System (HDFS) and fast parallel data processing on a fault-tolerant cluster of computers.
If you are a system administrator responsible for setting up or configuring a Hadoop cluster for use with Datameer, this topic lists links you may find useful.
If you are setting up a new Hadoop system for use with Datameer, see System Requirements for the hardware and software details.
Getting started with Hadoop
To learn about Hadoop, you can go directly to the source at: http://hadoop.apache.org/
If you need to learn more about the HDFS architecture, see: http://hadoop.apache.org/common/docs/current/hdfs_design.html
Here are some additional links where you can learn more about Hadoop:
- Hadoop wiki (a useful general starting point): http://wiki.apache.org/hadoop/
- Cluster setup: http://hadoop.apache.org/common/docs/r0.20.2/cluster_setup.html
- Yahoo Hadoop tutorial: http://developer.yahoo.com/hadoop/tutorial/index.html
- Tuning Hadoop for performance: http://www.slideshare.net/ydn/hadoop-summit-2010-tuning-hadoop-to-deliver-performance-to-your-application
- MapReduce tutorial: http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html
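
For orientation before following the MapReduce tutorial, the programming model behind Hadoop's parallel processing can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle, and reduce steps using a word count. It does not use any Hadoop API, it runs in a single process with no distribution or fault tolerance, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "hadoop processes data"]))
```

On a real cluster, the map and reduce steps would run as many parallel tasks on different nodes, and the framework would handle the shuffle between them; the sketch above only mirrors the data flow.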