Q. How can I monitor the performance of Datameer?
A. See: Monitoring Hadoop and Datameer using Nagios
Q. Where are my Amazon EC2 and SCP credentials stored?
A. Security credentials specified through the Datameer UI are stored (encrypted) in the Datameer metadata store (normally a MySQL database), which holds all information related to data stores, workbooks, etc.
Q. How can I open a workbook that seems to be corrupted?
A. When a formula of a workbook gets corrupted (e.g. through manual formula editing at the database level or a failed database migration), the workbook becomes inaccessible and you will see the message "Oops, an error occurred". In this situation, try the following setting in the file conf/default.properties:
and restart the Datameer application.
After this, try to reopen your workbook.
Q. Where can I go to learn more about Hadoop?
Q. How can I optimize my Hadoop installation for use with Datameer?
Q. How can I choose the job queue/pool to which Datameer submits jobs?
- First determine the appropriate Java system property that selects job queues on your Hadoop cluster. This depends on your chosen Hadoop scheduler. If you are using the Fair Scheduler, the property name is defined by the Hadoop setting mapred.fairscheduler.poolnameproperty, configured in conf/mapred-site.xml of your Hadoop installation.
- Set this property in the Datameer UI under Administration -> Hadoop Cluster -> Custom Properties, with the name of the pool that Datameer should use as its value (see the sketch below).
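For example, assuming the cluster's mapred-site.xml sets mapred.fairscheduler.poolnameproperty to pool.name (a common Fair Scheduler convention) and that a pool called datameer exists on the cluster (both are assumptions for illustration), the custom property in Datameer would be:

    # Assumed cluster-side configuration in conf/mapred-site.xml:
    #   mapred.fairscheduler.poolnameproperty = pool.name
    # Custom property to add under Administration -> Hadoop Cluster:
    pool.name=datameer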
Q. How do I configure Datameer/Hadoop to use native compression?
A: When working with large data volumes, native compression can drastically improve the performance of a Hadoop cluster. There are multiple compression algorithms to choose from, each with its own benefits: e.g. GZIP is better in terms of disk space, LZO in terms of speed.
- First determine the best compression algorithm for your environment (see the "compression" topic under Hadoop Cluster Configuration Tips).
- Install the native compression libraries (platform-dependent) on BOTH:
- The Hadoop cluster (normally <HADOOP_HOME>/lib/native/Linux-[i386-32 | amd64-64])
- The Datameer machine (<Datameer install dir>/lib/native/Linux-[i386-32 | amd64-64])
- Configure the codec use as custom properties in the Hadoop Cluster config section in Datameer. For example, GZIP would be configured as follows:
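A typical set of properties, using the classic (pre-YARN) Hadoop property names and Hadoop's built-in GzipCodec:

    # Compress intermediate map output with GZIP.
    mapred.compress.map.output=true
    mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
    # Compress the final job output with GZIP as well.
    mapred.output.compress=true
    mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec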
Q. How do I configure Datameer/Hadoop to use LZO native compression?
A: LZO provides a great CPU/compression ratio and is the algorithm of choice for applications with high data throughput. However, it requires an additional download and some configuration steps, as described below.
- Follow steps 1 and 2 described in How do I configure Datameer/Hadoop to use native compression?
- Copy the LZO Java library (see Using LZO compression) into <Datameer_install_folder>/etc/custom-jars. This library provides access to the native libraries, and is used both by Datameer and by your Hadoop cluster at various times. Datameer includes this library in the Hadoop job JAR, but it is the administrator's responsibility to ensure that all native libraries exist on the Hadoop cluster; otherwise, jobs submitted by Datameer will fail.
- Configure the codec availability as custom properties in the Hadoop Cluster config section in Datameer:
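A sketch of those properties, assuming the hadoop-lzo codec classes:

    # Register the LZO codecs alongside Hadoop's defaults.
    io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec
    io.compression.codec.lzo.class=com.hadoop.compression.lzo.LzoCodec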
Q. How do I configure Datameer/Hadoop to use Snappy native compression?
A: The Snappy compression codec provides high-speed compression with a reasonable compression ratio. See the original documentation at http://code.google.com/p/snappy/ for more details.
- CDH3u1 and newer versions already contain the Snappy compression codec; https://ccp.cloudera.com/display/CDHDOC/Snappy+Installation contains the configuration instructions. In addition, Snappy will be integrated into Apache Hadoop versions 1.0.2 and 0.23 (https://issues.apache.org/jira/browse/HADOOP-7206).
When using Cloudera's distribution of Hadoop, the codec must be enabled inside the Datameer application, either in the Hadoop Cluster settings or on a per-job basis. Add the following settings for this purpose:
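A sketch of those settings, assuming Snappy is used for intermediate map output (the codec list may need to be merged with your cluster's existing io.compression.codecs value):

    # Make the Snappy codec available and use it for map output.
    io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec
    mapred.compress.map.output=true
    mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec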
Q. How can I use a custom Hive SerDe?
A: The classes for your Hive SerDe must be in the classpath of the Hive plug-in used to connect Datameer to Hive. Datameer provides Hive plugins for each major version of Hive (e.g. 0.5 and 0.7).
To add your custom SerDe to Datameer (a shell sketch follows this list):
- Determine the version of Hive you're using (e.g. 0.7).
- Unzip <Datameer Install folder>/plugins/plugin-hive-<Hive Version>-<Datameer Version>.zip
- Add the JAR file of your SerDe to the plugin's lib/compile directory and rezip.
- Remove the corresponding .md5 file, if it exists (e.g. plugin-hive-0_7-1.3.7.zip.md5).
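A minimal shell sketch of these steps for Hive 0.7 and Datameer 1.3.7; the SerDe JAR name my-serde.jar and its path are placeholders:

    cd <Datameer Install folder>/plugins
    # Unpack the Hive plugin into a working directory.
    unzip plugin-hive-0_7-1.3.7.zip -d plugin-hive-0_7
    # Add the custom SerDe JAR to the plugin's lib/compile directory.
    cp /path/to/my-serde.jar plugin-hive-0_7/lib/compile/
    # Repackage the plugin and remove the stale checksum file.
    cd plugin-hive-0_7 && zip -r ../plugin-hive-0_7-1.3.7.zip . && cd ..
    rm -f plugin-hive-0_7-1.3.7.zip.md5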
Q. Why does the plugin registry fail to resolve the dependency plugin-das-extension-points?
A: If you observe a message similar to
WARN [2011-07-13 17:58:11] (PluginRegistryImpl.java:374) - Missing dependency plugin-das-extension-points for plugin <XYZ>
please note that the plugin extension point needs to be changed in all Datameer plugins when moving to version 1.3 (or higher).
As a temporary workaround, copy the old plugin file <Datameer old version>/plugins/plugin-das-extension-points-1.2.x.zip to the new installation's plugins folder and restart the Datameer application. This allows the custom extensions to be loaded.
However, you should maintain your plugin code going forward. For versions 1.3 onwards, change your plugin.xml as follows (this removes the requirement for plugin-das-extension-points):
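A sketch of the intended change; the descriptor structure and element names below are illustrative assumptions rather than the confirmed Datameer plugin schema. The point is to drop the dependency declaration on plugin-das-extension-points:

    <!-- plugin.xml (structure assumed for illustration) -->
    <plugin id="my-custom-plugin" version="1.3">
      <requires>
        <!-- Pre-1.3 descriptors declared this dependency; remove it:
        <import plugin-id="plugin-das-extension-points"/>
        -->
      </requires>
    </plugin>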