Migrating from One Hadoop Cluster to Another

When migrating from one Hadoop cluster to another (e.g., from a QA cluster to a production cluster), you must update all the paths stored in the Datameer database (DB) so that they point to the new cluster.

Follow these steps to move your data:

  1. Stop Datameer.
  2. Move all your data from the old cluster to the new cluster. The best way to do this is with the Hadoop distcp tool:

    hadoop distcp hdfs://old.cluster:9000/old-root-path hdfs://new.cluster:9000/new-root-path
  3. Make a backup of the Datameer DB.

    mysqldump [-h <dbhost>] -u dap dap -p > das-backup.dmp
  4. Check that the backup was created successfully:

    head -n 50 das-backup.dmp
  5. Update the paths to the new location in the Datameer DB:

    bin/update_paths.sh hdfs://old.cluster:9000/old-root-path hdfs://new.cluster:9000/new-root-path
    

    If the Datameer DB connection settings differ from those in conf/default.properties, you can pass the corresponding parameters to the update tool:

    bin/update_paths.sh -h <dbhost> -o <dbport> -n <dbname> -u <dbuser> -p <dbpassword> hdfs://old.cluster:9000/old-root-path hdfs://new.cluster:9000/new-root-path
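
Before running step 5, it can help to confirm that the copy and the backup both look complete. The following is a minimal sketch, not part of Datameer: the function names `same_counts` and `backup_looks_complete` are hypothetical, and it assumes the dump was written with default mysqldump options, which end the file with a `-- Dump completed` comment line. Matching counts are only a sanity check, not proof that every file was copied correctly.

```shell
#!/usr/bin/env bash
# Sketch: sanity checks to run before the Datameer DB paths are rewritten.
# Both function names are hypothetical helpers, not Datameer or Hadoop tools.

# Succeeds when two lines of `hadoop fs -count` output report the same
# directory count, file count, and content size (columns 1 to 3).
same_counts() {
    local old_line="$1" new_line="$2"
    [ "$(echo "$old_line" | awk '{print $1, $2, $3}')" = \
      "$(echo "$new_line" | awk '{print $1, $2, $3}')" ]
}

# Succeeds when the dump file is non-empty and ends with the completion
# comment (assumption: default mysqldump options, comments not disabled).
backup_looks_complete() {
    local dump_file="$1"
    [ -s "$dump_file" ] || return 1
    tail -n 1 "$dump_file" | grep -q '^-- Dump completed'
}

# Usage, with the example paths and file name from the steps above:
#   same_counts "$(hadoop fs -count hdfs://old.cluster:9000/old-root-path)" \
#               "$(hadoop fs -count hdfs://new.cluster:9000/new-root-path)" \
#     && backup_looks_complete das-backup.dmp \
#     && echo "checks passed"
```

If either check fails, it is safer to repeat the copy or the backup than to continue, because the path update modifies the Datameer DB in place.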

After completing the steps above, update the cluster settings in Datameer:

  1. Restart Datameer.
  2. Open the Datameer cluster settings by clicking the Admin tab at the top of the page, then clicking Hadoop Cluster in the column on the left.
  3. Click Edit, then update the settings as needed.
  4. Click Save.

If the user running this utility does not have permission to write to /tmp, set the TMPJ environment variable to designate another path.
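
As a rough illustration of the note above, the sketch below prepares an alternative directory and exports TMPJ to point at it. It assumes that "this utility" refers to bin/update_paths.sh; the helper name `use_alt_tmp` and the example directory are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: point TMPJ at a directory the current user can write to.
# use_alt_tmp is a hypothetical helper, not part of Datameer.
use_alt_tmp() {
    local dir="$1"
    # Create the directory if it does not exist yet.
    mkdir -p "$dir" || return 1
    # Refuse to continue if the directory is still not writable.
    [ -w "$dir" ] || { echo "cannot write to $dir" >&2; return 1; }
    export TMPJ="$dir"
}

# Usage (the directory is an example only):
#   use_alt_tmp "$HOME/datameer-tmp" && \
#     bin/update_paths.sh hdfs://old.cluster:9000/old-root-path hdfs://new.cluster:9000/new-root-path
```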