Additionally, we have added a property that turns off sampling:
This property disables Smart Sampling, so the job works from a random sample instead.
You can also allocate more memory to this job with the following property:
This property lets the workbook use more memory than other jobs on the machine. The value you enter is in bytes.
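Because the memory value is entered in bytes, large numbers are easy to mistype. The sketch below is a hypothetical helper (not part of Datameer) that converts a human-readable size into a byte count. It assumes binary units (1 GiB = 1024³ bytes); if your setup expects decimal units, swap 1024 for 1000.

```python
# Hypothetical helper: convert a human-readable memory size into the byte
# count expected by the memory property. Assumes binary units (1 GiB = 1024**3 bytes).
UNITS = {"B": 1, "KIB": 1024, "MIB": 1024**2, "GIB": 1024**3}


def to_bytes(amount: float, unit: str) -> int:
    """Return the number of bytes in `amount` of `unit`, e.g. to_bytes(2, "GiB")."""
    factor = UNITS[unit.upper()]  # raises KeyError for unknown units
    return int(amount * factor)


if __name__ == "__main__":
    print(to_bytes(2, "GiB"))    # 2147483648
    print(to_bytes(512, "MiB"))  # 536870912
```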
Select the compression
In the same Custom Hadoop Properties field, you can also set the compression type for the workbook, choosing the codec that best optimizes it.
What is the best compression, you ask? Well, that depends on your workbook! For example, if your workbook is CPU-intensive, you may want to choose Snappy compression, because it favors speed over maximum compression ratio.
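To get a feel for the speed-versus-size trade-off, the sketch below compresses the same data at a fast and a thorough zlib level and reports size and time for each. zlib is only a stand-in here: Python's standard library does not ship Snappy, so this illustrates the general trade-off rather than benchmarking the Hadoop codecs themselves. The sample data is made up for illustration.

```python
import time
import zlib


def measure(data: bytes, level: int) -> tuple[int, float]:
    """Compress `data` at the given zlib level; return (compressed size, seconds)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Sanity check: compression must be lossless.
    assert zlib.decompress(compressed) == data
    return len(compressed), elapsed


if __name__ == "__main__":
    # Illustrative, highly repetitive CSV-like data.
    sample = b"user_id,event,timestamp\n" + b"42,click,2020-01-01T00:00:00\n" * 50_000
    for level in (1, 9):  # 1 = fastest, 9 = best compression
        size, secs = measure(sample, level)
        print(f"level {level}: {size} bytes in {secs:.4f}s")
```

Actual numbers vary by machine and data; the point is that a faster setting usually costs some compression ratio, which is the same reasoning behind choosing Snappy for CPU-heavy workbooks.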
Once you have chosen your compression type, add the corresponding properties to the field.
Here we have entered the default codec, just to give you an idea of what this looks like:
mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec (defines the compression codec for the intermediate map output)
mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec (defines the compression codec for the final output of a MapReduce job)
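If you switch codecs often, a small script can produce both property lines consistently. The sketch below is a hypothetical helper, not a Datameer tool. The class names are standard Hadoop codec classes, but whether each one works depends on your cluster (SnappyCodec, for example, needs the native Snappy library), so check the supported list before pasting the output into the Custom Hadoop Properties field.

```python
# Hypothetical helper: render the two codec properties for a chosen codec.
# Availability of each codec on your cluster is an assumption to verify.
CODECS = {
    "default": "org.apache.hadoop.io.compress.DefaultCodec",
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
    "bzip2": "org.apache.hadoop.io.compress.BZip2Codec",
    "snappy": "org.apache.hadoop.io.compress.SnappyCodec",
}


def codec_properties(name: str) -> str:
    """Return the map-output and job-output codec properties, one per line."""
    codec = CODECS[name.lower()]  # raises KeyError for unknown codec names
    return "\n".join([
        f"mapred.map.output.compression.codec={codec}",
        f"mapred.output.compression.codec={codec}",
    ])


if __name__ == "__main__":
    print(codec_properties("snappy"))
```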
You can find the different compression configurations in our Frequently Asked Questions.
You have now finished optimizing your job.