When importing data into Datameer, it is important to know which type of data you are importing or need to import. Within workbooks, certain Datameer functions only work with certain types of data, and certain functions return only a specific type of data.

Infographic widgets also have data type requirements when you use them to visualize your data.

## Data Field Types

Field type | Description | Internal representation |
---|---|---|
Integer | 64-bit integer value | Java Long |
Big integer | Unlimited integer value | Java BigInteger |
Float | 64-bit float value | Java Double |
Big decimal | High-precision float value | Java BigDecimal |
Date | Date object | Java Date |
String | String object | Java String |
Boolean | Boolean object | Java Boolean |
List | A collection of multiple values of one data type | |
Number | Float, big decimal, integer, or big integer | |
Any | Float, big decimal, integer, big integer, date, string, list, or Boolean | |

### Integer

In mathematics, integers (whole numbers) are the natural numbers including zero (0, 1, 2, 3, ...) together with their negatives (-1, -2, -3, ...). In computer programming, an integer type must also define a minimum and a maximum value. Datameer uses a 64-bit integer, which can represent whole numbers from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
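The sketch below illustrates these bounds with Java's `long`, the internal representation listed for integers. The class name `IntegerRange` and the helper `addOrNull` are hypothetical; they only show what happens at the edge of the 64-bit range in plain Java and are not a description of how Datameer itself reports overflow.

```java
// Hypothetical illustration of the 64-bit integer range using Java's long.
final class IntegerRange {
    // Smallest and largest values a 64-bit signed integer can hold.
    static final long MIN = Long.MIN_VALUE; // -9,223,372,036,854,775,808
    static final long MAX = Long.MAX_VALUE; //  9,223,372,036,854,775,807

    // Adds two longs; returns null instead of silently wrapping around on overflow.
    static Long addOrNull(long a, long b) {
        try {
            return Math.addExact(a, b);
        } catch (ArithmeticException overflow) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(MIN + " .. " + MAX);
        System.out.println(addOrNull(MAX, 1)); // null: the result does not fit in 64 bits
        System.out.println(MAX + 1);           // plain long addition wraps around to MIN
    }
}
```

Values outside this range need the big integer type described next.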

### Big integer

Big integers are like integers, but they are not limited to 64 bits. They are represented using arbitrary-precision arithmetic and hold only whole numbers. Because Datameer allows a larger range of big integer values than Hive does, big integers are written to a Hive table as strings when you export. By default, the precision for big integers is set to 32 upon import. If needed, you can change this in the default properties by updating the value of `das.big-decimal.precision`.
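As a rough illustration, the following sketch uses `java.math.BigInteger`, the internal representation listed for big integers, to build a value one larger than the 64-bit maximum. The class `BigIntegerDemo` and the method `asHiveString` are hypothetical; the assumption that an exported STRING column carries the plain decimal text of the value is ours and is not confirmed by this page.

```java
import java.math.BigInteger;

// Hypothetical illustration of a whole number that no longer fits in 64 bits.
final class BigIntegerDemo {
    // One more than Long.MAX_VALUE: representable as BigInteger, not as long.
    static BigInteger beyondLong() {
        return BigInteger.valueOf(Long.MAX_VALUE).add(BigInteger.ONE);
    }

    // Assumed text form for a Hive STRING column: the plain decimal digits.
    static String asHiveString(BigInteger value) {
        return value.toString();
    }

    public static void main(String[] args) {
        BigInteger big = beyondLong();
        System.out.println(asHiveString(big)); // 9223372036854775808
        System.out.println(big.bitLength());   // 64: one bit more than a signed long allows
    }
}
```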

### Float

In mathematics, real numbers include fractions (1/2, 12/68) and numbers with decimal places (12.75, -18.35). Datameer uses double-precision floating-point representation (float) to manipulate and represent real numbers. Positive values can range in magnitude from approximately $2^{-1022}$ up to $(1 + (1 - 2^{-52})) \times 2^{1023}$, with the same range for negative values. During import or upload, Datameer automatically recognizes a number containing either a single period (.) or a single comma (,) as a decimal separator and assigns it the float data type. After ingestion, Datameer stores float and big decimal values using a period (.) as the decimal separator. Automatic schema detection for the float data type works with CSV, JSON, XML, and key/value files.
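The sketch below checks the range bounds against Java's `double` constants and models the decimal-separator rule. `FloatDemo` and `parseDecimal` are hypothetical names, and `parseDecimal` is a simplified stand-in that accepts only digits with exactly one `.` or `,`; Datameer's real schema detection handles more cases than this.

```java
// Hypothetical illustration of double-precision bounds and separator handling.
final class FloatDemo {
    // 2^-1022: the smallest positive normal double.
    static final double SMALLEST_NORMAL = Math.scalb(1.0, -1022);
    // (1 + (1 - 2^-52)) x 2^1023: the largest finite double.
    static final double LARGEST = (1.0 + (1.0 - Math.scalb(1.0, -52))) * Math.scalb(1.0, 1023);

    // Accepts an optional minus sign, digits, exactly one '.' or ',' and more digits,
    // then normalizes the separator to '.'. Returns null for anything else.
    static Double parseDecimal(String text) {
        if (!text.matches("-?\\d+[.,]\\d+")) {
            return null;
        }
        return Double.valueOf(text.replace(',', '.'));
    }

    public static void main(String[] args) {
        System.out.println(SMALLEST_NORMAL == Double.MIN_NORMAL); // true
        System.out.println(LARGEST == Double.MAX_VALUE);          // true
        System.out.println(parseDecimal("12,75"));                // 12.75
        System.out.println(parseDecimal("1,234.5"));              // null: two separators
    }
}
```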

### Big decimal

Big decimals are similar to float values. Their main advantage is that they are exact up to the number of digits they are configured for, whereas float values can be inaccurate in certain cases. If a number has more digits than big decimal is configured for, the number is rounded. The precision can be configured in `conf/default.properties`:

```properties
# Maximum precision used for BIG_DECIMAL types. Precision is equal to the maximum number of digits a BigDecimal
# can have.
system.property.das.big-decimal.precision=32
```

32 digits is the default precision Datameer uses for big decimal values upon import.
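To make the precision setting concrete, the following sketch uses `java.math.BigDecimal` with a `MathContext` of 32 digits, mirroring the default value above. The class name `BigDecimalDemo` is hypothetical, and `RoundingMode.HALF_UP` is an assumption: this page does not say which rounding mode Datameer applies.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

// Hypothetical illustration of exact decimals and rounding to 32 digits.
final class BigDecimalDemo {
    // 32 significant digits, like the default das.big-decimal.precision.
    // HALF_UP is an assumed rounding mode, not confirmed for Datameer.
    static final MathContext PRECISION_32 = new MathContext(32, RoundingMode.HALF_UP);

    static BigDecimal roundToConfigured(BigDecimal value) {
        return value.round(PRECISION_32);
    }

    public static void main(String[] args) {
        // Binary doubles cannot represent 0.1 or 0.2 exactly, so the sum is slightly off.
        System.out.println(0.1 + 0.2 == 0.3); // false
        // BigDecimal values built from strings keep their decimal digits exactly.
        BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        System.out.println(sum.compareTo(new BigDecimal("0.3")) == 0); // true

        // A 35-digit value is rounded to 32 significant digits.
        BigDecimal longValue = new BigDecimal("1.2345678901234567890123456789012345");
        System.out.println(roundToConfigured(longValue)); // 1.2345678901234567890123456789012
    }
}
```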

### Date

In Datameer, values of the DATE primitive data type are always represented in a Gregorian, month-day-year (MDY) format (e.g., "Sep 16, 2010 02:56:39 PM"). During ingest, Datameer detects whether your data should be parsed into the DATE data type. This conversion can also happen after ingest, because other data types can be converted to the DATE primitive data type using workbook functions.
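The sketch below formats and parses the example value with `java.time`. The pattern `MMM dd, yyyy hh:mm:ss a` is inferred from the single example above and is an assumption, as is the choice of `java.time.LocalDateTime` (Datameer's internal representation is `java.util.Date`). `DateDemo` is a hypothetical name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical illustration of the MDY display format shown above.
final class DateDemo {
    // Pattern inferred from "Sep 16, 2010 02:56:39 PM"; not Datameer's actual configuration.
    static final DateTimeFormatter MDY =
        DateTimeFormatter.ofPattern("MMM dd, yyyy hh:mm:ss a", Locale.US);

    static String format(LocalDateTime value) {
        return value.format(MDY);
    }

    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, MDY);
    }

    public static void main(String[] args) {
        LocalDateTime value = LocalDateTime.of(2010, 9, 16, 14, 56, 39);
        System.out.println(format(value));                             // Sep 16, 2010 02:56:39 PM
        System.out.println(parse("Sep 16, 2010 02:56:39 PM").equals(value)); // true
    }
}
```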

### String

In Datameer, information other than numbers or dates is represented as a string. This includes text, unparsed date patterns, URLs, JSON arrays, and so on.

### Boolean

Boolean data in computing has two values, either true or false. It is used in many logical expressions and is derived from Boolean algebra created by George Boole in the 19th century.

### List

In Datameer, multiple values can be combined into a list. A list is an ordered series of values of a single data type, indexed starting from zero (0).
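As an analogy only, the sketch below models a list as a Java `List` with a single element type and zero-based indexing. This page does not state Datameer's internal representation for lists, so `ListDemo` and `sample()` are hypothetical and the Java type is an assumption.

```java
import java.util.List;

// Hypothetical analogy: a list of values of one data type, indexed from zero.
final class ListDemo {
    // Every element has the same type (Long, i.e. integer values).
    static List<Long> sample() {
        return List.of(10L, 20L, 30L);
    }

    public static void main(String[] args) {
        List<Long> values = sample();
        System.out.println(values.get(0));                 // 10: the first element is at index 0
        System.out.println(values.get(values.size() - 1)); // 30: the last index is size - 1
    }
}
```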

### Number

In Datameer, integers, big integers, floats, and big decimals are all considered numbers.
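One way to see this grouping: the four internal representations listed for the numeric types (`Long`, `BigInteger`, `Double`, `BigDecimal`) all extend `java.lang.Number`, while `String` and `Boolean` do not. The sketch below checks that; it is an analogy, and it does not claim that Datameer implements its number category this way. `NumberDemo` is a hypothetical name.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.List;

// Hypothetical analogy: which internal representations are Java Numbers.
final class NumberDemo {
    static boolean isNumber(Object value) {
        return value instanceof Number;
    }

    static List<Object> samples() {
        return List.<Object>of(
            42L,                                    // integer     -> Long
            new BigInteger("9223372036854775808"),  // big integer -> BigInteger
            12.75,                                  // float       -> Double
            new BigDecimal("0.1"),                  // big decimal -> BigDecimal
            "text",                                 // string      -> not a Number
            true);                                  // Boolean     -> not a Number
    }

    public static void main(String[] args) {
        for (Object value : samples()) {
            System.out.println(value + " -> " + isNumber(value));
        }
    }
}
```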

### Any

Some visualizations and functions can use data of any data field type, whether a number, a string, a date, or a Boolean.

## Exporting to Hive

When exporting data to Hive, data types are mapped in the following way:

Datameer field type | Description | Internal representation | Hive type (Datameer classic) | Hive type (Hive specific) |
---|---|---|---|---|
Integer | 64-bit integer value | Java Long | BIGINT (64-bit signed integer value) | BIGINT (64-bit signed integer value) |
Big integer | Unlimited integer value | Java BigInteger | STRING | STRING |
Float | 64-bit float value | Java Double | DOUBLE (64-bit double-precision floating-point number) | DOUBLE (64-bit double-precision floating-point number) |
Big decimal | High-precision float value | Java BigDecimal | STRING | DECIMAL |
Date | Date object | Java Date | STRING | TIMESTAMP |
String | String object | Java String | STRING | STRING |
Boolean | Boolean object | Java Boolean | BOOLEAN | BOOLEAN |
List | A collection of multiple values of one data type | | `ARRAY<data_type>` | `ARRAY<data_type>` |

Select Datameer classic or Hive specific mapping options from the Hive Plug-in Configurations.
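The Hive mapping can be summarized as a small lookup. The sketch below encodes the table in Java; the enum names, `hiveType`, and `arrayOf` are hypothetical, and the assumption that a list's element type is mapped with the same rules as a top-level column is ours, since the table only shows `ARRAY<data_type>`.

```java
// Hypothetical lookup that encodes the Datameer-to-Hive mapping table.
final class HiveMapping {
    enum DatameerType { INTEGER, BIG_INTEGER, FLOAT, BIG_DECIMAL, DATE, STRING, BOOLEAN, LIST }

    enum Mode { DATAMEER_CLASSIC, HIVE_SPECIFIC }

    // Hive type for a non-list Datameer field type under the chosen mapping mode.
    static String hiveType(DatameerType type, Mode mode) {
        boolean hiveSpecific = mode == Mode.HIVE_SPECIFIC;
        switch (type) {
            case INTEGER:     return "BIGINT";
            case BIG_INTEGER: return "STRING";
            case FLOAT:       return "DOUBLE";
            case BIG_DECIMAL: return hiveSpecific ? "DECIMAL" : "STRING";
            case DATE:        return hiveSpecific ? "TIMESTAMP" : "STRING";
            case STRING:      return "STRING";
            case BOOLEAN:     return "BOOLEAN";
            case LIST:        throw new IllegalArgumentException("LIST needs an element type; use arrayOf");
            default:          throw new IllegalArgumentException("Unknown type: " + type);
        }
    }

    // Assumption: list elements are mapped with the same rules as top-level columns.
    static String arrayOf(DatameerType element, Mode mode) {
        return "ARRAY<" + hiveType(element, mode) + ">";
    }

    public static void main(String[] args) {
        System.out.println(hiveType(DatameerType.DATE, Mode.DATAMEER_CLASSIC)); // STRING
        System.out.println(hiveType(DatameerType.DATE, Mode.HIVE_SPECIFIC));    // TIMESTAMP
        System.out.println(arrayOf(DatameerType.INTEGER, Mode.HIVE_SPECIFIC));  // ARRAY<BIGINT>
    }
}
```

Keeping the mode as an explicit parameter makes the only differences between the two options (big decimal and date) visible in one place.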

## Parquet Storage Formats

When exporting data to Parquet, data types are mapped in the following way:

Datameer field type | Parquet field type |
---|---|
integer | INT64 |
date | BINARY |
big integer | BINARY |
float | DOUBLE |
big decimal | BINARY |
string | BINARY UTF8 |
Boolean | BOOLEAN |
list (integer) | INT64 |
list (float) | DOUBLE |
list (string) | BINARY |
list (Boolean) | BOOLEAN |

Available as of Datameer version 6.3.

Parquet columns using the INT96 format are interpreted as timestamps. Datameer accepts those columns but cuts off the nanoseconds. If the workbook has Ignore Errors enabled, the resulting error messages are stored in a separate column and the value in the column with the error is NULL. Refer to the following table for more Parquet storage mapping details.
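For readers curious about what is being cut off, the sketch below decodes an INT96 timestamp under the commonly used layout (8 bytes of nanoseconds-of-day followed by a 4-byte Julian day number, both little-endian, with Julian day 2440588 = 1970-01-01). That layout, the class `Int96Timestamp`, and its methods are assumptions for illustration, not Datameer's reader. Converting to `java.util.Date` keeps only milliseconds, which is one plausible reading of "cuts off the nanoseconds".

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;
import java.util.Date;

// Hypothetical INT96 decoder, assuming the common nanos-of-day + Julian-day layout.
final class Int96Timestamp {
    static final long JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588L; // Julian day number of 1970-01-01
    static final long SECONDS_PER_DAY = 86_400L;

    static Instant decode(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 value must be 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // bytes 0-7
        long julianDay = buf.getInt();   // bytes 8-11
        long epochDay = julianDay - JULIAN_DAY_OF_UNIX_EPOCH;
        return Instant.ofEpochSecond(epochDay * SECONDS_PER_DAY).plusNanos(nanosOfDay);
    }

    // java.util.Date stores milliseconds, so any sub-millisecond digits are dropped here.
    static Date toDate(byte[] int96) {
        return Date.from(decode(int96));
    }

    // Inverse of decode, used only to build sample input.
    static byte[] encode(Instant t) {
        long epochDay = Math.floorDiv(t.getEpochSecond(), SECONDS_PER_DAY);
        long nanosOfDay = Math.floorMod(t.getEpochSecond(), SECONDS_PER_DAY) * 1_000_000_000L + t.getNano();
        return ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
            .putLong(nanosOfDay)
            .putInt((int) (epochDay + JULIAN_DAY_OF_UNIX_EPOCH))
            .array();
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2010-09-16T14:56:39.123456789Z");
        byte[] raw = encode(t);
        System.out.println(decode(raw));                         // full nanosecond precision
        System.out.println(toDate(raw).getTime() % 1000);        // 123: only milliseconds survive
    }
}
```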

Datameer Type | Parquet Type | Description |
---|---|---|
date | TIMESTAMP_MILLIS | Stored as INT64 |
integer | INT_64 | |
float | DOUBLE | |
string | UTF8 | BINARY format with UTF-8 encoding |
big decimal | DECIMAL | BINARY format |
big integer | DECIMAL | BINARY format with precision of 1 and scale of 0 |
list | Repeated elements of group of the list element type. | Optional group of a repeating group of optional element types. Nested lists are supported. |
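The sketch below shows what the DECIMAL and TIMESTAMP_MILLIS rows mean at the byte level according to the Parquet format's logical type rules: a DECIMAL stored as BINARY holds the unscaled value as big-endian two's-complement bytes (the scale lives in the schema), and TIMESTAMP_MILLIS holds milliseconds since the Unix epoch in an INT64. `ParquetEncodingSketch` and its methods are hypothetical, and this is not Datameer's Parquet writer.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Date;

// Hypothetical byte-level sketch of two Parquet logical types from the table.
final class ParquetEncodingSketch {
    // DECIMAL on BINARY: unscaled value as big-endian two's-complement bytes.
    // The scale is recorded in the column schema, not in these bytes.
    static byte[] decimalToBinary(BigDecimal value) {
        return value.unscaledValue().toByteArray();
    }

    static BigDecimal binaryToDecimal(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    // TIMESTAMP_MILLIS on INT64: milliseconds since 1970-01-01T00:00:00Z.
    static long dateToTimestampMillis(Date date) {
        return date.getTime();
    }

    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("-12.75");
        byte[] raw = decimalToBinary(value);
        System.out.println(raw.length);                                       // 2 bytes for unscaled -1275
        System.out.println(binaryToDecimal(raw, value.scale()));              // -12.75
        System.out.println(dateToTimestampMillis(new Date(1284648999000L)));  // 1284648999000
    }
}
```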