Glossary

Here are some terms that have a specific meaning in Datameer.

A

administrator

One of the users registered in Datameer with unrestricted access who is responsible for managing the system, i.e., configuring and monitoring the system, adding users, and assigning roles and groups to users.

aggregate function

Aggregate functions operate on all the values in a group and combine them; that is, the function returns one value for each group.
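
As a minimal Python sketch of the idea (not Datameer's implementation; the helper and field names are hypothetical), the following groups rows by a key and returns one sum per group, the way an aggregate function returns one value per group:

```python
from collections import defaultdict

def aggregate_sum(rows, group_key, value_key):
    """Return one summed value per group, like an aggregate function."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)

sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
]
print(aggregate_sum(sales, "region", "amount"))  # {'east': 17, 'west': 5}
```

Three input rows collapse into two output values, one per region.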

AMI (Amazon Machine Image)

A template containing a software configuration (e.g., an operating system and applications) that is used to launch virtual machine instances in Amazon EC2.

analyst

One of the users in Datameer with restricted access who can configure data sources, analyze data, and create infographics and reports.

API (Application Programming Interface)

A source-code-based specification used to interact with a program or to add functionality to it. In Datameer, examples include custom functions, parsing schemes for import and export jobs, and custom plug-ins.

argument

An argument is a constant, a placeholder, or a data field used as input in a function.

authentication

The system used to authenticate users in Datameer. Besides the default internal user management, Datameer ships with plug-ins for LDAP/Active Directory. Custom authentication plug-ins can also be created.

B

big decimal

One of the primitive data types used in Datameer. These are high-precision decimal values, defined by a precision and a scale.

big integer

One of the primitive data types used in Datameer. These are arbitrary-precision integer values, also known as unlimited integer values.

blank (blank cell)

A blank cell contains an empty string, a string consisting only of whitespace, or a null value.
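
The following Python sketch shows the three cases that count as blank. The helper `is_blank` is hypothetical, and Python's notion of whitespace (`str.strip()`) may differ in detail from Datameer's:

```python
from typing import Optional

def is_blank(value: Optional[str]) -> bool:
    """True for a null value (None), an empty string, or a whitespace-only string."""
    return value is None or value.strip() == ""

for cell in [None, "", "   ", "\t", "abc", " x "]:
    print(repr(cell), is_blank(cell))
```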

Boolean

One of the primitive data types used in Datameer. Based on Boolean algebra, these values are either TRUE or FALSE.

C

configuration ID

A unique ID for each job that does not change when the job is run again. Once a job has been given a configuration ID, it always keeps that number.

connections

The locations where data is stored, e.g., a database, a file system such as Amazon S3 (Amazon Web Services), or Hive.

constant

Static values, e.g., a fixed number or string, used as function arguments, not to be confused with placeholders.

D

data ID

An ID that is created each time a job run produces new data.

data link

A data link lets you feed data into a workbook without using an import job. Data links are not imported into HDFS; instead, the data is streamed into Datameer on demand.

data set

A collection of data in either tabular or non-tabular form. Data can be structured, semi-structured, or unstructured. In Datameer, data sets are the sources of data, e.g., databases, server error logs, or Twitter feeds.

date

One of the data types used in Datameer. These are date values in a form that Datameer recognizes as dates rather than as strings.

E

EC2 (Amazon Elastic Compute Cloud)

Amazon Elastic Compute Cloud is a scalable web service offered by Amazon Web Services that provides on-demand computing capacity (virtual servers) in the cloud.

EMR (Amazon Elastic MapReduce)

Amazon Elastic MapReduce is a hosted Hadoop framework that runs on Amazon EC2 instances and can use Amazon S3 for data storage.

empty (empty string)

An empty string is a string with a length of zero. A cell with an empty string appears blank.

export job

A job that exports the results of a workbook to an external resource, e.g., a file or a database, where they can be used independently of Datameer. Adapters for several remote systems are included out of the box, and others can be added with plug-ins.

expression

A complete formula including defined functions and required arguments. An expression can contain multiple (nested) formulas.

F

Field parameters including data field type, name, and acceptance of null values for a given data set.

fixed-width

A font whose letters and characters each occupy the same amount of horizontal space.

float

One of the primitive data types used in Datameer. These are 64-bit floating-point values (also called doubles).

formula

A formula is created by a data analyst and is similar to a macro in other programs. It consists of a function and its required arguments.

Formula Builder

The graphical user interface for creating expressions and formulas by selecting functions.

G

group series function

Group series functions operate row-wise within a group; that is, the function is applied to every row and therefore returns a value for every row in the group.
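
For contrast with aggregate functions, this minimal Python sketch (hypothetical helper and field names, not Datameer's implementation) computes a running total within each group, so every row receives its own value:

```python
from itertools import accumulate, groupby
from operator import itemgetter

def running_sum_by_group(rows, group_key, value_key):
    """Return every row with a running total that restarts for each group."""
    result = []
    # groupby() only merges adjacent rows, so sort by the group key first;
    # sorted() is stable, so rows keep their original order within a group.
    ordered = sorted(rows, key=itemgetter(group_key))
    for _, members in groupby(ordered, key=itemgetter(group_key)):
        members = list(members)
        totals = accumulate(row[value_key] for row in members)
        for row, total in zip(members, totals):
            result.append({**row, "running_total": total})
    return result

sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
]
for row in running_sum_by_group(sales, "region", "amount"):
    print(row)
```

Three input rows produce three output rows, each carrying the running total for its own group.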

Google Cloud Storage

Google Cloud Storage is a RESTful online file storage web service for storing and accessing data on Google Cloud Platform infrastructure.

H

HDFS (Hadoop Distributed File System)

The primary storage system used by Hadoop applications. It can be used either within a Hadoop cluster or as a stand-alone distributed file system.

I

infographics

An infographic is a visualization that consolidates, aggregates, and arranges measurements and metrics (measurements compared to a goal) on a single screen in the form of charts, graphs, reports, and sometimes scorecards.

integer

One of the primitive data types used in Datameer. These are 64-bit integer values (also called longs).

import job

A job that imports data sets into Datameer. Adapters for many connection types are available out of the box.

J

Jaccard distance

A measure of the dissimilarity between sample sets. It is complementary to the Jaccard coefficient and is obtained by subtracting the Jaccard coefficient from 1 or, equivalently, by dividing the difference between the sizes of the union and the intersection of two sets by the size of the union.
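
Both formulations of the Jaccard distance give the same result, as this small Python sketch shows. Returning 0.0 for two empty sets is an assumed convention, since the ratio is undefined there:

```python
def jaccard_distance(a: set, b: set) -> float:
    """(|A ∪ B| - |A ∩ B|) / |A ∪ B|, which equals 1 - |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty sets
    return (len(union) - len(a & b)) / len(union)

print(jaccard_distance({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```

Here the union has 4 elements and the intersection has 2, so the distance is (4 - 2) / 4 = 0.5, and the Jaccard coefficient is 2 / 4 = 0.5.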

JDBC (Java Database Connectivity)

A Java API that defines how a client may access a database.

JDK (Java Development Kit)

A collection of programming tools for developing software with the Java programming language.

job ID

A new ID is created each time a job runs whether it produces new data or not.

JSON (JavaScript Object Notation)

A lightweight, human-readable, text-based format for exchanging data, commonly used to transmit data between a server and a web application over a network.

JSON Map

A data structure that maps keys to corresponding values, typically by using a hash function. See also JSON Object.

JSON Object

An unordered collection of key:value pairs, with the ':' character separating each key from its value, the pairs separated by commas, and the whole enclosed in curly braces. The keys must be strings and should be distinct from each other.

job (Datameer)

A general term for the configuration and executions needed to complete analyses in Datameer, e.g., import jobs, export jobs, or workbook jobs. In Datameer, every job configuration is numbered consecutively and independently of job executions. Datameer job executions usually correspond to one or more MapReduce jobs.

job configuration

The settings necessary to execute a job in Datameer. Job configurations include, e.g., the file path, character encoding, and schedule details for an import or export job, and the sheet names, formulas, and connections for a workbook. Every job configuration is numbered consecutively with a unique identifying number, independently of the corresponding job executions.

job execution

These are the individual operations performed in Datameer according to a job configuration. Every job execution is numbered consecutively with a unique identifying number, independently of the corresponding job configurations.
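
The relationship between configuration IDs, job IDs, and data IDs can be modeled roughly as follows. This is a toy Python sketch, not Datameer's implementation: the class names are hypothetical, and using one global counter per ID type is an assumption.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional

# Hypothetical global counters, one per kind of ID.
_config_ids = count(1)
_job_ids = count(1)
_data_ids = count(1)

@dataclass
class JobExecution:
    job_id: int             # new for every run
    data_id: Optional[int]  # None when the run produced no new data

@dataclass
class JobConfiguration:
    name: str
    # Assigned once when the configuration is created; never changes on re-runs.
    configuration_id: int = field(default_factory=lambda: next(_config_ids))
    executions: List[JobExecution] = field(default_factory=list)

    def run(self, produces_new_data: bool) -> JobExecution:
        # Every run gets a new job ID; only runs that produce new data get a data ID.
        execution = JobExecution(
            job_id=next(_job_ids),
            data_id=next(_data_ids) if produces_new_data else None,
        )
        self.executions.append(execution)
        return execution

sales_import = JobConfiguration("sales import")
sales_workbook = JobConfiguration("sales workbook")
sales_import.run(produces_new_data=True)
sales_import.run(produces_new_data=False)
sales_workbook.run(produces_new_data=True)

print(sales_import.configuration_id, [(e.job_id, e.data_id) for e in sales_import.executions])
print(sales_workbook.configuration_id, [(e.job_id, e.data_id) for e in sales_workbook.executions])
```

The configuration IDs stay fixed at 1 and 2, while job IDs keep counting across all runs and the second import run receives no data ID.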

join

The strategy used when combining two data sets based on a given key.
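
A join can be sketched in Python as an inner join on one key, a common strategy that keeps only rows with matching keys; other strategies, such as outer joins, also keep unmatched rows. The helper and sample data are hypothetical, and this is not how Datameer executes joins:

```python
def inner_join(left, right, key):
    """Combine rows from two data sets whose values for `key` match."""
    # Index the right-hand data set by key (a simple hash join).
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined

customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 1, "total": 30}, {"id": 1, "total": 12}, {"id": 3, "total": 8}]
print(inner_join(customers, orders, "id"))
# [{'id': 1, 'name': 'Ada', 'total': 30}, {'id': 1, 'name': 'Ada', 'total': 12}]
```

Bob (no orders) and the order with key 3 (no customer) are dropped because an inner join keeps only matching keys.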

K

Kerberos

An authentication protocol that provides mutual authentication and single sign-on capabilities.

L

list

In Datameer, multiple values can be combined into a list. A list is a series of values of a single data type.

M

MapReduce

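MapReduce is a programming model and framework for processing large data sets in parallel, typically over a distributed file system. In the 'map' step, the input is split into parts that are processed in parallel into intermediate key-value pairs; in the 'reduce' step, the intermediate values for each key are combined into the final result.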
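
The classic word-count example illustrates the two steps. This single-process Python sketch only mimics the data flow; a real framework such as Hadoop distributes the map and reduce tasks across a cluster and performs the grouping (shuffle) step itself. The function names are hypothetical:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group intermediate values by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

splits = ["the quick brown fox", "the lazy dog", "The end"]
intermediate = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts["the"])  # 3
```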

My Datameer

My Datameer is a web portal for logging in to and managing your Datameer account. Here you can renew a subscription, manage data limits, download updates, submit feature requests and support tickets, and more.

N

null values (<null>)

Null values (sometimes represented as ω) indicate that no information is attached to a specific record, or that the specified information was not found in a specified connection. A cell with a null value appears blank.

O

OLAP (Online Analytical Processing)

A category of database software providing an interface that lets users quickly and interactively examine their data and the results of processes across various dimensions.

operators

Special symbols that are used similarly to functions.

P

page

Because Datameer is an analytics tool with a web interface, pages are information resources that can be viewed in a web browser. In Datameer, all components are embedded in pages, e.g., a workbook, a data link configuration, or the administrator controls.

partitioning

Dividing data into segments of similar data that are stored individually, often hierarchically. Typically, these segments represent periods of time, e.g., months, days, or hours. Data is typically partitioned for ease of management and for performance reasons.

permissions

Permissions describe whether a user is allowed to read, edit, or execute a given page or piece of content, e.g., a data set, an infographic, a data link, or a workbook.

placeholder

A placeholder is a symbol that is replaced by a dynamically changing value, e.g., %day% for the current day or %user% for the current user. Placeholders are also known as wildcards or free variables.
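
A minimal Python sketch of placeholder substitution, assuming placeholders have the form %name% and that their values are supplied by the caller (a real system would compute values such as the current day at run time). The function `resolve_placeholders` is hypothetical, and leaving unknown placeholders unchanged is an assumed design choice:

```python
import re

PLACEHOLDER = re.compile(r"%(\w+)%")

def resolve_placeholders(template: str, values: dict) -> str:
    """Replace each %name% with values[name]; leave unknown placeholders as-is."""
    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)
    return PLACEHOLDER.sub(substitute, template)

# Example values that a real system would compute when the job runs.
values = {"day": "05", "user": "analyst1"}
print(resolve_placeholders("/imports/%user%/day=%day%/%unknown%", values))
# /imports/analyst1/day=05/%unknown%
```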

plug-in

Extensions to Datameer functionality, e.g., custom import/export adapters, custom functions, or custom infographic widgets.

An SDK shipped with Datameer to create custom plug-ins.

precision

The total number of significant digits that can be included in a big decimal number.

R

record

A data entity corresponding to a row in a table of a specified data set, containing multiple data fields represented as one of the pre-defined data field types available in Datameer.

regular expression

A sequence of characters that specifies a search pattern, used to recognize desired strings in a flexible and concise way.
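
For example, the following Python pattern recognizes ISO-style dates such as 2024-03-05 in free text. It uses Python's `re` module; syntax details differ between regular-expression engines, so the exact pattern syntax Datameer accepts may vary:

```python
import re

# Three capture groups: four-digit year, two-digit month, two-digit day.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

log_line = "2024-03-05 ERROR disk full; retry scheduled for 2024-03-06"
print(DATE_PATTERN.findall(log_line))
# [('2024', '03', '05'), ('2024', '03', '06')]
```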

REST (Representational State Transfer)

A REST-style architecture consists of clients and servers, where clients initiate requests to servers and servers process those requests and return appropriate responses.

S

S3 (Amazon Simple Storage Service)

A scalable web storage service offered by Amazon Web Services for storing data remotely.

scale

The number of digits to the right of the decimal point in a big decimal number.
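
Python's `decimal` module can illustrate precision and scale. This sketch follows the convention of Java's `java.math.BigDecimal`, where precision counts the digits of the unscaled value and scale counts the digits after the decimal point; whether Datameer counts digits in exactly this way is an assumption, and the helper name is hypothetical:

```python
from decimal import Decimal

def precision_and_scale(value: Decimal) -> tuple:
    """Return (precision, scale) for a finite Decimal value."""
    _, digits, exponent = value.as_tuple()
    precision = len(digits)  # digits of the unscaled value
    scale = -exponent        # digits to the right of the decimal point
    return precision, scale

print(precision_and_scale(Decimal("123.45")))  # (5, 2)
print(precision_and_scale(Decimal("0.001")))   # (1, 3)
print(precision_and_scale(Decimal("1.50")))    # (3, 2)
```

Leading zeros, as in 0.001, are not significant, so they do not count toward the precision, while the trailing zero in 1.50 does.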

SDK (Software Development Kit)

A collection of development tools for creating applications for a software package.

security

A broad topic best described as information security, including the use of Datameer-specific credentials or LDAP/Active Directory when connecting to Datameer or using secure impersonation when connecting Datameer to a database. Another tool used for implementing security is setting permissions for individual pages.

semi-structured data

A form of structured data that does not conform to the formal structure of tables or data models in relational databases but contains tags or markers that separate its elements, e.g., JSON or XML.

sheet

A page or tab in a workbook. Datameer has different types of sheets, e.g., data sheets, formula sheets, join sheets, and union sheets.

snowflake schema

A set of tables consisting of a single central fact table surrounded by normalized dimensional hierarchies.

string

One of the primitive data types used in Datameer. All data that is not a Boolean value, a big decimal, a big integer, a date, a float, or an integer is considered a string. Strings can contain any Unicode character and are used to represent, e.g., text, URLs, and date patterns.

star schema

A star schema is a set of tables consisting of a single central fact table surrounded by denormalized dimensions.

U

unstructured data

Any document, file, image, report, form, etc. that has no defined, standard structure that would enable convenient storage in automated processing devices.

user group

The group that a user is assigned to, e.g., sales department or research and development.

user role

The role that a user is assigned to, e.g., administrator or analyst.

W

widget

An infographic tool for presenting data. Examples include graphs, pie charts, and maps.

workbook

The spreadsheet-like view used for analyzing data.