Here are some terms that have a specific meaning in Datameer.
One of the users registered in Datameer with unrestricted access who is responsible for managing the system, e.g. by configuring and monitoring the system, adding more users, and assigning roles and groups to users.
Aggregate functions combine and then operate on all the values in a group, i.e. the function returns one value for each group.
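The "one value per group" behavior can be sketched in plain Python. This is an illustration only, not Datameer formula syntax; the helper name `group_sum` and the sample rows are hypothetical.

```python
from collections import defaultdict

def group_sum(rows):
    """Sum the values of each group and return one result per group key."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)

# Four input rows in two groups collapse to two output values.
rows = [("east", 10), ("west", 5), ("east", 7), ("west", 1)]
result = group_sum(rows)
print(result)
```

The input has four rows, but the result has only as many entries as there are distinct groups.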
AMI (Amazon Machine Image)
One of the users in Datameer with restricted access who can configure data sources, analyze data and create dashboards and reports.
Source-code-based specifications used to interact with or add functionality to a program. Examples in Datameer include custom functions, parsing schemes for import and export jobs, and custom plug-ins.
The system used to authenticate users in Datameer. Besides default internal user management, Datameer ships with plug-ins for LDAP/Active Directory. Custom plug-ins can also be created for authentication purposes.
One of the primitive data types used in Datameer. These are also known as high-precision float values.
One of the primitive data types used in Datameer. These are also known as unlimited integer values.
One of the primitive data types used in Datameer. Based on Boolean algebra, these are either TRUE or FALSE.
Static values, e.g. a fixed number or string, used as function arguments. Not to be confused with placeholders.
A dashboard is a visualization tool that consolidates, aggregates and arranges measurements and metrics (measurements compared to a goal) in the form of charts, graphs, reports and sometimes scorecards on a single screen.
A data link lets you feed data into a workbook without using an import job. Data links are not imported into HDFS, but are streamed into Datameer on demand.
The location where data is stored, such as a database, a file (for example in an Amazon Web Services S3 data store), or Hive.
A collection of data in either tabular or non-tabular form. Data can be structured, semi-structured, or unstructured. In Datameer, data sets are the source of data, e.g. databases, server error logs, or Twitter feeds.
One of the data types used in Datameer. These are dates in a form recognized by Datameer, rather than recognized as strings.
EC2 (Amazon Elastic Compute Cloud)
Amazon Elastic Compute Cloud is a scalable web service offered by Amazon Web Services for computing data remotely.
EMR (Amazon Elastic MapReduce)
This is a job which exports the results of a workbook to an external resource, e.g. a file or a database, that can be used independently of Datameer. Adaptors for several remote systems are included out of the box, and others can be added with plug-ins.
A complete formula, including defined functions and required arguments. An expression can contain multiple (nested) formulas.
Field parameters for a given data set, including the data field type, name, and acceptance of null values.
fixed-width
A font whose letters and characters each occupy the same amount of horizontal space.
One of the primitive data types used in Datameer. These are 64-bit float values (also called doubles).
A formula is created by a data analyst and is similar to macros in other programs. It consists of a function and its required arguments.
Group series functions operate row-wise within a group, i.e. the function is applied to every row and therefore returns a value for every argument in the group.
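As a contrast to aggregation, the row-wise behavior can be sketched in plain Python. A running total is used here only as an illustrative row-wise operation; the helper name `group_running_sum` and the sample rows are hypothetical, and this is not Datameer formula syntax.

```python
def group_running_sum(rows):
    """Return a running sum within each group, one output value per input row."""
    running = {}
    output = []
    for key, value in rows:
        # Each group keeps its own accumulator, updated row by row.
        running[key] = running.get(key, 0) + value
        output.append((key, value, running[key]))
    return output

rows = [("east", 10), ("west", 5), ("east", 7), ("west", 1)]
for row in group_running_sum(rows):
    print(row)
```

The output has exactly as many rows as the input, which is the difference from a function that returns a single value per group.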
This is the primary storage system used by Hadoop applications. It is used either in a cluster or as a stand-alone distributed file system.
One of the primitive data types used in Datameer. These are 64-bit integer values (also called longs).
Jaccard Distance
Measures dissimilarity between sample sets. It is complementary to the Jaccard coefficient and is obtained by subtracting the Jaccard coefficient from 1 or, equivalently, by dividing the difference between the sizes of the union and the intersection of two sets by the size of the union.
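The second formulation above, (|A ∪ B| - |A ∩ B|) / |A ∪ B|, can be sketched directly with Python sets. The function name is hypothetical, and returning 0.0 for two empty sets is an assumed convention, since the definition is undefined in that case.

```python
def jaccard_distance(a, b):
    """Return the Jaccard distance between two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical.
        return 0.0
    intersection = a & b
    return (len(union) - len(intersection)) / len(union)

# Union has 4 elements, intersection has 2, so the distance is (4 - 2) / 4.
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))
```

Identical sets give a distance of 0 and disjoint sets give a distance of 1.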
JDBC (Java Database Connectivity)
This is a Java-specific API defining how a database may be accessed.
JDK (Java Development Kit)
This is a collection of programming tools which can be used to design products with the Java programming language.
A format for transmitting data from a server to a web application through a network using a pre-defined schema, while at the same time being easy to read.
This is a general word referring to the configurations and executions needed to complete analyses in Datameer, e.g. import jobs, export jobs or workbook jobs. In Datameer, every job configuration is numbered consecutively and independently of job executions. Datameer job executions usually correspond to one or more MapReduce jobs.
The settings necessary to execute a job in Datameer. Job configurations include, e.g., the file path, character encoding and schedule details for an import or export job, and the sheet names, formulas and data stores for a workbook. Every job configuration is numbered consecutively with a unique identifying number, independently of the corresponding job executions.
These are the individual operations performed in Datameer according to a job configuration. Every job execution is numbered consecutively with a unique identifying number, independently of the corresponding job configurations.
The strategy used when combining two data sets, based on a given key.
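To illustrate how the strategy changes the result, here is a minimal sketch in plain Python that combines two small data sets on a key. Inner and left outer joins are used as two common strategies; this glossary does not list the strategies Datameer offers, and the function name, the `how` parameter, and the sample data are hypothetical.

```python
def join(left, right, how="inner"):
    """Join two lists of (key, value) pairs on key; how is "inner" or "left"."""
    # Index the right-hand side by key so each left row can find its matches.
    right_index = {}
    for key, value in right:
        right_index.setdefault(key, []).append(value)

    result = []
    for key, left_value in left:
        matches = right_index.get(key)
        if matches:
            for right_value in matches:
                result.append((key, left_value, right_value))
        elif how == "left":
            # A left outer join keeps unmatched left rows, padded with None.
            result.append((key, left_value, None))
    return result

people = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
departments = [(1, "Sales"), (3, "R&D"), (4, "Ops")]
print(join(people, departments, how="inner"))
print(join(people, departments, how="left"))
```

With the inner strategy, the row for key 2 is dropped because it has no match. With the left outer strategy, it is kept with an empty right-hand value.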
Kerberos
An authentication protocol that provides mutual authentication and single sign-on capabilities.
MapReduce is a framework for processing data over a distributed file system. A 'map' step first splits the task into sub-tasks, and the 'reduce' step combines the results of the 'map' tasks into one result.
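The two steps can be sketched with a single-process word count in plain Python. This only imitates the data flow and does not use Hadoop or any distributed file system; all function names are hypothetical.

```python
from collections import defaultdict

def map_step(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_step(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_step(line)]
    return dict(reduce_step(key, values) for key, values in shuffle(pairs).items())

print(word_count(["a b a", "b c"]))
```

In a real cluster, the map calls run on different nodes over different parts of the input, and the grouping step moves data between nodes before the reduce calls run.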
A category of database software providing an interface which users can use to quickly and interactively examine their data and results of processes in various dimensions.
These are special symbols which are used similarly to functions.
As Datameer is an analytics tool with a web interface, pages are information resources that can be seen using a web browser. In Datameer, all components are embedded in pages, e.g. a workbook, data link configuration, or administrator controls.
Partitioning divides similar data into individually stored, often hierarchical parts. Typically, these parts represent periods of time (e.g. months, days or hours). The division is usually done for ease of management and for performance reasons.
A placeholder is a symbol that is replaced by a dynamically changing value, e.g. %day% for the current day or %user% for the current user. Placeholders are also known as wildcards or free variables.
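The substitution idea can be sketched in plain Python. This is not how Datameer resolves placeholders internally, which this glossary does not describe; the function name, the file path, and the sample values are hypothetical.

```python
import re

def fill_placeholders(template, values):
    """Replace each %name% token with values[name]; unknown tokens stay unchanged."""
    def substitute(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))
    return re.sub(r"%(\w+)%", substitute, template)

# Hypothetical values standing in for the current day and current user.
values = {"day": 31, "user": "alice"}
print(fill_placeholders("/exports/%user%/report_%day%.csv", values))
```

A constant in the same position would always produce the same path, whereas the placeholder version changes whenever the supplied values change.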
An SDK shipped with Datameer to create custom plug-ins.
The total number of significant digits which can be included in a big decimal number.
REST-style architecture consists of clients and servers where clients initiate requests to servers, and servers process those requests and return appropriate responses.
This is a scalable web storage service offered by Amazon Web Services, used to store data remotely.
The number of digits behind the decimal point in a big decimal number.
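Precision and scale can be read off a decimal value with Python's standard `decimal` module, as a rough sketch. It follows the convention that precision counts the digits of the unscaled value; Datameer's exact rules are not stated here, and the function name is hypothetical. Only finite values are handled.

```python
from decimal import Decimal

def precision_and_scale(text):
    """Return (precision, scale) for a finite decimal literal such as "123.45"."""
    _sign, digits, exponent = Decimal(text).as_tuple()
    precision = len(digits)   # total number of stored digits
    scale = -exponent         # number of digits behind the decimal point
    return precision, scale

# 123.45 is stored as the digits 12345 with an exponent of -2.
print(precision_and_scale("123.45"))
```

For `"123.45"` the five stored digits give the precision, and the two digits behind the decimal point give the scale.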
SDK (Software Development Kit)
A collection of development tools for creating applications for a software package.
A broad topic best described as information security, including the use of Datameer-specific credentials or LDAP/Active Directory when connecting to Datameer or using secure impersonation when connecting Datameer to a database. Another tool used for implementing security is setting permissions for individual pages.
A form of structured data that does not conform to the formal tables or data models of relational databases.
A page or tab in a workbook. In Datameer, there are different types of sheets, e.g. data sheet, formula sheet, join sheet, union sheet.
A snowflake schema is a set of tables comprised of a single, central fact table surrounded by normalized dimensional hierarchies.
A sparkline is a small, embedded bar or line graph that illustrates a single trend. Sparklines commonly display trends over time, but they can be used to show any trend.
One of the primitive data types used in Datameer. All data that is not a Boolean value, a big decimal, a big integer, a date, a float value or an integer is considered a string. Strings can contain any type of (unix) character and are used to represent text, URLs, and date patterns.
A star schema is a set of tables comprised of a single, central fact table surrounded by de-normalized dimensions.
A graph displaying hierarchical (tree-structured) data as a set of nested rectangles.
Any document, file, image, report, form, etc. that has no defined, standard structure that would enable convenient storage in automated processing devices.
The group that a user is assigned to, e.g. sales department or research and development.
The role a user is assigned to, e.g. administrator or analyst.
A dashboard tool to present data. Examples include graphs, pie charts, and maps.
The spreadsheet-like view used for analyses of data.