Guide for Analysts

Welcome to Datameer

Datameer builds on the power and scalability of Apache Hadoop to deliver an easy-to-use and cost-effective solution for big data analytics. Datameer integrates rapidly with existing and new data sources to deliver sophisticated analytics. Datameer tools simplify extraction, transformation, and loading (ETL), as well as real-time data retrieval. For business users, Datameer offers a familiar spreadsheet interface and intuitive data visualization tools.

If you are an IT professional or a system administrator, see the Setup Guide and the Administration Guide. If you are getting started with Hadoop, see Hadoop and Datameer.

Overview

Before we get started, let's look at a few key terms in Datameer:

Term | Definition
Job | The complete set of data including the connections, associated analytics (workbooks), and visualization tools (infographics). The job also includes the schedule of when the data gets updated, and whether this happens automatically or manually.
Connections | The repository of structured, semi-structured, and unstructured data from one or more sources used to create analytics.
Workbook | Where you view a sample of your data and create analytics, using the built-in functions, sorting, filtering, and other tools to discover relationships in your data set.
Widgets | The reporting tools you use to easily create tables, charts, graphs, and other visual ways of looking at your data. Widgets let you quickly and visually manipulate your data.
Infographics | Where you can see at a glance the tables, charts, and graphs you create for visualizing your data.

To learn more, see the Glossary.

What types of data are supported?

You can use Datameer with any type of data, such as log files, call detail records, sales or transactional data, clickstream data, web site metrics, social networking data, and more. You can combine multiple data sources and data types to collect the raw data you need for analysis. You can import data yourself or use data imported by a system analyst.

Data formats supported include:

  • Flat files such as Excel spreadsheets, comma-delimited text files (.csv), and Apache log files; file stores such as HDFS (Hadoop Distributed File System) and S3 (Amazon Simple Storage Service); and unstructured data such as Twitter data
  • Relational databases such as Oracle (10g), HSQLDB, DB2, or MySQL (5.1)
  • Other types such as Hive (a data warehouse infrastructure built on Hadoop)

See Types of Data Supported for more information.

How are large amounts of data managed?

Raw data is stored and processed using Hadoop, which distributes both the data and the computational load across multiple networked computers. The Datameer tools then let you analyze and visualize relationships in that data.
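
To make the idea of distributing data and computation more concrete, here is a minimal sketch in plain Python. It is not Datameer's or Hadoop's API. The function names (split_into_chunks, map_chunk, combine, count_statuses) and the sample log lines are hypothetical and exist only for illustration. The sketch splits records into chunks, computes a partial result for each chunk independently, and then merges the partial results, which is the general pattern Hadoop applies across many machines.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, num_chunks):
    """Partition records round-robin, standing in for data spread across machines."""
    return [records[i::num_chunks] for i in range(num_chunks)]


def map_chunk(chunk):
    """Count status codes within one chunk (the work a single node might do)."""
    return Counter(line.split()[-1] for line in chunk)


def combine(left, right):
    """Merge two partial counts into one."""
    return left + right


def count_statuses(records, num_chunks=3):
    chunks = split_into_chunks(records, num_chunks)
    # On a real cluster each chunk would be processed in parallel on a different node;
    # here the chunks are processed one after another.
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce(combine, partials, Counter())


# Hypothetical sample data: request method, path, and status code.
log_lines = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
    "GET /index.html 200",
    "GET /old 404",
]

totals = count_statuses(log_lines)
print(totals["200"], totals["404"])
```

Because each chunk is counted independently, adding more machines lets more chunks be processed at the same time. Only the small partial counts need to be brought together at the end.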


To learn more about how Datameer works, see the Concepts Guide.