Install Guide for Datameer Enterprise on AWS

Install a Datameer Enterprise clustered instance from the AWS Marketplace.

This guide steps through the requirements and process for installing Datameer Enterprise from the AWS Marketplace.

Overview

Datameer is a cloud-native analytic lifecycle platform that transforms raw data into analytics-ready datasets without hours of engineering time. It uses Amazon EMR compute clusters to handle the scale, volume, and complexity of big data and heavy-duty enterprise workloads. Datameer's spreadsheet-like interface enables analysis of the raw data on the fly, while built-in connectors integrate data from a variety of sources, including S3 and Redshift.

Datameer Enterprise, available in the AWS Marketplace, uses an AWS CloudFormation Template (CFT) to launch a Datameer instance together with the related resources: VPC, Availability Zones, subnets, ELB, security groups, Datameer EC2 instances, EMR cluster, and S3 bucket.


Review the product information and pricing by visiting Datameer Enterprise on AWS Marketplace.

Learn more about Datameer for Amazon Web Services.


Follow the comprehensive Step-by-Step Guide to deploy Datameer in your AWS account within a few minutes. 

If you experience problems with launching via CFT, please contact Datameer Support at support@datameer.com.

Prerequisites and Recommendations

Before you begin, review the End User License Agreement (EULA).

New to AWS? See Getting Started with AWS.

AWS Requirements

  • A valid AWS account with permissions to create VPC, security group, S3, EC2, and EMR resources.
  • Familiarity with the stack creation process using CloudFormation Templates.
  • A basic understanding of the purpose and function of Amazon Elastic MapReduce (EMR).

Deployment and Integration Architecture



Planning Guidance

Security 

By default, CloudFormation uses permissions based on your user credentials to create, modify, or delete resources in the stack. Optionally, when configuring stack parameters, you can choose an IAM role to explicitly define how CloudFormation can manage stack resources. For more information, see Controlling Access with AWS Identity and Access Management.

The Datameer Enterprise stack provides an option to enable secure access to the Datameer instance via HTTPS (port 443). To use this option, you must provide a security certificate, which can be created via AWS Certificate Manager.

During the stack creation process you must specify a key pair name: the name of an existing EC2 key pair that authorizes SSH access to the instances. If you don't already have a key pair, refer to the Amazon EC2 Key Pair documentation on how to create a new one.

Once the stack is created and the Datameer instance is launched, you can access it from a web browser using the default credentials. See Accessing Datameer Enterprise. By default, the instance is not reachable from the public Internet. You can change this during stack creation by enabling the Auto-assign Public IP option.

Within the Datameer application, user authentication and authorization can be handled via the Admin interface. Datameer Enterprise offers both Internal User Management and Remote Authentication (Active Directory and LDAP) options. 

Sizing 

Datameer Enterprise can be deployed on AWS using different types of EC2 instances. During the stack creation process you can select which type you would like to use for the Datameer instance as well as for the creation of the EMR cluster. The following instance type families are supported for deployments: 

M4 Series - provides a balance of compute, memory, and network resources and is suitable for most data processing tasks and cluster computing.

C3/C4 Series - optimized for compute-intensive workloads, delivering high performance at a low price-to-compute ratio. Recommended for batch processing, distributed analytics, and high-performance science and engineering applications.

R3 Series - memory optimized, recommended for real time big data analytics. 

For a complete list of EC2 instance types, visit Amazon EC2 Instance Types.


Instance types differ in the number of vCPUs, the amount of memory, and the network performance per instance, all of which affect job execution and data processing in Datameer Enterprise. For example, an EMR cluster of 6 instances of the recommended m4.4xlarge type (16 vCPU, 64 GiB per instance) processes a medium workload of 1.5 GB (100 million records) in approximately 10 minutes. Doubling the infrastructure to 12 m4.4xlarge instances shortens processing of the same workload to about 6 minutes.
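The two data points above give a rough sense of how the workload scales with cluster size. The following is a minimal sketch that only does arithmetic on the figures quoted in this section; the function name is illustrative, and it assumes the quoted run times are representative of your workload.

```python
def scaling_summary(base_nodes, base_minutes, scaled_nodes, scaled_minutes):
    """Return (speedup, efficiency) for a cluster scale-up.

    speedup    -- how many times faster the larger cluster finished
    efficiency -- speedup divided by the growth in node count (1.0 = linear)
    """
    speedup = base_minutes / scaled_minutes
    efficiency = speedup / (scaled_nodes / base_nodes)
    return speedup, efficiency


# Figures quoted above: 6 x m4.4xlarge in ~10 min, 12 x m4.4xlarge in ~6 min.
speedup, efficiency = scaling_summary(6, 10, 12, 6)
print(f"speedup: {speedup:.2f}x, scaling efficiency: {efficiency:.0%}")
```

With the quoted figures, doubling the cluster makes the job about 1.67 times faster, which is roughly 83% of a perfectly linear speedup. Other workloads may scale differently.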

Note: An EMR cluster is created with a single "master" instance and a configurable number of "worker" instances. When configuring your Datameer stack, the "Number of EC2 instances in the EMR cluster" setting defines the number of "worker" instances. For example, leaving this setting at its default value of 2 creates an EMR cluster with a total of 3 instances of the same type (1 "master" and 2 "worker").
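The relationship between the configured "worker" count and the resulting cluster can be sketched as follows. This is an illustration only: the class and function names are hypothetical, and the m4.4xlarge figures are the ones quoted in this section.

```python
from dataclasses import dataclass

MASTER_NODES = 1  # per the note above: one "master" instance per EMR cluster


@dataclass(frozen=True)
class InstanceSpec:
    vcpus: int
    memory_gib: int


# m4.4xlarge as quoted in this section: 16 vCPU, 64 GiB per instance.
M4_4XLARGE = InstanceSpec(vcpus=16, memory_gib=64)


def cluster_totals(worker_count, spec):
    """Return (instances, vCPUs, memory in GiB) for the whole EMR cluster.

    All instances, "master" included, are of the same type.
    """
    instances = MASTER_NODES + worker_count
    return instances, instances * spec.vcpus, instances * spec.memory_gib


# Default setting of 2 "worker" instances.
print(cluster_totals(2, M4_4XLARGE))
```

For the default of 2 "worker" instances this yields 3 instances in total, which is the number to use when estimating capacity and cost.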

Costs 

Datameer Enterprise is offered with a 14-day free trial. During the trial period you are responsible only for the AWS services and infrastructure costs.

After the trial period ends, Datameer Enterprise is available at a fixed hourly price irrespective of the instance type or deployment region. The total cost of running Datameer Enterprise on AWS is calculated as follows:

Total Deployment Cost = Datameer Enterprise Software Cost + AWS Services & Infrastructure Cost 


The Datameer CloudFormation Template exposes stack configuration parameters that you can customize. Some of them, such as the instance type and the number of EC2 instances in the EMR cluster, affect the total cost of the deployment. For AWS infrastructure cost estimates, see AWS Pricing Calculator.
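The cost formula above can be sketched as a small calculation. Everything numeric here is a placeholder, not a real price: take the actual software rate from the AWS Marketplace listing and the infrastructure rates from the AWS Pricing Calculator. The sketch also assumes the software price is charged once per hour for the deployment, and it counts only per-instance hourly charges, ignoring storage, data transfer, and other AWS services.

```python
def total_deployment_cost(hours, software_rate, infra_rates):
    """Total Deployment Cost = software cost + AWS services & infrastructure cost.

    hours         -- hours the deployment runs
    software_rate -- Datameer Enterprise hourly price (0 during the free trial)
    infra_rates   -- mapping of label -> (hourly rate per instance, instance count)
    """
    software_cost = software_rate * hours
    infra_cost = sum(rate * count for rate, count in infra_rates.values()) * hours
    return software_cost + infra_cost


# Placeholder rates for illustration only (NOT real prices).
example_infra = {
    "datameer_ec2": (1.00, 1),
    "emr_master": (1.00, 1),
    "emr_workers": (1.00, 2),
}

trial = total_deployment_cost(hours=24, software_rate=0.0, infra_rates=example_infra)
paid = total_deployment_cost(hours=24, software_rate=2.00, infra_rates=example_infra)
print(f"trial day: {trial:.2f}, paid day: {paid:.2f}")
```

Changing the instance type changes the hourly rate in `infra_rates`, and changing the EMR instance count changes the count, while the software term stays fixed.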

Installation Steps

Launching Datameer Enterprise from AWS 

  1. Locate and select Datameer Enterprise from the AWS Marketplace.
  2. From the Datameer Enterprise product page, select Continue to Subscribe.



  3. After accepting the EULA, select Continue to Configuration.
  4. Select the Datameer Enterprise Clustered Edition fulfillment option, the desired product version, and the region in which to launch the product.

    We recommend selecting the most recent version of the product and the region in which you reside.

  5. Select Continue to Launch.



  6. On the Launch this Software page, review your configuration, select the Launch CloudFormation action, and click Launch.


Creating Stack from CloudFormation Template

  1. Stack creation starts with specifying the source of the CloudFormation Template (CFT). The Amazon S3 URL field is pre-populated with a URL to the corresponding CFT.



  2. Click Next to move on to specifying the stack details.
  3. Enter the Stack Name.



  4. Select a Datameer Instance Type. See the recommended types under Sizing above.



  5. Configure the admin password for Amazon Relational Database Service (RDS). The password must contain 8 to 41 alphanumeric characters and must not include a forward slash (/), a double quote ("), or an at sign (@).



  6. Set the EMR cluster parameters:
    1. Enter an Instance Count - the number of EC2 instances, from 1 to 20, for the "worker" nodes in the EMR cluster. A single "master" node is created by default in addition to the "worker" nodes. For example, if you enter 3 for the Instance Count setting, 4 instances are created in total: 1 "master" node + 3 "worker" nodes.
    2. Select the Instance Type - the EMR cluster EC2 instance type. See the recommended types under Sizing above.
    3. Enter an S3 Bucket Name - the name of the S3 bucket used for EMR-related storage. It defaults to aws-datameer-bucket. The Datameer CFT automatically appends a unique identifier to the entered bucket name to ensure its uniqueness across all regions.



  7. Specify Networking and Security parameters:
    1. Provide the Amazon Resource Name (ARN) of the security certificate to enable HTTPS access to the Datameer Enterprise instance (optional). The security certificate can be created via AWS Certificate Manager. If you do not provide a certificate, the Datameer Enterprise instance will be accessible via HTTP on port 80.



    2. Enter the VPC IPv4 Addresses Range - the IPv4 address range in the form of a Classless Inter-Domain Routing (CIDR) block. See the VPCs and Subnets AWS documentation for guidance.
    3. Specify the CIDR block for each of the 2 public subnets. Each CIDR block must be within the VPC CIDR range.
    4. Select an Availability Zone (AZ) for each of the 2 subnets. Each AZ must be within the current region.

      The 2 Availability Zones must differ from each other, as must the 2 subnets. There can be only one subnet per AZ.





    5. Configure the Keypair Name - the name of an existing EC2 key pair that authorizes SSH access to the instances. If you don't already have a key pair, refer to the Amazon EC2 Key Pair documentation on how to create a new one.
    6. Specify the SSH Addresses Range in the form of a CIDR block. This enables SSH access to the Datameer EC2 instance.
    7. Enable Auto-assign Public IP if you want the Datameer instance to be accessible over the Internet.



    8. Click Next.
  8. Configure advanced Stack Options (optional). We recommend keeping the default values for all parameters.
  9. Click Next.
  10. Review the stack configuration and click Create Stack.
  11. The stack configuration is now complete. The stack creation process starts, and the new stack appears in the list of stacks in the CloudFormation console with the CREATE_IN_PROGRESS status.



  12. Once all stack components are created, the status will change to CREATE_COMPLETE.



  13. Now that the Datameer instance is up and running, you can access it via a web browser. 
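Several of the parameters in the steps above have constraints that can be checked before you submit the stack. The following is a minimal sketch of such a pre-flight check using only the Python standard library. The function name, argument names, and sample values are hypothetical and are not part of the Datameer CFT. The sketch takes the strict reading of the password rule (alphanumeric characters only) and also rejects overlapping subnet blocks, which AWS does not allow within a VPC.

```python
import ipaddress
import re

# Strict reading of the RDS password rule: 8 to 41 alphanumeric characters.
PASSWORD_PATTERN = re.compile(r"[A-Za-z0-9]{8,41}")


def check_stack_inputs(db_password, emr_instance_count, vpc_cidr,
                       subnet_cidrs, subnet_azs):
    """Return a list of problems; an empty list means the inputs look valid."""
    problems = []
    if not PASSWORD_PATTERN.fullmatch(db_password):
        problems.append("RDS password must be 8 to 41 alphanumeric characters")
    if not 1 <= emr_instance_count <= 20:
        problems.append("EMR Instance Count must be between 1 and 20")

    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    if len(subnets) != 2 or len(subnet_azs) != 2:
        problems.append("expected exactly 2 public subnets and 2 Availability Zones")
    for subnet in subnets:
        if not subnet.subnet_of(vpc):
            problems.append(f"{subnet} is not within the VPC range {vpc}")
    if len(subnets) == 2 and subnets[0].overlaps(subnets[1]):
        problems.append("the 2 subnet CIDR blocks must not overlap")
    if len(set(subnet_azs)) != len(subnet_azs):
        problems.append("each subnet needs its own Availability Zone")
    return problems


# Sample values for illustration only.
print(check_stack_inputs(
    db_password="Secret123",
    emr_instance_count=3,
    vpc_cidr="10.0.0.0/16",
    subnet_cidrs=["10.0.1.0/24", "10.0.2.0/24"],
    subnet_azs=["us-east-1a", "us-east-1b"],
))
```

The check returns an empty list for the sample values. It does not verify that the Availability Zones exist in the selected region or that the key pair and certificate are present; those still need to be confirmed in the AWS console.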

Accessing Datameer Enterprise

  1. Navigate to the CloudFormation Console and locate your Datameer stack. 

  2. The Outputs panel of the stack details provides the parameters needed for accessing your Datameer instance.


  3. Access Datameer Enterprise using a web browser.
    • Enter the Datameer URL in the browser address bar (the URL is provided in the Stack Outputs panel).
  4. On the Datameer login screen, enter the following credentials:
    1. Username: admin
    2. Password: <cluster ID - obtained from the Stack Outputs panel>



  5. Navigate to View > Admin Tab in the application menu to configure the platform (including user management).



  6. To change the default Admin password, navigate to the Users module under the Admin tab. For more details, see Managing User Accounts.
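The values shown in the Outputs panel can also be read programmatically. The following is a minimal sketch that assumes boto3 is installed and AWS credentials are configured. The function names are illustrative, and the output key names and values in the sample are placeholders, because this guide does not list the actual keys that the Datameer stack exposes.

```python
def outputs_to_dict(describe_stacks_response):
    """Flatten the Outputs of the first stack in a describe_stacks response."""
    stack = describe_stacks_response["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


def fetch_stack_outputs(stack_name, region_name):
    """Return (stack status, outputs dict) for a CloudFormation stack."""
    import boto3  # imported lazily so outputs_to_dict works without boto3

    client = boto3.client("cloudformation", region_name=region_name)
    response = client.describe_stacks(StackName=stack_name)
    return response["Stacks"][0]["StackStatus"], outputs_to_dict(response)


# Offline illustration with a hand-written response (placeholder keys and values).
sample = {
    "Stacks": [{
        "StackStatus": "CREATE_COMPLETE",
        "Outputs": [
            {"OutputKey": "DatameerURL", "OutputValue": "http://example.invalid"},
            {"OutputKey": "ClusterId", "OutputValue": "j-EXAMPLE"},
        ],
    }]
}
print(outputs_to_dict(sample))
```

Calling `fetch_stack_outputs` with your stack name and region returns the stack status together with the outputs, so a status other than CREATE_COMPLETE indicates that the URL and credentials may not be available yet.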