With the cloud platform software market expected to reach $90 billion by 2020, even the largest and most conservative organizations are moving some or all of their analytics to the cloud. According to the Gartner 2017 Magic Quadrant for Business Intelligence and Analytics Platforms report:
“Interest in cloud deployments will continue to grow…we expect this trend to continue, with the majority of new license buying (more than half) likely to be for cloud deployments by 2020.”
What Makes the Cloud Attractive?
Perhaps the first question you might ask is: Why should I even move to the cloud? A Q4 2016 TDWI Best Practices Report on BI, Analytics and the Cloud explored this question, asking organizations for their top three reasons for doing analytics in the cloud.
The responses were revealing, if not unexpected: companies cited scalability, flexibility and cost as their top three reasons.
Another important reason to move analytics to the cloud is to follow data gravity – the notion that processing and analysis of data is best done closest to where the data resides.
The same TDWI report and survey also showed that hybrid architectures were important to the majority of organizations. Therefore, supporting a hybrid architecture that spans on-premises and cloud resources will be critical to success.
But, as with anything new, there are always stumbling blocks. The TDWI survey also asked organizations about the barriers to cloud analytics adoption. Not surprisingly, the two most important concerns were security and data privacy – issues that have always haunted the adoption of cloud computing.
What Does It Mean to Be Cloud-native?
With an ever-growing array of options for data preparation, exploration and analytics being offered in the cloud, it is important to dig deep and understand how they are put together and whether they are cloud-native. Let’s explore the six ways big data platforms should be architected to be truly cloud-native.
1. Separating Compute from Storage
At its heart, a big data platform architected for the cloud requires one key underpinning – the separation of compute from storage. In fact, the November 2016 Gartner report, Predicts 2017: Data Distribution and Complexity Drive Information Infrastructure Modernization, spells out this critical need explicitly, recommending that data and analytics leaders should:
“design [their] information architecture for the separation of storage and compute to take advantage of rich compute options for processing the data in object storage.”
Hadoop is the most prominent “data operating system” for big data. And, while Hadoop supports a highly distributed clustering architecture, there has always been a tight integration between the compute and storage resources – YARN and HDFS.
Separating compute from storage for big data processing requires a platform to be re-architected for the cloud, switching its distributed storage from HDFS to a cloud object store such as Amazon Simple Storage Service (S3).
Datameer for AWS is explicitly architected to be cloud-ready because it separates compute from storage using Amazon S3. This provides the critical foundation for supporting a hybrid architecture that is elastic, agile, and possesses the ability to secure data effectively.
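To make the idea of separating compute from storage concrete, here is a minimal sketch in Python. It is not how Datameer or Hadoop is implemented: the ObjectStore interface, the InMemoryStore class and the count_lines task are hypothetical names invented for illustration, and an in-memory dictionary stands in for a real object store such as S3. The point is only that the compute task keeps no data of its own, so workers can be added or removed while the data stays put.

```python
from typing import Dict, Iterable, Protocol


class ObjectStore(Protocol):
    """Minimal storage interface (hypothetical); a real system would back it with an object store."""

    def get(self, key: str) -> bytes: ...
    def put(self, key: str, data: bytes) -> None: ...
    def keys(self, prefix: str) -> Iterable[str]: ...


class InMemoryStore:
    """Illustrative stand-in for an object store such as S3."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def keys(self, prefix: str) -> Iterable[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


def count_lines(store: ObjectStore, prefix: str) -> int:
    """A stateless compute task: it reads from the store and holds no data itself."""
    return sum(len(store.get(key).splitlines()) for key in store.keys(prefix))


store = InMemoryStore()
store.put("logs/a.txt", b"one\ntwo\n")
store.put("logs/b.txt", b"three\n")
store.put("other/c.txt", b"ignored\n")

# Any number of workers could run this task against the same store,
# and they could be shut down afterwards without losing the data.
total = count_lines(store, "logs/")
print(total)
```

Because the task depends only on the storage interface, the same code could run on a cluster of any size, which is the property that the tight coupling of YARN and HDFS makes harder to achieve.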
2. Following Data Gravity
As referenced in our previous blog post, The Cloud Changes Big Data Analytics, and Big Data Analytics Needs to Change, many organizations have unwittingly created shadow data lakes – massive reservoirs of stored data in the cloud. And, unless they are newly emerging companies, they will also have large troves of traditional data on-premises.
It is important that a cloud-native architecture support the ability to process data directly in the cloud, following data gravity. Why?
- Moving large volumes of cloud-born data can be resource intensive from both a network and systems standpoint
- Processes to move the data would need to be created, operationalized and monitored, adding to the ever-growing list of IT jobs
- Compute and storage resources in the cloud can be used on-demand, gaining flexibility and cost advantages
Why take the time, expense and resources to move data that’s already in the cloud on-premises? Why not prepare, explore and process this data in the cloud?
Datameer for AWS lets you follow data gravity and process your big data where it lies, in the cloud, while dramatically shrinking the time it takes to process and analyze your cloud-born data and delivering business results faster.
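A rough back-of-the-envelope calculation shows why moving large volumes of data is costly. The sketch below estimates transfer time from dataset size and link speed; the function name transfer_hours, the 50 TB dataset, the 1 Gbps link and the 80% efficiency factor are all illustrative assumptions, not figures from the TDWI or Gartner reports.

```python
def transfer_hours(data_terabytes: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move a dataset over a network link.

    efficiency is an assumed fraction of nominal bandwidth actually achieved.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be positive and efficiency in (0, 1]")
    bits = data_terabytes * 1e12 * 8  # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


# Assumed scenario: 50 TB of cloud-born data over a 1 Gbps link.
hours = transfer_hours(data_terabytes=50, link_gbps=1)
print(f"{hours:.1f} hours")
```

Under these assumptions the transfer takes roughly 139 hours, close to six days, before any processing can begin. That delay comes on top of the operational work of building and monitoring the transfer process.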
3. Delivering Agility and Elasticity
Cloud computing offers easy access to powerful and instantly available compute and storage resources. This frees your already taxed IT resources and helps you create an agile data architecture that can be quickly adapted to your needs.
On Amazon Web Services, processing and storage resources of any scale can quickly be spun up using EC2 and S3 as needed without taxing the IT team. These services are easily optimized for various processing tasks – in this case, data preparation and exploration.
Datameer for AWS provides rapid deployment on EC2 instances and natively uses S3 so you can take advantage of easily available compute and storage resources. It also allows you to stop, start, expand and contract your Amazon EMR clusters without any reconfiguration for the most effective use of resources.
Another major advantage of cloud computing is elasticity – the ability to scale resources as needed and pay only for what you use. But, while many are intrigued by the notion of paying for resources by the minute or hour, the reality is that the vast majority of organizations have steady workloads. They really desire elasticity for two main reasons:
- Seasonal or short-term expansion, where the business needs extra processing and storage power for busy periods, or
- “Burst processing”, where organizations want to process and analyze extra datasets for a short period for new analytic experiments or initiatives
Datameer for AWS separates compute and storage, using persistent S3 storage to manage the data being processed. This enables Datameer for AWS to provide complete elasticity, letting you scale your resources on demand, based on the processing power and scale you need.
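The value of elasticity for a mostly steady workload with occasional bursts can be illustrated with simple arithmetic. The sketch below compares the node-hours consumed by a cluster sized permanently for peak demand with one resized to match demand in each period. The workload numbers and function names are hypothetical, and real pricing depends on instance types and billing granularity.

```python
from typing import Sequence


def node_hours_fixed(demand: Sequence[int]) -> int:
    """Node-hours used when the cluster is sized for peak demand the whole time."""
    return max(demand) * len(demand) if demand else 0


def node_hours_elastic(demand: Sequence[int]) -> int:
    """Node-hours used when the cluster is resized to match demand each hour."""
    return sum(demand)


# Hypothetical day: a steady 10 nodes for 20 hours, then a 4-hour burst to 40 nodes.
demand = [10] * 20 + [40] * 4
fixed = node_hours_fixed(demand)
elastic = node_hours_elastic(demand)
savings = 1 - elastic / fixed
print(fixed, elastic, f"{savings:.1%}")
```

In this assumed scenario the fixed cluster uses 960 node-hours and the elastic one 360, a reduction of 62.5%. The steadier the workload, the smaller the gap, which is why burst and seasonal needs are where elasticity matters most.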
4. Hybrid Architecture
As indicated earlier, the TDWI survey showed that 72% of organizations had a strong desire to create and use hybrid architectures for their analytics and big data processing. This also hearkens back to the notion of data gravity and the fact that the vast majority of organizations will have both cloud-born and on-premises data.
Datameer enables you to create a seamless hybrid architecture that manages, prepares and analyzes individual datasets close to where they reside, then unifies the results into a broader 360-degree view to optimize big data processing and limit unnecessary movement of data. This unified hybrid architecture provides a comprehensive view of your curated data and jobs in a single location, ideal for applying governance policies and auditing how data and analytics are used to ensure regulatory compliance.
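The pattern of processing each dataset close to where it resides and then unifying only the results can be sketched as follows. This is an illustration of the general idea rather than Datameer's implementation; the summarize and merge functions and the sample records are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def summarize(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Aggregate revenue per customer; intended to run where the data lives."""
    totals: Dict[str, float] = {}
    for customer, amount in records:
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


def merge(*summaries: Dict[str, float]) -> Dict[str, float]:
    """Combine small per-location summaries into one unified view."""
    unified: Dict[str, float] = {}
    for summary in summaries:
        for customer, amount in summary.items():
            unified[customer] = unified.get(customer, 0.0) + amount
    return unified


# Hypothetical data: one set of orders on-premises, one in the cloud.
on_prem_orders = [("acme", 100.0), ("globex", 50.0)]
cloud_orders = [("acme", 25.0), ("initech", 10.0)]

# Only the compact summaries cross the network, not the raw records.
unified = merge(summarize(on_prem_orders), summarize(cloud_orders))
print(unified)
```

Only the summaries are moved between environments, which keeps data movement small while still producing a single combined view.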
5. Integrated with Cloud Services
Perhaps the most essential part of being “cloud-native” is deep integration with the underlying cloud services offered by the cloud platform to create a seamless experience for users and IT teams alike.
To deliver this seamless experience, Datameer integrates with critical underlying AWS services:
- Rapid deployment on EC2 instances optimized for big data processing
- Elastic use of Amazon Elastic MapReduce (EMR) compute services
- Use of Amazon S3 for underlying storage to separate compute and storage resources
- The ability to import/export data with S3 where many organizations land cloud-born data
- A connector to Amazon Redshift for import/export of data for big data preparation that feeds cloud data warehouses
- Integration with Amazon Authentication & Access Control for secure access and data management
- Control and billing integration with the AWS Management Console
In addition, in early 2018, Datameer for AWS will be available on the AWS Marketplace for single-click deployment of Datameer in the cloud.
6. Remember – Enterprise-grade is Still Needed
While it is critical for a platform to be architected for the cloud to ensure a seamless experience and optimized execution, the right enterprise-grade features are just as essential and, in some cases, even more important in the cloud.
As shown earlier in the TDWI survey, the most critical concerns and barriers to deploying analytics in the cloud are security and data privacy. This is especially true in the modern era of government and data privacy regulations such as GDPR and the Dodd-Frank Act.
Datameer provides the most robust suite of enterprise-grade features for advanced security, complete governance and processing scale:
- Complete role-based security and integration with a range of enterprise security systems including LDAP, Active Directory, Kerberos and SAML
- Easy to apply data retention policies to ensure that data is well managed and used only as necessary
- End-to-end lineage to track how data is used, how it is transformed, and when artifacts and datasets change
- Complete user and behavior auditing to track any actions and changes to artifacts and datasets
- Ability to integrate with enterprise governance tools for the application of enterprise governance policies
Cloud computing has become an essential part of the infrastructure for any organization and is increasingly gaining traction for analytics. But to fully take advantage of the cloud for analytics, organizations need a platform that is cloud-native, enables hybrid architectures and offers critical enterprise-grade features for security and governance.
Being cloud-native means that a platform provides the agility organizations seek, which includes both starting new initiatives and extending initiatives for burst processing needs. As Gartner mentioned, separating compute from storage is the key architectural underpinning to being cloud-native. Without this core architectural aspect, a platform cannot be cloud-native.
Datameer for AWS is the first big data platform to truly be cloud-native. The enhanced, new cloud-native hybrid data architecture now separates compute from storage and delivers on the promise of the cloud, giving enterprises the ability to span on-premises and the cloud. To learn more, please visit our Datameer for AWS web page.