Datameer Blog post
Cloud Build or Buy? Answers for the Enterprise
by John Morrell on Jun 27, 2018
In today’s fast-paced world, organizations must constantly adapt and innovate to maintain their competitive advantage. Cloud computing has revolutionized the technology landscape, allowing companies to “rent” time on online services instead of keeping rapidly obsolescing hardware on-site.
Nowhere is this clearer than in the realm of data analytics, where a wealth of cloud solutions awaits. These solutions can be tailored to every type of organization, from the smallest business to the largest corporate enterprise. But wait, there’s a catch: how do you choose the cloud analytic data management approach that will be most effective for your needs?
Cloud Build vs Cloud Buy
You’ve decided to take your analytics to the cloud. That part is settled. The real question is whether to build your own solution from “building blocks” offered by AWS and third-party vendors, or to buy a turnkey data pipeline platform. Each approach has pros and cons, and the right choice depends on your organization’s priorities, goals, and resources. Let’s look at the options below.
Building Your Cloud with AWS
A public cloud analytics stack can have few or many components, which work together to store and analyze your data. Many cloud providers offer their own versions of these services, but for the sake of simplicity we will focus on Amazon’s offerings. Amazon provides a variety of services, called “primitives,” for these tasks, so your organization can select only the ones it needs.
While the services themselves can be incredibly valuable, the real power comes from combining them into an analytics workflow to seamlessly process your data.
The Analytics Stack
These are the most common components of any public cloud analytics stack, with the corresponding Amazon offering listed in parentheses:
- Object storage (S3)
- SQL over object storage (Athena)
- NoSQL database management (DynamoDB)
- Relational database services (RDS)
- Data warehouse services (Redshift)
- Big data processing services (Elastic MapReduce)
- Extract, transform, and load (ETL) systems (Glue and Data Pipeline)
- Streaming data processing services (Kinesis Firehose and Kinesis Analytics)
- Business intelligence and data visualization services (QuickSight); the stack also works with Tableau, Qlik, Looker, and MicroStrategy
Integration Pairs on AWS
Amazon provides point-to-point interfaces for communication between services. These interfaces let teams connect commonly used services as an integration pair. Most services interact natively with S3, so customers can use S3 as their “data lake.” But what about the other services?
Depending on which services you want to use, Amazon offers varying levels of integration support. For example, Kinesis Firehose can load streaming data directly into Redshift, Redshift’s COPY command can load data directly from a DynamoDB table, and QuickSight can then visualize the data held in Redshift.
While these popular combinations work well out of the box, other permutations are not so simple. What if you wanted to replicate data from DynamoDB into Aurora in real time? No pre-packaged integration pair exists, so you’d need to roll your own using Kinesis Firehose and Lambda, Amazon’s serverless compute service. It’s doable, but not for the faint of heart.
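To make the DynamoDB-to-Redshift pair concrete: the load itself is a single SQL COPY statement with a dynamodb:// source and a READRATIO clause, which caps the percentage of the DynamoDB table’s provisioned read throughput the load may consume. The Python helper below is a minimal sketch that only assembles that statement as a string. The table names and IAM role ARN are hypothetical placeholders, and running the SQL against a cluster is left to whatever Redshift client you already use.

```python
def build_dynamodb_copy(redshift_table: str, dynamodb_table: str,
                        iam_role_arn: str, read_ratio: int) -> str:
    """Assemble a Redshift COPY statement that loads from a DynamoDB table.

    READRATIO is required for DynamoDB sources; it limits the share (in percent)
    of the DynamoDB table's provisioned read throughput the load may use.
    Identifiers are interpolated as-is, so inputs must come from trusted config.
    """
    if read_ratio <= 0:
        raise ValueError("read_ratio must be a positive integer")
    return (
        f"COPY {redshift_table} "
        f"FROM 'dynamodb://{dynamodb_table}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"READRATIO {read_ratio};"
    )


# Hypothetical names, for illustration only.
sql = build_dynamodb_copy(
    "favoritemovies", "Movies",
    "arn:aws:iam::0123456789012:role/MyRedshiftRole", 50,
)
print(sql)
```

Keeping READRATIO well below 100 leaves read capacity for the application that owns the DynamoDB table while the load runs.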
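To give a sense of what “rolling your own” involves, here is a minimal sketch of the translation step such a Lambda function would perform. It swaps in a different trigger from the one named above: it assumes the function receives records in the DynamoDB Streams event shape (each record carries an eventName of INSERT, MODIFY, or REMOVE plus typed Keys and NewImage attributes, the latter requiring a stream view type that includes new images). The target table name orders is hypothetical, and the function returns MySQL-style upsert and delete statements with %s placeholders (as used by drivers such as PyMySQL) rather than executing them, so connection handling, retries, and ordering guarantees are all left out.

```python
from decimal import Decimal
from typing import Any, List, Tuple

TABLE = "orders"  # hypothetical Aurora (MySQL-compatible) target table


def unmarshal(attr: dict) -> Any:
    """Convert one DynamoDB typed value such as {"S": "x"} or {"N": "1.5"}."""
    dtype, value = next(iter(attr.items()))
    if dtype == "S":
        return value
    if dtype == "N":
        return Decimal(value)  # DynamoDB encodes numbers as strings
    if dtype == "BOOL":
        return value
    if dtype == "NULL":
        return None
    raise ValueError(f"unsupported DynamoDB type in this sketch: {dtype}")


def unmarshal_item(image: dict) -> dict:
    return {name: unmarshal(attr) for name, attr in image.items()}


def record_to_sql(record: dict) -> Tuple[str, list]:
    """Map one stream record to a parameterized SQL statement."""
    event = record["eventName"]
    ddb = record["dynamodb"]
    if event in ("INSERT", "MODIFY"):
        row = unmarshal_item(ddb["NewImage"])
        cols = sorted(row)  # deterministic column order
        placeholders = ", ".join(["%s"] * len(cols))
        updates = ", ".join(f"{c} = VALUES({c})" for c in cols)
        sql = (f"INSERT INTO {TABLE} ({', '.join(cols)}) VALUES ({placeholders}) "
               f"ON DUPLICATE KEY UPDATE {updates}")
        return sql, [row[c] for c in cols]
    if event == "REMOVE":
        keys = unmarshal_item(ddb["Keys"])
        cols = sorted(keys)
        where = " AND ".join(f"{c} = %s" for c in cols)
        return f"DELETE FROM {TABLE} WHERE {where}", [keys[c] for c in cols]
    raise ValueError(f"unexpected eventName: {event}")


def handler(event: dict, context: Any = None) -> List[Tuple[str, list]]:
    """Lambda-style entry point; returns statements instead of executing them."""
    return [record_to_sql(r) for r in event["Records"]]


# Hypothetical sample event with one upsert and one delete.
sample = {"Records": [
    {"eventName": "INSERT",
     "dynamodb": {"Keys": {"order_id": {"S": "o-1"}},
                  "NewImage": {"order_id": {"S": "o-1"},
                               "amount": {"N": "19.99"}}}},
    {"eventName": "REMOVE",
     "dynamodb": {"Keys": {"order_id": {"S": "o-2"}}}},
]}
for statement, params in handler(sample):
    print(statement, params)
```

Column names come straight from the DynamoDB item, so a production version would need to whitelist them against the Aurora schema; that extra bookkeeping is exactly the kind of glue work the paragraph above warns about.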
A Laundry List of Skills
Piecing together and maintaining an analytics architecture in the cloud isn’t as easy as one might think. Cloud providers focus their expertise on best-of-breed tools for developers, each of them self-contained. They don’t always concern themselves with the more holistic question of how those tools combine into a seamless, integrated architecture that solves a specific problem.
Therefore, teams need a range of skills to build and maintain a cloud analytics solution successfully. Here are just a few of the competencies a team would need for a project of this magnitude:
- Database and SQL skills
- ETL skills
- Spark and/or Hadoop knowledge, as well as scripting and job management expertise
- In-house development and project management resources to see the project through
Organizations can engage a systems integration partner to help set up these kinds of systems, but this comes at a non-trivial cost. In addition, you will need either to keep the partner on retainer or to arrange knowledge transfer to your team so that it can continue to upgrade and maintain the system as needed.
Of course, this is not to say that a bespoke solution is always the wrong choice. Some companies and use cases do benefit from in-house development. Technology companies with complex workflows and the engineering resources to back them often take this path, and many of today’s open source analytics solutions began as in-house projects. For enterprise customers who use technology to further their business goals, however, there’s a better way.
Buy, Don’t DIY
A well-engineered third-party software product that sits on top of these cloud primitives could be the answer to your build vs. buy question. Instead of cobbling together multiple tools and maintaining the architecture yourself, an end-to-end analytic data management platform allows teams to create and manage their own data pipelines through a higher-level visual interface that requires no coding, speeding the time to insight. These platforms abstract the lower-level services behind an easy-to-use interface so that teams can hit the ground running.
This can significantly reduce your risk, maintenance time, and time-to-market. Esoteric open source and cloud technologies also become friendlier and more usable when integrated into these packages, which further reduces risk. Analytic data management platforms such as Datameer provide a complete, end-to-end platform that delivers the right information to the front lines of the business without complex coding or job management.
As an example, Datameer provides higher-level analytic data management and preparation features while leveraging underlying AWS services to deliver that rich, powerful functionality in the cloud. It uses key AWS services including Elastic Compute Cloud (EC2), Simple Storage Service (S3), Elastic MapReduce (EMR), Identity and Access Management (IAM), and Key Management Service (KMS). It can consume data from S3, Redshift, Aurora, and other cloud data sources, and it creates result sets that can be delivered into Redshift and other cloud data warehouses or consumed by Athena and QuickSight.
The end result is a streamlined platform for self-service creation and management of analytic data pipelines, with no need to write code or to build and maintain complex tool integrations. Not only does this speed the time to insight, but it also removes the cost of building and maintaining a complex data architecture in the cloud.
The public cloud technologies available today are incredibly powerful and innovative, but many disparate packages put together do not make a cohesive data strategy. That’s why unified data pipeline platforms are more important than ever. They can mean the difference between project success and failure, between truly becoming data-driven and simply paying lip service to the idea.
By exploiting the power of the public cloud while hiding its complexity, third-party software offerings provide a low-risk, high-reward solution for enterprise customers. For all but the most esoteric use cases, third-party platforms that cover the complex analytic data management lifecycle are the recommended solution.
Remember: cloud providers can make you most successful if you combine their products with those from their partners. First party + third party = customer solution success. Follow that equation, and watch the results unfold.