Alibaba Cloud E-MapReduce vs. AWS EMR vs. Azure HDInsight

Big Data is among the biggest IT trends of the last five years. The idea behind this trend is that given a sufficiently large volume of data, it is possible to derive crucial business insights that could not be discovered through other methods.

Of course, running Big Data analytics at scale requires a significant amount of processing power, along with tools for parsing the data. While you can do this on premises, it is not always feasible to set up and maintain a sufficiently large infrastructure. That is why several of the leading cloud service providers now offer managed Hadoop clusters, or similar services, for processing large volumes of data. These providers include Amazon Web Services (AWS), Microsoft Azure, and Alibaba Cloud.

This article compares the major cloud-based Big Data platforms offered by each of these public cloud providers.

Hadoop Basics

For those who might not be familiar, Hadoop is an open-source framework that is designed to run distributed applications across the nodes in a cluster. One of the key differences between a Hadoop cluster and other types of clusters (such as a Microsoft failover cluster) is that Hadoop is designed specifically for use with data-intensive applications.

Hadoop clusters utilize a technique called MapReduce. MapReduce works by breaking a data analytics job into numerous fragments, which are then distributed across the cluster’s nodes for execution.
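
To make the idea concrete, here is a minimal, single-process sketch of the MapReduce pattern applied to a word count. It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample fragments are assumptions made for illustration, and the "nodes" are simulated by an ordinary loop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(fragment):
    """Map step: emit a (word, 1) pair for every word in one fragment."""
    return [(word.lower(), 1) for word in fragment.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine each key's values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(fragments):
    # On a real cluster, each fragment would be mapped on a different node.
    # Here the distribution is only simulated inside one process.
    mapped = chain.from_iterable(map_phase(f) for f in fragments)
    return reduce_phase(shuffle(mapped))


# Hypothetical input: one job split into two fragments.
fragments = ["big data big insights", "data at scale"]
counts = word_count(fragments)
print(counts)
```

Because each fragment is mapped independently, the map step is the part that a cluster can spread across many nodes. Only the shuffle and reduce steps need to bring related results back together.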


AWS EMR

Amazon Web Services’ solution for Big Data analytics is Amazon EMR, which uses a series of EC2 virtual machine instances to form an Apache Hadoop cluster. A cluster can contain as many as 20 EC2 instances, and Amazon gives subscribers the option of creating multiple Hadoop clusters.

Amazon EMR is designed to work with Hive, Impala, Pig, HBase, and Kinesis Connector. The data that is to be analyzed by Amazon EMR is stored in Amazon S3. S3 can be accessed programmatically through an API, but administrators also have the option of using any of the available third-party S3 clients to upload the data.
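
As a rough sketch of the programmatic route, the helper below uploads one local file through an S3 client object and returns the resulting s3:// location. The helper name upload_dataset, the bucket name, and the file paths are assumptions for illustration. The only library call it relies on is the upload_file method of the S3 client in boto3, the AWS SDK for Python.

```python
def upload_dataset(s3_client, local_path, bucket, key):
    """Upload one local file to S3 and return its s3:// URI.

    s3_client is expected to behave like boto3.client("s3"), whose
    upload_file method takes a filename, a bucket name, and an object key.
    """
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


# Typical use, assuming boto3 is installed and AWS credentials are configured.
# The bucket and file names here are hypothetical:
#
#   import boto3
#   uri = upload_dataset(boto3.client("s3"), "events.csv",
#                        "example-analytics-bucket", "input/events.csv")
#   print(uri)
```

Passing the client in as a parameter keeps the helper easy to test without network access, and the returned URI is the form in which a cluster job would typically reference its input.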

Azure HDInsight

Microsoft’s Big Data analytics solution within the Azure cloud is Azure HDInsight. HDInsight uses Azure virtual machine instances to create clusters for Hadoop, Spark, Hive, HBase, Storm, Kafka, and Microsoft R Server. The service is designed to work with a variety of development environments, including Microsoft’s Visual Studio and third-party tools such as Eclipse and IntelliJ. It also works with Jupyter and Zeppelin notebooks.

Azure HDInsight is designed to be as flexible as possible. When creating a cluster, administrators can choose from any of the Azure virtual machine types, thereby allowing the cluster capabilities to be closely matched to the requirements of the job, while helping to control costs. It is also worth noting that Microsoft provides a 99.9% SLA that extends end-to-end, across the entire workload (not just the VMs that form the cluster).
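
To put the 99.9% figure in perspective, the short calculation below converts an availability percentage into a downtime budget. It is plain arithmetic, not Microsoft's official SLA formula: the function name and the 30-day and 365-day periods are assumptions, and the actual SLA terms define how uptime is measured.

```python
def allowed_downtime_minutes(sla_percent, period_days=30):
    """Return the downtime budget, in minutes, implied by an SLA percentage."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


monthly = allowed_downtime_minutes(99.9)
yearly = allowed_downtime_minutes(99.9, period_days=365)

print(f"{monthly:.1f} minutes per 30-day month")   # 43.2 minutes
print(f"{yearly / 60:.2f} hours per 365-day year")  # 8.76 hours
```

In other words, a 99.9% commitment still leaves room for roughly 43 minutes of unavailability in a 30-day month.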

Azure HDInsight is designed to integrate with Azure Active Directory, and supports multi-factor authentication. The service is also compliant with the requirements of the Health Insurance Portability and Accountability Act (HIPAA) and the Payment Card Industry Data Security Standard (PCI DSS).

Alibaba Cloud E-MapReduce

Alibaba Cloud’s data analytics solution is E-MapReduce. Like some competing solutions, Alibaba Cloud E-MapReduce is based around the use of a Hadoop cluster. However, Alibaba Cloud does not use a one-size-fits-all solution. Subscribers are able to choose the ECS model that they wish to use (CPU or Memory), and then the entire cluster is created within a matter of minutes. Once online, the Hadoop cluster is able to dynamically add nodes on an as-needed basis. Even so, administrators have the ability to configure and tune the cluster as required.

Alibaba Cloud E-MapReduce is designed to work with the Apache Spark, MapReduce, and Apache Pig frameworks. The data that is being analyzed can be stored on Apache HDFS or HBase, and the cluster supports the use of tools such as Apache Sqoop and Spark SQL. Furthermore, workloads can be scheduled to run automatically. Alibaba Cloud E-MapReduce supports a variety of computational processes, including machine learning, process orchestration, stream processing, and even graph analytics.

Alibaba Cloud enforces data security through a role-based access control mechanism in which a primary account has the option of creating additional accounts, and granting specific service permissions to those accounts. Additionally, the data that is to be analyzed can be encrypted, and the ECS instances that make up the cluster can be protected with a firewall.
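
The account model described above can be sketched in a few lines. This is a conceptual illustration only, not Alibaba Cloud's API: the class names, the method names, and the service and action strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SubAccount:
    """An additional account created under a primary account."""
    name: str
    permissions: set = field(default_factory=set)


@dataclass
class PrimaryAccount:
    """The primary account, which owns sub-accounts and grants permissions."""
    name: str
    sub_accounts: dict = field(default_factory=dict)

    def create_sub_account(self, name):
        account = SubAccount(name)
        self.sub_accounts[name] = account
        return account

    def grant(self, sub_name, service, action):
        # Permissions are stored as (service, action) pairs.
        self.sub_accounts[sub_name].permissions.add((service, action))

    def is_allowed(self, sub_name, service, action):
        # Deny by default: unknown accounts and ungranted actions are refused.
        account = self.sub_accounts.get(sub_name)
        return account is not None and (service, action) in account.permissions


# Hypothetical usage: an analyst may submit jobs but not delete clusters.
owner = PrimaryAccount("owner")
owner.create_sub_account("analyst")
owner.grant("analyst", "emr", "submit_job")

print(owner.is_allowed("analyst", "emr", "submit_job"))      # True
print(owner.is_allowed("analyst", "emr", "delete_cluster"))  # False
```

The deny-by-default check is the important design choice here: a sub-account can do nothing until the primary account explicitly grants it a permission.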

Alibaba Cloud is currently offering a free trial of its cloud services. The trial includes a $300 credit that can be used to explore Alibaba Cloud’s various cloud offerings.


Conclusion

The success of an organization’s Big Data analytics initiative hinges on having the right computational tools for the job. A variety of public cloud service providers offer MapReduce services that can help with analytics. However, these services are not created equal. When selecting a cloud-based MapReduce service, it is important to consider platform compatibility and integration, security, and, of course, overall flexibility.

Author Bio:

Brien Posey is a Fixate IO contributor, and a 16-time Microsoft MVP with over two decades of IT experience. Prior to going freelance, Brien was CIO for a national chain of hospitals and healthcare facilities. He also served as lead network engineer for the United States Department of Defense at Fort Knox. Brien has also worked as a network administrator for some of the largest insurance companies in America. In addition to his continued work in IT, Brien has spent the last three years training as a Commercial Scientist-Astronaut Candidate for a mission to study polar mesospheric clouds from space. You can follow Posey’s spaceflight training at

Date: 2022-12-17
