Introduction to Oracle Big Data Cloud Service – Compute Edition (Part III) – Ambari

This is my third blog post about Oracle Big Data Cloud Service – Compute Edition. In this series, I continue to walk you through “Big Data Cloud Service – Compute Edition” and its components. In this blog post, I will introduce Ambari, the management service of our Hadoop cluster.

Apache Ambari simplifies provisioning, managing, and monitoring Apache Hadoop clusters. It’s the default management tool of the Hortonworks Data Platform, but it can be used independently of Hortonworks. After you create your big data service, the SSH port and port 8080 (used by Ambari) are blocked. You need to enable the access rules that allow traffic through these ports. In my first blog post about Oracle Big Data Cloud Service – Compute Edition, I showed how to enable these ports.
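Once the access rules are enabled, a quick way to confirm that the ports are really reachable is to attempt a TCP connection. The following is a minimal Python sketch under my own assumptions, not part of Oracle's tooling: the function name `is_port_open` is made up for illustration, and port 22 is assumed to be the SSH port (the usual default).

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only. It checks network
    reachability, not whether Ambari or SSH is healthy behind the port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, calling `is_port_open(node_ip, 8080)` with the public IP of your Ambari node should return `True` after the rule is enabled, and `False` while the port is still blocked. The same check with port 22 covers SSH.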

Introduction to Oracle Big Data Cloud Service – Compute Edition (Part II) – Services

In my previous post, I listed the services installed on an “Oracle Big Data Cloud Service – Compute Edition” instance when you select “full” as the deployment profile. In this post, I’ll explain these services and software.

HDFS: HDFS is a distributed, scalable, and portable file system written in Java for Hadoop. It stores the data, so it is the main component of our cluster. A Hadoop (big data) cluster nominally has a single namenode plus a cluster of datanodes, but there are redundancy options available for the namenode due to its criticality. Both the namenode and datanode services can run on the same server (although this is not recommended in a production environment). In our small cluster, we have 1 active namenode, 1 standby namenode and 3 datanodes, distributed across 3 servers.
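To make the split between the namenode (which keeps the metadata) and the datanodes (which hold the data) more concrete, here is a toy Python sketch. It is not how HDFS actually chooses datanodes: the round-robin placement, the function name `place_blocks`, and the datanode names are my own assumptions for illustration. The block size of 128 MB and the replication factor of 3 are the usual Hadoop 2 defaults, but your cluster may be configured differently.

```python
import math
from typing import Dict, List

BLOCK_SIZE_MB = 128  # assumed block size (usual Hadoop 2 default)
REPLICATION = 3      # assumed replication factor (usual default)


def place_blocks(
    file_size_mb: float,
    datanodes: List[str],
    replication: int = REPLICATION,
    block_size_mb: int = BLOCK_SIZE_MB,
) -> Dict[int, List[str]]:
    """Toy model of the block map a namenode keeps for one file.

    The file is split into fixed-size blocks and every block is assigned
    to `replication` distinct datanodes, simple round-robin. Real HDFS
    uses a rack-aware placement policy instead.
    """
    if replication > len(datanodes):
        raise ValueError("not enough datanodes for the requested replication")
    n_blocks = max(1, math.ceil(file_size_mb / block_size_mb))
    placement: Dict[int, List[str]] = {}
    for block_id in range(n_blocks):
        placement[block_id] = [
            datanodes[(block_id + r) % len(datanodes)] for r in range(replication)
        ]
    return placement
```

With three datanodes and a replication factor of 3, as in our small cluster, every block of a file ends up on all three servers, which is why losing a single datanode does not lose data.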

YARN + MapReduce (v2): MapReduce is a programming model popularized by Google to process large datasets in a parallel and scalable way. YARN is a framework for cluster resource management and job scheduling. YARN contains a Resource Manager and Node Managers (for redundancy, we can create a standby Resource Manager). The Resource Manager tracks how many live nodes and resources are available on the cluster and coordinates which applications submitted by users should get these resources. Each datanode should have a Node Manager to run MapReduce jobs.
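The MapReduce programming model itself is easy to show with the classic word count example. The sketch below runs in a single Python process and only imitates the three phases (map, shuffle, reduce). It does not use Hadoop or YARN, and all function names are my own choices for illustration. On a real cluster, the map and reduce functions would run in parallel on different nodes, with YARN handing out the resources.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases one after another on in-memory data."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because every call to `map_phase` looks at only one line and every call to `reduce_phase` looks at only one key, the work can be split across many machines without the tasks needing to talk to each other. That independence is what makes the model scale.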

Introduction to Oracle Big Data Cloud Service – Compute Edition (Part I)

Over the last few years, Oracle has dedicated itself to cloud computing, and it is in a very tough race with its competitors. In order to stand out in this race, Oracle provides more services day by day. One of the services Oracle offers to the end user is “Oracle Big Data Cloud Service – Compute Edition”. I examined this service by creating a trial account, and I decided to write a series of blog posts for those who would like to use this service.

In my opinion, the most difficult part of creating a Big Data ecosystem is running many open source software projects together and integrating them with one another. There are 3 major players on the market that help end users build an integrated and tested solution for big data: Cloudera, Hortonworks and MapR. Oracle has partnered with Cloudera to build the Oracle Big Data Appliance and Oracle Big Data Cloud Service. They also offer “Oracle Big Data Cloud Service – Compute Edition” based on Hortonworks. Creating an “Oracle Big Data Cloud Service – Compute Edition” instance is simple. You get a ready-to-use big data cluster about 15 minutes after providing basic information such as the name of the cluster, the number of servers (nodes), the CPU and disk sizes for each node, and the administrator password.

First, let’s create an “Oracle Big Data Cloud Service – Compute Edition” instance. After you create your test account for Oracle Cloud, you log in to the “Oracle Cloud” dashboard. Using this dashboard, you can see all your services and also add new ones.