This is my third blog post about Oracle Big Data Cloud Service – Compute Edition, and I continue my tour of the service and its components. In this post, I will introduce Ambari, the management service of our Hadoop cluster.
Apache Ambari simplifies provisioning, managing, and monitoring Apache Hadoop clusters. It’s the default management tool of the Hortonworks Data Platform, but it can be used independently of Hortonworks. After you create your big data service, SSH and port 8080 (used by Ambari) are blocked. You need to enable the access rules to allow traffic through these ports. In my first blog post about Oracle Big Data Cloud Service – Compute Edition, I showed how to enable these ports.
Ambari has a client-server architecture. An Ambari agent runs on each node in the Hadoop cluster, and an Ambari server manages these agents and collects data from them. In Oracle Big Data Cloud Service – Compute Edition, each node (we created 3 for the trial) has the ambari-server software installed, but only one ambari-server (the one on the first node) is configured and active. On the service overview page, you can see the IP address of the ambari-server.
The IP address of my first node is 188.8.131.52, so the URL of the Ambari server is https://184.108.40.206:8080. If we are already logged in to the “Big Data Console”, the Ambari page doesn’t ask for credentials. If we are not logged in, we need to enter “bdcsce_admin” as the username, along with the password of the admin user (we defined it while creating the service).
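Besides the web interface, Ambari also exposes a REST API on the same port, which is handy for checking that the port rules and credentials work. Below is a minimal sketch: the helper name ambari_clusters_url and the AMBARI_HOST variable are my own placeholders, and the -k flag assumes the server presents a certificate that curl cannot verify.

```shell
# Build the Ambari REST URL that lists the clusters managed by the server.
# $1: IP address or host name of the first node
# $2: port (optional, defaults to 8080 as used in this post)
ambari_clusters_url() {
    printf 'https://%s:%s/api/v1/clusters\n' "$1" "${2:-8080}"
}

# Usage (not executed here; AMBARI_HOST is a placeholder for your node's address).
# curl prompts for the bdcsce_admin password:
#   curl -k -u bdcsce_admin "$(ambari_clusters_url "$AMBARI_HOST")"
```

If the ports are still blocked, the request simply times out, which tells you the problem is in the access rules and not in Ambari itself.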
After we log in to the Ambari web interface, a detailed dashboard welcomes us. On this page, we can see performance graphs, alerts (warnings and criticals), and the existing services (on the left side).
We’ll use the top menu to access the components of Ambari. On the far left, we see the “ambari” logo and text; clicking it takes us to the current dashboard. To the right of the Ambari logo, we see our cluster name (bilyonveri) and “0 ops 2 alerts”. When we click the “0 ops” text, we can see the operations running in the background (such as starting or stopping services). There are no operations running right now, so we see “0”. When we click “2 alerts”, we can see the details of the current alerts.
On the top right, we see links for the dashboard, services (you can also see the same list in the left panel), hosts (to see the hosts in our cluster), alerts (to manage alert settings and see the current alerts), admin (to see software versions, service accounts, and Kerberos settings), views (the matrix icon), and the user menu. You can access the “Ambari management page” from the user menu by clicking “manage ambari”.
Each service has its own home page. You can access a service home page from the left panel or through the services menu on top. On that page, you can stop or start the service, turn maintenance mode on or off, move service components between nodes, run service checks, and delete the service (after stopping it). You can also edit the configuration of the service from the “configs” link.
One of the first things I noticed after logging in to the Ambari web interface is that there are no views available. Under normal conditions, there should be views for accessing Hive, Tez, Pig, Files and other services. I do not know why Oracle didn’t include those views (maybe there are some compatibility issues), but I want to show you how we can add one of them, the “Hive” view, so we can run Hive commands through the Ambari web interface.
Ambari views are Java applications, and it’s easy to get their source code from GitHub, but compiling them is a bit complex for non-Java users (like me). So instead of dealing with the source code, we’ll download the ambari-server RPM from Hortonworks, extract the jar file from the RPM, and put that file in a special folder. Ambari continuously checks that folder and loads any view it finds there.
In my first post, I said Oracle uses Hortonworks DP 2.4.2. I found this information on the versions page of the admin menu. Now that we know the version, we can go to the Hortonworks installation documents to get the URL of the Hortonworks repository for Oracle Linux 6. Please note that “Oracle Big Data Cloud Service – Compute Edition” runs on Oracle Linux 6.8.
We need to log in to the first node (which hosts the ambari-server) via SSH, switch to the root user, download the Ambari repository configuration, and use it to download the installation RPM of the Ambari server:
sudo su -
wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/220.127.116.11/ambari.repo -O /etc/yum.repos.d/ambari.repo
Once the repository file is in place, yum will check the new repository and update its list of available packages the next time it runs. Because we do not want to touch the current installation, we will use the “yumdownloader” tool (part of the yum-utils package) to download the RPM without installing it.
yumdownloader ambari-server --destdir=/root
After the above command, the “ambari-server-18.104.22.168-136.x86_64.rpm” file will be downloaded into the /root folder. I recommend removing the Ambari repository when it’s done, to prevent any confusion when applying “potential patch updates” to the system.
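Removing the repository comes down to deleting the file we downloaded with wget. Here is a small sketch; the function name remove_repo_file is my own, and the default path is the one used in the wget command above.

```shell
# Remove a yum repository definition file if it exists.
# $1: path to the .repo file (optional, defaults to the file downloaded earlier)
remove_repo_file() {
    repo_file=${1:-/etc/yum.repos.d/ambari.repo}
    if [ -f "$repo_file" ]; then
        rm -f -- "$repo_file"
        echo "removed $repo_file"
    else
        echo "nothing to remove at $repo_file"
    fi
}

# Usage as root (not executed here); "yum clean all" drops the cached metadata:
#   remove_repo_file && yum clean all
```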
We go to the “/root” folder, extract the contents of the RPM, and then go to the views folder and list its contents:
4.0K drwxr-xr-x 2 root root 4.0K Jun 15 07:33 .
4.0K drwxr-xr-x 11 root root 4.0K Jun 15 07:33 ..
900K -rwxrwxrwx 1 root root 897K Nov 23 2016 ambari-admin-22.214.171.124.136.jar
44M -rwxrwxrwx 1 root root 44M Nov 23 2016 capacity-scheduler-126.96.36.199.136.jar
40M -rwxrwxrwx 1 root root 40M Nov 23 2016 files-188.8.131.52.136.jar
2.4M -rwxrwxrwx 1 root root 2.4M Nov 23 2016 hawq-view-184.108.40.206.136.jar
95M -rwxrwxrwx 1 root root 95M Nov 23 2016 hive-220.127.116.11.136.jar
121M -rwxrwxrwx 1 root root 121M Nov 23 2016 hive-jdbc-18.104.22.168.136.jar
34M -rwxrwxrwx 1 root root 34M Nov 23 2016 hueambarimigration-22.214.171.124.136.jar
45M -rwxrwxrwx 1 root root 45M Nov 23 2016 pig-126.96.36.199.136.jar
51M -rwxrwxrwx 1 root root 51M Nov 23 2016 slider-188.8.131.52.136.jar
1.2M -rwxrwxrwx 1 root root 1.2M Nov 23 2016 storm-view-184.108.40.206.136.jar
48M -rwxrwxrwx 1 root root 48M Nov 23 2016 tez-view-220.127.116.11.136.jar
47M -rwxrwxrwx 1 root root 47M Nov 23 2016 wfmanager-18.104.22.168.136.jar
44M -rwxrwxrwx 1 root root 44M Nov 23 2016 zeppelin-view-22.214.171.124.136.jar
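The extraction step that produced the listing above can be sketched with rpm2cpio and cpio, which unpack an RPM without installing it. The function names and the /root/ambari-extract folder are my own choices for illustration, and I assume the views sit under var/lib/ambari-server/resources/views inside the package, mirroring the path on the running server.

```shell
# Unpack an RPM into a scratch folder without installing it.
# $1: absolute path to the RPM file
# $2: destination folder (created if missing)
extract_rpm() {
    rpm_file=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # cpio flags: -i extract, -d create directories, -m keep modification times
    (cd "$dest" && rpm2cpio "$rpm_file" | cpio -idm)
}

# Print the folder where the view jars are expected after extraction.
# $1: the destination folder passed to extract_rpm
views_dir() {
    printf '%s/var/lib/ambari-server/resources/views\n' "$1"
}

# Usage as root (not executed here):
#   extract_rpm /root/ambari-server-*.rpm /root/ambari-extract
#   ls -lah "$(views_dir /root/ambari-extract)"
```

Extracting into a separate folder keeps the unpacked files away from the live Ambari installation.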
We will copy the hive-jdbc-126.96.36.199.136.jar file to the “/var/lib/ambari-server/resources/views/” folder. Ambari checks the contents of this folder automatically and loads the jar file.
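The copy itself is a single cp command, but a small wrapper that checks both paths first can save some head-scratching. This is only a sketch; install_view is a name I made up, and the default target is the views folder mentioned above.

```shell
# Copy a view jar into the folder that Ambari watches for views.
# $1: path to the jar file
# $2: views folder (optional, defaults to the live Ambari views folder)
install_view() {
    jar=$1
    target=${2:-/var/lib/ambari-server/resources/views}
    [ -f "$jar" ] || { echo "no such jar: $jar" >&2; return 1; }
    [ -d "$target" ] || { echo "no such directory: $target" >&2; return 1; }
    cp -- "$jar" "$target"/ && echo "copied $(basename "$jar") to $target"
}

# Usage as root, from the extracted views folder (not executed here):
#   install_view hive-jdbc-*.jar
```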
After we copy the file, “Hive View” will be automatically created and added to the Ambari web interface. You may need to refresh the page to see it. When we click “Hive View”, we’ll get the error “Service ‘userhome’ check failed: User: root is not allowed to impersonate bdcsce_admin”. Ambari runs as the root user and we logged in as “bdcsce_admin”, so Ambari tries to access the HDFS home directory of bdcsce_admin as root. If you are a Linux user, you may say “root” can access everything, but in HDFS the king is the “hdfs” user. So we have to configure HDFS to allow “root” to impersonate any user (or at least bdcsce_admin).
We go to the configs page of HDFS and search for the “hadoop.proxyuser.root.users” parameter. Its current value is “oracle”, which means root may impersonate only the oracle user. We need to enter “*” (star) so that root can impersonate any user (or we could list bdcsce_admin explicitly). After we modify the setting, we click the SAVE button.
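To make the meaning of this parameter concrete, here is a toy sketch of the rule as I understand it: the value is either “*” or a comma-separated list of users that root may impersonate. The function proxyuser_allows is purely illustrative and is not part of Hadoop; it also assumes there are no spaces around the commas.

```shell
# Return success if a hadoop.proxyuser.root.users value permits impersonating a user.
# $1: the configured value ("*" or a comma-separated list of user names)
# $2: the user that root wants to impersonate
proxyuser_allows() {
    case ",$1," in
        ",*,") return 0 ;;      # a lone star allows every user
        *",$2,"*) return 0 ;;   # the user appears in the list
        *) return 1 ;;
    esac
}

# With the original value, the Hive view fails for bdcsce_admin:
#   proxyuser_allows "oracle" bdcsce_admin   (fails)
# After the change, it passes:
#   proxyuser_allows "*" bdcsce_admin        (succeeds)
# On a cluster node, the effective value can be read with:
#   hdfs getconf -confKey hadoop.proxyuser.root.users
```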
You’ll see that the HDFS services require a restart to apply the configuration changes. We also need to restart the other services that depend on HDFS. To restart all of these services together, I click the “restart all required” button in the actions menu on the left panel.
We wait until all services have restarted and then click “Hive View” again. After the modification, we should be able to access the Hive View as the bdcsce_admin user. You can add other views, but you may need to troubleshoot some errors. In my opinion, big data administrators should be familiar with this kind of troubleshooting, so get used to it 🙂
In my next blog post, I’ll talk about Zeppelin.