Oracle Big Data Cloud Service CE: Working with Hive, Spark and Zeppelin 0.7

In my previous post, I mentioned that Oracle Big Data Cloud Service – Compute Edition started to come with Zeppelin 0.7 and the version 0.7 does not have HIVE interpreter. It means we won’t be able to use “%hive” blocks to run queries for Apache Hive. Instead of “%hive” blocks, we can use JDBC interpreter (“%jdbc” blocks) or Spark SQL (“%sql” blocks).

The JDBC interpreter lets you create a JDBC connection to any data source. It has been tested with both popular RDBMS and NoSQL databases such as Postgres, MySQL, Amazon Redshift, Apache Hive. To be able to connect a data source, we first need to define it on Zeppelin interpreter settings. In normal conditions, we access Zeppelin trough Big Data Cloud – Compute Edition Console, and it prevents us to see the menu to reach the interpreter settings but we can easily bypass the console with a little trick. After we opened a notebook at the console, get the URL we connected, remove “?#notebook/XXXXX” part from the URL, and add “/zeppelinui/”, so our URL should be like this “https://bigdataconsoleip:1080/zeppelinui/”. This is the address we can access Zeppelin’s native user interface.

In this page, we can use the drop-down menu on the upper-right to access the interpreters page. We can search the interpreters, edit the settings and then restart the interpreter. For now, we don’t need to change anything. Hive is already defined in our Cloud Service so we can use JDBC interpreter to connect Hive.

Introduction to Oracle Big Data Cloud Service – Compute Edition (Part IV) – Zeppelin

This is my forth blog post about Oracle Big Data Cloud Service – Compute Edition. In my previous blog posts, I showed how we can create a big data cloud service compute edition on Oracle Cloud, which services are installed by default, ambari management service and now it’s time to write about how we can work with data using Apache Zeppelin. Apache Zeppelin is a web-based notebook that enables interactive data analytics. Zeppelin is not the only way to work data but it’s surely very friendly for end-users and (as I said before) it’s already installed to our big data cloud service compute edition.

We can create a rule to allow access to TCP port 9995 for accessing Zeppelin directly, or we can use “big data console” provided by Oracle. I’ll prefer the second one because our Ngix proxy will let only authenticated users to access Zeppelin.

After you reach the console, go to the notebooks page. Click “new note”, enter a name and then click “OK” – this will create a new empty notebook, and you’ll start editing it. My new notebook’s name is “MyFirstNote”. As you can see there are some sample notebooks, you can examine them to learn how you can use java and spark with Zeppelin.