I thought I would stop writing about “Oracle Big Data Cloud Service – Compute Edition” after my fifth blog post, but then I noticed that I hadn’t mentioned Apache Hive, another important component of the big data ecosystem. Hive is a data warehouse infrastructure built on top of Hadoop, designed to work with large datasets. Why is it so important? Because it supports SQL (including SQL:2003 and SQL:2011 features), which lets users apply their existing SQL skills to quickly derive value from big data.
Although recent improvements to the Hive project enable sub-second query retrieval (Hive LLAP), Hive is not designed for online transaction processing (OLTP) workloads. It is best used for traditional data warehousing tasks.
In this blog post, I’ll demonstrate how we can import data from CSV files into Hive tables and run SQL queries to analyze the data stored in these tables.