Build a Cassandra Cluster on Docker

In this blog post, I’ll show how to build a three-node Cassandra cluster on Docker for testing. I’ll use the official Cassandra images instead of creating my own, so the whole process will take only a few minutes (depending on your network connection). I assume that you have Docker installed on your PC, have an internet connection (I was born in 1976, so it’s normal for me to ask these kinds of questions), and that your PC has at least 8 GB of RAM. First of all, we need to assign about 5 GB of RAM to Docker (in case it has less assigned), because each of the three nodes will require 1.5+ GB of RAM to work properly.

Open the Docker preferences, click the advanced tab, set the memory to 5 GB or more, and click “apply and restart” to restart the Docker service. Then launch a terminal window and run the “docker pull cassandra” command to fetch the latest official Cassandra image.

Introduction to Apache Cassandra

On Friday, I gave a presentation about Apache Cassandra at the Big Talk event organized by Komtaş Information Management. Cassandra is a top-level Apache project that was born at Facebook. It is a distributed database for managing large amounts of structured data. It provides a highly available service with no single point of failure, even when running on commodity hardware.

At my previous company, we used Cassandra to store our social platform data. It performs well even on medium-size instances running on Amazon Cloud, so our development team wanted to use it on more projects. I managed both the production and test environments, and I can say that it is easy to operate as long as you understand the Cassandra internals. So at this event, I wanted to give some introductory information about Apache Cassandra.

By the way, I have to say that the audience was great. The room was full. People asked lots of questions during the session, took photos of the slides, and gave great feedback. I would like to thank everyone who joined my session, and Komtaş for organizing the event.

Oracle Cloud Day Istanbul

Yesterday, I spoke at Oracle Cloud Day Istanbul. It was an amazing event. The venue (Swissotel the Bosphorus) was great, the conference rooms were comfortable, the presentations were engaging and well-balanced (DB, Middleware, Development), and the audience was great. This year, the event was much more crowded than in previous years.

As usual, Oracle Turkey set up a separate track for TROUG (Turkish Oracle User Group) presentations, and I was one of the TROUG speakers. As TROUG, we appreciate Oracle Turkey’s support. Personally, I would like to thank them for this successful organisation. As I said, everything was great.

Oracle Berkeley DB Java Edition

I was researching NoSQL databases and saw that Oracle provides one called Berkeley DB. I examined it and wrote a blog post with quick tips for getting started with the Java Edition of Berkeley DB. That post was published in Turkish about a year ago. I’ve decided to translate (in fact, rewrite) it and publish it in English, and this is what you are reading now.

Berkeley DB is a high-performance embedded database that originated at the University of California, Berkeley. It’s fast, reliable, and used in several applications such as Evolution (email client), OpenLDAP, RPM (the RPM Package Manager), and Postfix (MTA). In contrast to most other database systems, Berkeley DB provides relatively simple data access services. Berkeley DB databases are B+trees (like indexes in Oracle RDBMS) and can store only key/value pairs (there are no columns or tables). The keys and values are byte arrays. Databases are stored as files within a single directory, which is called an “environment”.
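To make the “keys and values are byte arrays” idea more concrete, here is a minimal sketch in plain Java. It is not the Berkeley DB API: the class name ByteKeyValueSketch and its methods are hypothetical, a TreeMap stands in for the on-disk B+tree, and ordering keys as unsigned bytes is an assumption of this sketch. It only shows that the application, not the store, is responsible for turning strings into bytes and back.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeMap;

// In-memory stand-in for a key/value store; NOT the Berkeley DB API.
final class ByteKeyValueSketch {
    // Order keys byte by byte, treating each byte as unsigned (an assumption of this sketch).
    private static final Comparator<byte[]> UNSIGNED_ORDER = Arrays::compareUnsigned;

    // TreeMap keeps entries sorted by key, loosely mimicking an ordered tree index.
    private final TreeMap<byte[], byte[]> store = new TreeMap<>(UNSIGNED_ORDER);

    // The store only sees bytes, so the caller chooses the encoding.
    private static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    void put(String key, String value) {
        store.put(toBytes(key), toBytes(value));
    }

    // Returns the decoded value, or null if the key is absent.
    String get(String key) {
        byte[] value = store.get(toBytes(key));
        return value == null ? null : new String(value, StandardCharsets.UTF_8);
    }

    // Returns the smallest key in byte order, or null if the store is empty.
    String firstKey() {
        return store.isEmpty() ? null : new String(store.firstKey(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteKeyValueSketch db = new ByteKeyValueSketch();
        db.put("user:2", "Bob");
        db.put("user:1", "Alice");
        System.out.println(db.get("user:1"));  // prints "Alice"
        System.out.println(db.firstKey());     // prints "user:1"
    }
}
```

Because the keys are kept in sorted order, a lookup of the first key returns “user:1” even though it was inserted second, which is the kind of behaviour you would expect from a tree-based index.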

There are three versions of Berkeley DB:

  • Berkeley DB (the traditional database, written in C)
  • Berkeley DB Java Edition (native Java version)
  • Berkeley DB XML (for storing XML documents)

As a hobbyist Java programmer, I prefer Berkeley DB Java Edition (JE). Berkeley DB JE supports almost all features of the traditional Berkeley DB, such as replication, hot backups, and ACID transactions. It is written in pure Java, so it’s platform-independent.

Berkeley DB JE provides two interfaces:

  1. Traditional Berkeley DB API (with DB data abstraction of key/value pairs)
  2. Direct Persistence Layer (DPL), which works with “Plain Old Java Objects” (POJOs)

Because I’m an old-school (ex)programmer, I’ll show how to use the traditional Berkeley DB API. The traditional API will also help you understand how Berkeley DB works.