Khurram Aziz - ZooKeeper

ZooKeeper Series

Apache ZooKeeper is an open-source server that enables highly reliable distributed coordination. It provides a distributed synchronization service that can be used for maintaining configuration information, naming, group services and other similar aspects of distributed applications. ZooKeeper itself is distributed and highly reliable, so instead of reinventing the wheel we can use it as a foundation service in our distributed applications. It was a subproject of Apache Hadoop but is now a top-level project. In a nutshell, it's a distributed hierarchical key-value store. In a distributed environment we typically set up multiple ZooKeeper servers, and the clients (the nodes running our distributed application) connect to them to retrieve or set information.

Picture from cwiki.apache.org

It stores the information in “znodes” and provides a namespace that is much like a file system. Znode data is typically less than a megabyte, and we can also have ACLs at the znode level. If there are multiple ZooKeeper servers, they need to know about each other; they then maintain a quorum, and write requests are forwarded to the other servers and go through consensus before a response is generated. ZooKeeper also maintains the update order: each update is identified by a unique zxid (the transaction id). We can also set “watches” that the ZooKeeper server will trigger when the watched data changes.
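The quorum mentioned above is, in the default setup, a simple majority of the servers in the ensemble. The sketch below only illustrates that arithmetic; the function names are made up for this example and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble (assumes majority quorums)."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} server(s): quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Note that four servers tolerate no more failures than three, which is why ensembles are usually sized with an odd number of servers.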

It's a Java application that can run on Linux, Solaris or FreeBSD. The simplest way to get it running in a lab, development or production environment is no doubt Docker! With two commands we can have a server up and running, with a client connected to it!

Docker

zookeeper is an official Docker image, and we can run two instances of it: one as a server and another as a client. zkCli.sh is the CLI client that we can use
The image exposes ports 2181, 2888 and 3888 (the ZooKeeper client, follower and election ports respectively), and we can use standard Docker linking
Visit the image page to learn how we can further configure it using environment variables, and about the volumes where it stores its data and logs

We can use zkCli.sh, the ZooKeeper CLI, to create and read znodes.

We can create three types of znodes using the create PATH command: simple, ephemeral (with the -e flag) and sequential (with the -s flag)
An ephemeral node is deleted automatically when the session expires; we can disconnect, reconnect and use the ls command to verify this


The ephemeral node might continue to appear for a while: it only gets deleted after the connection timeout, which is 30 seconds by default
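To make the three znode types concrete, here is a toy in-memory model. It is not a ZooKeeper client and talks to no server; the class and method names are invented for illustration. It only mimics two behaviours: sequential nodes get a zero-padded 10-digit counter appended to their name, and ephemeral nodes disappear when the session that created them expires.

```python
class ToyZnodeStore:
    """In-memory stand-in for a ZooKeeper namespace (illustration only)."""

    def __init__(self):
        self._nodes = {}     # path -> (data, owning session id or None)
        self._counters = {}  # parent path -> next sequence number

    def create(self, path, data=b"", session=None,
               ephemeral=False, sequential=False):
        if ephemeral and session is None:
            raise ValueError("an ephemeral node needs an owning session")
        if sequential:
            # The counter is kept per parent and appended as 10 digits.
            parent = path.rsplit("/", 1)[0] or "/"
            n = self._counters.get(parent, 0)
            self._counters[parent] = n + 1
            path = f"{path}{n:010d}"
        if path in self._nodes:
            raise KeyError(f"node already exists: {path}")
        self._nodes[path] = (data, session if ephemeral else None)
        return path

    def ls(self, parent):
        """Names of the direct children of parent, sorted."""
        prefix = parent.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self._nodes
                      if p.startswith(prefix) and "/" not in p[len(prefix):])

    def expire_session(self, session):
        """Drop every ephemeral node owned by the expired session."""
        for path in [p for p, (_, owner) in self._nodes.items()
                     if owner == session]:
            del self._nodes[path]


if __name__ == "__main__":
    store = ToyZnodeStore()
    store.create("/app", b"config")
    store.create("/app/worker", session="s1", ephemeral=True)
    print(store.create("/app/job-", sequential=True))
    print(store.ls("/app"))
    store.expire_session("s1")
    print(store.ls("/app"))
```

In the real server the expiry is driven by the session timeout, which is why a node can linger briefly after the client disconnects.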

Similarly, we can update the data in an existing node using set. We can check the status of a znode using stat to see its zxid and time values. There are two pairs of transaction id and timestamp values: cZxid and ctime for creation, and mZxid and mtime for the last modification.
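The zxid values that stat prints are 64-bit numbers shown in hexadecimal: the high 32 bits hold the leader epoch and the low 32 bits a counter within that epoch. A small sketch for decoding them follows; the helper names are hypothetical and the sample value is made up rather than taken from a real session.

```python
from typing import Tuple


def parse_zxid(text: str) -> int:
    """Parse a zxid printed in hex (for example '0x100000002')."""
    return int(text, 16)


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter)."""
    epoch = zxid >> 32            # high 32 bits
    counter = zxid & 0xFFFFFFFF   # low 32 bits
    return epoch, counter


if __name__ == "__main__":
    sample = "0x100000002"  # made-up value for illustration
    print(split_zxid(parse_zxid(sample)))
```

Because zxids increase with every update, comparing cZxid and mZxid of a node tells us whether it has been modified since it was created.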

delete is used to delete a node that has no children; to delete a znode recursively we use rmr

We can also set ACLs on znodes, for example to restrict writes or reads to a certain IP. There's also plugin-based authentication support, and we can define ACLs accordingly. There's quota support as well

To connect from our application, there are language bindings and client libraries. C, Java, Perl and Python language bindings are officially supported. https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZKClientBindings has a list of client bindings.

https://github.com/shayhatsor/zookeeper is a .NET async client, also available as a NuGet package at https://www.nuget.org/packages/ZooKeeperNetEx. The good thing about it is that it's not only .NET async friendly (Task-based APIs) but also compatible with .NET Core

https://marketplace.visualstudio.com/items?itemName=ksubedi.net-core-project-manager is the .NET Core Project Manager (Nuget) extension, which lets us search for, install and remove NuGet packages right from Visual Studio Code, the free, open-source, lightweight code editor that runs everywhere and has debugging and git support. Here's the .NET Core client code using this NuGet package

We need to map ZooKeeper's 2181 port to the Docker host so that we can access it at a known IP address; run ZooKeeper using docker run --rm -p 2181:2181 zookeeper
Notice that we specify the connection timeout when connecting to ZooKeeper, and that we also need a watcher; a null watcher implementation is at https://github.com/khurram-aziz/HelloDocker/blob/master/Zoo/ZooHelper.cs


The project is available at https://github.com/khurram-aziz/HelloDocker/tree/master/Zoo

Using docker-compose, we can easily run more instances of ZooKeeper in our lab/development environment. Here's a docker-compose YAML file that runs three instances as a ZooKeeper cluster

Notice that we have mapped the containers' 2181 ports to the Docker host's 2181, 2182 and 2183 ports. We can now use 127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183 as the connection string and our client will automatically connect to one ZooKeeper instance in this cluster; alternatively, we can specify just one or two nodes of our choice
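As a rough illustration of what a client does with such a connection string, the sketch below splits it into host/port pairs and picks one at random. This is not the ZooKeeperNetEx implementation; the function names are hypothetical, and the sketch ignores details such as IPv6 addresses and chroot suffixes.

```python
import random
from typing import List, Tuple


def parse_connection_string(conn: str,
                            default_port: int = 2181) -> List[Tuple[str, int]]:
    """Split 'host:port,host:port,...' into a list of (host, port) pairs."""
    servers = []
    for entry in conn.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if sep:
            servers.append((host, int(port)))
        else:
            # No port given: fall back to the ZooKeeper client port.
            servers.append((entry, default_port))
    return servers


def pick_server(servers: List[Tuple[str, int]], rng=random) -> Tuple[str, int]:
    """Choose one server to try first (a real client retries the others)."""
    return rng.choice(servers)


if __name__ == "__main__":
    conn = "127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183"
    servers = parse_connection_string(conn)
    print(servers)
    print(pick_server(servers))
```

Listing all the servers matters because a client that only knows about one of them has nowhere to go if that server is stopped.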

We can stop one instance of the ZooKeeper server, write a value using the available nodes, then bring the node back and check whether the updated value gets replicated! We can also try writing after stopping two instances. Will it allow writes if the quorum is not complete?