ZooKeeper Series
ZooKeeper provides a solid foundation to implement higher order functions required for “Clustering Applications” / Distributed Systems. In this post; we will implement “Barrier” that Distributed systems uses to block processing of a set of nodes until a condition is met at which time all the nodes are allowed to proceed. Barriers are implemented in ZooKeeper by designating a barrier node. The barrier is in place if the barrier node exists. For the modern scalable applications; often we don't know how many nodes are participating; this is something that is decided at the runtime and is expected to be changeable when required. If there is more load on the applications; we should have an option to add more nodes to meet the demand. In such scenarios; its important to know at runtime how many nodes are participating so each node wait at barrier accordingly and also an option of node enrollment is required so we allow some time to nodes to come online / participate and then calculate how many nodes will participate in the barrier!
To keep things interesting; we will be implementing a proof of concept in Dotnet Core; and given Dotnet Core applications can be run in Linux; we will use Docker to run ZooKeeper as well as our Core CLR nodes. For the sake of simplicity; we will use single instance of ZooKeeper and run all the nodes as separate Docker containers on a single host machine. We can deploy the containers across multiple machines using Rancher, Swarm or Kubernetes etc. You can check out Rancher—First Application post on how to deploy the Docker application across multiple hosts. We will use Barrier example from Visual Studio 2010 Training Kit and re implement accordingly.
Here’s the modified DriveToBoston() function that’s using Barrier helper class that we will write. We will pass-on the ZooKeeper connection string to it in the constructor and it will have EnrollIntoBarrier, GetParticipantCount, ReachBarrier and WaitAtBarrier functionalities. Given containers takes random times to come online based on the host resources and what’s in the container; we are simulating it as “Decision Time”; this is also important given ZooKeeper takes couple of seconds to start accepting connection; similar to any other database. “Roll Time” is simulating the wait time to allow participating nodes to join; “Time To Gas Station” is from the Training Kit example and is simulating the different time nodes will take to reach barrier; where they will sync and proceed.
using System;
using System.Threading;
using System.Threading.Tasks;
class Barrier
{
public Barrier(string connectionString) { }
public Task<bool> EnrollIntoBarrierAsync(TimeSpan timeToRoll, string name) { return Task.FromResult(false); }
public Task<int> GetParticipantCountAsync() { return Task.FromResult(0); }
public Task<object> ReachBarrierAsync(string name) { return Task.FromResult(new object()); }
public void WaitAtBarrier(int participant) { }
}
class HelloWorld
{
static void DriveToBoston(string connectionString, string name, TimeSpan timeToLeaveHome, TimeSpan timeToRoll, TimeSpan timeToGasStation)
{
try
{
Console.WriteLine("[{0}] Leaving house", name);
Thread.Sleep(timeToLeaveHome); //let zookeeper come online and decision time
var barrier = new Barrier(connectionString);
bool envrolled = barrier.EnrollIntoBarrierAsync(timeToRoll, name).Result;
if (!envrolled)
{
Console.Write("[{0}] Couldnt join the caravan!", name);
return;
}
Console.WriteLine("[{0}] Going to Boston!", name);
int participants = barrier.GetParticipantCountAsync().Result;
Console.WriteLine("[{0}] Caravan has {1} cars!", name, participants);
// Perform some work
Thread.Sleep(timeToGasStation);
object o = barrier.ReachBarrierAsync(name).Result;
Console.WriteLine("[{0}] Arrived at Gas Station", name);
// Need to sync here
barrier.WaitAtBarrier(participants);
// Perform some more work
Console.WriteLine("[{0}] Leaving for Boston", name);
}
catch (Exception ex)
{
Console.WriteLine("[{0}] Caravan was cancelled! Going home!", name);
Console.WriteLine(ex);
}
}
}
For the Barrier Helper; we will be using /dotnetcoreapp as the application root node in the ZooKeeper; and /dotnetcoreapp/barrier as the Barrier node. Our Barrier node has two children; participants and reached. All these nodes are Persistent. Each node will create a child node under /dotnetcoreapp/barrier/participants node when enrolling itself; and after a roll time; we will count number of children to determine the number of participants. And when the processing will start; each node will report itself when it will reach barrier by creating a node under /dotnetcoreapp/barrier/reached. When children under reached node becomes equal to the number of participant each node will get sync and proceed with any further processing.
We will use “watcher” functionality that ZooKeeper provides to watch the reached node; the watch will get triggered whenever there is any change; a new child is created.
One of the most interesting things about ZooKeeper is that even though ZooKeeper uses asynchronous notifications, we can use it to build synchronous consistency primitives. We will use this for the roll call situation. After a roll call time out; the node will create /dotnetcoreapp/barrier/rollcomplete; each node will first check its existance; if its not there; will enroll itself; and then check again for its existance and if its there; will check the Czxid of the two nodes; the create zookeeper transaction id; and as ZooKeeper stamps all the node in sequential way; if rollcomplete id is less than the node’s enrollment node id; it means node failed to get itself enrolled before roll get completed.
Here’s the code of our Barrier helper class
- ZooKeeper exists() api returns the Stat structure that we can use to determine number of children easily
- All the Zookeeper nodes created by nodes for participation and reporting itself for eaching barrier are Ephemeral; they will get deleted automatically when node will disconnect from the Zookeeper server
The code of the Dotnet Core project is available at https://github.com/khurram-aziz/HelloDocker/tree/master/Zoo; you can clone the code and then run dotnet restore to restore the used packages including the ZooKeeper Client library. We will run three docker containers of this app providing different parameters to simulate the Training Kit example. To build the container image of our application; first run dotnet publish –c Release –o out to build + publish the release confiiguration of our app into the “out” folder; and then use this Dockerfile to build the container image
We can use docker-compose to run the ZooKeeper and the instances of our Dotnet Core for simulation. Here’s the YML file that simulating the three nodes as per Training Kit original example
- I have specified the dockerfile for the Dotnet Core application; we can use docker-compose up –build (dash dash build) and it will build and run the containers with single command
- Also note that dennis node parameters are such that they will not be able to join the “caravan” given its taking too much time to decide and by that time; enrollment gets complemented
If everything goes smoothly; you will see an output similar to this
…
mac_1 | [Mac] Leaving house
dennis_1 | [Dennis] Leaving house
charlie_1 | [Charlie] Leaving house
…
mac_1 | [Mac] Going to Boston!
mac_1 | [Mac] Caravan has 2 cars!
charlie_1 | [Charlie] Going to Boston!
charlie_1 | [Charlie] Caravan has 2 cars!
…
dennis_1 | [Dennis] Couldnt join the caravan!zoo_dennis_1 exited with code 0
…
charlie_1 | [Charlie] Arrived at Gas Station
mac_1 | [Mac] Arrived at Gas Station
charlie_1 | [Charlie] Leaving for Boston
mac_1 | [Mac] Leaving for Boston
…
zoo_mac_1 exited with code 0
zoo_charlie_1 exited with code 0
…
We can create other higher order constructions; in the ZooKeeper; its called ZooKeeper Recipes; their pseudo codes are discussed at https://zookeeper.apache.org/doc/trunk/recipes.html; some of these recipes are available in their official Java client library and given the Apache ZooKeeper .NET async client library we are using is based on the Java client library; they have also made available ZooKeeperNetEx.Recipes nuget package that we can use. Leader Election and Queue are available in there.
Happy Containering / Clustering / Distributing your app!