Mutex, or mutual exclusion, is a mechanism of concurrency control in computer science to prevent race conditions. It is an important practice in any concurrent or distributed computing system.
A race condition is a behavior of software where the output is dependent on the sequence or timing of other events. When events do not take place in sequence or right timing, it becomes a bug which is difficult to trace, reproduce and debug. The result is nondeterministic as it depends on the relative timing of interfering threads.
A semaphore is a variable or abstract data type to control the access and support of the implementation of mutual exclusion.
My first exposure to semaphore was during my first job as a programmer at a computer shop in Kuantan, a small seaside town on the east coast of peninsular Malaysia. Facing the beautiful South China Sea, Kuantan was a peaceful town where time was running slow. That time, in the early 1990s, dBASE was the first and most successful database management software for microcomputers. It includes the core database engine, query system, forms engine and programming language. dBASE stored structured data in a simple flat file format with file extension .dbf.
dBASE was designed as a standalone database management system. It lacks the semaphore and mutual exclusion for multiple network access. At that time, I was designing and developing a Jobcard Management System for a large German industrial company with its country branch located in Segambut, Kuala Lumpur.
The development of the project went smoothly until the implementation. Nightmares began to emerge from every dark corners. Client called to complain that some records were not updated properly or corrupted. My boss and I rushed over to Kuala Lumpur to take a look.
After spending two days at their office, we found nothing. I made a copy of their data and we went back to Kuantan for analysis.
With one particular job card record, I found out that two operators were updating the same record concurrently – race conditions occurred. After many testing, we concluded that we needed to implement semaphore to prevent race conditions but dBASE had no such implementation. I went through its documentation and there were functions to lock and unlock a record or a table. So, I wrote a simple semaphore to lock and unlock a record or table before and after updating them.
The problem was solved and we had a happy client. Unfortunately, the client called us as soon as we reached home. What now? Some records could not be updated because they were locked. Deadlock occurred. I went back to rewrite the semaphore. This time, I implemented timeout and retry. It worked beautifully. After that, the company had many “networked database” projects coming in.
In distributed computing environment, it is common where multiple nodes and/or threads to simultaneously access the same resources. Lamport’s bakery algorithm is one of many mutual exclusion algorithms designed to prevent multiple threads from entering critical sections of code concurrently to eliminate the risk of data corruption. The algorithm is devised by computer scientist, Leslie Lamport.
I stumbled upon Lamport’s bakery algorithm back in 2008 but I did not really work on it until last night (March 27, 2017). What is Lamport’s bakery algorithm?
Lamport’s bakery algorithm
Imagine a small bakery shop with one cashier and one counter for order collection. Customers come into the shop, queue up at the cashier, order and pay. After paying, the customer will proceed to the next counter and wait to collect his/her order. This sequential process is simple and works well for a small bakery shop.
However, problems will begin to arise when business grows and more customers coming to purchase all kinds of freshly baked artisan breads. Some breads take longer time to bake.
Multiple cashiers and collection counters are needed to serve the growing number of customers. Sequential process can no longer work well.
Customers enter the bakery shop and take a number from a ticketing machine. This ticketing machine serves as the semaphore and no customer will receive a ticket with same number.
Each cashier will call a number in the queue. Customer holding the number will proceed to the cashier to make order and pay. After paying, customer will return to the waiting pool until his/her number is called to collect the order.
Remember, some customers have to wait a little longer depending on the type of breads and quantity they ordered, or whether the breads are readily available on the shelves or what is baking in the oven.
Lamport’s bakery algorithm takes care of the race conditions by assigning each thread a queue number when they are ready to access the critical sections of the program.Then, the threads will wait for their turn to enter the critical sections.
This diagram illustrates how Lamport’s bakery algorithm works.
Whenever a node wants to access a shared resource, it requests a token from master node. At first, node 2 becomes ready to enter CS (critical section) then node 3 and node 1 follow. Master node 0 queues token requests FIFO. Request and token are sent and received using MPI (Messaging Passing Interface).
AVA – An artificial intelligence project
AVA is an artificial intelligence project with a small supercomputer which will sport 16, 32 or 64 (or more) nodes of Raspberry Pi 3. Data mining applications will be concurrently distributed across the nodes. Each node has a local storage to cache the results temporarily while waiting for its turn to write the results to a share storage on the storage node. The transfer of result is atomic.
Communication and rendezvous among nodes will be synchronized using MPI. AVA will be developed in mixed-language environment. Ada is the main programming language to implement the distributed aspects of it with other components developed in Fortran and Python.
The nodes in AVA are organized in star topology with node 0 as the center or master node as depicted in the diagram above. The semaphore consists of several mesh files which contain information in the following format:
<node ID> <task ID> <message> <from node> <to node>
For example, if node 2 wants to send a REQ to node 1, followed by ACK sent from node 1 to node 2, the following will be presented:
0 T12 REQ 2 1 1 T12 REQ 2 1 0 T12 ACK 1 2 2 T12 ACK 1 2
T12 is the task identifier. In this case, the REQ is sent by task T12 from node 2 to node 0 requesting a rendezvous with node 1. Then, node 0 sends REQ to node 1. After receiving ACK from node 1, node 0 responds with an ACK to node 2 establishing the handshaking, ready to rendezvous.
The <message> is defined in package Semaphore as:
type Msg is (REQ, ACK, LCK, REL, ICS, OCS, STO, TRX, RCX, TIO, KIL, ZOM, ORP); package Msg_IO is new Enumeration_IO (Msg); use Msg_IO;
Besides the REQ (Request) and ACK (Acknowledge), other messages are also implemented as LCK (Lock), REL (Release), ICS (In CS), OCS (Out CS), STO (Store to request for storage node), TRX (Transmit), RCX (Receive), TIO (Timeout), KIL (Kill dead processes), ZOM (Zombie or dead process) and ORP (Orphaned process). Do not confuse zombie process with orphaned process.
An example of storage request is demonstrated as below:
0 T12 REQ STO 2 3 3 T12 REQ STO 2 3 0 T12 ACK STO 3 2 2 T12 ACK 3 2
When a more complex state (or requirement) arises, for example, node 2 request storage to node 3 and request to transmit the same data to node 1, a more complex syntax is required:
<node ID> <task ID> <message> [<message>] <from node> <to node> [<message> [<message>] <from node> <to node>]
0 T12 REQ STO 2 3 REQ TRX 2 1 3 T12 REQ STO 2 3 1 T12 REQ TRX 2 1 0 T12 ACK STO 3 2 2 T12 ACK STO 3 2 0 T12 ACK TRX 1 2 2 T12 ACK TRX 1 2
In this case, the result from node 2 must be stored in a shared storage (node 3) and an exact copy of the result must be transmitted to node 1 for further processing. This is an atomic operation, no rendezvous can happen until both ACK STO and ACK TRX are received.
Any node can send REQ and wait indefinitely for ACK. To prevent this, Timeout can be implemented with the following syntax:
<node ID> <task ID> <message> [<message>] <from node> <to node> [<message> [<message>] <from node> <to node>] :<message> <parameter>
The mesh presentation will be:
0 T12 REQ STO 2 3 REQ TRX 2 1 :TIO 10.0 3 T12 REQ STO 2 3 :TIO 10.0 1 T12 REQ TRX 2 1 :TIO 10.0 0 T12 ACK STO 3 2 :TIO 10.0 2 T12 ACK STO 3 2 :TIO 10.0 0 T12 ACK TRX 1 2 :TIO 10.0 2 T12 ACK TRX 1 2 :TIO 10.0
The listener task on node 0 is defined as:
task type Listener is
entry Start (ID : in Integer; Payload : in MutableArray); -- Initialization
entry Forward (Destination : in Integer; Message : in Msg; From : in Integer; To : in Integer); -- Propagate message through the network
entry KILL; entry REQUEST; entry ACKNOWLEDGE; entry LOCK; entry RELEASE; entry IN_CS; entry OUT_CS; entry STORE; entry TRANSMIT; entry RECEIVE; end Listener;
The implementation on AVA is still work in progress.
What do you think? If you have any suggestions or find this article helpful, please drop me a note.