Distributed Server Pattern

by Eugene Nelson

ABSTRACT

This paper presents a pattern for a distributed server that provides a service. Such a server costs less to create, administer, operate, and use than a variation designed for best performance. A simplified airline reservation system illustrates the pattern.

REQUIREMENTS

- Make it easy to embed internal security checking.

- Make it easy to scale workload across computers.

- Make it easy to distribute workload across geographically dispersed computers in order to survive many natural or man-made disasters.

- Continue normal operation, with no more impact than a small time delay, before, during, and after the failure of any single component.

- Automatically recover state information from a saved context after a complete system shutdown.

- Automatically recover state information from a prior version after an upgrade made for a configuration change or a software fix.

- Make it easy to test a new system version on the same production environment where the current production version continues to operate.

SOFTWARE SYSTEM ENVIRONMENT

A server instance is a program running within an OS. A server is a set of identical server instances on different computers. One server instance is ACTIVE while other instances are in a DOWN, START or STANDBY state.

A server infrastructure monitors the state of each server instance and directs only one instance to become active. The server infrastructure provides message communication between a server instance and the active server instance. The infrastructure provides a recoverable and secure request and response message session between a client server and a service server.
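The instance states and the rule that only one instance is active can be sketched as follows. This is a minimal illustration, not the infrastructure's actual interface: the names `State`, `Instance`, and `elect_active` are hypothetical, and promoting the first standby instance in list order is an arbitrary assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class State(Enum):
    DOWN = "down"
    START = "start"
    STANDBY = "standby"
    ACTIVE = "active"


@dataclass
class Instance:
    name: str
    state: State = State.DOWN


def elect_active(instances: List[Instance]) -> Optional[Instance]:
    """Ensure at most one instance is ACTIVE (hypothetical infrastructure rule).

    If an instance is already ACTIVE it is returned unchanged. Otherwise the
    first STANDBY instance in list order is promoted. Returns None when no
    instance is eligible.
    """
    for inst in instances:
        if inst.state is State.ACTIVE:
            return inst
    for inst in instances:
        if inst.state is State.STANDBY:
            inst.state = State.ACTIVE
            return inst
    return None
```

Calling `elect_active` repeatedly on the same list keeps returning the same instance until that instance leaves the ACTIVE state, which is the property the infrastructure is described as enforcing.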

AIRLINE FLIGHT SERVER

Each flight is managed by its own server. A client server provides functionality to an external user by obtaining service from a flight server. Each instance of a flight server executes the following algorithm.

DOWN State:

- Obtain configuration and initial state information from a persistent copy. If the initial state information shows that this is a first-time startup, look for a prior version. If a prior version is found, obtain the initial state information from it.

- An instance is in the down state while it has no communication with the server infrastructure. The instance periodically attempts to communicate with the infrastructure and enters the start state when communication becomes available.
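The choice of initial state in the down state can be sketched as below. The loader callables `load_current` and `load_prior` are hypothetical stand-ins for reading the persistent copies, and treating "no saved copy" as the signal for a first-time startup is an assumption made for brevity.

```python
from typing import Callable, Dict, Optional

FlightState = Dict[str, object]


def load_initial_state(
    load_current: Callable[[], Optional[FlightState]],
    load_prior: Callable[[], Optional[FlightState]],
) -> FlightState:
    """Pick the initial state for an instance in the DOWN state.

    load_current returns the persistent copy for this version, or None on a
    first-time startup. load_prior returns the state saved by a prior
    version, or None if no prior version is found. Both are hypothetical.
    """
    current = load_current()
    if current is not None:
        return current
    prior = load_prior()
    if prior is not None:
        return prior
    # Neither copy exists: start with an empty flight (an assumption).
    return {"reservations": {}}
```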

START State:

- Request a message channel to the active flight server instance. Note that, whenever no channel exists, the attempt to create one is repeated periodically until the request is cancelled.

- Synchronize flight state information from the active instance when a message channel becomes available. Enter the standby state after synchronization is complete.

- Enter the standby state when it becomes clear that there is no active instance, for example after 30 seconds have elapsed without a message channel.
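The start state steps above can be sketched as a retry loop with a deadline. The function `run_start_state` and its parameters are hypothetical; `try_sync` stands in for "open a channel to the active instance and fetch its flight state". The one-second retry interval is an assumption, and the clock and sleep functions are injectable only so the loop can be exercised without waiting. The sketch covers only the start state; in the pattern the channel request itself stays pending until it is cancelled.

```python
import time
from typing import Callable, Dict, Optional

FlightState = Dict[str, object]


def run_start_state(
    try_sync: Callable[[], Optional[FlightState]],
    local_state: FlightState,
    timeout_s: float = 30.0,
    retry_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> FlightState:
    """Return the flight state to carry into STANDBY.

    try_sync returns the active instance's flight state, or None if no
    message channel is available yet.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        synced = try_sync()
        if synced is not None:
            return synced  # synchronized from the active instance
        sleep(retry_s)
    # No active instance appeared in time: keep the locally loaded state.
    return local_state
```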

STANDBY State:

- If a pending reservation is received from the active instance then update the local flight state information and send an acknowledgment of the pending reservation.

- If a reservation complete is received from the active instance then update the local flight state information.

- Periodically, or upon change, save the local flight state information to a persistent copy.
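A standby instance's message handling might look like the following sketch. The message layout `(kind, seat, passenger)` and the class `StandbyReplica` are hypothetical, and saving to a persistent copy is left out to keep the example short.

```python
from typing import Dict, Optional, Tuple

Message = Tuple[str, str, str]  # (kind, seat, passenger), an assumed layout


class StandbyReplica:
    """Local flight state kept by a standby instance (illustrative only)."""

    def __init__(self) -> None:
        self.pending: Dict[str, str] = {}    # seat -> passenger
        self.confirmed: Dict[str, str] = {}  # seat -> passenger

    def handle(self, msg: Message) -> Optional[Message]:
        """Apply a message from the active instance; return a reply if any."""
        kind, seat, passenger = msg
        if kind == "pending":
            # Record the pending reservation and acknowledge it.
            self.pending[seat] = passenger
            return ("ack", seat, passenger)
        if kind == "complete":
            # The reservation was accepted; no reply is sent.
            self.pending.pop(seat, None)
            self.confirmed[seat] = passenger
            return None
        raise ValueError(f"unexpected message kind: {kind}")
```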

ACTIVE State:

- Become active when directed by the infrastructure. Cancel the request for a message channel to the active instance. Begin listening for message channels from standby instances.

- Upon a new message channel to a standby instance, add it to the list of standby instance message channels and send the current flight state information over it.

- Upon disconnect of a message channel, remove it from the list of standby instance message channels.

- Upon infrastructure request, create a server session for a remote client. If the request is to recover a prior server session then restore that session's state.

- Upon client request via a server session:

  - Validate the request. If the request is invalid or the client is not allowed to make it then send error information.

  - If the request is a query then examine the local flight information and send an appropriate response.

  - If the request is to book a flight and it matches a prior pending reservation then accept the reservation, send a response to the book request, and send a reservation complete to each connected standby instance.

  - If the request is a new book request that cannot be honored under current policy or flight state information then send an error response. Otherwise, create a pending reservation, update the server session state, and send the pending reservation to each connected standby instance.

- Upon acknowledgment of a pending reservation by all connected standby instances (test for this condition after receiving an acknowledgment message or a disconnect), accept the reservation, send a response to the book request, and send a reservation complete to each connected standby instance.

- If a standby instance does not respond to a pending reservation within 30 seconds then disconnect its message channel and remove it from the list of standby instance message channels. Note that the server infrastructure will time out an active instance that does not respond to a client request in a timely manner.

- Periodically, or upon change, save the local flight state information to a persistent copy.
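The booking flow in the active state can be sketched as below: a reservation stays pending until every connected standby instance has acknowledged it or disconnected. The class `ActiveFlight`, its method names, and the `outbox` list (which records messages instead of sending them) are hypothetical. The sketch omits request validation, queries, session recovery, the 30 second timeout, and persistence, and as a simplification it rejects any request for a seat that is already pending.

```python
from typing import Dict, List, Set, Tuple

Outbox = List[Tuple[str, str, str]]  # (destination, kind, seat), assumed layout


class ActiveFlight:
    """Booking logic of the active instance (illustrative only)."""

    def __init__(self, standbys: Set[str]) -> None:
        self.standbys: Set[str] = set(standbys)  # connected standby channels
        self.confirmed: Dict[str, str] = {}      # seat -> client
        self.pending: Dict[str, str] = {}        # seat -> client
        self.waiting: Dict[str, Set[str]] = {}   # seat -> standbys yet to ack
        self.outbox: Outbox = []                 # messages that would be sent

    def book(self, client: str, seat: str) -> None:
        if seat in self.confirmed or seat in self.pending:
            self.outbox.append((client, "error", seat))
            return
        self.pending[seat] = client
        self.waiting[seat] = set(self.standbys)
        for s in sorted(self.standbys):
            self.outbox.append((s, "pending", seat))
        self._complete_if_acked(seat)  # completes at once with no standbys

    def ack(self, standby: str, seat: str) -> None:
        if seat in self.waiting:
            self.waiting[seat].discard(standby)
            self._complete_if_acked(seat)

    def disconnect(self, standby: str) -> None:
        # A disconnected standby no longer blocks any pending reservation.
        self.standbys.discard(standby)
        for seat in list(self.waiting):
            self.waiting[seat].discard(standby)
            self._complete_if_acked(seat)

    def _complete_if_acked(self, seat: str) -> None:
        if seat not in self.waiting or self.waiting[seat]:
            return  # not pending, or still waiting on a connected standby
        del self.waiting[seat]
        client = self.pending.pop(seat)
        self.confirmed[seat] = client
        self.outbox.append((client, "booked", seat))
        for s in sorted(self.standbys):
            self.outbox.append((s, "complete", seat))
```

Checking for completion after both an acknowledgment and a disconnect mirrors the note in the steps above: a standby instance that drops away must not leave a reservation pending forever.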

SUMMARY

This paper shows a pattern for a distributed server that provides a service. Reliability, security, and cost are better than with an approach designed for best performance. Overall system performance remains high because work scales across many computers.

This pattern applies to most situations. A suitable infrastructure that supports this pattern is available at SoftEcoSDK.com.