Distributed Server Pattern
by Eugene Nelson
ABSTRACT
This paper presents a pattern for a distributed server that provides a service. Such a server costs less to create, administer, operate, and use than a variation designed for best performance. A simplified airline reservation system illustrates this pattern.
REQUIREMENTS
- Make it easy to embed internal security checking.
- Make it easy to scale workload across computers.
- Make it easy to distribute workload across geographically dispersed computers in order to survive many natural or man-made disasters.
- Continue normal operation before, during, and after any single point of failure, with no impact other than a short time delay.
- Automatically recover state information from a saved context after a complete system shutdown.
- Automatically recover state information from a prior version after an upgrade for a configuration change or a software fix.
- Make it easy to test a new system version in the same production environment where the current production version continues to operate.
SOFTWARE SYSTEM ENVIRONMENT
A server instance is a program running within an operating system (OS). A server is a set of identical server instances on different computers. At most one server instance is ACTIVE at a time; the other instances are in the DOWN, START, or STANDBY state.
A server infrastructure monitors the state of each server instance and directs only one instance to become active. The server infrastructure provides message communication between a server instance and the active server instance. The infrastructure provides a recoverable and secure request and response message session between a client server and a service server.
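As a rough illustration of this environment, the sketch below models the four instance states and an infrastructure that promotes at most one STANDBY instance to ACTIVE. The names `InstanceState`, `Infrastructure`, `report`, and `elect` are hypothetical. Choosing the lowest instance id is an arbitrary rule used only for this sketch. A real infrastructure would also need failure detection and agreement across computers, and neither is shown here.

```python
from enum import Enum, auto


class InstanceState(Enum):
    DOWN = auto()
    START = auto()
    STANDBY = auto()
    ACTIVE = auto()


class Infrastructure:
    """Hypothetical stand-in for the server infrastructure of one server.

    It records the last reported state of each instance and directs
    at most one instance to become ACTIVE.
    """

    def __init__(self):
        self.states = {}  # instance id -> InstanceState

    def report(self, instance_id, state):
        # An instance (or a failure detector) reports a state change.
        self.states[instance_id] = state

    def active(self):
        return [i for i, s in self.states.items() if s is InstanceState.ACTIVE]

    def elect(self):
        """Keep the current ACTIVE instance, or promote one STANDBY instance.

        Returns the id of the ACTIVE instance, or None if none is eligible.
        """
        current = self.active()
        if current:
            return current[0]
        standbys = sorted(
            i for i, s in self.states.items() if s is InstanceState.STANDBY
        )
        if not standbys:
            return None
        # Arbitrary choice for this sketch: lowest id wins.
        self.states[standbys[0]] = InstanceState.ACTIVE
        return standbys[0]
```

In this sketch, when the ACTIVE instance is reported DOWN, the next call to `elect` promotes another STANDBY instance. This corresponds to continuing operation through a single point of failure.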
AIRLINE FLIGHT SERVER
Each flight is managed by its own flight server. A client server provides functionality to an external user by obtaining service from a flight server. Each instance of a flight server executes the following algorithm.
DOWN State:
- Obtain configuration and initial state information from a persistent copy. If this is a first-time startup, as determined from the initial state information, look for a prior version. If a prior version is found, obtain the initial state information from it.
- An instance is in the DOWN state while it cannot communicate with the server infrastructure. It periodically attempts to contact the server infrastructure and enters the START state when communication becomes available.
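A minimal sketch of the DOWN state follows. It assumes the persistent copy is one JSON file per software version. The file naming, the `first_start` flag, the 5-second default retry interval, and all function names are hypothetical. The code loads this version's state, falls back to the prior version's saved state on a first-time startup, and then retries contact with the infrastructure until it answers.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional


def empty_state() -> dict:
    # Fresh state for a first-time startup with nothing to recover.
    return {"first_start": True, "reservations": {}}


def state_file(state_dir: Path, version: str) -> Path:
    # Hypothetical naming scheme: one persistent copy per software version.
    return state_dir / f"flight-state-{version}.json"


def load_initial_state(state_dir: Path, version: str,
                       prior_version: Optional[str]) -> dict:
    path = state_file(state_dir, version)
    state = json.loads(path.read_text()) if path.exists() else empty_state()
    if state.get("first_start") and prior_version is not None:
        prior = state_file(state_dir, prior_version)
        if prior.exists():
            # Upgrade path: recover state saved by the prior version.
            state = json.loads(prior.read_text())
            state["first_start"] = False
    return state


def wait_for_infrastructure(try_connect: Callable[[], bool],
                            retry_seconds: float = 5.0,
                            sleep: Callable[[float], None] = time.sleep) -> int:
    """Stay DOWN, retrying periodically; return the number of attempts made."""
    attempts = 1
    while not try_connect():
        sleep(retry_seconds)
        attempts += 1
    return attempts
```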
START State:
- Request a message channel to the active flight server instance. While no channel exists, the request is retried periodically until it is cancelled.
- When a message channel becomes available, synchronize flight state information from the active instance. Enter the STANDBY state once synchronization is complete.
- Enter the STANDBY state when it becomes evident that there is no active instance, for example after 30 seconds have elapsed.
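The START state can be sketched as a retry loop with a 30-second limit. The callbacks `open_channel` and `synchronize` are hypothetical stand-ins for infrastructure calls. The clock and sleep functions are injectable so the logic can be exercised without real waiting. In either outcome the instance enters STANDBY, and the third element of the result reports whether it synchronized from an active instance.

```python
import time
from typing import Callable, Optional, Tuple

START_TIMEOUT_SECONDS = 30.0  # from the text: "after 30 seconds have elapsed"


def run_start_state(
    open_channel: Callable[[], Optional[object]],
    synchronize: Callable[[object], dict],
    local_state: dict,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    retry_seconds: float = 1.0,
    timeout: float = START_TIMEOUT_SECONDS,
) -> Tuple[str, dict, bool]:
    """Return (next_state, flight_state, synchronized)."""
    started = clock()
    while True:
        channel = open_channel()  # hypothetical: None until a channel exists
        if channel is not None:
            # An active instance exists: copy its flight state.
            return "STANDBY", synchronize(channel), True
        if clock() - started >= timeout:
            # No active instance appeared: keep the locally loaded state.
            return "STANDBY", local_state, False
        sleep(retry_seconds)
```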
STANDBY State:
- If a pending reservation is received from the active instance, update the local flight state information and send an acknowledgement of the pending reservation.
- If a reservation complete is received from the active instance, update the local flight state information.
- Periodically, or upon change, save the local flight state information to a persistent copy.
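Below is a sketch of STANDBY message handling. It assumes messages arrive as dictionaries with hypothetical `type`, `reservation_id`, and `seat` keys. A pending reservation is recorded and answered with an acknowledgement, and a reservation complete moves the reservation to the booked set. The `dirty` flag is an assumed mechanism for "upon change": the caller would save the persistent copy when it is set.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FlightState:
    pending: Dict[str, str] = field(default_factory=dict)  # reservation id -> seat
    booked: Dict[str, str] = field(default_factory=dict)   # reservation id -> seat
    dirty: bool = False  # set on change so the caller can save a persistent copy


def standby_handle(state: FlightState, msg: dict) -> Optional[dict]:
    """Apply one message from the active instance; return a reply or None."""
    kind = msg["type"]
    rid = msg["reservation_id"]
    if kind == "pending":
        state.pending[rid] = msg["seat"]
        state.dirty = True
        return {"type": "ack", "reservation_id": rid}
    if kind == "complete":
        # The seat normally comes from the earlier pending reservation.
        state.booked[rid] = state.pending.pop(rid, msg.get("seat"))
        state.dirty = True
        return None
    raise ValueError(f"unexpected message type: {kind!r}")
```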
ACTIVE State:
- Become active when directed by the infrastructure. Cancel the request for a message channel to the active instance, and create a listener for message channels from standby instances.
- When a new message channel to a standby instance is established, add it to the list of standby instance message channels and send the current flight state information over it.
- When a message channel disconnects, remove it from the list of standby instance message channels.
- When the infrastructure requests it, create a server session for a remote client. If the session recovers a prior server session, recover that session's state.
- Upon a client request via a server session:
  - Validate the request. If the request is invalid or the client is not allowed to make it, send error information.
  - If the request is a query, examine the local flight state information and send an appropriate response.
  - If the request is to book a flight and it matches a prior pending reservation, accept the reservation, send a response to the book request, and send a reservation complete to each connected standby instance.
  - If the request is a new book request that cannot be honored under the current policy or flight state information, send an error response. Otherwise, create a pending reservation, update the server session state, and send the pending reservation to each connected standby instance.
- When all connected standby instances have acknowledged a pending reservation (test for this condition after each acknowledgement or disconnect), accept the reservation, send a response to the book request, and send a reservation complete to each connected standby instance.
- If a standby instance does not acknowledge a pending reservation within 30 seconds, disconnect its message channel and remove it from the list of standby instance message channels. Note that the server infrastructure times out an active server that does not respond to a client request in a timely manner.
- Periodically, or upon change, save the local flight state information to a persistent copy.
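The booking steps of the ACTIVE state amount to a simple acknowledgement protocol. The active instance records a pending reservation, waits for every connected standby to acknowledge it, and completes the reservation when the set of outstanding acknowledgements becomes empty, whether that happens through acknowledgements or disconnects. The sketch below makes several simplifying assumptions. In-memory lists (`outbox`, `replies`) stand in for real message channels and client sessions. Policy checks are reduced to seat availability. Request validation, queries, and session recovery are omitted. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

ACK_TIMEOUT_SECONDS = 30.0  # from the text: standbys must answer within 30 s


@dataclass
class Pending:
    seat: str
    awaiting: Set[str]  # standby channels that have not acknowledged yet
    sent_at: float


@dataclass
class ActiveFlight:
    free_seats: Set[str]
    standbys: Set[str] = field(default_factory=set)
    pending: Dict[str, Pending] = field(default_factory=dict)
    booked: Dict[str, str] = field(default_factory=dict)
    outbox: List[Tuple[str, dict]] = field(default_factory=list)  # (channel, message)
    replies: List[Tuple[str, str]] = field(default_factory=list)  # (reservation id, result)

    def connect_standby(self, ch: str) -> None:
        # New standby channel: remember it and send the current flight state.
        self.standbys.add(ch)
        self.outbox.append((ch, {"type": "state", "free": sorted(self.free_seats),
                                 "booked": dict(self.booked),
                                 "pending": {r: p.seat for r, p in self.pending.items()}}))

    def disconnect_standby(self, ch: str) -> None:
        # A disconnect counts as "no longer awaited" for every pending reservation.
        self.standbys.discard(ch)
        for rid in list(self.pending):
            self.pending[rid].awaiting.discard(ch)
            self._maybe_complete(rid)

    def book(self, rid: str, seat: str, now: float) -> None:
        if rid in self.pending:            # request matches a prior pending reservation
            self._complete(rid)
            return
        if seat not in self.free_seats:    # simplified policy/state check
            self.replies.append((rid, "error: seat unavailable"))
            return
        self.free_seats.discard(seat)
        self.pending[rid] = Pending(seat, set(self.standbys), now)
        for ch in self.standbys:
            self.outbox.append((ch, {"type": "pending", "reservation_id": rid, "seat": seat}))
        self._maybe_complete(rid)          # no standbys connected: complete at once

    def ack(self, ch: str, rid: str) -> None:
        if rid in self.pending:
            self.pending[rid].awaiting.discard(ch)
            self._maybe_complete(rid)

    def tick(self, now: float) -> None:
        # Disconnect standbys that have not acknowledged within the timeout.
        late = {ch for p in self.pending.values()
                if now - p.sent_at >= ACK_TIMEOUT_SECONDS for ch in p.awaiting}
        for ch in late:
            self.disconnect_standby(ch)

    def _maybe_complete(self, rid: str) -> None:
        if rid in self.pending and not self.pending[rid].awaiting:
            self._complete(rid)

    def _complete(self, rid: str) -> None:
        p = self.pending.pop(rid)
        self.booked[rid] = p.seat
        self.replies.append((rid, "booked"))
        for ch in self.standbys:
            self.outbox.append((ch, {"type": "complete", "reservation_id": rid}))
```

A disconnect removes the channel from every outstanding acknowledgement set, so in this sketch a slow or failed standby delays a booking by at most the 30-second timeout (given that `tick` is called regularly) rather than blocking it.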
SUMMARY
This paper presents a pattern for a distributed server that provides a service. Reliability, security, and cost are better than with an approach designed for best performance. Overall system performance remains high because work is scaled across many computers.
This pattern applies to most situations. A suitable infrastructure that supports this pattern is available at SoftEcoSDK.com.