This is the second part of the discussion about deploying OpenSSO in a high availability configuration across two data centers.
Although I have not tested these configurations with OpenAM, I believe they should work.
Configuration with Two Groups of OpenSSO Servers
The environment consists of two independent data centers, each of them having two OpenSSO instances deployed in an HA configuration behind a load balancer.
In this configuration, the load balancer in each data center connects only to the OpenSSO servers at the same geographical site. The local load balancer therefore does not need to know about the OpenSSO servers in the remote data center, which simplifies the deployment. However, during a fail-over this configuration increases the likelihood of back-channel communication between the OpenSSO servers in the two data centers.
The OpenSSO configuration to implement this solution is:
- Site configuration (single site):
  - Server List: Svr1, Svr2, Svr3, Svr4
  - Primary URL: http://LB1.xyz.com/opensso
  - Secondary URL: http://LB2.xyz.com/opensso
  - Session Failover is enabled for the site.
- Service/Naming failover configuration for Client A:
  - http://LB1.xyz.com/opensso/namingservice, http://LB2.xyz.com/opensso/namingservice
- Service/Naming failover configuration for Client B:
  - http://LB2.xyz.com/opensso/namingservice, http://LB1.xyz.com/opensso/namingservice
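The per-client ordering above can be sketched in a few lines of Python. This is only an illustrative model, not OpenSSO code: the dictionary names and the `naming_url_list` helper are hypothetical, and the only point is that each client lists its local load balancer first. In the Client SDK this list is typically supplied through the `com.iplanet.am.naming.url` property; treat that as an assumption to verify against your SDK version.

```python
# Hypothetical model of the configuration above; host names mirror the post.
LB_URLS = {
    "LB1": "http://LB1.xyz.com/opensso",
    "LB2": "http://LB2.xyz.com/opensso",
}

# Each client lists its local load balancer first and the remote one second.
CLIENT_LB_ORDER = {
    "Client A": ["LB1", "LB2"],
    "Client B": ["LB2", "LB1"],
}


def naming_url_list(client: str) -> str:
    """Return the comma-separated naming service URLs for a client."""
    return ",".join(
        f"{LB_URLS[lb]}/namingservice" for lb in CLIENT_LB_ORDER[client]
    )


for client in CLIENT_LB_ORDER:
    print(client, "->", naming_url_list(client))
```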
Use Cases
The following use cases describe what happens during fail-over with this configuration.
These use cases assume that the user session is initially created on Svr1 via LB1 and that the fail-over occurs when Client A later tries to validate the user session.
Case 1: LB1 is up, Svr1 is down and Svr2 is up
- The session validation request goes to Svr2 via LB1.
- Svr2 detects that the host server for this session (Svr1) is down and selects a backup OpenSSO server to serve the request. Based on the OpenSSO IRR logic, the user session could be recovered on any of the three running OpenSSO instances: Svr2, Svr3 or Svr4.
- If the backup server is Svr2:
  - The user session is recovered on Svr2. The session stickiness cookie is now set to Svr2.
  - All subsequent requests for this session will go to Svr2 via LB1.
- If the backup server is Svr3:
  - The user session is recovered on Svr3. The session stickiness cookie is now set to Svr3.
  - All subsequent requests for this session will first land on Svr2 via LB1 and then get forwarded to Svr3.
- If the backup server is Svr4:
  - The user session is recovered on Svr4. The session stickiness cookie is now set to Svr4.
  - All subsequent requests for this session will first land on Svr2 via LB1 and then get forwarded to Svr4.
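The three outcomes of Case 1 can be traced with a small simulation. It is a simplified sketch under stated assumptions, not the actual IRR implementation: the `LB_POOL` table and the `request_path` function are hypothetical, and the load balancer is assumed to honor the stickiness cookie only when the session server is in its own pool and running.

```python
# Which servers sit behind which load balancer (taken from the post).
LB_POOL = {"LB1": ["Svr1", "Svr2"], "LB2": ["Svr3", "Svr4"]}


def request_path(lb, session_server, servers_up):
    """Return the hops a request takes to reach the server holding the session.

    Simplified model: if the session server is behind this load balancer and
    running, the request goes straight to it. Otherwise the first running
    local server receives the request and forwards it to the session server.
    """
    if session_server not in servers_up:
        raise ValueError(f"{session_server} is down and cannot hold the session")
    local_up = [s for s in LB_POOL[lb] if s in servers_up]
    if not local_up:
        raise RuntimeError(f"no server available behind {lb}")
    if session_server in local_up:
        return [lb, session_server]
    return [lb, local_up[0], session_server]


servers_up = {"Svr2", "Svr3", "Svr4"}  # Case 1: Svr1 is down
for backup in ("Svr2", "Svr3", "Svr4"):
    print(backup, ":", " -> ".join(request_path("LB1", backup, servers_up)))
```

Only recovery on Svr2 keeps subsequent requests inside the first data center; recovery on Svr3 or Svr4 adds a forwarding hop to the remote site on every request.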
Case 2: LB1 is up and both Svr1 and Svr2 are down
- The Site Monitor in the Client SDK detects that no OpenSSO servers behind LB1 are available, so the session validation request is forwarded to LB2.
- Either Svr3 or Svr4 receives the request.
- If Svr3 receives the request:
  - Svr3 detects that the primary server hosting this session (Svr1) is down.
  - Svr3 selects a backup OpenSSO server to serve the request. Based on the OpenSSO IRR logic, the user session could be recovered on either Svr3 or Svr4.
  - If the backup server is Svr3:
    - The user session is recovered on Svr3.
    - All subsequent requests for this session will go to Svr3 via LB2.
  - If the backup server is Svr4:
    - The request is forwarded to Svr4.
    - The user session is recovered on Svr4.
    - All subsequent requests for this session will go to Svr4 via LB2.
- If Svr4 receives the request:
  - Svr4 detects that the primary server hosting this session (Svr1) is down.
  - Svr4 selects a backup OpenSSO server to serve the request. Based on the OpenSSO IRR logic, the user session could be recovered on either Svr3 or Svr4.
  - If the backup server is Svr3:
    - The request is forwarded to Svr3.
    - The user session is recovered on Svr3.
    - All subsequent requests for this session will go to Svr3 via LB2.
  - If the backup server is Svr4:
    - The user session is recovered on Svr4.
    - All subsequent requests for this session will go to Svr4 via LB2.
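The client-side decision in this case, moving from LB1 to LB2, can be sketched as a walk over the client's ordered list of load balancers. This is a hypothetical model of what the Site Monitor achieves, not its real implementation: `pick_load_balancer` and its parameters are invented for illustration.

```python
# Which servers sit behind which load balancer (taken from the post).
LB_POOL = {"LB1": ["Svr1", "Svr2"], "LB2": ["Svr3", "Svr4"]}


def pick_load_balancer(naming_order, lbs_up, servers_up):
    """Return the first load balancer in the client's order that can serve.

    A load balancer is usable only if it is reachable itself and at least
    one OpenSSO server behind it is running. Returns None if none qualifies.
    """
    for lb in naming_order:
        if lb in lbs_up and any(s in servers_up for s in LB_POOL[lb]):
            return lb
    return None


client_a_order = ["LB1", "LB2"]  # Client A prefers its local load balancer

# Case 2: LB1 is reachable, but Svr1 and Svr2 are both down.
print(pick_load_balancer(client_a_order, {"LB1", "LB2"}, {"Svr3", "Svr4"}))
```

The same function also covers the situation where LB1 itself is unreachable, because the reachability check fails before the server pool is inspected.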
Case 3: LB1 is down and Svr1 is down
- The Site Monitor in the Client SDK detects that LB1 is unreachable, so the session validation request is forwarded to LB2.
- Either Svr3 or Svr4 receives the request.
- If Svr3 receives the request:
  - Svr3 detects that the primary server hosting this session (Svr1) is down.
  - Svr3 selects a backup OpenSSO server to serve the request. Based on the OpenSSO IRR logic, the user session could be recovered on any of the three running OpenSSO instances: Svr2, Svr3 or Svr4.
  - If the backup server is Svr2:
    - The request is forwarded to Svr2.
    - The user session is recovered on Svr2.
    - All subsequent requests for this session will first land on Svr3 via LB2 and then get forwarded to Svr2.
  - If the backup server is Svr3:
    - The user session is recovered on Svr3.
    - All subsequent requests for this session will go to Svr3 via LB2.
  - If the backup server is Svr4:
    - The request is forwarded to Svr4.
    - The user session is recovered on Svr4.
    - All subsequent requests for this session will go to Svr4 via LB2.
- If Svr4 receives the request:
  - Svr4 detects that the primary server hosting this session (Svr1) is down.
  - Svr4 selects a backup OpenSSO server to serve the request. Based on the OpenSSO IRR logic, the user session could be recovered on any of the three running OpenSSO instances: Svr2, Svr3 or Svr4.
  - If the backup server is Svr2:
    - The request is forwarded to Svr2.
    - The user session is recovered on Svr2.
    - All subsequent requests for this session will first land on Svr4 via LB2 and then get forwarded to Svr2.
  - If the backup server is Svr3:
    - The request is forwarded to Svr3.
    - The user session is recovered on Svr3.
    - All subsequent requests for this session will go to Svr3 via LB2.
  - If the backup server is Svr4:
    - The user session is recovered on Svr4.
    - All subsequent requests for this session will go to Svr4 via LB2.
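To see which of these outcomes cause traffic between the two data centers, the combinations can be enumerated. The sketch below is illustrative only: the `DC1`/`DC2` labels and the `crosses_data_centers` helper are hypothetical names, and the model simply compares the site of the server that receives the request with the site of the server that recovers the session.

```python
from itertools import product

# Hypothetical site labels: Svr1/Svr2 sit behind LB1, Svr3/Svr4 behind LB2.
DATA_CENTER = {"Svr1": "DC1", "Svr2": "DC1", "Svr3": "DC2", "Svr4": "DC2"}


def crosses_data_centers(receiving_server, session_server):
    """True if forwarding between the two servers leaves the local site."""
    return DATA_CENTER[receiving_server] != DATA_CENTER[session_server]


receiving_servers = ["Svr3", "Svr4"]          # reached via LB2 in Case 3
backup_candidates = ["Svr2", "Svr3", "Svr4"]  # Svr1 is down

outcomes = {
    (receiver, backup): crosses_data_centers(receiver, backup)
    for receiver, backup in product(receiving_servers, backup_candidates)
}

for (receiver, backup), remote in outcomes.items():
    kind = "cross-data-center" if remote else "local"
    print(f"{receiver} receives, backup {backup}: {kind}")
```

In this model only the two combinations that recover the session on Svr2 are cross-data-center, which matches the bullets above where subsequent requests are forwarded to Svr2.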
Case 4: LB1 is down and Svr1 is up
- The Site Monitor in the Client SDK detects that LB1 is unreachable so the session validation request is forwarded to LB2.
- Either Svr3 or Svr4 receives the request.
- Svr3/Svr4 detects the primary server hosting this session (Svr1) is up.
- Svr3/Svr4 then forwards the request to Svr1 for session validation.
- All subsequent requests for this session will first land on Svr3/Svr4 via LB2 and then get forwarded to Svr1.
The OpenSSO IRR logic will also identify when the primary server hosting the user session is back in service and resume routing traffic to it.
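That fail-back behaviour can be illustrated with a minimal sketch. It assumes a simple rule, prefer the primary whenever it is running and otherwise use the backup, which is an approximation for illustration rather than the documented IRR algorithm; `serving_server` is a hypothetical helper.

```python
def serving_server(primary, backup, servers_up):
    """Pick the server that should serve the session right now.

    Assumed rule: the primary wins whenever it is running; the backup is
    used only while the primary is down. Returns None if both are down.
    """
    if primary in servers_up:
        return primary
    if backup in servers_up:
        return backup
    return None


# Svr1 hosts the session, fails, and later comes back into service.
timeline = [
    {"Svr1", "Svr2", "Svr3", "Svr4"},  # normal operation
    {"Svr2", "Svr3", "Svr4"},          # Svr1 down: backup takes over
    {"Svr1", "Svr2", "Svr3", "Svr4"},  # Svr1 back: routing returns to it
]
for servers_up in timeline:
    print(serving_server("Svr1", "Svr3", servers_up))
```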