Introduction:
IT infrastructure and operations are critical assets that keep a business running smoothly. Disaster recovery management involves planning and implementing strategies so that an organization can recover quickly from disruptive events such as natural disasters, cyberattacks, or equipment failures. Its purpose is to protect the business from the negative consequences of these risks.
Such events can directly threaten business continuity and survival. Their impact can take the form of financial losses, operational disruptions, reputational damage, or even legal consequences.
This blog post discusses disaster recovery management and the best practices to adopt.
Disaster Recovery:
Disaster recovery is the process by which an organization attempts to prevent or minimize the loss of business and data in the event of a disaster. It is about how an organization bounces back and regains normalcy after the catastrophic impact of such events.
Disasters can have significant impacts on software systems, affecting both the functionality and security of applications.
Some key impacts:
- Data Loss: Disasters can lead to the loss of critical data, especially if proper backup systems are not in place.
- Downtime: Software systems may experience prolonged downtime, disrupting business operations and leading to financial losses.
- Security Breaches: Disasters can expose vulnerabilities, making systems more susceptible to cyberattacks and data breaches.
- Corrupted Data: Data corruption can occur during disasters, leading to inaccurate or unusable information.
- Service Disruptions: Essential services and applications may become unavailable, affecting users and customers.
For example, in 2024, OpenAI experienced a major outage due to a misconfiguration in their Kubernetes system, which disrupted key services like ChatGPT and Sora for several hours. This incident highlighted the importance of proper configuration management and disaster recovery planning.
SailPoint Disaster Recovery Plan for Business Continuity:
A business continuity plan refers to the processes and procedures that ensure the uninterrupted functioning of the business during and after a disruptive event.
The simple flow illustrates the DC-DR strategy.

Data Center-Disaster Recovery (DC-DR) architecture has several advantages.
Advantages:
- Business Continuity: Ensures that critical business operations can continue during and after a disaster, minimizing downtime.
- Data Protection: Provides robust data backup and recovery solutions, safeguarding against data loss.
- Compliance: Helps meet regulatory requirements for data protection and disaster recovery.
- Scalability: Can be scaled to accommodate growing business needs and data volumes.
This video explains the SailPoint IdentityIQ Production Architecture and business continuity plan strategies.
Prerequisites: (DC-DR works for all latest versions).
The following software is used in our ENH environment.
- SailPoint IIQ
- JDK
- Tomcat (or any other application server)
- NGINX (Load Balancer)
- Database (MySQL)
- Linux (OS)
The key elements of disaster recovery are data replication and load balancing.
Database Replication:
Steps – How database replication works:
- Step 1: Identify the Primary Database (Source): A primary (or master) database is chosen as the main source of truth where data changes originate.
- Step 2: Set Up Replica Databases (Targets): One or more replicas (or secondary databases) are configured to receive data from the primary database.
- Step 3: Data Changes Captured: Any updates, inserts, or deletes in the primary database are recorded, typically through a transaction log or change data capture mechanism.
- Step 4: Transmit Changes to Replicas: The captured changes are sent to replica databases over the network in real-time or at scheduled intervals.
- Step 5: Apply Changes on Replicas: The replicas apply these updates to keep their data in sync with the primary database.
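The five steps above can be sketched as a small in-memory simulation. This is only an illustration of log-based replication in general, not how MySQL or any other database implements it; the `Change`, `Primary`, and `Replica` classes and the log format are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Change:
    """One entry in the primary's change log (hypothetical format)."""
    seq: int                     # 1-based position in the log
    op: str                      # "upsert" or "delete"
    key: str
    value: Optional[str] = None


@dataclass
class Primary:
    """Steps 1 and 3: the source of truth, recording every change in a log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def upsert(self, key, value):
        self.data[key] = value
        self.log.append(Change(len(self.log) + 1, "upsert", key, value))

    def delete(self, key):
        self.data.pop(key, None)
        self.log.append(Change(len(self.log) + 1, "delete", key))

    def changes_since(self, seq):
        """Step 4: return the log entries with a sequence number above seq."""
        return self.log[seq:]


@dataclass
class Replica:
    """Steps 2 and 5: a target that applies changes in log order."""
    data: dict = field(default_factory=dict)
    applied_seq: int = 0

    def apply(self, changes):
        for change in changes:
            if change.seq <= self.applied_seq:
                continue  # already applied, skip duplicates
            if change.op == "upsert":
                self.data[change.key] = change.value
            else:
                self.data.pop(change.key, None)
            self.applied_seq = change.seq


primary = Primary()
replica = Replica()
primary.upsert("user:1", "alice")
primary.upsert("user:2", "bob")
primary.delete("user:1")

# Ship everything the replica has not seen yet, then apply it.
replica.apply(primary.changes_since(replica.applied_seq))
print(replica.data == primary.data)
```

Tracking `applied_seq` on the replica is what lets it ask only for new changes and safely ignore entries it receives twice.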
Load Balancing:
In an active-standby (or active-passive) load balancer setup, the primary (active) load balancer handles all traffic under normal conditions, while the secondary (standby) load balancer stays idle, ready to take over if the primary fails.
Steps – How the load balancers work:
- Primary Load Balancer (Active):
  - Actively manages and distributes incoming traffic to the servers in the primary data center (DC).
  - Continuously monitors the health and performance of the servers and the network.
- Secondary Load Balancer (Standby):
  - Remains on standby, not handling any traffic under normal conditions.
  - Regularly synchronizes with the primary load balancer to stay updated with the current state and configurations.
- Failover Process:
  - If the primary load balancer fails or develops a significant issue, typically detected through health checks, the failover process is triggered.
  - The secondary load balancer becomes active and starts handling the traffic, ensuring minimal disruption to services.
- Failback Process:
  - Once the primary load balancer is restored and verified to be fully operational, traffic can be redirected back.
  - The secondary load balancer returns to standby mode, ready for any future failover events.
This setup ensures high availability and reliability by providing a backup load balancer that can quickly take over in case of a failure.
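The failover and failback behaviour described above can be sketched as a small state machine. This is a simplified illustration, not NGINX configuration or the logic of any specific product; the class names, the host names `lb-dc` and `lb-dr`, and the threshold of consecutive checks are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class LoadBalancer:
    name: str
    healthy: bool = True


class ActiveStandbyPair:
    """Tracks which of two load balancers currently serves traffic."""

    def __init__(self, primary, secondary, threshold=3):
        self.primary = primary
        self.secondary = secondary
        self.active = primary
        # Assumed policy: act only after this many consecutive results.
        self.threshold = threshold
        self._ok = 0
        self._failed = 0

    def health_check(self):
        """Run one check cycle and return the name of the active node."""
        if self.primary.healthy:
            self._ok += 1
            self._failed = 0
        else:
            self._failed += 1
            self._ok = 0

        if self.active is self.primary and self._failed >= self.threshold:
            self.active = self.secondary   # failover
        elif self.active is self.secondary and self._ok >= self.threshold:
            self.active = self.primary     # failback
        return self.active.name


pair = ActiveStandbyPair(LoadBalancer("lb-dc"), LoadBalancer("lb-dr"), threshold=2)
pair.primary.healthy = False
print(pair.health_check(), pair.health_check())
pair.primary.healthy = True
print(pair.health_check(), pair.health_check())
```

Requiring several consecutive results before switching is one way to avoid flapping between the two nodes on a single failed or successful check.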
The following demo video is a deep dive demonstration of SailPoint Disaster recovery plan and failover configurations using our ENH environment.
Recommendations:
NGINX (Load Balancer):
- Configure only the UI servers in the load balancer.
- Sticky Sessions: Configure the load balancer for sticky sessions (also known as session persistence) to ensure that user sessions are consistently routed to the same application server.
- We recommend an active-standby (or active-passive) load balancer setup.
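To illustrate the idea behind sticky sessions, here is a minimal sketch of hash-based session persistence. It shows the general technique only and is not how NGINX is configured; the server names and the `pick_server` function are hypothetical.

```python
import hashlib

# Hypothetical UI server names, used only for this example.
UI_SERVERS = ["iiq-ui-1", "iiq-ui-2", "iiq-ui-3"]


def pick_server(session_id, servers=UI_SERVERS):
    """Map a session ID to one server, always the same one for that ID."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


first = pick_server("session-abc123")
second = pick_server("session-abc123")
print(first == second)
```

Because the mapping depends on the length of the server list, adding or removing a server in this simple scheme can move existing sessions to a different server.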
Database:
- Configure your database for replication to ensure high availability and disaster recovery. Use native database replication features like MySQL replication or Oracle Data Guard.
- The database must use one-phase commit.
- When replicating, make sure that only the identityiq, identityiqah, and identityiqPlugin databases are replicated.
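As a rough aid for the last recommendation, the following sketch compares a list of database names observed on a replica against the three expected ones. How that list is obtained is left out; the function name and the sample input are hypothetical, and the excluded names are the standard MySQL system schemas.

```python
EXPECTED = {"identityiq", "identityiqah", "identityiqPlugin"}

# MySQL system schemas, ignored in the comparison.
SYSTEM_SCHEMAS = {"mysql", "information_schema", "performance_schema", "sys"}


def check_replicated_databases(found):
    """Return (missing, unexpected) database names, each sorted."""
    found_set = set(found) - SYSTEM_SCHEMAS
    missing = sorted(EXPECTED - found_set)
    unexpected = sorted(found_set - EXPECTED)
    return missing, unexpected


# Sample input: one expected database is absent, one extra is present.
missing, unexpected = check_replicated_databases(
    ["identityiq", "identityiqah", "mysql", "hr_app"]
)
print("missing:", missing)
print("unexpected:", unexpected)
```

Names are compared exactly, including letter case, so `identityiqplugin` would not match `identityiqPlugin` in this sketch.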