The Critical Factor in Business Value Continuity: Failover
Bitnine Global Marketing team
Mon Jul 08 2024

Failover: A Critical Element for Business Continuity
Imagine your business without failover capabilities. What impact would disasters such as ransomware attacks, natural catastrophes, hardware failures, file corruption, human errors, and numerous other issues have? Without proper preventive measures, these disruptions can halt production and seriously degrade your business value.
What is Failover? Understanding the Basics
Failover is the process of switching essential workloads, systems, and applications to a standby or secondary site when the main site is down or unavailable, which makes it a core part of backup and disaster recovery.
This ensures business operations can continue almost or entirely uninterrupted, even during disasters or scheduled maintenance.
Why is Failover Crucial for Your Business?
Failover is vital for business continuity as it reduces or prevents total system failures, enhancing system resilience against single points of failure. This resilience ensures services continue despite component failures, contributing significantly to a business's uptime and reliability.
Failover is particularly important for mission-critical systems that must always be available. It allows employees to continue their work without interruption and keeps files and systems accessible, even during unplanned outages. This matters most for businesses with strict uptime requirements.
Failover Configurations: How to Set Them Up
High Availability (HA) can be set up in Active-Active or Active-Standby (Active-Passive) mode. Here is how AHM (Agens High Availability Manager) configures each:
Active-Active (Independent Disks)
In an Active-Active HA configuration, two or more nodes perform the same tasks simultaneously. This setup ensures that the workload is evenly distributed and balanced across all nodes, preventing any node from being overloaded. Active-Active clusters can utilize more nodes, thus improving throughput and response times. For the HA cluster to operate smoothly, each node's configuration and settings must be identical.
AHM (Agens High Availability Manager) manages a cluster made up of two types of servers and provides high availability for them. If the primary server has an issue, the secondary server takes over as the primary server and operates accordingly.
Primary, master, read/write server: This server allows data to be inserted, modified, deleted, and queried
Secondary, slave, standby server: This server reflects changes made on the primary server
Active-Passive (Standby - Independent Disks)
Similar to an Active-Active cluster, an Active-Passive or Active-Standby High Availability (HA) configuration also consists of at least two nodes.
However, as the name suggests, not all nodes are active. In a typical two-node setup, one node is always active while the other remains passive or on standby, ready to take over if the active node fails. For a smooth failover process, both nodes must have identical settings.
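The difference between the two modes can be illustrated with a minimal routing sketch. This is not AHM code: the `Node`, `ActiveActiveRouter`, and `ActivePassiveRouter` names are hypothetical, and real clusters rely on health checks and replication rather than a simple `healthy` flag.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Node:
    name: str
    healthy: bool = True


class ActiveActiveRouter:
    """Spreads requests round-robin over every healthy node."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._counter = count()

    def route(self) -> Node:
        healthy = [n for n in self.nodes if n.healthy]
        if not healthy:
            raise RuntimeError("no healthy nodes")
        return healthy[next(self._counter) % len(healthy)]


class ActivePassiveRouter:
    """Sends every request to the first healthy node in priority order."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def route(self) -> Node:
        for node in self.nodes:
            if node.healthy:
                return node
        raise RuntimeError("no healthy nodes")
```

In the Active-Passive router the second node receives traffic only after the first is marked unhealthy, which mirrors the standby behavior described above.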
Benefits of Effective Failover Strategies
Implementing a robust failover solution is critical for maintaining business continuity and preserving value, offering numerous benefits, including:
Guaranteed Business Continuity: Failover ensures that business operations continue as usual, even if disasters occur or key components go offline
Improved Uptime: The failover process quickly switches from the primary system to a redundant standby system when the main system is unavailable, minimizing downtime and allowing work to continue without interruption
Cost Savings: Implementing a failover solution reduces the costs associated with downtime, such as lost revenue, lost productivity, missed opportunities, and damage to brand reputation
Considerations for Implementing Efficient Failover
While properly implemented failover can benefit a company, it is essential to be aware of potential drawbacks and make efforts to mitigate them:
Cost: Setting up, managing, and monitoring a failover system involves significant costs, including hardware and software expenses. Ensuring that failover operations run smoothly and automatically may require substantial capital investment in high-bandwidth systems with synchronous data transmission capabilities
Expertise Needed: Just like the primary systems, failover systems require professional maintenance, testing, and validation to operate smoothly. If deploying and managing a failover system requires more expertise than is available in-house, businesses might need to rely on external experts, which can significantly increase costs
Activating Failover with AHM: A Step-by-Step Guide
AHM monitors the health of nodes within a cluster to detect failures and initiate failover. If an active node fails and cannot provide services, the HA solution detects this and switches the standby server to an active role.
During normal operation, the standby server is already running and replicating data from the active server. This minimizes downtime and allows the standby server to continue services seamlessly when the active database goes down.
Steps Involved in Failover Process:
AHM on each node detects a failure in the active database
The standby AHM attempts to connect to the primary database
The failover process initiates
The most recently updated node executes a pre-promotion script
Standby AHM promotes the standby database to primary
AHM assigns a virtual IP address to the new leader node to normalize services
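The steps above can be sketched as a small in-memory simulation. This is an illustrative model only, not AHM's implementation: `DbNode`, `run_failover`, the `replay_position` field, and the `pre_promote` callback are hypothetical names introduced here, and "most recently updated" is assumed to mean the standby that has applied the most replicated data.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DbNode:
    name: str
    role: str                  # "primary", "standby", or "failed"
    reachable: bool = True
    replay_position: int = 0   # assumed measure of how up to date the node is
    has_vip: bool = False


def run_failover(
    nodes: List[DbNode],
    pre_promote: Callable[[DbNode], None],
) -> Optional[DbNode]:
    """Return the newly promoted node, or None if no failover was needed."""
    primary = next(n for n in nodes if n.role == "primary")

    # Steps 1-2: a failure is suspected, so confirm the primary is unreachable.
    if primary.reachable:
        return None

    # Step 3: failover starts; only reachable standbys are candidates.
    standbys = [n for n in nodes if n.role == "standby" and n.reachable]
    if not standbys:
        raise RuntimeError("no reachable standby to promote")

    # Step 4: the most recently updated node runs the pre-promotion script.
    candidate = max(standbys, key=lambda n: n.replay_position)
    pre_promote(candidate)

    # Step 5: promote the standby to primary.
    primary.role = "failed"
    primary.has_vip = False
    candidate.role = "primary"

    # Step 6: move the virtual IP to the new leader so clients reconnect.
    candidate.has_vip = True
    return candidate
```

Passing the pre-promotion script in as a callback keeps the sketch testable without a real database; in practice that hook would be a user-provided script.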
Failover - Recovery and Reversion to Primary:
The node affected by the failure is restored as a standby and then can be reverted back to primary
The restored node calls the `ahm_promote_node` command to re-promote the node to the primary role
AHM: The Ultimate Solution for Ensuring Stable Business Continuity
Architecture of AHM
AHM is built from components that monitor cluster nodes, detect failures, and carry out failover. It includes:
Failover Process
Real-time monitoring of failures in systems, networks, storage, and applications
Consensus-building on decisions related to failover
Executing failover during failures
Heartbeat
Health checks between redundant nodes for Database and AHM
Provides a single connection point to client applications (using virtual IP)
Manages cluster status
Executes pre and post-scripts provided by the user
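Heartbeat-based failure detection and consensus on a failover decision can be sketched as follows. This is a simplified illustration under stated assumptions, not AHM's protocol: `HeartbeatMonitor`, `majority_agrees`, and the timeout-based rule are hypothetical, and timestamps are passed in explicitly so the example does not depend on a real clock.

```python
from typing import Dict, Iterable, List


class HeartbeatMonitor:
    """Tracks the last heartbeat per node and flags nodes that go silent."""

    def __init__(self, timeout_seconds: float):
        self.timeout = timeout_seconds
        self.last_seen: Dict[str, float] = {}

    def record(self, node: str, now: float) -> None:
        # Called whenever a heartbeat arrives from a node.
        self.last_seen[node] = now

    def failed_nodes(self, now: float) -> List[str]:
        # A node is considered failed once its silence exceeds the timeout.
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


def majority_agrees(votes: Iterable[bool]) -> bool:
    """True when a strict majority of monitors report the failure."""
    votes = list(votes)
    return sum(votes) > len(votes) / 2
```

Requiring a strict majority before acting is one common way to avoid a failover triggered by a single monitor that has merely lost its own network connection.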
AgensSQL's HA Solution, Pgpool-II:
Supports high availability through the HA (High Availability) extension solution
Distributes load at the DB session level by scaling out read-only nodes
Detects failures in primary or standby databases and automatically takes appropriate actions to ensure service continuity
Features of Pgpool-II:
Connection pool management: Enhances overall performance through the reuse of connections
Load balancing: Distributes queries across multiple servers if they hold the same data using the replication function
Failover: In the event of a failure in the Master Server, the Slave Server assumes its functions
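The load-balancing idea can be illustrated with a minimal query router. This is a conceptual sketch only and not how Pgpool-II is implemented or configured: the `QueryRouter` class is hypothetical, and the rule "statements starting with SELECT may go to any node" is a deliberate simplification of the conditions a real load balancer checks.

```python
import itertools
from typing import Iterable


class QueryRouter:
    """Sends writes to the primary and rotates reads across all nodes."""

    def __init__(self, primary: str, standbys: Iterable[str]):
        self.primary = primary
        self.standbys = list(standbys)
        # Reads rotate over every node that holds the replicated data.
        self._read_targets = itertools.cycle([primary, *self.standbys])

    def route(self, sql: str) -> str:
        # Simplifying assumption: only plain SELECT statements are reads.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._read_targets)
        return self.primary
```

For example, with one primary and one standby, successive `SELECT` statements alternate between the two servers while every `INSERT` or `UPDATE` goes to the primary.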
Agens High Availability Manager (AHM)
High Availability Components: AHM is a high-availability component developed by AGEDB to resolve Single Points of Failure (SPOF) in AgensSQL database servers
Fault Detection and Automatic Measures: Uses a distributed mechanism to detect failures in primary or standby databases and automatically takes appropriate measures to ensure service continuity
Guaranteed Availability: AHM is installed alongside each primary and standby database and keeps the database servers available to client applications. If a failure occurs in the primary database or system, AHM coordinates to designate one of the available standby database servers as the new primary and quickly restores services by bringing up the VIP, previously held by the old primary, on the new primary
Support for Stable System Operation: Provides features necessary for cluster operation, such as failover handling, ensuring stable system operation and making the system easier to expand
This session covered the critical component of failover in preserving business continuity and value. In our next session, we will delve into more technical content and share success stories of Bitnine's failover implementations.
For more information about Bitnine Global's advanced technology and training or if you have any inquiries, please contact us at marketing@bitnineglobal.com for professional consultation.