
Achieving Rapid Ceph OSD Recovery Using SBB Technology
In modern data centers, uninterrupted data availability is critical. While Ceph’s CRUSH algorithm effectively handles failures and protects data integrity, hardware redundancy remains crucial for ensuring high availability. Introducing Storage Bridge Bay (SBB) servers into the Ceph infrastructure significantly improves resilience by minimizing service interruptions during hardware failures.
Challenges of Traditional Ceph Deployments
In a conventional Ceph deployment, each storage server typically hosts multiple Object Storage Daemons (OSDs). If a single server suffers a hardware failure, such as a motherboard malfunction or a network card failure, all OSDs on that host go offline simultaneously. Placement Groups (PGs) with replicas on those OSDs become degraded, and Ceph starts a recovery process to restore the lost copies, during which data redundancy is reduced.
Recovering from such an event can take a significant amount of time, depending on the volume of data and available resources, which can lead to prolonged degraded performance and an increased risk of data loss or service disruption.
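As a back-of-the-envelope illustration of why recovery time grows with data volume, the sketch below divides the amount of data to be re-replicated by a sustained aggregate recovery rate. The function name and the example figures (100 TiB, 2000 MiB/s) are hypothetical and not taken from any measured cluster; real recovery is also shaped by throttling settings, client load, and network topology.

```python
def estimate_recovery_hours(data_tib: float, recovery_mib_per_s: float) -> float:
    """Rough time, in hours, to re-replicate `data_tib` TiB of data
    at a sustained aggregate recovery rate of `recovery_mib_per_s` MiB/s."""
    if recovery_mib_per_s <= 0:
        raise ValueError("recovery rate must be positive")
    data_mib = data_tib * 1024 * 1024  # 1 TiB = 1024 * 1024 MiB
    return data_mib / recovery_mib_per_s / 3600


# Hypothetical example: a failed host held 100 TiB and the cluster
# sustains 2000 MiB/s of recovery traffic.
print(round(estimate_recovery_hours(100, 2000), 1))  # 14.6
```

Even under these optimistic assumptions the cluster would spend more than half a day in a degraded state, which is the window that rapid OSD failover aims to shrink.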
Introducing Storage Bridge Bay (SBB) Servers
Storage Bridge Bay (SBB) is a standardized dual-node server architecture designed for high availability. An SBB server houses two independent computing nodes connected to shared storage in a JBOD (Just a Bunch of Disks) configuration. Typically, these servers support dual-port NVMe or SAS drives, providing robust hardware redundancy.
How SBB Enhances Ceph High Availability
In an SBB-based Ceph deployment, the two nodes operate in active-active mode, meaning both nodes run Ceph OSD services at the same time. For example, a typical SBB server equipped with 24 NVMe SSDs divides the drives equally between the two nodes, with each node initially managing 12 OSDs.
This design ensures that if one node fails, only half of the OSDs become temporarily unavailable, rather than all at once, significantly reducing the severity and impact of the failure.
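The 24-drive example above can be modeled in a few lines. This is a minimal sketch: the node names and the contiguous OSD numbering are hypothetical, since Ceph assigns OSD IDs at creation time and they need not be contiguous per node.

```python
def split_osds(total_osds: int, nodes=("node-a", "node-b")) -> dict:
    """Divide OSD IDs 0..total_osds-1 as evenly as possible across nodes."""
    per_node, remainder = divmod(total_osds, len(nodes))
    layout = {}
    start = 0
    for i, node in enumerate(nodes):
        # The first `remainder` nodes take one extra OSD when the split is uneven.
        count = per_node + (1 if i < remainder else 0)
        layout[node] = list(range(start, start + count))
        start += count
    return layout


def offline_fraction(layout: dict, failed_node: str) -> float:
    """Fraction of the chassis's OSDs that go offline when one node fails."""
    total = sum(len(ids) for ids in layout.values())
    return len(layout[failed_node]) / total


layout = split_osds(24)
print(len(layout["node-a"]), len(layout["node-b"]))  # 12 12
print(offline_fraction(layout, "node-a"))            # 0.5
```

Passing a single node name, as in `split_osds(24, nodes=("host",))`, models a conventional server, where the same failure takes all 24 OSDs offline.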
Rapid OSD Failover Scenario
When a failure occurs on one node within an SBB server, half of the OSDs become inaccessible. Ambedded Technology has developed a robust script designed to rapidly migrate and reactivate the affected OSDs onto the surviving node.
Here's how the rapid migration process occurs:
1. Obtain the Ceph Container Image: Quickly retrieve the container image reference required for Ceph operations.
2. Remove OSD-specific CRUSH Location: Update the OSD configuration by removing node-specific CRUSH location details.
3. Activate OSD with ceph-volume: Reactivate the OSD services using the ceph-volume utility.
4. Adopt OSD using cephadm: Integrate the activated OSDs back into the Ceph cluster, restoring service swiftly.
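The four steps above can be sketched as a dry-run command plan. Ambedded's script is not published here, so the mapping of each step to a specific command is an assumption: the sketch uses `ceph config get`, `ceph config rm`, `cephadm ceph-volume`, and `cephadm adopt`, and the function name, OSD IDs, and image reference are hypothetical. It only builds and prints the commands; verify each one against your Ceph release before running anything on a real cluster.

```python
from typing import List, Tuple

Command = Tuple[str, List[str]]  # (step description, argv)

# Step 1 (assumed mapping): ask the cluster which container image OSDs use.
IMAGE_QUERY: List[str] = ["ceph", "config", "get", "osd", "container_image"]


def build_failover_plan(osd_ids: List[int], image: str) -> List[Command]:
    """Build, without executing, the commands the surviving node might run
    to take over the OSDs of its failed partner. Illustrative only."""
    plan: List[Command] = []

    # Step 2: drop the per-OSD CRUSH location that names the failed node.
    for osd_id in osd_ids:
        plan.append((
            "remove crush_location",
            ["ceph", "config", "rm", f"osd.{osd_id}", "crush_location"],
        ))

    # Step 3: activate the OSD volumes visible through the shared drives.
    plan.append((
        "activate with ceph-volume",
        ["cephadm", "--image", image, "ceph-volume", "--",
         "lvm", "activate", "--all", "--no-systemd"],
    ))

    # Step 4: hand each activated OSD over to cephadm management.
    for osd_id in osd_ids:
        plan.append((
            "adopt with cephadm",
            ["cephadm", "--image", image, "adopt",
             "--style", "legacy", "--name", f"osd.{osd_id}"],
        ))
    return plan


# Hypothetical OSD IDs and image reference, for illustration only.
for step, argv in build_failover_plan([12, 13], "quay.io/ceph/ceph:v18"):
    print(f"{step}: {' '.join(argv)}")
```

Keeping the plan as data rather than executing it directly makes it easy to review or log the exact commands before a failover is triggered.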
Benefits of Using SBB for Ceph
1. Minimized Downtime: Rapid OSD reactivation significantly reduces the time spent in degraded states, swiftly restoring PGs to an active and clean status.
2. Enhanced Service Continuity: Prevents prolonged interruptions, maintaining consistent service delivery.
3. Simplified Maintenance: Immediate hardware repairs become less urgent, as services remain operational on the surviving node.
4. Reduced Risk of Data Loss and Performance Degradation: Accelerated recovery processes and hardware redundancy minimize potential risks associated with hardware failures.
Summary and Conclusion
Integrating Storage Bridge Bay (SBB) servers with Ceph deployments dramatically enhances the resilience and operational efficiency of storage infrastructures. By leveraging an active-active configuration and rapid OSD reactivation capabilities, organizations can significantly reduce downtime and simplify management.
Ambedded Technology's Ceph appliance, Mars 624, exemplifies this integration by offering a turnkey solution that harnesses the benefits of SBB architecture. Organizations looking to improve Ceph availability and streamline maintenance should consider upgrading to Mars 624 to achieve unparalleled storage reliability and efficiency.
Additionally, Ambedded's UniVirStor full-stack Ceph software fully supports any storage servers built on SBB technology, such as Supermicro's SSG-640SP-DE2CR60, ensuring flexibility and compatibility for diverse infrastructure environments.