High Data Availability and Durability

Ceph offers replication, erasure coding, CRUSH, and scrubbing for data protection and high availability.

Ceph object storage achieves data availability through replication and erasure coding. With erasure coding, data is split into shards, combined with parity information, and distributed across the storage pool.
When a storage device fails, only a subset of the shards is needed to recover the data. Recovery is spread across the remaining devices instead of being rebuilt onto a spare disk, which limits the impact on performance, and the failed device can be replaced when convenient.
By combining widely distributed data with scrubbing, which continuously validates the data written to the media, Ceph can help you achieve up to 15 nines of data durability.

Data Replication, Erasure Coding & Scrubbing

Object Replication

When a client writes data, it uses the object ID and the pool name to calculate which OSD to write to. After the client writes the data to that OSD, the OSD copies it to one or more other OSDs. You can configure as many replicas as you need so that the data survives even if multiple OSDs fail at the same time. Replication is similar to RAID-1 in a disk array but allows more copies of the data, because at scale a simple RAID-1 mirror may no longer sufficiently cover the risk of hardware failure. The main downside of storing more replicas is the storage cost.

Ceph clients write data to OSDs pseudo-randomly, based on the CRUSH algorithm. If an OSD disk or node fails, Ceph can heal the data from the replicas stored on healthy OSDs.
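
The client-side placement calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SHA-256 hash and the ranking scheme are stand-ins for Ceph's own hash function and the CRUSH algorithm, and the names `object_to_pg`, `pg_to_osds`, the pool name, and the OSD numbering are hypothetical.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (stand-in for Ceph's own hash function)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def object_to_pg(pool: str, object_id: str, pg_count: int) -> int:
    """Map an object to a placement group from its pool name and object ID."""
    return stable_hash(f"{pool}/{object_id}") % pg_count


def pg_to_osds(pool: str, pg: int, osds: list[int], replicas: int) -> list[int]:
    """Pick `replicas` distinct OSDs for a placement group.

    Assumption: OSDs are ranked by a per-(pg, osd) score, which is a
    simplified substitute for CRUSH, not the real algorithm.
    """
    ranked = sorted(osds, key=lambda osd: stable_hash(f"{pool}/{pg}/{osd}"), reverse=True)
    return ranked[:replicas]


osds = list(range(12))  # hypothetical cluster with 12 OSDs
pg = object_to_pg("images", "photo-001", pg_count=128)
acting = pg_to_osds("images", pg, osds, replicas=3)
primary, secondaries = acting[0], acting[1:]

# The client writes to the primary; the primary copies the data to the others.
print(f"pg={pg} primary=osd.{primary} secondaries={secondaries}")
```

Because the mapping is computed rather than looked up, every client that knows the pool name, the object ID, and the cluster layout arrives at the same set of OSDs without asking a central directory.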

You can define the failure domain so that Ceph stores replicas in different servers, racks, rooms, or data centers, which avoids data loss when one or more whole failure domains fail. For example, if you have 15 storage servers installed in 5 racks (3 servers in each rack), you can use three replicas with the rack as the failure domain. Data written to the Ceph cluster then always has three copies stored in three of the five racks, so it survives the failure of any two racks. Whether client I/O continues uninterrupted with a single remaining copy depends on the pool's min_size setting. The CRUSH rule is the key to ensuring that Ceph storage has no single point of failure.

CRUSH rules ensure that replicated data is distributed to different server nodes according to the failure domain.
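
The 15-server, 5-rack example can be checked with a small simulation. The sketch below assumes a simplified placement function (`place`, hypothetical) that picks three distinct racks per placement group by hash ranking; it is not the CRUSH algorithm itself, and the rack and server names are made up for illustration.

```python
import hashlib
from itertools import combinations


def score(*parts) -> int:
    """Deterministic score used to rank candidates (stand-in for CRUSH)."""
    key = "/".join(str(p) for p in parts)
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


# 5 racks with 3 servers each, 15 servers in total, as in the example above.
RACKS = {f"rack{r}": [f"server{r * 3 + s}" for s in range(3)] for r in range(5)}


def place(pg: int, replicas: int = 3) -> list[tuple[str, str]]:
    """Choose `replicas` distinct racks, then one server inside each rack."""
    racks = sorted(RACKS, key=lambda rack: score(pg, rack), reverse=True)[:replicas]
    return [(rack, max(RACKS[rack], key=lambda srv: score(pg, srv))) for rack in racks]


def survives(placement: list[tuple[str, str]], failed_racks: set[str]) -> bool:
    """Data survives if at least one copy sits outside the failed racks."""
    return any(rack not in failed_racks for rack, _ in placement)


placements = [place(pg) for pg in range(64)]
worst_case_ok = all(
    survives(p, set(failed))
    for p in placements
    for failed in combinations(RACKS, 2)  # every possible pair of failed racks
)

print(place(0))
print("every PG keeps a copy after any two rack failures:", worst_case_ok)
```

Since each placement group spans three different racks, no combination of two failed racks can remove all three copies.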

Erasure Coding

Replication offers the best overall performance, but it is not very space-efficient, especially if you need a higher degree of redundancy.
High data availability is why RAID-5 or RAID-6 was used in the past as an alternative to RAID-1. Parity RAID provides redundancy with much less storage overhead, at the cost of storage performance (mostly write performance). Ceph uses erasure coding to achieve a similar result. When your storage system becomes large, you may not be comfortable with allowing only one or two disks or failure domains to fail at the same time. Erasure coding lets you configure a higher level of redundancy with less space overhead.
Erasure coding splits the original data into K data chunks and calculates M additional coding chunks. Ceph can recover the data as long as no more than M failure domains fail at the same time. The K+M chunks in total are stored on OSDs in different failure domains.

Example: erasure coding with K+M = 4+2 for data protection.
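
To make the trade-off concrete, the sketch below compares the raw capacity needed per unit of usable data for three replicas and for K+M = 4+2, then demonstrates chunk recovery. It assumes a single XOR parity chunk (M = 1) for the recovery part, because that is the simplest erasure code to show; tolerating M > 1 failures requires a code such as Reed-Solomon, which is not implemented here. All function names are hypothetical.

```python
from functools import reduce


def storage_overhead(k: int, m: int) -> float:
    """Raw capacity consumed per unit of usable data for a K+M layout."""
    return (k + m) / k


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k chunks and append one XOR parity chunk (M = 1)."""
    chunk_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(chunk_len * k, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def recover(chunks: list) -> list[bytes]:
    """Rebuild at most one missing chunk (marked None) from the survivors."""
    missing = [i for i, chunk in enumerate(chunks) if chunk is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can rebuild only one lost chunk")
    repaired = list(chunks)
    if missing:
        survivors = [chunk for chunk in chunks if chunk is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


# Three replicas behave like K=1, M=2 in terms of capacity.
replica3_overhead = storage_overhead(1, 2)
ec42_overhead = storage_overhead(4, 2)
print(f"3 replicas: {replica3_overhead:.1f}x raw, EC 4+2: {ec42_overhead:.1f}x raw")

payload = b"Ceph erasure coding demo"
stored = encode(payload, k=4)
damaged = list(stored)
damaged[2] = None  # simulate the loss of one failure domain
restored = recover(damaged)
rebuilt = b"".join(restored[:4])[:len(payload)]
print("recovered intact:", rebuilt == payload)
```

Both layouts in the capacity comparison tolerate two simultaneous failures, but 4+2 needs half the raw capacity of three replicas for the same amount of usable data.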


Scrubbing

As part of maintaining data consistency and cleanliness, Ceph OSD Daemons can scrub objects within placement groups. That is, Ceph OSD Daemons can compare object metadata in one placement group with its replicas in placement groups stored on other OSDs. Scrubbing (usually performed daily) catches bugs or filesystem errors. Ceph OSD Daemons also perform deeper scrubbing by comparing data in objects bit-for-bit. Deep scrubbing (usually performed weekly) finds bad sectors on a drive that weren’t apparent in a light scrub.
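
The difference between a light scrub and a deep scrub can be illustrated with a short sketch. It assumes that object size stands in for "metadata", that a SHA-256 digest stands in for the bit-for-bit comparison, and that the majority of replicas is treated as correct; none of this is Ceph's actual implementation, and the OSD names are hypothetical.

```python
import hashlib
from collections import Counter


def light_scrub(replicas: dict[str, bytes]) -> list[str]:
    """Compare only metadata (here: object size) across replicas."""
    sizes = {osd: len(data) for osd, data in replicas.items()}
    majority_size, _ = Counter(sizes.values()).most_common(1)[0]
    return sorted(osd for osd, size in sizes.items() if size != majority_size)


def deep_scrub(replicas: dict[str, bytes]) -> list[str]:
    """Compare the full object content via a checksum of every replica."""
    digests = {osd: hashlib.sha256(data).hexdigest() for osd, data in replicas.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(osd for osd, digest in digests.items() if digest != majority_digest)


good = b"customer-record-0001"
corrupted = bytearray(good)
corrupted[3] ^= 0x01  # a single flipped bit, e.g. from a bad sector

replicas = {"osd.1": good, "osd.4": good, "osd.7": bytes(corrupted)}

print("light scrub flags:", light_scrub(replicas))
print("deep scrub flags:", deep_scrub(replicas))
```

The flipped bit leaves the object size unchanged, so only the deep scrub reports the damaged replica, which mirrors why deep scrubbing finds errors that a light scrub misses.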

Data Healing

Because of Ceph's data placement design, data is healed by all healthy OSDs. No spare disk is required for healing. This can make the healing time much shorter than in a disk array, which has to rebuild the lost data onto a spare disk.

When one server node fails, the cluster self-heals by applying the same data protection method.
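
A rough calculation shows why healing across many OSDs can finish sooner than rebuilding onto one spare disk. Every number below (8 TB of lost data, 150 MB/s for a spare disk, 30 OSDs contributing 50 MB/s each) is an illustrative assumption, not a measurement, and the model ignores network limits and recovery throttling.

```python
def rebuild_hours(data_tb: float, writers: int, mb_per_s_each: float) -> float:
    """Hours needed to rewrite `data_tb` when `writers` devices share the work."""
    total_mb = data_tb * 1_000_000  # decimal terabytes to megabytes
    return total_mb / (writers * mb_per_s_each) / 3600


# Disk array: all lost data is written to a single spare disk.
spare_disk_hours = rebuild_hours(data_tb=8, writers=1, mb_per_s_each=150)

# Ceph: the lost data is re-created in parallel on the surviving OSDs.
distributed_hours = rebuild_hours(data_tb=8, writers=30, mb_per_s_each=50)

print(f"spare disk rebuild: {spare_disk_hours:.1f} h")
print(f"distributed healing: {distributed_hours:.1f} h")
```

Under these assumptions the aggregate write bandwidth is ten times higher in the distributed case, so the healing window shrinks by the same factor even though each OSD contributes less than the spare disk would.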

Configure the CRUSH Map and Rules

Use the UVS manager to define the data distribution and the failure domain.

Ambedded - A professional Ceph storage solution to the market.

Located in Taiwan since 2013, Ambedded Technology Co., LTD. is a Ceph storage solution provider that offers ARM-based software-defined storage appliances with 60% CapEx saving, 70% less power consumption and 100% scale-out advantages.

With broad experience in designing and manufacturing software-defined storage, the Ambedded team helps customers adopt Ceph solutions in a wide range of industries, such as telecom, medical, military, edge data centers, and enterprise storage that requires HA.

Ambedded has been offering customers Ceph storage solutions since 2013. With both advanced technology and 20 years of experience, Ambedded ensures each customer's demands are met.