
High Data Availability and Durability

Ceph object storage achieves data availability through replication and advanced erasure coding, in which data is combined with parity information and then sharded and distributed across the storage pool.
When a storage device fails, only a subset of the shards is needed to recover the data, there is no dedicated rebuild window or degraded performance, and failed storage devices can be replaced when convenient.
By combining widely distributed data with scrubbing technology that continuously validates the data written to the media, Ceph can achieve up to 15 nines of data durability.


Data Replication, Erasure Coding & Scrubbing

Object Replication

When a client writes data, it uses the object ID and the pool name to calculate which OSD it should write to. After the client writes the data to that OSD, the OSD copies the data to one or more other OSDs. You can configure as many replicas as you need so that the data survives even if multiple OSDs fail concurrently. Replication is similar to RAID-1 in a disk array, but it allows more than two copies of the data, because at scale a simple RAID-1 mirror may no longer cover the risk of hardware failure. The only downside of storing more replicas is the storage cost.
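
As a minimal sketch of how the replica count is configured, assuming a running cluster with admin access, the standard ceph CLI commands can be wrapped in Python like this (the pool name "rbd-demo" and the placement-group count are hypothetical examples):

    import subprocess

    def ceph(*args):
        # Run a standard ceph CLI command; assumes the admin keyring is available on this host.
        return subprocess.run(["ceph", *args], check=True, capture_output=True, text=True).stdout

    # Create a replicated pool (pool name and placement-group count are example values).
    ceph("osd", "pool", "create", "rbd-demo", "128", "128", "replicated")

    # Keep three copies of every object; refuse client I/O if fewer than two copies are available.
    ceph("osd", "pool", "set", "rbd-demo", "size", "3")
    ceph("osd", "pool", "set", "rbd-demo", "min_size", "2")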

Ceph clients write data to OSDs selected pseudo-randomly by the CRUSH algorithm. If an OSD disk or node fails, Ceph recovers the data from the other replicas stored on healthy OSDs.

You can define the failure domain so that Ceph stores replicated data in different servers, racks, rooms, or data centers, avoiding data loss when an entire failure domain fails. For example, if you have 15 storage servers installed in 5 racks (3 servers in each rack), you can use a replica count of three with rack as the failure domain. Data written to the Ceph cluster will then always have three copies stored in three of the five racks, and it can survive the failure of any two racks without degrading client service. The CRUSH rule is the key to making Ceph storage free of any single point of failure.

CRUSH rules ensure that replicated data is distributed to different server nodes according to the configured failure domain
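
A sketch of the rack example above, assuming the CRUSH hierarchy already groups hosts under rack buckets below the default root (the rule and pool names are hypothetical examples):

    import subprocess

    def ceph(*args):
        # Run a standard ceph CLI command; assumes admin access to the cluster.
        return subprocess.run(["ceph", *args], check=True, capture_output=True, text=True).stdout

    # Create a replicated CRUSH rule that selects OSDs from distinct racks under the default root.
    ceph("osd", "crush", "rule", "create-replicated", "rack-ha", "default", "rack")

    # Apply the rule to a pool and keep three replicas, so each copy lands in a different rack.
    ceph("osd", "pool", "set", "rbd-demo", "crush_rule", "rack-ha")
    ceph("osd", "pool", "set", "rbd-demo", "size", "3")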

Erasure Coding

Replication offers the best overall performance, but it is not very space-efficient, especially if you need a higher degree of redundancy.
The need for high data availability is why parity RAID (RAID-5 or RAID-6) was used in the past as an alternative to RAID-1: it provides redundancy with much less storage overhead, at the cost of storage performance (mostly write performance). Ceph uses erasure coding to achieve a similar result. When your storage system grows large, allowing only one or two disks or failure domains to fail at the same time may no longer feel safe enough. The erasure code algorithm lets you configure a higher level of redundancy with less space overhead.
Erasure coding splits the original data into K data chunks and calculates M additional coding chunks. Ceph can recover the data as long as at most M failure domains fail at the same time. In total, K+M chunks are stored on OSDs located in different failure domains.

An example of using erasure coding with K+M = 4+2 for data protection.
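
A sketch of the 4+2 example, in the same hedged CLI-in-Python style (the profile and pool names are hypothetical examples). With K=4 and M=2 the raw-space overhead is (K+M)/K = 1.5x, compared with 3x for three-way replication, while still tolerating the loss of any two failure domains:

    import subprocess

    def ceph(*args):
        # Run a standard ceph CLI command; assumes admin access to the cluster.
        return subprocess.run(["ceph", *args], check=True, capture_output=True, text=True).stdout

    # Define an erasure-code profile with 4 data chunks and 2 coding chunks,
    # placing each chunk in a different rack.
    ceph("osd", "erasure-code-profile", "set", "ec42",
         "k=4", "m=2", "crush-failure-domain=rack")

    # Create a pool that uses the profile; it survives the loss of any two racks.
    ceph("osd", "pool", "create", "ec-demo", "128", "128", "erasure", "ec42")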

Scrubbing

As part of maintaining data consistency and cleanliness, Ceph OSD Daemons can scrub objects within placement groups. That is, Ceph OSD Daemons can compare object metadata in one placement group with its replicas in placement groups stored on other OSDs. Scrubbing (usually performed daily) catches bugs or filesystem errors. Ceph OSD Daemons also perform deeper scrubbing by comparing data in objects bit-for-bit. Deep scrubbing (usually performed weekly) finds bad sectors on a drive that weren’t apparent in a light scrub.
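
As a sketch, the scrub intervals can be tuned and a deep scrub can be requested manually with the standard ceph commands (the placement-group ID "2.1f" is a hypothetical example; the intervals shown match the daily/weekly defaults):

    import subprocess

    def ceph(*args):
        # Run a standard ceph CLI command; assumes admin access to the cluster.
        return subprocess.run(["ceph", *args], check=True, capture_output=True, text=True).stdout

    # Scrub intervals are given in seconds: light scrubs at least daily, deep scrubs weekly.
    ceph("config", "set", "osd", "osd_scrub_min_interval", "86400")    # 1 day
    ceph("config", "set", "osd", "osd_deep_scrub_interval", "604800")  # 7 days

    # Manually request a deep scrub of a single placement group.
    ceph("pg", "deep-scrub", "2.1f")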

Data Healing

Due to Ceph's data placement design, data is healed by all of the healthy OSDs in parallel, and no spare disk is required for recovery. This makes the recovery time much shorter than that of a disk array, which has to rebuild the lost data onto a single spare disk.

When one server node fails, the cluster self-heals by applying the same data protection method.
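
As a sketch of how self-healing can be observed (the OSD ID "osd.3" is a hypothetical example), marking a failed OSD out causes Ceph to re-create its placement groups on the remaining healthy OSDs, and the recovery progress is visible in the cluster status:

    import subprocess

    def ceph(*args):
        # Run a standard ceph CLI command; assumes admin access to the cluster.
        return subprocess.run(["ceph", *args], check=True, capture_output=True, text=True).stdout

    # Mark a failed OSD "out" so its data is rebuilt on the remaining healthy OSDs.
    ceph("osd", "out", "osd.3")

    # Watch the overall health and the recovery/backfill progress.
    print(ceph("status"))
    print(ceph("health", "detail"))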

Configure the CRUSH Map and Rules

Use the UVS manager to define the data distribution and the failure domain.
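
Besides the UVS manager web UI, the resulting CRUSH hierarchy and rules can also be inspected from the command line; a minimal sketch using the standard ceph and crushtool tools (the temporary file paths are arbitrary examples):

    import subprocess

    def run(*args):
        # Run a command and return its output; assumes admin access to the cluster.
        return subprocess.run(list(args), check=True, capture_output=True, text=True).stdout

    # Show the CRUSH hierarchy (root / rack / host / osd) and the configured rules.
    print(run("ceph", "osd", "crush", "tree"))
    print(run("ceph", "osd", "crush", "rule", "ls"))

    # Export the compiled CRUSH map and decompile it into a readable text file.
    run("ceph", "osd", "getcrushmap", "-o", "/tmp/crushmap.bin")
    run("crushtool", "-d", "/tmp/crushmap.bin", "-o", "/tmp/crushmap.txt")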



