How long will Ceph take to recover from a disk failure?
The time Ceph takes to recover from a disk failure depends on the amount of data on the failed disk, the recovery settings, the cluster's resources, and the disk type. It typically ranges from a few hours to a couple of days.
The time required for recovering data from a storage device failure depends on the following conditions:
- Recovery time does not depend on the raw capacity of the failed disk; it is proportional to the amount of data that was stored on it. Ceph only needs to re-create the data that was lost, so the less data affected, the faster the repair. Unlike a RAID controller, Ceph does not rebuild the whole disk.
- Ceph heals the lost data onto the remaining healthy disks in the cluster. The more disks and hosts the cluster has, the more widely the recovery work is spread and the faster it completes.
- The recovery speed can be tuned through software parameters. Higher recovery settings shorten the recovery time, but they consume more CPU and network resources.
- CPU performance and network bandwidth also affect the recovery speed.
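- Recovery of a replicated pool is generally faster than recovery of an erasure-coded pool, because rebuilding an erasure-coded chunk requires reading several surviving chunks and decoding them.
- Administrators can also lower the recovery speed to reduce the load on the servers, at the cost of a longer recovery.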
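As an illustration of the tunable parameters mentioned above, the following minimal Python sketch builds the `ceph config set` commands for two commonly used OSD options, `osd_max_backfills` and `osd_recovery_max_active`. The helper name `recovery_throttle_commands` and the example values are assumptions made for this sketch, not recommendations. On releases that use the mClock scheduler, these options may be managed by the scheduler profile instead, so check the documentation for your Ceph version before applying them.

```python
import shlex


def recovery_throttle_commands(max_backfills, recovery_max_active):
    """Build (but do not run) ceph CLI commands that tune recovery speed.

    Hypothetical helper for illustration. Lower values slow recovery and
    free resources for client I/O; higher values do the opposite.
    """
    if max_backfills < 1 or recovery_max_active < 1:
        raise ValueError("both settings must be at least 1")
    return [
        # Concurrent backfill operations allowed per OSD.
        ["ceph", "config", "set", "osd", "osd_max_backfills", str(max_backfills)],
        # Concurrent recovery operations allowed per OSD.
        ["ceph", "config", "set", "osd", "osd_recovery_max_active",
         str(recovery_max_active)],
    ]


if __name__ == "__main__":
    # Example values only: a conservative setting that favors client I/O.
    for argv in recovery_throttle_commands(1, 1):
        print(shlex.join(argv))
```

The sketch only prints the commands so they can be reviewed first; running them requires a node with the `ceph` CLI and admin credentials.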
Reference times for recovery:
- NVMe SSD: about a few hours.
- HDD: about one to two days.
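For a rough feel of where such figures come from, recovery time can be approximated as the amount of data on the failed disk divided by the effective recovery throughput. The sketch below is a simplified model under stated assumptions: the function name `estimate_recovery_hours`, the per-OSD recovery rate, and the network cap are all hypothetical inputs, and real clusters are also affected by client load, pool type, and throttling settings.

```python
def estimate_recovery_hours(data_gb, peer_osds, per_osd_mb_s, network_mb_s):
    """Rough recovery-time estimate in hours (illustrative model only).

    data_gb       -- data stored on the failed disk, in GB (not raw capacity)
    peer_osds     -- number of healthy OSDs sharing the recovery work
    per_osd_mb_s  -- assumed sustained recovery rate per OSD, in MB/s
    network_mb_s  -- assumed network bandwidth available for recovery, in MB/s
    """
    if min(data_gb, peer_osds, per_osd_mb_s, network_mb_s) <= 0:
        raise ValueError("all inputs must be positive")
    # Recovery is spread across the healthy OSDs, so their rates add up...
    aggregate_mb_s = peer_osds * per_osd_mb_s
    # ...until the network becomes the bottleneck.
    effective_mb_s = min(aggregate_mb_s, network_mb_s)
    seconds = (data_gb * 1000) / effective_mb_s
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical inputs, not measurements from a real cluster.
    hours = estimate_recovery_hours(
        data_gb=4000, peer_osds=11, per_osd_mb_s=10, network_mb_s=1000
    )
    print(f"estimated recovery time: {hours:.1f} h")
```

The model reflects the points above: the estimate scales with the data stored rather than the disk size, and adding OSDs shortens it until CPU or network limits are reached.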