Why and How to Use the Ceph Storage Appliance as the Repositories of Veeam Backup & Replication

This case study explains how to use the Mars 400 Ceph storage appliance as a backup repository for Veeam Backup & Replication.
Ceph supports object storage, block storage, and a POSIX file system in a single cluster. Depending on their backup requirements, customers can select different storage protocols to support various backup strategies.

In this article, we use Ceph block storage (Ceph RBD) and the Ceph file system (CephFS) as backup repositories and compare their backup job durations when backing up virtual machines from Hyper-V and VMware.


Backup Solution Architecture by using Veeam and Ceph

The architecture for backing up virtual machines on VMware and Hyper-V is similar. Veeam uses data movers to transfer data from the source hosts to the backup repositories. The data movers run on the proxy server and the repository server. To use Ceph as the backend storage of a backup repository, you can mount RBD or CephFS on a Linux physical server or virtual machine and use it as the repository server.

If the proxy and repository servers are virtual machines inside the hypervisor cluster, you benefit from network-free, high-speed data transport between the VM disks, the proxy server, and the repository server. The best configuration for a large hypervisor cluster is to deploy one proxy server VM and one repository server VM on each VMware host. Alternatively, you can deploy one backup proxy VM on every VMware host and one off-host repository server to move the workload off your production VMware hosts.

There are three ways to use the Ambedded Ceph appliance as a repository for Veeam Backup & Replication. CephFS and RBD block devices can be used as on-premises backup repositories, while S3 object storage can be used as the capacity tier for a remote location.

Details on how to set up the Ceph RBD block device and the CephFS file system as Veeam backup repositories for backing up virtual machines and files can be found in the white paper at the end of this page.
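
The general shape of that setup is simple: create and mount an RBD image, or mount CephFS, on the Linux repository server, then register the mount point as a Linux backup repository in the Veeam console. The sketch below only illustrates the idea with Python wrappers around the usual Ceph commands; the pool, image, monitor address, mount points, and keyring path are hypothetical examples, and the white paper documents the exact procedure.

```python
# Minimal sketch of preparing Ceph-backed mount points on a Linux repository
# server. All names (pool "veeam-repo", image "backup01", monitor address,
# mount points, secret file) are placeholders, not values from this test.
import subprocess

def run(cmd):
    """Run a shell command and raise if it fails."""
    print("+", cmd)
    subprocess.run(cmd, shell=True, check=True)

# --- Option A: Ceph RBD block device formatted with a local file system ---
run("rbd create veeam-repo/backup01 --size 10T")          # create the RBD image
run("rbd map veeam-repo/backup01")                        # map it as a block device on this host
run("mkfs.xfs /dev/rbd/veeam-repo/backup01")              # format it (XFS used as an example)
run("mkdir -p /mnt/veeam-rbd")
run("mount /dev/rbd/veeam-repo/backup01 /mnt/veeam-rbd")  # mount; persist via /etc/fstab and rbdmap

# --- Option B: CephFS mounted with the kernel client ---
run("mkdir -p /mnt/veeam-cephfs")
run("mount -t ceph 192.168.1.101:6789:/ /mnt/veeam-cephfs "
    "-o name=admin,secretfile=/etc/ceph/admin.secret")    # monitor and secret file are placeholders
```

Either mount point can then be added in the Veeam console as a Linux backup repository.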


Figure 1: Veeam proxy and repository servers run as virtual machines inside the hypervisor cluster, with Mars 400 Ceph storage providing RBD and CephFS.
Figure 2: In a large hypervisor cluster, one proxy server VM and one repository server VM are deployed on each VMware host, and the backup data is saved to Ceph RBD or CephFS.

Testing Environment

Ceph Cluster

  • Three Mars 400 with 3x monitors, 20 OSDs, and 1x MDS (metadata server)
  • Each Ceph daemon runs on one dual-core Arm A72 microserver
  • Operating System: CentOS 7
  • Ceph software: Nautilus 14.2.9 Arm64
  • Network: 4x 10Gb network per Mars 400 

Veeam Backup & Replication 10, Version: 10.0.1.4854

Veeam Backup Server

  • CPU: Intel Xeon E5-2630 2.3GHz DUAL
  • DRAM: 64GB
  • Network: 2x 10Gb sfp+ bonding
  • Disk: 1TB for system, 256GB SATA3 SSD for volume
  • Windows Server 2019

 Veeam Proxy Server

  • Co-located with the Veeam Backup Server

 Repository Server

  • Virtual Machine

◇  CPU: 4 cores 2.3GHz

◇  DRAM: 8GB

◇  Network: bridge

◇  Disk: 50GB virtual disk

◇  OS: CentOS 7.8.2003

  • Baremetal Server

◇  CPU: Intel Xeon X5650 2.67GHz DUAL

◇  DRAM: 48GB

◇  Network: 2-port 10Gb sfp+ bonding

◇  Disk: 1TB for system

◇  OS: CentOS 7.6.1810

Hyper-V Host

◇  CPU: Intel Xeon E5-2630 2.3GHz DUAL

◇  DRAM: 64GB

◇  Network: 2-port 10Gb sfp+ bonding

◇  Disk: 1TB for system

◇  Windows Server 2019

VMware Host

◇  CPU: Intel Xeon E5-2630 2.3GHz DUAL

◇  DRAM: 64GB

◇  Network: 2-port 10Gb sfp+ bonding

◇  Disk: 1TB for system

◇  ESXi 6.5

Network: 10GbE switch

Benchmark on Various Setups

To benchmark the backup performance of various backup repositories, we set up tests with different backup repositories and three backup sources. 

The backup sources we use for the tests are a SATA SSD-based volume on a server, a Windows VM on Hyper-V, and a CentOS 7 VM and a Windows VM on VMware.

(1) Backup a Volume on an SSD Drive

Table 1. Backup a volume from a server with a SATA SSD.

Disk Size (Data Processed) | 237.9 GB
Data Read from the Source | 200.1 GB
Data Transferred to Ceph after Deduplication and Compression | 69.7 GB
Deduplication | 1.3x
Compression | 2.7x


Table 2.

Backup Repository | Duration (sec) | Source (%) | Proxy (%) | Network (%) | Target (%) | Processing Rate (MB/s) | Average Data Write Rate (MB/s)
Linux VM, RBD, replica 3 | 646 | 83 | 33 | 84 | 21 | 554 | 110
Linux VM, CephFS, replica 3 | 521 | 97 | 25 | 31 | 5 | 564 | 137
Linux VM, RBD, EC | 645 | 82 | 34 | 83 | 24 | 554 | 111
Linux VM, CephFS, EC | 536 | 97 | 26 | 27 | 4 | 564 | 133
Linux Server, RBD, EC | 526 | 97 | 21 | 16 | 3 | 561 | 136

Note: The Average Data Write Rates are calculated as Data Transferred divided by Duration. These rates represent the workload that these backup jobs place on the Ceph cluster.
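
As a quick check of the formula in the note above, the following snippet recomputes the first two rows of Table 2 from the 69.7 GB of transferred data in Table 1 (1 GB is treated as 1024 MB here, consistent with the reported rates):

```python
# Recompute the Average Data Write Rate of Table 2: data transferred / duration.
GB_TO_MB = 1024

data_transferred_gb = 69.7  # written to Ceph after deduplication and compression (Table 1)
durations_sec = {
    "Linux VM, RBD, replica 3": 646,
    "Linux VM, CephFS, replica 3": 521,
}

for repository, duration in durations_sec.items():
    rate_mb_per_s = data_transferred_gb * GB_TO_MB / duration
    print(f"{repository}: {rate_mb_per_s:.0f} MB/s")  # prints ~110 and ~137 MB/s, matching the table
```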

(2) Backup a Windows 10 VM on Hyper-V on HDD

In this benchmark, we back up a Hyper-V instance stored on a SATA hard drive. The processing rates of these jobs reach the upper limit of the HDD bandwidth. We can also see that the bottleneck is on the source, which is busy for 99% of the job duration. The workload that the Veeam backup jobs place on the Ceph cluster, the target, is light; the cluster is busy for only 1% to 6% of the working time.

Compared to the previous benchmark, the processing rate of the VM backup is much lower than that of the SSD backup. This is mainly because the VM data is stored on a hard drive.

Table 3.

Disk Size (HDD) | 127 GB
Data Read from the Source | 37.9 GB
Data Transferred to Ceph after Deduplication and Compression | 21.4 GB
Deduplication | 3.3x
Compression | 1.8x


Table 4. Backup a virtual machine image on SATA3 HDD

Backup Repository | Duration (sec) | Source (%) | Proxy (%) | Network (%) | Target (%) | Processing Rate (MB/s) | Average Data Write Rate (MB/s)
Linux VM, RBD volume, EC | 363 | 99 | 7 | 3 | 6 | 145 | 60
Linux VM, CephFS volume, EC | 377 | 99 | 7 | 2 | 1 | 142 | 58.1
Linux Server, RBD volume, EC | 375 | 99 | 6 | 2 | 2 | 140 | 58.4

Note: The Average Data Write Rates are calculated as Data Transferred divided by Duration. These rates represent the workload that these backup jobs place on the Ceph cluster.

(3) Backup Virtual Machines on ESXi on HDD

This test backs up a CentOS 7 VM and a Windows 10 VM running on an HDD of a VMware ESXi 6.5 host to a repository backed by Ceph RBD with 4+2 erasure code protection.

Table 5.

Source | CentOS VM | Windows 10 VM
Disk Size (HDD) | 40 GB | 32 GB
Data Read from the Source | 1.8 GB | 12.9 GB
Data Transferred to Ceph after Deduplication and Compression | 966 MB | 7.7 GB
Deduplication | 22.1x | 2.5x
Compression | 1.9x | 1.7x


Table 6.

Backup Source | Duration (sec) | Source (%) | Proxy (%) | Network (%) | Target (%) | Processing Rate (MB/s) | Average Data Write Rate (MB/s)
CentOS 7 | 122 | 99 | 10 | 5 | 0 | 88 | 8
Windows 10 | 244 | 99 | 11 | 5 | 1 | 93 | 32

Note: The Average Data Write Rates are calculated as Data Transferred divided by Duration. These rates represent the workload that these backup jobs place on the Ceph cluster.

Conclusions

According to the test results, Ceph RBD and CephFS deliver similar performance, which matches our experience from benchmarking RBD and CephFS. CephFS and RBD each have advantages and disadvantages. If you need to deploy multiple repository servers, you have to create an RBD image for each backup repository server, because an RBD image should only be mounted on one host at a time. Compared to CephFS, RBD is simpler to use because it does not need metadata servers. However, an RBD image is given a fixed capacity when it is created, so you have to resize it when you need more space.
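
For reference, growing an RBD-backed repository is typically a two-step operation: enlarge the image, then grow the file system on top of it. A minimal sketch, assuming a hypothetical XFS-formatted image veeam-repo/backup01 mounted at /mnt/veeam-rbd:

```python
# Hypothetical example of growing an RBD-backed Veeam repository volume.
# The pool/image name, target size, and mount point are placeholders.
import subprocess

def run(cmd):
    print("+", cmd)
    subprocess.run(cmd, shell=True, check=True)

run("rbd resize veeam-repo/backup01 --size 20T")  # enlarge the RBD image to the new capacity
run("xfs_growfs /mnt/veeam-rbd")                  # grow the mounted XFS file system to use the new space
```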

If you use CephFS as the repository, you have to deploy at least one metadata server (MDS) in the Ceph cluster, and a standby metadata server is also needed for high availability. Unlike Ceph RBD, CephFS does not require you to assign a quota to the file system, so you can treat CephFS as an unlimited storage pool.

In this use case demonstration, each backup job backs up only one VM. According to the test reports above, the average data write rate is related to the processing rate and to the deduplication and compression efficiency. A faster source disk shortens the backup job and results in a higher processing rate. Depending on their infrastructure, users can run several concurrent jobs to back up different objects simultaneously, and Ceph storage performs very well with multiple concurrent jobs.

A Ceph cluster with 20 HDD OSDs, powered by 3x Ambedded Mars 400, can offer up to 700 MB/s of aggregate write throughput to the 4+2 erasure code pool. Running multiple concurrent backup jobs reduces the overall backup duration. The maximum performance of a Ceph cluster is almost linearly proportional to the total number of disk drives in the cluster.
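
As a rough back-of-the-envelope reading of that scaling behaviour, the measured 700 MB/s over 20 HDD OSDs corresponds to roughly 35 MB/s per OSD, which can be used to estimate larger clusters (actual results depend on the network, the erasure code profile, and the drive model):

```python
# Back-of-the-envelope scaling estimate based on the ~700 MB/s measured
# on 20 HDD OSDs with a 4+2 erasure code pool; not a measured result.
measured_mb_per_s = 700
measured_osds = 20
per_osd_mb_per_s = measured_mb_per_s / measured_osds  # ~35 MB/s per HDD OSD

for osds in (20, 40, 60):
    estimate = per_osd_mb_per_s * osds
    print(f"{osds} OSDs -> ~{estimate:.0f} MB/s aggregate write (estimated)")
```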

In this use case, we did not test using S3 object storage as the backup repository. S3 object storage can be used as the capacity tier of a Veeam Scale-Out Backup Repository and as the target archive repository for NAS backup. You can easily set up a RADOS Gateway and create object storage users with the Ambedded UVS Manager, the Ceph management web GUI.

Download
Use Ceph as the repository for Veeam Backup & Replication white paper

How to set up the Ceph RBD block device and the CephFS file system as the backup repository of Veeam for backing up virtual machines and files

Related Products
Mars 400PRO Ceph storage appliance

The Mars 400 Ceph Appliance is designed to meet high-capacity, cloud-native data storage needs. It utilizes HDDs to benefit from a low cost per TB. Mars 400 provides...

