Ceph Development Weekly Vol 55 | Hammer RBD Clone Issue

December 2016 · 麦子迈

This is the fifty-fifth installment of Ceph Development Weekly, covering community development from December 19 to December 25, 2016. The author's series of Ceph technical module analyses, begun the year before last, wrapped up in the middle of this year, and no doubt many readers have been waiting for the next piece of Ceph technical analysis. Ceph has grown from almost nothing two years ago into a thriving project, yet many people remain out of touch with the community's direction and day-to-day reality, so the author decided to start this Ceph Development Weekly series. Each article summarizes the previous week's technical updates and digs into a few hot topics; when there happens to be relevant industry news, it is covered as well, and if readers have submitted questions worth discussing, a Q&A section is appended at the end.

One-Line News

“The XFS barrier and nobarrier mount options don’t actually do anything in current kernels; they have been deprecated…” – Sage Weil

The first Cephalocon will be held August 23-25, 2017 in Boston, MA.

Proxmox Virtual Environment 4.4 ships with support for the new Ceph Dashboard.

Ceph Hammer/Jewel Clone Bug

Last week a user reported a fairly serious RBD bug; this is the second time a user has confirmed the problem. It was actually fixed in mainline last month, but given the large number of Hammer users still in production, the reproduction steps are listed here to make the issue easier to understand:

* Created new cluster (tested in hammer 0.94.6 and jewel 10.2.3)
* Created two pools: test and rbd
* Created base image in pool test, created snapshot, protected it and created clone of this snapshot in pool rbd:
# rbd -p test create --size 10 --image-format 2 base
# rbd -p test snap create base@base
# rbd -p test snap protect base@base
# rbd clone test/base@base rbd/destination
* Created new user called “test” with rwx permissions to rbd pool only:
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=rbd
* Using this newly created user I removed the cloned image in the rbd pool, got errors but finally removed the image:
# rbd --id test -p rbd rm destination
2016-12-21 11:50:03.758221 7f32b7459700 -1 librbd::image::OpenRequest: failed to retreive name: (1) Operation not permitted
2016-12-21 11:50:03.758288 7f32b6c58700 -1 librbd::image::RefreshParentRequest: failed to open parent image: (1) Operation not permitted
2016-12-21 11:50:03.758312 7f32b6c58700 -1 librbd::image::RefreshRequest: failed to refresh parent image: (1) Operation not permitted
2016-12-21 11:50:03.758333 7f32b6c58700 -1 librbd::image::OpenRequest: failed to refresh image: (1) Operation not permitted
2016-12-21 11:50:03.759366 7f32b6c58700 -1 librbd::ImageState: failed to open image: (1) Operation not permitted
Removing image: 100% complete...done.

At this point there’s no cloned image but the original snapshot still has reference to it:

# rbd -p test snap unprotect base@base
2016-12-21 11:53:47.359060 7fee037fe700 -1 librbd::SnapshotUnprotectRequest: cannot unprotect: at least 1 child(ren) [29b0238e1f29] in pool 'rbd'
2016-12-21 11:53:47.359678 7fee037fe700 -1 librbd::SnapshotUnprotectRequest: encountered error: (16) Device or resource busy
2016-12-21 11:53:47.359691 7fee037fe700 -1 librbd::SnapshotUnprotectRequest: 0x7fee39ae9340 should_complete_error: ret_val=-16
2016-12-21 11:53:47.360627 7fee037fe700 -1 librbd::SnapshotUnprotectRequest: 0x7fee39ae9340 should_complete_error: ret_val=-16
rbd: unprotecting snap failed: (16) Device or resource busy

# rbd -p test children base@base
rbd: listing children failed: (2) No such file or directory
2016-12-21 11:53:08.716987 7ff2b2eaad80 -1 librbd: Error looking up name for image id 29b0238e1f29 in pool rbd
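
For reference, the restricted user above can be created with a single ceph auth call, and an admin can confirm the stale reference directly: for format 2 images the parent-to-child index lives in an object named rbd_children in the child pool, and after the failed-but-completed rm its omap should still hold an entry pointing at the deleted clone. A diagnostic sketch (inspect only; the omap keys are binary-encoded, so removing them by hand is deliberately omitted here):

# ceph auth get-or-create client.test mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd'
# rados -p rbd listomapvals rbd_children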

The root cause is a logic error in how RBD handles parent image data, which keeps it from locating the correct parent object to operate on. The reproduction above only surfaces the bug through a second, restricted user; in fact the same user is affected too: cloning from a parent and then continuing to write to the clone can also produce inconsistency, rather than the failed volume removal seen here.
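
To make the same-user scenario concrete, a minimal sketch follows (image names are illustrative; writes to the clone below the parent's size trigger copy-ups that read parent data, exercising the same lookup path, and on fixed builds the sequence simply completes normally):

# rbd -p test create --size 10 --image-format 2 base2
# rbd -p test snap create base2@snap
# rbd -p test snap protect base2@snap
# rbd clone test/base2@snap rbd/child
# rbd bench-write rbd/child --io-size 4096 --io-total 8388608 --io-pattern rand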

The relevant fix: https://github.com/ceph/ceph/pull/12446/files

Ceph Question of the Week

Q: When a PG becomes inconsistent, why does ceph pg repair <pgid> sometimes fail to fix it?

A: In Hammer there are cases that ceph pg repair cannot fix, so the PG still shows inconsistent after the repair completes. The main reason is that repair does not cover every kind of attribute mismatch between objects, and those leftover cases have to be repaired by hand; manual repair procedures can be found via Google. From Jewel onward, cases that repair cannot fix are essentially gone. In addition, starting with Jewel you can run # rados list-inconsistent-obj ${PG ID} to list all the inconsistent objects in a PG, which makes the damage much easier to understand.
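
As a quick illustration of that workflow (the PG id 2.1f below is a placeholder):

# ceph health detail | grep inconsistent
# rados list-inconsistent-obj 2.1f --format=json-pretty
# ceph pg repair 2.1f
# ceph pg deep-scrub 2.1f

The JSON output reports, per object, which replica disagrees and in what way (size, data digest, omap digest, or attributes), which makes it much easier to judge whether repair will pick the right authoritative copy before running it.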