BTRFS: The Linux B-tree Filesystem

Copyright © (2013) by Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.

BTRFS is a Linux filesystem, headed towards mainline default status. It is based on copy-on-write, allowing for efficient snapshots and clones. It uses B-trees as its main on-disk data-structure. The design goal is to work well for many use cases and workloads. To this end, much effort has been directed to maintaining even performance as the filesystem ages, rather than trying to support a particular narrow benchmark use case.

A Linux filesystem is installed on smartphones as well as enterprise servers. This entails challenges on many different fronts.

  • Scalability: The filesystem must scale in many dimensions: disk space, memory, and CPUs.
  • Data integrity: Losing data is not an option, and much effort is expended to safeguard the content. This includes checksums, metadata duplication, and RAID support built into the filesystem.
  • Disk diversity: The system should work well with SSDs and hard disks. It is also expected to be able to use an array of different sized disks, which poses challenges to the RAID and striping mechanisms.

This paper describes the core ideas, data-structures, and algorithms of this filesystem. It sheds light on the challenges posed by defragmentation in the presence of snapshots, and the tradeoffs required to maintain even performance in the face of a wide spectrum of workloads.

By: Ohad Rodeh, Josef Bacik, Chris Mason

Published in: ACM Transactions on Storage, volume 9, no. 3, 2013, DOI 10.1145/2501620.2501623

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

rj10501.pdf
