btrfs-progs: docs: add more chapters (part 2)
The feature pages share the contents with the manual page section 5 so
put the contents to separate files. Progress: 2/3.

Signed-off-by: David Sterba <[email protected]>
kdave committed Dec 17, 2021
1 parent b871bf4 commit c6be848
Showing 19 changed files with 772 additions and 332 deletions.
6 changes: 5 additions & 1 deletion Documentation/Auto-repair.rst
@@ -1,4 +1,8 @@
Auto-repair on read
===================

...
Data or metadata that are found to be damaged (e.g. because the checksum does
not match) at the time they're read from the device can be salvaged if the
filesystem has another valid copy, which requires a block group profile with
redundancy (DUP, RAID1, RAID5/6). The correct data are returned to the user
application and the damaged copy is overwritten with the correct data.
2 changes: 1 addition & 1 deletion Documentation/Convert.rst
@@ -1,4 +1,4 @@
Convert
=======

...
.. include:: ch-convert-intro.rst
42 changes: 41 additions & 1 deletion Documentation/Deduplication.rst
@@ -1,4 +1,44 @@
Deduplication
=============

...
In the context of filesystems, deduplication is the process of looking up
identical data blocks that are tracked separately and creating a shared logical
link while removing one of the copies of the data blocks. This saves data space
but increases metadata consumption.

There are two main deduplication types:

* **in-band** *(sometimes also called on-line)* -- all newly written data are
  considered for deduplication before writing
* **out-of-band** *(sometimes also called offline)* -- data for deduplication
  have to be actively looked for and deduplicated by the user application

Both have their pros and cons. BTRFS implements **only the out-of-band** type.

BTRFS provides the basic building blocks for deduplication allowing other tools
to choose the strategy and scope of the deduplication. There are multiple
tools that take different approaches to deduplication, offer additional
features or make trade-offs. The following table lists tools that are known to
be up-to-date, maintained and widely used.

.. list-table::
   :header-rows: 1

   * - Name
     - File based
     - Block based
     - Incremental
   * - `BEES <https://github.com/Zygo/bees>`_
     - No
     - Yes
     - Yes
   * - `duperemove <https://github.com/markfasheh/duperemove>`_
     - Yes
     - No
     - Yes

Legend:

- *File based*: the tool takes a list of files and deduplicates blocks only from that set
- *Block based*: the tool enumerates blocks and looks for duplicates
- *Incremental*: repeated runs of the tool utilize information gathered from previous runs
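
As a rough illustration of the file-based, incremental approach from the table,
duperemove could be run over a directory tree with a persistent hash file, so
that later runs can reuse the checksums gathered earlier. The helper name
``dedupe_tree``, the ``DUPEREMOVE`` override variable and the paths are
assumptions made for this sketch; they are not part of the tool.

.. code-block:: shell

   # Hypothetical helper: deduplicate everything below directory $1, keeping
   # block checksums in hash file $2 so that repeated runs are incremental.
   # Set DUPEREMOVE="echo duperemove" to only print the command.
   dedupe_tree() {
       # -d submits duplicates for deduplication, -r recurses into directories
       ${DUPEREMOVE:-duperemove} -d -r --hashfile="$2" "$1"
   }

   # Example (paths are placeholders):
   #   dedupe_tree /mnt/data /var/tmp/data.hash

Without ``-d`` duperemove only reports the duplicates it finds, which may be a
safer first run on an unfamiliar data set.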
20 changes: 19 additions & 1 deletion Documentation/Defragmentation.rst
@@ -1,4 +1,22 @@
Defragmentation
===============

...
Defragmentation of files is supposed to make the layout of the file extents
more linear, or at least to coalesce the file extents into larger ones that can
be stored on the device more efficiently. The need for defragmentation is
inherent in the COW design that BTRFS is built on. Fragmentation is caused by
in-place rewrites of the same file data, which have to be handled by creating a
new copy that may lie at a distant location on the physical device.
Fragmentation is most harmful on rotational hard disks due to the delay caused
by moving the drive heads to the distant location. On modern seek-less devices
it's not a problem, though defragmentation may still make sense because it
reduces the size of the metadata needed to track the scattered extents.

File data that are in use can be safely defragmented because the whole process
happens inside the page cache, which is the central point for caching file data
and takes care of synchronization. Once a filesystem sync or flush is started
(either manually or automatically), all the dirty data get written to the
devices. This, however, reduces the chances of finding an optimal layout, as the
writes happen together with other data and the result depends on the remaining
free space layout and fragmentation.
16 changes: 14 additions & 2 deletions Documentation/Flexibility.rst
@@ -1,6 +1,18 @@
Flexibility
===========

* dynamic inode creation (no preallocated space)
The underlying design of BTRFS data structures allows a lot of flexibility and
making changes after filesystem creation, like resizing, adding/removing space
or enabling some features on-the-fly.

* block group profile change on-the-fly
* **dynamic inode creation** -- there's no fixed space or tables for tracking
  inodes so the number of inodes that can be created is bounded by the metadata
  space and its utilization

* **block group profile change on-the-fly** -- the block group profiles can be
  changed on a mounted filesystem by running the balance operation and
  specifying the conversion filters

* **resize** -- the space occupied by the filesystem on each device can be
  resized up (grow) or down (shrink) as long as the amount of data can still be
  contained on the device
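
As a rough sketch of the profile change described above, the balance operation
with conversion filters might be wrapped as follows. The helper name
``convert_profile``, the ``BTRFS`` override variable and the mount point are
assumptions made for illustration. The target profile must be one the
filesystem can satisfy with its devices (e.g. *raid1* needs at least two).

.. code-block:: shell

   # Hypothetical helper: convert data and metadata block groups of the
   # filesystem mounted at $1 to profile $2.
   # Set BTRFS="echo btrfs" to only print the command.
   convert_profile() {
       ${BTRFS:-btrfs} balance start -dconvert="$2" -mconvert="$2" "$1"
   }

   # Example (mount point is a placeholder):
   #   convert_profile /mnt raid1

The conversion rewrites all affected block groups, so it can take a long time
on a large filesystem.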
2 changes: 1 addition & 1 deletion Documentation/Qgroups.rst
@@ -1,4 +1,4 @@
Quota groups
============

...
.. include:: ch-quota-intro.rst
27 changes: 26 additions & 1 deletion Documentation/Reflink.rst
@@ -1,4 +1,29 @@
Reflink
=======

...
Reflink is a type of shallow copy of file data that shares the blocks, but
otherwise the files are independent and a change to one file does not affect
the other. This builds on the underlying COW mechanism. A reflink effectively
creates only separate metadata pointing to the shared blocks, which is
typically much faster than a deep copy of all blocks.

The reflink is typically meant for whole files but a partial file range can
also be copied, though there are no ready-made tools for that.

.. code-block:: shell

   cp --reflink=always source target

There are some constraints:

- cross-filesystem reflink is not possible, the two filesystems have nothing in
  common so the block sharing can't work
- reflink crossing two mount points of the same filesystem does not work due
  to an artificial limitation in VFS (this may change in the future)
- reflink requires source and target file that have the same status regarding
  NOCOW and checksums, for example if the source file is NOCOW (once created
  with the chattr +C attribute) then the above command won't work unless the
  target file is pre-created with the +C attribute as well, or the NOCOW
  attribute is inherited from the parent directory (chattr +C on the directory)
  or if the whole filesystem is mounted with *-o nodatacow* that would create
  the NOCOW files by default
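
The independence of a reflinked copy from its source can be observed with a
small experiment. This is only a sketch: it assumes GNU coreutils ``cp``, uses
a scratch directory with made-up file names, and passes ``--reflink=auto`` so
that it falls back to a regular copy on filesystems without reflink support.

.. code-block:: shell

   # Work in a scratch directory; all file names here are hypothetical.
   workdir=$(mktemp -d)
   printf 'original\n' > "$workdir/source"

   # Share the blocks where the filesystem supports it, copy them otherwise.
   cp --reflink=auto "$workdir/source" "$workdir/target"

   # Modify only the copy; the source must stay untouched.
   printf 'changed\n' >> "$workdir/target"

   source_lines=$(wc -l < "$workdir/source")
   target_lines=$(wc -l < "$workdir/target")
   echo "source: $source_lines line(s), target: $target_lines line(s)"

On BTRFS the appended data go to newly allocated blocks, while the unchanged
blocks remain shared between the two files.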
10 changes: 9 additions & 1 deletion Documentation/Resize.rst
@@ -1,4 +1,12 @@
Resize
======

...
A mounted BTRFS filesystem can be resized after creation, grown or shrunk. On a
multi-device filesystem the space occupied on each device can be resized
independently. Data that reside in the area that would be outside the new size
are relocated to the remaining space below the limit, so this constrains the
minimum size to which a filesystem can be shrunk.

Growing a filesystem is quick as it only needs to take note of the available
space, while shrinking a filesystem needs to relocate potentially lots of data
and this is IO intensive. It is possible to shrink a filesystem in smaller steps.
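
Shrinking in smaller steps could look roughly like the following sketch, which
repeats a relative resize a given number of times. The helper name
``shrink_in_steps``, the ``BTRFS`` override variable, the mount point and the
step size are assumptions made for illustration.

.. code-block:: shell

   # Hypothetical helper: shrink the filesystem mounted at $1 by $2, $3 times.
   # Set BTRFS="echo btrfs" to only print the commands.
   shrink_in_steps() {
       mnt=$1 step=$2 count=$3
       i=0
       while [ "$i" -lt "$count" ]; do
           # A leading minus makes the size relative to the current one.
           ${BTRFS:-btrfs} filesystem resize "-$step" "$mnt" || return 1
           i=$((i + 1))
       done
   }

   # Example (mount point is a placeholder): shrink by 1GiB, twice
   #   shrink_in_steps /mnt 1G 2
   # Growing device 1 back to its full size is a single quick operation:
   #   btrfs filesystem resize 1:max /mnt

Stopping at the first failed step leaves the filesystem at the last size that
was reached successfully.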
2 changes: 1 addition & 1 deletion Documentation/Scrub.rst
@@ -1,4 +1,4 @@
Scrub
=====

...
.. include:: ch-scrub-intro.rst
25 changes: 22 additions & 3 deletions Documentation/Send-receive.rst
@@ -1,4 +1,23 @@
Balance
=======
Send/receive
============

...
Send and receive are complementary features that allow transferring data from
one filesystem to another in a streamable format. The send part traverses a
given read-only subvolume and either creates a full stream representation of
its data and metadata (*full mode*), or given a set of subvolumes for reference
it generates a difference relative to that set (*incremental mode*).

Receive, on the other hand, takes the stream and reconstructs a subvolume with
files and directories equivalent to the filesystem that was used to produce the
stream. The result is not exactly 1:1, e.g. inode numbers and other unique
identifiers (like the subvolume UUIDs) can be different. The full mode starts
with an empty subvolume, creates all the files and then turns the subvolume
read-only. At this point it can be used as a starting point for a future
incremental send stream, provided it is generated from the same source
subvolume on the other filesystem.

The stream is a sequence of encoded commands that change e.g. file metadata
(owner, permissions, extended attributes), data extents (create, clone,
truncate) or whole files (rename, delete). The stream can be sent over the
network, piped directly to the receive command or saved to a file. Each command
in the stream is protected by a CRC32C checksum.
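
A minimal sketch of the full and incremental modes, using a stream saved to a
file, might look like this. The helper names ``save_stream`` and
``apply_stream``, the ``BTRFS`` override variable and all paths are assumptions
made for illustration; the snapshots passed in must be read-only.

.. code-block:: shell

   # Hypothetical helper: write the send stream of read-only snapshot $1 to
   # file $2. An optional parent snapshot $3 selects the incremental mode.
   # Set BTRFS="echo btrfs" to only print the commands.
   save_stream() {
       if [ -n "${3:-}" ]; then
           ${BTRFS:-btrfs} send -p "$3" -f "$2" "$1"
       else
           ${BTRFS:-btrfs} send -f "$2" "$1"
       fi
   }

   # Hypothetical helper: replay stream file $1 below directory $2, which
   # must be on a BTRFS filesystem.
   apply_stream() {
       ${BTRFS:-btrfs} receive -f "$1" "$2"
   }

   # Example (paths are placeholders):
   #   save_stream /mnt/src/snap1 /tmp/full.stream
   #   apply_stream /tmp/full.stream /mnt/dst
   #   save_stream /mnt/src/snap2 /tmp/incr.stream /mnt/src/snap1
   #   apply_stream /tmp/incr.stream /mnt/dst

The incremental stream can only be applied if the receiving side already holds
the subvolume received from the parent snapshot.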
97 changes: 1 addition & 96 deletions Documentation/btrfs-convert.rst
@@ -9,102 +9,7 @@ SYNOPSIS
DESCRIPTION
-----------

**btrfs-convert** is used to convert existing source filesystem image to a btrfs
filesystem in-place. The original filesystem image is accessible in subvolume
named like *ext2_saved* as file *image*.

Supported filesystems:

* ext2, ext3, ext4 -- original feature, always built in

* reiserfs -- since version 4.13, optionally built, requires libreiserfscore 3.6.27

* ntfs -- external tool https://github.com/maharmstone/ntfs2btrfs

The source filesystems supported by a given binary are listed at the end of the
help output (option *--help*).

.. warning::
If you are going to perform rollback to the original filesystem, you
should not execute **btrfs balance** command on the converted filesystem. This
will change the extent layout and make **btrfs-convert** unable to rollback.

The conversion utilizes free space of the original filesystem. The exact
estimate of the required space cannot be foretold. The final btrfs metadata
might occupy several gigabytes on a hundreds-gigabyte filesystem.

If the ability to rollback is no longer important, then it is recommended to
perform a few more steps to transition the btrfs filesystem to a more compact
layout. This is because the conversion inherits the original data blocks'
fragmentation, and also because the metadata blocks are bound to the original
free space layout.

Due to different constraints, it is only possible to convert filesystems that
have a supported data block size (ie. the same that would be valid for
**mkfs.btrfs**). This is typically the system page size (4KiB on x86_64
machines).

**BEFORE YOU START**

The source filesystem must be clean, e.g. no journal to replay or no repairs
needed. The respective **fsck** utility must be run on the source filesystem prior
to conversion. Please refer to the manual pages in case you encounter problems.

For ext2/3/4:

.. code-block:: bash

   # e2fsck -fvy /dev/sdx

For reiserfs:

.. code-block:: bash

   # reiserfsck -fy /dev/sdx

Skipping that step could lead to incorrect results on the target filesystem,
but it may work.

**REMOVE THE ORIGINAL FILESYSTEM METADATA**

By removing the subvolume named like *ext2_saved* or *reiserfs_saved*, all
metadata of the original filesystem will be removed:

.. code-block:: bash

   # btrfs subvolume delete /mnt/ext2_saved

At this point it is not possible to do a rollback. The filesystem is usable but
may be impacted by the fragmentation inherited from the original filesystem.

**MAKE FILE DATA MORE CONTIGUOUS**

An optional but recommended step is to run defragmentation on the entire
filesystem. This will attempt to make file extents more contiguous.

.. code-block:: bash

   # btrfs filesystem defrag -v -r -f -t 32M /mnt/btrfs

Verbose recursive defragmentation (*-v*, *-r*), flush data per-file (*-f*) with
target extent size 32MiB (*-t*).

**ATTEMPT TO MAKE BTRFS METADATA MORE COMPACT**

Optional but recommended step.

The metadata block groups after conversion may be smaller than the default size
(256MiB or 1GiB). Running a balance will attempt to merge the block groups.
This depends on the free space layout (and fragmentation) and may fail due to
lack of enough work space. This is a soft error leaving the filesystem usable
but the block group layout may remain unchanged.

Note that balance operation takes a lot of time, please see also
``btrfs-balance(8)``.

.. code-block:: bash

   # btrfs balance start -m /mnt/btrfs

.. include:: ch-convert-intro.rst

OPTIONS
-------