btrfs-progs: docs: add more chapters (part 3)
All main pages have some content and many typos have been fixed.

Signed-off-by: David Sterba <[email protected]>
Showing 26 changed files with 561 additions and 417 deletions.

Balance
=======

.. include:: ch-balance-intro.rst

Filters
-------

.. include:: ch-balance-filters.rst

Common Linux features
=====================

The Linux operating system implements the POSIX standard interfaces and API,
with additional interfaces. Many of them have become common in other
filesystems. The ones listed below have been added relatively recently and are
considered interesting for users:

birth/origin inode time
        a timestamp associated with an inode recording when it was created; it
        cannot be changed and requires the *statx* syscall to be read

statx
        an extended version of the *stat* syscall that provides an extensible
        interface to read more information than is available from the original
        *stat*

fallocate modes
        the *fallocate* syscall allows manipulating file extents, like punching
        holes, preallocating or zeroing a range

FIEMAP
        an ioctl that enumerates file extents; the related tool is ``filefrag``

filesystem label
        another means of filesystem identification that can be used for
        mounting or for easier recognition; it can be read or set by an ioctl
        or by the command ``btrfs filesystem label``

O_TMPFILE
        a mode of the *open* syscall that creates a file with no associated
        directory entry, which makes it invisible to other processes and thus
        safe to use as a temporary file (https://lwn.net/Articles/619146/)

xattr, acl
        extended attributes (xattr) are a list of *key=value* pairs associated
        with a file, usually storing additional metadata related to security,
        in particular access control lists (ACL), or properties
        (``btrfs property``)

- XFLAGS, fileattr

- cross-rename
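
As a minimal sketch of reading the birth time described above, the following C
function calls *statx* asking for ``STATX_BTIME`` and then checks ``stx_mask``
to see whether the filesystem actually reported it. It assumes glibc 2.28 or
newer, which provides the ``statx()`` wrapper; the function name
``print_birth_time`` is hypothetical and not part of btrfs-progs.

.. code-block:: c

    /* Sketch: print the birth time of a file using statx(2).
     * Assumes glibc >= 2.28 for the statx() wrapper. */
    #define _GNU_SOURCE
    #include <fcntl.h>      /* AT_FDCWD, AT_SYMLINK_NOFOLLOW */
    #include <stdio.h>
    #include <sys/stat.h>   /* statx(), struct statx, STATX_BTIME */

    /* Hypothetical helper: returns 0 if the birth time was printed, 1 if the
     * filesystem did not report it, -1 if statx() itself failed. */
    int print_birth_time(const char *path)
    {
        struct statx stx;

        /* Request only the birth time; stx_mask tells what was filled in. */
        if (statx(AT_FDCWD, path, AT_SYMLINK_NOFOLLOW, STATX_BTIME, &stx) != 0) {
            perror("statx");
            return -1;
        }
        if (!(stx.stx_mask & STATX_BTIME)) {
            printf("%s: birth time not available\n", path);
            return 1;
        }
        /* Seconds since the epoch, followed by the nanosecond part. */
        printf("%s: created at %lld.%09u\n", path,
               (long long)stx.stx_btime.tv_sec,
               (unsigned)stx.stx_btime.tv_nsec);
        return 0;
    }

Checking ``stx_mask`` rather than assuming the field is valid matters because
the same call can be made on any filesystem, and those that do not record the
creation time simply leave ``STATX_BTIME`` unset.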

Custom ioctls
=============

Filesystems are usually extended by custom ioctls beyond the standard system
call interface to let user applications access the advanced features. They're
low level, and the following list gives only an overview of the capabilities,
along with the related command where one is available:

- reverse lookup, from a logical address to the inodes that reference it,
  ``btrfs inspect-internal logical-resolve``

- resolve an inode number to the list of its names (paths),
  ``btrfs inspect-internal inode-resolve``

- tree search, given a key range and tree id, look up and return all b-tree
  items found in that range; basically all metadata at hand, but you need to
  know what to do with it

- informative, about devices, space allocation or the whole filesystem, many of
  which are also exported in ``/sys/fs/btrfs``

- query/set a subset of features on a mounted filesystem
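
The ioctls are declared in the kernel UAPI header ``linux/btrfs.h``. As a sketch
of the informative kind, the C function below issues ``BTRFS_IOC_FS_INFO`` on
any file or directory of a mounted filesystem and prints a few fields of
``struct btrfs_ioctl_fs_info_args``. The function name ``print_btrfs_fs_info`` is
hypothetical; on a filesystem that is not btrfs the ioctl just fails and the
function returns -1.

.. code-block:: c

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/btrfs.h>  /* BTRFS_IOC_FS_INFO, struct btrfs_ioctl_fs_info_args */

    /* Hypothetical helper: print basic information about the btrfs filesystem
     * containing 'path'. Returns 0 on success, -1 on error. */
    int print_btrfs_fs_info(const char *path)
    {
        /* Zeroed on purpose: newer kernels read 'flags' as an input. */
        struct btrfs_ioctl_fs_info_args args = { 0 };
        int fd = open(path, O_RDONLY);

        if (fd < 0) {
            perror("open");
            return -1;
        }
        /* Any file or directory on the filesystem can be used. */
        if (ioctl(fd, BTRFS_IOC_FS_INFO, &args) < 0) {
            perror("BTRFS_IOC_FS_INFO");
            close(fd);
            return -1;
        }
        close(fd);

        printf("devices:    %llu (highest devid %llu)\n",
               (unsigned long long)args.num_devices,
               (unsigned long long)args.max_id);
        printf("nodesize:   %u\n", args.nodesize);
        printf("sectorsize: %u\n", args.sectorsize);
        printf("fsid:       ");
        for (int i = 0; i < BTRFS_FSID_SIZE; i++)
            printf("%02x", args.fsid[i]);
        printf("\n");
        return 0;
    }

On newer kernels the ``flags`` member selects optional output, such as the
checksum type, so leaving it zero requests only the basic fields used here.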

Subvolumes
==========

.. include:: ch-subvolume-intro.rst

Tree checker
============

Metadata blocks that have just been read from devices or are about to be
written are verified and sanity checked by the so-called **tree checker**. The
b-tree nodes contain several items describing the filesystem structure, which
to some degree can be verified for consistency or validity. This is an
additional check to the checksums, which only verify the overall block status,
while the tree checker tries to validate and cross-reference the logical
structure. This takes a slight performance hit, comparable to calculating the
checksum, and has no noticeable impact while it catches all sorts of errors.

There are two occasions when the checks are done:

Pre-write checks
----------------

When metadata blocks in memory are about to be written to permanent storage,
the checks are performed before the checksums are calculated. This can catch
random corruptions of the blocks (or pages) caused by bugs, by other parts of
the system or by hardware errors (namely faulty RAM).

Once a block does not pass the checks, the filesystem refuses to write more
data and switches to read-only mode to prevent further damage. At this point
some of the recent metadata updates are held *only* in memory, so it's best not
to panic, try to remember which files could be affected and copy them
elsewhere. Once the filesystem gets unmounted, the most recent changes are
unfortunately lost. The filesystem stored on the device is still consistent and
should mount fine.

Post-read checks
----------------

Metadata blocks get verified right after they're read from devices and the
checksum is found to be valid. This protects against changes to the metadata
that also update the checksum; these are unlikely to happen accidentally and
are rather caused by intentional corruption or fuzzing.
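
The difference between the two occasions is mostly the order of the steps. The
sketch below illustrates it with a made-up ``struct toy_block`` (not the btrfs
on-disk format), a CRC-32C checksum (the default btrfs checksum algorithm) and
a single hypothetical structural rule, an item count limit, standing in for the
tree checker. On write, the structure is checked before the checksum is
computed; on read, the checksum is verified first and the structure afterwards,
which also catches a block that was modified and then re-checksummed.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Bitwise CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78). */
    uint32_t crc32c(const uint8_t *buf, size_t len)
    {
        uint32_t crc = 0xFFFFFFFFu;

        for (size_t i = 0; i < len; i++) {
            crc ^= buf[i];
            for (int k = 0; k < 8; k++)
                crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
        return crc ^ 0xFFFFFFFFu;
    }

    /* Hypothetical toy block, NOT the btrfs on-disk format: the checksum
     * covers everything that follows the csum field. */
    #define TOY_MAX_ITEMS 8

    struct toy_block {
        uint32_t csum;
        uint32_t nritems;        /* must not exceed TOY_MAX_ITEMS */
        uint8_t  payload[56];
    };

    enum { TOY_OK = 0, TOY_BAD_CSUM = -1, TOY_BAD_STRUCTURE = -2 };

    /* Stand-in for the tree checker: a single structural rule. */
    int toy_tree_check(const struct toy_block *b)
    {
        return b->nritems <= TOY_MAX_ITEMS ? TOY_OK : TOY_BAD_STRUCTURE;
    }

    uint32_t toy_csum(const struct toy_block *b)
    {
        return crc32c((const uint8_t *)b + sizeof(b->csum),
                      sizeof(*b) - sizeof(b->csum));
    }

    /* Write path: check the structure first, checksum only a block that passed. */
    int toy_prepare_write(struct toy_block *b)
    {
        int ret = toy_tree_check(b);

        if (ret != TOY_OK)
            return ret;          /* refuse to write this block */
        b->csum = toy_csum(b);
        return TOY_OK;
    }

    /* Read path: verify the checksum first, then the structure. */
    int toy_verify_read(const struct toy_block *b)
    {
        if (b->csum != toy_csum(b))
            return TOY_BAD_CSUM;
        return toy_tree_check(b);
    }

A block rejected by ``toy_prepare_write()`` never receives a valid checksum and
is not written; the switch to read-only mode described above is not modelled
in this sketch.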

The checks
----------

As implemented right now, the metadata consistency check is limited to one
b-tree node and the items stored there, i.e. there's no extensive or broad check
done, e.g. against other data structures in other b-tree nodes. This still
provides enough opportunities to verify the consistency of individual items,
besides verifying the general validity of the items like the length or offset.
The b-tree items are also coupled with a key, so proper key ordering is also
part of the check and can reveal random bitflips in the sequence (this has been
the most successful detector of faulty RAM).
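
The key ordering part of the check can be sketched in a few lines. The example
assumes keys made of three fields, *objectid*, *type* and *offset*, compared in
that order, and reports the first slot whose key is not strictly greater than
the previous one. The names ``struct key``, ``key_cmp`` and ``check_key_order``
are hypothetical, and the real tree checker verifies much more than this.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical in-memory key with the (objectid, type, offset) fields. */
    struct key {
        uint64_t objectid;
        uint8_t  type;
        uint64_t offset;
    };

    /* Compare field by field: objectid first, then type, then offset. */
    int key_cmp(const struct key *a, const struct key *b)
    {
        if (a->objectid != b->objectid)
            return a->objectid < b->objectid ? -1 : 1;
        if (a->type != b->type)
            return a->type < b->type ? -1 : 1;
        if (a->offset != b->offset)
            return a->offset < b->offset ? -1 : 1;
        return 0;
    }

    /* Return the first slot whose key is not strictly greater than the key in
     * the previous slot, or -1 if the whole sequence is properly ordered. */
    long check_key_order(const struct key *keys, size_t nr)
    {
        for (size_t i = 1; i < nr; i++) {
            if (key_cmp(&keys[i - 1], &keys[i]) >= 0) {
                fprintf(stderr, "bad key order: slot %zu\n", i);
                return (long)i;
            }
        }
        return -1;
    }

For example, in the sequence (256, 1, 0), (256, 12, 256), (257, 1, 0), a single
flipped high bit in the *objectid* of the middle key makes it larger than its
successor, and the function returns slot 2.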

The capabilities of the tree checker have been improved over time and it's
possible that a filesystem created on an older kernel may trigger warnings or
fail some checks on a newer one.

Trim/discard
============

Trim or discard is an operation on a storage device based on flash technology
(SSD, NVMe or similar) or on a thin-provisioned device, and it can also be
emulated on top of other block device types. On real hardware, the memory cells
have a limited lifespan and the device firmware usually tries to optimize for
that. The trim operation issued by the user provides hints about which data are
unused and allows the memory cells to be reclaimed. On thin-provisioned or
emulated devices, this could simply free the space.

There are three main uses of trim that BTRFS supports:

synchronous
        enabled by mounting the filesystem with ``-o discard`` or
        ``-o discard=sync``; the trim is done right after the file extents get
        freed, which however could cause a severe performance hit and is not
        recommended, as the ranges to be trimmed could be too fragmented

asynchronous
        enabled by mounting the filesystem with ``-o discard=async``, an
        improved version of the synchronous trim where the freed file extents
        are first tracked in memory and the trim is started after a period of
        time or once enough ranges accumulate; the ranges are then expected to
        be much larger, and the number of IO requests can be throttled so that
        it does not interfere with the rest of the filesystem activity

manually by fstrim
        the tool ``fstrim`` starts a trim operation on the whole filesystem; no
        mount options need to be specified, as it's up to the filesystem to
        traverse the free space and start the trim; this is suitable for
        running it as a periodic service
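
The ``fstrim`` tool relies on the generic ``FITRIM`` ioctl from the kernel UAPI
header ``linux/fs.h``. A minimal sketch of the same request in C, assuming the
caller has the ``CAP_SYS_ADMIN`` capability, could look like this; the function
name ``trim_free_space`` is hypothetical.

.. code-block:: c

    #include <fcntl.h>
    #include <limits.h>      /* ULLONG_MAX */
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/fs.h>    /* FITRIM, struct fstrim_range */

    /* Hypothetical helper: ask the filesystem containing 'path' to trim all of
     * its free space. Returns 0 on success, -1 on error. */
    int trim_free_space(const char *path, unsigned long long minlen)
    {
        struct fstrim_range range = {
            .start  = 0,
            .len    = ULLONG_MAX,   /* the whole filesystem */
            .minlen = minlen,       /* skip free ranges shorter than this (bytes) */
        };
        int fd = open(path, O_RDONLY);

        if (fd < 0) {
            perror("open");
            return -1;
        }
        if (ioctl(fd, FITRIM, &range) < 0) {
            perror("FITRIM");
            close(fd);
            return -1;
        }
        close(fd);
        /* The kernel updates range.len to the number of bytes trimmed. */
        printf("%s: %llu bytes trimmed\n", path, (unsigned long long)range.len);
        return 0;
    }

A non-zero *minlen* corresponds to the ``--minimum`` option of ``fstrim`` and
lets the filesystem skip small free ranges that are not worth trimming.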

The trim is considered only a hint to the device: it could ignore it
completely, start it only on ranges meeting some criteria, or decide not to do
it because of other factors affecting the memory cells. The device itself could
internally relocate the data, however this leads to an unexpected performance
drop. Running trim periodically could prevent that too.

When a filesystem is created by ``mkfs.btrfs`` on devices capable of trim, the
trim is by default performed on all devices.

Volume management
=================

.. include:: ch-volume-management-intro.rst

Zoned mode
==========

.. include:: ch-zoned-intro.rst