Altran.com

The history and future of Linux filesystems

Published 2010-07-02

Filesystems have always been a very important concept in the world of Linux. One of the first decisions a new user faces is how to partition the hard drive and which filesystem to use. For a very long time the Second Extended File System (ext2) was the de facto standard for most Linux installations, but this started to change when the Third Extended Filesystem (ext3) was merged into the mainline kernel in November 2001.

The ext3 filesystem, which is now the dominant filesystem, is built on top of ext2. It adds a few features to its predecessor, the most noteworthy being journaling. A journaling filesystem records changes in a journal before committing them to their final location on disk, which makes it more resistant to crashes. Even though ext3 is the default filesystem in most major Linux distributions, it has some major competitors, including JFS, ReiserFS and XFS.
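The journaling idea can be illustrated with a minimal sketch. This is a toy key-value store, not how ext3 lays out or replays its journal; the class name `JournaledStore`, the JSON record format and the in-memory `data` dictionary are all assumptions made for illustration.

```python
import json
import os


class JournaledStore:
    """Toy key-value store that logs each change before applying it."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.data = {}  # stands in for the main on-disk structures
        self.replay()

    def replay(self):
        """Re-apply every complete journal record, as done after a crash."""
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path) as journal:
            for line in journal:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn record from an interrupted write: stop here
                self.data[record["key"]] = record["value"]

    def write(self, key, value):
        # Step 1: record the intended change and force it to stable storage.
        with open(self.journal_path, "a") as journal:
            journal.write(json.dumps({"key": key, "value": value}) + "\n")
            journal.flush()
            os.fsync(journal.fileno())
        # Step 2: only now apply the change to the main structure.
        self.data[key] = value
```

If the process dies between step 1 and step 2, creating a new `JournaledStore` on the same journal file replays the logged change, and a half-written record at the end of the journal is simply ignored.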

While ext3 is still the most common filesystem, it has been superseded by the Fourth Extended Filesystem (ext4). On 11 October 2008 the ext4 filesystem was marked as stable in the Linux kernel, but it has only recently started to gain momentum. It is now presented as an alternative during installation of most major Linux distributions, although ext3 is still the default choice in many cases. Ext4 originally started as a set of extensions to ext3, but was later forked into a separate project to avoid affecting the stability of ext3.

It is possible to upgrade an existing ext3 (or ext2) filesystem to ext4, but doing so has some side effects. Ext4 adds new features which are not compatible with ext3, so if those features are used it will not be possible to easily convert back to ext3. However, since ext4 has a number of advantages over ext3, there should normally be little need to revert the change.

To further complicate the choice of filesystem, another major filesystem called Btrfs, short for B-tree File System, is in development. Btrfs implements an impressive array of features, such as snapshots, compression and online defragmentation, and is expected to replace both ext3 and ext4 as the standard Linux filesystem. As of June 2010, however, it is not ready for production use.

What about embedded systems?

All of these filesystems are primarily designed for normal hard drives, and have limited uses in the embedded world. The reason for this is that most embedded systems still use Memory Technology Devices (MTD), such as raw NAND or NOR flash chips, for storage. An MTD is different from a hard drive in several ways. The writable unit of a hard drive is the sector, which is small (typically 512 bytes, or 4096 bytes on newer drives) and can be written directly. An MTD, on the other hand, has its data grouped in larger erase blocks (typically 128 KiB), which need to be erased before writing. This means that the filesystems above cannot be used on MTDs directly, since they do not make use of the erase operation.
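The erase-before-write constraint can be sketched with a small simulation. This is a simplified model, assuming that erased flash reads as all ones and that programming can only clear bits; the class `EraseBlock` and its methods are hypothetical names, not part of the Linux MTD API.

```python
ERASE_BLOCK_SIZE = 128 * 1024  # 128 KiB, the typical size quoted above


class EraseBlock:
    """Toy model of a single flash erase block."""

    def __init__(self, size=ERASE_BLOCK_SIZE):
        self.cells = bytearray(b"\xff" * size)  # erased flash reads as 0xFF
        self.erase_count = 0

    def erase(self):
        """Reset the whole block to 0xFF, the only way to turn 0 bits back into 1."""
        self.cells[:] = b"\xff" * len(self.cells)
        self.erase_count += 1

    def program(self, offset, data):
        """Write data at offset; only 1 -> 0 bit transitions are allowed."""
        # First pass: verify that no bit would have to go from 0 back to 1.
        for i, byte in enumerate(data):
            if (self.cells[offset + i] & byte) != byte:
                raise ValueError("block must be erased before this write")
        # Second pass: the write is legal, so store the data.
        for i, byte in enumerate(data):
            self.cells[offset + i] = byte
```

A hard drive filesystem expects to overwrite a sector in place; in this model the equivalent operation fails with `ValueError` until the entire 128 KiB block has been erased, which is why such filesystems cannot sit directly on an MTD.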

Designing a filesystem for an MTD is further complicated by the fact that these erase blocks only support a limited number of erases before becoming unusable. The filesystem must keep track of which blocks are bad and also distribute writes evenly across the entire MTD, a technique called wear leveling, to extend the lifetime of the device.
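A minimal sketch of these two duties, bad block tracking and wear leveling, might look like the following. It assumes a fixed erase limit per block, whereas real flash wears out less predictably, and it ignores the relocation of rarely changed data that real implementations also need. The name `WearLevelingAllocator` and the `max_erases` parameter are hypothetical.

```python
class WearLevelingAllocator:
    """Toy allocator that always hands out the least-worn usable block."""

    def __init__(self, block_count, max_erases):
        self.erase_counts = [0] * block_count
        self.bad_blocks = set()
        self.max_erases = max_erases  # assumed fixed endurance limit

    def pick_block(self):
        """Return the index of the usable block with the fewest erases."""
        usable = [i for i in range(len(self.erase_counts))
                  if i not in self.bad_blocks]
        if not usable:
            raise RuntimeError("no usable erase blocks left")
        return min(usable, key=lambda i: self.erase_counts[i])

    def erase(self, index):
        """Count one erase and retire the block once it reaches the limit."""
        self.erase_counts[index] += 1
        if self.erase_counts[index] >= self.max_erases:
            self.bad_blocks.add(index)
```

Calling `erase(pick_block())` repeatedly spreads the erases across all blocks instead of wearing out the first one, so every block reaches its limit at roughly the same time.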

There are currently two dominant filesystems in use for MTDs: JFFS2 and YAFFS2. YAFFS2 is often considered the superior alternative, but it has one major drawback: it is currently not included in the mainline kernel sources, and is not expected to ever be. JFFS2 also supports a wider range of devices.

LogFS is a new filesystem for MTDs which is intended to replace JFFS2. It was included in the recent 2.6.34 kernel release, but is still considered experimental. The main focus of the LogFS project is to improve mount times and lower RAM usage compared to JFFS2. LogFS also works on regular block devices, such as hard drives.

Another project aiming to replace JFFS2 is UBIFS. UBIFS is different from other MTD filesystems in that it doesn't run directly on top of the MTD subsystem. Instead, it uses an additional layer called Unsorted Block Images (UBI), which aims to simplify the handling of MTDs by making them look and behave more like a normal block device. It is not, however, trying to look exactly like a block device. UBIFS has been included in the Linux kernel since version 2.6.27.

Note that MMC devices, SD cards and similar are normal block devices and should be handled like a hard drive, but they do suffer from the same reliability problem as MTDs. After a number of write cycles they can become unreliable, and it is therefore often considered good practice to choose a filesystem with a low write intensity. However, this problem depends heavily on how the devices are constructed and how they handle wear leveling internally, and hence it is widely debated whether it really is a problem.

Claes Gustafsson, Embedded Systems Consultant, Altran Technologies Sweden AB

Further reading:

http://en.wikipedia.org/wiki/Ext2

http://en.wikipedia.org/wiki/Ext3

http://en.wikipedia.org/wiki/Ext4

http://en.wikipedia.org/wiki/Btrfs

http://www.ibm.com/developerworks/linux/library/l-ext4/

http://linux-mtd.infradead.org/~dwmw2/jffs2.pdf

http://linux-mtd.infradead.org/doc/ubifs.html

http://www.yaffs.net/yaffs-2-specification-and-development-notes

http://www.storagesearch.com/ssdmyths-endurance.html