Digging into Linux Filesystems
The UNIX filesystem story goes back to the first implementation of the operating system. Since then, many different implementations and improvements have been made, and as a result filesystems have become quite complex but also rock-solid pieces of software. Today, most people treat the FS as a black box or an indivisible part of the OS. In this article, I will present the basic structures of Linux filesystems and the differences between them. It is an extension of the 15-minute talk I gave at a DLUG (Dublin Linux Users Group) meetup, summarized here as a short article. If you are interested in the slides from the meetup, you can find them here
Why should you care about filesystems?
A great resource that helped me consolidate my knowledge about the Linux kernel (and about OS kernels in general) was the Linux Kernel Map. Without dwelling too long on this excellent reference itself, we can just look at the kernel functionalities along its X-axis: HI (Human Interfaces), System, Processing, Memory, Storage and Networking. These “functionalities” are the pillars of the kernel, or its main responsibilities if you will. Essentially, every kernel is responsible for serving human-interface devices (to allow us to communicate with the hardware), managing system resources (from software interfaces all the way down to I/O), doing processing (making use of the CPU), and managing memory, storage and networking.
Because in POSIX everything is a file, and because since its early days every serious OS has been able to serve multiple filesystems (including pseudo-filesystems, the kind of FS that is not backed by any permanent storage), things get a little more interesting, or complicated if you will…
How can we describe filesystems, and what are the differences between them?
If we go into the Linux kernel source code and open the filesystem (‘fs’) directory, we see a lot of subdirectories, each of which corresponds to some filesystem implementation.
At this point someone might ask an obvious question: “the filesystem is the storage component of the kernel, but why do we need so many of them, and what is the difference between them?”
As I pointed out, the first difference is that we can divide them into permanent (backed by permanent storage like an HDD/SSD disk) and temporary (the contents of the files are lost after every reboot). But we still have many permanent filesystems, so this criterion is not the only one. The next important way to classify filesystems is by their design: there are pure filesystems that do not come with their own volume manager, and more complex ones that embed features like device management inside them. But what does that mean exactly?
Classical architecture:
Let’s briefly take a look at the classic design of the storage stack. Starting from the top, we have the layer responsible for handling system calls from userspace (application space); no surprise here, since that is how UNIX-based operating systems handle any task from a user. Next we have the VFS layer, an abstraction over all possible filesystems that is independent of any particular implementation. Then we have the code specific to a particular filesystem implementation, and below it the volume manager, a layer that manages the target physical devices. At the bottom of the stack, we have the drivers for the devices we want to use as storage.
A more modern approach: managing devices from the FS:
Now we will review a historically newer approach, which came with Sun’s implementation of ZFS. The idea was to merge the filesystem layer with the volume manager. Thanks to this approach, features like snapshots, encryption and compression are implemented in a common code base, which makes them faster, better integrated and more reliable. ZFS is not the only one to use this approach: the younger BTRFS uses the same combined architecture.
What does all of that mean for the system user?
We have covered some theory, necessary to understand the basic concepts. Now I want to show some differences between managing a classical two-component (FS + VM) filesystem like ext4 or XFS and a monolithic FS like ZFS or BTRFS. Here I will take my two favourite filesystems, XFS and ZFS, and show the difference in their management on an Ubuntu machine; I will also point out some interesting features.
First step: setting up the filesystem.
As a first example, we will set up a filesystem using an existing disk device. To do this with a filesystem that has no embedded volume manager, we need to set up a logical volume separately and then create the FS on top of it. Of course, you can create the filesystem straight on the device itself or on a physical partition, but this is most likely not the way to follow if you care about flexibility and future scalability.
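On the classical stack, this is two separate steps: first the volume-manager layer (LVM here), then the filesystem. A minimal sketch, assuming a spare disk at /dev/sdb; the device name, sizes and volume names are illustrative:

```shell
# 1. Volume-manager layer (LVM): initialise the disk, group it, carve out a volume
sudo pvcreate /dev/sdb                 # register the disk as an LVM physical volume
sudo vgcreate datavg /dev/sdb          # create a volume group on top of it
sudo lvcreate -n datalv -L 8G datavg   # create a logical volume inside the group

# 2. Filesystem layer: create XFS on the logical volume and mount it
sudo mkfs.xfs /dev/datavg/datalv
sudo mkdir -p /mnt/data
sudo mount /dev/datavg/datalv /mnt/data
```

Note that the two layers know nothing about each other: mkfs.xfs simply sees a block device, whether it is a raw disk, a partition or an LVM volume.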
With ZFS, because the volume manager is inside the FS, we do not need to think about the devices themselves; instead we operate on pools, which are an abstraction over the physical hardware.
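With ZFS the same result takes one layer. A sketch assuming the same spare disk at /dev/sdb; the pool and dataset names are illustrative:

```shell
# Create a pool directly on the disk; ZFS manages the device itself
sudo zpool create datapool /dev/sdb

# Datasets (filesystems) are carved out of the pool and mounted automatically
sudo zfs create datapool/data

# Inspect pools and datasets together with their space usage
zpool status datapool
zfs list
```

There is no separate mkfs or mount step: creating the dataset formats and mounts it in one go, and space is shared across all datasets in the pool.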
Next step: managing snapshots.
Snapshots are an important filesystem feature. With them, we can create a point-in-time image and hold it for the future, in case of failure or just for reference. The implementation of this feature can be filesystem-dependent or handled externally by the volume manager. XFS does not implement this feature directly in its code base, while ZFS handles it internally. Let’s take a look at how to create a simple snapshot on ZFS, and then we will compare it with LVM snapshotting under XFS.
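On ZFS a snapshot is a single command against a dataset. A sketch, assuming the datapool/data dataset from the earlier example; the snapshot name is illustrative:

```shell
# Take a named point-in-time snapshot of the dataset
sudo zfs snapshot datapool/data@before-upgrade

# List existing snapshots
zfs list -t snapshot

# If something goes wrong, roll the dataset back to the snapshot
sudo zfs rollback datapool/data@before-upgrade
```

No space has to be reserved in advance: being copy-on-write, ZFS only starts consuming extra space as the live data diverges from the snapshot.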
Here we see something specific to heterogeneous solutions: we should have thought about space for the snapshot at the beginning of the process, when we created the filesystem; now we have run out of space because our FS occupied the whole device. To fix that, we need to reduce the space used by the FS. Such an issue does not exist with ZFS, as the VM is integrated and handles space management internally.
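For comparison, an LVM snapshot under XFS needs its own reserved extents in the volume group, so the logical volume must not have consumed the whole group. A sketch assuming the datavg/datalv volume from earlier, with free extents left in the group; names and sizes are illustrative:

```shell
# A snapshot LV needs free space in the volume group, so leave headroom
sudo lvcreate --snapshot -n datalv-snap -L 1G /dev/datavg/datalv

# Mount the snapshot read-only; XFS needs nouuid because the snapshot
# carries the same filesystem UUID as the origin volume
sudo mkdir -p /mnt/data-snap
sudo mount -o ro,nouuid /dev/datavg/datalv-snap /mnt/data-snap
```

The -L size is the budget for changed blocks: if the origin diverges by more than that, the snapshot is invalidated. This is exactly the space-planning burden that the integrated ZFS design removes.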
Extra: transparent compression:
One really useful feature that comes with integrated-volume-manager filesystems is transparent compression. Thanks to compression, we can reduce the size of some of the files we store in the FS. I don’t want to go too deep into the details of compression itself, but the files likely to benefit from this feature are mostly text files, source code, some binary files and bitmaps; things that are not a good fit for compression are images in already-compressed formats like JPEG (because they are compressed already) and encrypted files.
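You can see this difference without any special filesystem at all. The sketch below uses gzip as a stand-in for ZFS’s transparent lz4 (the algorithms differ, but the compressibility of the input dominates): a megabyte of highly redundant data shrinks to almost nothing, while a megabyte of random data (which is what encrypted or already-compressed files look like) does not shrink at all:

```shell
# Highly redundant input: compresses to a tiny fraction of its size
head -c 1048576 /dev/zero | gzip -c | wc -c

# Random input (like encrypted or already-compressed data): no gain,
# the output is even slightly larger than the input
head -c 1048576 /dev/urandom | gzip -c | wc -c
```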
Let’s see how easy it is to set up compression (we will use the lz4 algorithm).
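On ZFS, compression is a single property change on a dataset; from then on, new writes are compressed transparently. A sketch using the datapool/data dataset from before, filled with copies of the Gutenberg text listed in the resources section (paths and the copy count are illustrative):

```shell
# Turn on lz4 compression for the dataset and confirm the setting
sudo zfs set compression=lz4 datapool/data
zfs get compression datapool/data

# Fill the dataset with highly compressible text...
wget -O /tmp/alice.txt http://www.gutenberg.org/files/11/11-0.txt
for i in $(seq 1 100); do
    sudo cp /tmp/alice.txt /datapool/data/alice-$i.txt
done

# ...and ask ZFS how well it compressed
zfs get compressratio datapool/data
```

Note that the property only affects data written after it is set; existing files stay uncompressed until they are rewritten.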
So, as we can see, we were able to save 10 times the space on our storage medium! It looks incredible, but in reality we used a trick by storing de facto the same content copied many times. In practice, the ratio will depend on the files you store. For example, on my FreeNAS-based home storage server where I keep photos, the compression ratio is 1.01.
Further reading:
In this article, I touched on the basics of filesystems and volume managers. This material is the tip of the iceberg for anyone who wants to deepen their knowledge of filesystems, whether from the implementation or the administration point of view. As an extra exercise, the curious reader can try to find out what the COW (copy-on-write) design of a filesystem is and which filesystems are COW and which are not, and then try to figure out how that design affects the features we discussed.
Additional resources:
Linux Kernel Map: http://www.makelinux.net/kernel_map/
Test ASCII book “Alice in Wonderland”: http://www.gutenberg.org/files/11/11-0.txt