Inside the Linux kernel 3.0
25th Oct 2011 | 09:00
Linux has gone from 10,000 lines of code to 15 million
The new Linux 3.0 kernel in all its shiny awesomeness will be finding its way into your favourite distro any day now. So what does this major milestone release contain to justify the jump in version number?
The short answer is nothing really - this is just 2.6.40 renamed. Linus Torvalds felt the numbers were getting too high, that the 2.6.* notation was getting out of hand, and that Linux had now entered its third decade, so a new number was called for.
Torvalds said: "We are very much not doing a KDE 4 or a Gnome 3 here, no breakage, no special scary new features, nothing at all like that. We've been doing time-based releases for many years now, this is in no way about features. If you want an excuse for the renumbering, you really should look at the time-based one (20 years) instead."
The old numbering system used the first digit for the major release - which was 2 for what seemed like forever - and the second for the minor release, with odd numbers marking development versions and even numbers stable releases. While 2.4 was the main kernel in use, 2.5 was the development version of what became 2.6. The third digit was the release number within the series. Then a fourth digit was added for patches backported to the stable releases, and it all got messy, especially as the first two numbers stopped changing.
Now 3 is the major version, the second number is the minor release and patches for stable releases get the third digit when needed. Currently we are on 3.0 - in Torvalds's own words "gone are the 2.6.<bignum> days".
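The two schemes are easy to tell apart mechanically. A minimal sketch (the function and its return shape are ours, purely for illustration, not anything from the kernel tree):

```python
def parse_release(version):
    """Split a kernel version string into (series, release, stable_patch).

    Old scheme: '2.6.39.4' -> series '2.6', release 39, stable patch 4.
    New scheme: '3.0.1'    -> series '3',   release 0,  stable patch 1.
    """
    parts = [int(p) for p in version.split(".")]
    if parts[0] == 2:                    # old 2.6.x numbering
        series, release = "2.6", parts[2]
        patch = parts[3] if len(parts) > 3 else 0
    else:                                # 3.x onwards
        series, release = str(parts[0]), parts[1]
        patch = parts[2] if len(parts) > 2 else 0
    return series, release, patch

print(parse_release("2.6.39.4"))  # ('2.6', 39, 4)
print(parse_release("3.0"))       # ('3', 0, 0)
```

The point of the change is visible in the code: under the old scheme the interesting number was buried in third place, under the new one it comes first.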
The main reason for the escalating numbers, apart from the fact that the 2.6 line is almost eight years old, is a change in release philosophy. Releases used to be based on features; when the new features were stable and tested, a new kernel would be released - the "when it's ready" approach.
For example, 1.0 saw the introduction of networking (it's hard to imagine Linux without networking at its core nowadays), 1.2 saw support for multiple platforms - adding Alpha and m68k to the original i386, while 2.0 added support for SMP - multiple processors.
The last one is an example of how the landscape has changed while Linux has remained relevant: SMP was added for 'big iron' multi-processor servers, yet now we have dual-core CPUs in phones. Waiting for features to be complete could lead to long gaps between releases.
Nowadays the kernel undergoes steady evolution and has a new release every eight to ten weeks - more of a "whatever is ready" approach. If a new feature or driver isn't ready in time for the deadline, it's left until the next release.
If nothing major has changed since 2.6.39, what have the kernel developers been doing? With a new release every few months, the changes between adjacent releases are rarely outstanding on their own, but the cumulative effect is significant, so what has changed over the past year or two?
A list of the new hardware supported by the kernel would fill the magazine. With plenty of encouragement and guidance from the likes of Greg Kroah-Hartman, more hardware manufacturers than ever are working with kernel developers to ensure good support.
Drivers and support for hardware are being added almost as fast as the manufacturers can get it on the shelves. From wireless cards to webcams, and even the Microsoft Kinect in Linux 3.0, there is a huge range of hardware supported out of the box nowadays.
It's not just USB or PCI-connected hardware either; the ability to compile a slimmed-down kernel with only the features and drivers you need is what makes Linux ideal for embedded devices. From mobile phones and network routers to car entertainment systems, they all have their own hardware and the kernel supports it.
In November 2010, the internet lit up with discussion and counter-argument when news broke of "the 200 Line Linux Kernel Patch That Does Wonders". Designed to improve desktop responsiveness, this patch separates tasks launched from different terminals or environments into groups and ensures that no single group gets to monopolise the CPU.
In real terms, this means that an intensive task running in the background, such as software compilation (naturally, Linus tested it with a kernel compile) or video transcoding, will not bring your browser to its knees. It means the days of heavy system load manifesting in jerky windows or text scrolling are largely behind us.
What makes this so interesting (apart from the fact that a marked change to the desktop was made in a couple of hundred lines of code) is that it requires nothing from the user: as long as you have a kernel with this code enabled - which means any distro-supplied kernel from 2.6.38 onwards - it just works. It also makes a difference on any type of hardware, from an Atom-powered netbook to a six-core monster.
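The effect of the grouping is easy to quantify with a toy model (our own arithmetic, not kernel code): without grouping, every runnable task gets roughly an equal slice of the CPU; with per-terminal grouping, the CPU is first split between groups and only then between the tasks inside each group.

```python
def cpu_share(groups, member):
    """Fraction of CPU one task gets under per-group fair scheduling.

    groups: {group_name: number_of_runnable_tasks}
    Each group gets an equal slice, shared equally within the group.
    """
    return (1 / len(groups)) / groups[member]

# A 'make -j64' kernel build in one terminal, a browser in another.
groups = {"compile": 64, "browser": 1}

# Ungrouped: the browser is just 1 runnable task out of 65.
ungrouped = 1 / sum(groups.values())
# Grouped: the browser's terminal gets half the CPU to itself.
grouped = cpu_share(groups, "browser")

print(f"ungrouped: {ungrouped:.1%}, grouped: {grouped:.1%}")
# prints: ungrouped: 1.5%, grouped: 50.0%
```

Going from around 1.5% of the CPU to 50% is exactly the difference between a browser that stutters during a compile and one that doesn't.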
The kernel has certainly grown over the years. One of the methods of measuring the size of a program's code base is the SLOC, or source lines of code, the amount of code that has been written. It is hardly surprising that this has grown with each release, although you may be staggered by the overall increase:
Version 0.01 had 10,239 lines of code
1.0.0 had 176,250 lines of code
2.2.0 had 1,800,847 lines of code
2.4.0 had 3,377,902 lines of code
2.6.0 had 5,929,913 lines of code
3.0 has 14,647,033 lines of code
Yes, you read that right, Linux has grown from 10,000 to 15 million lines of code. The code base has more than doubled since the introduction of the first 2.6 kernel back in December 2003.
These line counts are the sum total of all files in the source tarball, including the documentation. Considering that most programmers find writing documentation far more of a chore than writing code, this seems a reasonable measurement.
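The headline figures quoted in this article fall straight out of the numbers above:

```python
# Line counts quoted above (source tarball totals, documentation included).
sloc = {
    "0.01": 10_239,
    "1.0.0": 176_250,
    "2.2.0": 1_800_847,
    "2.4.0": 3_377_902,
    "2.6.0": 5_929_913,
    "3.0": 14_647_033,
}

# Growth from the first release to 3.0, and since 2.6.0 (December 2003).
total_growth = sloc["3.0"] / sloc["0.01"]     # about 1,430-fold
since_26 = sloc["3.0"] / sloc["2.6.0"]        # about 2.47x

print(f"{total_growth:,.0f}x since 0.01")
print(f"{since_26:.2f}x since 2.6.0")
```

So "more than doubled since 2.6.0" is, if anything, an understatement: the code base is very nearly two and a half times its December 2003 size.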
Before anyone starts crying "bloat!", remember that although Linux is a monolithic kernel, drivers for most hardware are provided as loadable modules, which don't even need to be compiled for most systems, let alone installed or loaded.
Much of the growth in the size of the kernel source has come from the increasingly comprehensive hardware support. One of the misconceptions about open source software, fostered by its detractors, is that because it is free to use, it is somehow amateur and of lower quality. An analysis of the code contributed during 2009, which involved some 2.8 million lines and 55,000 major changes, showed that three quarters of the contributions came from developers employed to work on Linux.
It was no surprise that the top contributor was Red Hat (12%), followed by Intel (8%), IBM and Novell (6% each). Despite competing with one another, these companies also recognise the importance of co-operation. Naturally, each company develops areas that are beneficial to its own needs, but we all gain from that.
Microsoft's contribution to Linux
What may be more surprising is that the most prolific individual contributor to 3.0 worked for a rather different company - Microsoft.
This doesn't mean that the company has "seen the light" but that it needed additions to the kernel to enable Linux virtual machines to run on its Windows Server platform, so it was not about improving Linux but being able to sell more of its products. Whatever the reason, though, Microsoft will now find it more difficult to decry open source and GPL software.
And while we're on the subject of Microsoft, kernel 3.0 also supports the Xbox Kinect controller, which is finding great use in all sorts of non-gaming areas.
There are two things that Linux has plenty of: filesystems and software with confusing names. Btrfs adds to both scores, whether you call it 'Better F S', 'Butter F S' or 'B-Tree F S'. This is an advanced filesystem that implements some of the ideas from ReiserFS and some from Sun's ZFS, as well as covering some of the functions of volume management systems such as LVM. Even Theodore Ts'o, the principal developer of ext3 and ext4, believes that Btrfs is the way forward.
Btrfs was considered very much an experimental filesystem when first included in version 2.6.29 of the kernel, but it has matured, and Fedora has stated that it would like to make it the default filesystem for Fedora 16. Whether it actually makes that release, or slips to Fedora 17, this is a sign of the filesystem's perceived maturity, although it is still under active development with additions at each kernel release.
What makes Btrfs so attractive is its scalability, being designed to "let Linux scale for the storage that will be available". This doesn't just mean large disks, but also working with multiple disks and, most importantly, doing this in a way that is easy to manage and transparent to applications.
Additional features, such as transparent compression, online defragmentation and snapshots, only add to its attraction.
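Snapshots in particular are cheap because Btrfs is copy-on-write: a snapshot initially shares all of its data with the original, and only writes that diverge afterwards consume new space. A toy illustration of the idea (plain Python, nothing Btrfs-specific; the class is entirely hypothetical):

```python
class CowVolume:
    """Toy copy-on-write store: snapshots share data until written to."""

    def __init__(self, blocks=None):
        self.blocks = blocks if blocks is not None else {}

    def snapshot(self):
        # A snapshot is just a new reference map; no file data is copied.
        return CowVolume(dict(self.blocks))

    def write(self, name, data):
        # A write replaces only this volume's mapping for that block.
        self.blocks[name] = data

vol = CowVolume()
vol.write("file", "v1")
snap = vol.snapshot()     # instant, shares 'v1' with the original
vol.write("file", "v2")   # the original diverges; the snapshot is untouched

print(vol.blocks["file"], snap.blocks["file"])  # v2 v1
```

The real filesystem does this at the level of B-tree references to disk blocks, which is why taking a snapshot of a huge volume is effectively instantaneous.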
A promise come true
Way back in 2006, a website appeared with the stated aim of creating open source 3D drivers for Nvidia graphics cards. Many of us were interested by the idea but expected it to go the way of so many abandoned projects on SourceForge. There are times when it feels good to be wrong.
The Linux kernel now contains the Nouveau drivers for Nvidia cards. If you are building your own kernel, they are tucked away in the staging area of the kernel configuration. This is an area, disabled by default, containing more experimental or unstable (in the sense of being subject to change, not necessarily falling over) drivers.
However, many distros now include these drivers in their default kernels and installations. While not giving the same ultimate 3D performance as Nvidia's own drivers, for most non-gaming use, these are more than sufficient, and you don't run into the problems that can occur when trying to mix proprietary, pre-compiled code with open source.
It is unlikely that these reverse-engineered drivers will ever be as fast as the ones written by people with intimate knowledge of the workings of the hardware, and the ability to view code that Nvidia is able to use but not divulge. However, they are far better than most people's initial expectations, and it is often worth sacrificing a little speed you may never use for the reliability of a properly integrated driver.
AppArmor (Application Armor) is a security module for the kernel. It allows the restriction of the capabilities of individual programs by means of a set of profiles. Assigning a profile to a program, or set of programs, determines what they are allowed to do. This means that even if a program has to be run with root privileges to do what it needs, it is not able to do whatever it pleases, a limitation of the standard user-based permissions system.
Now it is not enough for the user running the program to be "trusted". The program can be given explicit permissions to do only what it needs. Linux already had SELinux, which fulfils a similar role, but AppArmor is generally considered to be easier to learn and maintain.
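Those profiles are plain text files mapping paths and capabilities to permissions. A minimal illustrative sketch (the program name and paths here are hypothetical, not a real shipped profile):

```
# /etc/apparmor.d/usr.bin.example -- hypothetical profile
/usr/bin/example {
  #include <abstractions/base>

  /etc/example.conf r,     # may read its own configuration
  /var/log/example.log w,  # may write its own log
  network inet stream,     # may open TCP sockets

  deny /home/** rw,        # users' files are off-limits
}
```

Even if this program is compromised while running as root, the kernel will refuse any file or network access the profile does not grant.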
This is important, because complex security systems are harder to get right, and a security system that you think protects you when it doesn't is worse than no system at all.
Born in fire
AppArmor has been around for a while. A previous incarnation, known as SubDomain, appeared in Immunix Linux in 1998. In 2005 it was released as AppArmor by Novell and included in OpenSUSE and SLES. However, it was not until the release of Linux 2.6.36, in October 2010, that it became an integral part of the kernel.
If you want to read about the nitty-gritty of kernel development, there is only one place to go - the LKML (Linux Kernel Mailing List). This is where the important work and discussions take place, but it is not a place for the faint-hearted. It is quite high traffic, with around 200-300 mails a day, and very technical. It's also fair to say that many developers are known more for their coding abilities than their people skills.
Email also encourages a more blunt approach than face-to-face communication - "if you're subtle on the internet, nobody gets it" - add in the ego factor when people are debating the merits, or otherwise, of their own brainchild, and you have an environment that is more productive than friendly.
Flamewars abound, but they serve a purpose, allowing developers to argue the case for their choice of included code. As long as the debates focus on the topics and not personalities, this remains productive.
Linus has acknowledged that even he has changed his mind and admitted that his original position was wrong in some of these debates, although not very often.
There have been some significant flamewars/debates over recent years, such as the one on allowing the inclusion of binary firmware 'blobs' (proprietary code needed by some drivers to interface with the hardware), a topic bound to raise the blood pressure of the freedom purists while appealing to the pragmatists. Or the one between Linus and the ARM developers about their somewhat insular position. This one worked out well, with more ARM-related code being moved into the main tree instead of hiding in its own corner.
Given that there are probably more devices running Linux on ARM than on x86 these days (it's the processor of choice for embedded systems and smartphones), this was a sensible evolution - even if (or because) it was born in fire.

Virtualisation is everywhere, from Linux Format readers using VirtualBox or KVM to test distros from the DVD, to massive data centres providing hosting on virtual machines. The kernel has support for the KVM (Kernel-based Virtual Machine) and Xen virtualisation systems.
There is also extensive support for proprietary systems, such as those supplied by VMware and Microsoft, so that Linux virtual machines can run just about anywhere. A virtual machine emulates a hardware computer, so anything that can reduce the amount of emulation required by the software, and place as little overhead between the virtualised hardware and the metal it runs on, is a good thing.
The KVM hypervisor extensions have done this for the CPU for some time now, and now the network has had the same treatment. All communication between a guest OS and either the host or another guest is done over a network connection, and virtual machines have traditionally emulated a real world network card, for the widest compatibility.
Linux recently introduced the vhost-net driver, which does away with as much of the legacy hardware emulation as possible to give speeds of up to 10 gigabits per second, as well as lower latency. It still works with standard drivers in the guest, so a specially modified virtual machine is unnecessary; all the work is done in the kernel and set up on the host.
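On the host, using it comes down to loading the module and asking your VM manager for a virtio network device backed by vhost. With QEMU/KVM that looks something like the following (an illustrative invocation - the image name and device IDs are placeholders, and it needs root for the tap device):

```
modprobe vhost_net
qemu-system-x86_64 -enable-kvm -m 2048 disk.img \
    -netdev tap,id=net0,vhost=on \
    -device virtio-net-pci,netdev=net0
```

The guest just sees an ordinary virtio network card; the `vhost=on` switch moves the packet handling behind it from the QEMU process into the host kernel.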
Rise of the Androids
One company has managed to dramatically increase the number of people using Linux-powered systems, and we're not referring to Canonical. Google's Android OS has put Linux in the hands (literally) of millions of people who don't even know what an operating system is, let alone have heard of Linux.
Is that a good thing? That remains to be seen but it's a subject for plenty of discussion. What is also of concern to many is how Google participates in kernel development, or not. Many of the changes it makes are not fed back to the kernel. This is within the terms of the GPL, as long as the source is available, but many feel it's not in the spirit of sharing that forms the basis of the GPL.
Whether this is due to a reluctance on Google's part, or the way its development process works (it seems to prefer to feed back in large blocks at infrequent intervals rather than the far more common 'little and often' approach of open source) also remains to be seen.
It is, however, a huge, if largely invisible, take-up of Linux over the past couple of years. If only Linus had had the foresight to adopt a licence that required any device running Linux to display Tux somewhere.
Incidentally, Android may be the strongest argument for calling the desktop/server operating system GNU/Linux to differentiate it from other Linux-based operating systems, while at the same time showing that Linux without GNU is both workable and popular.

While there haven't been many obvious, major leaps forward for the Linux kernel recently, this is simply a sign of its maturity, and a justification for the 3.0 label.
It will continue to evolve and progress in small steps (most of the time), which may seem uneventful. If you ever doubt the progress that Linux has made over recent years, grab a three-year-old copy of Linux Format, read what was considered new and interesting and then try installing the distro from the DVD to your new hardware.
Here's looking forward to the new features in the 4.0 kernel released in 2021, even if it is just a renumbered 3.99.
First published in Linux Format Issue 150