What on earth is OpenStack?
25th Nov 2012 | 12:00
Your guide to the Linux of cloud computing
It was back in 2010 when Rackspace, the company famous for hosting lots of websites, got together with NASA, the agency famous for pretending to send astronauts to the moon.
The whole project kicked off after a single blog post by a NASA contractor. The post read: "Launched NOVA - Apache-Licensed Cloud Computing, in Python. It's live, it's buggy, it's beta. Check it out."
Together, NASA and Rackspace went on to create a kind of online fantasy world, where storage, resources and performance would be no object, and small startups could build their ivory towers in the clouds, knowing that when their day came, they'd be able to scale everything up, quickly and efficiently, before quickly selling their stock to Facebook.
OpenStack makes all this possible because it's open source, and so works like Linux itself. In fact, some people refer to it as the Linux of cloud computing.
You can pay for someone to run OpenStack for you, or you can experiment with it knowing you're playing with something that can easily be expanded. Just like Linux.
Q. Please, not the cloud. I don't get it.
A. Perhaps the best analogy we can make between real clouds and those of the computing variety is that there's no single definition of what a cloud is.
It's a definition that's always changing, like the weather. It could be your Gmail and your Google Docs, but it's also your Facebook account or your Exchange calendar.
If you work at a tech company, there's a good chance your IT department is already replacing its own racks of servers with rented space and capacity on someone else's cloud. Or you might want to farm out CPU-intensive science to Amazon's EC2 to get the results of a calculation in minutes rather than the weeks it might take on your desktop machine.
Q. Does that mean you can't be any more specific on what OpenStack does?
A. Getting more specific means using some dangerous acronyms. Start with IaaS, for instance, which rather than being a self-deprecating insult, stands for Infrastructure as a Service.
This use of the cloud is closest to our IT department in the previous example, where a company replaces its own infrastructure with one it rents by capacity in the cloud.
This is the business Rackspace wants to dominate, and with good reason.
There are massive economies of scale to be had when a single company manages the infrastructure, with potentially big savings for any company that wants rid of its own IT. That makes for a very attractive business model.
This is also the level at which OpenStack works. It's a solution for IaaS because you can do anything you want with it - just as you can with a virtual machine, for example.
At the other end of the scale we've got SaaS, which is Software as a Service. This is easier to understand because, rather than managing the entire infrastructure, you need to worry only about a specific application.
It could be a company CMS, or any of your company-wide systems that need high availability.
The important difference between SaaS and older server models is that these applications are usually provided by the cloud vendor.
The simplest example is paying for corporate access to Google's mail, calendar and document suites, rather than running those services yourself or paying a company to host your servers.
SaaS means that you don't need to take responsibility for installation, deployment or maintenance (unless these services are running on your own cloud, of course).
SaaS becomes relevant to OpenStack only when you configure it to offer a service that you then sell to customers, such as your own cloud-based email or document editing.
Q. OK, but why is OpenStack any different?
A. There are several reasons why OpenStack is different and worthy of your attention. The first is that it really is open source. Each component has been released under the terms of the Apache licence, which is slightly more permissive than the GPL.
It means you can release any changes you make under a different licence, for instance, and this licence issue has been turned into a feature by the OpenStack team.
"We strongly believe that an open development model is the only way to foster badly-needed cloud standards, remove the fear of proprietary lock-in for cloud customers and create a large ecosystem that spans cloud providers," is part of their five-minute overview of the project.
With an open API, open formats and completely modifiable source code, you can see why, when so much money is being spent on moving to the cloud, OpenStack makes so much sense.
It's also relatively straightforward for companies to play with private clouds on their own networks, and then farm them out to OpenStack providers such as Rackspace when it begins to make sense.
Another reason is that OpenStack has become something of a phenomenon. You need only look at the official list of the 180 companies that confess to using it. You'll find the likes of Dell, AMD, Cisco, HP, AT&T, Broadcom and Yahoo alongside Rackspace.
Sadly, NASA is currently moving away from OpenStack in favour of Amazon.
You'll also find a list of Linux heavyweights getting closer to OpenStack, including Red Hat, SUSE and Canonical. Red Hat previously had its own cloud solutions, including Aeolus, which is a neat software suite for deploying your own virtual machines internally and across multiple incompatible clouds.
Red Hat was finally tempted by OpenStack in April, after the project moved to a new foundation governance model.
Q. But we thought Canonical had made a big deal over using Eucalyptus for its cloud?
A. It did, originally. But about a year ago, the company behind Ubuntu announced it was replacing Eucalyptus with OpenStack.
The decision seems to have been made because Eucalyptus isn't as open as OpenStack. It's open source, but there are six months between each release, for example. OpenStack, on the other hand, is developed in the open and doesn't have a proprietary component.
Q. So there's more than one component to OpenStack?
A. Yes, there are actually three. The first is called OpenStack Compute. This is the 'Nova' alluded to in that initial blog post, and it's the infrastructure part. It's where the virtual machines actually live.
Compute provides API access to their configuration and management. It's where developers can not only get involved with the virtual hardware, but also build for scaling and concurrency.
Despite the low-level nature of these kinds of operations, Compute is written in Python, and many developers choose to use the Python bindings. There is wide-ranging support for the different virtual machine backends - a part known as the hypervisor.
This is in contrast to Eucalyptus, which supports only Xen and KVM, and that was after considerable help from Canonical.
OpenStack will work with KVM and Xen, but also VMware, LXC (Linux Containers - look out for a tutorial next month), User Mode Linux and even QEMU, although the documentation admits the last two are best used for development purposes rather than live.
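If the idea of swappable hypervisors sounds abstract, here's a rough Python sketch of the principle: one compute API on top, interchangeable backends underneath. Everything in it (HypervisorDriver, KvmDriver, LxcDriver, Compute and the boot method) is a name invented for illustration. It is not Nova's real code or API, and the "drivers" only return a string rather than starting anything.

```python
from abc import ABC, abstractmethod


class HypervisorDriver(ABC):
    """Hypothetical interface that each virtualisation backend implements."""

    name = "abstract"

    @abstractmethod
    def spawn(self, instance_name):
        """Pretend to start a virtual machine and return a status string."""


class KvmDriver(HypervisorDriver):
    """Stand-in for a KVM backend (illustrative only)."""

    name = "kvm"

    def spawn(self, instance_name):
        return f"{instance_name} started on {self.name}"


class LxcDriver(HypervisorDriver):
    """Stand-in for an LXC backend (illustrative only)."""

    name = "lxc"

    def spawn(self, instance_name):
        return f"{instance_name} started on {self.name}"


class Compute:
    """Toy compute service: callers use one API whatever the backend is."""

    def __init__(self, driver):
        self.driver = driver
        self.instances = []

    def boot(self, instance_name):
        # Hand the real work to whichever driver was plugged in.
        status = self.driver.spawn(instance_name)
        self.instances.append(instance_name)
        return status


if __name__ == "__main__":
    for driver in (KvmDriver(), LxcDriver()):
        print(Compute(driver).boot("web-1"))
```

The point of the design is that the code calling boot never needs to know which hypervisor is underneath, so swapping one for another is a configuration decision rather than a rewrite.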
Q. And what are the other components?
A. With the infrastructure being handled by Nova, the other major requirement for a cloud system is storage. Unlike the computer you're reading this on, storage in the cloud needs to cater for distributed systems, processes and dynamic capacity.
This component in OpenStack is called Object Storage, or Swift, and it was initially developed by Rackspace. Like Nova, it's also written in Python.
It caters for both file (object) and block storage, which can be used for persistent storage by the hypervisor.
There are many advanced features, such as the ability to easily add extra capacity, or heal failures automatically, and it can scale to sizes still unfamiliar to most users - multiple petabytes and billions of objects.
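To give a feel for what "adding capacity" and "healing failures" mean in practice, here's a toy Python sketch of a replicated object store. It is an assumption-laden illustration, not Swift's actual design or API: the ToyObjectStore class, its method names and the "put copies on the emptiest nodes" placement rule are all invented for this example, and everything lives in memory.

```python
class ToyObjectStore:
    """Illustrative replicated store; not Swift's real design or API."""

    def __init__(self, node_names, replicas=3):
        # Each node is just a dict mapping object names to data.
        self.nodes = {name: {} for name in node_names}
        self.replicas = replicas

    def _emptiest_first(self):
        # Order nodes by how many objects they hold, then by name.
        return sorted(self.nodes, key=lambda n: (len(self.nodes[n]), n))

    def put(self, key, data):
        # Write a copy to each of the least-loaded nodes.
        for node in self._emptiest_first()[: self.replicas]:
            self.nodes[node][key] = data

    def get(self, key):
        # Any surviving copy will do.
        for contents in self.nodes.values():
            if key in contents:
                return contents[key]
        raise KeyError(key)

    def copies(self, key):
        return sum(key in contents for contents in self.nodes.values())

    def add_node(self, name):
        # Extra capacity: new writes will favour the empty node.
        self.nodes[name] = {}

    def fail_node(self, name):
        # Simulate losing a node, then restore the replica count.
        del self.nodes[name]
        self.heal()

    def heal(self):
        all_keys = {k for contents in self.nodes.values() for k in contents}
        for key in all_keys:
            holders = [n for n in self.nodes if key in self.nodes[n]]
            missing = max(0, self.replicas - len(holders))
            spares = [n for n in self._emptiest_first() if n not in holders]
            for node in spares[:missing]:
                self.nodes[node][key] = self.nodes[holders[0]][key]


if __name__ == "__main__":
    store = ToyObjectStore(["a", "b", "c", "d"])
    store.put("photo.jpg", b"data")
    store.fail_node("a")
    print(store.copies("photo.jpg"), store.get("photo.jpg"))
```

A real system has to decide placement without scanning every node and has to cope with nodes that come back from the dead, which is where most of the engineering effort goes.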
Finally, Swift is tied to Nova using the OpenStack Image Service, which handles the discovery, registration and delivery for virtual disk images.