Quick fixes for common Linux problems

8th Mar 2009 | 12:00

Solve issues with start-up, drivers, networking and more

We'll come right out and say this – Linux breaks. There, we've got that off our chests. No matter how much we might like our chosen distro, there is no denying that things can go wrong, or that it might not even be right in the first place.

Of course, Linux distros are not alone in this – a computer system is a huge, complex collection of interacting software and hardware, even more so when the basic install includes several gibibytes of extra software over and above the OS. The typical distro has more components than a car engine, yet is open for, and even encourages, user fiddling, which leads the curious user to indulge in some provocative maintenance.

To make it worse, a computer is often built from bits made by different manufacturers – motherboard from one, graphics card from another, soundcard from elsewhere – and an operating system that many hardware manufacturers pay no more than lip service to, if that. Between the Answers pages of Linux Format and the Help forum on our website, we see the pain you sometimes suffer and we're here to help.

So here's our guide to dealing with some of the most common problems, and some advice on how to deal with new disasters. The types of difficulties most often seen can be split into a number of broad categories: booting, hardware and drivers, misbehaving software and networking are among the most popular topics for discussion. We can't show you solutions for every problem that might arise, but we can show some of the common issues people face and, more importantly, show you how to go about identifying a problem.

One more thing to bear in mind as you're reading is that even if you can't work out the solution yourself, an accurate description of the problem will be of great help when asking others for advice.

Fix Linux booting problems

Distro installers are pretty good at identifying an existing Windows installation and setting up dual booting, but should you have to reinstall a spyware-riddled Windows install you'll find that your machine boots straight into Windows and that your Linux installation is gone!

Don't panic: all Windows has done is overwrite the Grub bootloader with its own equivalent, removing your boot menu. All your data is still there – you just need to reinstall Grub into the disk's master boot record (MBR). You'll need to boot from a Live CD to do this, then open a terminal and run

sudo grub-install /dev/sda

This assumes you have everything installed on the first (or only) hard drive. Grub-install will usually make a good job of detecting a Grub installation and set things back to rights. If it doesn't, you'll have to do it manually, which is a lot easier than it sounds. Run sudo grub to enter the Grub shell. Then run

find /boot/grub/stage1

to determine which partition holds the Grub files. If Windows is on the first partition Grub is likely to be on the second, in which case this command will return something like (hd0,1). Now set Grub up with

root (hd0,1)
setup (hd0)
quit

The first command identifies the boot partition, the second writes the bootloader to the MBR and then you leave the Grub shell. Grub is only concerned with the location of /boot, so if you have a separate /boot partition, omit the /boot part from the find command.
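Grub's (hd0,1) notation counts both disks and partitions from zero, which is easy to get wrong when you are used to names like /dev/sda2. The helper below is only a sketch of the translation: grub_name is a made-up name, and it assumes the BIOS orders your disks the same way Linux letters them, which is usual but not guaranteed.

```shell
# Translate a Linux partition name such as /dev/sda2 into Grub legacy
# notation such as (hd0,1). Grub legacy counts disks and partitions from 0.
# Assumption: BIOS disk order matches Linux lettering (sda = hd0, sdb = hd1).
# Expects a partition (sda2), not a whole disk (sda).
grub_name() {
    dev=${1#/dev/}                # sda2
    letter=${dev#[hs]d}           # a2
    letter=${letter%%[0-9]*}      # a
    part=${dev##*[a-z]}           # 2
    disk=$(awk -v l="$letter" \
        'BEGIN { print index("abcdefghijklmnopqrstuvwxyz", l) - 1 }')
    echo "(hd${disk},$((part - 1)))"
}

# Example: grub_name /dev/sda2
```

If the result disagrees with what find reports inside the Grub shell, trust find.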

When booting stalls

In times of yore, the Linux boot sequence scrolled pages of text up the screen. Most of it was undecipherable to mere mortals, but if it stopped you could see exactly where it stopped, with the last line or two of text containing a clue to the problem.

Nowadays, distros show a splash screen while they're booting, which is all very nice until things go wrong, then the boot stops and the splash screen hides all the clues. If the failure is early in the boot sequence, you may find that adding noapic to the kernel boot line helps. Do this in the same way you remove the splash references (see box below).

If this does fix it, edit the Grub configuration file at /boot/grub/menu.lst or /boot/grub/grub.conf and add the noapic option, or others your searches revealed as cures. You can use the same technique if your system is slow in shutting down, watching the output to see where it stalls or pauses for too long. As with so many problems that can arise, it's easier to find an answer once you know the problem.
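If you would rather not edit the file by hand, something like the following sketch can preview the change. The function name add_kernel_option is our own invention, and it assumes a Grub legacy file whose boot entries use lines starting with kernel. It prints the modified file to the terminal and leaves the original alone, so you can check the result before copying it into place.

```shell
# Print a Grub legacy config with OPTION appended to every "kernel" line
# that does not already carry it. The file itself is not modified.
# Usage: add_kernel_option OPTION FILE
add_kernel_option() {
    awk -v opt="$1" '
        # Pad with spaces so that only whole words count as a match
        $1 == "kernel" && index(" " $0 " ", " " opt " ") == 0 {
            $0 = $0 " " opt
        }
        { print }
    ' "$2"
}

# Example: add_kernel_option noapic /boot/grub/menu.lst
```

Keep a backup of the original file, since a broken menu.lst means another trip to the Live CD.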

Fix Linux driver problems

Don't expect to find Linux drivers on the CD that comes with your shiny new gizmo. That's not because the manufacturers don't care about Linux but because drivers for most devices are already installed on your system as kernel modules. Kernel modules can be loaded from the command line or a startup file, but the HAL/D-BUS system usually recognises hardware and loads the modules automatically. What do you do if it does not? How do you know which module to load?

Identifying the hardware

The first step is to get the details of the hardware with lspci for internal devices or lsusb for USB devices (some laptop hardware is also connected via USB) with these commands

sudo lspci
sudo lsusb

which produce output like

00:1f.2 SATA controller: Intel Corporation 82801HR/HO/HH (ICH8R/DO/DH) 6 port SATA AHCI Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation GeForce 7100 GS (rev a1)
02:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 03)
03:00.0 Ethernet controller: Attansic Technology Corp. L1 Gigabit Ethernet Adapter (rev b0)

and

Bus 001 Device 004: ID 03f0:2c17 Hewlett-Packard
Bus 004 Device 002: ID 051d:0002 American Power Conversion Uninterruptible Power Supply
Bus 002 Device 002: ID 067b:2303 Prolific Technology, Inc. PL2303 Serial Port

Once you've identified which device is which, you can get further information by using the -s option to query a specific device and -v for more information, like

sudo lspci -s 03:00.0 -v
sudo lsusb -s 001:004 -v

This is particularly useful with lspci, as the extra information shows the kernel module in use for the device (if there is one). The -k option also shows this, without the other extra information. You may be wondering what use that could be if you're trying to find out which module to load to enable the device.

The answer lies with that firm favourite of troubleshooters, the Live CD. If the device is recognised when booting from the Live CD, run lspci -k to see which module it uses, then you can go back to your installed system and try to load it with

modprobe -v modulename

If you see no output, the module is already loaded, and should show up in the output from lsmod. If you see something like this:

insmod /lib/modules/.../modulename.ko

the module is now loaded and your drivers should work, or at least be available for configuration. Another possible response is a 'device not present' message, which indicates that the hardware for that module was not found and usually means you have picked the wrong module.

Finally, a 'module not found' message means the module is not present on your system. Most distros come with most kernel modules installed, so either your hardware is incredibly arcane and you'll need to compile a new kernel to enable it, or the hardware is only supported in a more recent kernel than you have.
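The look-it-up-then-check routine described above can be sketched as a pair of small filters. The names driver_for_slot and module_loaded are our own. Both read from standard input, so you can feed them output saved from the Live CD session as easily as live output.

```shell
# Print the driver that `lspci -k` reports for one slot, e.g. 03:00.0.
# Reads lspci -k output on stdin.
driver_for_slot() {
    awk -v slot="$1" '
        /^[0-9a-f]/ { current = $1 }     # an unindented line starts a device
        current == slot && /Kernel driver in use:/ { print $NF }
    '
}

# Succeed if MODULE is listed in lsmod output supplied on stdin.
# Note that lsmod shows underscores where a module file name has hyphens.
module_loaded() {
    awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Example, on the Live CD:     lspci -k | driver_for_slot 03:00.0
# Example, on the installed system:
#   lsmod | module_loaded atl1 || sudo modprobe -v atl1
```

Here 03:00.0 and atl1 are only illustrations; use the slot and module name that lspci reports on your own machine.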

You can check the kernel version of the Live CD and your system with

uname -r

If the Live CD's kernel is newer, look for an update for your distro. Another option is that this hardware is not supported in the kernel but uses a third-party driver. The most common occurrence of this is with wireless cards that use a driver like MadWifi or NdisWrapper. If you need to install a separate driver, it is probably available in one of your distro's repositories. Once that is installed, your hardware should be ready to go.
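Comparing two kernel releases by eye is error-prone, because 2.6.9 is older than 2.6.27 even though it sorts later as plain text. As a rough sketch, the version sort in GNU sort can make the comparison for you. The function name kernel_is_newer is ours, and it assumes your sort supports the -V option.

```shell
# Succeed if kernel release $1 is newer than release $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
kernel_is_newer() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Example: note the Live CD's `uname -r` output, then on the installed system:
#   kernel_is_newer "2.6.28-11-generic" "$(uname -r)" && echo "Live CD is newer"
```

The release string in the example is only a placeholder for whatever your Live CD reports.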

Graphics hardware

None of the above applies to graphics cards. Their drivers are part of the X.org software, unless you use an ATI or Nvidia card. These do have drivers included with X.org, but the separate driver packages from the manufacturers give better performance. If you want to do anything that needs 3D acceleration, whether it's playing games or enabling desktop effects, you should try the manufacturer's drivers.

While they can be downloaded and installed from the respective websites, it is best to use your package manager to install them, because they also require changes to your xorg.conf file, the file that controls your graphical display, and the distro packages will make the changes for you. If you do decide to go the independent route, make sure you download the correct package for your card and read the instructions carefully before you do anything.

It's not all down to the software

The hardest problems to diagnose are those that occur apparently randomly, especially if they lock up or crash the computer without warning. When the crash occurs at the same time, or using the same software, you have an idea who to blame, but if it is truly random it may well be hardware. The most common hardware causes of such problems are overheating, faulty memory or poor power.

It's no use thinking "this doesn't happen in my other OS, so it must be Linux's fault" because different systems work the hardware differently. For instance, Linux uses memory more aggressively and will experience instability due to faulty memory before Windows starts to show symptoms. Fans and heatsinks become gradually blocked with dust and other crud during a computer's lifetime. Try blowing it clear with a can of compressed air. Installing lm_sensors (your distro should have it) will let you monitor CPU and case temperatures, and a system monitor like GKrellM will display the temperatures on your desktop.

Laptops don't lend themselves to being opened up for a good blow, but you should check the various vents for any blockage. One area where laptops are fairly safe is power, since the battery ensures a clean steady supply. Desktop power supplies are another matter, especially the cheap, unnamed ones that are included with lower-priced cases.

Built down to a price, some barely meet their specs when new, so try a different PSU in your computer – you may be surprised by the difference it makes. Dirty power can damage your hardware and data, so saving money here can be a false economy whereas good-quality PSUs can go on for years. If you live in an area with unreliable or dirty power, a UPS (Uninterruptible Power Supply) may be a worthwhile investment. Surge protectors don't protect against power reductions, only surges.

Testing memory is easy, if time consuming. Most Live CDs include Memtest86, which does exactly what it says. You need to boot into Memtest86, because it can only test memory that isn't in use, so you don't want a full OS running. Let it run through its full set of tests at least twice, preferably overnight. The longer you can leave it running, the more certain you can be that your memory is OK. If you see any errors, at least one of your memory sticks will need to be replaced.

Where's my desktop?

So you've installed the latest distro, rebooted your computer and instead of the glorious 3D enhanced desktop you expected to see, all you get is a black screen with a login prompt and a blinking cursor. What went wrong? The usual cause of this is that the installer was unable to auto-detect the properties of either your graphics card or display.

Sometimes it will drop right down to a text console, other systems may boot into a limited display, like 800x600 with no acceleration. You need to run your distro's configuration tool to generate a working display configuration, but the first step is to log in as root if possible, otherwise as your normal user, using the password you gave during installation. The program to run depends on your distro, but the most popular options are

These usually open a textual version of the graphical configuration tool, from where you can select the correct graphics card and monitor. If your distro doesn't have such a tool, you can create a basic X.org configuration with

X -configure

If you still get a text display when you boot up, log in at the console and run

startx

which should load up a really basic desktop. If it does, you now have a working X display; press Ctrl+Alt+Backspace to exit it. If startx fails, look at the log file at

/var/log/Xorg.0.log

in particular any lines containing (EE), as these are errors. The file is quite long, but you can find them with

grep '(EE)' /var/log/Xorg.0.log
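A plain search for the marker can also match the legend near the top of the log, where every marker type is listed. A slightly stricter filter, sketched here under the assumption that real error lines begin with (EE), optionally preceded by a bracketed timestamp as some X.org versions write, shows only genuine errors. The function name xorg_errors is our own.

```shell
# Print only the error lines from an X.org log (default /var/log/Xorg.0.log).
# Matches "(EE) ..." at the start of a line, with an optional "[  12.345] "
# timestamp in front; the indented legend line is skipped.
xorg_errors() {
    grep -E '^(\[[ 0-9.]+\] )?\(EE\)' "${1:-/var/log/Xorg.0.log}"
}

# Example: xorg_errors
# Example: xorg_errors /var/log/Xorg.0.log.old
```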

If you get a desktop, but in a limited resolution, the approach is the same, except you can use the graphical versions of the configuration tools.

Fix Linux wireless networking problems

If there is one topic that causes more tearing out of hair than any other, it's wireless networking, what with in-kernel and third-party modules, not to mention the use of Windows drivers as a last resort. Then you have the various encryption methods and a variety of network management systems to contend with.

As with all such things, when you break it down into simple steps, one complex task becomes a series of much simpler ones. The first step is to make sure your hardware drivers are loaded, so check the output of

sudo ifconfig -a

This should show your wired network interface as eth0 and your wireless as one of wlan0, ath0 or even eth1. If none of these show up, try repeating the test from a Live CD and, if it does show up, run

sudo lspci -k

to see which module it uses. If you're still stuck, the details from lspci -v should give enough information on the card to search the web for the correct driver.
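Because the wireless interface can turn up under several names, it can be quicker to ask the kernel which interfaces it considers wireless. This sketch assumes your kernel exposes a wireless or phy80211 entry under /sys/class/net for such interfaces, which depends on the kernel and driver. The function name list_wireless is ours.

```shell
# List network interfaces that the kernel marks as wireless.
# The optional argument overrides the sysfs directory to inspect.
list_wireless() {
    sysnet=${1:-/sys/class/net}
    for dir in "$sysnet"/*; do
        if [ -d "$dir/wireless" ] || [ -e "$dir/phy80211" ]; then
            basename "$dir"
        fi
    done
}

# Example: list_wireless
```

If this prints nothing, the driver for your card is probably not loaded.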

Once you have the correct driver you can get on with configuring it, right? Well, maybe. Some wireless cards need a firmware file that is loaded on to the card when it is initialised. The driver will take care of this, but it needs the file to be in /lib/firmware. The methods for getting this file depend on the hardware in use, but usually involve extracting the firmware from the Windows drivers (or downloading a file that someone has already extracted).

So now you are ready to proceed with configuring the connection, so you can skip over the next bit.

A last resort?

What happens if you can't find the driver for your wireless card? In that case you will have to use NdisWrapper. This is a kernel module that uses the NDIS (Network Driver Interface Specification) drivers supplied for Windows in Linux.

The first step is to install NdisWrapper from your distro's package manager. Then you need files from the CD that came with your card. It is important to use the correct CD, because manufacturers have a habit of changing the chipsets used on a card, and hence the drivers needed, without changing the model number. You can also find information on which cards are supported by which drivers at http://burnthesorbonne.com/?page_id=32.

Once you have installed NdisWrapper, find the driver file, which will be an INF file on the CD. Load it with

sudo ndiswrapper -i /path/to/driver.inf

You can then check that it is working with

sudo ndiswrapper -l

which will list the drivers now available to NdisWrapper. Finally, you can load the module with

sudo modprobe -v ndiswrapper

and your wireless card should appear as wlan0. If there is no INF file on the CD, the drivers are probably packed into an EXE file, which is usually a self-extracting zip file in a Windows executable. You can unpack this using the unzip program on Linux with something like

unzip /mnt/cdrom/install.exe
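Driver CDs tend to bury the INF file several directories down, and Windows is not fussy about upper and lower case in file names. A small search helper along these lines can save some hunting. The name find_inf is ours, and /mnt/cdrom in the example is just wherever your CD or unpacked archive happens to be.

```shell
# List every .inf file below a directory, ignoring case in the extension.
find_inf() {
    find "$1" -type f -iname '*.inf'
}

# Example: find_inf /mnt/cdrom
```

If several INF files turn up, the directory names usually indicate which Windows version each one is for.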

You will probably want the NdisWrapper module loaded automatically – see the Auto Loading Modules boxout for details on this.

Getting connected

The first rule of wireless networking is to always use an encrypted connection, but in this case it is easier if you turn off encryption for a couple of minutes, as it removes one potential source of problems. Also make sure that your router is not set to filter out all but specified MAC addresses (we've all been caught out by that one when using a new laptop or wireless card).

Most Linux distros use Network Manager to handle wireless (and wired) connections, and the name of your wireless access point should appear when you click on the Network Manager icon in the task bar. If it doesn't, the first thing to check is that your access point is set to broadcast its SSID (Service Set Identifier – the name of your wireless network). Some people disable this in their access points in the belief that it increases security (it doesn't, because every time you connect to the network, you broadcast the SSID in plain text).

If it still fails to show up, try moving closer to the access point. You can also check for the presence of available networks with these terminal commands

sudo ifconfig wlan0 up
sudo iwlist wlan0 scan

The first line ensures that the wireless card is active, and the second should produce a list of all wireless networks in range. If you see a message like "Interface Doesn't Support Scanning" you're either using the wrong interface (wired instead of wireless), or you're not using the correct driver or firmware for your wireless card, and you'll have to go back to the top of the page and try again.
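The scan output is verbose, with a dozen or more lines per network. If all you want is the list of network names, a filter like this sketch will pull them out. It assumes the usual ESSID:"name" lines in iwlist's output and skips networks that hide their name. The function name essids is our own.

```shell
# Print the network names found in `iwlist ... scan` output read from stdin.
# Networks with an empty (hidden) ESSID are left out.
essids() {
    sed -n 's/.*ESSID:"\(..*\)".*/\1/p'
}

# Example: sudo iwlist wlan0 scan | essids
```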

Once you can connect, immediately disconnect, enable encryption in your access point/router and try again. The best encryption to use is WPA2 or, if your wireless card/driver does not support it, use WPA (Wi-Fi Protected Access). You should not use WEP unless you absolutely cannot avoid it. It provides only minimal security and is easily cracked by a determined neighbour.

Fix Linux networking problems

Networking that doesn't work is a problem that affects all operating systems from time to time. It can be frustrating to deal with, since things often seem to just not work, without giving any clue as to where the chain is broken. The first test is usually that old favourite, ping

ping www.linuxformat.co.uk

This should show packets being sent to and received by the Linux Format server. If it doesn't, try

ping 80.244.178.151

That's the IP address of the Linux Format website, so if that works when the previous command didn't, you know that you're unable to resolve domain names into IP addresses. Check that /etc/resolv.conf contains the addresses of your ISP's DNS servers. If you are using a router with a DHCP server, you may find that it contains the router address, in which case you should check that the router has the correct IP addresses.
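To see at a glance which DNS servers your system is actually using, you can pull the nameserver lines out of the file. This is only a small convenience sketch, and the function name nameservers is ours.

```shell
# Print the DNS server addresses listed in a resolv.conf-style file
# (default /etc/resolv.conf). Comments and other settings are ignored.
nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Example: nameservers
```

An empty result means no DNS servers are configured at all, which would explain names failing while IP addresses work.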

If pinging an IP address doesn't work, try pinging one of the ISP's servers, such as the DNS server (ISPs usually give DNS addresses on their websites, and it doesn't hurt to make a note of these). If that works, the problem is probably with your ISP's connection to the rest of the internet. Another possibility is that your system is trying to use IPv6, the newer IP protocol, but your router does not understand it, which causes long delays, long enough to look like it isn't working.

The next step is to check whether you can connect to your router's web interface (if it has one) or ping your modem. If this works, the link between your modem and the ISP is down, which could be a line fault (check that your cat/significant other hasn't unplugged the ADSL phone line), a problem at the ISP or you haven't paid your bill.

At this point a phone call to your ISP's support desk is in order. If you can't get through, it's most likely a problem at their end and the only solution is patience. Finally, check everything local: are the cables connected? Does ifconfig -a even show your network interface? If not, have you changed anything since your last reboot? A kernel update will stop third-party modules working until they are reinstalled, and some network adaptors, particularly wireless ones, use third-party kernel modules.
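The sequence of tests above, working from the far end of the connection back towards your own machine, can be strung together roughly as follows. Treat it as a sketch: probe and diagnose are names we made up, the ping options assume the common Linux (iputils) version of ping, and you need to supply your own ISP server and router addresses.

```shell
# probe HOST: succeed if a single ping is answered within about 2 seconds.
probe() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# diagnose NAME IP ISP_SERVER ROUTER
# Test each hop in turn and report the first place the chain seems broken.
diagnose() {
    if probe "$1"; then
        echo "ok: names resolve and the remote site answers"
    elif probe "$2"; then
        echo "dns: the IP address answers but the name does not; check /etc/resolv.conf"
    elif probe "$3"; then
        echo "isp: your ISP answers but the wider internet does not"
    elif probe "$4"; then
        echo "line: the router answers; suspect the line or the ISP"
    else
        echo "local: the router is unreachable; check cables, ifconfig -a and modules"
    fi
}

# Example (the last two addresses are placeholders for your own):
#   diagnose www.linuxformat.co.uk 80.244.178.151 ISP_DNS_ADDRESS ROUTER_ADDRESS
```

Some servers and routers are set to ignore pings, so a failed probe is a hint about where to look and not proof of a fault.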

Software - what to do when all your processes are trying to run at once

Have you ever noticed how sometimes things just go slower and slower? There's nothing specific but everything seems to take longer than it should. I find caffeine helps here, or sleep in extreme cases, but what about when it happens to the computer? There are three main resources in your computer: CPU cycles, memory and hard disk space, and it is possible that a runaway program, or even general usage, can be using up too much of one of these.

CPU usage is the easiest to check, using the top program (that's its name, not an opinion of its usefulness). Run this from a terminal and you'll see rows and columns of data in the terminal window. The CPU line shows how much of the processor is being used by various types of program: sy is system, us is user and ni refers to programs that are running with a positive nice value. Nice is a way of scheduling programs to use more or less CPU; the higher the niceness, the nicer a program is to other processes, letting them have first pick at the available CPU cycles.

It's a little more complicated than that, as nice is only a recommendation to the kernel's process scheduler, but that's too involved to go into here.

Double top

If you have more than one CPU core, press 1 to have top show them all. The figures to look at first are id and wa, for idle and wait. Unless you are compiling software or playing video, the idle figure should be quite high, usually over 90%. If it is down to single figures, or even zero, something is sucking up all your CPU cycles. That's fine if it's something you intend to do, like transcoding a video, but it could also be a runaway process.
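The idle figure ultimately comes from counters the kernel keeps in /proc/stat. If you want the number in a script and not on screen, you can take two samples and work out the difference, as in this sketch. The function name idle_percent is ours, and it assumes the usual layout of the first line of /proc/stat, where the fourth counter after the cpu label is idle time.

```shell
# idle_percent FIRST SECOND
# Each argument is the "cpu ..." summary line from /proc/stat, taken at two
# different moments. Prints the whole-number percentage of time spent idle
# between the samples. The two samples must differ, or the division fails.
idle_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total[NR] = 0
            # Fields 2-9: user nice system idle iowait irq softirq steal
            for (i = 2; i <= NF && i <= 9; i++) total[NR] += $i
            idle[NR] = $5
        }
        END { printf "%d\n", 100 * (idle[2] - idle[1]) / (total[2] - total[1]) }
    '
}

# Example:
#   a=$(head -n 1 /proc/stat); sleep 1; b=$(head -n 1 /proc/stat)
#   idle_percent "$a" "$b"
```

As in top, time spent waiting for disks is counted separately from idle time, so a machine stuck on I/O will show a low figure here too.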

The list of processes shows the amount of CPU and memory that each program is using, and by default this is sorted by CPU usage. If something is hogging the processor, you can use top to either renice it, if it is something you want to keep running, or kill it.

The first column shows the PID, which is the program's unique process ID. Press R to renice or K to kill, and type in the PID of the process. Renicing asks for the nice value to give the process (higher numbers are 'nicer'). Nice values can run from -20 to 19, but only the root user can set a negative value.

Five is a good starting point, and 19 means the process only gets CPU time when nothing else wants it, which is useful for an intensive background task like video transcoding. Killing a task sends signal 15 (the TERM signal) by default, which is similar to pressing Ctrl+C in a terminal (Ctrl+C sends the related INT signal). It asks the program to stop, so the program has the opportunity to shut down cleanly. If the program is really out of control, it may not respond to this, so you should send signal 9 (KILL), which will stop the program without giving it the chance to shut down cleanly.
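The same polite-then-forceful approach works outside top, using the standard kill command. The following is a sketch of the idea and not a polished tool: stop_process is our own name, and the five-second default grace period is an arbitrary choice.

```shell
# stop_process PID [SECONDS]
# Ask a process to exit with TERM, wait up to SECONDS (default 5) for it
# to go, then fall back to KILL if it is still there.
stop_process() {
    pid=$1
    grace=${2:-5}
    kill -TERM "$pid" 2>/dev/null || return 0     # nothing to stop
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0    # it has exited
        sleep 1
        grace=$((grace - 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid" 2>/dev/null             # no more Mr Nice Guy
    fi
    return 0
}

# Example: stop_process 12345 10
# To renice from the shell instead, setting the value to 10:
#   renice -n 10 -p 12345
```

The PID 12345 is a placeholder; take the real one from the first column in top.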

-------------------------------------------------------------------------------------------------------

First published in Linux Format issue 117
