Just Software Solutions

Blog Archive for / general /

Cryptography and Society

Tuesday, 05 May 2015

Politicians in both the UK and USA have been making moves towards banning secure encryption over the last few months. With the UK general election coming on Thursday I wanted to express why I think this is a seriously bad idea.

Context

Back in January there were some terrorist attacks in Paris. These attacks were and are a serious matter, and stopping such attacks in future should be something that governments concern themselves with.

However, one aspect of the response by politicians has been to call for securely encrypted communication outlawed. In particular, the British Prime Minister, David Cameron, asked

"In our country, do we want to allow a means of communication between people which, even in extemis with a signed warrant from the Home Secretary personally, that we cannot read?"

It was clear from the context that he thinks the answer is a resounding "NO", and from the further actions of politicians both here and in the USA it appears that others in government agree with that point of view. They clearly believe that the government should be able to read all communications.

I think in a fair, open, democratic society, the answer must be "YES": private individuals must be able to communicate without risk of eavesdropping by government officials.

Secure Encryption is Not New

Firstly, there have ALWAYS been means of communication between people that the government cannot read. You might be able to intercept a letter, and read the words written on the piece of paper, but if the message is not in the words as they appear, then you cannot read it.

Ciphers which could not be cracked by contemporary eavesdroppers have been used since at least the time of the Roman Empire. New technology merely provides a new set of such ciphers.

For example, the proper use of a one-time pad provides completely secure encryption. This technique has been in use since 1882, if not earlier.

Other technology in widespread use today "merely" makes it exceedingly difficult to break the cipher, requiring hundreds, thousands or even millions of years to crack with a brute-force method. These time periods are enough that these ciphers can be considered uncrackable for all intents and purposes.

Consequently, governments are powerless to actually prevent communication that cannot be read by the security services. All that can be done is to make it hard for the average citizen to use such communication.

Terrorists are Criminals

By their very nature, terrorists are criminals: terrorist acts themselves are illegal, and even the possession of the weapons used for the terrorist acts is often also illegal.

Therefore, terrorists will not be put off from using secure communication just because that too is illegal.

In particular, criminal organisations will not think twice about using whatever means is available to ensure that their communications are private: if something gives them an edge of the police or anyone else who would seek to stop them, they will use it.

Society relies on secure encryption

Secure encrypted communication doesn't just prevent government agencies reading the communications of criminals, it also prevents criminals reading the communications of ordinary citizens.

This website, in common with an increasingly large number of websites, uses HTTPS for all traffic. When properly configured this means that someone intercepting the website traffic cannot identify which pages on the website you visited, or extract any of the data sent by you as a visitor to the website, or by the website back to you.

This is crucial for facilities such as online banking: it prevents computer criminals from obtaining passwords and account data by intercepting the communications between you and your bank. If such communications could not be relied upon to be secure then online banking would not be viable, as the potential for fraud due to stolen passwords would be too great.

Likewise, many businesses use secure Virtual Private Networks (VPNs) which rely on secure encryption to transfer data between computers that are only connected to each other via the internet. This allows them to securely transfer data between sites, or between remote workers, without worrying about the communications being intercepted by criminals. Without secure encryption, many large multi-national businesses would be hugely impacted, as they wouldn't be able to rely on transferring data across the internet safely, and would instead have to rely on physical transfer via courier.

A "back door" or "government secret key" destroys the security of encryption

Some of the proposals from politicians have been to require that companies that provide encryption services must also provide a means whereby government security services can also decrypt the communications if required.

This requires that either (a) the company in question keeps a database of all the encryption/decryption keys used for all communications, or (b) the encryption algorithm used allows for decryption via a "back door" or "secret key" in addition to the standard decryption key, so that the government security services can gain access if required, without needing to know the customer's decryption key.

Keeping a database of the decryption keys just provides a direct target for attack by computer criminals. Once such a database is breached, none of the communications provided by that company can be considered secure. This is clearly not a good state of affairs, given the number of times that password databases get compromised.

That leaves option (b): providing a "back door" or "secret key", or other means whereby an otherwise-encrypted communication can be read by the security services. However, this fundamentally compromises that encryption.

Knowing that the back door exists, criminal computer crackers will work to ensure that they too can gain access to the communication, and they won't wait for a warrant from the Home Secretary or whatever government department is responsible for issuing such warrants! Any such group that does manage to obtain access would probably not make it public knowledge, they would merely use it to ensure that they could access communications that were relevant to them, whether that was because they had a direct use for the information, or because it could be sold to other criminal organisations.

If there is a single key that can decrypt all communication using a given system then that dramatically reduces the computation effort required to break the key: the larger the number of messages that are transmitted with a given key, the easier it is to identify the key, especially if you have access to the raw unencrypted message. The huge volume of electronic communications in use today would mean that the secret back door key would be much more readily compromised than any individual encryption key.

Privacy is a human right

The Universal Declaration of Human Rights was adopted by the UN in 1948. Article 12 states:

No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks.

Secure encrypted communication protects our correspondence from interference, including interference by the government.

Restricting the use of encryption is also a violation of the right to freedom of expression, guaranteed to us by article 19 of the Universal Declaration of Human Rights:

Everyone has the right to freedom of opinion and expression; this right includes freedom to hold opinions without interference and to seek, receive and impart information and ideas through any media and regardless of frontiers.

The restriction on our freedom of expression is easy to see: if I have true freedom of expression then I can impart any series of letters or numbers to anyone without interference. If that series of letters or numbers happens to be an encrypted message then that is of no consequence. Any attempt to limit the use of particular encryption algorithms therefore limits my ability to send whatever message I like, since particular sequences of letters and numbers are outlawed purely because of their meaning.

Human rights organisations such as Amnesty International use secure encrypted communications to communicate with their workers. If those communications could not be secured against interference then this would have a detrimental impact on their ability to do their humanitarian work, and could endanger their workers.

Encryption is mathematics

Computer encryption is just a mathematical algorithm applied to a series of numbers. It is ridiculous to consider that performing mathematical operations on a sequence of numbers could be outlawed merely because that sequence of numbers has meaning to someone.

End note

I strongly object to any move to restrict the use of encryption technology. It is technologically and morally unsound, with little or no upside and considerable downsides.

I urge politicians to likewise oppose any moves to restrict the use of encryption technology, and I urge those standing in the elections in the UK this week to make it known to their potential constituents that they will oppose such measures.

Finally, I think we should be encouraging the use of strong encryption rather than discouraging it, to protect us from those who would intercept our digital communication and use that for their gain and our detriment.

Posted by Anthony Williams
[/ general /] permanent link
Tags: , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Firefox is losing market share to Chrome

Monday, 09 March 2015

I read with interest an article on Computerworld about Firefox losing market share, wondering what people were using instead. Unsurprisingly, the answer seems to be Chrome: apparently Chrome now has a 27.6% share compared to Firefox's 11.8%. That's quite a big difference.

I checkout out the stats for this site for February 2015, and the figures bear it out: 30.7% of visitors use Chrome vs 14.9% Firefox and 12.8% Safari. Amusingly, 3.1% of visitors still use IE6!

What I did find interesting is the version numbers people are using: there were visitors using every version of Chrome from version 2 to version 43, and the same for Firefox — someone was even using Firefox 0.10! I'm a bit surprised by this, as I'd have thought that users of these browsers were probably amongst the most likely to upgrade.

Why the drop?

The big question of course is why the shift? I switched to Firefox because Internet Explorer was poor, and I've stuck with it, mainly through inertia, but I've used other browsers over the years, and still prefer Firefox. I've got Chrome installed on my desktop, but I don't particularly like it, and only really use it for cross-browser testing. I only really use it on my tablets, where it is the only browser I have installed — I tried Firefox for Android and was really disappointed.

Maybe that's the cause of the shift: everyone is using mobile devices for browsing, and Chrome/Safari are better than the others for mobile.

Which browser(s) do you use, and why?

Posted by Anthony Williams
[/ general /] permanent link
Tags: , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Gotchas Upgrading Apache from 2.2 to 2.4

Wednesday, 10 December 2014

I finally got round to upgrading one of my servers from Ubuntu 12.04 (the previous LTS release) to Ubuntu 14.04 (the latest LTS release). One of the consequences of this is that Apache gets upgraded from 2.2 to 2.4. Sadly, the upgrade wasn't as smooth as I'd hoped, so I'm posting this here in case anyone else has the same problem that I did.

Oh no! It's all broken!

It's nice to upgrade for various reasons — not least of which being the support benefits of being on the most recent LTS release — except after the upgrade, several of the websites hosted on the server stopped working. Rather than getting the usual web page, they just returned an "access denied" error page, and in the case of the https pages they just returned an SSL error. This was not good, and led to much frantic checking of config files.

After verifying that all the config files for all the sites were indeed correct, I finally resorted to googling the problem. It turns out that the default apache2.conf file was being used, as all the "important" config was in the module config files, or the site config files, so the upgrade had just replaced it with the new one.

Whereas the old v2.2 default config file has the line

Include sites-enabled/

The new v2.4 default config file has the line

IncludeOptional sites-enabled/*.conf

A Simple Fix

This caused problems with my server because many of the config files were named after the website (e.g. example.com) and did not have a .conf suffix. Renaming the files to e.g example.com.conf fixed the problem, as would have changing the line in apache2.conf so it didn't force the suffix.

Access Control

The other major change is to the access control directives. Old Allow and Deny directives are replaced with new Require directives. The access_compat module is intended to allow the old directives to work as before, but it's probably worth checking if you use any in your website configurations.

Exit Stage Left

Thankfully, all this was happening on the staging server, so the websites weren't down while I investigated. Testing is important — what was supposed to be just a simple upgrade turned out not to be, and without a staging server the websites would have been down for the duration.

Posted by Anthony Williams
[/ general /] permanent link
Tags:
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Migrating to https

Thursday, 30 October 2014

Having intended to do so for a while, I've finally migrated our main website to https. All existing links should still work, but will redirect to the https version of the page.

Since this whole process was not as simple as I'd have liked it to be, I thought I'd document the process for anyone else who wants to do this. Here's the quick summary:

  1. Buy an SSL certificate
  2. Install it on your web server
  3. Set up your web server to serve https as well as http
  4. Ensure that any external scripts or embedded images used by the website are retrieved with https
  5. Redirect the http version to https

Now let's dig in to the details.

1. Buy an SSL certificate

For testing purposes, a self-signed certificate works fine. However, if you want to have people actually use the https version of your website then you will need to get a certificate signed by a recognized certificate authority. If you don't then your visitors will be presented with a certificate warning when they view the site, and will probably go somewhere else instead.

Untrusted Certificate

The SSL certificate used on https://www.justsoftwaresolutions.co.uk was purchased from GarrisonHost, but there are plenty of other certificate providers available.

Purchasing a certificate is not merely a matter of entering payment details on a web form. You may well need to provide proof of who you are and/or company registration certificates in order to get the purchase approved. Once that has happened, you will need to get your certificate signed and install it on your web server.

2. Install the SSL certificate on your web server

In order to install the certificate on your web server, it first has to be signed by the certification authority so that it is tied to your web server. This requires a Certificate Signing Request (CSR) generated on your web server. With luck, your certification provider will give you nice instructions. In most cases, you're probably looking at the openssl req command, something like:

openssl req -new -newkey rsa:2048 -nodes -out common.csr \
-keyout common.key \
-subj "/C=GB/ST=Your State or County/L=Your City/O=Your Company/OU=Your \
Department/CN=www.yourcompany.com"

This will give you a private key (common.key) and a CSR file (common.csr). Keep the private key private, since this is what identifies the web server as your web server, and give the CSR file to your certificate provider.

Your certificate provider will then give you a certificate file, which is your web server certificate, and possibly a certificate chain file, which provides the signing chain from your certificate back to one of the widely-known root certificate providers. The certificate chain file will be identical for anyone who purchased a certificate signed by the same provider.

You now need to put three files on your web server:

  • your private key file,
  • your certificate file, and
  • the certificate chain file.

Ensure that the permissions on these only allow the user running the web server to access them, especially the private key file.

You now need to set up your web server to use them.

3. Set up your web server to serve https as well as http

I'm only going to cover apache here, since that's what is used for https://www.justsoftwaresolutions.co.uk; if you're using something else then you'll have to check the documentation.

Firstly, you'll need to ensure that mod_ssl is installed and enabled. Run

sudo a2enmod ssl

on your web server. If it complains that "module ssl does not exist" then follow your platform's documentation to get it installed and try again. On Ubuntu it is part of the basic apache installation.

Now you need a new virtual host file for https. Create one in the sites-available directory with the following contents:

    <IfModule mod_ssl.c>
    <VirtualHost *:443>
    ServerAdmin webmaster@yourdomain.com
    ServerName yourdomain.com
    ServerAlias www.yourdomain.com

    SSLEngine On
    SSLCertificateFile /path/to/certificate.crt
    SSLCertificateKeyFile /path/to/private.key
    SSLCertificateChainFile /path/to/certificate/chain.txt

    # Handle shutdown in broken browsers
    BrowserMatch "MSIE [2-6]" \
            nokeepalive ssl-unclean-shutdown \
            downgrade-1.0 force-response-1.0
    BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown

    DocumentRoot /path/to/ssl/host/files
    <Directory "/path/to/ssl/host/files">
    # directory-specific apache directives
    </Directory>

    # Pass SSL_* environment variables to scripts
    <FilesMatch "\.(cgi|shtml|phtml|php)$">
            SSLOptions +StdEnvVars
    </FilesMatch>

    </VirtualHost>
    </IfModule>

This is a basic configuration: you'll also want to ensure that any configuration directives you need for your website are present.

You'll also want to edit the config for mod_ssl. Open up mods-available/ssl.conf from your apache config directory, and find the SSLCipherSuite, SSLHonorCipherOrder and SSLProtocol directives. Update them to the following:

    SSLHonorCipherOrder on
    SSLCipherSuite ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
    SSLProtocol all -SSLv2 -SSLv3
    SSLCompression off

This disables the protocols and ciphers that are known to be insecure at the time of writing. If you're reading this much after publication, or wish to be certain, please do a web search for the latest secure cipher lists, or check a resource such as https://wiki.mozilla.org/Security/Server_Side_TLS.

After all these changes, restart the web server:

sudo apache2ctl restart

You should now be able to visit your website with https:// instead of http:// as the protocol. Obviously, the content will only be the same if you've set it up to be.

Now the web server is running, you can check the security using an SSL checker like https://sslcheck.globalsign.com, which will test for available cipher suites, and ensure that your web server is not vulnerable to known attacks.

Now you need to ensure that everything works correctly when accessed through https. One of the big issues is embedded images and external scripts.

4. Ensure that https is used for everything on https pages

If you load a web page with https, and that page loads images or scripts using http, your browser won't be happy. At the very least, the nice "padlock" icon that indicates a secure site will not show, but you may get a popup, and the insecure images or scripts may not load at all. None of this leads to a nice visitor experience.

It is therefore imperative that on a web page viewed with https all images and scripts are loaded with https.

The good news is that relative URLs inherit the protocol, so an image URL of "/images/foo.png" will use https on an https web page. The bad news is that on a reasonably sized web site there's probably quite a few images and scripts with full URLs that specify plain http. Not least, because things like blog entries that may be read in a feed reader often need to specify full URLs for embedded images to ensure that they show up correctly in the reader.

If all the images and scripts are on servers you control, then the you can ensure that those servers support https (with this guide), and then switch to https in the URLs for those resources. For servers outside your control, you need to check that https is supported, which can be an issue.

Aside: you could make the URLs use https on https pages and http on http pages by omitting the protocol, so "http://example.com/images/foo.png" would become "//example.com/images/foo.png". However, using https on plain http pages is fine, and it is generally better to use https where possible. It's also more straightforward.

If the images or scripts are on external servers which you do not control, and which do not support https then you can use a proxy wrapper like camo to avoid the "insecure content" warnings. However, this still requires changing the URLs.

For static pages, you can do a simple search and replace, e.g.

sed -i -e 's/src="http:/src="https:/g' *.html

However, if your pages are processed through a tool like MarkDown, or stored in a CMS then you might not have that option. Instead, you'll have to trawl through the links manually, which could well be rather tedious. There are websites that will tell you which items on a given page are insecure, and you can read the warnings in your browser, but you've still got to check each page and edit the URLs manually.

While you're doing this, it's as well to check that everything else works correctly. I found that a couple of aspects of the blog engine needed adjusting to work correctly with https due to minor changes in the VirtualHost settings.

When you've finally done that, you're ready to permanently switch to https.

5. Redirect the http version to https

This is by far the easiest part of the whole process. Open the apache config for the plain http virtual host and add one line:

Redirect permanent / https://www.yourdomain.com

This will redirect http://www.yourdomain.com/some/path to https://www.yourdomain.com/some/path with a permanent (301) redirect. All existing links to your web site will now redirect to the equivalent page with https.

When you've done that then you can also enable Strict Transport Security. This ensures that when someone connects to your website then they get a header that says "always use https for this site". This prevents anyone intercepting plain http connections (e.g. on public wifi) and attacking your visitors that way.

You do this by enabling mod_headers, and then updating the https virtual host. Run the following on your web server:

sudo a2enmod headers

and then add the following line to the virtual host file you created above for https:

Header always set Strict-Transport-Security "max-age=63072000; includeSubDomains"

Then you will need to restart apache:

sudo apache2ctl restart

This will ensure that any visitor that now visits your website will always use https if they visit again within 2 years from the same computer and browser. Every time they visit, the clock is reset for another 2 years.

You can probably delete much of the remaining settings from this virtual host config, since everything is being redirected, but there is little harm in leaving it there for now.

All done

That's all there is to it. Relatively straightforward, but some parts are more involved than one might like.

Posted by Anthony Williams
[/ general /] permanent link
Tags: , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Memories of Learning C

Monday, 17 October 2011

Herb Sutter's latest blog entry invites us to share our memories of our first C program, in tribute to Dennis Ritchie. I can't remember what my first C program was, but I thought I'd write about my memories of learning C.

I studied Physics at college, and there was very little programming taught as part of the course. That didn't bother me though; I'd taught myself to program up until then, and I wasn't going to stop now. The big benefit I got from computing at college was access to the internet, and access to C and C++ compilers. I could program in BASIC, Pascal and a couple of forms of assembly language, and I'd eagerly read Stan Lippman's C++ Primer and written out (on paper!) some C++ code, but I hadn't yet had a C++ compiler to try out my programs on.

I wrote several C++ programs before I even considered writing a plain C program, but I probably typed in and compiled the classic printf("hello world\n"); C program to check everything was working before I compiled any C++.

Usenet

My strongest memories about learning C are about learning from usenet. Though I had access to C compilers at college, access to experts was not so readily available unless you were studying computing. With access to the internet, I didn't need local experts though — usenet provided access to experts from across the world. I read comp.lang.c and comp.lang.c++ avidly, and taught myself both languages together. The usenet community was invaluable to me. The wealth of knowledge that people had, and their willingness to share with newbies was something I really appreciated.

I remember struggling over file handling, and getting the arguments to scanf right; I remember puzzling over the poor performance of a program and having someone kindly point out that my code was doing malloc and free calls in a tight loop. Though I tend to answer more questions than I ask these days, I still hang out on newgroups such as comp.lang.c++ today. It seems that for many people StackOverflow has replaced usenet as the place to go for help, but the old-style newsgroups are still valuable.

Ubiquity

Back then, C++ compilers were in their infancy. Templates didn't work on every compiler, there was no STL, and many platforms didn't have a working C++ compiler at all. I consequently wrote a lot of C — every platform had a C compiler, and my C code would work on the college PCs, my PC (when I saved up enough to buy one), the University's Unix machine, and the Physics department workstations. The same could not be said for C++.

The ubiquity of C is something I still appreciate today, and this is only possible because Dennis Ritchie designed his language to be portable to multiple platforms. Though "implementation defined" behaviour can be frustrating when the implementation defines it a different way to how you would like, it is this that enables the portability. You want to write code for a DSP that only handles 32-bit data? Fine: make char, short, int and long all 32-bits. What if your machine has 9-bit bytes? No problem: just make char 9 bits, and everything else a convenient multiple of that.

C is the basic lingua-franca of the computing world. It is a "portable assembly language". These days I use C++ where I can because it allows a higher level of abstraction, and easier expression of intent without compromising on the performance you'd get with plain C, but it's not as portable, and wouldn't be possible without C.

The computing world owes a lot to Dennis Ritchie.

Posted by Anthony Williams
[/ general /] permanent link
Tags: ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Computer Education

Wednesday, 12 October 2011

Andrew Hague's post on Computer Education in Britain touches on something I've been discussing with my wife recently.

The travesty of ICT

Our children do not get taught anything at school about how computers work, or how to program them, and are unlikely to. Andrew says he found that "the study of computer science for British children ends at about age 11." This doesn't tally with my experience of primary schools — they are never taught any computer science. Local secondary schools are proud of their ICT suites, with office programs and image editing programs galore, but not a single class on the basics of computers and programming.

Part of the problem is the complexity of modern computers. Whereas the computers we grew up with (the Dragon 32, BBC Micro, ZX Spectrum, Commodore 64, etc.) were simple beasts, and booted into a programming environment (BASIC), modern computers are complex beasts with a swish graphical OS with no native programming environment. Yes, you can use Javascript in a browser, or the macro language in office programs, but it's not the same. PCs do not invite programming the same way that the older computers did, and you have to go out of your way to provide a basic programming environment.

Schools could overcome this hurdle, and provide programming environments, but they don't. Instead they teach everyone how to use the latest versions of office programs, despite the fact that next year's release will have a different UI, and different capabilities. Yes, children need to be computer-savvy, due to the prevalence of computers in everyday life, but they don't need to be experts in using word processors. Rather, they should be taught how to learn to use the programs, the things that are common about them (e.g. menus), how to get help (the help menu, Google), and so forth, and then taught about how computers work. Yes, use a word processor for writing the occasional thing in English, or use a spreadsheet for doing some data analysis in Geography, but the "computing" lessons should be about programming and how computers work at the basic level, rather than how to use popular software.

I think this lack of teaching about the basics of computing has a wider effect, as well as the lack of new programmers. The computer is something that people don't understand, but which they rely on. This can give people a sense of powerlessness, especially when it does something unexpected. I've had to help people who've been all in a panic because they "lost their work". It didn't appear in the list that was presented in the "open file" dialog, so it was "lost". Somehow they had saved it in a different directory, and their lack of understanding about the file system meant they didn't know how to find it, and panicked — the computer that they relied on had "lost" their important work. Teaching about the basics of modern operating systems (rather than the specifics of the software package being used) would have alleviated this fear.

Addressing the Problem

So, what is to be done? Firstly, as programming parents we can teach our children about computers and programming, which is something that my wife and I have started doing. But beyond that, we need to make the schools, colleges and government aware of the issues.

Andrew points to the Computing at School working group and the Behind the Screen project, both of which seem promising. However, without support these projects will fizzle, and our children will continue to be taught how to use office software rather than computing principles.

Posted by Anthony Williams
[/ general /] permanent link
Tags: , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Importing an Existing Windows XP Installation into VirtualBox

Wednesday, 03 June 2009

On Monday morning one of the laptops I use for developing software died. Not a complete "nothing happens when I turn it on" kind of death — it still runs the POST checks OK — but it won't rebooted of its own accord whilst compiling some code and now no longer boots into Windows (no boot device, apparently). Now, I really didn't fancy having to install everything from scratch, and I've become quite a big fan of VirtualBox recently, so I thought I'd import it into VirtualBox. How hard could it be? The answer, I discovered, was "quite hard".

Since it seems that several other people have tried to import an existing Windows XP installation into VirtualBox and had problems doing so, I thought I'd document what I did for the benefits of anyone who is foolish enough to try this in the future.

Step 1: Clone the Disk into VirtualBox

The first thing to do is clone the disk into VirtualBox. I have a handy laptop disk caddy lying around in my office which enables you to convert a laptop hard disk into an external USB drive, so I took the disk out of the laptop and put it in that. I connected the drive to my linux box, and mounted the partition. A quick look round seemed to indicate that the partition was in working order and the data intact. So far so good. I unmounted the partition again, in preparation for cloning the disk.

I started VirtualBox and created a new virtual machine with a virtual disk the same size as the physical disk. I then booted the VM with the System Rescue CD that I use for partitioning and disk backups. You might prefer to use another disk cloning tool.

Once the VM was up and running, I connected the USB drive to the VM using VirtualBox's USB support and cloned the physical disk onto the virtual one. This took a long time, considering it was only a 30Gb disk. Someone will probably tell me that there are quicker ways of doing this, but it worked, and I hope I don't have to do it again.

Step 2: Try (and fail) to boot Windows

Once the clone was complete, I disconnected the USB drive and unmapped the System Rescue CD and rebooted the VM. Windows started to boot, but would hang on the splash screen. If you're trying this and Windows now boots in your VM, be very glad.

Booting into safe mode showed that the hang occurred after loading "mup.sys". It seems lots of people have had this problem, and mup.sys is not the problem — the problem is that the hardware drivers configured for the existing Windows installation don't match the VirtualBox emulated hardware in some drastic fashion. This is not surprising if you think about it. Anyway, like I said, lots of people have had this problem, and there are lots of suggested ways of fixing it, like booting with the Windows Recovery Console and adjusting which drivers are loaded, using the "repair" version of the registry and so forth. I tried most of them, and none worked. However, there was one suggestion that seemed worth following through, and it was a variant of this that I finally got working.

Step 3: Install Windows into a new VM

The suggestion that I finally got working was to install Windows on a new VM and swipe the SYSTEM registry hive from there. This registry hive contains all the information about your hardware that Windows needs to boot, so if you can get Windows booting in a VM then you can use the corresponding SYSTEM registry hive to boot the recovered installation. At least that's the theory; in practice it needs a bit of hackery to make it work.

Anyway, I installed Windows into the new VM, closed it down rebooted it with the System Rescue CD to retrieve the SYSTEM registry hive: C:\Windows\System32\config\SYSTEM. You cannot access this file when the system is running. I then booted my original VM with the System Rescue CD and copied the registry hive over, making sure I had a backup of the original. If you're doing this yourself don't change the hive on your original VM yet.

The system now booted into Windows. Well, almost — it booted up, but then displayed an LSASS message about being unable to update the password and rebooted. This cycle repeats ad infinitum, even in Safe Mode. So far not so good.

Step 4: Patching the SYSKEY

In practice, Windows installations have what's called a SYSKEY in order to prevent unauthorized people logging on to the system. This is a checksum of some variety which is spread across the SAM registry hive (which contains the user details), the SYSTEM hive and the SECURITY hive. If the SYSKEY is wrong, the system will boot up, but then display the message I was getting about LSASS being unable to update the password and reboot. In theory, you should be able to update all three registry hives together, but then all your user accounts get replaced, and I didn't fancy trying to get everything set up right again. This is where the hackery comes in, and where I am thankful to two people: firstly Clark from http://www.beginningtoseethelight.org/ who wrote an informative article on the Windows Security Accounts Manager which explains how the SYSKEY is stored in the registry hives, and secondly Petter Nordahl-Hagen who provides a boot disk for offline Windows password and registry editing.

According to the article on the Windows Security Manager, the portion of the SYSKEY store in the SYSTEM hive is stored as class key values on a few registry keys. Class key values are hidden from normal registry accesses, but Petter Nordahl-Hagen's registry editor can show them to you. So, I restored the original SYSTEM hive (I was glad I made a backup) and booted the VM from Petter's boot disk and looked at the class key values on the ControlSet001\Control\Lsa\Data, ControlSet001\Control\Lsa\GBG, ControlSet001\Control\Lsa\JD and ControlSet001\Control\Lsa\Skew1 keys from the SYSTEM hive. I noted these down for later. The values are all 16 bytes long: the ASCII values for 8 hex digits with null bytes between.

This is where the hackery comes in — I loaded the new SYSTEM hive (from the working Windows VM) into a hex editor and searched for the GBG key. The text should appear in a few places — one for the subkey of ControlSet001, one for the subkey of ControlSet002, and so forth. A few bytes after one of the occurrences you should see a sequence of 16 bytes that looks similar to the codes you wrote down: ASCII values for hex digits separated by spaces. Make a note of the original sequence and replace it with the GBG class key value from the working VM. Do the same for the Data, JD and Skew1 values. Near the Data values you should also see the same hex digit sequence without the separating null bytes. Replace that too. Now look at the values in the file near to where the registry key names occur to see if there are any other occurrences of the original hex digit sequences and replace these with the new values as well.

Save the patched SYSTEM registry hive and copy it into the VM being recovered.

Now for the moment of truth: boot the VM. If you've patched all the values correctly then it will boot into Windows. If not, then you'll get the LSASS message again. In this case, try booting into the "Last Known Good Configuration". This might work if you missed one of the occurrences of the original values. If it still doesn't work, load the hive back into your hex editor and have another go.

Step 5: Activate Windows and Install VirtualBox Guest Additions

Once Windows has booted successfully, it will update the SYSKEY entries across the remaining ControlSetXXX entries, so you don't need to worry if you missed some values. You'll need to re-activate Windows XP due to the huge change in hardware, but this should be relatively painless — if you enable a network adapter in the VM configuration then Windows can access the internet through your host's connection seamlessly. Once that's done you can proceed with install the VirtualBox guest additions to make it easier to work with the VM — mouse pointer integration, sensible screen resolutions, shared folders and so forth.

Was it quicker than installing everything from scratch? Possibly: I had a lot of software installed. It was certainly a lot more touch-and-go, and it was a bit scary patching the registry hive in a hex editor. It was quite fun though, and it felt good to get it working.

Posted by Anthony Williams
[/ general /] permanent link
Tags:
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Design and Content Copyright © 2005-2025 Just Software Solutions Ltd. All rights reserved. | Privacy Policy