Blog Archive for November 2007

Testing on Multiple Platforms with VMWare

Tuesday, 27 November 2007

Whilst testing on multiple platforms is important, it can be difficult to obtain access to machines running all the platforms that you wish to test on. This is where virtualization software such as VMWare comes in handy: you don't need to have a separate machine for each tested platform — you don't even need a separate partition. Instead, you set up a Virtual Machine running the target platform, which runs on top of your existing OS. This Virtual Machine is completely self-contained, running off a virtual hard disk contained in a file on your real disk, and with a virtual screen which can be shown in a window on your host desktop.

Virtual Networks

This can be incredibly useful: not only can you test on multiple platforms without repartitioning your hard disk, but you can have multiple virtual machines running simultaneously. If you're developing an application that needs to run on multiple platforms, this can be invaluable, as you can see what the application looks like on different operating systems simultaneously. It also allows you to test network communication — each virtual machine is entirely independent of the others, so you can run a server application on one and a client application on another without having to build a physical network.
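
A quick way to check that a client and a server can really talk across a virtual network is a tiny TCP echo test. The following is a minimal Python sketch, not tied to VMWare. It runs both ends on the loopback address so you can try it on one machine. To test between virtual machines, you'd run `start_echo_server` in one guest, bound to a fixed port on an address reachable from the virtual network (e.g. `"0.0.0.0"`), and `echo_round_trip` in another guest, using the server guest's IP address. The function names, the addresses and the use of port 0 (any free port) are illustrative assumptions.

```python
import socket
import threading

# Hypothetical address: between real VMs, use the server guest's address on
# the virtual network instead of loopback.
HOST = "127.0.0.1"


def start_echo_server(host=HOST, port=0):
    """Start a one-shot TCP echo server in a background thread.

    Returns the (host, port) actually bound, plus the thread. Port 0 asks the
    OS for any free port; a cross-VM test would pick a fixed port instead.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, port))
    server.listen(1)  # listening before the thread starts, so clients can connect
    address = server.getsockname()

    def serve_one_client():
        with server:
            conn, _ = server.accept()
            with conn:
                # Echo everything back until the client closes its side.
                while data := conn.recv(1024):
                    conn.sendall(data)

    thread = threading.Thread(target=serve_one_client, daemon=True)
    thread.start()
    return address, thread


def echo_round_trip(address, message: bytes) -> bytes:
    """Connect as a client, send a message and return the echoed reply."""
    with socket.create_connection(address, timeout=5) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)  # signal that we've finished sending
        chunks = []
        while chunk := client.recv(1024):
            chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    address, thread = start_echo_server()
    print(echo_round_trip(address, b"hello from the client VM"))
    thread.join()
```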

Get Started with a Pre-built Virtual Machine

VMWare have a repository of pre-built virtual machines, which they call "appliances". This makes it very easy to get started, without the hassle of installing the OS. Some appliances even come with pre-installed applications: if you want to try a Ruby on Rails app on Linux, then the UbuntuWebServer appliance might be a good place to start.

Warning: Virtual Machines use Real Resources

It's worth noting that the resource use (CPU, memory, disk space) is real, even if the machines are virtual. If you run a CPU-intensive application on your virtual machine, your system will slow down; if you give 3 virtual machines 1GB of memory each but you only have 2GB installed, you're going to see a lot of swapping. Virtual machines are not full-time replacements for physical ones unless you have a server with a lot of resources. That said, if you do have such a server, running separate systems and applications in separate virtual machines can make a lot of sense: the individual systems are completely isolated from one another, so if one application crashes or destroys its (virtual) disk, the others are unaffected. Some web hosting companies use this facility to provide each customer with root access to their own virtual machine, for example.
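
The memory arithmetic above can be written down as a rough back-of-the-envelope check. This is a hypothetical helper and a simplification. It ignores hypervisor features such as memory sharing and ballooning, which can reduce the real demand. The `host_reserved_gb` allowance for the host OS is an assumption you'd tune for your own machine.

```python
def memory_shortfall_gb(host_ram_gb, guest_ram_gb, host_reserved_gb=0.0):
    """Return how far the guests' configured memory exceeds available RAM.

    guest_ram_gb is a list with one entry per running VM. host_reserved_gb is
    a hypothetical allowance for the host OS and its own applications. A
    positive result means the host will have to swap.
    """
    available = host_ram_gb - host_reserved_gb
    demanded = sum(guest_ram_gb)
    return max(0.0, demanded - available)
```

For the example in the text, `memory_shortfall_gb(2, [1, 1, 1])` returns 1: a 1GB shortfall before the host OS's own needs are even counted.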

It's also worth noting that if you install a non-free operating system such as Microsoft Windows, you still need a valid license.

Alternatives

VMWare Server is currently a free download for Windows and Linux, but it's not the only product out there. VirtualBox is also free, and runs on Windows, Linux and Mac OS X. One nice feature of VirtualBox is "seamless Windows". When you run Microsoft Windows as the guest operating system, you can suppress the guest's desktop background, so the application windows from the virtual machine appear directly on the host desktop.

Another alternative is QEMU, which offers full-blown emulation as well as virtualization. This allows you to experiment with operating systems built for a different CPU, though the emulated hardware can be quite limited.

Posted by Anthony Williams
Tags: testing, vmware, virtualization, virtual machine

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Database Tip: Use an Artificial Primary Key

Monday, 19 November 2007

If your data has a clear "master" field or combination of fields, which can uniquely identify each row (such as customer name in a table of customers or ISBN for a table of books), it is tempting to use that as the primary key of the table. However, my advice is: don't do that, use a separate, numeric, artificial primary key instead. Yes, it's an extra column on the table, requiring extra space, and you will have to generate it somehow, but that's not a big deal. Every database vendor provides some way of auto-generating unique key values (e.g. SEQUENCEs in Oracle, and AUTOINCREMENT fields in SQLite), so populating it is easy, and the complications it saves are more than worth the trade-off. You can still maintain the uniqueness of the master columns by applying a unique index to those columns.
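
Here is a minimal sketch of this in SQLite, using Python's built-in `sqlite3` module. `book_id` is the artificial key, generated by SQLite on insert, while a separate unique index keeps ISBNs unique. The table, index and function names are illustrative, not taken from any real schema.

```python
import sqlite3


def create_books_table(conn):
    # book_id is the artificial primary key; SQLite generates it on insert.
    conn.execute("""
        CREATE TABLE books (
            book_id INTEGER PRIMARY KEY AUTOINCREMENT,
            isbn    TEXT NOT NULL,
            title   TEXT NOT NULL
        )""")
    # The natural "master" column stays unique via a separate unique index.
    conn.execute("CREATE UNIQUE INDEX books_isbn ON books(isbn)")


def add_book(conn, isbn, title):
    """Insert a book and return its generated book_id."""
    cur = conn.execute(
        "INSERT INTO books (isbn, title) VALUES (?, ?)", (isbn, title))
    return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_books_table(conn)
    print(add_book(conn, "0-321-22811-1", "Patterns for Parallel Programming"))  # 1
    try:
        add_book(conn, "0-321-22811-1", "Same ISBN again")
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
```

Inserting a second row with the same ISBN raises `sqlite3.IntegrityError`, so the natural key is still protected even though it isn't the primary key.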

Save Space on Foreign Keys

Firstly, if you have any tables that are associated with the master table, and therefore have foreign key columns that refer to its primary key, then having a separate primary key can actually save space overall. The data for the master columns doesn't have to be duplicated across all the linked tables. This is especially important if there is more than one "master column", such as customer_first_name and customer_last_name, or if the data for these columns is large.
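
The following SQLite sketch (Python's `sqlite3` module) illustrates this with a hypothetical `customers`/`orders` schema. Each order stores a single integer `customer_id` rather than copies of both name columns, and the names are recovered with a join when needed. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (
    customer_id         INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_first_name TEXT NOT NULL,
    customer_last_name  TEXT NOT NULL
);
CREATE UNIQUE INDEX customers_name
    ON customers(customer_first_name, customer_last_name);

-- Each order stores one integer to identify its customer,
-- rather than copies of both name columns.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    item        TEXT NOT NULL
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def add_customer(conn, first, last):
    return conn.execute(
        "INSERT INTO customers (customer_first_name, customer_last_name) VALUES (?, ?)",
        (first, last)).lastrowid


def add_order(conn, customer_id, item):
    conn.execute("INSERT INTO orders (customer_id, item) VALUES (?, ?)",
                 (customer_id, item))


def orders_with_names(conn):
    # Join back to the master table whenever the names are needed.
    return conn.execute("""
        SELECT c.customer_first_name, c.customer_last_name, o.item
        FROM orders o JOIN customers c ON c.customer_id = o.customer_id
        ORDER BY o.order_id""").fetchall()


if __name__ == "__main__":
    conn = open_db()
    add_order(conn, add_customer(conn, "Henry", "Jones"), "hat")
    print(orders_with_names(conn))  # [('Henry', 'Jones', 'hat')]
```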

Changing the master data

Secondly, if the "master columns" are actually the primary key of your table, changing the data in them is potentially problematic, especially if they are used as a foreign key in other tables. Many online services use a customer's email address as their "master column": each customer has one email address, and one email address refers to one customer. That's fine until a customer changes their email address. Obviously, you don't want to lose all data associated with a given customer just because they changed their email address, so you need to update the row rather than delete the old one and insert a new one. If the email_address column is the primary key of the table, and therefore used as the foreign key in other tables, then you've got to update the data not just in the master table, but in each dependent table too.

This is not impossible, but it's certainly more complex and time-consuming. If you miss a table, the transaction may fail due to foreign key constraint violations, or (worse) it may complete with some of the data orphaned. Also, in some database engines the constraint check fires as soon as you change either the master table or the dependent table. You then need to execute a special SQL statement to defer the constraint checking until COMMIT time. If you use an auto-generated primary key, then only the data in the master table needs changing.
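
To make the contrast concrete, here is a SQLite sketch using Python's `sqlite3` module with two hypothetical schemas. With `email_address` as the primary key, changing it means updating `customers` and `orders` in one transaction. There, the foreign key is declared `DEFERRABLE INITIALLY DEFERRED` so the check waits until COMMIT. With an artificial `customer_id`, only `customers` changes. Other database engines use different syntax for deferring constraints.

```python
import sqlite3


def natural_key_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customers (email_address TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            order_id      INTEGER PRIMARY KEY,
            email_address TEXT NOT NULL
                REFERENCES customers(email_address)
                DEFERRABLE INITIALLY DEFERRED,
            item          TEXT);
        INSERT INTO customers VALUES ('old@example.com', 'Henry Jones');
        INSERT INTO orders (email_address, item) VALUES ('old@example.com', 'hat');
    """)
    return conn


def change_email_natural(conn, old, new):
    # Every dependent table must be updated too. The deferred constraint is
    # only checked at COMMIT, by which time both tables agree again.
    with conn:
        conn.execute("UPDATE customers SET email_address = ? WHERE email_address = ?",
                     (new, old))
        conn.execute("UPDATE orders SET email_address = ? WHERE email_address = ?",
                     (new, old))


def artificial_key_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customers (
            customer_id   INTEGER PRIMARY KEY AUTOINCREMENT,
            email_address TEXT NOT NULL UNIQUE,
            name          TEXT);
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            item        TEXT);
        INSERT INTO customers (email_address, name) VALUES ('old@example.com', 'Henry Jones');
        INSERT INTO orders (customer_id, item) VALUES (1, 'hat');
    """)
    return conn


def change_email_artificial(conn, old, new):
    # Only the master table changes; orders still point at the same customer_id.
    with conn:
        conn.execute("UPDATE customers SET email_address = ? WHERE email_address = ?",
                     (new, old))


if __name__ == "__main__":
    natural = natural_key_db()
    change_email_natural(natural, "old@example.com", "new@example.com")
    print(natural.execute("SELECT email_address FROM orders").fetchall())
    artificial = artificial_key_db()
    change_email_artificial(artificial, "old@example.com", "new@example.com")
    print(artificial.execute("SELECT customer_id FROM orders").fetchall())
```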

Changing the master columns

Finally, if the primary key is auto-generated, then not only is it easy to change the data in the master columns, but you can actually change what the master columns are. For example, if you initially decide that customer_first_name and customer_last_name make an ideal primary key, then you're stuck if you then get another customer with the same name. OK, so you make customer_first_name, customer_last_name and customer_address the primary key. Oops — now you've got to duplicate the address information across all the dependent tables. Now you encounter two people with the same name at the same address (e.g. father and son), so you need to add a new designator to the key (e.g. Henry Jones Junior, Richard Wilkins III). Again, you need to update all the dependent tables. If the primary key is auto-generated, there's no problem — just update the unique constraint on the master table to include the appropriate columns, and all is well, with the minimum of fuss.
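
Here is a sketch of changing the master columns when the key is artificial, again in SQLite via Python's `sqlite3` module. The schema, the `designator` column and the `redefine_master_columns` and `add_customer` helpers are hypothetical. Only the unique index on `customers` is dropped and recreated. Any dependent tables refer to `customer_id`, so they need no changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id         INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_first_name TEXT NOT NULL,
        customer_last_name  TEXT NOT NULL,
        -- Empty-string defaults rather than NULL: SQLite treats NULLs as
        -- distinct in a UNIQUE index, which would let duplicates through.
        customer_address    TEXT NOT NULL DEFAULT '',
        designator          TEXT NOT NULL DEFAULT ''
    );
    -- The initial choice of "master columns".
    CREATE UNIQUE INDEX customers_master
        ON customers(customer_first_name, customer_last_name);
""")


def redefine_master_columns(conn, columns):
    """Swap the unique index for one over a different set of columns.

    Hypothetical helper. Column names are pasted into the SQL, so they must
    come from trusted code, never from user input.
    """
    conn.execute("DROP INDEX customers_master")
    conn.execute("CREATE UNIQUE INDEX customers_master ON customers(%s)"
                 % ", ".join(columns))


def add_customer(conn, first, last, address, designator=""):
    conn.execute(
        "INSERT INTO customers (customer_first_name, customer_last_name,"
        " customer_address, designator) VALUES (?, ?, ?, ?)",
        (first, last, address, designator))


add_customer(conn, "Henry", "Jones", "1 Example Street")
# A second Henry Jones would violate the original index, so widen it first.
redefine_master_columns(conn, ["customer_first_name", "customer_last_name",
                               "customer_address", "designator"])
add_customer(conn, "Henry", "Jones", "1 Example Street", "Junior")
```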

Simplify your code

It's not going to simplify your code much, but using an auto-generated numeric key means that this is all you need to store as an identifier inside your program to refer to a particular row: much easier than storing the data from a combination of columns. Also, it's much easier to write code to update the data on one table than on multiple tables.
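
The following sketch shows what this looks like in application code, using Python and SQLite. `CustomerRef`, `load_name` and `rename` are hypothetical names. The program holds just the integer key, so the reference stays valid after the customer's name changes, and the update touches a single table.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRef:
    """Everything the program needs to hold to identify a customer row."""
    customer_id: int


def load_name(conn, ref: CustomerRef):
    return conn.execute(
        "SELECT customer_first_name, customer_last_name FROM customers"
        " WHERE customer_id = ?", (ref.customer_id,)).fetchone()


def rename(conn, ref: CustomerRef, first: str, last: str) -> None:
    # The reference doesn't contain the name, so it remains valid afterwards.
    with conn:
        conn.execute(
            "UPDATE customers SET customer_first_name = ?, customer_last_name = ?"
            " WHERE customer_id = ?", (first, last, ref.customer_id))


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customers (
    customer_id         INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_first_name TEXT NOT NULL,
    customer_last_name  TEXT NOT NULL)""")
ref = CustomerRef(conn.execute(
    "INSERT INTO customers (customer_first_name, customer_last_name)"
    " VALUES ('Richard', 'Wilkins')").lastrowid)
rename(conn, ref, "Richard", "Wilkins III")
print(load_name(conn, ref))  # ('Richard', 'Wilkins III')
```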

Conclusion

Don't use real table data as the primary key for a table: instead, use a separate, numeric, auto-generated column as the primary key. This will simplify the connections between tables, and make your life easier if the structure of the database or the data in the key columns changes.

In previous posts on Database Design, I've talked about:

Posted by Anthony Williams
Tags: database, primary key, foreign key

Elegance in Software

Monday, 12 November 2007

What does it mean for software to be elegant? When I write code, elegance is something I aspire to, and in some senses it goes hand-in-hand with beautiful code, but that doesn't really make it any clearer. Certainly, I think there is a strong element of "elegance is in the eye of the beholder". However, I think there are also some characteristics of software that contribute to it. Different people may rate the same code differently on any given aspect, and may place different importance on each aspect, but these aspects will almost certainly affect how elegant the code appears.

Factors affecting the elegance of software

Here's a short list of some of the factors that I think are important. Obviously, this is not an exhaustive list, and all comments are my opinion, and not definitive.

Does it work?: I'd be hard-pushed to call software "elegant" if it didn't work.
Is it easy to understand?: Lots of the following factors can really be summarised by this one: if I can't understand the code, it's not elegant.
Is it efficient?: A bubble sort is just not elegant, because there are lots of much more efficient algorithms. If a cunning algorithmic trick can drastically reduce the runtime, using that trick contributes to making the code elegant, especially if it is still easy to understand.
Short functions: Long functions make the code hard to follow. If I can't see the whole function on one screen in my editor, it's too long. Ideally, a function should be really short: fewer than 5 lines.
Good naming: Short functions are all very well, but if functions are called foo, abc, or wrt_lng_dt, it can still be hard to understand the code. Of course, this applies to classes just as much as functions.
Clear division of responsibility: It is important that it is clear which function or class is responsible for any given aspect of the design. Not only that, but a class or function should not have too many responsibilities — by the Single Responsibility Principle a class or function should have just one responsibility.
High cohesion: Cohesion is a measure of how closely related the data items and functions in a class or module are to each other. This is tightly tied in to division of responsibility — if a function is responsible for calculating primes and managing network connections, then it has low cohesion, and a poor division of responsibility.
Low coupling: Classes and modules should not have unnecessary dependencies between them. If a change to the internals of one class or function requires a change to apparently unrelated code elsewhere, there is too much coupling. This is also related to the division of responsibility, and excessive coupling can be a sign that too many classes, modules or functions share a single responsibility.
Appropriate use of OO and other techniques: It is not always appropriate to encapsulate something in a class — sometimes a simple function will suffice, and sometimes other techniques are more appropriate. This is also related to the division of responsibilities, but it goes beyond that — is this code structure the most appropriate for handling this particular responsibility? Language idioms come into play here: is it more appropriate to use STL-style std::sort on an iterator interface, or does it make more sense to provide a sort member function? Can the algorithm be expressed in a functional way, or is an imperative style more appropriate?
Minimal code: Code should be short and to the point. Overly long code can be the consequence of doing things at too low a level, such as shuffling bytes by hand rather than using a high-level sort algorithm. It can also be the consequence of too many levels of indirection: if a function does nothing except call one other function, it's getting in the way. Sometimes this can be at odds with good naming (for example, a well-named function with a clear responsibility may just happen to be able to delegate to a generic function), but there's obviously a trade-off. Minimal code is also related to duplication: if two blocks of code do the same thing, one of them should be eliminated.
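
The following hypothetical before-and-after in Python illustrates several of these factors at once; the player records and function names are invented for the example. The first version uses cryptic names and a hand-written bubble sort, and tangles three responsibilities together. The second splits them into short, well-named functions and delegates the sorting to the built-in `sorted()`. Both return the same result, because both sorts are stable.

```python
# Less elegant: cryptic names, an O(n^2) hand-written bubble sort, and
# filtering, sorting and formatting all tangled in one function.
def proc(d):
    r = [x for x in d if x["active"]]
    for i in range(len(r)):
        for j in range(len(r) - 1 - i):
            if r[j]["score"] < r[j + 1]["score"]:
                r[j], r[j + 1] = r[j + 1], r[j]
    return [x["name"] for x in r[:3]]


# More elegant: short, well-named functions with one responsibility each,
# delegating to the built-in O(n log n) sorted().
def active_players(players):
    return [p for p in players if p["active"]]


def by_score_descending(players):
    return sorted(players, key=lambda p: p["score"], reverse=True)


def top_names(players, n=3):
    return [p["name"] for p in by_score_descending(active_players(players))[:n]]
```

Python also offers the free-function-or-member choice mentioned above for C++: `sorted()` works on any iterable and returns a new list, whereas `list.sort()` is a method that sorts a list in place.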

One thing that is not present in the above list is comments in the code. In my view, the presence of comments in the code implies that the code is not sufficiently clear. Yes, well-written comments can make it easier to understand a given block of code, but they should in general be unnecessary: truly elegant code can be understood without comments. Of course, you might need to understand what it is that the code is trying to accomplish before it makes complete sense, particularly if the code is using advanced algorithms, and comments can help with that (e.g. by providing a reference to the algorithm), but my general view is that comments are a sign of less-than-perfect code.
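
This hypothetical Python illustration shows the distinction. The first comment merely restates what the code does, and a better function name makes it unnecessary. The comment in `gcd`, by contrast, points to the algorithm being used, which is the kind of comment that still helps.

```python
# A comment that restates the code is a hint that a better name is needed...
def f(xs):
    # keep the even numbers and square them
    return [x * x for x in xs if x % 2 == 0]


# ...so let the name say it instead.
def squares_of_evens(numbers):
    return [n * n for n in numbers if n % 2 == 0]


# A comment that identifies the underlying algorithm still earns its place.
def gcd(a, b):
    # Euclid's algorithm (for non-negative integers):
    # gcd(a, b) == gcd(b, a mod b), and gcd(a, 0) == a.
    while b:
        a, b = b, a % b
    return a
```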

Let me know what you think constitutes elegant code.

Posted by Anthony Williams
Tags: design, elegance, software

Review of Patterns for Parallel Programming by Timothy G. Mattson, Beverly A. Sanders and Berna L. Massingill

Thursday, 01 November 2007

This book gives a broad overview of techniques for writing parallel programs. It is not an API reference, though it does have examples that use OpenMP, MPI and Java, and contains a brief overview of each in appendices. Instead, it covers the issues you have to think about whilst writing parallel programs, starting with identifying the exploitable concurrency in the application, and moving through techniques for structuring algorithms and data, and various synchronization techniques.

The authors do a thorough job of explaining the jargon surrounding parallel programming, such as what a NUMA machine is, what SPMD means, and what makes a program embarrassingly parallel. They also go into some of the more quantitative aspects, like calculating the efficiency of the parallel design, and the serial overhead.
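
The standard definitions behind these quantitative measures are short enough to write down. This sketch uses the usual textbook formulas: speedup, efficiency, and Amdahl's law for a serial fraction of the work. The function names are illustrative, and the notation may not match the book's exactly.

```python
def speedup(t_serial, t_parallel):
    """Speedup S(P) = T(1) / T(P)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Efficiency E(P) = S(P) / P; 1.0 means every processor is fully used."""
    return speedup(t_serial, t_parallel) / processors


def amdahl_speedup(serial_fraction, processors):
    """Best-case speedup when serial_fraction of the work can't be parallelized.

    S(P) = 1 / (serial_fraction + (1 - serial_fraction) / P)
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)
```

For example, if 10% of the work is serial, `amdahl_speedup(0.1, 8)` is about 4.7, and no number of processors can push the speedup past 10.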

Most of the content is structured in the form of Patterns (hence the title), which I found to be an unusual way of presenting the information. However, the writing is clear and easily understood. The examples are well thought out, and clearly demonstrate the points being made.

The three APIs used for the examples cover the major types of parallel programming environments — explicit threading (Java), message passing (MPI), and implicit threading from high-level constructs (OpenMP). Other threading environments generally fall into one of these categories, so it is usually straightforward to see how descriptions can be extended to other environments for parallel programming.
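
As a rough illustration of the difference between explicit threading and a higher-level construct, here is the same embarrassingly parallel job written both ways in Python. This is not one of the book's examples; `work` and the `run_*` functions are hypothetical, and the message-passing (MPI) style isn't shown. In CPython, the global interpreter lock stops CPU-bound threads from running in parallel, so this shows structure rather than speed. `ProcessPoolExecutor` is the usual choice for CPU-bound work.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def work(n):
    """Stand-in for an independent task: no task depends on any other."""
    return sum(i * i for i in range(n))


def run_with_explicit_threads(inputs):
    # Explicit threading: create, start and join each thread by hand,
    # giving each thread its own slot for its result.
    results = [None] * len(inputs)

    def worker(index, value):
        results[index] = work(value)

    threads = [threading.Thread(target=worker, args=(i, v))
               for i, v in enumerate(inputs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def run_with_executor(inputs, max_workers=4):
    # Higher-level construct: the pool manages the threads and collects
    # the results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(work, inputs))
```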

The authors are clearly coming from a high-performance computing background, with massively parallel computers, but HyperThreading and dual-core CPUs are becoming common on desktops, and many of the same issues apply when writing code to exploit the capabilities of these machines.

Highly Recommended. Everyone writing parallel or multi-threaded programs should read this book.

Buy this book

Patterns for Parallel Programming
Timothy G. Mattson, Beverly A. Sanders and Berna L. Massingill
Published by Addison-Wesley
ISBN 0-321-22811-1

Posted by Anthony Williams
Tags: reviews, threads, patterns, books

Just Software Solutions
