“Loose Coupling”. Don’t snigger, because loose coupling is one of the most important ideas in software development: that program A should be able to use program B without caring about how B does its job.
Coupling has parallels everywhere in every day life. The economic theory of price signals is one — how firms allocate resources is, in general, supremely unimportant to the buyer. All the buyer knows is the price. Loosely coupled economies are flexible, with individual firms able to connect to each other without having to bear the costs and complexities of knowing how their suppliers and buyers do business. They can stick to their knitting.
Recently I have started chatting to an engineer at Automattic, the mob which employs most of the lead programmers on WordPress. They also run the WordPress.com service, which is a big job — millions of blogs, tens of millions of monthly visits.
I complained to him that one of my annoyances in life is how complex WordPress actually is. Why? Because you don’t just have to configure WordPress to get anywhere. To get it to perform acceptably you can either throw powerful hardware at the problem (which is how Club Troppo has done it since our donation drive last year) or you can implement a whole rogue’s gallery of tweaks and adaptations.
But you can’t do it to WordPress. You have to fiddle with the ‘stack’11. Stacks and Stacks: The most famous stack of web software is LAMP: Linux, Apache, MySQL, PHP; which are an Operating System, a web server, a database and a programming language. And you need to configure and integrate all of them for WordPress to work [↩], the software WordPress relies on. To make WordPress sing I have changed web servers — from Apache to Lighttpd — and aggressively tuned the latter. I have tuned MySQL, the database it relies on. I’ve tuned the operating system. I’ve installed caching software. And so on, and so forth.
To me, this defeats the claim that WordPress, or any other software using the LAMP stack, is loosely coupled. I need to understand a bit about all of the other parts to make the one I’m interested in works well.
Sometimes this work is done for me. If I choose a full-service webhost, it’s possible that they have already bundled WordPress and other LAMP apps in their service. Also, most Linux distributions include ways to deploy a fully configured LAMP stack in some sort of working order. But in the first case I can’t tune it, and in the latter case, I have to tune it22. Eg: By way of example, the default configuration for Lighttpd and PHP on Ubuntu is slightly broken. Even if you set up so-called FastCGI, the regular CGI is also configured to capture PHP programs. Regular CGI is substantially slower because it launches a new copy of PHP every time instead of just reusing a single copy. I only discovered this little glitch a few days ago while tweaking a WordPress installation over on my Ozblogistan server. [↩].
The separation of concerns in the LAMP stack are largely historical in their origin. They are not selected for design reasons; they are selected because That’s How It Has Always Been Done — or at least since circa 1998, which is ancient history in internet years. The Apache web server emerged by itself and later on people glued it to MySQL databases with Perl. Still later came the PHP language which usurped Perl’s place in this menagerie. But at no point has there be an assessment whether the benefits of the abstraction barriers between the different bits is outweighed by the high cost of integration and configuration.
Sometimes, historical abstraction layers get removed or collapsed together. In file systems, for instance, there has been a historical difference between the file system layer and the volume manager. This came to be because the volume manager evolved later and relies on the file system layer. Yet they are both just parts of the larger problem, which is managing storage. A new filesystem from Sun called ZFS punched through several of these layers and has made amazing advances in reliability and manageability as a result.
A non-technical example might be the old Russian way of running a store. Go to table one, get a numbered ticket. Go to table two, get the item on the ticket. Go to table three, ticket gets stamped and the item bagged. Go to table four, pay for item, then leave. Each of these operations is separate largely for historical reasons. Collapsing the natural order of things leaves us with the concept of the checkout operator: a single point of sale where items are identified, paid for and bagged.
I think blog software deserves this kind of revisitation. Web software in general is moving away from the shared hosting model where flexibility at every layer is an advantage, and towards a single-application model where only one major app is installed per server instance. Rather than a single server hosting a dozen different web applications, there are a dozen different servers — most of them virtual — hosting a single application each.
In this scenario, having four layers before you reach the application itself seems a bit unnecessary. My instinct is that standalone web software, like blog applications, would benefit from absorbing the web serving layer and the data storage layers as libraries, rather than standalone applications in their own right. Done properly this could dramatically simplify the tuning and development of the application; though done incorrectly it could be a performance nightmare. But that’s a rant for another time.