The Odd Couples

“Loose Coupling”. Don’t snigger, because loose coupling is one of the most important ideas in software development: that program A should be able to use program B without caring about how B does its job.

Coupling has parallels everywhere in everyday life. The economic theory of price signals is one example: how firms allocate resources is, in general, supremely unimportant to the buyer. All the buyer knows is the price. Loosely coupled economies are flexible: individual firms can connect to each other without bearing the costs and complexities of knowing how their suppliers and buyers do business. They can stick to their knitting.
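To make the idea concrete, here is a minimal Python sketch of the price-signal analogy. All the names (`Supplier`, `Workshop`, `Importer`, `cheapest`) and figures are hypothetical. The buyer, `cheapest`, relies only on each supplier's `price()` method and never looks at how that price is worked out.

```python
from typing import Protocol


class Supplier(Protocol):
    """The only thing a buyer may rely on: a supplier quotes a price."""

    def price(self, item: str) -> float: ...


class Workshop:
    """Hypothetical supplier that makes items by hand."""

    def price(self, item: str) -> float:
        labour_hours = {"chair": 3.0, "table": 6.0}[item]
        return labour_hours * 40.0  # hourly rate times hours of work


class Importer:
    """Hypothetical supplier that buys in bulk from overseas."""

    def price(self, item: str) -> float:
        landed_cost = {"chair": 70.0, "table": 250.0}[item]
        return landed_cost * 1.2  # landed cost plus a 20% markup


def cheapest(item: str, suppliers: list[Supplier]) -> Supplier:
    # The buyer compares prices only. It never inspects how each supplier
    # arrives at its price, so either supplier can change its internals freely.
    return min(suppliers, key=lambda s: s.price(item))
```

Adding a third supplier, or changing how `Workshop` computes its price, requires no change to `cheapest`: that independence is what loose coupling buys.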

Recently I started chatting with an engineer at Automattic, the mob which employs most of the lead programmers on WordPress. They also run the WordPress.com service, which is a big job: millions of blogs and tens of millions of monthly visits.

I complained to him that one of my annoyances in life is how complex WordPress actually is. Why? Because configuring WordPress alone won’t get you anywhere. To make it perform acceptably you can either throw powerful hardware at the problem (which is how Club Troppo has done it since our donation drive last year) or implement a whole rogues’ gallery of tweaks and adaptations.

But you can’t do it by tuning WordPress alone. You have to fiddle with the ‘stack’[1], the software WordPress relies on. To make WordPress sing I have changed web servers (from Apache to Lighttpd) and aggressively tuned the latter. I have tuned MySQL, the database it relies on. I’ve tuned the operating system. I’ve installed caching software. And so on, and so forth.

[1] Stacks and stacks. The most famous stack of web software is LAMP (Linux, Apache, MySQL, PHP): an operating system, a web server, a database and a programming language. You need to configure and integrate all of them for WordPress to work.

To me, this defeats the claim that WordPress, or any other software using the LAMP stack, is loosely coupled. I need to understand a bit about all of the other parts to make the one I’m interested in work well.

Sometimes this work is done for me. If I choose a full-service webhost, it’s possible that they have already bundled WordPress and other LAMP apps into their service. Also, most Linux distributions include ways to deploy a fully configured LAMP stack in some sort of working order. But in the first case I can’t tune it, and in the second case I have to tune it[2].

[2] For example, the default configuration for Lighttpd and PHP on Ubuntu is slightly broken. Even if you set up so-called FastCGI, regular CGI is also configured to capture PHP programs. Regular CGI is substantially slower because it launches a new copy of PHP for every request instead of reusing one that is already running. I only discovered this little glitch a few days ago while tweaking a WordPress installation over on my Ozblogistan server.

The separation of concerns in the LAMP stack is largely historical in origin. The layers were not chosen for design reasons; they exist because That’s How It Has Always Been Done, or at least since circa 1998, which is ancient history in internet years. The Apache web server emerged by itself, and later on people glued it to MySQL databases with Perl. Still later came PHP, which usurped Perl’s place in this menagerie. But at no point has anyone assessed whether the benefits of the abstraction barriers between the different bits are outweighed by the high cost of integration and configuration.

Sometimes historical abstraction layers get removed or collapsed together. In file systems, for instance, there has historically been a split between the file system layer and the volume manager. This came about because the volume manager evolved later and was slotted in beneath the file system layer. Yet both are just parts of the larger problem, which is managing storage. A new file system from Sun called ZFS punched through several of these layers and has made amazing advances in reliability and manageability as a result.

A non-technical example might be the old Russian way of running a store. Go to table one, get a numbered ticket. Go to table two, get the item on the ticket. Go to table three, ticket gets stamped and the item bagged. Go to table four, pay for item, then leave. Each of these steps is separate largely for historical reasons. Collapsing them together gives us the checkout operator: a single point of sale where items are identified, paid for and bagged.

I think blog software deserves this kind of revisiting. Web software in general is moving away from the shared hosting model, where flexibility at every layer is an advantage, and towards a single-application model where only one major app is installed per server instance. Rather than a single server hosting a dozen different web applications, there are a dozen different servers, most of them virtual, each hosting a single application.

In this scenario, having four layers before you reach the application itself seems a bit unnecessary. My instinct is that standalone web software, like blog applications, would benefit from absorbing the web serving layer and the data storage layer as libraries, rather than running them as standalone applications in their own right. Done properly this could dramatically simplify the tuning and development of the application; done incorrectly it could be a performance nightmare. But that’s a rant for another time.
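As a rough illustration only, here is a minimal Python sketch of an app that absorbs both layers as libraries, using the standard library’s `http.server` and `sqlite3` as stand-ins for Apache/Lighttpd and MySQL. The `posts` table and the function names are hypothetical, and this is nowhere near a production web server.

```python
import json
import sqlite3
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    # Storage as an in-process library: no separate database server to run or tune.
    conn = sqlite3.connect(path, check_same_thread=False)
    conn.execute("CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, title TEXT)")
    return conn


def make_handler(conn: sqlite3.Connection, lock: threading.Lock):
    class BlogHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # One shared connection, so serialise access across request threads.
            with lock:
                rows = conn.execute("SELECT id, title FROM posts ORDER BY id").fetchall()
            body = json.dumps([{"id": i, "title": t} for i, t in rows]).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return BlogHandler


def make_server(conn: sqlite3.Connection, port: int = 8000) -> ThreadingHTTPServer:
    # HTTP serving as an in-process library: the app itself is the web server.
    return ThreadingHTTPServer(("127.0.0.1", port), make_handler(conn, threading.Lock()))
```

To try it, call `open_store("blog.db")`, insert a row into `posts`, then run `make_server(conn).serve_forever()`. The design trade-off is the one described above: there is only one process to configure and tune, but the storage layer can no longer be scaled out onto its own machine.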


Related rants from yours truly:
Individual and Community
Blogging: The Next Generation
Blogging Software: Who Cares?

This entry was posted in Blogs TNG, Geeky Musings, IT and Internet.

27 Responses to The Odd Couples

  1. gilmae says:

    I complained to him that one of my annoyances in life is how complex WordPress actually is. Why?

    ooc, what did he say in response?

  2. Jacques Chester says:

    Essentially his answer was, “are you suggesting we more tightly couple things?”

    Whereas my point is: they’re already tightly coupled, mostly for historical reasons.

  3. In the Soviet system you would go to the fifth table to actually pick up your item. Or come back the next week.

  4. David Rubie says:

    I dunno Jacques – MySQL has other uses than a backend for WordPress, and merging a DB layer directly into the code is generally a bad idea that’s been tried before (with much hair pulling afterwards). Those old pre-DB2 IBM mainframe databases come to mind: you end up with 15 versions of the DB code spread across your applications because none of them work with one version, and all the embedded versions have been tweaked up the kazoo to work just with your app. Similarly with the HTTP server (although much less so – there are quite a few applications that embed minimal HTTP service functionality into the application or device).

    Fundamentally, I think it’s been tried and rejected (probably guaranteeing it’ll be tried again). .NET comes to mind – everything is floating around as a library of some sort, but it still doesn’t buy you the convenience of not having to tune the different layers when you hit a performance problem.

    I think the ultimate incarnation of this kind of thinking was the early object based systems (Lisp or more specifically CLOS, and Smalltalk). Everything you need was embedded in a running “image” with Smalltalk, with loose coupling being the object interfaces. It *still* needed tuning when it performed badly.

    So, in a long winded way, I don’t think tighter or looser coupling of code bases really buys you much when performance is a problem, as the problem is really taking general purpose code and specialising it. Every time a bit of specialisation takes place, you get an incompatible fork of something and suddenly you’re up Mac Plus creek without a ram stick, while your competitors sail past having successfully compromised.

  5. Jacques Chester says:

    David;

    MySQL has other uses than a backend for WordPress

    Yes, but WordPress uses only a fraction of what a full SQL database provides, yet still gets all the complexity.

    I do recognise that there are problems with versioning, but that’s a problem with schema evolution too. Common Lisp has some degree of hot-swappability and Erlang (to pick a currently trendy example) even more so.

    As for the HTTP server, that problem has been solved so thoroughly so many times it’s not funny. People don’t want pure HTTP servers any more; they want a plugin engine built around the protocol. It seems to me to be valid to fold HTTP into a plugin or library and include it in the base application.

    I suppose I didn’t really make my point well. It wasn’t so much about performance tuning; that was just the example. It was about complexity that might not really be necessary. Performance tuning is more complex because I have to do it in a bunch of places.

  6. David Rubie says:

    Jacques Chester wrote:

    It was about complexity that might not really be necessary.

    That perhaps is a reflection on WordPress, rather than unnecessary complexity. I remember thinking much the same thing on first encountering database programming – thinking it was an overly complex way to solve something easy (i.e. what’s wrong with flat files?). In some cases, flat files are perfect, but for anything where the data is going to get large, you need indexing capabilities, an interface to make programming easier, mechanisms to clean up removed data without fragmentation ruining things, concurrency features etc. As soon as you step beyond trivial, complexity starts to compound. I think it’s hard to characterise as unnecessary even if you aren’t using 100% of the features available.

    I would equate schema evolution and object versioning – there’s just no getting away from breaking an interface every now and again. That’s a problem that there is no solution to.

    I’d agree about HTTP though – it’s a simple protocol, widely implemented, and there are few risks in “re-inventing the wheel” because it’s not much more complex than a wheel. Databases and database implementation, however, are a different animal.

  7. Jacques Chester says:

    David;

    I don’t disagree with you — I am a big fan of relational databases. I am also interested in the so-called ‘impedance mismatch’ between object oriented programming and databases, which is a related problem.

    I’m not proposing flatfiles as an alternative. Possibly something along the lines of persistent objects as you can find in several flavours in Smalltalk and Common Lisp. You give up some of the declarative advantages of an RDBMS, including fast, flexible ad hoc querying, in exchange for an almost invisible data layer.

    Most LAMP software is just using MySQL store-and-fetch, not decision support, so in this case the tradeoff could be sound.

  8. Jacques Chester says:

    I don’t disagree with you; I am a big fan of relational databases.

    “Some of my best friends are relational databases!” :)

    For example, I work in a firm that does HR software. The next generation of product is currently being designed and I am definitely keen on using an RDBMS as the backing store and authoritative data model.

  9. David Rubie says:

    Object databases are my worst nightmare. We played with a few of them in the early nineties with Smalltalk and C++ – oh the humanity!

    All of them were based entirely on “navigation” like the old IMS databases of crusty programmer nightmare – if you didn’t know where to start, you were stuffed. If you couldn’t remember the record layout, you were stuffed. If somebody accidentally broke the only link to a big table of stuff, you were stuffed. Those old guys with long, grey beards know a thing or two if you come across them.

    Some of the truly awful object DBs (not naming names) stored memory snapshots of your objects, so if you wanted to try sharing between ST and C++ or a PC and a Sun you were SOL. Some of them fell to bits when you changed your object as the object versioning stuff was either broken or missing. The good ones (that stored things in an architecture and language independent way) were like molasses and were fundamentally untuneable.

    Then, one of the geniuses I worked with decided he could use the BLOB feature of Sybase to store objects if he shared the marshalling/unmarshalling code and had a library that checked the version of the object before loading it. That resulted in a runaway Smalltalk program that tried to store every object in a running image into the Sybase database. Same genius got knuckle rap, then went ahead and implemented it anyway with the broken feature (the bit that walked the object tree a leeetle too far when storing) sorta-kinda-fixed. Into production it went. Said application took 20 minutes to start after 3 months and as the data wasn’t indexable, it wasn’t tuneable and the Sybase DBA washed his hands of the team.

    Then, an unnamed junior programmer “fixed” that feature in a running, production environment and hosed the app completely, resulting in much hair pulling and me having to debug it. Live. With crowd.

    Object storage – just say no. Don’t get me started on CORBA as I will have to punch somebody.

  10. James A says:

    I’ll start out by saying Sun have what they call Cool Stack, a set of pre-optimised AMP packages for Solaris, so it’s not quite as dire as you portray (although they added lighttpd recently).

    The performance nightmare is pretty obvious in most Java web applications – in theory the application server is responsible for managing all that icky HTTP, database and authentication goo in one place. In practice they turn out to be poorly performing piles of shit that still have byzantine configuration/tuning. This might be because Java also encourages flexibility at each layer so it’s not really an integrated system, it’s still a katamari of discrete components.

    Most non-dedicated webservers (ie language libraries) tend to have poor performance – the way to performance is mod_perl, not use Apache; mod_python not import BaseHTTPServer. I think this is because the HTTP model is actually quite different to imperative programming, so na

  11. Jacques Chester says:

    All of them were based entirely on “navigation” like the old IMS databases of crusty programmer nightmare – if you didn’t know where to start, you were stuffed.

    I got to chatting with a professor of mine the other week about this. You’re quite right that most object databases are hiding the network-database nature of object graphs. My hunch is that the answer will actually be to bend object oriented languages to include relational theory, rather than treating RDBMSes as ugly, inconvenient but necessary datastores.

    As for your story — you should write it up and submit it to the Daily WTF.

  12. Jacques Chester says:

    I mean, to work with an SQL database you need to know the schema anyway.

  13. James A says:

    Yeah, but ORMs range from ActiveRecord, which will build objects from the schema, to ORMs which use the db as a glorified key-value store with no schema and do all the magic away from the SQL server, to things that make the Daily WTF. Even so, schema sucks in SQL, and it sucks in LDAP. Schema-less dbs do exist (CouchDB is the first one that comes to mind) but suck differently. So while I’m talking about things that don’t exist (some sort of non-sucky schema), I’d really like a database that does versioning natively. RDF is particularly bad at this … although some sort of file-based RDF store plugged into git would be hot. Hmm. Some sort of query language or ORM on top of small files stored in git would be hot.

  14. Your argument seems incoherent. On one hand you complain about coupling in the LAMP stack, and on the other hand you propose collapsing the layers in a way that creates tighter coupling.

    Not only that, but what you are proposing makes scalability a nightmare. Scaling the LAMP stack is usually easy:

    You add frontends as required. Once you hit database limits, you add replicas.

    A well written LAMP style application will handle this well. At some point you’ll add one or more caches to reduce load on the backends. And as with anything there’s a number of ways of tuning each element.

    That some LAMP based apps aren’t designed to handle this well is an application problem, not a problem with the separation.

    Now, there’s nothing wrong with, for example, linking the app into an HTTP library, as long as the interfaces are narrow and clearly defined, but the main reason people do that is when they use languages (like Ruby) where the startup cost for the interpreter is significant and/or where there’s poor support for the “traditional” model. In the Ruby community at least a lot of people have been clamoring for a mod_php like module for a while, and Passenger is promising to deliver at least part of that functionality for Rails, but even when using a standalone server, we tend to put things behind Apache, Nginx etc. – in a sense we’re adding a layer to the stack, not collapsing it.

    Revisiting the interface boundaries is useful, but you’re arguing about layering that is well established for a reason: data storage, for example, is separated out precisely because scaling and managing storage and processing call for vastly different approaches and have different pinch points.

    Separating out the data storage in a database (what type of database it is is secondary) allows you to scale each element based on when you run into limits, which is vital in a large scale environment. And nothing stops you from running it in a single process if you like – there’s always solutions like SQLite. Again the point of the separation is stable interfaces, not whether or not it runs in the same process.

  15. Jacques Chester says:

    Vidar;

    I don’t think you see my point. It’s not loosely coupled as it is now because I need to know the whole stack to scale up WordPress — I need to know MySQL replication and possibly get up to speed on memcached. Each individual component is coherent, which is a good design indicator; but they are still too tightly coupled for my taste.

    It’s been interesting getting feedback so far.

  16. gilmae says:

    Does WordPress only work with MySQL, or do you mean you need to know about database replication in general to scale up WordPress? I ask as someone who has willfully ignored WordPress as much as possible and knows almost nothing about it specifically.

  17. Jacques Chester says:

    It only works with MySQL.

  18. David Rubie says:

    Jacques wrote:

    I mean, to work with an SQL database you need to know the schema anyway.

    The neat thing about modern RDBMSs is they contain the meta-information to discover bits of the schema – not quite self describing, but good enough to be able to write generic DB management tools and externally walk relational dependencies. The whole “object/relational impedance mismatch” is just a load of hooey – while it’s non-trivial to write a good object/db layer, they can be found just about anywhere so you can treat it as a solved problem.

    James A wrote:

    although some sort of file-based RDF store plugged into git would be hot. Hmm. Some sort of query language or ORM on top of small files stored in git would be hot.

    James, it’s been done and it was called Pick – quite successful in its day (20 years ago it dropped off in competition with networked relational databases).

  19. Helen says:

    A non-technical example might be the old Russian way of running a store. Go to table one, get a numbered ticket. Go to table two, get the item on the ticket. Go to table three, ticket gets stamped and the item bagged. Go to table four, pay for item, then leave.

    That’s how the Lions club were doing things at the sausage sizzle at Bunnings the other day. Communists!

    Do you have time for a question from a mere front-end user? :-) Why does it take aeons for WordPress to save a post containing an embed link to a YouTube video? I mean, it’s only HTML innit? I don’t see why the nature of what it links to would affect anything. Maybe this has something to do with the problems you describe but I’m too technically clueless to see it.

    My workplace uses Smalltalk and the stories I could tell! We are right this moment rebuilding everything in Javascript. There is no support available for Smalltalk any more, it’s an orphan.

  20. gilmae says:

    I love javascript as much as the next web developer, but god I hope you meant to say Java.

  21. David Rubie says:

    Helen wrote:

    My workplace uses Smalltalk and the stories I could tell! We are right this moment rebuilding everything in Javascript. There is no support available for Smalltalk any more, it’s an orphan.

    It was terribly fashionable 10-15 years ago. But as gilmae said, I can’t see a Smalltalk system running in Javascript unless you’re talking about one of those funky systems that spat out web pages directly from a running Smalltalk program, in which case you’ll still need something else at the back doing the grunt work.

    I really, really miss the workspaces in Smalltalk, where you could just type a bit of code, say “do it” and look at the results. Makes incremental development trivial, but the whole system suffered from the tightly coupled syndrome. Since everything is running in one complete image, to deploy a program you throw things away (rather than in a normal environment where necessary things are included). Some developers just didn’t “get it” and struggled with the whole concept, never successfully packaged an image for deployment and relied on their cow-orkers to do it for them. These were smart people, but the whole environment is so alien it’s no wonder it withered (kinda like Pick which suffered in the same way).

  22. Helen says:

    Sorry, I did mean Java. I do know the difference, but am a bit dyslexic with the two.

  23. Jacques,

    For WordPress this may be the case, but you then used that as an argument against the LAMP style separation, when the problem is WordPress not LAMP.

    My point being that the LAMP separation is the bare minimum for a reasonably scalable web app – for a large setup you’re likely to end up with more layers, not fewer, simply because you will end up wanting to be able to scale out at the pinch points, and the web application server and the database at the very least are completely different in terms of how you scale them.

  24. James A says:

    I just came across http://oubiwann.blogspot.com/2008/05/mantissa-alternative-to-lamp.html; it might be what you’re talking about.

  25. James A says:

    http://blogs.awesomeplay.com/elanthis/archives/2008/05/28/464/ is a much more eloquent statement of PHP’s failings than I managed. It also makes the point you made in IRC, Jacques: “Unfortunately for all of us working with PHP professionally, the only thing PHP has going for it that any other languages don’t is that PHP comes as standard in pretty much every web hosting provider service out there.”

  26. Jacques Chester says:

    James;

    That’s the best critique of PHP I’ve ever seen. Mostly I have fixated on the usual polluted namespace and crappy typing as well as the other warts.
