A quick nerdy question
Posted by Jacques Chester on Thursday, August 7, 2008
Poking around Wordpress source code, I ran across the database schema, which includes these gems:
user_login varchar(60) NOT NULL default ``, user_pass varchar(64) NOT NULL default ``,
I’d be interested to learn why applying a NOT NULL constraint, immediately followed by a default value of an empty string, makes any goddamn sense.
This entry was posted on Thursday, August 7th, 2008 at 12:15 PM and filed under Geeky Musings, IT and Internet.
Follow comments here with the RSS 2.0 feed.
Apologies. Comments and trackbacks are both currently closed.

It doesn’t even sound legal. Wordpress uses MySQL, doesn’t it? I don’t know much about it (I use Oracle at work), but I’d expect the DBMS to complain once the table definition had been submitted to it.
Posted on 07-Aug-08 at 1:18 pm | PermalinkIt means that it can’t be null, and if you enter nothing, the default value will be null, so don’t allow the save.
Posted on 07-Aug-08 at 1:21 pm | PermalinkMS SQL Server is quite happy to allow you to create a table like that. And it will allow you to do inserts that only insert one coulmn, using the default for the other column. And if you explicitly try and insert a null, it will error.
I can barely even remember PHP and its mysql functions, but does it throw an exception if you try and do things with a database null that are perfectly acceptable with a string, like comparisons, &c? C# does, it will whine like a leftie in a Starbucks if you try and cast DBNull into a string. If you leave off the NOT NULL and then try and insert a NULL, it accepts the NULL, it doesn’t use the default.
So if a
Posted on 07-Aug-08 at 1:39 pm | Permalinklazyefficient developer is trying to avoid having to test every single cell from a dataset against DBNull.Value before casting it, they would use NOT NULL default ” to make sure a NULL never gets in.gilmae, I think you’ve nailed it. I’ve got a feeling that NULL is not the same as ” (generally), so the field will be forced to hold an empty string (which can be used in string comparisons) if you enter nothing, rather than NULL which can’t be compared with anything in a SELECT.
I’m just starting to recall everything Chris Date wrote about NULLs as harmful …
Posted on 07-Aug-08 at 2:08 pm | PermalinkStill, on reflection why would you want an empty string as a user id, since NOT NULL disallows entering nothing anyway. You can also (at least in Oracle) force a value to be entered by making that column (part of) the primary key for the table.
Posted on 07-Aug-08 at 2:12 pm | Permalinkempty string and NULL are different.
NULL means that nothing has been added, nothing provided, you cant join on NULL
An empty string is just that – data – a string, and empty string to be sure, but subtly different from nothing
Posted on 07-Aug-08 at 2:19 pm | PermalinkIndeed if you do an ALTER TABLE to modify a string field to be NOT NULL, it’s generally necessary to include something like DEFAULT ” after it so that the RDBMS knows what to do with any exist NULL values.
Posted on 07-Aug-08 at 4:05 pm | PermalinkSo you’ve really got it in for WordPress, huh?
Posted on 07-Aug-08 at 5:40 pm | PermalinkOh boy – another front opens in the nulls are not null-strings holy war.
My perfectly rational position on this – the only rational position – I shall keep to myself. So there.
Posted on 07-Aug-08 at 7:28 pm | PermalinkSo it’s a workaround to avoid the entire point of having a NOT NULL field in the first place? Nice.
Posted on 07-Aug-08 at 7:29 pm | PermalinkPurity never survives first contact with the project plan, Jacques.
Posted on 07-Aug-08 at 7:45 pm | PermalinkOr contact with PHP programmers.
Posted on 07-Aug-08 at 7:49 pm | PermalinkIIRC you can do a select on the string ‘NULL’ in MySQL and get back a listing of all the null entries in that column for the table.
Posted on 08-Aug-08 at 2:38 am | PermalinkIn Java and C++ null is what you get when you try to use a reference before you have allocated memory, i.e. called “new”. It like saying “Look at that thing over there” but there is not thing over there yet there is “thing”.
Posted on 08-Aug-08 at 8:26 am | PermalinkNULL in SQL is quite different from a null pointer (or whatever) in Java or C++, pbrosnan. In SQL, it means “we don’t care what the value is, or at least we don’t know it, so it has no value.” It causes enormous problems, as you have to have really ugly WHERE clauses like “WHERE emp_name = ‘Smith’ OR emp_name IS NULL”, particularly when you’re joining a bunch of tables. See any one of an almost infinite number of books by Chris Date on why NULL is harmful, if you can stay awake through the experience. (He’s right, but extremely tedious about it.)
Despite my early brain-fart yesterday, I have no fucking idea why they included ” as a default value, as the NOT NULL constraint stops you entering an empty string anyway. Maybe it’s something to do with early versions of MySQL being really crap at implementing the relational model.
Posted on 08-Aug-08 at 2:45 pm | PermalinkI think that part of the problem with NULLs is that they get used for sentinel values, but they have a kind of anti-semantics in that they don’t mean anything. In practice a field with a NULL could actually be a NULL, or it could mean there was no data to enter, or it might mean something application-specific like “This person no longer works here”.
If you allow NULLs in your model there is no way to distinguish these potential meanings (NULL? Empty field? Sentinel value? Meaningful business rule?) within the model. You need to define it elsewhere. I think that’s the gist of Chris Date’s tedious harping about it.
In practice NOT NULL is often used to force data entry; what it really means is “empty field not OK”. Which is why this SQL — which forbids NULLs but then allows a meaningless empty string to be entered — sorely puzzles me.
It could well be a lingering MySQLism. Wordpress retain compatibility with MySQL 4 as I recall. It’s also a fun fact that they use the TEXT datatype for several fields which the non-integrity-maintaining MyISAM engine supports and the integrity-protecting InnoDB engine doesn’t support.
So you can have TEXT fields, or referential integrity, but you can’t have both. Seriously.
Posted on 08-Aug-08 at 2:56 pm | PermalinkActually not quite – I sometimes define fields as NOT NULL DEFAULT ” because it’s easier to query data when there’s no possibility of NULLs. It also takes slightly less storage and can query slightly faster.
Posted on 08-Aug-08 at 4:19 pm | PermalinkSo another possible meaning — performance hack.
Posted on 08-Aug-08 at 4:29 pm | PermalinkChrist on a bike! They used MyISAM? Gak. (Actually I heard an early implementation was flat files, but that may be a canard.)
ISAM is fast, but (as you point out, Jacques) you can chuck data integrity out the window. The system I manage at work is basically ISAM shoehorned onto an Oracle database – all the performance disadvantages of a RDBMS, with none of the referential integrity advantages. I hate it.
Posted on 08-Aug-08 at 8:17 pm | PermalinkWhen you actually use NULL for what it was designed for, it works well. Unfortunately, many people don’t.
My understanding is that the most widely accepted principle is that the NULL value is only ever to be used when data is missing, either because it is unknown or inapplicable. Although it is possible to use NULL for something application specific, most SQL programmers would consider it dirty practice (and semantically wrong use of NULL).
Thus, finding NULL in a string value implies that we do not know enough about this value, and are unable draw any implications from it. Finding an empty string implies that we do actually know this value is empty. All part of the complex modeling of uncertainty based on ternary logic (true / false / unknown). Many operations in SQL treat NULL as a special case, such as index generation and uniqueness enforcement. Different vendors deliver subtly different semantics for these special cases and programmers get tricked by it all the time.
Excluding NULL from your data is almost always a good idea, except when you have a genuine need to model missing and/or unknown data. NULL can never match a join and will never match a “normal” selection operation, consider:
This selection will exclude the rows where x is NULL, because an unknown will always remain unknown and we do not have the confidence of being sure of either anything that it is, or anything that it isn’t. Thus special “IS NULL” and “IS NOT NULL” clauses need to be introduced into the SELECT, just to handle those cases.
Posted on 08-Aug-08 at 9:27 pm | PermalinkTel;
“What NULLs are meant for” sounds a lot like “pointers are safe in the right hands”. Quite true, all of it, but unfortunately all drowned out by the hollow booming sound of feet being blown off.
Posted on 08-Aug-08 at 11:47 pm | PermalinkAaaah, holy wars.
I’m with Tel, one of those semantic nazis who dislikes the typical SQL interpretation (which even Oracle uses) that treats a zero-length string as a null, MOST of the time.
Consider the semantic difference between values for a person’s middle name:
* Null = Dunno what it is
* ZeroLength = There is NO middle name
Tel is correct that any operation involving an unknown operand (apart from IS NOT unknown or IS unknown) is undecideable. What annoys the hell out of me is the semantic conflict between the following:
select 1 + null from dual (returns null);
select concat(null,’B') from dual; (returns ‘B’);
Personally, with a varchar, I’d implement the internals such that a non-negative value in the length component implies a known value, while a negative length implies an unknown.
This would let the following statements to return something semantically correct:
* select 1 from dual where null=”
** No rows returned (patently false)
* select 1 from dual where null=null
** No rows returned (undecideable)
* select 1 from dual where ”=”
** 1 returned (because two knowns are identical)
Has anyone got a copy of Codd and/or Date handy?
I’d also argue that we need another “type” of NULL, to differentiate an unknown value from one that (in context) requires INVALID.
e.g. “Does that thing smell like an orange?”
* If thing is an orange, then answer is yes
* If thing is ammonia, then answer is no
* If you haven’t smelt it, but it CAN be smelt, then answer is null
* If thing cannot be smelt (it is the COLOR orange) then answer is “INVALID”
(And don’t mention synaethetes)
Putting this in arithmetic, we should have 4-state logic, not merely 3-state.
Posted on 09-Aug-08 at 10:10 am | Permalink* Q: Does 2=3 A: No
* Q: Does 2=2 A: Yes
* Q: Does 2=x A: Dunno, but I might be able to figure it out
* Q: Does 2/0 = 3/(2*0) A: Do you want me to go bonkers like Cantor?
Dave;
If you eliminate NULL then the DBA has to explicitly work out the boundary cases rather than just punting to the semantically wishy-washy NULL. I’m fairly sure that’s the gist of Date’s complaint.
Posted on 09-Aug-08 at 12:44 pm | PermalinkJacques
I’m not arguing that NULL is extremely useful: I dunno how I’d live without it (especially in triggers so I can set up decent COMPUTED default values rather than static declarations).
I’m just annoyed by what you rightly call “wishy washy semantics”, especially as it’s hard to get around the problem of identities (zero length strings) not being able to equal each other, or compare lengths of strings properly. I wish there were more functions to deal with this elegantly in SQL, especially with varchars.
Indeed, if anything, I’m arguing for another (“It is invalid to have a value here”) value. And the thought of going back over the SQL systems I written still in production (one approaching its 21st birthday) to deal with the nuances of the INVALID value fills me with dread.
And give me postgres where check constraints etc can reference user-written (rather than just inbuilt) functions that can deal with the gories of interpreting values in context. (Have you tried pg’s WHERE clause that can be used with a CREATE UNIQUE INDEX statement – e.g. unique on something where “end date” is null? Miss that when I’m designing for everything else. http://www.postgresql.org/docs/8.3/static/indexes-partial.html – even if it makes Codd roll in his grave.)
Posted on 09-Aug-08 at 4:04 pm | PermalinkYou certainly couldn’t live without NULLs; it is essential for foreign key fields where the relationship is empty.
Posted on 09-Aug-08 at 4:57 pm | PermalinkJacques,
I dunno, the crowd I hang with all feel comfortable with pointers. I’ve run into people who think they can get things right first time without any testing because they use a “safe” language (or a “strong typed” language, or something similar) and that makes me plenty nervous. When someone gets a device driver working without the use of any pointers, I’ll be impressed.
Putting it a different way, you go build your idiot proof system, I’ll go find a better class of idiot… I think we know who’s gonna win
Dave,
PostgreSQL is more semantically strict than Oracle. The PostgreSQL guys take their three value logic seriously. If you concatenate strings (not with the CONCAT function but with the “||” operator) then a single NULL anywhere in the concatenation will wipe the slate clean and give a NULL output. Also, there’s a few other tricks:
select NULL and true; gives NULL
select NULL or true; gives true
select NULL and false; gives false
select NULL or false; gives NULL
There’s good reasons for all of the above, but still many people get confused by it. Modeling uncertainty really is difficult, and each new logic value that you add to the system, makes it more complex and results in more arguable special cases.
Posted on 09-Aug-08 at 5:10 pm | PermalinkTel;
Would you be more or less nervous if they were programming in C?
But your point is well made.
Posted on 09-Aug-08 at 5:16 pm | PermalinkAs for PostgreSQL, I’ve been using it on a new project I’m developing. It’s nice to have CHECK constraints!
It occurs to me that I recently read an essay about databases which talked about how relational models struggle to cope with uncertainty — the partial solution discussed was a Lisp OODB with Prolog queries
.
Posted on 09-Aug-08 at 5:18 pm | PermalinkOODB are things of pure evil. (Again, see Date for a tediously detailed explanation.) You basically end up with the CODASYL model (shudder) which is what the relational model was designed to specifically avoid. Goodbye referential integrity, hello pointer-chasing.
The relational model, if properly implemented (which excludes 99.9% of implementations that actually exist) can deal with uncertainty quite well. Although http://www.dbdebunk.com/index.html appears to be abandoned, it explains things quite well if you dig around a bit.
Posted on 13-Aug-08 at 11:10 am | PermalinkBugger. Screwed up the html. Should be
I think.
Posted on 13-Aug-08 at 11:11 am | Permalink