Dare Obasanjo on Amazon SimpleDB:
It is interesting to imagine how this system evolved. From experience, it is clear to everyone who has had to build a massive relational database that database joins kill performance. The longer you’ve dealt with massive data sets, the more you begin to fall in love with denormalizing your data so you can scale. Taking this to its logical extreme, there’s nothing more denormalized than a single table. Even better, Amazon goes a step further by introducing multivalued columns, which means that SimpleDB isn’t even in First Normal Form, whereas we all learned in school that the minimum we should aspire to is Third Normal Form.
Uh-huh! As I see more and more databases, I reach the WTF point quicker and quicker. And I haven’t seen enough yet. But so far I couldn’t agree more with this observation.
On every project I’ve worked on, the common theme of performance problems has been database joins, and obviously the bigger the database, the bigger the performance problem. Not only do the joins cause performance problems, but inevitably the solution architect comes in to save the day and starts replicating data across tables to denormalize it and improve performance, which then obviously leads to application bugs.
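Here is roughly the kind of bug I mean, as a minimal sketch. The `customers` and `orders` tables and the copied `customer_name` column are made up for illustration, and SQLite is only standing in for whatever database you actually run.

```python
import sqlite3

# In-memory database with hypothetical tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    -- customer_name is copied into orders so reads can skip the join
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        customer_name TEXT
    );
    INSERT INTO customers VALUES (1, 'Acme');
    INSERT INTO orders VALUES (100, 1, 'Acme');
""")

# The application renames the customer but forgets the copy in orders.
conn.execute("UPDATE customers SET name = 'Acme Corp' WHERE id = 1")

# Find orders whose copied name no longer matches the source of truth.
stale_rows = conn.execute("""
    SELECT o.id, o.customer_name, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.customer_name <> c.name
""").fetchall()

print(stale_rows)  # [(100, 'Acme', 'Acme Corp')]
```

Every copy of a value is one more place the application has to remember to update, and the database will not complain when it forgets.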
The database I currently work with has about 1000 tables. And I’m dumbfounded as to why.
Just put all the shit (okay, most of it) in one table.
Now most DBAs, in my observation, would cringe and argue against this ’til they’re blue in the face, but they need to stay employed so they have a decent incentive, I guess.
I think some of my most wasted class time in college (after 4 semesters of Spanish) was learning how to normalize tables, for 2 reasons: 1. I’ve never really used the skill, and 2. the solution to the problem is often to denormalize (after checking the explain plan, modifying the query and adding indexes).
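For what it's worth, the "check the explain plan, add an index" step looks something like this. It is a sketch that assumes SQLite and a made-up `orders` table; other databases have their own explain syntax, and the exact plan wording varies between SQLite versions, so I am not showing the output.

```python
import sqlite3

# Hypothetical orders table with some filler rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

QUERY = "SELECT total FROM orders WHERE customer_id = ?"


def query_plan(connection, sql, params):
    """Return SQLite's plan for a query as a single readable string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


# Without an index the filter has to scan the whole table.
plan_before = query_plan(conn, QUERY, (42,))

# Add an index on the filtered column and ask again.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, QUERY, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan should report a scan of `orders` and the second a search using `idx_orders_customer`. Only when that kind of fix stops helping does denormalizing come up.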
