OK, now that I've got your attention: no, it isn't dead yet - but we can
see a whole class of applications (maybe a couple of classes) where the
importance of the RDBMS as we know it today is greatly diminished.
In an article I posted recently on InfoQ (which I also mentioned in the
post on eBay architecture last week), I discussed the notion of database
denormalization on internet-scale sites (such as Amazon, eBay, Flickr
etc.). One case for denormalization is immutable data, where there isn't
a lot to gain from normalization to begin with.
The other thing is entity representation vs. speed. The problem is that
joins are slow, and sometimes you get into corners where, if you want any
kind of decent speed, you need to denormalize.
Todd Hoff notes that as well:
The problem is joins are relatively slow, especially over very large
data sets, and if they are slow your website is slow. It takes a long
time to get all those separate bits of information off disk and put
them all together again. Flickr decided to denormalize because it took
13 Selects to each Insert, Delete or Update.
The point is, however, that these "corner cases" are becoming more and
more prevalent even in smaller-scale applications - especially when you
have complex entities (as is the case with defense systems, for example).
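The trade-off described above can be sketched with a tiny in-memory
SQLite database. This is only an illustration, not how any of the sites
mentioned actually store their data: the table names (users, photos,
photo_view) and the helper functions are made up for the example. The
normalized read needs a join; the denormalized read hits a single table,
at the price of having to update every duplicated copy on a write.

```python
import sqlite3

# Hypothetical schema, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE photos (id INTEGER PRIMARY KEY, owner_id INTEGER, title TEXT);
    -- Denormalized copy: the owner's name is duplicated into every row.
    CREATE TABLE photo_view (photo_id INTEGER PRIMARY KEY,
                             title TEXT, owner_name TEXT);
""")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO photos VALUES (10, 1, 'sunset')")
conn.execute("INSERT INTO photo_view VALUES (10, 'sunset', 'alice')")


def read_normalized(photo_id):
    """Read path that has to join two tables."""
    return conn.execute(
        "SELECT p.title, u.name FROM photos p "
        "JOIN users u ON u.id = p.owner_id WHERE p.id = ?",
        (photo_id,),
    ).fetchone()


def read_denormalized(photo_id):
    """Read path that touches one table only."""
    return conn.execute(
        "SELECT title, owner_name FROM photo_view WHERE photo_id = ?",
        (photo_id,),
    ).fetchone()


def rename_user(user_id, new_name):
    """Write path: the duplicated name must be fixed in both places."""
    conn.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
    conn.execute(
        "UPDATE photo_view SET owner_name = ? WHERE photo_id IN "
        "(SELECT id FROM photos WHERE owner_id = ?)",
        (new_name, user_id),
    )
```

Both read functions return the same row; the difference only shows up
in how much work each read and each write has to do.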
Mats Helander recently wrote a post about saving to a BLOB and only
adding fields as needed for indexing and identity purposes. Mats also
suggests the semi-transparent approach of using XML columns, where the
database can do something with the otherwise opaque data.
This point, in fact, demonstrates that the future of relational data is
indeed not totally secure, as we see that the leading databases are
beginning to treat XML data (which is hierarchical and not relational)
as a native citizen - to the point that we can even index XML data.
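A minimal sketch of the "save to a BLOB, add columns only for identity
and indexing" idea might look like the following. It is not taken from
Mats's post: the entities table, the kind column and the helper names
are assumptions made for the example, and JSON stands in for whatever
serialization format you would actually use.

```python
import json
import sqlite3

# Hypothetical table: only "id" (identity) and "kind" (indexing) are real
# columns; everything else lives in the opaque "body" blob.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, kind TEXT, body BLOB)")
conn.execute("CREATE INDEX idx_entities_kind ON entities (kind)")


def save(entity):
    """Serialize the whole entity and store it as one blob."""
    body = json.dumps(entity).encode("utf-8")
    conn.execute(
        "INSERT OR REPLACE INTO entities (id, kind, body) VALUES (?, ?, ?)",
        (entity["id"], entity["kind"], body),
    )


def load(entity_id):
    """Fetch by identity and rebuild the entity; None if it is missing."""
    row = conn.execute(
        "SELECT body FROM entities WHERE id = ?", (entity_id,)
    ).fetchone()
    return json.loads(row[0].decode("utf-8")) if row else None


def ids_of_kind(kind):
    """Query through the indexed column without opening any blob."""
    rows = conn.execute(
        "SELECT id FROM entities WHERE kind = ? ORDER BY id", (kind,)
    )
    return [r[0] for r in rows]
```

A complex entity is then read or written in one statement, with no
joins, while the database can still find rows by the few fields that
were promoted to columns.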
So far we've seen a trend to denormalize more and to handle non-relational data. What else? Ah, transactions.
I've worked on several systems where the data was constantly updated and
actually represented the system's view of the outside world, so the
focus was on availability and latency. This is, again, aligned with the
approach taken by the large internet sites, which emphasize eventual
consistency over immediate consistency.
In distributed systems crashes happen. The RDBMS is a show-stopper when
it comes to crashes - if we can't commit, we need to stop and roll back.
Then maybe we can start over. Is this acceptable? There are many
scenarios where it is not. I've seen it in defense systems, in
communications systems and even in e-commerce systems (if you are not
responsive, I'll just go to the competition).
What do you do in the presence of errors? Joe Armstrong suggests the following as the basis for
Erlang in his thesis:
To make a fault-tolerant software system which behaves reasonably in the presence of software errors we proceed as follows:
1. We organize the software into a hierarchy of tasks that the system
has to perform. Each task corresponds to the achievement of a number of
goals. The software for a given task has to try and achieve the goals
associated with the task. Tasks are ordered by complexity. The top
level task is the most complex, when all the goals in the top level
task can be achieved then the system should function perfectly. Lower
level tasks should still allow the system to function in an acceptable
manner, though it may offer a reduced level of service. The goals of a
lower level task should be easier to achieve than the goals of a higher
level task in the system.
2. We try to perform the top level task.
3. If an error is detected when trying to achieve a goal, we make an
attempt to correct the error. If we cannot correct the error we
immediately abort the current task and start performing a simpler task.
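The three steps above can be sketched roughly as follows. This is
Python rather than Erlang and is in no way Armstrong's implementation:
the run_with_fallback function is hypothetical, and modelling "an
attempt to correct the error" as a simple retry is an assumption made to
keep the example short.

```python
def run_with_fallback(tasks, retries=1):
    """Run the first task that succeeds.

    tasks   -- list of (name, callable), ordered from the most complex
               (full service) down to the simplest (degraded service).
    retries -- how many times to retry a failing task before giving up
               on it; the retry stands in for "correcting the error".
    """
    for name, task in tasks:
        for _attempt in range(retries + 1):
            try:
                return name, task()
            except Exception:
                # Correction failed (or has not been tried yet): retry,
                # and after the last attempt fall through to the next,
                # simpler task.
                pass
    raise RuntimeError("every task level failed")
```

For example, the top-level task could be "answer from the live
database" and the next level "answer from a local cache": the caller
still gets a response, just at a reduced level of service.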
On top of that, we try to keep any update local, i.e. within a task
boundary on the hardware where the task occurred - distributing the
transactions is not a good option. I outlined why when I talked about
SOA and cross-service transactions, but the reasoning holds here too.
Well, truth be told, the RDBMS is not dead, and its demise is probably
not even around the corner. Also, this does not mean that there aren't
any uses for a database. But that's true for other architectural choices
as well. Who ever said that a single-tier solution is not the right one
for very specific types of systems...
RDBMSs succeeded in becoming the de facto standard for building systems
because they offer some very compelling attributes - ACID brings a lot
of peace of mind. Large-scale systems, low-latency systems and
fault-tolerant systems opt for another set of compelling attributes
(BASE). The point is that when you design your next solution, maybe
conventional database thinking is something that you should at least
give another thought to, instead of just following dogma.