January 26, 2008
@ 11:31 PM
David J. DeWitt and Michael Stonebraker are at it again. There was a lot of buzz on the internet after their previous post (here is what I had to say about it).
In the new post, their first point tries to counter the claim that MapReduce is not a database and therefore shouldn't be judged as one. They argue that this isn't a matter of apples and oranges; rather,
"We are judging two approaches to analyzing massive amounts of information, even for less structured information."

The problem is that they then go on to define a problem in database terms and show that MapReduce doesn't solve it as well as a database does. Well, duh.
It should come as no great surprise that isolated queries may run faster against a pre-indexed database. As I noted in my previous post on the subject, MapReduce can be used to create the appropriate index, or to partition the data into smaller chunks that make it easier to answer the kind of queries David and Michael mention.
As Mark Chu-Carroll explains, MapReduce and databases don't solve the same kind of problems.
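To make the point about using MapReduce to build an index concrete, here is a minimal single-process sketch in Python. It assumes a toy dataset of (record_id, text) pairs and is not Hadoop's or Google's API: map_phase, partition, shuffle, reduce_phase, build_index and lookup are hypothetical names chosen for illustration. The map step emits (term, record_id) pairs, a hash partitioner splits them into chunks, and the reduce step turns each chunk into an inverted index, so a later point query only has to touch one small partition.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input format: (record_id, text) pairs standing in for raw, unindexed data.
Record = Tuple[int, str]
Partition = Dict[str, List[int]]


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    """Map: emit one (term, record_id) pair for every word in every record."""
    for record_id, text in records:
        for term in text.lower().split():
            yield term, record_id


def partition(term: str, num_partitions: int) -> int:
    """Pick the partition that owns a term.

    zlib.crc32 is used because it is stable across runs, unlike the built-in
    hash() for strings, which is randomized per process.
    """
    return zlib.crc32(term.encode("utf-8")) % num_partitions


def shuffle(pairs: Iterable[Tuple[str, int]], num_partitions: int) -> List[Partition]:
    """Shuffle: group record ids by term, routing each term to exactly one partition."""
    groups: List[Partition] = [defaultdict(list) for _ in range(num_partitions)]
    for term, record_id in pairs:
        groups[partition(term, num_partitions)][term].append(record_id)
    return groups


def reduce_phase(grouped: Partition) -> Partition:
    """Reduce: turn each term's raw ids into a sorted, de-duplicated posting list."""
    return {term: sorted(set(ids)) for term, ids in grouped.items()}


def build_index(records: Iterable[Record], num_partitions: int = 2) -> List[Partition]:
    """Run map -> shuffle -> reduce and return one inverted-index chunk per partition."""
    return [reduce_phase(g) for g in shuffle(map_phase(records), num_partitions)]


def lookup(index: List[Partition], term: str) -> List[int]:
    """Answer a point query by reading only the partition that owns the term."""
    key = term.lower()
    chunk = index[partition(key, len(index))]
    return chunk.get(key, [])


if __name__ == "__main__":
    data = [(1, "map reduce scales"), (2, "databases use indexes"), (3, "map data into indexes")]
    idx = build_index(data, num_partitions=2)
    print(lookup(idx, "map"))      # [1, 3]
    print(lookup(idx, "indexes"))  # [2, 3]
```

The trade-off is that the full MapReduce pass is paid up front, in batch; after that, a query like lookup(idx, "map") reads a single partition instead of scanning every record.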

Also, what happens when the database is constantly updated?! I don't care how scientifically accurate the measurements claiming that databases scale like nothing else are. I am more comfortable relying on the empirical experience of companies like Amazon, Digg, Google and eBay, which found that they had to shard their data to meet their scalability needs rather than use distributed transactions or distributed databases.
