Arnon Rotem-Gal-Oz's Cirrus Minor
"Making IT work" - Musings of a Holistic Architect
January 19, 2008
@ 09:39 PM
The DBMS vs. Map/Reduce - is that really a competition?
David DeWitt and Michael Stonebraker write about MapReduce in "The Database Column". Now I usually like what Michael Stonebraker writes (e.g. his piece on the RDBMS demise, which I also wrote about myself). However, I can't say that this time around.
David and Michael write that MapReduce is a big step backwards. Before I talk about what they write, here is a (very high-level) reminder of what Map/Reduce is.
MapReduce, as Google's Jeffrey Dean and Sanjay Ghemawat explain, is a way to get automatic parallelization and distribution along with fault tolerance, monitoring and I/O scheduling for tasks that need to work on complete datasets. MapReduce uses two functions:
Map - multiple instances of which run in parallel, each processing a key/value pair and producing a set of grouping key(s) and intermediate values.
Reduce - which runs once per grouping key and merges the intermediate values into a set of merged outputs (usually one).
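To make the two functions concrete, here is a minimal single-process sketch of the Map/Reduce flow described above, using the classic word-count task. It is only an illustration: the names map_reduce, count_words_map and count_words_reduce are made up for this example, and the sketch does none of the parallelization, distribution or fault tolerance that the real framework provides.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Toy, single-process stand-in for a MapReduce run (hypothetical helper)."""
    # Map phase: each input key/value pair yields (grouping key, intermediate value) pairs.
    groups = defaultdict(list)
    for key, value in records:
        for group_key, intermediate in mapper(key, value):
            groups[group_key].append(intermediate)
    # Reduce phase: one call per grouping key merges that key's intermediate values.
    return {group_key: reducer(group_key, values)
            for group_key, values in groups.items()}

def count_words_map(doc_name, text):
    # Emit (word, 1) for every word in the document.
    for word in text.lower().split():
        yield word, 1

def count_words_reduce(word, counts):
    # Merge the intermediate values into a single output: the total count.
    return sum(counts)

documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
word_counts = map_reduce(documents, count_words_map, count_words_reduce)
print(word_counts["the"], word_counts["fox"])  # prints: 3 2
```

In a real deployment the mapper calls would run on many machines at once and the grouping step would be a distributed shuffle; the shape of the two functions stays the same.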
David and Michael claim that MapReduce is:
1. a step backwards because it doesn't build on Schema
2. a poor implementation because it doesn't use indexes
3. not new
4. missing features - like bulk load, indexing, updates, transactions, integrity constraints, referential integrity, views
5. incompatible with DBMS tools - like report writers, BI tools, replication tools, design tools
Well, if anything, it seems that David and Michael don't really understand what MapReduce is. As I noted above, MapReduce is a way to go over complete datasets in an efficient, distributed manner. In fact, it can even be used to build the index of a traditional RDBMS. It isn't really competing with databases, relational or otherwise. Yep, comparing MapReduce and a database is the apples and oranges thing...
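As a rough illustration of the index-building remark, here is a small sketch that makes one pass over a complete table and produces a secondary index on one column, written in Map/Reduce style. Everything in it is assumed for the example: the rows, the "city" column and the names map_row, reduce_postings and build_index are hypothetical, and it runs in one process rather than on a cluster.

```python
from collections import defaultdict

def map_row(row_id, row):
    # Map: emit (indexed column value, row id) for each table row.
    yield row["city"], row_id

def reduce_postings(city, row_ids):
    # Reduce: merge all row ids for one column value into a sorted list.
    return sorted(row_ids)

def build_index(rows):
    """Single-process sketch of building an index with a map step and a reduce step."""
    grouped = defaultdict(list)
    for row_id, row in rows:
        for column_value, rid in map_row(row_id, row):
            grouped[column_value].append(rid)
    # Keep the index keys in sorted order, as an ordered index would.
    return {value: reduce_postings(value, ids)
            for value, ids in sorted(grouped.items())}

rows = [
    (1, {"name": "Ann", "city": "Haifa"}),
    (2, {"name": "Bob", "city": "Boston"}),
    (3, {"name": "Cid", "city": "Haifa"}),
]
city_index = build_index(rows)
print(city_index)  # prints: {'Boston': [2], 'Haifa': [1, 3]}
```

The point is that the index is an output of a full scan over the data, which is the kind of job MapReduce is meant for; serving lookups from that index is a separate concern.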
I guess they might have meant to talk about another Google tool called BigTable, which is at least sort of a column database (Michael's company also makes a column database) for storing structured data in a highly distributed, high-performance way. However, David and Michael would still be wrong: BigTable is proprietary and targeted at a specific purpose, so it isn't supposed to solve the same problems as a general-purpose database. Not to mention that it is highly scalable (ever heard of Google's search engine? ;) ) and does support things like indexes, updates etc.
Also, as I mentioned in the "RDBMS is dead" post, the internet proved that RDBMS features (like transactions etc.) can only scale so much. While databases focus on the Consistency and Availability parts of the CAP conjecture and on the ACID tenets, internet-scale systems pick Partition tolerance and Availability and the BASE tenets instead.
Tags: data | scalability | Software Architecture