January 26, 2008
@ 11:31 PM
David J. DeWitt and Michael Stonebraker are at it again. There was a lot of buzz on the internet after their previous post (here is what I had to say about it).
Their first point in the new post tries to counter the claim that MapReduce is not a database and so shouldn't be judged as one. They claim that it isn't a matter of apples and oranges but rather
"We are judging two approaches to analyzing massive amounts of information, even for less structured information."

The problem with that is they continue from there to define a problem in database terms and then show how MapReduce will not be as good as a database in solving it - well, duh.
The fact that isolated queries may run better in a pre-indexed database should come as no great surprise. As I noted in the previous post on the subject - MapReduce can be used to create the appropriate index or partition the data into smaller chunks that would be easier to use to answer the type of queries David and Michael mention.
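To make the indexing point concrete, here is a minimal single-process sketch in Python of building an inverted index in the map/reduce style. The function names (`map_terms`, `reduce_postings`, `build_index`) and the sample documents are made up for illustration; a real MapReduce framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict

def map_terms(doc_id, text):
    """Map step: emit a (term, doc_id) pair for every distinct term in a document."""
    for term in set(text.lower().split()):
        yield term, doc_id

def reduce_postings(term, doc_ids):
    """Reduce step: merge all document ids seen for a term into a sorted posting list."""
    return sorted(set(doc_ids))

def build_index(documents):
    """Stand-in for the framework: group map output by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for term, emitted_id in map_terms(doc_id, text):
            grouped[term].append(emitted_id)
    return {term: reduce_postings(term, ids) for term, ids in grouped.items()}

# Hypothetical sample data
documents = {
    "post1": "MapReduce builds the index",
    "post2": "databases query the index with MapReduce",
}
index = build_index(documents)
```

Once such an index exists, answering "which documents mention this term" is a lookup (`index["mapreduce"]`) rather than a full scan, which is exactly the kind of isolated query a pre-indexed database is good at.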
As Mark Chu-Carroll explains, Map/Reduce and databases don't solve the same kind of problems.

Also, what happens when the database is constantly updated?! I don't care how scientifically accurate the measurements are that say databases scale like nothing else. I am more comfortable with the empirical experience of companies like Amazon, Digg, Google and eBay, who found they had to shard their data to support their scalability needs rather than use distributed transactions or distributed databases.
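For readers who haven't seen sharding up close, here is a rough sketch of the idea in Python. It assumes simple hash-based sharding over in-memory dictionaries; `shard_for` and `ShardedStore` are hypothetical names, and this is not how any of the companies above actually implement it.

```python
import hashlib

def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard deterministically from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

class ShardedStore:
    """Each shard is an independent store; no transaction ever spans shards."""

    def __init__(self, shard_count: int):
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)

store = ShardedStore(shard_count=4)
store.put("user:1", "alice")
store.put("user:2", "bob")
```

Every key lives on exactly one shard, so reads and writes for a key touch a single store. The price is that anything spanning shards (joins, multi-key transactions) becomes the application's problem.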


 
This post is part of a series of posts trying to define SOA as an architectural style. In the previous post I talked about how SOA builds on the Client/Server architectural style. In this post I'll talk about how SOA builds on the architectural style of Layered System.

Layered System, or the Layered architectural style, is one of the most basic and widely used architectural styles. Here is a definition of Layered architecture I posted in the past:
The layered style is composed of layers (the components), each of which provides facilities and has a specific role. The layers have communication paths / dependencies (the connectors).

In a layered style a layer has some limitations on how it can communicate with other layers (the constraints). Typically a layer is allowed to call only the layer below it and be called only by the layer above it (but there are variants, e.g. a layer can call any layer below it, etc. - all is fine as long as the layers' communication paths are limited and restricted by some rules).
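The constraint is easy to state as code. The sketch below, in Python, checks whether a call between two layers is allowed under the strict rule (only the layer directly below) or the relaxed variant (any layer below). The layer names and the `call_allowed` function are made up for illustration.

```python
# Hypothetical layers, ordered from top to bottom
LAYERS = ["presentation", "business", "data_access"]

def call_allowed(caller: str, callee: str, strict: bool = True) -> bool:
    """Return True if `caller` may call `callee` under the layering rules.

    strict=True  -> a layer may call only the layer directly below it
    strict=False -> a layer may call any layer below it
    """
    gap = LAYERS.index(callee) - LAYERS.index(caller)
    return gap == 1 if strict else gap >= 1
```

With these layers, `call_allowed("presentation", "data_access")` is rejected under the strict rule but accepted with `strict=False`, and a call upward (for example from `data_access` to `business`) is rejected in both variants.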
SOA takes the strict layers definition and restricts the knowledge one service has of other services to their interface/contract only. This means services cannot be aware of, or care about, the internal structure of other services. This helps with introducing the "boundaries are explicit" tenet (although it builds on more than just layering).

The layered nature of SOA means you can also add additional layers between the services. One very common example is adding a service bus (e.g. using an ESB or tools like NServiceBus); other examples include load balancers, firewalls (see the Service Firewall pattern), etc. Naturally, when you add intermediary layers, services don't talk to each other directly but rather consume the services (such as routing, message persistence, etc.) of the intermediary layer.
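As a toy illustration of an intermediary layer, here is an in-process sketch in Python. It is not the API of NServiceBus or of any ESB product; the `ServiceBus` class, the `OrderPlaced` message type and the billing handler are all hypothetical, and the in-memory `log` list merely stands in for message persistence.

```python
from collections import defaultdict

class ServiceBus:
    """Toy intermediary: routes messages by type and keeps a log of what was sent."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self.log = []  # stand-in for message persistence

    def subscribe(self, message_type, handler):
        """Register a service's handler for one message type."""
        self._handlers[message_type].append(handler)

    def publish(self, message_type, payload):
        """Record the message, then route it to every subscribed handler."""
        self.log.append((message_type, payload))
        return [handler(payload) for handler in self._handlers[message_type]]

bus = ServiceBus()
# The billing service only knows the message contract, not who sends it
bus.subscribe("OrderPlaced", lambda order: f"billing charged order {order['id']}")
# The ordering service only knows the bus, not who receives the message
replies = bus.publish("OrderPlaced", {"id": 42})
```

Neither service holds a reference to the other. Both depend only on the bus and on the message contract, which is the layering constraint at work.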

It should be noted that in the context of SOA the layers are, in most cases, actually tiers. The difference is that tiers provide (potential) physical separation whereas layers provide logical separation. When a layer is actually a tier, it has extensive implications for the level of trust between the tiers (see my post "Tier is a natural boundary" for more details).

The next post in the series will talk about the "Pipes and Filters" style and SOA. This is the first place where the REST architectural style and SOA diverge.


 
David DeWitt and Michael Stonebraker write about MapReduce in "The Database Column". Now, I usually like what Michael Stonebraker writes (e.g. his piece on the RDBMS demise, which I also wrote about myself). However, I can't say that this time around.
David and Michael write that MapReduce is a big step backwards. Before I talk about what they write, here is a (very high level) reminder of what Map/Reduce is:
MapReduce, as Google's Jeffrey Dean and Sanjay Ghemawat explain, is a way to get automatic parallelization and distribution along with fault tolerance, monitoring and I/O scheduling for tasks that need to work on complete datasets. MapReduce uses two functions:
  • Map - multiple instances of which run in parallel, each processing a key/value pair and producing a set of grouping key(s) and intermediate values.
  • Reduce - which runs per grouping key and merges the intermediate values into a set of merged outputs (usually one).
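The classic illustration of the two functions is counting words. Below is a minimal single-process sketch in Python; `run_mapreduce` is a hypothetical stand-in for the framework (the real thing adds the parallelization, distribution and fault tolerance mentioned above), and the function names and sample records are made up.

```python
from collections import defaultdict

def map_words(key, value):
    """Map: take one (document name, text) pair and emit (word, 1) per word."""
    for word in value.lower().split():
        yield word, 1

def reduce_counts(key, values):
    """Reduce: merge all intermediate values for one grouping key into one output."""
    return sum(values)

def run_mapreduce(records, mapper, reducer):
    """Sequential stand-in for the framework: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical input records
records = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog")]
counts = run_mapreduce(records, map_words, reduce_counts)
```

Note that the mapper and reducer know nothing about how the work is distributed. That separation is what lets a framework run many map instances in parallel and one reduce per grouping key.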
David and Michael claim that MapReduce is:
1. a step backwards because it doesn't build on Schema
2. a poor implementation because it doesn't use indexes
3. not new
4. missing features - like bulk load, indexing, updates, transactions, integrity constraints, referential integrity, views
5. incompatible with DBMS tools - like report writers, BI tools, replication tools, design tools

Well, if anything, it seems that David and Michael don't really understand what MapReduce is. As I noted above, MapReduce is a way to go over complete sets in an efficient, distributed manner. In fact it can even be used to build the index of a traditional RDBMS. It isn't really competing with databases, relational or otherwise. Yep, comparing MapReduce and databases is the apples and oranges thing...

I guess they might have meant to talk about another Google tool called BigTable, which is at least sort of a column database (Michael's company also makes a column database) for storing structured data in a highly distributed, high-performance way. However, David and Michael would still be wrong: BigTable is proprietary and targeted at a specific purpose, so it isn't supposed to solve the same problems as a general-purpose database. Not to mention that it is highly scalable (ever heard of Google's search engine ;) ) and does support things like indexes, updates etc.

Also, as I mentioned in the "RDBMS is dead" post, the internet proved that RDBMS features (like transactions etc.) can only scale so much. While databases focus on the Consistency and Availability parts of the CAP conjecture and the ACID tenets, internet-scale systems pick Partition tolerance and Availability and the BASE tenets instead.


 
January 9, 2008
@ 10:51 PM
A while ago I wrote about use cases and user stories and how I use both, starting with user stories for ease of estimation and manageability and expanding them into use cases as a way to elicit more requirements.
I just read a recent blog (wiki) entry by Alistair Cockburn (of "Writing Effective Use Cases" fame) called "why I still use use cases".
Alistair explains that he sees (in the companies he consults for) three main problems with user stories:
    1. lack of context (what's the larger goal)
    2. no sense of completeness - that you covered all bases relating to a goal.
    3. no mechanism for looking ahead at upcoming work.


I think the first problem is partially dealt with by using themes, although use cases provide more context if we also consider the brief description, preconditions and postconditions.
Also, I am not sure I agree with (or maybe I don't understand) the third one, since you can have a backlog of future work regardless of the requirements methodology you use.

The second point seems to be about the same reason I have for using use cases - elicitation of requirements. Indeed, you can see that by reading reasons 3 and 4 (of the 5 reasons Alistair mentions for still using use cases):
"
3. The extension conditions of each use case provide the requirements analysts a framework for investigating all the little, niggling things that somehow take up 80% of the development time and budget. It provides a look ahead mechanism, so the customer / product owner / business analyst can spot issues that are likely to take a long time to get answers for. These issues can and should then be put ahead of the schedule, so that the answers can be ready when the development team gets around to working on them. The use case extension conditions are the second part of the completeness question.

4. The use case extension scenario fragments provide answers to the many detailed, often tricky business questions programmers ask: "What are we supposed to do in this case?" (which is normally answered by, "I don't know, I've never thought about that case.") In other words, it is a thinking / documentation framework that matches the if...then...else statement that helps the programmers think through issues. Except it is done at investigation time, not programming time. "

Alistair also says that working with short iterations means you have to break the use case into stories:
These days, iteration/sprint lengths are so short that it is not practical to implement an entire use case in just one of them. That means additional work is needed, to create user stories or backlog items for each use case, track that each one get developed, and ensure that the complete set of user stories or backlog items do indeed deliver the subset of the use cases needed for the particular release.

I think that working only from the use case toward stories is limiting, since it is often easier to think up a user story and figure out the larger context later on (along with the other details of the use case). Also, as I mentioned in the blog entry referenced above, user stories are easier to estimate, not just easier to build.



 
Tags: Agile | Requirements

Sam Gentile and I exchanged a few blog posts on the definition of SOA. In the latest installment Sam disagrees with me that SOA should first be looked at in the pure architectural sense, without bundling in the business and enterprise aspects.
In a nutshell, I have two main reasons to prefer looking at SOA, at its core, as a pure architectural style.
The first is that when you bundle in enterprise-wide aspects of implementing SOA you lose out on the option (or the audience) of using it to solve more local problems (i.e. at the product/solution level) with the same principles that bring the benefits at the enterprise scale.
The other reason I have for separating the concepts is that the business-encompassing definitions tend to be fluid, hand-waving ones that cannot be measured for compliance.
Consider the definitions Sam quotes from Thomas Erl's books:
Consider the definitions Sam quotes from  Thomas Erl's books:
"SOA establishes an architectural model that aims to embrace the efficiency, agility, and productivity of an enterprise by positioning services as the primary means through which solution logic is represented in support of the realization of strategic goals associated with service-oriented computing." (emphasis by Sam)

SOA represents a model in which functionality is decomposed into small, distinct units (services), which can be distributed over a network and can be combined together and reused to create business applications. [3]
Now what the hell is that? These are all noble goals, but shouldn't this be the goal of any enterprise architecture? What makes SOA unique in this sense?
Also, how do these definitions help us build services? What makes a service a service? Why is (or isn't) any web-enabled component a service?
Definitions that distance themselves from the architectural roots seem to me like smoke and mirrors and contribute to the general confusion around SOA - to the point where even people like Harry Pierson wonder why we should even bother defining it.

Personally, I still think it is worthwhile defining *** (the architectural style formerly known as SOA) since, as I mentioned earlier, it is (in my opinion) a useful architectural style for building distributed systems - whether the distributed system is a solution, a product, a product line or a complete enterprise.