June 23, 2009
@ 10:03 PM

In one of my previous posts (REST: good, bad and ugly), I made a passing comment about how I think using CRUD in a RESTful service is a bad practice. I received a few comments and questions asking why I say that – so what’s wrong with CRUD and REST?

On the surface, it seems like a very good fit (both technically and architecturally). Scratch that surface, however, and you’ll see that it isn’t a good fit on either count.

REST over HTTP is the most common (almost the only) implementation of the REST architectural style, to the point that REST over HTTP has become synonymous with REST. I would say most of the people who think of REST in CRUD terms are thinking about a mapping of the HTTP verbs.

CRUD, which stands for Create, Read, Update and Delete, names the four basic database operations. Some of the HTTP verbs, namely POST, GET, PUT and DELETE (there are others, like OPTIONS or HEAD), seem to have a 1-1 mapping to CRUD. As I said earlier, they don’t. The table below briefly contrasts the HTTP verbs with CRUD:

| Verb    | CRUDdy candidate | Actually |
|---------|------------------|----------|
| GET     | SELECT (Read)    | Get a representation of a resource. While it is very similar to SELECT, it also has a few features beyond an out-of-the-box SELECT, e.g. by using If-Modified-Since (and similar modifiers) you might get an empty reply. |
| DELETE  | Delete           | Maps well. |
| PUT     | Update           | PUT looks like an update but it isn’t, since: (1) you have to provide a complete replacement for the resource (again, similar to update but not quite); (2) you can use PUT to create a resource (when the URI is set by the client). |
| POST    | Insert           | It can be used to create a resource, but it should be a child/subordinate one. Furthermore, it can be used to provide a partial update to a resource (i.e. not resulting in a new URI). |
| OPTIONS | ?                | Get the available ways to continue, considering the current state of the resource. |
| HEAD    | ?                | Get the headers or metadata about the resource (which you would otherwise GET). |

The way I see it, the HTTP verbs are more document oriented than database oriented (which is why document databases like CouchDB are seamlessly RESTful). In any event, what I tried to show here is that while you can update, delete and create new resources, the way you do that is not exactly CRUD in the database sense of the word, at least when it comes to using the HTTP verbs.
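To make the PUT/POST difference concrete, here is a minimal sketch of an in-memory store. The `ResourceStore` class, its method names and the URIs are all made up for illustration; this is not any framework's API, just the semantics described in the table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory resource store; names and URIs are illustrative only.
public class ResourceStore
{
    private readonly Dictionary<string, string> resources = new Dictionary<string, string>();
    private int nextId = 1;

    // PUT: the client names the URI and sends a complete representation.
    // Creates the resource if it is missing, replaces it otherwise.
    // Returns true when the resource was created, false when it was replaced.
    public bool Put(string uri, string representation)
    {
        bool created = !resources.ContainsKey(uri);
        resources[uri] = representation;
        return created;
    }

    // POST: the client addresses a parent resource; the server picks the URI
    // of the new subordinate resource and returns it.
    public string Post(string parentUri, string representation)
    {
        string uri = parentUri.TrimEnd('/') + "/" + nextId++;
        resources[uri] = representation;
        return uri;
    }

    // GET: returns the current representation, or null if there is none.
    public string Get(string uri)
    {
        string representation;
        return resources.TryGetValue(uri, out representation) ? representation : null;
    }

    public int Count { get { return resources.Count; } }
}
```

Repeating the same `Put` leaves exactly one resource behind, while repeating the same `Post` creates a second subordinate resource under a new URI. Neither behaves like a plain UPDATE or INSERT.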

However, the main reason CRUD is wrong for REST is an architectural one. One of the base characteristics(*) of REST is using hypermedia to externalize the state machine of the protocol (a.k.a. HATEOAS – Hypermedia as the Engine of Application State). The URI-to-URI transition is what makes the protocol tick (the transaction implementation by Alexandros discussed in the previous post shows a good example of following this principle).

Tim Ewald explains this  nicely (in a post from 2007…) :

… Here's what I came to understand. Every communication protocol has a state machine. For some protocols they are very simple, for others they are more complex. When you implement a protocol via RPC, you build methods that modify the state of the communication. That state is maintained as a black box at the endpoint. Because the protocol state is hidden, it is easy to get things wrong. For instance, you might call Process before calling Init. People have been looking for ways to avoid these problems by annotating interface type information for a long time, but I'm not aware of any mainstream solutions. The fact that the state of the protocol is encapsulated behind method invocations that modify that state in non-obvious ways also makes versioning interesting.

The essence of REST is to make the states of the protocol explicit and addressable by URIs. The current state of the protocol state machine is represented by the URI you just operated on and the state representation you retrieved. You change state by operating on the URI of the state you're moving to, making that your new state. A state's representation includes the links (arcs in the graph) to the other states that you can move to from the current state. This is exactly how browser based apps work, and there is no reason that your app's protocol can't work that way too. (The ATOM Publishing protocol is the canonical example, though its easy to think that its about entities, not a state machine.)

If you are busy inserting and updating (CRUDing) resources, you are not, in fact, thinking about protocols or externalizing a state machine and, in my opinion, you miss the whole point of REST.
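As a rough illustration of what "externalizing the state machine" means, the sketch below returns, for each state of an order, the links a client may follow next. The order workflow, the state names, the link relations and the URIs are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical order workflow; states, link relations and URIs are made up.
public static class OrderProtocol
{
    // Each state's representation carries the links (arcs) to the states
    // that can be reached from it. The client never hardcodes the workflow.
    public static IDictionary<string, string> LinksFor(string orderUri, string state)
    {
        var links = new Dictionary<string, string> { { "self", orderUri } };
        switch (state)
        {
            case "draft":
                links["submit"] = orderUri + "/submission";
                links["cancel"] = orderUri + "/cancellation";
                break;
            case "submitted":
                links["pay"] = orderUri + "/payment";
                links["cancel"] = orderUri + "/cancellation";
                break;
            case "paid":
                links["receipt"] = orderUri + "/receipt";
                break;
            // Any other state (e.g. "cancelled") offers only the self link.
        }
        return links;
    }
}
```

A client that only follows the links it is given cannot "call Process before calling Init": a draft order simply has no `pay` link to follow.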

CRUD services lead to, and promote, a “database as a service” kind of thinking (e.g. ADO.NET Data Services), which, as I explained in another post last year, is a bad idea since:

  1. It circumvents the whole idea about "Services" - there's no business logic.
  2. It is exposing internal database structure or data rather than a thought-out contract.
  3. It encourages bypassing real services and going straight to their data.
  4. It creates a blob service (the data source).
  5. It encourages minuscule demi-services (the multiple "interfaces" of said blob) that disregard a few of the fallacies of distributed computing.
  6. It is just client-server in sheep's clothing.

The main theme of this and the previous post is that if we try to drag REST into the same old, same old stuff we always did, we won’t really get that many benefits. In fact, the “old” ways of doing that stuff are probably more suitable for the job anyway, since they have been in use for a while now and they are “tried and tested” (“You can’t win an argument with an idiot, he’ll just drag you down to his level and beat you with experience” …). REST is just a different paradigm than RPC, ACID transactions and CRUD.


* I know I sound like a broken record on that, but our industry has a history of diluting terms to the point where they almost stop being useful (SOA comes to mind...). The way I see it, you can have 3 levels on your way to REST over HTTP:

  • You can be using HTTP and XML/JSON – this is level 1 or “Using standards”.
  • You can be using the HTTP verbs properly and/or applying document-oriented communications – this is level 2 or a “REST-like” interface.
  • You can conform to all REST constraints and be at level 3 or “RESTful”.

All levels can be useful and bring you merit, but only the 3rd is REST.


 
Tags: REST | SOA | Software Architecture | Trends

June 15, 2009
@ 11:10 PM

 

Yesterday I read an interesting paper called “RETRO: A RESTful Transaction Model”. On the good side, I have to say it is one of the best RESTful models I’ve seen thus far. The authors took special care to satisfy the different REST constraints, unlike many “RESTful” services (e.g. Twitter, which returns identifiers and not URIs). On the downside, I think a distributed transaction model is bad for REST, or in other words, I don’t see a reason for going through this effort and jumping through all these hoops.

Why?

For the same reasons transactions are wrong for SOA and WS-AtomicTransaction is wrong for SOAP web services:

  • A service boundary, RESTful or otherwise, is a trust boundary. Atomic transactions require holding locks, and holding them on behalf of a foreign service opens a security hole (it makes a denial-of-service attack much easier).
  • You cannot assume atomicity between two different entities or resources, especially when these resources belong to different businesses.
  • Transactions introduce coupling (at least in time).
  • Transactions hinder scalability – it isn’t that you can’t scale, but it is much harder.

For REST it is even worse. Since using hypermedia as the engine of state change means that the hypermedia actually describes the protocol, we clutter the business representations (the representations of real business entities like customer, order etc.) with transactional nitty-gritty. As the authors say:

“our model explicitly identifies locks, transactions, owners and conditional representations as explicit, linkable resources. In fact, every significant entity in our model is represented as a resource in order to comply with this constraint.”

This also means that programming the resources themselves will get much more complicated.

I think that if you want to reap the benefits of REST you should keep the protocol simple and focus on the business and technical merits you can get, rather than bog it all down with needless complexity. It seems to me that RETRO is a good mental exercise to show that transactions can be RESTful. I think, however, that it is overkill for RESTful implementations.

RESTful architectures will be better off with BASE (Basically Available, Soft state, Eventually consistent) and/or ACID2 (Associative, Commutative, Idempotent and Distributed) models – or at least the Saga model (which the authors intend to tackle next), which is a better candidate (IMHO) for achieving distributed consensus.
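To give a feel for the ACID2 properties, here is a small sketch of a stock counter that can take its updates in any order and more than once. The `StockLevel` class and the idea that the sender assigns an operation ID are assumptions for the example, not something taken from the RETRO paper.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stock counter; operation IDs are assumed to be assigned by the sender.
public class StockLevel
{
    private readonly HashSet<string> applied = new HashSet<string>();

    public int Quantity { get; private set; }

    // Addition is associative and commutative, so arrival order does not matter.
    // Remembering operation IDs makes a redelivered message harmless (idempotent).
    public void Apply(string operationId, int delta)
    {
        if (!applied.Add(operationId)) return; // duplicate delivery, already counted
        Quantity += delta;
    }
}
```

Two replicas that see the same operations in a different order, with some of them delivered twice, end up with the same quantity, and nobody had to hold a lock on anybody's behalf.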


 
Tags: REST | SOA | Software Architecture

This is another post (<Rant>) about WCF default behavior and how it can make the life of developers miserable (you can also check out “WCF defaults limit scalability” and “Another WCF gotcha - calling another service/resource within a call”).

Anyway, the trigger for this is a post by Ayende called “WCF works in mysterious ways”. Ayende posted some code he wrote which was throwing a serialization exception. You can see his post for the full code, but in a nutshell he was defining a large object graph (8192 objects that contain other objects) and was trying to send that over the wire. Here’s a short excerpt from the service definition:

   1: [ServiceBehavior(
   2:        InstanceContextMode = InstanceContextMode.Single,
   3:        ConcurrencyMode = ConcurrencyMode.Single,
   4:        MaxItemsInObjectGraph = Int32.MaxValue
   5:        )]
   6:    public class DistributedHashTableMaster : IDistributedHashTableMaster
   7:    {
   8:        private readonly Segment[] segments;
   9:  
  10:        public DistributedHashTableMaster(NodeEndpoint endpoint)
  11:        {
  12:            segments = Enumerable.Range(0, 8192).Select(i =>
  13:                                                        new Segment
  14:                                                        {
  15:                                                            AssignedEndpoint = endpoint,
  16:                                                            Index = i
  17:                                                        }).ToArray();
  18:        }
  19:  
  20:        public Segment[] Join()
  21:        {
  22:            return segments;
  23:        }
  24:    }
  25:  
  26:    [ServiceContract]
  27:    public interface IDistributedHashTableMaster
  28:    {
  29:        [OperationContract]
  30:        Segment[] Join();
  31:    }
  32:  
  33:    public class NodeEndpoint
  34:    {
  35:        public string Sync { get; set; }
  36:        public string Async { get; set; }
  37:    }
  38:  
  39:    public class Segment
  40:    {
  41:        public Guid Version { get; set; }
  42:  
  43:        public int Index { get; set; }
  44:        public NodeEndpoint AssignedEndpoint { get; set; }
  45:        public NodeEndpoint InProcessOfMovingToEndpoint { get; set; }
  46:  
  47:        public int WcfHatesMeAndMakeMeSad { get; set; }
  48:    }

As you can see in line 4, the service is properly decorated with an attribute to enlarge the number of objects in the graph. So, looking at the code, I initially suggested he add a few ServiceKnownType and DataContract/DataMember attributes on the data classes (as the serialization sometimes needs some guidance). After that didn’t help, I actually ran the code, and then I noticed that the code was missing that same setting on the client side. So to fix the problem, the client-side code below

    var channel =
        new ChannelFactory<IDistributedHashTableMaster>(binding, new EndpointAddress(uri))
            .CreateChannel();
    channel.Join();

needs to change to something like

    var channelFactory =
        new ChannelFactory<IDistributedHashTableMaster>(binding, new EndpointAddress(uri));

    // Raise the object-graph limit on every operation of the client-side contract,
    // mirroring the MaxItemsInObjectGraph setting on the service.
    foreach (var operationDescription in channelFactory.Endpoint.Contract.Operations)
    {
        // Find<T>() returns null when the behavior is not present.
        var dataContractBehavior =
            operationDescription.Behaviors.Find<DataContractSerializerOperationBehavior>();

        if (dataContractBehavior != null)
        {
            dataContractBehavior.MaxItemsInObjectGraph = int.MaxValue;
        }
    }

    var channel = channelFactory.CreateChannel();
    channel.Join();

The main problem I find with this piece of code is the fact that it is needed at all. As the post’s title suggests, I find this behavior greatly affects the loose coupling of anything that uses WCF (services or other components).

WCF requires that any change you make to the channel on the server side be reflected in the channel on each and every client (e.g. we have a similar setting where we enlarge message sizes for webHttpBinding, and there are many other such examples).

Sure, you say, that is just like adding a new field in the contract, isn’t it? Well no, it isn’t, since unlike anything else which appears in the (verbose as it is) SOAP contract, these changes in default values, which are purely a WCF design choice, are not documented. Again, the changes in default values are not part of the contract. These are things you need to remember to pass on to your service consumers. So not only do I pay the overhead of having an explicit contract (e.g. vs. REST), it really doesn’t work. It means that two components that use the same contract may not be interchangeable if one returns more data (in this case). It means that the two sides are coupled by the need to change these defaults, and for what? WCF is smart enough to know how long the message is; WCF is smart enough to handle the message (if I encourage it by setting a behavior); why can’t it add 2 and 2 by itself?

Sometimes I just wish WCF had a TrainingWheels or DemosOnly attribute I could just set to false and make all this crap go away…

</Rant>


 
Tags: .NET | SOA | WCF

I recently read a post by Tim Bray where he states that building on web technologies lets you get away with believing some of the fallacies of distributed computing.

I personally think he is a little optimistic in that claim.

On “The network is reliable” – Tim says that the connectionless nature of HTTP helps (it does) and that the fact that GET, PUT and DELETE are idempotent helps as well. I say that GET, PUT and DELETE are idempotent only if the people implementing the server side make them so, i.e. only if they consider the fallacy. The fact that the HTTP specification says they should be idempotent doesn’t automatically make each implementation compliant.

On “Latency is Zero” – Tim says the web makes it worse but, he claims, users got used to that. Even if they did, I think that users are just part of the picture, since the programmable web is also making strides. Also, as Tim says, it is actually worse. Not to mention that “Latency isn’t constant” either.

On “Bandwidth is infinite” – Again Tim agrees that it is worse but says people learn to note it. Again, learning that it is there doesn’t mean the fallacy is gone, just that people are less likely to presume it.

On “The Network is secure” – Tim says it’s probably the “least-well-addressed by the web” – no argument here.

On “Topology doesn’t change” – Tim says URIs help mitigate it. Again, Tim is assuming people make URIs permanent or will always return a temporary or permanent redirect when a URI changes – good luck with that.

On “There is one administrator” – Tim says that yes, that’s the case, but who cares. Well, an example I usually give is the time I deployed an ASP.NET application which worked for a while, until the hosting company decided to change their policy to partial trust (the app needed full trust). When that happens to you, you care. If you mash up with someone else, you care, etc.

On “Transport cost is Zero” – Tim says it is the same as for Bandwidth – i.e. worse.

On “The network is homogeneous” – Tim says that this is the “web’s single greatest triumph”. I actually agree with that, as long as all of you stick to using the web’s ubiquitous standards (HTTP, XML/JSON). If you have parts of your application that can’t use them, you still need to pay attention.

One thing I am really  puzzled by is Tim’s conclusion :

“If you’re building Web technology, you have to worry about these things. But if you’re building applications on it, mostly you don’t.”

Since even according to him only 4 fallacies are covered by the web… (I think only 1)

In any event, I agree that the web standards, and REST in particular, do contain guidelines that take the fallacies into consideration. However, it is still up to developers to understand the problems they’ll create if they don’t follow these guidelines. Assuming that they do is, well, overly optimistic in my experience.
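The idempotence point under “The network is reliable” is easy to show in code. Below is a minimal sketch of two server-side handlers that could both be wired to PUT; the `NotesServer` class and its handlers are invented for the example and do not belong to any real framework.

```csharp
using System;
using System.Collections.Generic;

// Two hypothetical server-side handlers for "PUT /notes/1".
public class NotesServer
{
    public readonly Dictionary<string, List<string>> Notes =
        new Dictionary<string, List<string>>();

    // Compliant: replaces the resource, so a retried request changes nothing further.
    public void PutReplace(string uri, string body)
    {
        Notes[uri] = new List<string> { body };
    }

    // Non-compliant: still answers to PUT, but appends, so a retry after a
    // lost response duplicates the data.
    public void PutAppend(string uri, string body)
    {
        List<string> existing;
        if (!Notes.TryGetValue(uri, out existing))
        {
            existing = new List<string>();
            Notes[uri] = existing;
        }
        existing.Add(body);
    }
}
```

A client that retries after a timeout is safe against the first handler and corrupts data against the second, even though both sit behind the same verb. The protocol can only promise what the implementer actually delivers.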

You can also read a paper I published a few years ago which explains the fallacies  and why they are still relevant today.


 
Tags: REST | SOA | Software Architecture

Michael Poulin @ ebizq doesn’t like the Active Service pattern. I suggest you read his post first, but in a nutshell Michael sees two possible ways to understand the term Active Service:

“a) service view - a service that actively looking for companions to complete its own task
b) consumer view – a service which triggers its own execution by itself”

…and he doesn’t like either…

I think that both of these definitions aren’t that far… and I like both :)

The way I see it, there are two concerns here:

1. Are services only reactive (“passive”)? I.e. does the service only “work” when it gets a request from a service consumer (user/another service/an orchestration engine)? If the service also has at least one thread working to do internal stuff (e.g. scavenging outdated data, pre-fetching data from other services etc.) then that’s what I call an Active Service (option “b” above).

2. How do services get the data they need to complete a request when they actually get a request? There are many possibilities here: events, pub/sub, an orchestration engine that takes care of that, services that check for a known contract in a registry and then go to that service, even hardcoding. The option where the service looks for other services (e.g. using a registry) is option “a” above.

So basically all the options are valid: a service can be a+b, just a, just b, or neither and, in my eyes, these are orthogonal concerns.

Regarding pre-fetching – I think this can be beneficial as a way to achieve caching. Note that if you control both sides and you’ve got the needed infrastructure then it is probably better to push changes (eventing or pub/sub) but that’s not always the case.

In the comment I left on Michael’s blog I talked about different strategies for services “There are several strategies for that - one is to take that knowledge out of the service (e.g. using choreography or orchestration), providing a subscription and/or wiring infrastructure i.e. something that will tell you where to find certain contracts, hard coding , registry , using uniform interfaces (e.g. REST) etc.”

Let’s take a concrete (albeit very, very simplistic) scenario to illustrate some of the approaches.

Business scenario: When a customer makes an order we want to give a 5% discount to preferred customers. A customer gets preferred status upon a business decision (annual orders of $1M, or knowing the CEO, or whatever) and the status lasts for a year from the date it was introduced.

For the sake of this discussion say we have two services (again this is overly simplified) an Ordering service and a Customer service.

Here are a few technical options

Technical Scenario 1

The customer places an order. The ordering service talks to “the” customer service to check if the customer deserves a discount. If she does, the ordering service updates the order with the discount and presents it to the customer to finalize the order.

Technical Scenario 2

Same as 1, with the ordering service looking for a service that matches the customer contract it knows about.

Technical Scenario 3

The ordering service asks “the” customer service twice a day for a list of discounts and caches the result. When the user sends her order, it calculates the price and presents it to her.

Technical Scenario 4

Same as 3, with the ordering service looking for a customer service (not using a known service).

Technical Scenario 5

The customer service sends a message to known subscribers whenever a new customer status occurs. The ordering service listens for that and updates its internal cache. When the customer places her order, the ordering service hits the cache for the discount.

Technical Scenario 6

Same as 5, but publishing an event to unknown subscribers.

Technical Scenario 7

The customer service publishes an event with the discounts (or changes in discounts) twice a day. The ordering service listens for that and updates its internal cache. When the customer places her order, the ordering service hits the cache for the discount.

Technical Scenario 8

The customer order is passed to an orchestrating service, which hits a customer service for a discount and then passes all the data to an ordering service.

There are quite a few more options and variants on the options listed but which one is best?

Yeah, you’ve guessed it – it depends. It depends since each option has its own strengths and weaknesses, and each can work best in different circumstances. It also depends on the available infrastructure, on the structure of other services, on the services being internal or external etc.

For instance, scenario 1 is less flexible than most others but it is simple to implement. There is coupling in time between ordering and customer (both have to be up for the order to complete). Scenario 4 needs to solve the problem of finding other services (e.g. using some kind of registry, or other services “pushing” their existence, or whatever) but when a customer makes her request the ordering service (most likely) has all the needed info to process that request, which makes the ordering service more autonomous. As a side note, the fact that different approaches to achieving the same end goal work in different situations is why I decided to write patterns in the first place.
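To show how the cached variants differ from scenario 1, here is a minimal sketch of an ordering service that keeps a local discount cache. The class, its method names and the delegate used to reach the customer service are all assumptions; the timer that would drive the twice-a-day refresh is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical ordering service; the fetch delegate stands in for whatever
// call (or registry lookup) reaches the customer service.
public class OrderingService
{
    private readonly Func<IDictionary<string, decimal>> fetchDiscounts;
    private IDictionary<string, decimal> discountCache = new Dictionary<string, decimal>();

    public OrderingService(Func<IDictionary<string, decimal>> fetchDiscounts)
    {
        this.fetchDiscounts = fetchDiscounts;
    }

    // Scenario 3: triggered by the service's own timer, not by a consumer request.
    public void RefreshDiscounts()
    {
        discountCache = new Dictionary<string, decimal>(fetchDiscounts());
    }

    // Scenario 5: triggered when the customer service pushes a status change.
    public void OnCustomerStatusChanged(string customerId, decimal discountRate)
    {
        discountCache[customerId] = discountRate;
    }

    // Handling an order needs only local data, so there is no coupling in
    // time with the customer service.
    public decimal PriceFor(string customerId, decimal listPrice)
    {
        decimal rate;
        return discountCache.TryGetValue(customerId, out rate)
            ? listPrice * (1 - rate)
            : listPrice;
    }
}
```

The trade-off is visible in `PriceFor`: it never waits on another service, but it can be wrong for up to one refresh interval (scenario 3) or until the next pushed message arrives (scenario 5).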

Lastly, in case you are wondering the scenarios are:

1 – choreography with pre-known (configured or hardcoded) companion services

2 – choreography with “active service” type a (ordering is active)

3 – choreography with “active service” type b (ordering is active)

4 – choreography with “active service” type a + b (ordering is active)

5 – pub/sub (e.g. using an ESB)

6 – eventing

7 – eventing with “active service” type b (customer is active)

8 – orchestration


 
Tags: SOA | SOA Patterns | Software Architecture

May 12, 2009
@ 10:54 PM

I recently got a request from Alik for my opinion on REST. I think this might be interesting for a wider audience, so I decided to blog my answer here.

Note: I also have a REST presentation I prepared a while ago, which is downloadable from here (ppt)

The good

As you probably know, REST is an architectural style defined by Roy Fielding for the web. It is built on several foundations (client/server, uniform interface etc.) which give it a lot of strength in the affected areas. The top three in my opinion are:

  • (Relatively) easy to integrate – a good RESTful API is discoverable from the initial URI onward. This doesn’t suggest that any application calling your service will automagically know what to do. It does mean, however, that the developer reading your API and trying to integrate with it has an easier life, especially since hypermedia provides the roadmap of what to do next.
  • Another feature for ease of integration, which has to do with REST over HTTP (THE most common implementation of REST), is the use of ubiquitous standards. Speaking HTTP, which is the protocol of the web, and emitting JSON or AtomPub means it is much easier to find a library that can connect to you in any language and on any platform.
  • Scalability – stateless communication and a replicated repository make for good scalability potential.

Do note that, as with any architecture/technology, a bad implementation can negate all the benefits.

Other REST goodness includes things like the notion of the URI, the idempotence of GET in REST over HTTP etc.

The Bad

Some of the problems of REST aren’t inherent problems of the architectural style but rather drawbacks of the REST over HTTP implementation. Most notable of these is what’s known as “lo-rest” (using just GET and POST). While technically it might still be RESTful, to me a uniform interface with 2 verbs is too small to be really helpful (which indeed makes a lot of the implementations unRESTful, see “The Ugly” below).

One problem which isn’t HTTP specific is handling REST in code: programming languages are not resource oriented, so the handling code that maps URIs tends to get messy. Actually, Microsoft did relatively good work implementing Joe Gregorio’s idea of URI mapping, which helps alleviate some of the problem. On the other hand, it is relatively hard to make the REST API hypertext-driven (which is one of the constraints of REST).

Lastly, and most importantly, REST is not the answer to everything (see also another post I made on using REST along with other architectural styles). E.g. most REST implementations I know do not support the notion of pub/sub (Roy did suggest a REST implementation called WAKA that enables this, but most people have never even heard of it). Be wary of the “Hammer” syndrome: REST is a good tool for your toolset but it isn’t the only one.

The Ugly

In my opinion there are 2 main ugly sides to REST. The first is zealots. That isn’t something unique to REST: any good technology/idea (Agile, TDD etc.) gets its share of followers who think that <insert favorite idea> is the best thing since sliced bread and that everybody should do as they do, or else.

The real ugliness comes from the misusers – there’s a lot of misunderstanding. The fact that REST over HTTP has become synonymous with REST leads people to think that HTTP is REST. I recently read a REST book review on Colin’s blog where “the author states that although hypermedia is important in REST it isn't covered in the book because WCF has poor support for it”, i.e. a book on REST which ignores one of the important constraints of the style.

Other misuses include building an implementation that is GETsful (i.e. does everything with HTTP GET), doing plain RPC where the URI is the command, doing CRUD with HTTP verbs etc.

The point is that REST seems simple but it isn’t – it requires a shift in thinking (e.g. identifying resources, externalizing the state transitions etc.). However, as noted above, done right it can be an important and useful tool in your toolset.


 
Tags: REST | SOA

May 8, 2009
@ 10:59 PM

Apropos the Blogjecting Watchdog pattern: in addition to blogging, I recently added to our system the ability to twitter. I am using Tweet# from DimeBrain (thanks Mark Nijhof for the tip via Twitter).

Tweet# makes using Twitter really simple (I included the code below in case you find it useful).

The Twitter sender is part of a PostOffice service (I thought that it would be problematic to present it as SpamServer, which was its original name :) ).

Update 11/05: Here it is working on our staging environment :)

A few points about our design in general that are interesting in this regard are:

  • The PostOffice is a “Server” type service – we have 3 types of services: server, which has one instance per node; channel, which has multiple instances per node; and algorithmic, which has one instance per core.
  • The PostOffice implements a pattern I call “Legacy Bridge”, which is basically an SOA version of an adapter+facade in OO terms. The PostOffice supports the events (over WCF) mechanism we have in our system on one side and connects to external systems (SMS, coupons and Twitter) on the other. Basically, the PostOffice contains an Edge Component which accepts the requests and funnels them to *Sender classes that interact with the external systems.
  • From a contract design perspective – the events I added to the system are StatusEvent and AdminStatusEvent (and not TwitterEvent and DirectMessageEvent). This is better, in my opinion, as it carries the intent of what I want to achieve. It also means that if I choose to change technology or use multiple destinations, the events will stay meaningful. For instance, the AdminStatusEvent will be used by our monitoring system to send a notification if the system crashes. I’ll probably want that as an SMS, maybe even a phone call, as well as a tweet (so the AdminStatusEvent will have a severity to designate how it should be handled).

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using Dimebrain.TweetSharp.Fluent;

    namespace xsights.Apps.PostOffice.Server.Twitter
    {
        class TwitterSender
        {
            // Maximum length of a single status update or direct message.
            private const int MaxTweetLength = 140;

            private readonly string account;
            private readonly string password;
            private readonly string admin;

            public TwitterSender(string tweetAccount, string twitterPassword, string adminAccount)
            {
                account = tweetAccount;
                password = twitterPassword;
                admin = adminAccount;
            }

            // Posts the message as one or more public status updates.
            public void Update(string msg)
            {
                foreach (var tweet in BreakToTwitts(msg))
                {
                    var update =
                        FluentTwitter.CreateRequest().AuthenticateAs(account, password).Statuses().Update(tweet).AsJson();

                    update.Request();
                }
            }

            // Sends the message to the admin account as one or more direct messages.
            public void SendAdminMessage(string msg)
            {
                foreach (var twit in BreakToTwitts(msg))
                {
                    var dm =
                        FluentTwitter.CreateRequest().AuthenticateAs(account, password).DirectMessages().Send(admin, twit).AsJson();

                    Retry(2, dm.Request, false);
                }
            }

            // Splits a long message into consecutive chunks of at most 140 characters.
            private IList<string> BreakToTwitts(string originalString)
            {
                var list = new List<string>();
                for (int i = 0; i < originalString.Length; i += MaxTweetLength)
                {
                    var len = Math.Min(MaxTweetLength, originalString.Length - i);
                    list.Add(originalString.Substring(i, len));
                }
                return list;
            }

            // Makes one attempt plus up to `retries` further attempts. When all of
            // them fail, the last exception is rethrown only if shouldThrow is set.
            private void Retry(int retries, Func<string> call, bool shouldThrow)
            {
                try
                {
                    call();
                }
                catch (Exception)
                {
                    if (retries > 0)
                        Retry(retries - 1, call, shouldThrow);
                    else if (shouldThrow)
                        throw;
                }
            }
        }
    }
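The contract-design point above (events named for their intent, with a severity deciding how they are delivered) could be sketched roughly as follows. Only the `AdminStatusEvent` name comes from our system; its members, the `ISender` interface, the `RecordingSender` stand-in and the router are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

public enum Severity { Info, Warning, Critical }

// The event name comes from the post; its members are assumptions.
public class AdminStatusEvent
{
    public string Message { get; set; }
    public Severity Severity { get; set; }
}

// Hypothetical abstraction over the *Sender classes (Twitter, SMS, ...).
public interface ISender
{
    void Send(string message);
}

// In-memory stand-in for a real sender, useful for trying the routing out.
public class RecordingSender : ISender
{
    public readonly List<string> Sent = new List<string>();
    public void Send(string message) { Sent.Add(message); }
}

public class AdminStatusRouter
{
    private readonly Dictionary<Severity, List<ISender>> routes =
        new Dictionary<Severity, List<ISender>>();

    public void Register(Severity severity, ISender sender)
    {
        List<ISender> senders;
        if (!routes.TryGetValue(severity, out senders))
        {
            senders = new List<ISender>();
            routes[severity] = senders;
        }
        senders.Add(sender);
    }

    // The event says what happened; the router decides which channels carry it.
    // Returns the number of channels the event was sent to.
    public int Handle(AdminStatusEvent statusEvent)
    {
        List<ISender> senders;
        if (!routes.TryGetValue(statusEvent.Severity, out senders)) return 0;
        foreach (var sender in senders) sender.Send(statusEvent.Message);
        return senders.Count;
    }
}
```

Because the publisher only knows about `AdminStatusEvent`, adding an SMS or phone-call channel for critical events means registering another sender and nothing more; no contract changes.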

 
Tags: .NET | OO | SOA | SOA Patterns

As I mentioned in the previous post, I got a few interesting questions lately. The first, from Colin, was about developing a customized solution for the Blogjecting Watchdog pattern vs. integrating/developing for a commercial monitoring suite (e.g. Unicenter, OpenView etc.). The second, from Dru, was about running multiple versions of services (e.g. during an upgrade) with active Sagas in the background. I think these questions are interesting enough to be answered as blog posts. Also, since both questions are related to the Blogjecting Watchdog pattern, I thought it would be better to first explain what it actually is.

So here it is :)

Blogjecting Watchdog

Achieving availability is a multi-layered effort. I’ve already talked about how services should be autonomous (see for example the Active Service pattern in chapter 2). The Blogjecting Watchdog pattern looks at another aspect of autonomy: it shows how a service can proactively try to identify faults and problems, and try to heal itself when it finds them.

1.1 The Problem

The Service Instance pattern (see section 3.4), for example, demonstrates a strategy that a service can implement to cope with failure. The question is – is that enough? Is it enough for the service to try to cope with everything by itself? My answer is no, that is not enough. For one, once we have dealt with a failure within the service, the service's ability to cope with the next failure is probably diminished. For example, if we found a failure in a server and moved to a standby server, the new server does not have another standby server to move to if another fault occurs.

Additionally, the failure might be too much for the service to overcome by itself, like a switch going down. So we need something external that looks after the service and can help it (see the Service Monitor pattern in chapter 4).

To increase the service's autonomy and the overall availability of our SOA, we need both to try to identify and repair problems and to be able to notify the world about the service’s current status.

The question is then:

How can we identify and attend to problems and failures in the service and increase service availability?

One option is to try to infer the state of the service from the way it looks from the outside – yes, this is as crude as it sounds. You call the service and it doesn't respond, so you know it is down; you call the service expecting a reply in 5 seconds and get it in 10 seconds, so you understand that the service is congested. This is not a very good option, as the external behavior only gives us coarse knowledge of the service's state. For example, if the service has a decent fault tolerance solution, we wouldn't know that anything happened – but the truth is that the service's ability to handle the next fault might not exist anymore.
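To make the coarseness of this outside-in inference concrete, here is a minimal sketch in C#. The names (ExternalProbe, InferredState, Classify, Probe) are made up for illustration and the expected response time is whatever the observer chooses, such as the 5 seconds in the example above; none of this comes from a real monitoring product.

```csharp
using System;
using System.Diagnostics;

// Hypothetical names: all an outside observer can conclude from one call.
public enum InferredState { Down, Congested, Healthy }

public static class ExternalProbe
{
    // 'responseTime' is null when the service never answered.
    public static InferredState Classify(TimeSpan? responseTime, TimeSpan expected)
    {
        if (responseTime == null) return InferredState.Down;
        return responseTime.Value > expected
            ? InferredState.Congested
            : InferredState.Healthy;
    }

    // Times a call to the service; any exception is treated as "no response".
    public static InferredState Probe(Action callService, TimeSpan expected)
    {
        var watch = Stopwatch.StartNew();
        try
        {
            callService();
        }
        catch (Exception)
        {
            return Classify(null, expected);
        }
        return Classify(watch.Elapsed, expected);
    }
}
```

Note that a service which has just failed over to its last standby server still answers on time and is classified as Healthy, which is exactly the blind spot described above.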

Another way is to install agents on the service's servers. This will give you a much better picture of what happens (vs. the option above). For example, you will also be able to get trend information (e.g. you can watch how much disk space is left and alert when it is getting low). There are several problems with this solution. One is that you need to actively install software on the service's servers, which both decreases the service's autonomy and creates a management hassle in itself. Another problem is that you still only get an external view of the service's behavior (you just gain access to more information). There are also situations (see for example the Mashup pattern in chapter 7) where not all the services are under your control and you cannot access their hardware.

Yet another option is to actively question the service about its state. This has one big advantage over the two previous options, since you also get some inside information: what the service itself has to say about its state. This enables the service to communicate trends in the problems that will actually make it fail. For example, if the service does not write any information to the local disk, low disk space is not a problem at all; if this is the disk where the database is located, it is very much a problem. The solution is not perfect, since it is the observer's responsibility to go after the information. If the observer does not sample the service often enough, it can miss vital information.

As I mentioned earlier, we want something that will help increase the service’s autonomy, so a better approach in this regard would be for the service to watch over itself.

1.2 The Solution

Watching over itself is not enough either, since, as we said, we also need the “world” to know what is happening with the service. Thus a combined solution is to:

Implement the Blogjecting Watchdog pattern and have the service actively monitor its internal state, try to heal itself and continuously publish its state and other important indicators.


Figure 3.14 The Blogjecting Watchdog pattern. The blogjecting component sends the reports out and listens for requests. The watchdog component monitors the status of the business service, tries to heal stray components and logs any failure.

The pattern revolves around a single idea – increasing the service's responsibility by using two complementary concepts: reporting and self-healing. The first is the Blogjecting concept, where the service implements the Active Service pattern (see chapter 2 for more details) and has a component in charge of monitoring the service's state. The component also publishes the service's state (see the Publish/Subscribe interaction pattern in chapter 6), either on a cyclic basis or when something meaningful occurs. It is important to note that the fact that the service actively publishes its state doesn't mean it cannot also respond to inquiries regarding its health (akin to leaving a comment on a blog and getting a response from the author).
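The blogjecting half of the pattern can be sketched roughly as follows. This is only an illustration under my own assumptions: Blogject, StatusReport, Report, PublishAll and Query are hypothetical names, the plain .NET event stands in for whatever publish/subscribe transport you actually use, and the cyclic call to PublishAll is assumed to come from a timer or thread owned by the active service.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical status entry that the service "blogs" about itself.
public class StatusReport
{
    public DateTime Timestamp { get; set; }
    public string Indicator { get; set; }
    public string Value { get; set; }
}

public class Blogject
{
    private readonly Dictionary<string, string> lastKnown =
        new Dictionary<string, string>();

    // Subscribers (monitoring consoles, the watchdog, other services) hook in here.
    public event Action<StatusReport> Published;

    // Called by the service when something meaningful occurs.
    public void Report(string indicator, string value)
    {
        lastKnown[indicator] = value;
        Publish(indicator, value);
    }

    // Called on every cycle to re-publish the full state.
    public void PublishAll()
    {
        // Iterate over a copy so a subscriber may call Report while we publish.
        foreach (var pair in new Dictionary<string, string>(lastKnown))
            Publish(pair.Key, pair.Value);
    }

    // Answers a direct inquiry; returns null for an unknown indicator.
    public string Query(string indicator)
    {
        string value;
        return lastKnown.TryGetValue(indicator, out value) ? value : null;
    }

    private void Publish(string indicator, string value)
    {
        var handler = Published;
        if (handler != null)
            handler(new StatusReport
            {
                Timestamp = DateTime.UtcNow,
                Indicator = indicator,
                Value = value
            });
    }
}
```

Report covers the "something meaningful occurred" case, PublishAll the cyclic one, and Query the blog-comment style inquiry, so the three ways of getting at the service's state stay in one place.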

What are Blogjects

The term Blogjects was coined by Julian Bleecker back in 2005 (Bleecker, 2005) to describe "edgy designed objects that report themselves, or expose their experiences in some fashion", or in other words, Blogjects == objects that blog. Julian Bleecker's vision for Blogjects is wider than the one suggested here. His vision is for things that participate in the Web 2.0 sense of the social web, or even further than that. To use Julian’s words: “Forget about the Internet of Things as Web 2.0, refrigerators connected to grocery stores, and networked Barcaloungers. I want to know how to make the Internet of Things into a platform for World 2.0. How can the Internet of Things become a framework for creating more habitable worlds, rather than a technical framework for a television talking to an reading lamp?” I highly recommend taking a look at the full paper “A Manifesto for Networked Objects – Cohabiting with Pigeons, Arphids and Aibos in the Internet of Things” (Bleecker, 2006) to get the full picture.

 

The second concept in play in the Blogjecting Watchdog pattern is the watchdog. The idea here is to have a component that listens in on the information gathered and published by the blogject component and then acts on that information in a meaningful way to increase the reliability and availability of the service. The possibilities for implementing self-healing are endless; two simple examples of self-healing actions are restarting failed components and cleaning temporary files.
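Here is a minimal sketch of that watchdog side, again in C#. Everything in it is an assumption made for illustration: the Watchdog class, the idea of keying remedies by an indicator/value pair, and the in-memory log are mine, and a real implementation would plug in actual restart or cleanup actions.

```csharp
using System;
using System.Collections.Generic;

public class Watchdog
{
    // Maps a symptom ("indicator=value") to a healing action.
    private readonly Dictionary<string, Action> remedies =
        new Dictionary<string, Action>();
    private readonly List<string> log = new List<string>();

    public IList<string> Log { get { return log; } }

    public void RegisterRemedy(string indicator, string badValue, Action heal)
    {
        remedies[indicator + "=" + badValue] = heal;
    }

    // Feed this with every status the blogject component publishes.
    // Returns true only when a remedy was found and ran successfully.
    public bool Observe(string indicator, string value)
    {
        Action heal;
        if (!remedies.TryGetValue(indicator + "=" + value, out heal))
            return false; // not a known symptom, nothing to heal

        try
        {
            heal();
            log.Add("healed " + indicator);
            return true;
        }
        catch (Exception ex)
        {
            // Healing failed: record it so it can be reported to the outside world.
            log.Add("could not heal " + indicator + ": " + ex.Message);
            return false;
        }
    }
}
```

A remedy could be as simple as restarting a worker process or deleting temporary files; the point of the sketch is only that the watchdog reacts to what the service reports about itself rather than to what it looks like from the outside.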

Watchdogs

Watchdog (actually watchdog timer) is a term borrowed from the embedded systems world. A watchdog is a hardware device that counts down to zero, and when it gets there it resets the device. To prevent this reset, the application has to “kick the dog” before the timer runs out. If the application does not reset the counter, it means that the application has hung, and the idea is that the reset would fix that.
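The countdown-and-kick mechanism is easy to mimic in software. The sketch below is a hypothetical WatchdogTimer class, not a wrapper around any real hardware API; the clock is driven explicitly through Tick so the behavior is easy to follow, whereas real hardware counts on its own.

```csharp
using System;

public class WatchdogTimer
{
    private readonly int timeoutTicks;
    private readonly Action reset;
    private int remaining;

    public WatchdogTimer(int timeoutTicks, Action reset)
    {
        if (timeoutTicks <= 0) throw new ArgumentOutOfRangeException("timeoutTicks");
        if (reset == null) throw new ArgumentNullException("reset");
        this.timeoutTicks = timeoutTicks;
        this.reset = reset;
        remaining = timeoutTicks;
    }

    // The application calls this regularly to prove it is still alive.
    public void Kick()
    {
        remaining = timeoutTicks;
    }

    // One step of the countdown; reaching zero triggers the reset action.
    public void Tick()
    {
        remaining--;
        if (remaining <= 0)
        {
            reset();
            remaining = timeoutTicks; // start counting again after the reset
        }
    }
}
```

With a timeout of 3 ticks, an application that kicks every second tick never gets reset, while one that stops kicking is reset on the third tick after its last kick.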

 

How is the Blogjecting Watchdog pattern better than the other options mentioned above?

Even if we just consider the blogjecting part of the pattern, we can see several advantages over the other approaches. The Blogjecting Watchdog combines the benefits of an agent that actively monitors the service's health with the internal knowledge of what's important for the service's continuity and what's not. Unlike the external agents solution, with Blogjects the service retains its autonomy. The autonomy is increased even further when you add the self-healing features of the watchdog. The end result is a service which is more resilient (and thus has higher availability), and which lets the world know both its current state and future trends.

In one project I was working on, we inherited a situation where there were interdependencies between executables installed on different servers (within a service). For example, when one process was down on server A, the objects running on server B could not function well, and there were other such dependencies (this isn’t the brightest design, but sometimes you have to compromise - in this case there was no time or budget to redesign these applications). What we ended up with is something like the situation in figure 3.15 below:


Figure 3.15 A sample deployment of a blogjecting watchdog. The daemons on the servers monitor the running components on each server. The Watchdog edge exposes the current state both through a web-services API and as SNMP traps.

The watchdog agents on each of the server nodes monitor the components. The agents communicate amongst themselves to examine the dependencies and the actions taken. The watchdog Edge component provides a WSDL-based endpoint where other services can query it for the service’s health. It also publishes SNMP traps to an external SNMP monitor (e.g. HP-Openview). As an implementation hint, I can suggest keeping the watchdog components in a separate, very simple executable (preferably a daemon that runs when the OS loads). The simpler the component, the lower the risk that it will itself fail (you can of course have a backup in the form of a hardware watchdog...). Let’s take a more thorough look at the technology mapping options.

1.3 Technology Mapping

When you implement the Blogjecting Watchdog in an enterprise, the protocols you have to use for your “blog” will usually be pre-determined. The IT team will most likely have already standardized on one of the leading monitoring suites (CA-Unicenter, HP-Openview, IBM-Tivoli or, if you are an all-Microsoft shop, Microsoft Operations Manager). In these cases you can use the SDK of the monitoring software (e.g. the Unicenter Agent SDK or the MOM management pack developer guides). There are even 3rd party software packages to help you build such agents (for example, OC Systems have a Universal Agent that makes it easier to write agents for Unicenter).

Note that this is not always the case, though, and sometimes you do have the freedom to choose your protocols. A few projects I worked on chose to standardize on web-services with specific messages for monitoring the health of a service (so we had a specific endpoint for each service where these messages were supported). With the emergence of SOA-specific tools like the ones by Amberpoint and Weblayers, you will see more and more WS-* based monitoring.

Other ways of reporting your internal state are to use standards like SNMP (Simple Network Management Protocol) or simply the Windows event logs. An interesting option, which will let your Blogjecting Watchdog literally blog, is to use a product called RSSBus, which is an ESB implementation that uses the RSS protocol for communications. At the time I am writing this, the product is still in beta, so I haven’t used it for a serious system yet. Nevertheless, it looks like an interesting direction which I’ll consider when it is released.

Regarding the self-healing part (watchdog): self-healing is still more prevalent in hardware than in software (watchdog timers, RAID, IBM , hot spare memories, hot spare drives etc.). In a sense, any solution that builds on clustering technology also has some of that built in. The virtualization trend will also help in this sense (see the discussion on utility computing in this chapter’s summary). You can already read papers that talk about self-healing web services (G. Kouadri Mostéfaoui, 2006) or see some projects that try to look into this problem (e.g. WS-Diamond - DIAgnosability, Monitoring and Diagnosis). Nevertheless, all of them are still in the research phase, and if you want something now, you will probably need to implement something by yourself. In my experience, it won’t take you too much time to have a basic watchdog up and running, but it will take some time until you have it predicting problems and acting as an advance warning system.
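To give a feel for what "literally blogging" a status could look like, here is a small sketch that builds a one-item RSS 2.0 document with LINQ to XML. It does not use or imitate the RSSBus API, which I have not shown here; StatusFeed, its parameters and the example.com link are placeholders invented for the illustration.

```csharp
using System;
using System.Xml.Linq;

public static class StatusFeed
{
    // Builds a minimal RSS 2.0 document holding a single status entry.
    public static XDocument Build(string serviceName, DateTime when,
                                  string title, string description)
    {
        return new XDocument(
            new XElement("rss", new XAttribute("version", "2.0"),
                new XElement("channel",
                    new XElement("title", serviceName + " status"),
                    // Placeholder address, not a real endpoint.
                    new XElement("link", "http://example.com/" + serviceName),
                    new XElement("description", "Health reports published by the service"),
                    new XElement("item",
                        new XElement("title", title),
                        new XElement("description", description),
                        // "r" is the RFC 1123 date format, which is what RSS readers expect.
                        new XElement("pubDate", when.ToUniversalTime().ToString("r"))))));
    }
}
```

Any feed reader or aggregator that understands RSS could then act as a very cheap monitoring console, which is what makes this direction appealing when no monitoring suite has been mandated.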

1.4 Quality Attribute Scenarios

The Blogjecting Watchdog is an interesting pattern (and not just because of its odd name), as it can really help on the way to autonomous computing. The effect of this proactive approach is to increase the overall reliability of the service. A service which is self-healing can overcome (at least) minor problems, which results in better availability overall. Additionally, the monitoring aspects of the Blogjecting Watchdog also help enhance availability by notifying administrators that something is amiss (which enables them to fix it).

Quality Attribute (level1) | Quality Attribute (level2) | Sample Scenario
Availability | Failure detection | Upon a failure or degraded performance, the system will alert the system admin (via SMS) within 3 minutes.
Reliability | Increased autonomy | During normal operations, the system will clear all its temporary resources (e.g. files) continuously.

Table 1.1 Blogjecting Watchdog pattern quality attribute scenarios. These are the architectural scenarios that can make us think about using the Blogjecting Watchdog pattern.

Once we introduce a monitor and start to collect data, we can start to find new uses for that data. For example, we can use the information on incoming requests to try to locate attacks on the service. Saved monitoring data can be used to analyze the service’s behavior over time and predict failures, and thus increase its maintainability.



 
Tags: Q&A | SOA | SOA Patterns | Software Architecture

April 28, 2009
@ 10:01 PM

I didn’t blog for the past month – well, I didn’t really have time to breathe either :). However, as of last week we (xsights) moved our system from demos and testing into production with our first client – Maariv, Israel’s second largest newspaper.

In case you don’t know what xsights does – we hyperlink the real world using the mobile as your mouse (sort of like barcodes, e.g. Microsoft Tag – but without the barcode ;) )

Anyway, our first product (actually service) is for printed newspapers/magazines (see video above) and we are now in the dry-run phase, where we make sure all the interfaces (specifying content, links etc.) work (with the official launch planned in a few weeks). Already, each newspaper coming out has a few links, which are marked by a small “mem”. For instance, today’s (Independence Day) paper has 6 different links (in the Sports, Signon and Asakim sections). Currently only we and Maariv’s employees use the system, but we thought it would be nice to start getting some feedback from more objective parties. So if you are interested in trying new technologies before others, live in Israel, read Maariv and have a video-call capable 3G phone, drop me an email (arnon at xsights dot com). Note that there is no software to install; we use the video call capabilities of the phone (we are working on a client for phones that don’t support video calls, like the iPhone etc., but that’s another story).

Other related good news (I hope that’s how you’d see it anyway) is that I feel I can go back to my regular blogging schedule. Some of the posts in the queue include the nano-services anti-pattern, the aggregated reporting pattern, the blogjecting watchdog pattern and two interesting questions I received which are related to it, a post on Twitter integration for reporting etc.


 
Tags: General | xsights