June 23, 2009
@ 10:03 PM

In one of my previous posts (Rest: good, bad and ugly), I made a passing comment, about how I think using CRUD in RESTful service  is a bad practice. I received a few comments / questions asking why do I say that – so what’s wrong with CRUD and REST?

On the surface, it seems like a very good fit (both technically and architecturally), however scratch that surface, and you’d see  that it isn’t a good fit for either.

REST over HTTP is the most common (almost only) implementation of the REST architectural style - to the point REST over HTTP is synonymous with REST. I would say most of the people who think of REST in CRUD terms, think about mapping of the HTTP verbs.

CRUD which stands for Create, Read, Update and Delete, are the four basic database operations. Some of the  HTTP verbs, namely POST, GET, PUT and DELETE (there are others like OPTIONS or HEAD) seem to have a 1-1 mapping to CRUD. As I said earlier they don’t. The table below briefly contrast HTTP verbs and CRUD

Verb CRUDdy Candidate Actually
GET SELECT (Read) Get a representation of a resource. While it is very similar to SELECT it also has a few features beyond an out-of-the-box SELECT e.g. by using If-Modified-Since (and similar modifiers) you might get an empty reply.
Delete Delete Maps well
PUT Update Put looks like an update but it isn’t since:
1. You have to provide a complete replacement for the resource (again similar to update but not quite)
2. You can use PUT to create a resource (when the URI is set by the client)
POST Insert It can be used to create a   but it should be a child/subordinate  one. Furthermore, it can be used to provide partial update to a resource (i.e. not resulting in a new URI)
OPTIONS ? Get the available ways to continue considering the current state or the resource
HEAD ? Get the headers or metadata about the resource (which you would otherwise GET)

The way I see it,  the HTTP verbs are more document oriented than database oriented (which is why document databases like CouchDB are seamlessly RESTful). In any event, what I tried to show here is that while you can update, delete and create new resources the way you do that is not exactly CRUD in the database sense of the word – at least when it comes to using the HTTP verbs.

However, the main reason CRUD is wrong for REST is an architectural one. One of the base characteristics(*) of REST is using hypermedia to externalize the statemachine of the protocol (a.k.a. HATEOS– Hypertext as the engine of state). The URI to URI transition is what makes the protocol tick (the transaction implementation by Alexandros  discussed in the previous post shows a good example of following this principle). 

Tim Ewald explains this  nicely (in a post from 2007…) :

… Here's what I came to understand. Every communication protocol has a state machine. For some protocols they are very simple, for others they are more complex. When you implement a protocol via RPC, you build methods that modify the state of the communication. That state is maintained as a black box at the endpoint. Because the protocol state is hidden, it is easy to get things wrong. For instance, you might call Process before calling Init. People have been looking for ways to avoid these problems by annotating interface type information for a long time, but I'm not aware of any mainstream solutions. The fact that the state of the protocol is encapsulated behind method invocations that modify that state in non-obvious ways also makes versioning interesting.

The essence of REST is to make the states of the protocol explicit and addressableg by URIs. The current state of the protocol state machine is represented by the URI you just operated on and the state representation you retrieved. You change state by operating on the URI of the state you're moving to, making that your new state. A state's representation includes the links (arcs in the graph) to the other states that you can move to from the current state. This is exactly how browser based apps work, and there is no reason that your app's protocol can't work that way too. (The ATOM Publishing protocol is the canonical example, though its easy to think that its about entities, not a state machine.)

If you are busy with inserting and updating (CRUDing) resources you are not, in fact, thinking about protocols or externalizing a State machine and, in my opinion, miss the whole point about REST.

CRUD services leads and promoted to the database as a service kind of thinking (e.g. ADO.NET data services) which as I explained in another post last year is a bad idea since:

  1. It circumvents the whole idea about "Services" - there's no business logic.
  2. It is exposing internal database structure or data rather than a thought-out contract.
  3. It encourages bypassing real services and going straight to their data.
  4. It creates a blob service (the data source).
  5. It encourages minuscule demi-serices (the multiple "interfaces" of said blob) that disregard few of the fallacies of distributed computing.
  6. It is just client-server in sheep's clothing.

The main theme of this and the previous post is that if we try to drag REST to the same old, same old stuff we always did we wouldn’t really get that many benefits. In fact, the “old” ways of doing that stuff are probably more suitable for the job anyway since they have been in use for a while now. and they are “tried and tested”  (“You can’t win an argument with an idiot, he’ll just drag you down to his level and beat you with experience” …). REST is just  a different paradigm that RPC, ACID transactions and CRUD.


* I know I sound like a broken record on that but our industry has a history diluting terms to a point they almost stop being useful (SOA comes to mind..). The way I see it you can have 3 levels on your way to REST over HTTP:

  • You can be using HTTP and XML/JSON – this is level 1 or “Using standards”.
  • You can be using the HTTP verbs properly and/or applying document oriented communications – this is level 2 or “Rest-like” interface
  • You can conform to all REST constraints and be at level 3 or “RESTful”.

All levels can be useful and bring you merit but only the 3rd is REST


 
Tags: REST | SOA | Software Architecture | Trends

June 15, 2009
@ 11:10 PM

 

Yesterday I read an interesting paper called “RETRO: A RESTful Transaction Mode”. On the good side, I have to say, it is one of the best RESTful models I’ve seen thus far. The authors took special care to satisfy the different REST constraints, unlike many “RESTful” services (e.g. twitter that returns identifier and not URIs). On the downside is I think a distributed transaction model is bad for REST or in other words I don’t see a reason for going through this effort and jumping through all these hoops.

Why?

For the same reasons transactions are wrong for SOA and  why WS-AtomicTransactions is wrong for SOAP web services:

  • Service Boundary – RESTful or otherwise is a trust boundary. Atomic transactions require holding locks and holding them on behalf of foreign service is opening a security hole (makes it much easier to do a denial of service attack)
  • You cannot assume atomicity between two different entities or resources. Esp. when these resources belong to different businesses.
  • Transactions introduce coupling (at least in time)
  • Transactions hinder scalability – It isn’t that you can’t scale but it is much harder

For rest it is even worse - Since using hypermedia as the engine of state change means that the hypermedia actually  describes the protocol, we clutter the business representations (the representations of real business entities like customer, order etc.) with transactional  nitty-gritty as the authors say:

“our model explicitly identifies locks, transactions, owners and conditional representations as explicit, linkable resources. In fact, every significant entity in our model is represented as a resource in order to comply with this constraint.”

This also means the programming the resources themselves will get much more complicated

I think that if you want to reap the benefits of REST you should keep the protocol simple and focus on the business and technical merits you can get not bog it all with needless complexity. It seems to me that RETRO is a good mental exercise to show transactions can be RESTful. I think, however that it is an overkill for RESTful implementations.

RESTful architectures will be better off with BASE (Basically Available, Scalable, Eventually Consistent) and/or ACID2 (Associative, Commutative, Idempotent and Distributed) models –or at least the Saga model (which the authors intend to tackle next) which  is a better candidate (IMHO) for achieving distributed consensus.


 
Tags: REST | SOA | Software Architecture

This is another post (<Rant>) about WCF default behavior and how it can make the life of developers miserable ( you can also check out “WCF defaults limit scalability”  and “Another WCF gotcha - calling another service/resource within a call”)

Anyway, the trigger for this is a post by Ayende called “WCF works in mysterious ways”.  Ayende posted some code he wrote which was throwing a serialization exception. You can see his post for the full code, but in a nut shell he was defining a large object graph (8192 objects that contain other objects) and was trying to send that over the wire. Here’s a short excerpt from the service definition:

   1: [ServiceBehavior(
   2:        InstanceContextMode = InstanceContextMode.Single,
   3:        ConcurrencyMode = ConcurrencyMode.Single,
   4:        MaxItemsInObjectGraph = Int32.MaxValue
   5:        )]
   6:    public class DistributedHashTableMaster : IDistributedHashTableMaster
   7:    {
   8:        private readonly Segment[] segments;
   9:  
  10:        public DistributedHashTableMaster(NodeEndpoint endpoint)
  11:        {
  12:            segments = Enumerable.Range(0, 8192).Select(i =>
  13:                                                        new Segment
  14:                                                        {
  15:                                                            AssignedEndpoint = endpoint,
  16:                                                            Index = i
  17:                                                        }).ToArray();
  18:        }
  19:  
  20:        public Segment[] Join()
  21:        {
  22:            return segments;
  23:        }
  24:    }
  25:  
  26:    [ServiceContract]
  27:    public interface IDistributedHashTableMaster
  28:    {
  29:        [OperationContract]
  30:        Segment[] Join();
  31:    }
  32:  
  33:    public class NodeEndpoint
  34:    {
  35:        public string Sync { get; set; }
  36:        public string Async { get; set; }
  37:    }
  38:  
  39:    public class Segment
  40:    {
  41:        public Guid Version { get; set; }
  42:  
  43:        public int Index { get; set; }
  44:        public NodeEndpoint AssignedEndpoint { get; set; }
  45:        public NodeEndpoint InProcessOfMovingToEndpoint { get; set; }
  46:  
  47:        public int WcfHatesMeAndMakeMeSad { get; set; }
  48:    }

As you can see in line 4 – the service is properly decorated with an attribute to enlarge the number of objects in graph. so looking at the code I initially suggested he add a few ServiceKnowType and DataContract/DataMember attributes on the data classes (as the serialization sometimes needs some guidance. After that didn’t help I actually ran the code and then I noticed that the code was missing setting that same attribute – on the client side. So to fix the problem, the client side code below

   1: var channel =
   2:     new ChannelFactory<IDistributedHashTableMaster>(binding, new EndpointAddress(uri))
   3:         .CreateChannel();
   4: channel.Join();
Need to change to something like
   1: var channelFactory =
   2: new ChannelFactory(binding, new EndpointAddress(uri));
   3:  
   4: foreach (var operationDescription in channelFactory.Endpoint.Contract.Operations)
   5: {
   6:  
   7: var dataContractBehavior =
   8:  
   9: operationDescription.Behaviors[typeof(DataContractSerializerOperationBehavior)]
  10:  
  11: as DataContractSerializerOperationBehavior;
  12:  
  13: if (dataContractBehavior != null)
  14: {
  15:  
  16: dataContractBehavior.MaxItemsInObjectGraph = int.MaxValue;
  17:  
  18: }
  19:  
  20: }
  21: var channel=channelFactory.CreateChannel();
  22: channel.Join();

The main problem I find with this piece of code is the fact that it is needed at all. As the post’s title suggest I find this behavior greatly affects the loose coupling of anything that uses WCF (services or other components).

WCF requires that any change you make to the channel on the server side would be reflected in the channel on each and every client (e.g. we have a similar setting where we enlarge message sizes for webHttpBinding and there are many other such examples).

Sure, you say, that is just like adding a new field in the contract isn’t it? – Well no it isn’t since unlike anything else which appears in the (verbose as it is) SOAP contract these changes in default values, which are purely a WCF design choice, are not documented. Again, the changes in default values are not part of the contract. These are things you need to remember to pass on to you service consumer. So not only do I pay the overhead of having an explicit contract (e.g. vs. REST) – it really doesn’t work.  It means that two components who use the same contract may not  be interchangeable if one returns more data (in this case). It means that the two sides are coupled by the need to change these defaults and for what? WCF is smart enough to know how long is the message; WCF is smart enough to handle the message (if I encourage it by setting a behavior) why can’t it add 2 and 2 by itself?

Sometimes I just wish WCF had a TrainingWheels or DemosOnly attribute I could just set to false and make all this crap go away…

</Rant>


 
Tags: .NET | SOA | WCF