April 28, 2010
@ 11:41 AM

After a long hiatus, I guess it is time for another SOA anti-pattern to see the light. It is probably also a good time to remind you that I am looking for your insights on this project. In any event, I hope you’ll find this anti-pattern useful and, as always, comments are more than welcome (do keep in mind this is an unedited draft :) ).

-------------------------------------

There are many unsolved mysteries; you’ve probably heard about some of them, like the Loch Ness monster, Bigfoot etc. However, the greatest mystery, or so I’ve heard, is getting the granularity of services right… Kidding aside, getting right-sized services is indeed one of the toughest tasks in designing services – there’s a lot to balance here, e.g. the communications overhead, the flexibility of the system, reuse potential etc. I don’t have the service granularity codex, and the best granularity depends on the specific context and decisions (e.g. the examples in the Knot anti-pattern above). It is easier to define what shouldn’t be a service: for instance, calling all of your existing ERP system a single service should definitely be shunned. The Nanoservices anti-pattern talks about the other extreme: services that are too small.

Consider, for instance, the “calculator service” which appears in samples web-wide (I’ve personally seen examples in .NET, Java, PHP, C++ and a few more). A basic desk calculator, as we all know, supports several simple operations like add, subtract, multiply and divide, and sometimes a few more. Implementing a calculator service isn’t very complicated. Listing 10.1 below, for example, shows part of the WSDL for a Java calculator service that, lo and behold, accepts two numbers and adds them.

Listing 10.1 excerpt from a WSDL of a stateless calculator service example. The sample only includes the data needed for the “Add” operation. The add operation accepts two numbers and returns a result (http://cwiki.apache.org/GMOxDOC21/jaxws-calculator-simple-web-service-with-jax-ws.html)

<wsdl:types>
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
              xmlns="http://jws.samples.geronimo.apache.org"
              targetNamespace="http://jws.samples.geronimo.apache.org"
              attributeFormDefault="unqualified" elementFormDefault="qualified">
    <xsd:element name="add">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="value1" type="xsd:int"/>
          <xsd:element name="value2" type="xsd:int"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    <xsd:element name="addResponse">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="return" type="xsd:int"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>
</wsdl:types>
<wsdl:message name="add">
  <wsdl:part name="add" element="tns:add"/>
</wsdl:message>
<wsdl:message name="addResponse">
  <wsdl:part name="addResponse" element="tns:addResponse"/>
</wsdl:message>
<wsdl:portType name="CalculatorPortType">
  <wsdl:operation name="add">
    <wsdl:input name="add" message="tns:add"/>
    <wsdl:output name="addResponse" message="tns:addResponse"/>
  </wsdl:operation>
</wsdl:portType>
<wsdl:binding name="CalculatorSoapBinding" type="tns:CalculatorPortType">
  <soap:binding style="document" transport="http://schemas.xmlsoap.org/soap/http"/>
  <wsdl:operation name="add">
    <soap:operation soapAction="add" style="document"/>
    <wsdl:input name="add">
      <soap:body use="literal"/>
    </wsdl:input>
    <wsdl:output name="addResponse">
      <soap:body use="literal"/>
    </wsdl:output>
  </wsdl:operation>
</wsdl:binding>
<wsdl:service name="Calculator">
  <wsdl:port name="CalculatorPort" binding="tns:CalculatorSoapBinding">
    <soap:address location="http://localhost:8080/jaxws-calculator/calculator"/>
  </wsdl:port>
</wsdl:service>

Calculator services can be even more advanced and have memory. Consider Listing 10.2 below, which shows an interface definition for a .NET (WCF) sample that uses workflow services and accepts a single value at a time.

Listing 10.2 A service contract definition for a stateful calculator service (http://msdn.microsoft.com/en-us/library/bb410782.aspx). The service accepts a single number at a time and remembers the former state from operation to operation.

[ServiceContract(Namespace = "http://Microsoft.WorkflowServices.Samples")]
public interface ICalculator
{
    [OperationContract()]
    int PowerOn();

    [OperationContract()]
    int Add(int value);

    [OperationContract()]
    int Subtract(int value);

    [OperationContract()]
    int Multiply(int value);

    [OperationContract()]
    int Divide(int value);

    [OperationContract()]
    void PowerOff();
}

The calculator service (both versions of it) is a very fine-grained service. Naturally, or hopefully anyway, the calculator examples are just oversimplified services used to demonstrate SOA-related technologies (JAX-WS in the first excerpt and WCF and WF in the second one). The problem is when we see this level of granularity in real-life services.

1.1.1 Consequences

Problem? Why is “fine granularity” a problem anyway? Isn’t SOA all about breaking down monolithic “silos” into small reusable services? More so, the finer grained a service is, the less context it carries. The less context a service carries, the more reuse potential it has – and reuse is one of the holy grails of SOA, isn’t it? The calculator service above seems like the epitome of a reusable service. There’s no doubt we can reuse it over and over and over.

Reuse is indeed a noble goal (I’ll leave discussing how real it is for another occasion). The culprit of fine-grained services, however, is the network. Services are consumed over networks – both local (LANs) and remote (extranets, WANs etc.). The result is that services are bound by the limitations and costs incurred by those networks. Trying to disregard these costs is exactly what ailed most, if not all, RPC distributed system approaches that predated SOA (CORBA, DCOM etc.). The calculator service and other similarly sized services are nanoservices.

Nanoservice is an anti-pattern where a service is too fine grained. A nanoservice is a service whose overhead (communications, maintenance etc.) outweighs its utility.

So how can nanoservices harm your SOA? Nanoservices cause many problems, the major ones being poor performance, fragmented logic and overhead. Let’s look at them one by one.

Every time we send a request to a service we incur a few costs: serialization on the caller, moving from the caller process to the OS network service, translation to the underlying network protocol, traveling on the network, moving from the OS network service to the called process, and deserialization on the called process – and that’s before adding security (encryption, firewalls etc.), routing, retries etc. Modern networks and servers can make all this happen rather fast, but if we have a lot of nanoservices running around, these numbers add up to a significant performance nightmare.
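To make the arithmetic concrete, here is a minimal sketch in Python that compares one coarse request with ten nano-sized requests doing the same total work. The cost categories follow the list above, but the millisecond figures and the function name are made-up assumptions for illustration, not measurements.

```python
# Hypothetical per-call overheads in milliseconds (illustrative numbers only).
PER_CALL_OVERHEAD_MS = {
    "serialize": 0.2,
    "network_round_trip": 1.0,
    "deserialize": 0.2,
    "security": 0.5,
}


def remote_cost_ms(calls: int, work_ms_per_call: float) -> float:
    """Total time for `calls` remote requests, each doing `work_ms_per_call` of real work."""
    overhead = sum(PER_CALL_OVERHEAD_MS.values())
    return calls * (overhead + work_ms_per_call)


# The same total amount of useful work (0.1 ms), packaged two ways.
coarse = remote_cost_ms(calls=1, work_ms_per_call=0.1)
chatty = remote_cost_ms(calls=10, work_ms_per_call=0.01)

print(f"one coarse call: {coarse:.2f} ms, ten nano calls: {chatty:.2f} ms")
```

With these assumed numbers the fixed per-call overhead dominates, so splitting the work into ten calls costs roughly ten times as much as doing it in one.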

Nanoservices cause fragmented logic, almost by definition. As we break what should have been a meaningful, cohesive service into minuscule steps, our logic is scattered between the bits that are needed to complete the business service. The fact that you need to haul over several services to accomplish something meaningful also spells increased chances of the Knot anti-pattern, mentioned above.

Proliferation of nanoservices also causes development and management overhead. Just look at the amount of WSDL needed to define the calculator service in Listing 10.1 above, and for what? A service that adds a couple of numbers… There is a relatively fixed overhead associated with managing a service. This includes things like keeping track of a service in a service registry, making sure it adheres to policy, writing the cruft (things we have to write around the business logic) for configuring it etc. Having nanoservices around means we have to do this a whole lot more times (i.e. per service) compared with having fewer, coarser-grained services.

The point of overhead outweighing utility that appears in the nanoservice definition above is subtle but important. A contract with only a few operations is a reason to check whether we have a nanoservice, but it doesn’t automatically mean that we do. For instance, a fraud detection service contract might only accept transaction details and decide whether to authorize the transaction, deny it or move to further investigation. However, the innards of this service involve a complex process like running the details in a rule engine, checking for fraudulent behavior patterns, matching to black lists etc. In fact, fraud detection is such a complicated issue that these are actually systems, and an SOA-based one would comprise several services in itself.

The other side of the equation is also true: a comprehensive contract does not guarantee a service is not a nanoservice. For instance, in the initial iterations of a system I designed, we developed a resource management service. It supported some very nice operations like getting the status of all the services in the system, running sagas and, of course, allocating services. Allocating services meant that whenever an event went out that needed a (new) service instance to handle it, we had to make a call to the resource manager to get one. This provides neat centralized management but also a performance bottleneck that slows the whole system. To solve this we went with distributed resource management, but that’s beyond the scope of this discussion. The point, however, is that the utility of the resource management service (e.g. easy management of running sagas) was not worth the overhead associated with it (the number of calls and the performance hit on the system) – hence a nanoservice.

1.1.2 Causes

From a more technical point of view, we get to nanoservices from not paying attention to at least a couple of the fallacies of distributed computing. Mentioned in chapter 1, the fallacies of distributed computing are a few false assumptions that are easy to make and prove to be wrong and costly down the road. Specifically, we are talking here about assuming that

• Bandwidth is infinite – Even though bandwidth gets better and better, it is still not infinite within a specific setup. For instance, in one project we were sending images over the wire and distributing them to computational services (a la map/reduce – see also Gridable Service in chapter 3). Things were working ok when we sent small images, but when we sent larger images we realized we were sending them as bitmaps and not as much more compact JPEGs, which put a burden on the backbone of our switches, which wasn’t ready for that load.

• Transport cost is zero – As explained in the previous section, every over-the-wire call incurs a lot of costs vs. a local call (also see figure 10.5 below). The costs of the transport can be considered both in terms of the time it takes to make each of these calls and in terms of the real dollar value attached to making sure you have enough bandwidth (connections, routers, firewalls) to handle the traffic incurred.


Figure 10.5 Local objects can “afford” to have intricate interactions with their surroundings. A similar functionality delivered over a network is more likely than not to cause poor performance because of the network related overhead.

Another reason we get nanoservices, at least for beginners, is poor examples – as noted, the calculator services above are taken from real examples provided by various vendors. SOA newcomers and/or people without a lot of distributed systems development experience can easily take these samples at face value and go about implementing services with similar granularity. The fact that web-service frameworks mostly map service calls to object method calls makes this even more tempting.

Nanoservices are also an inherent risk when applying the orchestrated choreography pattern. Adding an orchestration engine that is capable of controlling flow and is external to the services tempts us to think that we can use it to drive all flow, however small. Couple this with the fact that the smaller the services are, the more “reusable” they are (less context) and, again, you may end up with a lot of nanoservices on your hands.

Lastly, since the nanoservice boundary is soft (remember utility vs. overhead), behaviors that look promising at design time can prove to be nanoservices as we move along (like the resource manager example above). This can be acceptable if your SOA is developed iteratively (see 10.2.4 exceptions below), but it still means that we have to come up with ways to refactor nanoservices.

1.1.3 Refactoring

There are basically two main ways to solve the nano-services problem. One, which is relatively easy, is to group related nano-services into a larger service. The second option, which is more complicated, is to redistribute its functionality among other services. Let's take a look at them one by one.

On one project I was working on we needed to send out notifications to users and admins via SMS messages. Since the software component that did the actual SMS dissemination was a 3rd party app, we decided to create a simple service (not unlike an OO adapter) that accepts requests for SMS and talks to the 3rd party software. A nanoservice was born; it even got a nice little name, Post Office Service (ok, ok, the original name was Spam Server but I thought it would look bad in presentations :) ).

Why is this a nanoservice? Well, it really doesn’t do much, and it would be even simpler to package it as a library that other services can use, yet it carries all the management overhead of maintaining another system service.

What we did about it was to add similar functionality to the service so it also learned to send emails, tweets and MMSs. A serendipitous effect of this was that now instead of sending a request like TweetMessage or SendSMS to this service we could now raise more meaningful events such as SystemFailureEvent and have the service make decisions on how to alert administrators based on the severity of the problem etc. So combining the related functionality helped make the overall service even more meaningful.
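The shift from channel-specific requests to meaningful events can be sketched roughly as follows, in Python. Only the SystemFailureEvent name comes from the text; the severity scale, the channel mapping and the function names are hypothetical and stand in for whatever policy the real service applies.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SystemFailureEvent:
    component: str
    severity: int  # hypothetical scale: 1 (minor) to 3 (critical)


def channels_for(event: SystemFailureEvent) -> List[str]:
    """Choose notification channels by severity (an assumed policy)."""
    if event.severity >= 3:
        return ["sms", "email", "tweet"]
    if event.severity == 2:
        return ["email", "tweet"]
    return ["tweet"]


def notify(event: SystemFailureEvent, send: Callable[[str], None] = print) -> List[str]:
    """Send one message per chosen channel and return the channels used."""
    used = []
    for channel in channels_for(event):
        send(f"[{channel}] {event.component} failed (severity {event.severity})")
        used.append(channel)
    return used


notify(SystemFailureEvent(component="image-matcher", severity=3))
```

The caller no longer decides between SMS and Twitter; it only states what happened, and the decision logic lives in one place inside the service.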

Unfortunately, it isn’t always possible to take the functionality of nanoservices and find suitable “other services” (nano- or right-sized) that can assimilate them. In those cases getting rid of a nanoservice is more of an exercise in redesign than a refactoring. For instance, in a project we built we had a services allocation service (SAS). The SAS’s role was to know about other services’ location, health status and utilization and, upon a request such as beginning a saga (see chapter 5 for the Saga pattern), decide which service instances should be used. The service also provided “reporting” capabilities for active sagas, services utilization etc. This might not sound like a nanoservice, and at first we thought so too, but as the project progressed we found that being a central hub, as seen in figure 10.6 below, made the SAS a performance bottleneck, incurring additional costs (in latency) on a lot of the calls and interactions made by other services. The utility of the SAS, finding which service instance to talk to, was being diminished by the cost – yep, it is a nanoservice after all.


Figure 10.6 An example of a nanoservice. The SAS service is a performance bottleneck as a lot of calls go through it. It provides an important service but its costs are too dear.

To solve the SAS problem we had to put in quite a lot of work. The solution, essentially, was to move to distributed resource management, so that each service had some knowledge of what the world looks like and could decide by itself which service instance to talk to.

To sum up this section: sometimes it is easy to notice that something is a nanoservice, and chances are that in these cases it would also be easy to take the functionality and group it with related functionality in other services. However, on other occasions the fact that a service provides too little benefit is not as apparent and only becomes clear as we move along. In those cases it is also harder to fix the problem. One question we still need to cover is whether there are any situations where we would go with a nanoservice even if we know it is one at the outset.

1.1.4 Known Exceptions

When is it ok to have nanoservices? When you are starting out. When your approach to SOA is evolutionary and you don’t plan everything in advance (something that rarely works anyway, but that’s another story), there’s a good chance that the first versions of the services you build will not show a lot of business benefit, but they will already need the full overhead of a service. The post office service in the example above is a good example of that: starting out, it only dealt with a single type of message and it didn’t do a whole lot with it either.

The post office service is also a good example of another reason to have a nanoservice, which is when you want to build an adapter or bridge to other systems, be they legacy systems or 3rd party ones. In these cases you need to weigh the advantage of using a service vs. building the same functionality as a library that can be used within services, but in many cases keeping the flexibility and composability of SOA can triumph over the overhead associated with having an additional service to manage.

Lastly, one point to keep in mind is that NanoServices is a rather soft pattern and the value of a small service can radically change from system to system or even in a certain system as time and requirements progress. It is worthwhile questioning our assumptions and looking at the services that we grow from time to time to validate the usefulness of what we’re building.


 
Tags: SOA | SOA Patterns | Software Architecture

It has been quite awhile since I added anything new to the book. I have my reasons (some would probably say excuses :) ) mainly that finding the energy and time to write is very hard with a wife, 3 kids and a startup.

Anyway, I’ve been talking with Manning lately, trying to figure out what to do with this project. I was quite amazed to learn that 1000 or so of you purchased the MEAP edition even though it only contains 5 chapters and hasn’t been updated in a long time (by the way, I’ve also recently learned that book pirating sites are offering the book for download, but that’s another story). Anyway, we’re trying to decide what we want to do, and this is where I’d love to hear some feedback from you.

1. Do you think the book is still relevant today?

2. Do you think the book should be restarted from scratch to reflect recent

3. Do we need to cancel the book and end this fiasco, or push it through to completion?

4. How valuable do you find the information in the book so far?

I am ready to put in time to add enough patterns/anti-patterns to make the book releasable, but since I know it takes oodles of time I don’t really have, I really want to know I am not wasting my time.

Please leave a comment/drop me an email if you have anything to say

Thanks (and thanks for your continued patience so far)


 
Tags: SOA | SOA Patterns | Software Architecture

Moving to architectures like SOA that increase the number of overall “moving parts” or components in the system means that reliability is going down. It is simple math really – if you have 10 components, each with a 0.99 reliability, then the total reliability is 0.99^10 or 0.904, and that’s before we take into account messages traveling over the wire and the network’s reliability (or lack thereof). What this does is leave us trying to build reliable systems from a (growing) bunch of unreliable components. I know, I know, there’s nothing new here. We’ve been using techniques like redundancy, statelessness etc. to help mitigate this since the beginning of time. With these techniques we decrease the “Mean Time Between Failure” (MTBF), since there are more parts that can fail, but increase the “Mean Time Between Critical Failure” (MTBCF), i.e. the system’s overall MTBF.
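The “simple math” can be checked with a short Python sketch. It assumes that component failures are independent and that all components are needed for the system to work; the redundancy function additionally assumes that one working replica is enough. The function names are mine, chosen for illustration.

```python
def series_reliability(component_reliability: float, count: int) -> float:
    """Reliability of `count` independent components that must all work."""
    return component_reliability ** count


def redundant_reliability(component_reliability: float, replicas: int) -> float:
    """Reliability of `replicas` independent copies where one working copy suffices."""
    return 1 - (1 - component_reliability) ** replicas


plain = series_reliability(0.99, 10)
paired = series_reliability(redundant_reliability(0.99, 2), 10)

print(f"10 components at 0.99: {plain:.3f}")
print(f"10 redundant pairs:    {paired:.3f}")
```

Doubling every component means twice as many parts that can break, yet under these assumptions the chance that the system as a whole keeps working goes up noticeably.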

Another aspect of reliability (and reliability calculations) is MTTR or “Mean Time To Repair”, which in software mainly has to do with how much time it takes before we know something is wrong. The usual approach to that is monitoring, which I’ve written about in the past (e.g. the blogjecting watchdog pattern). In this post I want to expand a little on another approach, which, while not common in IT systems, can be useful at times.

Enter the BIT – which is short for “Built In Test”. BIT is a technique I picked up when I worked on multi-disciplinary systems that also included embedded systems. Each and every one of the embedded systems we developed (or integrated into the solution) supported BIT. Actually, they usually supported several types of BIT, at least PBIT, CBIT and IBIT:

  • PBIT – Power-On Built In Test – usually a short test the system runs to make sure all of its components are ready to go. You actually saw this one a lot of times since this is what motherboards do as you turn them on (all the blips and lights etc.)
  • CBIT – Continuous Built In Test – Make sure the system is functioning, even when it isn’t really busy so we’ll know about problems before we actually try to use the system
  • IBIT – Initiated Built In Test – provides a way to find out exactly what’s wrong when one of the other test types failed

BIT is very understandable for embedded systems; after all, these are closed boxes with limited access to their innards and inner workings. But isn’t that also true for SOAs? After all, we are building a bunch of black boxes that interact to provide some business benefit. How can we be sure that everything is working fine, especially when we don’t fully control some of the parts?

As mentioned above, a system, especially a distributed one, is built from relatively unreliable components. A continuous test helps us make sure things are working as expected. What we are doing is taking some of the code we wrote to run integration and acceptance tests (which run a scenario end-to-end), deploying it into the system as a service, which we call the “liveliness check”, and having it run periodically. Every time the liveliness check runs it sends a notification (a Twitter message) so we know the test itself works. If it fails it sends more notifications (Twitter, email, SMS etc.) to an administrator.

This liveliness check, or CBIT, serves as an early warning system. Since the end result is known in advance we can have a pretty good idea if something went wrong. E.g. we know how much time it should take for a test Id, we know what the result of that image is etc. The fact that it works even when the system is in low utilization means we can find out about problems and deal with them before they happen to end-users. That’s a big plus for us.
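A single run of such a check might look roughly like the Python sketch below. It is not our production code: the function name, the parameters and the way notifications are delivered are all assumptions, and the scheduler that would call it periodically is left out.

```python
import time


def run_liveliness_check(scenario, expected, max_seconds, notify, alert):
    """Run one end-to-end scenario and compare it with the known result and time budget."""
    started = time.monotonic()
    try:
        result = scenario()
    except Exception as error:
        alert(f"liveliness check raised {error!r}")
        return False
    elapsed = time.monotonic() - started
    if result != expected:
        alert(f"liveliness check got unexpected result {result!r}")
        return False
    if elapsed > max_seconds:
        alert(f"liveliness check too slow: {elapsed:.2f}s > {max_seconds}s")
        return False
    # Report success too, so silence means the check itself has stopped running.
    notify("liveliness check passed")
    return True


# Stand-in scenario; a real one would push a known test request through the system.
run_liveliness_check(
    scenario=lambda: "known-answer",
    expected="known-answer",
    max_seconds=2.0,
    notify=print,
    alert=print,
)
```

Passing `notify` and `alert` in as plain callables keeps the check independent of the delivery channel, so the same code can tweet on success and escalate to email or SMS on failure.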

The advantage over regular monitoring solutions (this is not an either/or – monitoring is also needed) is that you know the specific business scenarios are working properly, which gives higher confidence that things are ok than just knowing a specific server or service is running.

On the flip side, or the downside of adding a periodic liveliness is adding complexity into the system. In our case, we have to add a process to clean the traffic data added by the test messages. Also, while we try to make the system behave as usual as much as possible,  certain parts of the system will have to know about the test messages and handle them differently. Again, in our case the reporting has to know to disregard test messages and not count them. This is even more problematic in other types of systems, for instance if you simulate an order, you don’t want the purchase order to actually go out to a supplier.
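One common way to handle this, shown here as a Python sketch, is to mark test traffic explicitly so that reporting and cleanup can recognize it. The `is_test` flag and the function names are hypothetical and are not taken from our system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    payload: str
    is_test: bool = False  # hypothetical flag set by the liveliness check


def count_for_report(messages: List[Message]) -> int:
    """Count only real traffic; test messages are disregarded."""
    return sum(1 for message in messages if not message.is_test)


def purge_test_traffic(messages: List[Message]) -> List[Message]:
    """Cleanup step: drop the data added by test messages."""
    return [message for message in messages if not message.is_test]


traffic = [Message("order-1"), Message("probe", is_test=True), Message("order-2")]
print(count_for_report(traffic), len(purge_test_traffic(traffic)))
```

The cost mentioned above is visible even in this toy version: every consumer that aggregates or acts on messages has to know about the flag.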

To sum this up, adding a liveliness check as part of the system to create a continuous built-in test can increase your confidence that things are working as they should. It can also help you identify problems earlier. Like everything in life, it doesn’t come without tradeoffs, and you should weigh your benefits vs. costs before utilizing it in your systems.


 
Tags: SOA | SOA Patterns | Software Architecture

October 30, 2009
@ 10:58 PM

Yes, this is another WCF rant…

We were getting ready to launch an open-for-all version of our service. We were also adding more cores to the system, to make sure our computation engine would be able to handle higher request loads (we’re basically implementing the Gridable Service pattern – if it is interesting, I can expand on that in another post). We tried a few load tests which, at first look, seemed OK. However, we then noticed that we were getting WCF timeouts on some of the calls.

WCF timeouts!? What gives? We already took care of all the annoying throttling defaults. After busting my head (and Google) for a few hours, I found that apparently, when using HTTP bindings in WCF, you only get 2 outgoing connections for each server (IP) you are connecting to. Yep, WCF thinks your services are novice users using a browser. So if you have a service that tries to access a remote server with more than two requests simultaneously (say, under load…) you may find, like us, that you get occasional timeouts.

Yes, there’s also a solution, which also took time to find – it is the connectionManagement element in the network configuration (I have yet to find a programmatic way to do this). So now the service’s app.config (see below) sets the value to 16 instead of 2 and everything is well again, at least until the next default hits us.

<system.net>
   <connectionManagement>
     <add address = "*" maxconnection = "16" />
   </connectionManagement>
 </system.net>

 

From trying to work with WCF in the last few years, it seems its abstractions are very leaky. They are leaky almost to the point of rendering it useless as a “unified” framework. So, in the spirit of the times, and as the title suggests, maybe they should just rename WCF to Windows Trick-or-treat Foundation. Alas, we are getting more and more of the “trick” part rather than the “treat”, but at least the acronym fits well.


 
Tags: .NET | SOA Patterns | WCF

Yesterday I gave a talk on SOA patterns at the European Virtual Alt.Net user group. You can find the recording of that talk here as well as download a pdf of the slides.

Before I talk a little about the substance, I want to say a few words about Office Live Meeting (the platform used for the presentation). To sum this up in one word, the experience was horrid. It took me more than 35 minutes just to upload my presentation. Then I had to switch to Windows XP (a VM in Parallels) to speak since it has a problem with Windows 7 (low sound volume). However, the worst thing is that throughout the presentation I constantly lost control of the slides’ progress (i.e. couldn’t move the slides forward), which was very distracting.

Anyway, ignoring all that, I think overall the presentation is still beneficial and addresses a few interesting and challenging issues like flexibility, reporting and management of SOA. If I were to sum up the presentation, I’d say that when you build a system on SOA you get a system built of (relatively) a lot of components of questionable reliability. You can reap a lot of benefits in the flexibility department, but you have to address several challenges in the performance, availability, management (etc.) departments. Additionally, you need to look at the overall solution from a holistic viewpoint, since different parts of the solution can push in different directions or only cover part of the picture.

Lastly, thanks to Jan and Colin for organizing the event and to all the attendees for giving me an hour and a half of their time.


 
Tags: .NET | SOA | SOA Patterns | Software Architecture

I began writing SOA patterns a long time ago. I was making nice progress when suddenly xsights happened and my free time evaporated. Now, 2 years or so later, we’re finally in production. Since the progress on the book has, hmm, how shall I put it, been hampered by xsights, I thought it would, at least, be appropriate to share some details on how the ideas presented in the book (written, half-written and yet-to-be-written) are being put to use. As it happens, this also coincided with Jan Van Ryswyck & Colin Jack asking me if I’d be interested in presenting something to the European Virtual Alt.Net group (E-VAN).

So here we are. Next week, I am going to be talking about SOA patterns. I am going to present a few common SOA challenges (availability, flexibility, reporting, multi-tenancy) and discuss the patterns and implementation we are using to meet them. I am still finalizing the presentation, so if you have any questions you’d like me to answer, feel free to send them to me (either by email or in a comment here) and I’ll do my best to address as many of the questions as possible.

Hope to see you there.

Start Time: Monday, October 05, 2009 07:00 PM GMT*

End Time: Monday, October 05, 2009 08:30 PM GMT

Attendee URL: http://snipr.com/virtualaltnet (Live Meeting)

VAN Calendar: http://www.virtualaltnet.com/Home/Calendar

(*) 08:00 PM UK, 09:00 PM Brussels/Israel, 02:00 PM EST and 11:00 AM PST


 
Tags: .NET | SOA | SOA Patterns | xsights

I noticed that the images and code samples are a little off on the blog (I have to admit I just pasted it from Word, and we all know the great HTML that produces…). To help remedy this I am also making this pattern available in PDF form.

The next pattern I am going to publish is actually an anti-pattern called “NanoServices”, which, as the name implies, is about making services too small. I hope to have that ready early next week. Next after that would be the “Aggregated Reporting” pattern. Aggregated Reporting is aimed at solving the dispersed data problem that autonomy and a lot of services create.

Any thoughts (on the pattern or otherwise) are welcome.


 
Tags: .NET | Java | SOA | SOA Patterns | Software Architecture

September 8, 2009
@ 06:53 PM

1.1 Reservation

When you use transactions in “traditional” n-tier systems life is relatively simple. For instance, when you run a transaction and an error or fault occurs, you abort the transaction and easily roll back any changes – getting back your system-wide consistency and peace of mind. The reason this is possible is that a transaction isolates changes made within it from the rest of the world. One of the base assumptions behind transactions is that the time that elapses from the beginning of the transaction until it ends is short. Under that assumption we can afford the luxury of letting the transaction hold locks on our resources (such as databases) and mask changes from others while the transaction is in progress. Transactions provide four basic guarantees – Atomicity, Consistency, Isolation and Durability, usually remembered by their acronym, ACID.

Unfortunately, in a distributed world, SOA or otherwise, it is rarely a good idea to use atomic short-lived transactions (see the Cross-Service Transactions anti-pattern in chapter 10 for more details). Indeed, the fact that cross-service transactions are discouraged is one of the main reasons we would consider using the Saga pattern in the first place.

One of the obvious shortcomings of Sagas is that you cannot perform rollbacks. The two conditions mentioned above, locking and isolation do not hold anymore so you cannot provide the needed guarantee. Still, since interactions, and especially long running ones, can fail or be canceled Sagas offer the notion of Compensations. Compensations are cool; we can’t have rollbacks so instead we will reverse the interaction’s operation and have a pseudo rollback. If we added one hundred (dollars/units/whatnot) during the original activity we’ll just subtract the same 100 in the compensation. Easy, right?
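The “add 100, then subtract 100” idea can be sketched in a few lines of Python. This is only an in-process illustration of the compensation mechanics: the `run_saga` helper and the step functions are hypothetical, and a real saga would span several services and a coordinator.

```python
def run_saga(steps):
    """Run (action, compensation) pairs; on failure, compensate completed steps in reverse order."""
    completed = []
    try:
        for action, compensation in steps:
            action()
            completed.append(compensation)
    except Exception:
        for compensation in reversed(completed):
            compensation()
        return False
    return True


balance = {"units": 0}


def add_100():
    balance["units"] += 100


def subtract_100():
    # Pseudo rollback: reverse the effect of add_100.
    balance["units"] -= 100


def failing_step():
    raise RuntimeError("downstream service unavailable")


succeeded = run_saga([(add_100, subtract_100), (failing_step, lambda: None)])
print(succeeded, balance["units"])
```

Note that nothing here isolates `balance` from other readers between `add_100` and `subtract_100`, which is exactly where the trouble starts.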

1.1.1 The Problem

Wrong – as you probably know, it isn’t easy. Unfortunately, there are a number of problems with compensations. These problems come from the fact that, unlike ACID transactions, the changes made by the Saga activities are not isolated. The lack of isolation means that other interactions with the service may operate on the data that was modified by an activity of other sagas, and render the compensation impossible. To give an extreme example, if a request to one service changes the readiness status of the space shuttle to “all-set” and another service causes the shuttle to launch based on that status, it would be a little too late for the first service to try to reverse the “all-set” status now that the “bird has left the coop”. A more down to earth (pardon the pun) business scenario is any interaction where you work with limited resources, e.g. ordering from a (usually limited) stock.

Consider, for instance, the scenario in figure 6.1 below. A customer orders an item. The ordering service requests the item from the warehouse as it wants to ship the item to the customer (probably by notifying another service). Meanwhile on the warehouse service the item ordered causes a restocking threshold to be hit which triggers a restocking order from a supplier. Then the customer decides to cancel the order – now what?


Figure 6.1 Chapter 6 focus is about connecting Services with Service consumers in the levels and layers beyond the basic message exchange patterns.

Should the restocking order be cancelled as well? Can it be cancelled under the ordering terms of the supplier? Also, a customer requesting the item between the ordering and the cancellation might get an out-of-stock notice, which will send them to our competitors. This can be especially problematic for orders which are prone to cancellations, like hotel bookings, vacations etc.
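To make the restocking problem concrete, here is a minimal sketch in Java. The Warehouse class, its restock threshold and the compensateTake method are hypothetical names made up for illustration; they are not taken from a real system. Under those assumptions, it shows that a compensation can put the stock count back but cannot undo a side effect that other parties have already seen.

```java
/** Hypothetical warehouse service state, for illustration only. */
class Warehouse {
    private int stock;
    private final int restockThreshold;
    private int restockOrdersPlaced;

    Warehouse(int stock, int restockThreshold) {
        this.stock = stock;
        this.restockThreshold = restockThreshold;
    }

    /** Saga activity: take items for an order; may trigger a restocking order. */
    void take(int quantity) {
        if (quantity > stock) {
            throw new IllegalStateException("out of stock");
        }
        stock -= quantity;
        if (stock <= restockThreshold) {
            restockOrdersPlaced++; // side effect the supplier sees right away
        }
    }

    /** Compensation: reverses the stock change, but not the restocking order. */
    void compensateTake(int quantity) {
        stock += quantity;
    }

    int stock() {
        return stock;
    }

    int restockOrdersPlaced() {
        return restockOrdersPlaced;
    }
}
```

After a take followed by its compensation the stock count is back where it started, yet the restocking order is still out there, which is exactly the "now what?" of the scenario.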

Another limitation of compensations and the Saga pattern itself, for that matter, is that it requires a coordinator. A coordinator means placing trust in an external entity, i.e., outside (most) of the services involved in the saga, to set things straight. This is a challenge for some of the SOA goals as it compromises autonomy and introduces unwanted coupling to the external coordinator.

The question then is

How can we efficiently provide a level of guarantee in a loosely coupled manner while maintaining services’ autonomy and consistency?

We already discussed the limitations of compensations, which of course are one option for solving this challenge. Again, one problem is that we can’t afford to make mini changes, since we will then depend on an external party to set the record straight. The other problem with compensations is that we expose these “semi-states”, which are essentially internal details of the services, to the outside world. Increasing the footprint of the services’ contract, especially with internal detail, makes the services less flexible and more coupled to their environment (see also the White Box Services anti-pattern in chapter 10).

We’ve also mentioned that distributed transactions are not the answer, since they both lock internal resources for too long (a Saga might go on for days) and put excess trust in other services, which may be external to the organization.

This seems like a quagmire of sorts. Fortunately, real life has already found a way to deal with a similar need for fuzzy, half guarantees – reservations!

1.1.2 The Solution

Implement the Reservation pattern and have the services provide a level of guarantee on internal resources for a limited time.


Figure 6.2 The Reservation pattern. A service that implements reservations considers some messages as “Reserving”: it tries to secure an internal resource and sends a confirmation if it succeeds. When a message is considered “confirming”, the service validates that the reservation still holds. In between, the service can choose to expire reservations based on internal criteria.

The Reservation pattern means there will be an internal component in the service that will handle the reservations. Its responsibilities include:

§ Reservation – making the reservation when a message that is deemed “reserving” arrives. For instance, when an order arrives, in addition to updating some durable storage (e.g. a database) with the order, it needs to set a timer or an expiration time for the order confirmation. Alternatively, it can set some marker that the order is not final.

§ Validation – making sure that a reservation is still valid before finalizing the process. In the ordering scenario mentioned before that would be making sure the items designated for the order were not given to someone else.

§ Expiration – marking reservations invalid when conditions change. E.g. if a VIP customer wants the item I reserved, the system can provision it for her. It should also invalidate my reservation, so that when I finally try to claim it the system will know it’s gone. Expiration can also be timed, as in “we’re keeping the book for you until noon tomorrow”.
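The three responsibilities above can be sketched as one small in-service component. This is only an illustration under simplifying assumptions: the ReservationBook class, its method names and the fixed hold time are hypothetical, state is kept in memory, and the current time is passed in explicitly so the behavior is easy to follow.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical internal component that handles reservations for one resource pool. */
class ReservationBook {
    private final int capacity;
    private final Duration holdTime;
    private final Map<String, Instant> expiryByOrder = new HashMap<>();
    private int committed;

    ReservationBook(int capacity, Duration holdTime) {
        this.capacity = capacity;
        this.holdTime = holdTime;
    }

    /** Reservation: try to secure a unit and note when the hold lapses. */
    synchronized boolean reserve(String orderId, Instant now) {
        expire(now);
        if (expiryByOrder.containsKey(orderId)) {
            return true; // already holding a unit for this order
        }
        if (expiryByOrder.size() + committed >= capacity) {
            return false;
        }
        expiryByOrder.put(orderId, now.plus(holdTime));
        return true;
    }

    /** Validation: a confirming message succeeds only if the hold is still there. */
    synchronized boolean confirm(String orderId, Instant now) {
        expire(now);
        if (expiryByOrder.remove(orderId) == null) {
            return false;
        }
        committed++;
        return true;
    }

    /** Expiration: drop holds whose time has passed (business rules could go here too). */
    synchronized void expire(Instant now) {
        expiryByOrder.values().removeIf(expiry -> !expiry.isAfter(now));
    }
}
```

A consumer that confirms too late simply gets a "no", which is the limited guarantee the pattern is about.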

Reservations can be explicit, i.e. the contract would have a ReserveBook action, or implicit. In the case of an implicit reservation, the service decides internally what will be considered a reserving message and what will be considered a confirming message, e.g. an action like Order will trigger the internal reservation and an action like closing the saga will serve as the confirming message. When the reservation is implicit, the service consumer implementation will probably be simpler, as the consumer designers are likely to treat reservation expiration as a “simple” failure, whereas when it is explicit they are likely to track the reservation state.

Reservations happen in business transactions worldwide every day. The most obvious example is booking a hotel room. You send in a request for a room (initiate a saga) saying you’d arrive on a certain date, say for a conference, and check out on another (complete the saga). The hotel says OK, we have a room for you (reservation) – provided you confirm your arrival by a set date (limited time). Even if everything went well, you may still arrive at the hotel only to find out your room has been given to another person (limited guarantee). The idea of the Reservation pattern is to copy this behavior to the interaction of services, so that services that support reservations offer a sort of “limited lock” for a limited time and with a limited level of guarantee. A limited level of guarantee means that, like in real life, services can overbook and then resolve that overbooking by various strategies such as first come, first served; VIP first served etc.

It is easy to see Reservation applied to services that handle “real-life” reservations as part of their business logic, such as an ordering service for hotels (used in the example above) or an airline. However, reservations are suitable for a lot of other scenarios where services are called to provide guarantees on internal resources. For instance, in one system I built we used reservations as part of the saga initiation process. The system uses the Service Instance pattern (see chapter 3) where some services are stateful (the reasons are beyond the scope of this discussion). Naturally, services have limited capacity to handle consumers (i.e. an instance can handle n-number of concurrent sagas/events).

This means that when a saga is initialized, all the participants of the saga need to know the instances that are part of the saga. As long as a single service instance initiates sagas everything is fine. However, as illustrated in figure 6.3 below, when two or more services (or instances) initiate sagas concurrently they may (and given enough load/time they will) both try to allocate the same service instance to their respective sagas. In the illustration we see that both Initiator A and Initiator B want to use Participant A and Participant B. Participant A has a capacity of 2 so everything is fine for both Initiators. Participant B, however, has limited capacity so at least one of the Sagas will have to fail the allocation, i.e. not start.


Figure 6.3 A sample situation that can benefit from the Reservation pattern

The Reservation pattern enabled us to manage this resource allocation process in an orderly manner by implementing a two-pass protocol (somewhat similar to a two-phase commit). The initiator asks each potential participant to reserve itself for the saga. Each participant tries to reserve itself and notifies the initiator if it is successful – so in the above scenario, A would say yes to both and B would say yes to one of them. If the initiator gets an OK from all the involved services (within a timeout), it tells all the participants the specific instances within the saga (i.e. initiates it).

The participants only reserve themselves for a short period of time. Once an internally set timeout elapses, the participants remove the commitment independently. As a side note, I’ll just say that the initiator and other saga members can’t assume that the participant will be there just because they are “officially” part of the saga and the system still needs to handle the various failure scenarios. The Reservation pattern is used here only to help prevent over-allocation and it does not provide any transactional guarantees.
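The two-pass protocol can be sketched roughly as follows. This is not the code of the system described above; Participant, Initiator and their methods are hypothetical names, calls are in-process instead of messages, and the participants' own timeouts are left out (the explicit release on failure merely frees capacity sooner than a timeout would).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical saga participant that can serve a limited number of sagas. */
class Participant {
    final String name;
    private final int capacity;
    // sagaId -> saga members; the value stays null while the slot is only reserved
    private final Map<String, List<String>> sagas = new HashMap<>();

    Participant(String name, int capacity) {
        this.name = name;
        this.capacity = capacity;
    }

    /** Pass one: try to hold a slot for the saga. */
    boolean reserve(String sagaId) {
        if (sagas.containsKey(sagaId)) {
            return true;
        }
        if (sagas.size() >= capacity) {
            return false;
        }
        sagas.put(sagaId, null);
        return true;
    }

    /** Give the slot back; in the text this happens independently, on a timeout. */
    void release(String sagaId) {
        sagas.remove(sagaId);
    }

    /** Pass two: learn which instances take part in the saga. */
    void start(String sagaId, List<String> members) {
        sagas.put(sagaId, members);
    }

    boolean hasStarted(String sagaId) {
        return sagas.get(sagaId) != null;
    }
}

class Initiator {
    /** Returns true only if every participant reserved itself and the saga was started. */
    static boolean initiate(String sagaId, List<Participant> participants) {
        List<Participant> reserved = new ArrayList<>();
        for (Participant p : participants) {
            if (!p.reserve(sagaId)) {
                reserved.forEach(r -> r.release(sagaId));
                return false;
            }
            reserved.add(p);
        }
        List<String> members = new ArrayList<>();
        for (Participant p : participants) {
            members.add(p.name);
        }
        participants.forEach(p -> p.start(sagaId, members));
        return true;
    }
}
```

With a participant A of capacity 2 and a participant B of capacity 1, the first saga that asks for both starts and the second one fails its allocation, as in figure 6.3.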

A reservation is somewhat like a lock and thus it “somewhat” introduces some of the risks that distributed locks present. These risks aren’t inherent in the pattern but can easily surface if you don’t pay attention during implementation (e.g. using database locks for implementation).

The first risk worth discussing is deadlock. Whenever you start reserving anything, especially in a distributed environment, you introduce the potential for deadlocks. For instance, if both participants had capacity for a single saga, and initiator A contacted participant A first and participant B next while initiator B used the reverse order, we would have a deadlock potential. In this case there are several mechanisms that prevent that deadlock. The first is inherent to the Reservation pattern: the participants release the “lock” themselves. However, if, for example, there is a retry mechanism to initiate the sagas (as both would fail after the timeout) and the same resources are allocated over and over, there may be a deadlock after all.

Another risk to watch out for when implementing Reservations is Denial of Service (whether malicious or a byproduct of misuse). DoS can happen for reasons similar to those discussed for deadlocks (i.e. if you incur a deadlock you also have a DoS). Another way is by exploiting the reservations and constantly re-reserving. Depending on the reservation timeout, regular firewalls might fail to detect the DoS, so you may want to consider using a Service Firewall (chapter 4) to help mitigate this threat.

Besides the risks discussed above, another thing to pay attention to is that when you introduce Reservation, you are likely to add network calls. The system discussed above illustrates this: it introduced another call to tell the saga members which instances are involved in the saga.

In addition to the Service Firewall pattern mentioned above, another pattern related to Reservations is the Active Service pattern (see chapter 2). The Active Service pattern can be used to handle reservation expiration when it is implemented with timers. Note, however, that it is sometimes better, resource-wise, to handle expiration passively rather than actively, as we’ll see when looking at the implementation options in the next section.

1.1.3 Technology Mapping

Unlike a lot of the patterns in this book, the Reservation pattern is more a business pattern than a technological one. This means there isn’t a straight one-to-one technology mapping to make it happen. On the other hand, code-wise, the pattern is relatively easy to implement.

One thing you have to do is to keep a live thread at the service to make sure that when the lease or reservation expires someone will be there to clean up. One option is the Active Service pattern mentioned above. You can also use technologies that support timed events to provide the “wakeup service” for you. For instance, if you are running in an EJB 3.0 server you can use single-action timers, i.e. timers that only raise their event once, to accomplish this. Code listing 6.1 below shows a simple code excerpt that sets a timer to go off based on the time received in the message. Other technologies provide similar mechanisms to accomplish the same effect.

Code Listing 6.1 Setting a single-action timer based on the expiration time received in a message (using JBoss)

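import javax.annotation.Resource;
import javax.ejb.MessageDrivenContext;
import javax.ejb.Timer;
import javax.ejb.TimerService;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.ObjectMessage;

public class TimerMessage implements MessageListener {

    @Resource
    private MessageDrivenContext mdc;

    // ...

    public void onMessage(Message message) {
        try { // #1
            if (message instanceof ObjectMessage) {
                ObjectMessage msg = (ObjectMessage) message;
                TimerDetailsEntity details = (TimerDetailsEntity) msg.getObject();
                TimerService timerService = mdc.getTimerService();
                // Timer createTimer(Date expiration, Serializable info) #2
                // Assumes TimerDetailsEntity is Serializable and exposes its expiration via getDate()
                Timer timer = timerService.createTimer(details.getDate(), details);
            }
        } catch (JMSException e) {
            e.printStackTrace();
            mdc.setRollbackOnly();
        } catch (Throwable te) {
            te.printStackTrace();
        }
    }

    // ...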

(Annotation) <#1 some vanilla code to process a message and get the interesting entity out of it >

(Annotation) <#2 Here is where we set the single action timer based on the info in the message we’ve just got>

Timer-based cancellation, as described above, might be overkill if the reservation implementation is simple. For instance, the Reservation in listing 6.2 below (implemented in C#) is used by the participants in the saga and reservation sample discussed in the previous section.

Code Listing 6.2 Simple in-memory, non-persistent reservation

public Guid Reserve(Guid sagaId)
{
    try
    {
        Rwl.TryWLock();
        var isReserved = Allocator.TryPinResource(localUri, sagaId);
        if (!isReserved) // #1
            return Guid.Empty;

        // Some code to set the expiration #2
        return sagaId; // #3
    }
    finally
    {
        Rwl.ExitWLock();
    }
}

(Annotation) <#1 The allocator is a resource allocation control, which manages, among other things, the capacity of the service. If we didn’t succeed in marking the service as belonging to the Saga, we can’t allocate the service to the specific Saga>

(Annotation) <#2 Here is where we need to add code to mark when the reservation expires. The previous example (6.1) used timers; we’ll try to do something different here>

(Annotation) <#3 A successful reservation returns the SagaId. This assures the caller that the reply it got is related to the request it sent – a simple Boolean might be confusing>

Since the Reservation in listing 6.2 does not involve heavy service resources (like, say, a database), we can implement passive handling of reservation expiration, which will be more efficient than a timer-based one. Listing 6.3 below shows both a revised reservation implementation and the cleanup method it calls to remove timed-out reservations before it commits a new one. Note that an expired reservation can still be committed if no other reservation occurred in between or the capacity of the service is not exceeded.

Code Listing 6.3 Passive reservation expiration handling (added on top of the code from listing 6.2)

public Guid Reserve(Guid sagaId)
{
    try
    {
        Rwl.TryWLock();
        RemoveExpiredReservations(); // #1
        var isReserved = Allocator.TryPinResource(localUri, sagaId);
        if (!isReserved)
            return Guid.Empty;

        OpenReservations[sagaId] = DateTimeOffset.Now + MAX_RESERVATION; // #2
        return sagaId;
    }
    finally
    {
        Rwl.ExitWLock();
    }
}

private void RemoveExpiredReservations()
{
    var reftime = DateTimeOffset.Now;
    // Materialize the expired keys first so the dictionary is not modified while it is enumerated
    var keys = (from item in OpenReservations where item.Value < reftime select item.Key).ToArray();
    foreach (var id in keys)
    {
        OpenReservations.Remove(id);
        Allocator.FreePinnedResources(id);
    }
}

(Annotation) <#1 Added a small method (RemoveExpiredReservations, which also appears in the listing) to clean up expired reservations. This method runs every time the service needs to handle a new reservation request. Note that there is no timer involved; reservations are only cleaned up when there is a new reservation to process>

(Annotation) <#2 Instead of a timer the reservation is done by marking down when the reservation will expire>

The code samples above show that implementing Reservation can be simple. This doesn’t mean that other implementations can’t be more complex, for example if you want/need to persist the reservation or distribute a reservation between multiple service instances. At its core, though, it shouldn’t be a heavy or complex process.

Another implementation aspect is whether reservations are explicit or implicit. Explicit reservation means there will be a distinct “Reserve” message. This usually means there will also be a “Commit” type message and that the service or workflow engine that requests the Reservation might find itself implementing a 2-phase commit type protocol, which isn’t very pleasant, to say the least.

The other alternative is implicit reservation, where the service decides internally when to reserve, under what conditions to commit the reservation and when to reject it. As usual, the tradeoff is between a simple implementation for the service and a simple implementation for the service consumer.

1.1.4 Quality Attributes

As usual, we wrap up the pattern by taking a brief look at some business drivers (or scenarios) that can lead us to use the Reservation pattern.

In essence, the main driver for reservations is the need for commitment from resources, and since it is a complementary pattern to Sagas it also has similar quality attributes. As mentioned above, Reservation helps provide partial guarantees in long-running interactions, thus the quality attribute that points us toward it is Integrity.

Quality Attribute (level1) | Quality Attribute (level2) | Sample Scenario
Integrity | Correctness | Under all conditions, failure to receive payment within 5 business days will cancel the order and shipping
Integrity | Predictability | Under normal conditions, the chances of a customer getting billed for a cancelled order shall be less than 5%

Table 6.2 Reservation pattern quality attribute scenarios. These are the architectural scenarios that can make us think about using the Reservation pattern.

Reservation is a protocol-level pattern that involves an exchange of messages between service consumers and services. The next pattern is one of the enablers of such message exchange. It is also one of the more confusing patterns, since a lot of the commercial offerings that include it also include a gazillion other capabilities – yes, I am talking about the ServiceBus.


 
Tags: .NET | Java | SOA | SOA Patterns | Software Architecture

A lot has been written and said about multiple use (or reuse, depending on your definition) of services. I want to touch on one aspect of this with this post.

As a general rule, the more generic or small something is, the easier it is to use it in different contexts. For example, hash tables are used all over the place in a lot of programs. The hash table is a generic container and carries very little in terms of business context, so it is very easy to use. A corollary to this rule is that the more specific something is, the harder it is to use it in different contexts. Unfortunately (from the “use” point of view), specific domain logic is exactly what we strive to have with SOA. The value of services is derived from the business value they can generate. To add insult to injury, there’s also a limit on how small we’d want a service to be. The fact that communicating with a service requires communication over a network means that if we make it too small, the overhead of getting to it (serialization, network traffic, security etc.) can outweigh its utility (an anti-pattern I call nano-services).

Well, one thing you can try to do is remove the business context from the services. Before you flame me about how this matches my previous statement that services’ value comes from the business value or domain know-how they provide, you should note that I said “business context” and not business logic.

Let me try to clarify this with a concrete example from my current system. At xsights we provide image identification services for mobile devices. For instance, when you see a movie ad, you can take a picture with your mobile, send it to us via MMS, a specific client or a video call, and we provide related information such as the trailer, where to buy tickets etc. Our initial offering supported only video calls (for business reasons irrelevant for this post). In a video call you have a constant stream of incoming video from the handset (10-15 frames per second), so we (try to) identify frames as they come. We mostly use event-driven architecture over SOA, so (a partial) flow looks something like the following (the events occur in the context of a saga). An extractor service listens on an RTP stream, extracts and preprocesses images and raises a FrameArrived event on each new frame. An Identification GW decides how to handle an incoming frame and directs it to one or more algorithmic workers (this isn’t event driven). After a successful identification the Identification GW raises a LinkFound event, and a Call Flow service takes it from there:


If we didn’t get an identification within a timeout, we can ask the user to better aim the camera or whatnot (behavior controlled by the CallFlow service).

When we first added support for MMS we wanted to use the same identification logic. There’s a slight difference though: in a video call you have a constant stream of low-quality images, whereas in an MMS you get a single high(er) quality shot. To add support for MMS we needed to add some logic to the identifier so that it would know whether the origin of the image is an MMS message or a video call. If it is the former, the Identifier needs to raise a “failed to identify” event when it finishes processing the image (the video call can use a timeout instead).

But that’s the wrong way to do it, since now we need to know which sagas are MMS sagas and which are video call ones. Not to mention we would probably need some other “special” logic to handle clients (which indeed we needed). If we go down this lane and add more and more business context to the identifier we make it less autonomous. Even though we are using events, they are no longer events about the business of the service (like “FrameArrived” from the extractor); they are system context events (“MMSIdentificationFailed”). Our identifier is gaining more and more “reasons to change” and is becoming tightly coupled to specific contexts. So yes, we are using it over and over again, but the costs are getting higher with each such reuse.

What’s a better way? Remove the business context from the service and focus on keeping the business logic and rules. In this case that would be a NoMatchForFrame event for each failed frame. In an MMS-related saga there would be a service that listens to this event; in a video-call-related saga no service will listen to the event*. Once the business context is removed, our identification GW focuses only on its core business activity (routing images to algorithmic workers and notifying the world on success/failure). Adding support for client behavior becomes much easier in this case; in fact, the identification GW doesn’t need any changes to support this scenario.
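A rough sketch of this idea in Java: the router below stands in for a communications framework that wires events per saga type, and the gateway raises the same events regardless of context. SagaEventRouter, IdentificationGateway and all method names are hypothetical, invented for this illustration; only the event names come from the discussion above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical stand-in for event wiring that depends on the saga type. */
class SagaEventRouter {
    private final Map<String, String> sagaTypes = new HashMap<>();
    private final Map<String, List<Consumer<String>>> routes = new HashMap<>();

    /** The saga initiator is the only place where the context is specified. */
    void startSaga(String sagaId, String sagaType) {
        sagaTypes.put(sagaId, sagaType);
    }

    void subscribe(String sagaType, String eventName, Consumer<String> handler) {
        routes.computeIfAbsent(sagaType + "/" + eventName, k -> new ArrayList<>()).add(handler);
    }

    /** Delivers the event only to handlers wired for this saga's type. */
    void publish(String sagaId, String eventName) {
        String key = sagaTypes.get(sagaId) + "/" + eventName;
        for (Consumer<String> handler : routes.getOrDefault(key, Collections.emptyList())) {
            handler.accept(sagaId);
        }
    }
}

/** The gateway knows nothing about MMS or video calls; it only reports its own outcome. */
class IdentificationGateway {
    private final SagaEventRouter router;

    IdentificationGateway(SagaEventRouter router) {
        this.router = router;
    }

    void onFrameProcessed(String sagaId, boolean matched) {
        router.publish(sagaId, matched ? "LinkFound" : "NoMatchForFrame");
    }
}
```

Supporting a new kind of saga then means adding a subscription for that saga type; the gateway class stays untouched.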

To sum up this post: if you want to increase the chance of using services in different contexts, you should strive to move the context-specific bits outside of the services. This will simplify the services themselves as well as increase their autonomy.


* Our communications framework allows for different event wiring (or routing) depending on the saga “type”, so the event won’t actually even fire in a video call, as the framework will identify that there aren’t any subscribers. This is very good from the service’s point of view, as it allows the service to fire events and let the communications framework worry about the context. The saga initiator is the only place where the context has to be specified (I’ll expand on this in another post).
 
Tags: SOA | SOA Patterns | Software Architecture

Michael Poulin @ ebizq doesn’t like the Active Service pattern. I suggest you read his post first, but in a nutshell Michael sees two possible ways to understand the term Active Service:

“a) service view - a service that actively looking for companions to complete its own task
b) consumer view – a service which triggers its own execution by itself”

…and he doesn’t like both…

I think that both of these definitions aren’t that far… and I like both :)

The way I see it there are two concerns here:

1. Are services only reactive (“passive”)? I.e. does the service only “work” when it gets a request from a service consumer (user/another service/an orchestration engine)? If the service also has at least one thread working on internal stuff (e.g. scavenging outdated data, pre-fetching data from other services etc.) then that’s what I call an Active Service (option “b” above).

2. How do services get the data they need to complete a request when they actually get a request? There are many possibilities here: events, pub/sub, an orchestration engine that takes care of that, services that check for a known contract in a registry and then go to that service, even hardcoding. The options where the service looks for other services (e.g. using a registry) are option “a” above.

So basically all the options are valid: a service can be a+b, just a, just b or neither, and, in my eyes, these are orthogonal concerns.

Regarding pre-fetching – I think this can be beneficial as a way to achieve caching. Note that if you control both sides and you’ve got the needed infrastructure then it is probably better to push changes (eventing or pub/sub) but that’s not always the case.

In the comment I left on Michael’s blog I talked about different strategies for services “There are several strategies for that - one is to take that knowledge out of the service (e.g. using choreography or orchestration), providing a subscription and/or wiring infrastructure i.e. something that will tell you where to find certain contracts, hard coding , registry , using uniform interfaces (e.g. REST) etc.”

Let’s take a concrete (albeit very, very simplistic) scenario to illustrate some of the approaches.

Business scenario: When a customer makes an order we want to give a 5% discount to preferred customers. A customer gets a preferred status upon a business decision (annual orders of 1M$ or knowing the CEO or whatever) and the status lasts for a year from the date it was introduced.

For the sake of this discussion say we have two services (again this is overly simplified) an Ordering service and a Customer service.

Here are a few technical options

Technical Scenario 1.

The customer places an order, and the ordering service talks to “the” customer service to check if the customer deserves a discount. If she does, the ordering service updates the order with the discount and presents it to the customer to finalize the order.

Technical Scenario 2.

Same as 1, with the ordering looking for a service that matches the customer contract it knows about

Technical Scenario 3

The ordering service asks “the” Customer service twice a day for a list of discounts and caches the result. When the user sends her order, it calculates the price and presents it to her.

Technical Scenario 4

Same as 3, with the ordering looking for a customer service (not using a known service)

Technical Scenario 5

The customer service sends a message to known subscribers whenever a new customer status occurs. The ordering service listens for that and updates its internal cache. When the customer places her order, the ordering service hits the cache for the discount.

Technical Scenario 6

Same as 5, but publishing an event to unknown subscribers

Technical Scenario 7

The customer service publishes an event with the discounts (or changes in discounts) twice a day. The ordering service listens for that and updates its internal cache. When the customer places her order, the ordering service hits the cache for the discount.

Technical Scenario 8

The customer order is passed to an orchestrating service, which hits a customer service for a discount and then passes all the data to an ordering service

There are quite a few more options and variants on the options listed but which one is best?

Yeah, you’ve guessed it: it depends. It depends since each option has its own strengths and weaknesses, which can work best in different circumstances. It also depends on the available infrastructure, on the structure of other services, on the services being internal or external etc.

For instance, scenario 1 is less flexible than most others but it is simple to implement. There is coupling in time between ordering and customer (both have to be up for the order to complete). Scenario 4 needs to solve the problem of finding other services (e.g. using some kind of registry, or other services “pushing” their existence, or whatever), but when a customer makes her request the ordering service (most likely) has all the info needed to process that request, which makes the ordering service more autonomous. As a side note, the fact that different approaches to achieving the same end goal work in different situations is why I decided to write patterns in the first place.
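To make one of these options tangible, here is a minimal Java sketch of the ordering side of scenario 5: it keeps a local cache that is updated by messages from the customer service and hits only that cache when pricing an order. The OrderingService class, the handler name and the shape of the message are assumptions made for this illustration; the 5% discount and the one-year validity come from the business scenario.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical ordering service that caches preferred-customer status locally. */
class OrderingService {
    private static final double PREFERRED_DISCOUNT = 0.05;

    // customerId -> first day on which the preferred status no longer applies
    private final Map<String, LocalDate> preferredUntil = new HashMap<>();

    /** Called when the customer service announces a new preferred status. */
    void onPreferredStatusGranted(String customerId, LocalDate grantedOn) {
        preferredUntil.put(customerId, grantedOn.plusYears(1));
    }

    /** Prices an order from the local cache only; no call to the customer service. */
    double priceFor(String customerId, double listPrice, LocalDate orderDate) {
        LocalDate until = preferredUntil.get(customerId);
        boolean preferred = until != null && orderDate.isBefore(until);
        return preferred ? listPrice * (1 - PREFERRED_DISCOUNT) : listPrice;
    }
}
```

The ordering service can complete an order even when the customer service is down, at the price of possibly working with a slightly stale status.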

Lastly, in case you are wondering the scenarios are:

1 – choreography with pre-known (configured or hardcoded) companion services

2 – choreography with “active service” type a (ordering is active)

3 – choreography with “active service” type b (ordering is active)

4 – choreography with “active service” type a + b (ordering is active)

5 – pub/sub (e.g. using an ESB)

6 – eventing

7 – eventing with “active service” type b (customer is active)

8 – orchestration


 
Tags: SOA | SOA Patterns | Software Architecture

May 8, 2009
@ 10:59 PM

Apropos the Blogjecting Watchdog pattern: in addition to blogging, I recently added to our system the ability to tweet. I am using Tweet# from DimeBrain (thanks Mark Nijhof for the tip via Twitter).

Tweet# makes using Twitter really simple (I included the code below in case you find it useful).

The Twitter sender is part of a PostOffice service (I thought that it would be problematic to present it as SpamServer, which was its original name :) ).

Update 11/05: Here it is working on our staging environment :)

A few points about our design in general that are interesting in this regard are:

  • The PostOffice is a “Server” type service – we have 3 types of services: server which has one instance per node, channel which has multiple instances per node and algorithmic which has one instance per core
  • The PostOffice implements a pattern I call “Legacy Bridge” – which is basically an SOA version of an adapter+facade in OO terms. The post office supports the events (over WCF) mechanism we have in our system from one side  and connects to external systems (SMS, coupons and twitter) on the other. The PostOffice, basically contains an Edge Component which accepts the requests and funnels them to *Sender classes that interact with the external systems.
  • From a contract design perspective – the events I added to the system are StatusEvent and AdminStatusEvent (and not TwitterEvent and DirectMessageEvent). This is better, in my opinion, as it carries the intent of what I want to achieve. It also means that if I choose to change technology or use multiple destinations the events will stay meaningful. For instance, the AdminStatusEvent will be used by our monitoring system to send a notification if the system crashes. I’ll probably want that as an SMS, maybe even a phone call, as well as a tweet (so the AdminStatusEvent will have a severity to designate how it should be handled).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Dimebrain.TweetSharp.Fluent;

namespace xsights.Apps.PostOffice.Server.Twitter
{
    // Sends status updates and admin direct messages through TweetSharp's fluent API.
    class TwitterSender
    {
        // Maximum length of a single tweet at the time of writing.
        private const int MaxTweetLength = 140;

        private readonly string account;
        private readonly string password;
        private readonly string admin;

        public TwitterSender(string tweetAccount, string twitterPassword, string adminAccount)
        {
            account = tweetAccount;
            password = twitterPassword;
            admin = adminAccount;
        }

        // Posts msg as one or more public status updates.
        public void Update(string msg)
        {
            foreach (var tweet in BreakToTwitts(msg))
            {
                var update =
                    FluentTwitter.CreateRequest().AuthenticateAs(account, password).Statuses().Update(tweet).AsJson();

                update.Request();
            }
        }

        // Sends msg to the admin account as one or more direct messages.
        public void SendAdminMessage(string msg)
        {
            foreach (var twit in BreakToTwitts(msg))
            {
                var dm =
                    FluentTwitter.CreateRequest().AuthenticateAs(account, password).DirectMessages().Send(admin, twit).AsJson();

                // One initial attempt plus up to 2 retries; failures are swallowed.
                Retry(2, dm.Request, false);
            }
        }

        // Splits a long string into chunks of at most MaxTweetLength characters.
        private IList<string> BreakToTwitts(string originalString)
        {
            var list = new List<string>();
            for (int i = 0; i < originalString.Length; i += MaxTweetLength)
            {
                var len = Math.Min(MaxTweetLength, originalString.Length - i);
                list.Add(originalString.Substring(i, len));
            }
            return list;
        }

        // Invokes call and retries up to `retries` more times on failure.
        // Rethrows the last exception only when shouldThrow is true.
        private void Retry(int retries, Func<string> call, bool shouldThrow)
        {
            try
            {
                call();
            }
            catch (Exception)
            {
                if (retries > 0)
                    Retry(retries - 1, call, shouldThrow);
                else if (shouldThrow)
                    throw;
            }
        }
    }
}

 
Tags: .NET | OO | SOA | SOA Patterns

As I mentioned in the previous post, I got a few interesting questions lately. The first, from Colin, was about developing a customized solution for the Blogjecting Watchdog pattern vs. integrating with/developing for a commercial monitoring suite (e.g. Unicenter, OpenView etc.). The second, from Dru, was about running multiple versions of services (e.g. during an upgrade) with active Sagas in the background. I think these questions are interesting enough to be answered as blog posts. Also, since both questions are related to the Blogjecting Watchdog pattern, I thought it would be better to first explain what it actually is.

So here it is :)

Blogjecting Watchdog

Achieving availability is a multi-layered effort. I’ve already talked about how services should be autonomous (see for example the Active Service pattern in chapter 2); the Blogjecting Watchdog pattern looks at another aspect of autonomy. It shows how a service can proactively try to identify faults and problems and try to heal itself when it identifies them.

1.1 The Problem

The Service Instance pattern (see section 3.4), for example, demonstrates a strategy that a service can implement to be able to cope with failure. The question is: is that enough? Is it enough for the service to try to cope with everything by itself? My answer is no, that is not enough. For one, once we have dealt with a failure within the service, the service's ability to cope with the next failure is probably diminished. For example, if we found a failure in a server and moved to a standby server, the new server does not have another standby server to move to if another fault occurs.

Additionally, the failure might be too much for the service to overcome by itself, like a switch going down. So we would want something external that looks after the service and can help it (see the Service Monitor pattern in chapter 4).

To increase the service's autonomy and the overall availability of our SOA, we need both to try to identify and repair problems and to be able to notify the world about the service’s current status.

The question is then:

How can we identify and attend to problems and failures in the service and increase service availability?

One option is to try to infer the state of the service from the way it looks to the outside. Yes, this is as crude as it sounds. You call the service and it doesn't respond, so you know it is down; you call the service expecting a reply in 5 seconds and you get it in 10 seconds, so you understand that the service is congested. This is not a very good option, as the external behavior only gives us coarse knowledge of the service's state. For example, if the service has a decent fault tolerance solution, we wouldn't know that anything happened, but the truth is that the service's ability to handle the next fault might not exist anymore.

Another way is to install agents on the service's servers. This will give you a much better picture of what happens (vs. the option above). For example, you will also be able to get trend information (e.g. you can watch how much disk space is left and alert when it is getting low). There are several problems with this solution. One is that you need to actively install software on the service's servers, which both decreases the service's autonomy and creates a management hassle in itself. Another problem is that you still only get an external view of the service's behavior (you just gain access to more information). Finally, there are situations (see for example the Mashup pattern in chapter 7) where not all the services are under your control and you cannot access their hardware.

Yet another option is to actively question the service about its state. This has one big advantage over the two previous options, since you also get some inside information: what the service itself has to say about its state. This enables the service to communicate the trends and problems that will actually make it fail. For example, if the service does not write any information to the local disk, low disk space is not a problem at all; if this is the disk where the database is located, it is very much a problem. The solution is not perfect, since it is the observer's responsibility to go after the information. If the rate at which the observer samples the service is not fast enough, it can miss vital information.

As I mentioned earlier, we want something that will help increase the service’s autonomy, so a better approach in this regard would be for the service to watch over itself.

1.2 The Solution

Watching over itself is also not enough, as we also said we need the “world” to know what is happening with the service. Thus a combined solution is to:

Implement the Blogjecting Watchdog pattern and have the service actively monitor its internal state, try to heal itself and continuously publish its state and other important indicators.

Figure 3.14 The Blogjecting Watchdog pattern. The blogjecting component sends the reports out and listens for requests. The watchdog component monitors the status of the business service, tries to heal stray components and logs any failure.

The pattern revolves around a single idea: increasing the service's responsibility by using two complementary concepts, reporting and self-healing. The first is the Blogjecting concept, where the service implements the Active Service pattern (see chapter 2 for more details) and has a component which is in charge of monitoring the service's state. The component also publishes the service's state (see the Publish/Subscribe interaction pattern in chapter 6) on a cyclic basis or when something meaningful occurs. It is important to note that the fact that the service actively publishes its state doesn't mean it cannot also respond to inquiries regarding its health (akin to leaving a comment on a blog and getting a response from the author).
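The reporting half described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the implementation from the project: the names Health, StatusReport, Blogject, Subscribe, Query and Tick are all hypothetical, the "publish" step is a plain in-process callback rather than a real publish/subscribe transport, and the caller (e.g. a timer) is assumed to drive Tick.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical health levels; a real service would define its own indicators.
public enum Health { Ok, Degraded, Failed }

// Hypothetical report shape published by the blogject.
public sealed class StatusReport
{
    public StatusReport(string service, Health health, DateTime timestamp)
    {
        Service = service;
        Health = health;
        Timestamp = timestamp;
    }

    public string Service { get; }
    public Health Health { get; }
    public DateTime Timestamp { get; }
}

public sealed class Blogject
{
    private readonly string service;
    private readonly Func<Health> probe; // service-specific health check (assumed)
    private readonly List<Action<StatusReport>> subscribers = new List<Action<StatusReport>>();
    private Health? lastPublished;

    public Blogject(string service, Func<Health> probe)
    {
        this.service = service;
        this.probe = probe;
    }

    public void Subscribe(Action<StatusReport> subscriber) => subscribers.Add(subscriber);

    // Answers a direct inquiry about the current state.
    public StatusReport Query() => new StatusReport(service, probe(), DateTime.UtcNow);

    // Publishes when the reporting cycle is due or when the health changed
    // since the last publication. Returns true if a report was published.
    public bool Tick(bool cycleDue)
    {
        var report = Query();
        if (!cycleDue && report.Health == lastPublished) return false;

        lastPublished = report.Health;
        foreach (var subscriber in subscribers) subscriber(report);
        return true;
    }
}
```

A watchdog component could simply be one of the subscribers, which keeps the reporting and healing concerns separate.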

What are Blogjects

The term Blogjects was coined by Julian Bleecker back in 2005 (Bleecker, 2005) to describe "edgy designed objects that report themselves, or expose their experiences in some fashion", or in other words Blogjects == objects that blog. Julian Bleecker's vision for Blogjects is wider than the one suggested here. His vision is for things that participate in the Web 2.0 sense of the social web or even further than that. To use Julian’s words: “Forget about the Internet of Things as Web 2.0, refrigerators connected to grocery stores, and networked Barcaloungers. I want to know how to make the Internet of Things into a platform for World 2.0. How can the Internet of Things become a framework for creating more habitable worlds, rather than a technical framework for a television talking to an reading lamp?” I highly recommend taking a look at the full paper “A Manifesto for Networked Objects – Cohabiting with Pigeons, Arphids and Aibos in the Internet of Things” (Bleecker, 2006) to get the full picture.

 

The second concept that plays in the Blogjecting Watchdog pattern is the watchdog. The idea here is to have a component that listens in on the information gathered and published by the blogject component and then acts on that information in a meaningful way to increase the reliability and availability of the service. The possibilities for implementing self-healing are endless; two simple examples of self-healing actions are restarting failed components and cleaning temporary files.

Watchdogs

Watchdog (actually watchdog timer) is a term borrowed from the embedded systems world. A watchdog is a hardware device that counts down to zero, and when it gets there it resets the device. To prevent this reset the application has to “kick the dog” before the timer runs out. If the application does not reset the counter, it means that the application has hung, and the idea is that the reset will fix that.
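The countdown-and-kick mechanism is easy to model in software. The sketch below is an illustration only: WatchdogTimer, Kick and Tick are hypothetical names, the "clock" is whoever calls Tick, and the reset action is a delegate standing in for whatever restart logic a real device or service would use.

```csharp
using System;

// Software model of a watchdog timer: counts down on every clock tick and
// fires the reset action when it reaches zero, unless it was kicked in time.
public sealed class WatchdogTimer
{
    private readonly int timeoutTicks;
    private readonly Action reset; // hypothetical reset/restart action
    private int remaining;

    public WatchdogTimer(int timeoutTicks, Action reset)
    {
        if (timeoutTicks <= 0) throw new ArgumentOutOfRangeException(nameof(timeoutTicks));
        if (reset == null) throw new ArgumentNullException(nameof(reset));

        this.timeoutTicks = timeoutTicks;
        this.reset = reset;
        remaining = timeoutTicks;
    }

    // Called by the healthy application to "kick the dog".
    public void Kick() => remaining = timeoutTicks;

    // Called by the clock. Triggers the reset when the countdown expires,
    // then restarts the countdown for the freshly reset application.
    public void Tick()
    {
        remaining--;
        if (remaining > 0) return;

        reset();
        remaining = timeoutTicks;
    }
}
```

An application that kicks at least once every timeoutTicks clock ticks never triggers the reset; one that hangs stops kicking and gets reset when the countdown reaches zero.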

 

How is the Blogjecting Watchdog pattern better than the other options mentioned above?

Even if we just consider the blogjecting part of the pattern we can see several advantages over the other approaches. The Blogjecting Watchdog combines the benefits of an agent that actively monitors the service's health with the internal knowledge of what's important for the service's continuity and what's not. Unlike the external agents solution, with Blogjects the service retains its autonomy. The autonomy is increased even further when you add the self-healing features of the watchdog. The end result is a service which is more resilient (and thus has higher availability) and which lets the world know both its current state and future trends.

In one project I was working on we inherited a situation where there were interdependencies between executables installed on different servers (within a service). For example, when one process was down on server A the objects running on server B could not function well, and there were other such dependencies (this isn’t the brightest design, but sometimes you have to compromise; in this case there was no time or budget to redesign these applications). What we ended up with is something like the situation in figure 3.15 below:

Figure 3.15 A sample deployment of a Blogjecting Watchdog. The daemons on the servers monitor the running components on each server. The Watchdog edge exposes the current state both through a web-services API and as SNMP traps.

The watchdog agents on each of the server nodes monitor the components. The agents communicate amongst themselves to examine the dependencies and the actions taken. The watchdog Edge component provides a WSDL-based endpoint where other services can query it for the service’s health. It also publishes SNMP traps to an external SNMP monitor (e.g. HP OpenView). As an implementation hint, I suggest keeping the watchdog components in a separate, very simple executable (preferably a daemon that runs when the OS loads). The simpler the component, the lower the risk that it will itself fail (you can of course have a backup in the form of a hardware watchdog). Let’s take a more thorough look at the technology mapping options.

1.3 Technology Mapping

Implementing Blogjecting Watchdog in an enterprise will usually pre-determine the protocols you will have to use for your “blog”. The IT team will most likely have already standardized on one of the leading monitoring suites (CA Unicenter, HP OpenView, IBM Tivoli or, if you are an all-Microsoft shop, Microsoft Operations Manager). In these cases you can use the SDK of the monitoring software (e.g. the Unicenter Agent SDK or the MOM management pack developer guides). There are even 3rd party software packages to help you build such agents (for example OC Systems have a Universal Agent that makes it easier to write agents for Unicenter).

Note that this is not always the case, though, and sometimes you do have the freedom to choose your protocols. A few projects I worked on chose to standardize on using web services with specific messages for monitoring the health of a service (so we had a specific endpoint for each service where these messages were supported). With the emergence of SOA-specific tools like the ones by Amberpoint and Weblayers you will see more and more WS-* based monitoring.

Other ways of reporting your internal state are to use standards like SNMP (Simple Network Management Protocol) or simply the Windows Event Logs. An interesting option, which will let your Blogjecting Watchdog literally blog, is to use a product called RSSBus, which is an ESB implementation that uses the RSS protocol for communications. At the time I am writing this, the product is still in beta, so I haven’t used it for a serious system yet. Nevertheless, it looks like an interesting direction which I’ll consider when it is released.

Regarding the self-healing part (the watchdog): self-healing is still more prevalent in hardware than in software (watchdog timers, RAID, IBM , hot spare memories, hot spare drives etc.). In a sense, any solution that builds on clustering technology also has some of that built in. The virtualization trend will also help in this sense (see the discussion on utility computing in this chapter’s summary). You can already read papers that talk about self-healing web services (G. Kouadri Mostéfaoui, 2006) or see some projects that try to look into this problem (e.g. WS-Diamond - DIAgnosability, Monitoring and Diagnosis). Nevertheless, all of them are still in the research phase, and if you want something now you will probably need to implement it yourself. In my experience, it won’t take you too much time to have a basic watchdog up and running, but it will take some time until you have it predicting failures and acting as an advance warning system.

1.4 Quality Attribute Scenarios

The Blogjecting Watchdog is an interesting pattern (and not just because of its odd name) as it can really help on the way to autonomous computing. The effect of this proactive approach is to increase the overall reliability of the service. A service which is self-healing can overcome (at least) minor problems, which results in better availability overall. Additionally, the monitoring aspects of the Blogjecting Watchdog also help enhance availability by notifying administrators that something is amiss (which enables them to fix it).

Quality Attribute (level 1) | Quality Attribute (level 2) | Sample Scenario
Availability | Failure detection | Upon a failure or degraded performance, the system will alert the system admin (via SMS) within 3 minutes.
Reliability | Increased autonomy | During normal operations, the system will continuously clear all its temporary resources (e.g. files).

Table 1.1 Blogjecting Watchdog pattern quality attribute scenarios. These are the architectural scenarios that can make us think about using the Blogjecting Watchdog pattern.

Once we introduce a monitor and start to collect data, we can start to find new uses for that data. For example, we can use the information on incoming requests to try to locate attacks on the service. Saved monitoring data can also be used to analyze the service’s behavior over time and predict failures, and thus increase its maintainability.



 
Tags: Q&A | SOA | SOA Patterns | Software Architecture

The previous installment provided some context as to why I want to implement this pattern. This installment will look at some of the implementation options.

As I noted before, WCF provides quite a lot of extension points on the route a message passes from arriving at the service to the point where WCF calls the actual method in the service instance. Several of those extension points are possible candidates for the Service Firewall, for instance:

  • Contract Filter - The contract filter is responsible for routing messages to the appropriate contract. It needs to be a subclass of MessageFilter. The contract filter looks like a good option since it intercepts the call rather early, which means it would probably be the fastest option. Also, its name (filter...) implies it is a good option.
  • Message Inspector - The message inspector is responsible for looking at or modifying messages when they enter a service, and it looks like a natural candidate for the job. There are two kinds of message inspectors: those that look at messages on the client side (implement the IClientMessageInspector interface) and those that look at messages on the server side (implement the IDispatchMessageInspector interface). It seems that the latter is the type of inspector we need here.
  • Service Authorization Manager - Responsible for evaluating policies, claims etc. of the client to make sure that a call is valid from the security perspective. This looks like it would be a good class to use for a real (security) service firewall. At first glance, though, it doesn't seem to be a good fit for what we need here.

When I need to choose between several technical options that seem to be similar I usually do a POC (proof of concept): a piece of throwaway code to get a feel for the different options and better understand their strengths and weaknesses (in the context of the solution I seek).

What I did was to take a class I prepared for some of the integration tests of the EventBroker and build a few extensions that interact with it. Here is some of the setup code of the environment:

testServer = new Tester();
service1 = new ServiceHost(testServer, new Uri(string.Format("http://localhost:{0}", TestServerPort)));
var binding = new WebHttpBinding
{
    ReaderQuotas = { MaxArrayLength = 600000 },
    MaxReceivedMessageSize = 800000,
    MaxBufferSize = 800000
};

var ep = service1.AddServiceEndpoint(typeof(TestingContract), binding, string.Format("http://localhost:{0}/S1", TestServerPort));
ep.Behaviors.Add(new WebHttpBehavior());
// POC hook 1: wires the ContractFilter and the MessageInspector into the endpoint.
ep.Behaviors.Add(new InspectorBehavior());
// POC hook 2: service-level authorization manager.
service1.Authorization.ServiceAuthorizationManager = new TestAuthorizer();
var cp = service1.AddServiceEndpoint(typeof(ImContract), binding, string.Format("http://localhost:{0}/Control", TestServerPort));
cp.Behaviors.Add(new WebHttpBehavior());

The two lines that add the InspectorBehavior and set the ServiceAuthorizationManager (marked in red in the original post) are the ones responsible for injecting the POCs: the InspectorBehavior is responsible for inserting the ContractFilter and the MessageInspector, and the TestAuthorizer is the Authorization Manager test implementation.

We also need some code to raise an event:

public void SendMessage()
 {
     var evnt = new TestingEvent { sagaId = Guid.NewGuid() };

     moqRA.Expect(x => x.GetChannel<TestingContract>(evnt.sagaId, true)).Returns(channel1);
     moqRA.Expect(x => x.GetChannel<TestingContract2>(evnt.sagaId, true)).Returns(channel3);

     eb.BeginNewSagaEvent(evnt.sagaId, evnt);
     eb.CloseSaga(evnt.sagaId);

 }

And now we can look at the different options. The InspectorBehavior is just a helper class to wire the filter and/or inspector to the endpoint. (The Authorization Manager is set up at the service level, i.e. for all endpoints.)

public class InspectorBehavior : IEndpointBehavior
{
    public void AddBindingParameters(ServiceEndpoint endpoint, BindingParameterCollection bindingParameters)
    {
    }

    // Not supported: this behavior is only meant to be attached to service endpoints.
    public void ApplyClientBehavior(ServiceEndpoint endpoint, ClientRuntime clientRuntime)
    {
        throw new NotImplementedException();
    }

    // Adds the test inspector and wraps the existing contract filter with the test filter.
    public void ApplyDispatchBehavior(ServiceEndpoint endpoint, EndpointDispatcher endpointDispatcher)
    {
        var inspector = new TestInspector();
        endpointDispatcher.DispatchRuntime.MessageInspectors.Add(inspector);
        endpointDispatcher.ContractFilter = new TestFilter(endpointDispatcher.ContractFilter);
    }

    public void Validate(ServiceEndpoint endpoint)
    {
    }
}

The first thing I tried was the "ContractFilter". It is actually very simple to use. You inherit from MessageFilter and there are two "Match" methods you need to override: one that accepts a buffer and one that accepts a (WCF) Message. WCF calls the Match method which accepts a Message.

WCF's Message class is interesting in the sense that it has a one-time touch feature, i.e. only one piece of code can read/copy it, and the next piece of code which tries to do the same will fail.

So in the Match method you can do something like the following:

public override bool Match(Message message)
 {
     var buffer = message.CreateBufferedCopy(Int32.MaxValue);
     message = buffer.CreateMessage();
     var r = buffer.CreateMessage().GetReaderAtBodyContents();
     .
     .
     .
 }

Which basically means: get a buffer of the message, create one copy to preserve, then get another copy for internal use and work with that to parse and verify the actual message. Unfortunately, this doesn't really work. The message parameter is not passed as ref, so the fresh copy assigned to it never reaches WCF, while the original message is already consumed by CreateBufferedCopy on the first line of the method, and that's it. Note that you can access the header part of the message without a problem; however, that's not a good fit for what I am trying to do.

The next thing I looked at was the MessageInspector. Again, implementing it is rather simple: you just need to implement the IDispatchMessageInspector interface. This interface has two methods, BeforeSendReply and AfterReceiveRequest. We'll look at the AfterReceiveRequest method. Again we try the message copy trick:
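The by-value problem is plain C# parameter semantics and can be reproduced without WCF. The sketch below is an illustration under stated assumptions: OneShotMessage is a hypothetical stand-in for a read-once message (it is not the WCF Message class), and PeekByValue/PeekByRef are made-up helpers that mimic an interceptor receiving the message by value and by ref respectively.

```csharp
using System;

// Hypothetical stand-in for a message whose body can be consumed only once.
public sealed class OneShotMessage
{
    private readonly string body;
    private bool consumed;

    public OneShotMessage(string body)
    {
        this.body = body;
    }

    public string ReadBody()
    {
        if (consumed) throw new InvalidOperationException("Message already consumed.");
        consumed = true;
        return body;
    }
}

public static class Interceptors
{
    // By value: the fresh copy is assigned to the local parameter only,
    // so the caller is left holding the already consumed instance.
    public static string PeekByValue(OneShotMessage message)
    {
        var body = message.ReadBody();
        message = new OneShotMessage(body); // never visible to the caller
        return body;
    }

    // By ref: the caller's variable is replaced with an unread copy,
    // so the rest of the pipeline can still read the message.
    public static string PeekByRef(ref OneShotMessage message)
    {
        var body = message.ReadBody();
        message = new OneShotMessage(body);
        return body;
    }
}
```

After PeekByValue the caller's message throws on the next read, whereas after PeekByRef it can be read once more, which is the difference between an extension point that hands over the message by value and one that hands it over by ref.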

public object AfterReceiveRequest(ref Message request, IClientChannel channel, InstanceContext instanceContext)
 {
     var buffer = request.CreateBufferedCopy(Int32.MaxValue);
     request = buffer.CreateMessage();
     var temp = buffer.CreateMessage().GetReaderAtBodyContents();
     .
     .
     .
 }

This time it works since we get the request parameter as ref. At first it seemed to me that while you can inspect and alter the message as your heart wishes there is no way to say that the message is bad. One option is to alter the message to a faulty message and let the application handle it - but that means too much coupling between infrastructure and application. Another, better, option is to throw an exception.

So using the MessageInspector is a usable option. It is very good if you want to alter the incoming message, but throwing an exception when the message is bad is not very clean.

Which brings us to our third option, the Authorization Manager, which, surprisingly, turned out to be the best option:

public class TestAuthorizer : ServiceAuthorizationManager
{
    public override bool CheckAccess(OperationContext operationContext, ref Message message)
    {
        var authorized = base.CheckAccess(operationContext, ref message);
        // Buffer the message, hand a fresh copy back to WCF and keep one for inspection.
        var buffer = message.CreateBufferedCopy(Int32.MaxValue);
        message = buffer.CreateMessage();
        var testMessage = buffer.CreateMessage();
        .
        .
        .
        return authorized;
    }
}

Like the message inspector, it receives the message as ref, and like the filter, it allows a single yes/no answer to decide if a message should continue or be discarded. Additionally, it notifies the client that the message was rejected if that is what you choose to do (with the WebHttpBinding I used, that means a 400 Bad Request return code).

OK, so we've seen some of the options for implementing the Service Firewall and briefly went over their different behaviors. The next part in this series will take a look at some of the actual implementation I did.



 
Tags: .NET | SOA | SOA Patterns | WCF

One of the SOA patterns I already described is the Service Firewall. The idea behind the service firewall is to have an intermediary between the actual service and its callers that inspects incoming and outgoing messages at the application level.


Anyway, while I documented the pattern as a security one, I am actually going to implement it for another purpose: a saga filter.
In our implementation of EventBroker I made the design decision to have services expose regular WCF contracts, i.e. services can communicate with each other directly and not just via eventing. This design decision is there to allow both interaction with non-WCF services and flexibility for multiple message exchange patterns (where events are not the best choice).
Another design decision is that we have two types of services: Servers and Channels. Servers handle multiple sessions and are (relatively) heavy to write. Channels, on the other hand, are lightweight services that are stateful and dedicated to a specific session. Naturally there are a lot of instances of channels to allow supporting multiple sessions (and there are infrastructure bits to handle allocation, propagate liveliness etc., but that's another story). Channels have several benefits, like increasing the system's capability to cope with failure (if a channel is down only the session it supported fails). Another benefit of Channels is a simple coding model. The Channel is dedicated to a session (typically a saga) and thus it doesn't have to handle all the routing of messages to sagas etc. that Servers have to cope with. This is where the Service Firewall comes into play.
In order to keep the channels' code simple, "someone" has to make sure the channel doesn't get messages that are not related to the saga it is part of. Otherwise the Channel will have to know about its current active saga and filter messages by itself, which kind of misses the point.
Making sure other services will not send messages while not in a saga etc. will only take us so far (you know, latencies and stuff). A service firewall will let us intercept the messages before they reach the service and only allow the messages related to an active saga to pass through (while maintaining the benefits of direct contracts).
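The filtering rule itself is small. As a minimal sketch of the idea only (SagaMessage, SagaFilter, BeginSaga, EndSaga and Allows are hypothetical names, and the wiring into a WCF extension point is deliberately left out), a channel-side filter could look like this:

```csharp
using System;

// Hypothetical message shape: every message carries the id of its saga.
public sealed class SagaMessage
{
    public SagaMessage(Guid sagaId, string payload)
    {
        SagaId = sagaId;
        Payload = payload;
    }

    public Guid SagaId { get; }
    public string Payload { get; }
}

// Tracks the saga the channel currently serves and rejects everything else,
// so the channel's own code never sees unrelated or late messages.
public sealed class SagaFilter
{
    private Guid? activeSaga;

    public void BeginSaga(Guid sagaId) => activeSaga = sagaId;

    public void EndSaga() => activeSaga = null;

    public bool Allows(SagaMessage message) =>
        activeSaga.HasValue && message.SagaId == activeSaga.Value;
}
```

The channel stays unaware of the check: the infrastructure calls Allows before dispatching and drops (or rejects) anything that does not belong to the active saga, including messages that arrive after the saga ended.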

WCF has a rich extensibility model (see figure from MSDN below). This series will show how you can use some of these extension points to implement a service firewall and achieve the goal described above. I hope you'll find it interesting.




 
Tags: .NET | SOA | SOA Patterns | WCF

January 16, 2009
@ 07:08 PM
In a post called "Rhino Service Bus: Saga and State" Ayende said
"In a messaging system, a saga orchestrate a set of messages. The main benefit of using a saga is that it allows us to manage the interaction in a stateful manner (easy to think and reason about) while actually working in a distributed and asynchronous environment."

I really don't agree with this definition of a saga. The Saga provides a context for a set of messages to allow managing an effort for distributed consensus. It does not "orchestrate" messages (that's what workflows are for). You can read more on Sagas in an excerpt from my SOA patterns book: Saga pattern.

Here's the comment I left on Ayende's site:
"What you describe is nice except it isn't a Saga it is more of a workflow. The notion of Saga which is originated from databases relates to the overall coordination of state between the different services - or the context for the whole business process.
In the coffee shop example you use that would be the whole "transaction" from the point the customer orders her coffee until she either gets it or the transaction is canceled (e.g. it took too long and the customer leaves or the coffee shop is out of milk etc.)
Unlike database (or distributed) transaction when/if a saga is aborted the different component of the system might not return to their previous state e.g. if the customer complains that the coffee is not good and gets her money back. the milk is not separated back from the coffee beans and returned to the bottle - rather the coffee cup goes to the trash.

Workflow is one strategy a service can take to handle the long running interaction within a saga. In your case the BristaSaga class (which I think should be BristaWF) orchestrate the internal state transitions depending on the different messages that arrive within the saga. In your case you have a hardcoded workflow - but it is also possible to use a workflow engine for the job.

By the way, in the above example you could also use a statemachine instead of a WF to manage the process "
In another comment Kristofer asked me:

"Arnon: I'm not 100% sure of how you distinguish a Saga from a Workflow, could you elaborate some more on this?

A Saga involves a number of underlying workflows?
A Saga might as well contain a number of underlying Sagas?

Isn't it just a question of at what level it is initiated?

If a Saga should represent the whole transaction / business process, then who should handle it? Couldn't it be implemented as a Saga, exactly as Ayende describes it, by the initiating service (in this case the ordering)?, which then also is given the responsibility to handle restoring the total state etc of underlying/involved services if the transaction is aborted? The possibility to restore state does of course depend on what the specific Saga is handling, some processes might not be able to "rollback" completely, it's rather a question of rolling back all involved parties to a known/acceptable state."

The answer is that, again, a Saga is similar to a transaction in the sense that it provides a shared context for an attempt to reach a distributed consensus. Unlike a transaction, which ensures ACID properties, a Saga does not.
The concept of disseminating that shared context, having each party (service) affect whether the saga should be aborted or successful etc., is what I call a saga.
When a saga is aborted the only thing the coordinator can do is pass the status to the participants. Each of the services is responsible for doing its best effort to handle the abort (either by rolling back, compensating or whatever).
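To make the division of responsibility concrete, here is a minimal sketch under stated assumptions. It is not the EventBroker implementation: SagaStatus, ISagaParticipant, SagaCoordinator and BaristaParticipant are hypothetical names, and notification is a direct in-process call instead of an event. The coordinator only passes the final status along; what "handling the abort" means is entirely up to each participant.

```csharp
using System;
using System.Collections.Generic;

public enum SagaStatus { Active, Committed, Aborted }

// Each participant decides for itself how to react to the saga's outcome.
public interface ISagaParticipant
{
    void OnSagaEnded(Guid sagaId, SagaStatus status);
}

// The coordinator holds the shared context and only relays the final status.
public sealed class SagaCoordinator
{
    private readonly List<ISagaParticipant> participants = new List<ISagaParticipant>();

    public SagaCoordinator(Guid sagaId)
    {
        SagaId = sagaId;
        Status = SagaStatus.Active;
    }

    public Guid SagaId { get; }
    public SagaStatus Status { get; private set; }

    public void Join(ISagaParticipant participant) => participants.Add(participant);

    public void Commit() => End(SagaStatus.Committed);

    public void Abort() => End(SagaStatus.Aborted);

    private void End(SagaStatus status)
    {
        if (Status != SagaStatus.Active) return; // a saga ends only once
        Status = status;
        foreach (var participant in participants) participant.OnSagaEnded(SagaId, status);
    }
}

// Example participant: it cannot "un-make" the coffee, so it compensates instead.
public sealed class BaristaParticipant : ISagaParticipant
{
    public bool CupDiscarded { get; private set; }

    public void OnSagaEnded(Guid sagaId, SagaStatus status)
    {
        if (status == SagaStatus.Aborted) CupDiscarded = true; // compensation, not rollback
    }
}
```

Note that nothing here orchestrates the steps inside a participant; a workflow or a state machine could live behind OnSagaEnded without the coordinator knowing about it.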

Workflow is another thing altogether. It keeps a context between calls and means externalizing the decisions on the logic flow from the business logic (usually with a workflow engine). You can use workflows within a service (a pattern I call workflodize) or you can use them externally (a pattern I call orchestrated choreography, e.g. BPM).
You can use either form of workflow to support the implementation of a saga, but you can also implement sagas without workflows.
In our system we use an "event broker" (see www.rgoarchitects.com/.../EventingInWCF.aspx). The event broker infrastructure disseminates the saga context when you raise a saga event. A service that initialized a saga (by sending the first event) can choose to close the saga (commit) or abort it. We don't currently have any workflow-driven services (but some of them use a state machine as an alternative).

(I think the term Saga does not describe Ayende's class, since the "barista" is just one of the participants in the saga; there are other participants.)



 
Tags: SOA | SOA Patterns | Software Architecture

The year is almost done, so I thought it would be a good time for a short retrospective on what I blogged here. The 13 posts below are the ones I liked best this year. It turns out these posts touch on a lot of different subjects: requirements, software management, agile development, architecture, SOA and programming.



 
Tags: Agile | Project Management | SOA | SOA Patterns | Software Architecture | TDD

December 16, 2008
@ 10:36 AM
An initial draft for the Knot anti-pattern. As usual, any comments are welcome. You can also download it in PDF form.

Everything starts oh so well. Embarking on a new SOA initiative, the whole team feels as if it is pure green-field development. We venture on - the first service is designed. Hey look, it's got all these bells and whistles; we are even using XML so it must be good. Then we design the second service; it turns out the first service has to talk to the second – and vice versa. Then comes a third; it has to talk to the other two. The fourth service only talks to a couple of the previous ones. The twelfth talks to nine of the others and the fourteenth has to contact them all – yep, our services are tangling up together into an inflexible, rigid knot.

 

The above might sound to you like a wacky and improbable scenario - why would anyone in their right mind do something like that? Let’s take another look, with a concrete example this time, and see how the road to hell is paved with good intentions. In Figure 10.1 below we see a vanilla ordering scenario. An ordering service sends the order details to a stock service, where the items are identified in the stock, marked for delivery and then sent to a delivery service which talks to external shipping companies such as DHL, FedEx etc.




Figure 10.1 a vanilla ordering scenario. An ordering service sends the order to a stock service, which provisions the goods to a delivery service, which is responsible for sending the products to the customer

 

If we think about it more, we’ll see that when an item is missing from the stock we probably have to talk to external suppliers, order the missing items and wait for their arrival, so the whole process is not immediate. Furthermore, since the process takes time, it seems reasonable to allow stopping it if an order is cancelled. It seems we have two options (see Figure 10.2): either the ordering service asks the two other services to cancel processing related to the order, or the two services call the ordering service before they decide what to do next. Naturally the system wouldn’t stop here; we would want to introduce more services and more connections, e.g. an Accounts Payable service that interacts with the external suppliers, the stock service and the delivery service (since we also need to pay shipping companies) etc.



Figure 10.2 a little more realistic version of the Ordering scenario from figure 10.1. Now we also need to handle missing items in the stock, cancelled orders and paying external suppliers. In this scenario the services get to be more coupled. For instance the Ordering service is now aware of the delivery service and not just the stock service.

 

With each new service we draw more lines going from service to service, and with each new service we update the services’ business logic with the new business rules as well as knowledge of the other services’ contracts.

 

1.1.1 Consequences

Well, so we get more lines going from service to service; that's normal, isn’t it? After all, if the services don’t talk to each other they won’t be very useful. Isn’t that the whole point of SOA?

 

Well, yes – and no. Yes, it is normal for services to connect to each other. After all, creating a system in an SOA is connecting services together. As for the “no” part, the problem lies with the way we develop these integrations: if you are not careful, it is easy to get all the integration lines into a big, ugly mess – a knot.

 

A knot is an anti-pattern where the services are tightly coupled by hardcoded point-to-point integration and context-specific interfaces.

 

For instance, what happens when we want to reuse the ordering service mentioned above? No problem, we just call it from the new context. Alas, the knot prevents us from reusing it without hauling in the rest of the baggage - all the other services we defined above (stock, delivery etc.). If the new context is not identical in its ordering processes and does not match what we already have, we can’t use it. Or rather, we can’t use it without adding one-off interfaces where we add specific messages for the new context and all sorts of “if” statements to distinguish between the old and the new behavior. Another option is to make this distinction in the original messages, which is either not possible or forces us to make sure the other services are still functioning. In any event, it is a big mess.

 

Let’s recap. We moved to SOA to get flexibility, increase reuse/use within our systems and prevent spaghetti point-to-point integration. What we see here is inflexible and hard to maintain; basically, it seems we are back at square one, and we invested gazillions of dollars to get there.

 

 

1.1.2 Causes

How did that happen?  How can a wonderful, open standards, distributed, flexible SOA deteriorate to an unmanageable knot?

 

It is tempting to dismiss the knot as the result of a lack of adequate planning. If only we had planned everything in advance, we wouldn’t be in this mess now. Well, besides the point that trying to plan everything ahead of time is an anti-pattern in itself (an organizational anti-pattern, which isn’t in the scope of this book), there’s still a good chance you’d end up with a Knot anyway, since the problems are inherent in the way businesses work.

 

If we take a look back at the Integration Spaghetti scenario discussed in chapter 1 (depicted as figure 10.3 below), we can see that the phenomenon was there as well: when our business processes evolve, we find we need to interact with information from other parts of the system. The flow of a business process expands to supply that needed information or service, and thus the Knot grows.



Figure 10.3 the Knot anti-pattern is similar in both effect and origin to the spaghetti integration in non-SOA environments

 

From the technical perspective, we have two forces working here. One is the granularity of the services. On the one hand, services are sized so that a business process requires several of them to work together. On the other hand, they aren’t small enough to be an end-node in the process (i.e. only other services would call the service and it would just return a result). Note that this isn’t a bad thing in itself. After all, if each process was implemented by a single service we’d have silos not unlike the ones we try to escape by using SOA, and if we made the services too small we’d fall into another trap (see the Nanoservices anti-pattern later in this chapter). The bottom line is that while granularity is a force that drives us toward the Knot, there’s not a lot we can do about it without getting ourselves into worse problems.

 

The second, stronger, force that pushes a system into a Knot is the business process itself. Since, as we mentioned above, the process flows through the services, the services need to be aware of the flow and then call other services to complete it. In order for a service to call another service it has to know about its contract and its endpoint. When another business flow goes through that service, we add not only the new contracts and endpoints but also the contextual knowledge of which other services to call depending on the process. And that, my friends, is exactly the thing that gets us into trouble – the services tie themselves to each other more and more as we implement more business processes and more flows.

 

Hey, you say, but SOA should have solved all that, surely there is something we can do about it – or is there?

 

1.1.3 Refactoring

 

The previous section explains that most of the problem is caused by having the services’ code determine where to go next and what to do with the results of the services’ processing. If there was only a way to somehow pry these decisions away from the services’ greedy hands… As you’ve probably guessed, there is such a way. In fact there are several, and this book lists three of them: the Workflodize pattern (Chapter 2), Orchestrated Choreography (Chapter 7) and Inversion of Communications (Chapter 5). Let’s take a brief look at each of these patterns and see how they help.

 

The Workflodize pattern suggests adding a workflow engine inside the service to handle both Sagas (i.e. long running operations, see chapter 5) and added flexibility. The “added flexibility” is the card we want to play here. When we express the connections as steps in the workflow, they are not part of our services’ business logic. They are also easier to change in a configuration-like manner. Both of these points are big plusses.

Still, a better way to solve the service-to-service integration problem is to use an external orchestration engine. The idea of the Orchestrated Choreography pattern is to enable Business Process Management, i.e. a way for the organization to control and verify that its processes are carried out as intended (you don’t strictly need an orchestration engine for that, but it helps…). In the context of solving or avoiding the Knot anti-pattern, Orchestrated Choreography is better than Workflodize since it centralizes and externalizes all the interactions between services and thus effectively removes all the problematic code from the services themselves. Note that there’s a fine line between externalizing flow and externalizing the logic itself (see the discussion in the Orchestrated Choreography pattern, in chapter 7).

 

The third pattern we can use to refactor the Knot is Inversion of Communications. Inversion of Communications means modeling the interactions between services as events rather than calls. It is, in my opinion, the strongest countermeasure to the knot. The two patterns mentioned above bring a lot of flexibility in routing the messages between the services. The Inversion of Communications pattern also helps the message designers remove specific contexts from the messages, since when the service’s status is raised as an event it isn’t addressed to any other service in particular. Note that using Inversion of Communications doesn’t rule out using either of the two other patterns mentioned above: once the event is raised we still need to route it to other services, and a workflow engine is a good option for that. Another implementation option is to use an infrastructure that supports publish/subscribe (see the pattern’s description in chapter 5 for more details).

 

Let’s go back to the ordering scenario we mentioned above. As I mentioned, the services grow with needless knowledge of specific business processes. So, for instance, the ordering service had to know about both the stock service and the delivery one. Refactored with the Inversion of Communications pattern, the same Ordering service doesn’t have to know about any of the other services. In Figure 10.4 we can now see that the Ordering service sends two business events (new order, cancelled order) and the routing of these messages is no longer the responsibility of the service.



Figure 10.4 the Ordering service using the Inversion of Communications pattern. Now the service doesn’t know/depend on other services directly. It is only aware of the business events of new order and cancelled order, which are relevant to the business function that the service handles
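
To make the refactoring a little more concrete, here is a minimal in-process C# sketch of the idea. The `InProcessEventBus`, `NewOrder` and `OrderCancelled` types are assumptions made up for this illustration (a real system would use a publish/subscribe infrastructure or a workflow engine for routing); what matters is that `OrderingService` references only the bus and its own business events.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event types and bus; assumptions for illustration only.
public sealed class NewOrder { public string OrderId { get; set; } }
public sealed class OrderCancelled { public string OrderId { get; set; } }

public sealed class InProcessEventBus
{
    private readonly Dictionary<Type, List<Action<object>>> _handlers =
        new Dictionary<Type, List<Action<object>>>();

    public void Subscribe<TEvent>(Action<TEvent> handler)
    {
        if (!_handlers.TryGetValue(typeof(TEvent), out var list))
        {
            list = new List<Action<object>>();
            _handlers[typeof(TEvent)] = list;
        }
        list.Add(e => handler((TEvent)e));
    }

    // Returns the number of subscribers that received the event.
    public int Publish<TEvent>(TEvent evnt)
    {
        if (!_handlers.TryGetValue(typeof(TEvent), out var list)) return 0;
        foreach (var handler in list) handler(evnt);
        return list.Count;
    }
}

// The ordering service knows only the bus and its own business events;
// it has no reference to the stock or delivery services.
public sealed class OrderingService
{
    private readonly InProcessEventBus _bus;

    public OrderingService(InProcessEventBus bus) { _bus = bus; }

    public void PlaceOrder(string orderId) =>
        _bus.Publish(new NewOrder { OrderId = orderId });

    public void CancelOrder(string orderId) =>
        _bus.Publish(new OrderCancelled { OrderId = orderId });
}
```

A stock or delivery service would register itself with something like `bus.Subscribe<NewOrder>(e => ...)` in wiring code that lives outside the ordering service, so adding a new subscriber does not touch the ordering logic at all.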

 

Refactorings aside, one question we still need to think about is whether there are any circumstances where having a Knot is acceptable.

 

1.1.4 Known Exceptions

 

In a sense the Knot is a distributed version of an anti-pattern described by Brian Foote and Joseph Yoder as “Big Ball of Mud” – spaghetti code where different parts of the system are tied to each other in unmanageable ways. The reason for mentioning the connection is that the argument for considering “Big Ball of Mud” a pattern rather than an anti-pattern also applies here:

 

“[when] you need to deliver quality software on time on budget… focus first on features and functionality, then focus on architecture and performance”

 

Starting out on a large project, such as moving an enterprise to SOA, is difficult enough as it is. You can’t figure everything out in advance; you need to deliver something – so, as Nike says, “just do it”. Get something done. You do need to be prepared to let go and redesign further down the road. In the current system I’m working on – a visual recognition/search engine for mobile – we went with a “knot” approach for the first release. The simplicity of the implementation (less investment in infrastructure, ad hoc integration etc.) enabled us to deliver a first working version in less than 6 months. These 6 months also helped us understand the domain we are operating in much better and, more importantly, get to market with the features the business needed on the schedule the business wanted. We spent the next 6 months rewriting the system in a proper way, including applying the Inversion of Communications pattern mentioned above.

 

To sum this up, coding the integration code into services is likely to end as a Knot. It is acceptable to go down this path for a prototype or first version i.e. to show quick results. However you do need to plan/make the time to refactor the solution so you will not get stuck down the road.





 
Tags: SOA | SOA Patterns | Software Architecture

December 8, 2008
@ 10:56 PM
I am (finally) writing some new stuff for my SOA book - working on a few Anti-patterns
  • The Knot - The distributed version of "big ball of mud" basically point to point integration
  • NanoServices - designing/building fine grained services (methods != services)
  • 3-tiered SOA - dressing up 3-tier architecture in SOA clothing (e.g. database as a service)
  • Whitebox Services - exposing internal structure - comes in two flavors: exposing technology and allowing access that bypasses contracts
  • Transactional Integration - inter-service transactions (use Sagas instead)
  • RESToid - combining SOA and REST without understanding the full implications of either
I am going to publish one of them (probably the "knot") in a few days but I thought I might be able to get a little feedback before that. I chose to describe anti-patterns in the following format:

  •  Context - Presenting the problem (probably through an example)
  •  Consequences - Explaining what the problem is. i.e. what happens when the anti-pattern is prevalent
  •  Causes - discussion on the forces that lead to the anti-pattern
  •  Refactoring - The patterns (and/or other tips) that can be used to fix the design
  •  Known Exceptions - Are there any contexts where using the anti-pattern is acceptable
I'd be happy to hear any comment you have on the anti-patterns listed above as well as comments on the structure for describing them

Thanks
Arnon


 
Tags: REST | SOA | SOA Patterns | Software Architecture

October 18, 2008
@ 10:58 PM
As I mentioned in a couple of previous posts (like "Using REST along with other architectural ), I've been spending the last few weeks writing an Event system over WCF (probably also explains posts on  WCF gotchas like this;) ). Being a communication infrastructure it is still a long way from being completed, but it seems to be stabilizing and I think it turned out nicely so I thought I'd share a few details.

Let's start with the simple part - the usage.
The eventing is built on the idea of a bus (i.e. no centralized components), and the resources/services that want to use eventing have to use a library which I call EventBroker. There are two modes for using the EventBroker. One is "regular" events, which are contextless. This means that consecutive events can reach different services, and there is no context that flows from event to event:

bool raisedEvent = eb.RaiseEvent<SampleEvent>(new SampleEvent());
The second type of events are Sagas, which represent long running interactions. Sagas do have a "best effort" guarantee to reach the same recipients over consecutive calls. You can also End sagas (successful termination), Force End Saga (successful termination by a service that didn't initiate the saga) and Abort Saga (unsuccessful termination). Here is how you raise a saga event:
var evnt = new SampleEvent { data = somevalue};
var SagaId = Guid.NewGuid();
eb.RaiseSagaEvent<SampleEvent>(SagaId, evnt);
If you use the same Saga Id, the events are handled as part of the same saga; if you use a Saga Id that wasn't previously defined, it will initialize a new saga.
The event broker translates events to the relevant contract and dispatches them to the different subscribers. Which brings us to the next part, which I guess is also a little more interesting: how subscriptions are defined.

The first thing to do is to define the event itself.
public class SampleClassEvent : ImEvent
{
    public string DataMember1 { set; get; }
    public int DataMember2 { set; get; }
}
There aren't any real constraints on the event, except that it has to "implement" the ImEvent interface, which is really an empty interface that marks the event as one for the event broker.
Then you have to define an interface for handling the event. The event broker builds on the idea of convention rather than configuration (an idea popularized by the Rails framework), so it is easy to generate the interface (something I do with a ReSharper template).
[ServiceContract]
public interface IHandleSampleClass
{
    [OperationContract]
    int SampleClass(SampleClassEvent eventOccured);
}
The convention is that the interface will have an IHandle prefix followed by the name of the event. It will hold a single operation named like the event (without the Event suffix) and will receive a single parameter, which is the event data. Currently events do return a value (int), but I am thinking about changing it to void and having everything marked as OneWay for added performance.
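
The naming convention above is mechanical enough to check with reflection. The following is only a sketch of how such a check could look; `HandlerConvention.Matches` is a hypothetical helper, not part of the event broker, and the WCF attributes are left out so the snippet compiles without a reference to System.ServiceModel.

```csharp
using System;
using System.Reflection;

// Stand-ins so the sketch compiles on its own; ImEvent mirrors the marker interface from the post.
public interface ImEvent { }
public class SampleClassEvent : ImEvent { }
public interface IHandleSampleClass { int SampleClass(SampleClassEvent eventOccured); }

public static class HandlerConvention
{
    // Hypothetical helper: true when handlerInterface follows the
    // IHandle<EventNameWithoutSuffix> convention for eventType.
    public static bool Matches(Type handlerInterface, Type eventType)
    {
        const string suffix = "Event";
        if (!typeof(ImEvent).IsAssignableFrom(eventType)) return false;
        if (!eventType.Name.EndsWith(suffix, StringComparison.Ordinal)) return false;

        // "SampleClassEvent" -> "SampleClass"
        string baseName = eventType.Name.Substring(0, eventType.Name.Length - suffix.Length);
        if (!handlerInterface.IsInterface || handlerInterface.Name != "IHandle" + baseName) return false;

        // A single operation, named like the event without the suffix...
        MethodInfo[] methods = handlerInterface.GetMethods();
        if (methods.Length != 1 || methods[0].Name != baseName) return false;

        // ...taking a single parameter, which is the event itself.
        ParameterInfo[] parameters = methods[0].GetParameters();
        return parameters.Length == 1 && parameters[0].ParameterType == eventType;
    }
}
```

A check like this could run at startup or in a unit test, so a mistyped interface name shows up early rather than as an event that silently never arrives.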

Now, when we create a service which needs to handle events, it will do that by specifying which events it handles. E.g.
[ServiceContract]
public interface ImSampelResource : ImContract, IHandleSampleClass, IHandleSomeOtherThing
{
}
So each contract declares all its subscriptions (by a list of IHandleXXX). It should also include the ImContract interface, which holds all the service operations used by the event broker (e.g. ending sagas etc.).
Services that want to raise events should inherit from a ControlEdge class (base class Edge component that delegates control events to the event broker)

There's still the question of how the event broker knows where to find other services. There are several ways this can be done (e.g. a service repository), but since we have blogjecting watchdogs in place anyway, we use them to propagate liveliness (and location) of services.

This sums up this post. It is basically just a little context for several planned posts where I hope to talk about some of the challenges, alternatives and design decisions that led me to the current design. Meanwhile, I'd also be happy to hear any comments, ideas or reactions you may have
 
Tags: .NET | Design | OO | SOA | SOA Patterns | Software Architecture | WCF | xsights

Retrospectives: every "agile" team does retrospectives. What are retrospectives anyway?

A retrospective is a meeting where the team inspects the past in order to adapt and improve the future.

Agile or not, our team does a retrospective at the end of each iteration (every two weeks in our case). We try to look at what worked, what didn't, how we are meeting our goals, how the product is going etc. These meetings provide a lot of value in steering us in the right direction.
Ongoing retrospectives that look at the near past allow for suppleness and adaptation to change, and they are very powerful at that. However, it is sometimes worthwhile to reflect over longer periods of time.

One area where longer perspective is important is the architecture of the project. Evolving an architecture you run the risk of accepting wrong decisions - mostly because architectural decisions have long term implications, while YAGNI, time constraints and life in general drive you toward short term gains.

Again, taking an example from my current project, working towards the first release, we took a few major decisions during the development e.g.
  • federated resource management - Taking into consideration the fallacies of distributed computing we decided that we'd have local resource managers that will take care of resource utilization and allocation. The resource managers will have a hierarchy where they'd communicate with each other to gain the "bigger picture"
  • Introduce Parallel Pipelines - handle image understanding by dividing the work between specialized components.
  • RESTful control channel - to use a "lingua franca" between all component types so that we can easily integrate across platforms and languages
  • local failure handling - resources and components handle failure by themselves
  • Communication technology (WCF in our case) is isolated from the business logic by an Edge Component
  • etc.
Once we finished delivering the first release, we took a few "days off" to consider what we'd done thus far. We updated our quality attribute list based on what we had learned working with the system and looking at some customer scenarios, studied the things we liked/didn't like in the design and architecture of the working system, and revised a few of our decisions. For instance:
  • We found that, rushing to a working system, we introduced some excess coupling to a specific technological solution (for video rendering). We initiated a few proofs of concept and found out how to both isolate the technology from the rest of the system and allow more technology choices.
  • We found that some of the data flows were not as clean as we thought they'd be - adding new features caused more resource interactions than we expected when we partitioned the resources. We redefined some of the resource roles to get less message clutter (and higher cohesion)
  • The federated resource management works well, but introduces needless latency in session initiation. We have now opted to introduce "Active services", which are more autonomous.
  • Add a blogjecting Watchdog in addition to local failure handling, both to increase the chances of failure identification and recovery and to get a better picture in a centralized Service Monitor.
  • The RESTful control channel worked well and will continue in later releases
  • Some of the scale issues will be handled by introducing "Virtual Endpoints", while some would continue to use autonomous endpoint creation and liveliness dissemination (hopefully learning from the mistakes of others)
  • etc.
The result of these and the other decisions we've made is a rework plan that will (hopefully anyway) make our overall solution better.
What we see is that we evolved our architecture as we went forward. While all the decisions we made seemed right at the time we took them, only through reviewing them from a wider perspective (an architecture retrospective) did we identify the decisions that we need to change and the ones that we have to enhance. The insights you gain after working on a project for a while are much better than the initial thoughts you have or the understanding you reach in the initial iterations.
I think it is essential to review the architecture once you've gained more experience with the realities of the system you write (vs. the perceived realities you have from the get-go).

By the way, if you work with a waterfall approach your situation is worse, since in this case you make your decisions before you write any code, so you don't even have the benefit of POCs and working code to enhance your insights.


PS
If you have the MEAP version of SOA Patterns you can read more on the patterns I've mentioned here: Active Service in chapter 2, Blogjecting Watchdog in chapter 3, Service Monitor in chapter 4, Parallel Pipelines in chapter 3, Edge Component in chapter 2.


 
Tags: Agile | Project Management | REST | SOA | SOA Patterns | Software Architecture

DZone recently published an interview with me on my SOA Patterns book. Along with the interview you can also download chapter 2 of the book (I think you need to be a DZone member to actually download it).

Chapter 2 includes the Edge Component, Service Host, Active Service, Transactional Service and Workflodize patterns. Additional downloads related to the book include
Lastly, you can download the first version of chapter 1, which I mention in the interview, and the slides of a presentation on a few of the patterns from Dr. Dobb's Architecture and Design World last year.


 
Tags: SOA | SOA Patterns | Software Architecture

IT Business Edge published a short Q&A with me on SOA patterns - you may want to check it out :)


 
Tags: SOA | SOA Patterns

Someone calling himself r r left the following comment on part IV of my series of posts on SOA definition:

"I keep trying to read this series on SOA unfortunately suffers from the same disease as the rest of literature on the subject. stays general to a comfortable level so it can't really be applied anywhere, tends to complicate things where is not clear if it's needed, and encourages philosophical debate on what ultimately is a business (and so concrete) requirement. Meanwhile the serious (IMO) issues stay untouched - how does one actually approach an integration project with functionality, performance and security in mind. Which should be the standards used (considering the tens of standards on WS out there). How granular should the WS be (I'm done with answers like "not too much, but enough", or "well, depends on your project"). "
Before I talk a little about the "serious issues" mentioned above, I want to point out that the point of this series of posts, as stated in the first post, is to take a formal / semi-academic look at SOA. I started these posts as a reaction to a comment that Pete Lacey left on my blog stating that my view of SOA (as published in "What is SOA anyway?") does not demonstrate that SOA is an architectural style. I don't pretend that this is some fully thought out academic dissertation or anything, but I do try to look at the architectural roots of SOA.

That said let's take a look at the more interesting parts of this comment. First, the thing that bothers me about this reaction is (what seems to me as) the quest for final and concrete recipes. For instance consider the comment on service granularity
"How granular should the WS be (I'm done with answers like "not too much, but enough", or "well, depends on your project"")
The problem is - it does depend! And if you forgive me for taking another philosophical detour, if you try to provide a hard definition for service granularity you get something like the heap paradox: when you remove individual grains from a heap of sand, is it still a heap when one grain remains? So while it is obvious that hiding a complete system as a single service is wrong and that exposing every little object as a service is wrong (even though for some inexplicable reason Juval Lowy seems to think that the latter is good practice), it isn't really obvious when you get too granular.

Nevertheless, it is not a pure guess either. You can use some guidelines and measure them against your specific project/system/enterprise needs. Personally, the set of guidelines I use is based on the fallacies of distributed computing:
  1.  The network is reliable
  2.  Latency is zero
  3.  Bandwidth is infinite
  4.  The network is secure
  5.  Topology doesn't change
  6.  There is one administrator
  7.  Transport cost is zero
  8.  The network is homogeneous
Since a service edge is a boundary which may be (and usually is) accessed remotely, you need to think about the incoming and outgoing interactions of the service in light of the fallacies stated above. If the proper behavior of the service depends on one of the above holding true, there's probably something wrong.

Regarding the other questions (how do you approach a real system), well, if you pardon me for banging my own drum, that's exactly why I started to write up my experience on these matters as patterns. For instance, if you look at the saga pattern (one of the patterns I published online), you'd see that it is talking about achieving distributed consensus in a transaction-like manner. I talk about the problems of using distributed transactions etc., offer an architectural solution (the saga) and then discuss relevant technology issues (e.g. WS-BusinessActivity) as well as its implications from a quality attributes perspective (integrity and reliability). Nevertheless, even these patterns aren't an end-all solution. Different circumstances require different solutions.
Both my previous job and my current one involve building a scalable solution on top of algorithmic engines. In my previous job I managed the construction of a biometric solution that allows using multiple biometrics. In my current job I manage the development of a mobile visual search solution. On the surface, both need to get some data, run a few algorithms and produce an answer, yet these systems have very different quality attributes. In the first system we had to handle very large databases and hundreds of queries, with an emphasis on modifiability and security; the current one needs millions of queries, almost no database and low latencies, with an emphasis on usability. These differences result in radically different solutions, with different services, different interactions, use of different patterns etc. There's no "one right answer" (tm).


 
Tags: PaperLnx | SOA | SOA Patterns | Software Architecture

This post is part of a series of posts trying to define SOA as an architectural style. In the previous post I talked about SOA and the Layered architecture style (which generated a couple of follow-ups - one on layered architecture in general, one on its importance for SOA and one on layers in enterprise architecture vs. solution architecture)

The next architectural style SOA builds on is Pipes and Filters. Unlike Layers and Client/Server, which I described in previous installments, Pipes and Filters is not also a base style for REST. Basically, this style is where SOA and REST begin to diverge.
The pipes and filters architectural style defines two types of components - yep, you've guessed it, Pipes and Filters.
Filters - independent processing steps; they are constrained to be autonomous of each other and not share state, control thread etc.
Pipes - interconnecting channels


Each filter exposes a relatively simple interface where it can receive messages on an inbound pipe, process them and produce messages on outbound pipes. The idea behind this is to allow easy composability, thus allowing greater usage (also known as "reuse" - I'll discuss the difference in another post). Systems are composed of several filters working together, filters can be replaced with newer versions (provided they keep the same interface) etc.
On the downside, the overall latency is increased, since to accomplish a task you have to move from filter to filter.
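
For readers who prefer code to prose, here is a minimal C# sketch of the style. It is an in-memory toy under stated assumptions: the `Pipeline` and `Filters` classes are made up for this illustration, each filter is a plain function over a message sequence, and the "pipe" is just the sequence handed from one filter to the next (no network, no service bus).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Pipeline
{
    // Wires filters together in order. The wiring lives outside the filters,
    // so a filter can be swapped for another with the same signature.
    public static Func<IEnumerable<T>, IEnumerable<T>> Compose<T>(
        params Func<IEnumerable<T>, IEnumerable<T>>[] filters)
    {
        return input => filters.Aggregate(input, (stream, filter) => filter(stream));
    }
}

// Hypothetical filters: each one is independent and shares no state with the others.
public static class Filters
{
    public static IEnumerable<string> Trim(IEnumerable<string> messages) =>
        messages.Select(m => m.Trim());

    public static IEnumerable<string> DropEmpty(IEnumerable<string> messages) =>
        messages.Where(m => m.Length > 0);

    public static IEnumerable<string> ToUpper(IEnumerable<string> messages) =>
        messages.Select(m => m.ToUpperInvariant());
}
```

Composing with `Pipeline.Compose<string>(Filters.Trim, Filters.DropEmpty, Filters.ToUpper)` yields a single function that can be applied to any message sequence. Every extra filter is another hop for each message, which is the latency cost mentioned above.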

The pipes and filters style brings to SOA things like the autonomy of services, the sense of explicit boundaries. For instance, this is the basis for why you wouldn't want to do distributed transactions across service boundaries, which I blogged about several times before.

The pipes part of "pipes and filters" also means that the wiring can be taken care of outside of the services themselves and that you can control it externally; this works well with the use of middleware (a service bus). Additionally, Fielding (you know, the REST guy) also mentions that
"One aspect of PF styles that is rarely mentioned is that there is an implied "invisible hand" that arranges the configuration of filters in order to establish the overall application. A network of filters is typically arranged just prior to each activation, allowing the application to specify the configuration of filter components based on the task at hand and the nature of the data streams (configurability). This controller function is considered a separate operational phase of the system, and hence a separate architecture, even though one cannot exist without the other."
Which is the harbinger of the orchestration/choreography aspects of SOA.

So as you see, pipes and filters is one of the important pillars of SOA. In the next part (unless I have to clarify things about this post) I'll talk about the last architectural style SOA builds upon, "Distributed Agents".


 
Tags: SOA | SOA Patterns | Software Architecture

Great news. Two of my friends and fellow DDJ bloggers, Eric Bruno and Udi Dahan, have agreed to join my (now our) SOA Patterns book, which will be published by Manning.

Both Udi and Eric are competent and experienced architects who have experience designing SOAs. On the technology side, Udi (“The Software Simplist”) specializes in .NET development, e.g. his nServiceBus framework - which is a very good example of an endpoint-ware ServiceBus (vs. middleware ServiceBuses, which is what most ESBs are). Eric, on the other hand, is a Java and C++ expert. Eric is the author of Java Messaging (one of the best books on JMS and web services) and also has a lot of experience in financial systems. Together, the three of us bring a lot of real-life experience of building large and complicated systems into this project.

The current game plan is for Eric to focus on the SOA pitfalls (“anti-patterns”) part of the book, Udi to provide a “putting-it-all-together” chapter, and for me to cover what’s left. I am sure, however, that their experience and insight will also help make the other parts of the book (even?) better.

If you are not familiar with the book, you may want to take a look at the first chapter and/or some of the published patterns like Saga, Service Firewall, Gridable Service, Edge Component and (a very early draft of) the Aggregated Reporting pattern. Also, you can take a look at the slides from my "SOA Patterns" presentation at Dr. Dobb's Architecture & Design World last year, which illustrate some additional patterns.


 
Tags: .NET | Java | new | SOA Patterns | Software Architecture

This post is part of a series of posts trying to define SOA as an architectural style. In the previous post I talked about how SOA builds on the Client/Server architectural style. In this post I'll talk about how SOA builds on the architectural style of Layered System.

Layered System or Layered architectural style is one of the most basic and widely used architectural styles. Here is a definition of Layered architecture I posted in the past:
The layered style is composed of layers (the components), each of which provides facilities and has a specific role. The layers have communication paths / dependencies (the connectors).

In a layered style a layer has some limitations on how it can communicate with other layers (the constraints). Typically a layer is allowed to call only the layer below it and be called only by the layer above it (but there are variants, e.g. a layer can call any layer below it, etc. - all is fine as long as the layers' communication paths are limited and restricted by some rules).
SOA takes the strict layers definition and restricts the knowledge one service has of other services to their service interface/contract. This means services cannot be aware of, or care about, the internal structure of other services. This helps with introducing the "boundaries are explicit" tenet (although it builds on more than just layering).

The layered nature of SOA means you can also add additional layers between the services. One very common example is adding a service bus (e.g. using an ESB or tools like NServiceBus); other examples can include load balancers, firewalls (see the Service Firewall pattern) etc. Naturally, when you add intermediary layers, services don't talk to each other directly but rather accept the services (such as routing, message persistence etc.) from the intermediary layer.
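
As a rough illustration of what an intermediary layer buys you, here is a toy, in-process stand-in for a service bus. It is a sketch under my own assumptions and not how NServiceBus or any ESB actually works: the `InProcessBus` class, the message type names and the in-memory `journal` (standing in for message persistence) are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Tuple

Handler = Callable[[Any], None]


class InProcessBus:
    """Toy intermediary: services only know the bus, never each other."""

    def __init__(self) -> None:
        self._routes: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.journal: List[Tuple[str, Any]] = []  # stand-in for message persistence

    def subscribe(self, message_type: str, handler: Handler) -> None:
        """Routing table entry: deliver this message type to this handler."""
        self._routes[message_type].append(handler)

    def send(self, message_type: str, payload: Any) -> int:
        """Record the message, route it, and return how many handlers got it."""
        self.journal.append((message_type, payload))
        handlers = self._routes.get(message_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


# Hypothetical usage: a billing handler listens, an orders service sends.
bus = InProcessBus()
invoices: List[int] = []
bus.subscribe("OrderPlaced", lambda order: invoices.append(order["order_id"]))
delivered = bus.send("OrderPlaced", {"order_id": 42})
undelivered = bus.send("OrderShipped", {"order_id": 42})
```

The sender never holds a reference to the receiver, so routing rules can change without touching either service.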

It should be noted that in the context of SOA the layers are, in most cases, actually tiers. The difference is that tiers provide (potential) physical separation whereas layers provide logical separation. When a layer is actually a tier it has extensive implications for the level of trust between the tiers (see my post "Tier is a natural boundary" for more details).

The next post in the series will talk about the "Pipes and Filters" style and SOA. This is the first place where the REST architectural style and SOA diverge.


 
Tags: REST | SOA | SOA Patterns | Software Architecture

Sam Gentile and I exchanged a few blog posts on the definition of SOA. In the latest installment Sam disagrees with me that SOA should first be looked at in the pure architectural sense without bundling in the business and enterprise aspects.
In a nutshell, I have two main reasons to prefer looking at SOA, at its core, as a pure architectural style.
The first is that when you bundle in the enterprise-wide aspects of implementing SOA you lose out on the option (or the audience) of using it to solve more local problems (i.e. at the product/solution level) using the same principles that bring the benefits on the enterprise scale.
The other reason I have for separating the concepts is that the business-encompassing definitions tend to be fluid, hand-waving ones and cannot be measured for compliance.
Consider the definitions Sam quotes from Thomas Erl's books:
"SOA establishes an architectural model that aims to embrace the efficiency, agility, and productivity of an enterprise by positioning services as the primary means through which solution logic is represented in support of the realization of strategic goals associated with service-oriented computing." (emphasis by Sam)

SOA represents a model in which functionality is decomposed into small, distinct units (services), which can be distributed over a network and can be combined together and reused to create business applications. [3]
Now what the hell is that? These are all noble goals but shouldn't this be the goal of any enterprise architecture? What makes SOA unique in this sense?
Also, how do these definitions help us build services? What makes a service a service? Why is (or isn't) any web-enabled component a service?
Definitions that distance themselves from the architectural roots seem to me like smoke and mirrors and contribute to the general confusion around SOA - to the point where even people like Harry Pierson wonder why we should even bother defining it.

Personally, I still think it is worthwhile defining *** (the architectural style formerly known as SOA) since, as I mentioned earlier, it is (in my opinion) a useful architectural style for building distributed systems - whether the distributed system is a solution, a product, a product line or a complete enterprise.





 
Tags: SOA | SOA Patterns | Software Architecture

In the previous post on defining SOA I claimed that SOA is an architectural style building on 4 other architectural styles. The first one of these is Client/Server.
Describing client/server is easy - not because I am such a genius (far from it) but because it has already been done numerous times before. Let's take a look at the definition from Roy Fielding in his famous dissertation (the link is to chapter 3; REST is defined in chapter 5 if you are interested):

The client-server style is the most frequently encountered of the architectural styles for network-based applications. A server component, offering a set of services, listens for requests upon those services. A client component, desiring that a service be performed, sends a request to the server via a connector. The server either rejects or performs the request and sends a response back to the client. A variety of client-server systems are surveyed by Sinha [123] and Umar [131].

Andrews [6] describes client-server components as follows: A client is a triggering process; a server is a reactive process. Clients make requests that trigger reactions from servers. Thus, a client initiates activity at times of its choosing; it often then delays until its request has been serviced. On the other hand, a server waits for requests to be made and then reacts to them. A server is usually a non-terminating process and often provides service to more than one client.

Separation of concerns is the principle behind the client-server constraints. A proper separation of functionality should simplify the server component in order to improve scalability. This simplification usually takes the form of moving all of the user interface functionality into the client component. The separation also allows the two types of components to evolve independently, provided that the interface doesn't change.

The basic form of client-server does not constrain how application state is partitioned between client and server components. It is often referred to by the mechanisms used for the connector implementation, such as remote procedure call [23] or message-oriented middleware [131].

SOA takes from the Client/Server style the two roles - i.e. in each interaction one party is the client (what I call the service consumer) and the other is the server (the service), which handles the request coming from the client*. Unlike traditional client/server, the roles are held only for a particular set of interactions - a given interface that the service exposes. In another set of interactions the roles can be reversed, and a component that once was a server can now act as a client, even working with the very same component that was previously its client.
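
Here is a small sketch of that role reversal. The two services, their method names and the sample address are made up for illustration, and plain method calls stand in for whatever messaging the real services would use.

```python
class OrdersService:
    """Hypothetical service that is a server in one interaction and a client in another."""

    def __init__(self) -> None:
        self._addresses = {1: "12 Example Street"}  # internal state, never exposed directly
        self.shipping = None  # wired in after construction

    def get_delivery_address(self, order_id: int) -> str:
        """Interface where Orders plays the server role."""
        return self._addresses[order_id]

    def complete_order(self, order_id: int) -> str:
        """Here Orders is the service consumer (client) of Shipping."""
        return self.shipping.schedule_delivery(order_id)


class ShippingService:
    """Hypothetical service that serves Orders and, while doing so, consumes it."""

    def __init__(self, orders: OrdersService) -> None:
        self._orders = orders

    def schedule_delivery(self, order_id: int) -> str:
        """Server for this request, yet a client of Orders for the address lookup."""
        address = self._orders.get_delivery_address(order_id)
        return f"delivery for order {order_id} to {address}"


orders = OrdersService()
orders.shipping = ShippingService(orders)
confirmation = orders.complete_order(1)
```

Neither class reaches into the other's fields; each role is tied to one exposed interface, which is the point of the paragraph above.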

Like REST, SOA takes the constraint of separation of concerns, which allows the service and its service consumers to evolve independently (as long as the interface is kept).
In order to support this, a service should take care of all its internal state without exposing that state or its internal structures outside of the service. This also allows the service to scale behind the interface, but for that we also need constraints and capabilities from the next architectural style, layered system, which I'll discuss in the next installment on this subject.


* You can compose SOA with other architectural styles to get different behaviors. E.g. compose SOA and EDA and you can have the service also push data. This isn't, however, something SOA manifests in its basic form.


 
Tags: REST | SOA | SOA Patterns | Software Architecture

November 24, 2007
@ 06:34 PM
A few weeks ago I posted a reaction to a post by Pete Lacey that asked what is SOA. In a comment to my post Pete said that my definition isn't good since
"...even according to your definition, an architectural style contains constraints, and to date neither SOA nor web services have been shown to exhibit any constraints"
The idea behind this series of posts is to try to take a slightly more formal look at what I think SOA is. It is based on my thinking over the past few weeks but it is also still a work in progress (so any comments are welcome).

The way I see it SOA is an architectural style which is derived from the following architectural styles:
  1. Client/Server
  2. Layered System
  3. Pipe and Filters
  4. Distributed Agents
Note that if you add to the above statelessness, uniform pipe and filter and a cache you can get a RESTful SOA. This is not REST, as REST itself does not require distributed agents or even pipes and filters (but it does build on client/server and layered system). In other words, not all RESTful systems are SOA, you can build SOAs which are not RESTful and you can build RESTful SOAs.

The main components of SOA are Services, Messages, Contracts and Consumers. Policies also exist but now I tend to think they are optional. The four architectural styles mentioned above affect the definitions of the different components and the way they interact.

In the following posts on this subject I'll first take a look at each of the contributing architectural styles and how they affect SOA and later try to provide a definition that builds on them


 
Tags: REST | SOA | SOA Patterns | Software Architecture

September 4, 2007
@ 11:03 PM
When I began writing SOA Patterns, the first version of chapter 1 was a general introduction to Service Oriented Architecture from the perspective of software architecture. When the editors saw the patterns chapters they felt the chapter wasn't focused enough on patterns, so I rewrote it until it finally molded into the current version.

Nevertheless, I think that the first version has value on its own, providing some guidance on the influences on architecture and putting SOA in an architectural context. I will probably edit it a little over the next few days so that it will stand alone (i.e. disconnected from the book). Meanwhile you can download the original version from here.



 
Tags: Everything | SOA | SOA Patterns | Software Architecture | Papers

August 1, 2007
@ 09:52 PM
I won't say anything about my presentations (that's for others to say :) ). The point of this post is just to let you download them. So here they are:
  • SOA Patterns (2.14mb) - Takes a look at different strategies (patterns) to solve common SOA pitfalls
  • Getting SPAMMED for architecture (4.56mb) - Takes a look at the activities architects can/should do when they think about software architectures. The presentation also covers architecture in agile projects.


 
Tags: .NET | A&D2007 | Agile | Everything | SOA | SOA Patterns | Software Architecture | SPAMMED Process

While I am getting ready to fly to A&D world 2007 where I'll present both SOA patterns and the SPAMMED architecture framework, I thought I'd throw in a little update on the book as well.

I've made a small change to the way chapters 5-7 are organized. They are now grouped under a separate part called "Service Interaction Patterns" (and chapters 2-4 are grouped under "Structural Patterns").
  • Chapter 5 is focused on Message Exchange Patterns (MEP): synchronous, asynchronous, events and transactional - the patterns there are not new for SOA; instead the focus is on the meaning of implementing the usual MEPs under SOA constraints. I sent it to Manning early last week so hopefully it will be available on MEAP soon.
  • Chapter 6 is called "Consumer Interaction patterns" and includes the UI interaction patterns as well as interaction patterns with other types of consumers. This is the chapter I am currently working on.
  • Chapter 7 is unchanged for now
Lastly, as you may remember, I publish online one pattern from each chapter, so I'd be happy to get comments on which of the following three patterns (from chapter 6) you'd like to see online: Reservation pattern (making partial commitments), Client/Server/Service (integrating legacy or thin clients with SOA), Client/Service (integrating rich clients with SOA) - if you want to vote just send me an email or leave a comment.


 
Tags: Everything | SOA | SOA Patterns

Evan H asked a question about distributed transactions and services in the MSDN architecture forum:

Are distributed transactions (ie.. WS-Transaction) a violation of the "Autonomous" tenant of service orientation?   Yes or No and Why?  Kudos if you can address concurrency and scalability (in an enterprise with multiple interacting services).

I answered this question back in April when I wrote a couple of posts that explained why cross-service transactions are a bad idea: cross service transactions and some more thoughts on cross service transactions.
Roger Sessions also agrees with this view (well, actually, it seems he wrote about it well before I did :) ):
When the WS-Transaction specification was first proposed, back in 2002, I wrote an article explaining why I thought the idea of allowing true transactions to span services was a bad idea. I published the article in The ObjectWatch Newsletter, #41: http://www.objectwatch.com/newsletters/issue_41.htm. Nothing since then has changed my mind. Atomic transactions require holding locks, and spanning transactions across services requires allowing a foreign, untrusted service to determine how long you will hold your very precious database locks. Bad idea. Just because IBM and Microsoft agreed on something doesn't make it good!

The reason I am bringing this issue back is that Juval Lowy (who wrote the article that triggered my first post on the subject) has recorded an ARCast with Ron Jacobs, where he reiterated the idea that "Transactions is categorically the only viable programming model" and that you should strive to use it whenever you can. It seems Juval admits you sometimes need to use Sagas (which he called "long running transactions" - you can see in my link why I think that's a wrong name). He also agrees that you can use a transactional transport and then only do internal transactions from each service to the transport (a pattern I call "Transactional Service"). However, at the end of the day, he still thinks you should use WS-AtomicTransaction whenever you can.

I agree that transactional programming is important. I think it is the simplest programming model (from the developer's side). I would probably never write an interaction with a database that is not transactional; I look very favorably at initiatives for in-memory ACI (no Durability) transactions such as the one Ralf talks about. Until we get to Distributed Transactions...

First, we should note that transactions are not "the only viable" option. As Martin Fowler notes, eBay seems to be doing fine without distributed transactions. Not only that, they abandoned distributed transactions and went "transactionless" because they needed one simple thing... scalable performance.

In most COM+ scenarios you have a single server or a few internal servers where the distributed transactions happen - and even there you should plan your transactions carefully if you want to get any kind of decent performance. In SOA scenarios the situation is more complicated as the distribution level is expected to be higher (even if you don't involve services from other companies). More distribution means longer times to complete transactions (especially if a participant can flow the transaction and extend it). It also means increasing the chances of failure (see Steve Jones' series of posts on five nines for SOA). In my opinion, the more distributed components you have the more you want their interaction to be decoupled in time - i.e. the opposite of transactions.

Juval also said he doesn't buy the denial of service problem I mentioned (supporting a transaction means you allow locks - if an external party doesn't commit you retain the lock). Juval said he assumes that a solution has both authentication and authorization so this shouldn't be an issue. For one, I have seen too many projects where security was neglected or quickly patched in at the last moment - so I would hardly assume security. Even with security on, you increase your attack surface.
But that's just the half of it. Even if all your service consumers have good intentions - you still don't know anything about their code. SOA is not like the "good old days" where you owned the whole application - this means you cannot trust their security to be ample. Also you don't know anything about their code quality. Services are likely (in the general case) to be deployed on different machines, even if they start co-located. I think that a Service boundary should be treated as a trust boundary just like a tier boundary. I strongly believe you should have reduced assumptions on what's on the other side of the service's boundary - transactions are not reduced assumptions.

SOA and distributed transactions do not go hand in hand - it isn't just autonomy at stake here. It is a problem for performance, scalability and even security. Period.

To finish this post - I would also highly recommend looking at Pat Helland's paper "Life Beyond Distributed Transactions: an Apostate's Opinion" and a post he recently made called  "SOA and Newton's Universe", where he explains more eloquently than I ever could why SOA is not a good fit for distributed transactions.



 
Tags: .NET | Everything | SOA | SOA Patterns | Software Architecture

In addition to the drafts of selected patterns I publish on my site, you can now purchase my book via the Manning Early Access Program (MEAP).
MEAP means you can get chapter drafts as I write them and the complete book when it's done (ebook or printed). Here is Manning's explanation:
"Buy now through MEAP (Manning Early Access Program) and get early access to the book, chapter by chapter, as soon as they become available. You choose the format - PDF or ThoutReader - or both. By subscribing to MEAP chapters, you get an opportunity to participate in the most sensitive, final piece of the publishing cycle by offering feedback to the author. Reader feedback to the author is welcome in the Author Online forum. As new chapters are released, announcements are made in the MEAP Announcement Forum. After all chapters are released, you will be able to download the complete edited ebook. If you order the print edition, we will ship it to you upon release, direct from the bindery, weeks before it is widely available elsewhere."
By the way, this is probably also a good time to mention that I'll be speaking about quite a few of the patterns at Architecture & Design World 2007, which will take place this July.

There is still a lot of work, but I'd already like to thank all the people at Manning who helped me get this far, especially Cynthia Kane, my editor (hey, maybe now she'll give me more slack :) ).
Ok, 'nuff blubbering, back to completing chapter 5...


 
Tags: .NET | Everything | SOA | SOA Patterns | Software Architecture

June 6, 2007
@ 10:02 AM
While I am on the topic of REST, it is probably a good time to comment on my (first) post on InfoQ "Debate: Does REST need a Description Language"

Personally, I think there's merit in Services publishing their message structures in a machine readable format. When a Service has a machine readable contract, generated stubs allow you to interact with it with fewer bugs than hand-crafted interactions. It also makes it easier to test the service itself.

I do agree with Stefan's views on runtime interface dependency, where he said that if a service consumer needs just 20% of the information in a service it shouldn't be forced to deserialize (i.e. know or care about) the whole message. However, I think this is a weakness of tooling, not of the concept. What if you had a tool that reads the machine readable contract, allows you to pick the 20% you need and generates for you a stub that "hand picks" that 20% and ignores the other 80%? This is what you would do yourself anyway, and since the code is generated from the Service's definition it would be more resilient and less error-prone. This is effectively designing a personalized mini-contract from the published general one. It does mean that when that 20% changes you will be affected, but this is something you'd have anyway.
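
To show what such a "mini-contract" tool might boil down to, here is a minimal sketch. Everything in it is an assumption made for illustration: the contract is reduced to a flat list of field names, the messages are JSON, and `make_partial_reader` is a hypothetical helper rather than part of any real toolkit. Note also that this sketch still parses the whole JSON text; what it avoids is binding the consumer's code to the fields it does not use.

```python
import json
from typing import Any, Callable, Dict, Iterable


def make_partial_reader(
    contract_fields: Iterable[str], wanted: Iterable[str]
) -> Callable[[str], Dict[str, Any]]:
    """Build a reader for a consumer-chosen subset of a published contract."""
    wanted = tuple(wanted)
    # Check the selection against the contract once, at "generation" time.
    unknown = set(wanted) - set(contract_fields)
    if unknown:
        raise ValueError(f"not in the published contract: {sorted(unknown)}")

    def read(raw_message: str) -> Dict[str, Any]:
        full = json.loads(raw_message)
        # Only the selected fields are exposed; everything else is ignored.
        return {name: full[name] for name in wanted}

    return read


# Hypothetical published contract and a consumer that cares about two fields.
CUSTOMER_CONTRACT = ["id", "name", "address", "phone", "credit_limit"]
read_customer = make_partial_reader(CUSTOMER_CONTRACT, ["id", "credit_limit"])
view = read_customer(
    '{"id": 7, "name": "Ada", "address": "n/a", "phone": "n/a",'
    ' "credit_limit": 500, "loyalty_tier": "gold"}'
)
```

A field added later by the service (like `loyalty_tier` here) does not affect this consumer, while a change to `id` or `credit_limit` still would.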

I also agree that the WS-* standards and the resulting contracts are complicated (and getting more so). Much of this can probably be attributed to the "design by committee" effect. However, there are also some real challenges that the SOA and ROA architectural styles do not address and we still need to solve those. Trying to solve these challenges is, by the way, what prompted me to write my SOA patterns book...


 
Tags: Everything | SOA | SOA Patterns | Software Architecture

Udi Dahan writes that ".NET/Java Interop is not a reason for SOA". Udi writes that companies that need to integrate two technologies turn to web-services and that
"The only problem is that in order for things to work right, they really must have a chatty interface, and flow transaction context between these “services”, and all the other things I describe as anti-patterns"

Udi is right that if you don't rethink and remodel your systems you will (probably) not have an SOA, as you are likely to find yourself implementing anti-patterns such as the ones he mentions.

However, using web-services does not automatically mean that you are doing SOA. If you don't think about moving to SOA you can still opt to use web-services as a remoting or RPC technology to connect two systems. The advantage over the other proprietary products Udi mentions is that web-services are a standard technology. Whether this will work well or fail is orthogonal to the technology choice. It depends on the architectures of the systems you integrate. If you need to flow transactions between the systems you'd also need that even if you cross-compile one of the applications in the other environment.

Another thing I don't agree with is the word "must" Udi uses. First, while it is likely that older systems have chatty interfaces, it is not a must. The designers of the legacy system may have thought about the consequences of distribution without regard to SOA. Also, you can still wrap an existing system with a service contract (using web-services or any other technology) and not get to chatty interfaces etc. However, that means that the wrapper should have some substance or business logic inside it to mask the old system's behavior. This is especially important if you are thinking about moving to SOA and you take into consideration that the business will not just halt and wait there until you are done. You have to think about interim solutions. Such interim solutions can include wrapping a legacy system with an Edge Component and an SOA facade (a pattern I call Legacy Bridge) while you move in the grander direction of a full-blown SOA.


 
Tags: .NET | Everything | SOA | SOA Patterns | Software Architecture

It has been a while since I last published a pattern draft - but I guess it is better late than never.
The Saga pattern deals with managing complex interactions between services without the use of atomic transactions, which, as I mentioned in the past, are not a good idea (see "Transactions between services? No, No, No!" and "Some more thoughts on Cross-service transactions").
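
For readers who want something tangible before opening the draft, here is one common way to sketch a saga in code: every step is paired with a compensating action, and a failure part-way runs the compensations of the steps that already completed. This is my own simplified assumption of the idea, not the pattern as written in the draft; `run_saga`, `SagaAborted` and the booking steps are all hypothetical, and a real saga would span messages between services rather than local function calls.

```python
from typing import Callable, List, Tuple

Step = Tuple[Callable[[], None], Callable[[], None]]  # (action, compensation)


class SagaAborted(Exception):
    """Raised after completed steps have been compensated."""


def run_saga(steps: List[Step]) -> None:
    """Run each action in order; on failure, compensate completed steps in reverse."""
    done: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception as error:
            for undo in reversed(done):
                undo()
            raise SagaAborted(str(error)) from error
        done.append(compensate)


log: List[str] = []


def charge_card() -> None:
    # Hypothetical failing step, to show the compensation path.
    raise RuntimeError("card declined")


steps: List[Step] = [
    (lambda: log.append("seat reserved"), lambda: log.append("seat released")),
    (charge_card, lambda: log.append("charge refunded")),
]

try:
    run_saga(steps)
except SagaAborted as aborted:
    log.append(f"saga aborted: {aborted}")
```

No locks are held across the steps; each service does its own local work and, if needed, its own local undo.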

You can download the draft for the Saga pattern from here.
I'll also add a link to it from the SOA Patterns book section (where you can also download the other pattern drafts I published)

By the way, I am not happy with the current  sketch (the pattern illustration) in this pattern, so it will probably change in later drafts. I would be happy to hear any suggestions you have for improving it.


 
Tags: Everything | SOA Patterns | Software Architecture

May 15, 2007
@ 08:16 AM
Pat Helland is back at Microsoft (after a two-year vacation at Amazon) and, more importantly, he has also restarted blogging. I only met him in person a few times - but he is definitely one of the few people really worth listening to - especially when it comes to distributed computing. Not only does he make interesting observations, he is also capable of explaining them in a crisp and interesting manner. Indeed, it didn't take too long (his second post) before he blogged some valuable content. The post is called Memories, guesses and apologies (go read it).

Pat talks about how the notion of time in a distributed environment is subjective, how you can't really know what happened before what, and what we can do about it (I really think you should just go read it :) ).
Another related aspect of the phenomenon Pat mentioned is that, taking a snapshot in time, the chances of having a single unified truth in a distributed system degrade in proportion to the system's load. I had a chance to work on a few systems where some of the sites were either occasionally connected or connected over low bandwidth networks. This situation makes the whole notion of guessing the state and compensating and/or apologizing for wrong conclusions much more explicit than in always-connected, high bandwidth systems. Nevertheless, latency still exists even in connected systems and you should really be wary of assuming a universal truth - unless you can stop the business long enough to allow complete synchronization.
As I mentioned a few days ago, we can't afford to have cross-service transactions (I also think we can't afford too many distributed transactions in non-SOA architectures, but this is especially true for SOA), which makes things even worse in this sense. One thing we can do in an SOA to achieve distributed consensus is to run a Saga. Saga, which is a long running conversation between services, is probably one of the most important interaction patterns for SOA.
You know what? Instead of trying to explain it here in haste I'll just publish the pattern draft - I'll try to do that before the end of the week.






 
Tags: Everything | SOA | SOA Patterns | Software Architecture

April 30, 2007
@ 07:34 PM
An article I wrote on Business Intelligence (BI) and Service Oriented Architecture (SOA) has just been published on MSDN.
You can find it here http://msdn2.microsoft.com/en-us/library/bb419307.aspx.

The article explains the SOA & BI mismatch and how to bridge it by adding EDA to SOA. (I blogged about it here before, but the article is more ordered and complete.)


 
Tags: .NET | Everything | SOA | SOA Patterns | Software Architecture

I'll be presenting a 90 minute class on SOA Patterns at the upcoming Architecture & Design World 2007, which will take place in Chicago on July 24-27th.
If any of you happen to be there, I'd be very happy if you drop by and say hello :)

 


 
Tags: Everything | General | SOA Patterns

April 17, 2007
@ 01:58 PM

After seeing Juval Lowy's article on WCF transaction propagation in the May issue of MSDN Magazine, I posted "Transactions Between Services? No, No, No!" on my DDJ blog. I got a few comments which I thought warranted a post in their own right.

The previous post was triggered by an article that promoted flowing transactions (i.e. you perform a transaction against one or two services and then one of the services calls an additional service and it joins the transaction). It is important to say that I think transactions between services should be discouraged regardless of automatic extension of transactions. Transaction propagation just makes matters worse.


There might still be some edge case where you have to have an atomic transaction from a service consumer to the service. I think that in the vast majority of SOA implementations you shouldn't do that and I would think real hard about the other options before allowing it in my architecture. In general I think cross-service transactions are an antipattern (and that's the way you'd find them documented in my SOA patterns book :) ).

One of the comments I received began with:

"Cross service transactions are a sure way to introduce coupling and performance problems into your SOA." I'm not sure I agree with that thought. Logically speaking, cross service transactions are a must. The question is how to implement them. There are two mechanisms we can use for implementing TXs: (1) ACID TXs; (2) Long-running TXs. The latter is preferable for the cases Arnon is talking about (large geographical distances, multiple trust authorities, and distinct execution environments). ACID TXs are more suitable for what Guy has mentioned (DeleteCustomer service invokes the DeleteCustomerOrder service internally). I agree with Arnon the a-synchronicity is preferable, but we all have encountered use-cases where ACID-ness is required from a business requirement level... [snipped]


One minor point in regard to this comment is that I don't like the term "long-running transaction". What we have are long-running interactions between services, and I think the term saga describes them better. Sagas are made of a series of business activities that flow back and forth between services to realize a larger business process. Note that these interactions don't necessarily have transaction-like behavior.


Which brings me to the more important point of looking at the statement "Logically speaking, cross service transactions are a must". I don't think so. For instance, consider a service that manages the inventory in a warehouse and receives a request for some items and, later, a cancellation of that request. The first request can trigger the inventory service to order more items from a supplier. Whether or not the cancellation causes a cancellation of the supplier order depends on the inventory service's business rules for the inventory levels of the items ordered. It might also depend on whether or not the items have already been received, etc. The cancellation (the "abort") of the original request does not have to translate to an abort (or compensation) on the request receiver. Furthermore, if the service communications model is based on the push model (e.g. using EDA with SOA), the cancellation notice would just be propagated without regard to the inventory service. It is the inventory service's responsibility to understand the ramifications of this event and act accordingly.

Even the example given in the comment, "DeleteCustomer service invokes the DeleteCustomerOrder service internally", is not a good candidate for ACID transactions (there's also a problem of service granularity here - I'll talk about it later). When the customer service decides to delete a customer and requests the Orders service to delete orders, there's a reasonable chance that some of the orders are already paid for but not delivered. In this case the customer service cannot really delete the customer until all the paid orders are resolved. Or maybe the order service is a facade to a nightly batch that does the actual deletion. I know I am just fantasizing with these examples, but the point is that the customer service has no knowledge of the order service (or the inventory service above) except the messages supported in their contracts. To assume something about the internal behavior is problematic. Even if you know the internal structure at the outset, the whole idea of SOA is that the services can evolve independently from each other...
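
To make the inventory example a little more tangible, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `InventoryService` class, its two message handlers, and the reorder rule are hypothetical and do not come from any real system. The point it tries to show is only that the decision about the supplier order stays inside the service.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryService:
    """Hypothetical inventory service; consumers only see its message handlers."""
    reorder_level: int = 10  # assumed business rule: keep at least this many units
    stock: dict = field(default_factory=dict)            # item -> units on hand
    supplier_orders: dict = field(default_factory=dict)  # item -> units on order

    def handle_item_request(self, item: str, quantity: int) -> None:
        # Reserve the items and restock from the supplier if stock drops too low.
        self.stock[item] = self.stock.get(item, 0) - quantity
        shortfall = self.reorder_level - self.stock[item]
        if shortfall > 0 and item not in self.supplier_orders:
            self.supplier_orders[item] = shortfall

    def handle_request_cancelled(self, item: str, quantity: int) -> None:
        # Return the items to stock. Whether the supplier order is cancelled
        # is this service's own decision, based on its own inventory rules.
        self.stock[item] = self.stock.get(item, 0) + quantity
        if self.stock[item] >= 2 * self.reorder_level:
            self.supplier_orders.pop(item, None)


service = InventoryService(stock={"widget": 12})
service.handle_item_request("widget", 5)       # stock falls to 7, 3 units ordered
service.handle_request_cancelled("widget", 5)  # stock back to 12, supplier order kept
```

The consumer's "abort" is just another message here. It never reaches into the supplier order, and the service is free to change its restocking rule later without the consumer noticing.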


Another thought triggered by the example in the comment concerns the granularity of the services (a DeleteCustomer service vs. a Customer service that also supports deleting customers): we should be really conscious of the difference between SOA and other architectures like 3-tier client/server. SOA is actually more distributed than 3-tier - we cross a distribution boundary every time we pass a message from one service to another, not just when we move a message from a client tier to an application server. We add this distribution to gain advantages in flexibility and agility. However, we should note that this is also a weakness of SOA (consider, for example, that Martin Fowler's first law of distributed object design is "Don't distribute your objects!"), which means we should really pay attention to the way services interact with each other:

  • The granularity of services - having a lot of fine-grained services means there will be a lot of interactions over the wire (even if you don't go out to the network you still have to serialize/deserialize, follow the security policy, etc.) rather than internal interactions, which are much faster.
  • The granularity of messages - the same considerations should also guide us to create larger and fewer messages. For the example above, instead of a DeleteCustomerOrder message, maybe use something like an UpdateCustomersOrders message that can hold a list of customers and orders and their status changes. By the way, this would also support offline clients better, since they can accumulate changes.
  • The assumptions we can make on the other service's availability, performance, internal structure, the trust we have for it, etc. - we should try to minimize the assumptions we make and concentrate on what can be inferred from the contract. Remember that policies can change externally, so the business logic within a service cannot count on them being constant. This brings us back to the issue of transactions. Every cross-wire interaction increases the chances of failure, and in a transaction one failure invalidates the whole transaction. Every cross-wire interaction within a transaction also increases the length of time we lock internal resources (even if we do trust all the involved parties) - especially if that transaction can extend itself automatically. Also, as I've mentioned in the previous post, transactions open the door for denial of service attacks.
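
The message granularity point can be sketched in a few lines of Python. This is only an illustration under assumptions: `UpdateCustomersOrders` is the message name suggested above, but its fields, the `OrderStatusChange` record and the `OfflineClient` class are hypothetical shapes invented for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OrderStatusChange:
    customer_id: str
    order_id: str
    new_status: str


@dataclass
class UpdateCustomersOrders:
    """One coarse-grained message carrying many changes (hypothetical shape)."""
    changes: list = field(default_factory=list)


class OfflineClient:
    """Accumulates changes locally and sends them as a single message."""

    def __init__(self):
        self._pending = []

    def record(self, customer_id, order_id, new_status):
        self._pending.append(OrderStatusChange(customer_id, order_id, new_status))

    def flush(self):
        # One message crosses the wire, however many changes were recorded.
        message = UpdateCustomersOrders(changes=list(self._pending))
        self._pending.clear()
        return message


client = OfflineClient()
client.record("c-1", "o-10", "cancelled")
client.record("c-1", "o-11", "cancelled")
client.record("c-2", "o-20", "on-hold")
message = client.flush()  # three changes, one trip over the wire
```

With fine-grained messages the same work would cost three serialize/secure/send round trips instead of one.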

If we want to reap the benefits that are sold under the SOA moniker, like flexibility and agility, we really have to pay attention to this extra distribution and design our services differently than we would components in a 3-tier architecture - but hey, that's why they pay us the big bucks, right? :)

I should probably also add that building SOAs is not a goal in itself. We can build perfectly good solutions using other architectures - but if we find that we do need SOA (or any other architecture for that matter), we have to pay attention to the way we implement it, both to keep its benefits and to avoid harming other quality attributes like performance, security, etc.



 
Tags: .NET | SOA | SOA Patterns | Software Architecture

I've updated the draft for the Edge Component pattern to a more legible version (thanks to Cynthia Cane, my editor at Manning).

The Edge Component pattern solves the following dilemma:

How do we allow the business aspects of the service, technological concerns and other cross-cutting concerns like security, logging etc. to evolve at their own pace and independently of each other?

 


 
Tags: Everything | SOA | SOA Patterns

I was going to try to explain why it took me so long since I posted the last pattern draft online when I saw that a couple of my fellow Manning authors already did that. See Roy Osherove's "Writing a book is like developing Software" and Fabrice Marguerie's "My Writing Experience". I have a similar experience here - book writing and software writing have a few things in common, and it seems that the countermeasures of shorter iterations, refactorings (which I guess writers know as rephrasing) and increased inspections work here as well.

Finally, I am back to writing new stuff and I am completing Chapter 4 now. Chapter 4 deals with SOA security patterns, and I've decided to release the "Service Firewall" pattern as a free draft. Note that it is a draft and it can change by the time it gets to publication. For example, the Edge Component, which I published a few months ago, has already gone through an extensive rewrite (maybe I'll post the updated draft...).

The Service Firewall helps deal with malicious "service consumers" and protects services from several types of attack, for example XDoS (XML Denial of Service) and malicious content. It also helps prevent private information from leaking out of the service.

You can download the draft for the Service Firewall pattern from here.


 
Tags: Everything | SOA | SOA Patterns | Software Architecture

February 11, 2007
@ 08:25 PM

Udi has some comments on my SOA definition. Udi says that the definition I provided does not support the notion of publish/subscribe using topics for services. My answer to this is yes and no :)


First things first: I never said (or at least I never meant to say) that contracts are limited to only incoming messages. Contracts contain incoming and outgoing messages. I probably should have stated it more clearly though.
Udi says “Contract: Who owns the message type being published? The publisher or the subscriber? Common SOA knowledge would say that the message belongs to the contract of the service that receives it”


I don’t know who “Common SOA knowledge” is. In my opinion, this thinking is wrong even for request/reply: the reply message belongs to the service that sends the reply.


Regarding Endpoints – if the subscribers go to a topic as in “ServiceName\TopicName” then yes, I would call that an Endpoint, since this is a well-known address consumers (subscribers) go to in order to find messages published by a service.


Regarding consumers Udi says “ Is the publishing service “using” the subscriber when it publishes a message? I don’t think so, and the subscriber definitely isn’t using the publisher at that point either. So, we’ve got some inter-service message-based communication going on and it isn’t clear if we even have a service consumer. In fact, if all a service ever did was subscribe to some topics, and publish messages on other topics, it looks like we’d have very loose-coupling but be straying from the common SOA wisdom.”


Maybe that’s just semantics, but I don’t see why the subscriber isn’t using the publisher. The publisher publishes a message on a topic, and this is part of its offering. The subscriber chooses to consume that information and maybe do some stuff with it – possibly publishing some other messages. That’s a “using” relationship to me.


Nevertheless, SOA is not a synonym for "distributed system", so there are cases when distributed components that communicate through messages aren’t SOA. For example, publish/subscribe using topics where the topics are common and shared between components, so that multiple services can publish on the same topic, does not, in my opinion, fall under the definition of SOA. This doesn’t say that this is a bad architecture in any way – but it isn’t SOA either.
As I said in the “What is SOA” posts, for an architecture to be SOA you need autonomous components that publish and accept messages defined in contracts, delivered at an endpoint and governed by policies, to service consumers – no more, but no less either.


 
Tags: Everything | SOA | SOA Patterns | Software Architecture

February 9, 2007
@ 06:50 AM

I've been talking about SOA for a while now, so it's finally time to (try to) properly define it.

I've published this as 5 posts on my DDJ blog and I thought it was good enough to be published as a single whitepaper:

"Service Oriented Architecture or SOA for short has been with us for quite a  while.  Yefim V. Natiz, a Gartner’s analyst, first talked about SOA back  in 1996. However it seems that only in the recent year or so SOA has matured enough for real systems based on the SOA concepts to start to appear – or has it?  There is so much hype and misconceptions surrounding SOA that we first have to clear them all up before we can explain what SOA is – let alone identify who really uses it...." (Download full PDF (670K))

You can see additional presentations and papers I wrote here


 
Tags: Everything | SOA | SOA Patterns | Software Architecture

January 7, 2007
@ 11:02 PM

[based on a few posts from my DDJ blog]

Implementing a Business Intelligence (BI) solution on top of a Service Oriented Architecture (SOA) is not a simple feat. A recent survey by Ventana Research shows that "...only one-third of respondents reported they believe their internal IT personnel have the knowledge and skills to implement BI services.". There's a good reason for that, since there is an inherent impedance mismatch between BI and SOA which takes some effort to overcome. The purpose of this paper is to explain the problem as well as look at the possible solutions.

Service-Oriented Architecture is about autonomous, loosely coupled components. These traits give you lots of benefits, such as greater flexibility and agility, but they also mean that services have private data: data that you don't want to expose to the outside, as exposing it will decrease autonomy and increase coupling. This is why services only expose data and processes via contracts rather than exposing their internal structure.

That is all fine until you start to think about business intelligence. The cornerstone of any business intelligence initiative is gathering, collecting and consolidating data from all over the place. Once you have the data, you can use tools to analyze it, data mine it, slice, splice, aggregate, and whatnot. Traditionally BI builds on ETL (Extract, Transform, Load), which goes directly to the databases of the involved sources.

And here lies the problem: On the one hand we have services that want to keep their data private, and on the other we have a datamart or warehouse that wants that data badly.

What are our options?

  • If you go with traditional ETL, you introduce coupling into your service.
  • If you only rely on contracts that were constructed for business processes you may be missing out on important data.
  • If you build a specific contract that exposes "all" the data, you are back at point-to-point integration -- and solving point-to-point integration is one of the reasons we want SOA in the first place.

The second option seems to be the most reasonable choice of the three -- but it also has several problems. One problem is that the BI system needs to know about all the contracts. The second was already mentioned -- important data might be missing. The third problem is that the BI system needs to fetch data from the services, which means it may miss out on data in the intervals between requests. On the other hand, if the requests are too frequent you can easily congest your network as well as cause a denial of service (DoS) on your own services.

Clearly we need a fourth option.

In my opinion, the best way to tackle BI in SOA is to add publication messages into the contract. By "publication messages", I mean that the service will publish its state, either periodically or per event, to anyone who is listening. This is a service communication pattern which I call "Inversion of Communications", since it reverses the request/reply communication style which is common for SOA.

To make the solution complete, you can add additional request/reply or request/reaction messages to allow consumers to retrieve initial snapshots. Following this approach, you get an event stream of the changes within the service in a manner that is not specific to the BI. In fact, having other services react to the event stream can increase the overall loose coupling in the system - for instance by caching results of other services.
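
As a rough illustration of publication messages plus an initial snapshot, here is a small in-process Python sketch. It makes several assumptions: the `OrderService` and `BiCollector` classes, the `order-status-changed` topic name and the event fields are all hypothetical, and direct callbacks stand in for whatever messaging infrastructure would carry the events in a real deployment.

```python
from collections import defaultdict


class OrderService:
    """Hypothetical service whose contract includes publication messages."""

    def __init__(self):
        self._orders = {}
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def get_snapshot(self):
        # Request/reply message: lets a new consumer start from current state.
        return dict(self._orders)

    def set_status(self, order_id, status):
        self._orders[order_id] = status
        # Publication message: pushed to whoever is listening.
        event = {"order_id": order_id, "status": status}
        for callback in self._subscribers["order-status-changed"]:
            callback(event)


class BiCollector:
    """Hypothetical BI-side consumer that keeps state and history."""

    def __init__(self, service):
        self.history = []
        self.state = service.get_snapshot()
        service.subscribe("order-status-changed", self.on_event)

    def on_event(self, event):
        self.history.append(event)
        self.state[event["order_id"]] = event["status"]


orders = OrderService()
orders.set_status("o-1", "placed")   # happened before the BI consumer connected
bi = BiCollector(orders)             # picks up "o-1" from the snapshot
orders.set_status("o-1", "shipped")
orders.set_status("o-2", "placed")
```

Nothing in `OrderService` is specific to BI. A caching service or a reporting service could subscribe to the same topic with its own handler.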

Why is this better than the other three approaches? For one, you can get a good picture of what happens within the service. At the same time, the contract is not specific to the BI and can be used by other services to cache the service state (thus increasing their own autonomy), for reporting (you can see an early draft of the aggregated reporting pattern), and for BI purposes. By working against a steady stream of events, the BI platforms can analyze trends, keep history and get the complete picture they need.

The approach above is sometimes referred to as "Event Driven Architecture" (EDA), and while I (and others) see EDA as another facet of SOA, not everyone agrees. Gartner, for instance, sees EDA as another paradigm and SOA as just request/reply, or client/server. Recently, however, they published a paper that calls the approach described here "Advanced SOA". I tend to agree more with the "advanced SOA" definition and don't see a contradiction between EDA and the SOA definitions. We are still using the same components and the same relations, only adding an additional message exchange pattern to our toolbox.

A note on implementation: if you are implementing SOA over an ESB, this is rather easy, as most ESBs support publishing events out of the box. Using the WS-* stack of protocols, you have the WS-BaseNotification, WS-BrokeredNotification and WS-Topics set of standards. If you are in the REST camp, then I guess you will need to implement publish/subscribe by yourself.

Once you have event streams on the network, the BI components can grab that data, scrub it as much as they like and push it to their datamarts and data warehouses. However, event streams can also enable much more complex and interesting analysis of real-time events and real-time trend data, using complex event processing (CEP) tools to get real-time business activity monitoring (BAM).

You can also get this post as a presentation, downloadable from the papers section on my site or directly from here. (The download is about 3MB.)



 
Tags: Everything | General | SOA | SOA Patterns | Software Architecture

December 15, 2006
@ 10:50 AM

One unique aspect of SOA vs. other architecture styles like Object Orientation, Client/Server or even 3-Tier architecture is that it is built for highly distributed systems. Each and every service is a sub-system in itself: it can run on its own machine and be located anywhere in the world. Many times, the service itself needs to be distributed in its own right. One reason to use distributed computing inside the service is computationally intensive tasks.

 

One of my recent projects was the development of a biometric platform. The platform can be used for many usage scenarios. A simple scenario is an access control system - e.g. authorizing entrance into a secure building or area. This is a relatively simple scenario, as you usually only have to deal with a few thousand people, and as a person requests entry she also declares who she is (e.g. using an RFID card with her ID). In these cases you can go to the database, look up the appropriate record, run the biometric algorithm or algorithms and verify the person is who she says she is. However, the same platform also has to work for other, much more demanding and computing-intensive scenarios. For example, consider a forensics scenario where you have a fingerprint collected at a crime scene. In this case you don’t know who the person you are looking for is, and you have to run your search on basically the whole database, which can contain millions of records. Keep in mind that when you match a biometric template[1] you calculate the probability of a match (based on the internal structure of the template) and that each template weighs about one kilobyte, and you quickly realize that this can be quite a CPU-intensive task.
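
To give a feel for the shape of the forensic search, here is a toy Python sketch that scores a probe against every template in a database and splits the database into partitions that are searched independently. It rests on loud assumptions: `match_score` is a made-up byte comparison and not a real biometric algorithm, the three-record database is invented, and threads are used only to keep the example short. It is not meant to represent the solution described in the full pattern.

```python
from concurrent.futures import ThreadPoolExecutor


def match_score(probe: bytes, template: bytes) -> float:
    """Toy stand-in for a biometric matcher: fraction of equal bytes."""
    same = sum(a == b for a, b in zip(probe, template))
    return same / max(len(probe), len(template))


def search_partition(probe, partition):
    # Each worker scores only its own slice of the database.
    return max(((match_score(probe, t), pid) for pid, t in partition),
               default=(0.0, None))


def identify(probe, database, workers=4):
    records = list(database.items())
    # Deal the records out round-robin, one partition per worker.
    partitions = [records[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda part: search_partition(probe, part), partitions)
        # Merge step: keep the best candidate found by any worker.
        return max(results, key=lambda r: r[0])


database = {
    "person-1": bytes([1, 2, 3, 4]),
    "person-2": bytes([9, 9, 3, 4]),
    "person-3": bytes([9, 8, 7, 6]),
}
best_score, best_id = identify(bytes([9, 9, 3, 5]), database)
```

Because every record must be scored, the cost grows linearly with the size of the database, while the partitions are independent of each other. In a real system each partition would more likely live in a separate process or on a separate machine.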

Sometimes when you develop your SOAs you will have algorithmic tasks or other computationally heavy tasks such as the one mentioned above, and the question is:

 

How can a service handle computationally heavy tasks in a scalable manner?

 

You can get the full pattern from here

[This is an early draft of one of the Performance, Scalability and availability Patterns from my SOA Patterns book]



[1] You can think of a biometric template as a signature or a hash that represents the biometric sample. The template is smaller than the sample but contains enough information to identify the original.

 


 
Tags: Everything | SOA | SOA Patterns | Software Architecture

December 5, 2006
@ 08:16 AM

I've added a section called SOA Patterns on the site which holds the current draft of the table of contents of the SOA Patterns book I am writing. The section lists the problem each pattern addresses as well as links to published patterns. Also, you can use this to monitor my progress (patterns that have their problem written down already have drafts; the others are in progress or not started).

I am currently working on chapter 4: Security & Manageability patterns (not counting delays mentioned in the previous post).

Also, as I think I've already mentioned, I'll make public at least one pattern per month. If you are interested in a specific pattern (from those which are ready - currently chapters 2 and 3), drop me a note and I'll publish the one that gets the most votes.

 


 
Tags: Everything | SOA | SOA Patterns | Software Architecture

December 1, 2006
@ 09:29 PM

My editors at Manning think that chapter 1 of my SOA patterns book is not good enough.

They basically say that the chapter talks too much about theory compared to the other chapters, which contain much more down-to-earth stuff (e.g. Edge Pattern, Aggregated Reporting Pattern, Decoupled Invocation Pattern). Also, they’ve said that I spend too many pages explaining what architecture is or talking about distributed systems before I get to SOA – which is the topic of the book.

The way I see it, understanding architecture and distributed systems is essential to understanding SOA (from the development side i.e. when you want to design and build services). For example the discussion on quality attributes explains how you can use scenarios to find architectural requirements (and each pattern then has a section on relevant scenarios to help you find if the pattern is applicable to your needs)

I would be very interested in hearing what you have to say (either as comments here or emails to me) about the chapter’s structure and content (considering most of the book will be patterns like the Edge pattern).

Thanks in advance


 
Tags: Everything | SOA | SOA Patterns | Software Architecture

November 15, 2006
@ 09:33 PM

The business rationale behind going down the SOA road is increasing the alignment of the business and IT, so we divide the business into a bunch of business services and everything is just fine. However, the minute we start diving into the SOA implementation details we are swamped by a horde of technologies, cross-cutting concerns (auditing, security, etc.) and whatnot.

For example, in one project I was involved with, we implemented an SOA over a messaging middleware (Tibco's Rendezvous). Just when everything was fine and dandy, along came another project which could potentially use a few of the services. Well, almost: it needed a slightly different contract and it also used a completely different wire protocol - WSE 3.0 (Microsoft's interim solution for the WS-* stack before Windows Communication Foundation). And that's just one simple example - cross-cutting concerns and implementation details are everywhere. The question is then:


How can you handle cross-cutting concerns like multiple technologies, protocols, changing policies etc. while keeping the service focused on its core concern - i.e. the business logic?
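
One way to picture the problem is a service whose business logic is reached through more than one wire format. The Python sketch below is only an illustration under assumptions: the `QuoteService` class, the two edge classes and both message formats are hypothetical, and it does not reproduce the content of the full pattern. It merely shows the separation the question is asking for.

```python
import json


class QuoteService:
    """Business logic only; knows nothing about wire formats (hypothetical)."""

    def quote(self, item: str, quantity: int) -> float:
        unit_price = {"widget": 2.5}.get(item, 1.0)
        return unit_price * quantity


class JsonEdge:
    """Edge for consumers that send JSON documents."""

    def __init__(self, service):
        self._service = service

    def handle(self, raw: str) -> str:
        request = json.loads(raw)
        total = self._service.quote(request["item"], request["quantity"])
        return json.dumps({"total": total})


class DelimitedEdge:
    """Edge for a second consumer with a slightly different contract."""

    def __init__(self, service):
        self._service = service

    def handle(self, raw: str) -> str:
        item, quantity = raw.split("|")
        return f"TOTAL|{self._service.quote(item, int(quantity)):.2f}"


service = QuoteService()
json_reply = JsonEdge(service).handle('{"item": "widget", "quantity": 4}')
text_reply = DelimitedEdge(service).handle("widget|4")
```

Adding a third protocol, or logging and security checks, would mean touching or adding an edge class while `QuoteService` stays as it is.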


You can get the full pattern from here

[This is an early draft of one of the Service Structural Patterns from my SOA Patterns book]


 
Tags: Everything | SOA | SOA Patterns | Software Architecture

The draft for the first chapter of my SOA Patterns book is available on-line from Manning Publications Co.

The first chapter talks about software architecture and the inputs the architect can/should use to design one (emphasizing quality attributes), explains the challenges of distributed systems, and takes a look at SOA from an architectural perspective.

You can download the chapter from here

Any comments are welcome (you can also leave your comments at soa@rgoarchitects.com)


 
Tags: Everything | SOA | SOA Patterns | Software Architecture