As promised, here is the first pattern. If you like this pattern but think something is missing that would make it easier to understand, please drop me an email: arnon at rgoarchitects.com. Naturally, any other comments are also welcome :)
Getting an SOA right is very hard, not so much because of the technical problems (we know how to deal with those, don't we?), but because it is very hard to figure out where to put the service boundaries and keep the right business alignment. Assuming you somehow managed that, the real fun begins - you now have to produce reports, dozens and dozens of reports. Many reports will fall within the boundaries of a single service (if you have a good partition); however, many others will require combining data from several services. For example, in a Telco scenario, you may have Customer, Billing and Provisioning services (a real-life example would have dozens of additional services). Now a customer calls customer care and you want the CRM to show everything about her: what outstanding invoices she has, what equipment and services (GPRS, UMTS, friends and family etc.) she has, what her status as a customer is (loyal, VIP, senior citizen …), her open service requests etc. Things get much more complicated when you need to summarize or group data from multiple services.
How do you get decent reports that cut across business entities when the data is scattered about in all those services?
One possible solution would be to create the report at the consuming end (e.g. the UI): visit all of the services involved, then do all the grouping, cross-cuts etc. there. This solution is not very good from the performance perspective (you need to fetch more data than needed and you have to post-process it). It is also problematic from the flexibility perspective: each service involved has to expose interfaces to get the data for the specific query (otherwise you move even more data).
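To make the trade-off concrete, here is a minimal Python sketch of consumer-side aggregation. The three service functions, their names and their data are hypothetical in-memory stand-ins for remote calls to the Customer, Billing and Provisioning services; they are assumptions for illustration, not a real API.

```python
# Hypothetical stand-ins for remote service calls; in a real system each
# function below would be a network round trip to a separate service.
_CUSTOMERS = {"c1": {"name": "Dana", "status": "VIP"}}
_INVOICES = {
    "c1": [
        {"id": "i1", "amount": 80, "paid": True},
        {"id": "i2", "amount": 45, "paid": False},
        {"id": "i3", "amount": 30, "paid": False},
    ]
}
_SUBSCRIPTIONS = {"c1": ["GPRS", "UMTS"]}


def get_customer(customer_id):
    """Customer service: returns the customer record."""
    return _CUSTOMERS[customer_id]


def get_invoices(customer_id):
    """Billing service: only exposes 'all invoices', so paid ones come too."""
    return _INVOICES.get(customer_id, [])


def get_subscriptions(customer_id):
    """Provisioning service: returns the active services."""
    return _SUBSCRIPTIONS.get(customer_id, [])


def build_customer_view(customer_id):
    """Aggregate at the consuming end: one call per service, then post-process."""
    customer = get_customer(customer_id)
    invoices = get_invoices(customer_id)  # over-fetch: paid invoices included
    outstanding = [inv for inv in invoices if not inv["paid"]]  # filter locally
    return {
        "name": customer["name"],
        "status": customer["status"],
        "outstanding_total": sum(inv["amount"] for inv in outstanding),
        "outstanding_invoices": [inv["id"] for inv in outstanding],
        "services": get_subscriptions(customer_id),
    }


view = build_customer_view("c1")
```

In this sketch the consumer makes three calls and throws away one of the three invoices it fetched. A report that groups across many customers multiplies that cost, unless Billing adds a query-specific operation (say, "outstanding invoices only"), which is exactly the flexibility problem described above.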
Another option is to go straight to the data. You may still need to hit multiple database servers to get to the data, but the performance will be better. The problem is that this throws your service boundaries down the drain and introduces a lot of dependencies.
A third option is to create interim services ("Entity Aggregation"). This works fine as long as you have real business reasons to do the aggregations (there is an overhead in adding business logic to handle the aggregated data) and as long as you only have a few of those (or you might end up with a single "service" holding all the business logic).
Create an Aggregated Reporting Service: build an Operational Data Store (ODS) that enables sophisticated reports on otherwise dispersed data.

The ODS is similar in concept to a data mart, e.g. the data is subject-based, integrated, scrubbed etc. The main differences, however, are that the data is up-to-date and that there is little or no history.
For incoming data, the Aggregated Reporting Edge performs the data transformations from contract data into reporting data. The service updates the ODS by scrubbing the data (this can be limited unless the data has to go on to a data mart / data warehouse), then integrating it and de-normalizing it into subjects. Incoming report requests fill in parameters for the pre-prepared reports.
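The flow just described (transform contract data, scrub, integrate, de-normalize into a subject, then serve parameterized reports) can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it assumes an in-memory SQLite database as the ODS, and the message field names (`customerId`, `fullName`, `amount`), the `customer_summary` table and the report name are all made up for the example.

```python
import sqlite3


def create_ods():
    """In-memory SQLite database standing in for the ODS (an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE customer_summary (
               customer_id        TEXT PRIMARY KEY,
               name               TEXT,
               status             TEXT,
               outstanding_amount REAL NOT NULL DEFAULT 0
           )"""
    )
    return conn


def _ensure_row(conn, customer_id):
    # Integration step: every service's data lands in the same subject row.
    conn.execute(
        "INSERT OR IGNORE INTO customer_summary (customer_id) VALUES (?)",
        (customer_id,),
    )


def on_customer_changed(conn, msg):
    """Edge handler for a (hypothetical) Customer service contract message."""
    _ensure_row(conn, msg["customerId"])
    conn.execute(
        "UPDATE customer_summary SET name = ?, status = ? WHERE customer_id = ?",
        # Light scrubbing: trim whitespace, normalize the status code.
        (msg["fullName"].strip(), msg["status"].strip().upper(), msg["customerId"]),
    )


def on_invoice_issued(conn, msg):
    """Edge handler for a (hypothetical) Billing service contract message."""
    _ensure_row(conn, msg["customerId"])
    conn.execute(
        "UPDATE customer_summary SET outstanding_amount = outstanding_amount + ? "
        "WHERE customer_id = ?",
        (msg["amount"], msg["customerId"]),
    )


# Pre-prepared reports: callers only supply parameters, never SQL.
REPORTS = {
    "outstanding_by_status": (
        "SELECT status, SUM(outstanding_amount) FROM customer_summary "
        "WHERE outstanding_amount >= ? GROUP BY status ORDER BY status"
    ),
}


def run_report(conn, name, params):
    """Run a pre-prepared report with the caller's parameters."""
    return conn.execute(REPORTS[name], params).fetchall()


ods = create_ods()
on_customer_changed(ods, {"customerId": "c1", "fullName": " Dana ", "status": "vip"})
on_customer_changed(ods, {"customerId": "c2", "fullName": "Omar", "status": "loyal"})
on_invoice_issued(ods, {"customerId": "c1", "amount": 120.0})
on_invoice_issued(ods, {"customerId": "c2", "amount": 30.0})
rows = run_report(ods, "outstanding_by_status", (50,))
```

Because the subject row is already de-normalized, the grouped report is a single local query against the ODS, and it only ever reads; the owning services remain the only place where the data changes.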
One problem with Aggregated Reporting is that it is not a Business Service (i.e. it is a technical solution rather than a business-oriented one). However, since the data in Aggregated Reporting is read-only (unlike Entity Aggregation), this doesn't affect the business.
Aggregated Reporting is easier to implement when combined with Inversion of Communication.
Aggregated reporting with Data Mart/Data Warehouse
Instead of just storing recent operational data, this version backs the service with a data mart / data warehouse, which increases the depth and complexity of the queries that can be executed against the service. The downside is the increased complexity of setting up the data mart, both from the operational cost perspective (e.g. additional storage) and from the design and development perspective (you need to think about long-term aspects, indexing etc.), as you also need to scrub the data and consider the structure of your schemas much more carefully.
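As a rough illustration of what the data mart variation buys, the sketch below keeps history in an append-only table and runs a trend query over it. It assumes an in-memory SQLite database as a stand-in for the data mart; the `invoice_fact` table, its columns and the function names are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the data mart (an assumption).
mart = sqlite3.connect(":memory:")
mart.execute(
    "CREATE TABLE invoice_fact (customer_id TEXT, billing_month TEXT, amount REAL)"
)
# Long-term data means indexing has to be designed up front.
mart.execute("CREATE INDEX idx_invoice_month ON invoice_fact (billing_month)")


def record_invoice(conn, customer_id, billing_month, amount):
    """Append-only load: history is kept instead of being overwritten."""
    conn.execute(
        "INSERT INTO invoice_fact VALUES (?, ?, ?)",
        (customer_id, billing_month, amount),
    )


def monthly_totals(conn, first_month, last_month):
    """Trend report over a range of months ('YYYY-MM' strings sort correctly)."""
    return conn.execute(
        "SELECT billing_month, SUM(amount) FROM invoice_fact "
        "WHERE billing_month BETWEEN ? AND ? "
        "GROUP BY billing_month ORDER BY billing_month",
        (first_month, last_month),
    ).fetchall()


record_invoice(mart, "c1", "2024-01", 40.0)
record_invoice(mart, "c2", "2024-01", 60.0)
record_invoice(mart, "c1", "2024-02", 55.0)
record_invoice(mart, "c1", "2024-03", 70.0)
trend = monthly_totals(mart, "2024-01", "2024-02")
```

A store that holds only current state cannot answer this kind of month-over-month question; the price is that the table grows without bound, so storage, indexing and schema design need more care.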
Sidebar: Operational Data Store (ODS)
The ODS is probably the best kept secret of data warehousing technology. It has been around almost as long, but it isn't as famous.
The data in the ODS is operational: live data, not static data. The ODS can be thought of as the cache memory of the data mart / data warehouse.
It is important to note that while it doesn't need the same amount of planning and set-up as a data mart, an ODS still requires careful planning in order to bring real business value.
The figure below shows the classical usage of an ODS in an OLTP/Data Mart environment.

Originally it was thought there would be 4 types of ODS:
Class I - Near real-time synchronization of the ODS with operational data from the OLTP databases. An implementation of Class I is the preferred type for the Aggregated Reporting pattern.
Class II - Update the ODS every four hours or so.
Class III - Overnight updates of the ODS.
Class IV - The ODS is updated from the data mart / data warehouse.
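One way to picture the four classes is as refresh policies. The sketch below is only an illustration under stated assumptions: the four-hour figure comes from the list above, while modelling Class I as "always due", Class III as a 24-hour interval and Class IV as having no timer of its own are simplifications made for the example. All names are hypothetical.

```python
from datetime import datetime, timedelta
from enum import Enum


class OdsClass(Enum):
    CLASS_I = "near real-time synchronization with the OLTP databases"
    CLASS_II = "refresh every four hours or so"
    CLASS_III = "overnight refresh"
    CLASS_IV = "fed from the data mart / data warehouse"


# Assumed intervals: only the four-hour figure appears in the text.
# Class I is modelled as "always due"; Class IV has no timer because
# it is loaded whenever the warehouse publishes data.
REFRESH_INTERVAL = {
    OdsClass.CLASS_I: timedelta(0),
    OdsClass.CLASS_II: timedelta(hours=4),
    OdsClass.CLASS_III: timedelta(days=1),
    OdsClass.CLASS_IV: None,
}


def refresh_due(ods_class, last_refresh, now):
    """Return True when a timer-driven refresh should run."""
    interval = REFRESH_INTERVAL[ods_class]
    if interval is None:
        return False  # Class IV: triggered by the warehouse, not by a timer
    return now - last_refresh >= interval


last = datetime(2024, 1, 1, 8, 0)
now = datetime(2024, 1, 1, 11, 0)  # three hours after the last refresh
due = {c.name: refresh_due(c, last, now) for c in OdsClass}
```

Three hours after the last refresh only the Class I store is due, which is why it suits a customer-care screen that has to reflect what the services know right now.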
In reality there are more variants - for example a powerful (and complex to build) option is to merge a Class IV ODS with one of the other Classes and get.