A few months ago I wrote here about solving the mismatch between Service
Oriented Architecture (SOA) and Business Intelligence (BI) (see the
papers and articles section). Recently I got the following question from Ben:
One major question I have is around large data sets. As an experienced
BI/DW architect and developer I have worked on a number of large scale
data warehouses. Retrieving large data sets (i.e. millions of records)
doesn't seem to fit well into SOA. As you state in your article, we
could have another point-to-point interface, where the service which
houses data we need gets a request and writes out a batch file (xml or
plain ascii text). Then using typical ETL, we grab the file and load
it. The underlying source system (service) can use optimization in
generating a large data set (vs. record by record) and
the data warehouse can correspondingly load in bulk.
Like most architectural questions, the answer is "it depends".
For instance, if you run a run-of-the-mill ETL as a one-time setup, then it is
just that - a one-time setup - and I, personally, don't see any
contradiction between that and SOA goals or tenets.
I do think that it is better to enhance SOA with Event-Driven Architecture
(EDA) interactions to provide a long-term solution to the BI problem. You can
also have a dedicated component that aggregates the information flowing in
through these events and builds batch files suited to the ETL you used during
the setup phase (mentioned above).
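To make the idea of such an aggregating component a bit more concrete, here is a minimal sketch in Python. It assumes events arrive as plain dictionaries and that the ETL tool picks up CSV batches. The class name `EventBatcher`, the `sink` callback and the `order_id`/`amount` fields are all hypothetical, made up for illustration, and a real component would also need durability, error handling and time-based flushing.

```python
import csv
import io


class EventBatcher:
    """Buffers incoming events and emits them as CSV batches for a bulk loader.

    Hypothetical sketch: `sink` is any callable taking (batch_number, csv_text),
    e.g. a function that writes the text to a file the ETL job will pick up.
    """

    def __init__(self, fieldnames, batch_size, sink):
        self.fieldnames = fieldnames
        self.batch_size = batch_size
        self.sink = sink
        self._buffer = []
        self._batch_no = 0

    def on_event(self, event):
        # Called once per event published by a service.
        self._buffer.append(event)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Render the buffered events as one CSV batch and hand it to the sink.
        if not self._buffer:
            return
        out = io.StringIO(newline="")
        writer = csv.DictWriter(
            out,
            fieldnames=self.fieldnames,
            extrasaction="ignore",   # drop event fields the warehouse does not load
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(self._buffer)
        self.sink(self._batch_no, out.getvalue())
        self._batch_no += 1
        self._buffer = []


# Usage: collect batches in memory instead of writing files.
batches = {}
batcher = EventBatcher(["order_id", "amount"], batch_size=2,
                       sink=batches.__setitem__)
for i in range(5):
    batcher.on_event({"order_id": i, "amount": i * 10})
batcher.flush()  # push out the final, partially filled batch
```

With a batch size of 2, the five events end up in three batches, the last one holding a single record. The services only publish events; the batching and file format stay a concern of this one component, so the existing ETL can keep loading in bulk.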
It is true, though, that moving an SOA that is already in place to EDA
is no small feat, but adding EDA layers does not have to mean that the
old interfaces go away - especially not immediately (remember to
treat services as products).
If you have a business that generates millions of records on a daily basis,
then the situation is more complicated. Now you have to think about
the trade-offs between "compromising" SOA by adding a dedicated
interface (or a backdoor to the database) for the ETL vs. the
implications for performance, bandwidth, transition costs, ROI etc. of
pushing that information with EDA.
I personally believe in pragmatism and the "no-silver-bullet" approach,
so I can't say that EDA is always the best solution (as an aside, this is
part of the reason I am writing my book as patterns and not as
"best-practices guidance"). You may find that ETL is
the best trade-off in your situation. Yes, I know that this isn't a
definitive answer - but real life is (usually) a little more
complicated than black-and-white solutions. As architects we need to
find the best trade-off for the situation at hand.