January 5, 2009
@ 08:02 PM
We are going to use some of our test code in production. Yes, you read that right: test code in production. Here are the details.
In our system, among other things, we support visual search in video calls. That is, an end user calls the system, points the camera at something she is interested in, and (hopefully :) ) gets relevant information. Basically, the system is made of several resources (image extraction, identification, etc.) that collaborate via an event broker. We have a blogjecting watchdog that makes sure everything is up and running, and an applicative recovery service to handle failures.
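To make "resources collaborating via an event broker" a bit more concrete, here is a minimal in-process sketch. The `EventBroker` class, the topic names (`frame.received`, `features.extracted`, `item.identified`) and the two toy resources are all hypothetical; a real broker would be distributed and asynchronous, and real image extraction and identification obviously do far more than upper-case a string. The point is only that each resource knows about topics, not about the other resources.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = dict
Handler = Callable[[Event], None]


class EventBroker:
    """Minimal synchronous topic-based publish/subscribe broker (illustration only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Deliver synchronously to every handler of this topic, in subscription order.
        for handler in list(self._subscribers[topic]):
            handler(event)


class ImageExtraction:
    """Hypothetical resource: turns a raw frame into 'features' (here, just upper-cased text)."""

    def __init__(self, broker: EventBroker) -> None:
        self.broker = broker
        broker.subscribe("frame.received", self.on_frame)

    def on_frame(self, event: Event) -> None:
        self.broker.publish(
            "features.extracted",
            {"call_id": event["call_id"], "features": event["frame"].upper()},
        )


class Identification:
    """Hypothetical resource: looks the features up in a toy catalog."""

    CATALOG = {"MUG": "coffee mug"}

    def __init__(self, broker: EventBroker) -> None:
        self.broker = broker
        broker.subscribe("features.extracted", self.on_features)

    def on_features(self, event: Event) -> None:
        info = self.CATALOG.get(event["features"], "unknown item")
        self.broker.publish("item.identified", {"call_id": event["call_id"], "info": info})


# Wire the resources together; neither knows about the other, only about topics.
broker = EventBroker()
ImageExtraction(broker)
Identification(broker)
results = []
broker.subscribe("item.identified", results.append)
broker.publish("frame.received", {"call_id": 1, "frame": "mug"})
print(results)  # [{'call_id': 1, 'info': 'coffee mug'}]
```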
The watchdog makes sure resources/services are up. Resources also report their liveness and wellness, so we know more about them than just the fact that they are up. However, we still need a way to make sure that resource instances can actually collaborate to provide the service.
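A rough sketch of the liveness/wellness side, assuming each resource periodically calls a `report()` method with a free-form wellness string, and the watchdog flags anything that went quiet or reports itself as not "ok". The `Watchdog` class, the timeout policy and the status strings are assumptions for illustration. Note what it cannot tell you: every resource here can look perfectly healthy while the chain between them is broken, which is exactly the gap described above.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Report:
    timestamp: float
    wellness: str  # hypothetical free-form status; "ok" means healthy


class Watchdog:
    """Flags resources that stopped reporting (liveness) or report trouble (wellness)."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the example can use a fake clock
        self._last: Dict[str, Report] = {}

    def report(self, resource: str, wellness: str = "ok") -> None:
        """Called periodically by each resource: 'I'm alive, and this is how I feel'."""
        self._last[resource] = Report(self.clock(), wellness)

    def unhealthy(self) -> List[str]:
        now = self.clock()
        problems = []
        for name, rep in sorted(self._last.items()):
            age = now - rep.timestamp
            if age > self.timeout_s:
                problems.append(f"{name}: no report for {age:.0f}s")
            elif rep.wellness != "ok":
                problems.append(f"{name}: {rep.wellness}")
        return problems


# Drive it with a fake clock instead of waiting in real time.
now = [0.0]
dog = Watchdog(timeout_s=10, clock=lambda: now[0])
dog.report("identification")
now[0] = 8.0
dog.report("image-extraction")
dog.report("event-broker", wellness="degraded")
now[0] = 12.0
print(dog.unhealthy())  # ['event-broker: degraded', 'identification: no report for 12s']
```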

Enter our automated acceptance tests. Part of our development effort included building a test runner for automated test scenarios, e.g. load tests, verifying algorithm correctness, etc. One of these tests is the smoke test (run after each successful build), which includes a sunny-day scenario of a video call as described above. What we're going to do now is build, on top of the test runner and the sunny-day scenario, a "keep-alive" tester that will periodically make test calls to the system (depending on the current load etc.) and make sure that everything is still working correctly.
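The keep-alive tester could look roughly like the loop below. It is a sketch under several assumptions: the sunny-day scenario is exposed as a callable returning `True`/`False`, current load is a number in [0, 1], and the probe interval simply stretches linearly from 60 seconds (idle) to 10 minutes (fully loaded). `keep_alive_loop`, `next_interval` and those numbers are hypothetical, not the actual test runner's API.

```python
import time
from typing import Callable, Optional


def next_interval(load: float, base_s: float = 60.0, max_s: float = 600.0) -> float:
    """Probe often when idle, back off as the system gets busier (assumed linear policy)."""
    load = min(max(load, 0.0), 1.0)  # clamp to the assumed [0, 1] range
    return base_s + (max_s - base_s) * load


def keep_alive_loop(
    scenario: Callable[[], bool],
    current_load: Callable[[], float],
    on_failure: Callable[[str], None],
    iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Repeatedly run the sunny-day scenario as a real test call; runs forever if iterations is None."""
    count = 0
    while iterations is None or count < iterations:
        try:
            ok, reason = scenario(), "scenario reported failure"
        except Exception as exc:  # a crashing scenario is a failure too, not a reason to stop probing
            ok, reason = False, f"scenario raised {exc!r}"
        if not ok:
            on_failure(reason)  # e.g. notify an administrator
        sleep(next_interval(current_load()))
        count += 1


# Example run with a scripted scenario and a fake sleep, so nothing actually waits.
outcomes = iter([True, False, True])
alerts, sleeps = [], []
keep_alive_loop(
    scenario=lambda: next(outcomes),
    current_load=lambda: 0.5,
    on_failure=alerts.append,
    iterations=3,
    sleep=sleeps.append,
)
print(alerts)  # ['scenario reported failure']
print(sleeps)  # [330.0, 330.0, 330.0]
```

Injecting `sleep` and `current_load` keeps the loop testable and lets the scheduling policy change without touching the scenario itself.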


So there you have it: an unexpected benefit of automated acceptance tests. Who would have thunk it :)



 
Tuesday, January 06, 2009 11:26:17 AM (GMT Standard Time, UTC+00:00)
Reminds me of "Routine Audits" described in "Patterns for Fault Tolerant Software" by Robert Hanmer. Is the "keep-alive" tester built into the system, or externally deployed, more or less acting as a scripted user? If external, how and where does it report failures that the service itself might not have detected? Does the "keep-alive" tester initiate any error handling/correcting activities?

PS. Looking forward to your book; I'm hoping it's the next step after having read "Enterprise Integration Patterns".

/Kristofer
Kristofer
Saturday, January 10, 2009 11:08:27 PM (GMT Standard Time, UTC+00:00)
I haven't read Robert's book (added it to my to-read list :) ), so I don't know if it is similar.

As for our tester, it is built as a scripted user; however, it also listens (subscribes) to the events in the system so it can verify that the different resources operate correctly. The first version will just notify an administrator. Later, we can integrate it with the Recovery service.
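One way the event-listening part might work, sketched in Python: the tester remembers which call ids belong to its own test calls, records the topics it sees for those calls, and when a call is done checks that each expected stage showed up in order, notifying an administrator otherwise. The topic names, the `CallEventVerifier` class and the `notify_admin` callback are hypothetical; deciding when to call `finish` (e.g. after a timeout) is left to the caller.

```python
from typing import Callable, Dict, List, Sequence


class CallEventVerifier:
    """Checks that a test call produced every expected event, in order (sketch)."""

    def __init__(self, expected: Sequence[str], notify_admin: Callable[[str], None]) -> None:
        self.expected = list(expected)
        self.notify_admin = notify_admin
        self._seen: Dict[str, List[str]] = {}

    def start(self, call_id: str) -> None:
        """Register one of the tester's own calls before placing it."""
        self._seen[call_id] = []

    def on_event(self, topic: str, call_id: str) -> None:
        """Subscribe this to the broker; events of real (non-test) calls are ignored."""
        if call_id in self._seen:
            self._seen[call_id].append(topic)

    def finish(self, call_id: str) -> bool:
        """Verify the recorded events once the call is done (or its deadline passed)."""
        seen = iter(self._seen.pop(call_id, []))
        for topic in self.expected:
            # `in` on an iterator consumes it, so this checks order as well as presence.
            if topic not in seen:
                self.notify_admin(f"test call {call_id}: no '{topic}' event in expected order")
                return False
        return True


alerts: List[str] = []
verifier = CallEventVerifier(
    expected=["call.started", "features.extracted", "item.identified", "info.sent"],
    notify_admin=alerts.append,
)
verifier.start("t1")
for topic in ["call.started", "features.extracted", "info.sent"]:
    verifier.on_event(topic, "t1")
verifier.on_event("item.identified", "real-call-42")  # not a test call, so ignored
print(verifier.finish("t1"))  # False
print(alerts)  # ["test call t1: no 'item.identified' event in expected order"]
```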

Arnon