IBM, SDOs, WPS and SOA Hell
Sunday, June 10th, 2007I get a lot of emails about my critique of IBM’s Websphere Process Server, mostly along the lines of this email:
I’ve sent this email to you a couple of months back but didn’t get any reply. just thought would try my luck again…
we’ve made some progress with WPS and identified what areas are less risky so we can use at least some of the product features. we’re now building a prototype that uses mostly human tasks and some processes wrapped around them and we expose this all via webservices. there’s some pain around versioning, BPEDB queries, security, etc but at least we’re making a progress and early indicators seem to be on a positive side. we have had a couple of IBMers on site for a week and they helped us somewhat to rejig the design and focus on the features that seem to work and would add most value
still would be extremely interested to find out if you have done more with the product since you blogged about it
Well, obviously for competitive and disclosure reasons I can’t spill the beans on where my current employer is, what they are doing or why they are doing it. With those hand-cuffs in full public display, I will effort a more general entry about the state of affairs of this odd stack IBM has built.
In order to understand IBM’s approach to Websphere Process Server, you have to understand SCA/SDO, the unholy combination that seems to be driving the SOA mythology as of late.
About SDOs
There are currently three different implementations of the SDO spec, including Apache’s Tuscany (the one I’m cheering for), IBM’s from their EMF project for Eclipse and the interesting EclipseLink from Oracle and Interface21 (when those two get in a room together you should listen). EclipseLink is more of a consequence of combining SDOs and JPA, however.
I’m extremely excited about SDOs in general, to the point that sometimes I sit on my deck and dream about them. I even write presentations about their architectural implications. SDOs are awesome things.
They are, in essence, disconnected DataGraphs of DataObjects from some source. This source could be relational databases, entity EJB components, JCA, XML pages, Web services, or a combination of them all.
They can be acted upon and changed, updated and deleted, transformed, even serialized and sent to other objects and then connected back to the original data source with a ChangeSummary() to boot. For a more detailed understanding, check out IBM’s Introduction To Service Data Objects.
This is what they do in a nutshell:
1) You have data in any form from anything: Databases, XML, text, anything (or a combination of them).
2) You feed that data in to a Service Data Object.
3) Things can get pretty loose in there, so you can impose an XSD to define the types or just create the model on your own as you feed data in.
4) You disconnect the SDO from it’s source.
SIDEBAR: Last year I wrote a design pattern for SDOs in Eclipse’s UML2 framework that you can import in to Rational Software Architect.
Now you can do anything with this thing. Change the data inside, add more things to it, remove things from it, serialize it and send it across the wire, query the data inside of it, introspect it, combine it with JPA and tell it to save itself away from it’s source, anything.
Imagine the implications.
Design Strategy: Using SDOs with POJOs
No longer do objects have to have a strong contract, or any contract at all besides taking a type of DataObject. Your Business Objects can introspect the object itself to see if there is data inside of the DataObject it requires to do a task, do work on your behalf, update the DataObject’s DataGraph with new information (or replace existing information), and pass that SDO right back. The calling object can ask for a ChangeSummary() to see what occured to the data in that object, act on that data, and send it right along again to another object. When you are finished passing it around, it can immediately turn back in to the XML or database row it was loaded from.
Your Business Objects can change, the data inside your DataObject can change, or be a totally different DataObject all together. As long as the Business Object can query to see if the data it cares about is in there nothing about the contracts between your objects need change. Further, because of the API that comes with the SDO spec, your object doesn’t really need to know anything about the DataObject it’s feeding data into or getting data from. It can work with the part of the DataGraph it needs in a hands-off way and be done with it.
Think of it from a automobile insurance Use Case:
Say that you had a document based webservice that took an automotive insurance policy. This automotive policy would have an XSD that ensured it contained the data necessary for a complete policy. That XML document, once received, would be loaded in to an SDO with the XSD which would “strongly type” the data inside the DataObject that was generated.
Now, imagine this policy was sent to the webservice for a claim to be issued. You could pass this new SDO in to a “claims” system to be processed. The contract for the Claims system is just an SDO (type DataObject). The Claims system would do a simple XPath query, or do a get() on some data that it expected any DataObject that would come in to the system to have (remember, the data inside does have querable types because it was loaded with an XSD). Technically, it could even introspect the SDO DataGraph to find the data it needed. Whatever.
It would probably want to get Customer Information, Units Insured, ect. It would then do it’s own thing looking up the coverages, the limits and going about process to start a claim.
After processing, it could then append it’s update to the SDO by inserting data back in to the DataObject. The DataObject would then be sent back to the webservice, which would change the SDO back in to the XML document (again, the XSD ensures compliance) and sent back to the ESB or some other thing, but now with information on the claim that was created for it.
Now, remember that the Claims system doesn’t really know or care that the SDO is a policy. In fact, the SDO really has no type at all. You could have policy information, a recipe for Apple Pie and an iTunes playlist in the SDO. As long as the Claims system can get the right data from the SDO, it’ll work.
You could just as easily send the Claims system an SDO that only has the data necessary for issuing a claim. You could also change the Claim system so that it requires more information from the SDO, or writes additional information to the SDO (say requirements change).
None of this need impact the contract or the SDO that gets sent to the system.
So essentially, this is what an SDOs used with POJOs alone gives us:
- Objects are loosely coupled
- Object’s data can be discovered at runtime through query
- Object’s contracts are data based, no types or tightly binding interfaces
- Object is protected from contract and interface changes in system
- Objects can be placed anywhere - maximum reuse
- Object can modify it’s types and data structure during runtime
- Object has no restriction on the data it can share
- Object is decoupled from the format and source the data came from
- Object automatically can roll back to the previous data state or act on changes
This is just the impact SDOs have on sharing data between objects. SDOs have much more up their sleeves, including transformations and mapping.
It’s easy to see how this would fit in nicely to the theory of an SOA architecture. If you could just push around self-contained datagraphs of dataobjects, disconnected for their source, and transform them, map them together or change them in to other types of data, a lot of the complexity of dealing with data in an SOA would be solved… or at least abstracted.
SDOs and Websphere Process Server
It’s helpful to keep in mind that all workflow engines (WPS, JBoss jBPM, BlueSpring’s BPM) is a movement of the business logic and process flow from inside the code to outside, and that can only mean better flexibility for organizations going forward, even if current architecture design patterns regarding Business Objects have to change and give up their power or expose it more freely for orchestration (Fowler be damned).
The most intriguing thing about WPS, or any modern workflow engine, is the idea of visually managing the flow and business logic of services in an SOA. WPS promises to separate this orchestration from any code (save the code that it generates) and mix in human tasks to boot.
Workflow tools can be great when you have an Enterpise Service Bus or other solution where you have data flows coming at you from different technologies and different protocols that you want to feed in to a workflow. When you pull in a webservice or other Type that WPS supports in to it’s grahical tool, Websphere Integration Developer, it essentially reverses these data streams or objects in to strong types that are represented visually so that you have common base on which you can orchestrate, transform, and map the data between the various other imported flows. It’s easy to see how it leverages SDOs to do this, as it’s essentially what SDOs are best at. Mapping between these things and orchestrating them are easier if you have an abstract representation of the service or object.
The idea of building an entire enterprise from the ground up with this product is intriguing. However, I doubt that any project already in motion could manage to migrate from business logic in code to business logic in a product without significant rework, and the consultants specializing in WPS (and they are starting to emerge) seem to only have experience starting from scratch and in projects with human interaction. Regardless, this orchestration is most likely where WPS has the most benefit. It’s idea of transformation and consuming of the documents in the mediation itself is where things break down.
In Websphere Process Server, IBM essentially decided to use SDOs to parse and transform any data that comes in to their system. They coupled this with their previous Websphere Business Integrator product (WBI) for business orchestration of the processes that use these SDOs, stapled on an ESB implentation, stuck a feather in it’s cap and called it Maccaroni.
A product being a webservice de-seriailizer, data mapper and transformer, workflow tool and process manager is what ultimately makes Websphere Process Server the strage beast it is, and is also a major source of it’s problems.
In the work I’ve have done with WPS from a purely transformation and webservice flow standpoint, I’ve have found it to be slow but it works. There has been significant problems with the limited amount of flexibility WPS has in processing XML, managing messages and executing processes. IBM pitches that with WPS/WID business analyst or architects would be the ones that would do the mapping and process orchestration using visual tools. Often, however, we have been forced to fall down past the level of abstraction that WPS provides down to the Java code itself, which that is a sign of, if not failure, then immaturity.
Also, Websphere Process Server’s IDE where this mapping takes place visually, Websphere Integration Developer, is (since it is built on top of the Rational/Eclipse platform) hard for non-programmers to grasp; the idea of having to switch perspectives in Eclipse is foreign and unhelpful.
Ye Ol’ Impedance Mismatch
Unfortunately, even when you do go down to the Java level, things aren’t easy.
There are many times that what a WSDL specifies (compliant XML mind you) as a certain “type” and what “type” WPS renders for it are totally incompatible. For XSD:ANYTYPE, it dies flat out. However, so does any WSDL2Java tool. To combat this, code that was using the Collections framework has to be migrated to fixed length arrays so that WPS can interpret the output, as it sees any List as “AnyType”
Certainly, this type mis-match is experienced whenever one jumps from objects to documents in representing data (JSON anyone?), but that just points to the enormity of the problem WPS is trying to solve, and how trying to reverse XSDs without developer intervention is painful and filled with problems.
My Update
So, my update is that it still is not delivering on it’s promises, yet. However, I have heard from IBM that version 7, based off the Rational 7 (Eclipse 3.2) platform should be out in August / September.
I hope the reply was worth the wait.
