Tuesday, September 16, 2008

Enhancing ECM-SOA with Enterprise Content Bus

Other capabilities we are looking for in our SOA approach are an Event Driven Architecture (EDA) focused on Document Oriented Messages (actually Content Oriented Messages) and the ability to choreograph many content-oriented services. This is the job of a message broker, or more specifically an Enterprise Service Bus (ESB).

We wish to build configurations oriented to choreographing processes for:
  • Acquisition - acquire content from many sources and many modes (push, pull, scheduled)
  • Processing - transform, aggregate, index, repurpose, replicate content
  • Delivery - syndicate or publish in multiple formats over multiple channels
  • Integration - integrate with other information sources to enrich content
We believe that to integrate fully with the rest of the enterprise, we will need the flexibility that an ESB brings to an SOA. It may be helpful to begin by explaining what an ESB is (at least from my perspective). To truly leverage a multitude of services distributed throughout the enterprise, we need to choreograph multiple services together in a process flow. These services may need to be accessed differently, using REST, SOAP, JMS or FTP. An ESB centralizes integration and choreography of services and provides many capabilities to talk over multiple channels and to route and process messages in many different ways, mostly through configuration rather than code. An ESB can also be considered a container for integration-oriented services such as routing and transformation. It brings shared capabilities like security, transformation, routing, transactionality and high availability, so these cross-cutting concerns can be applied uniformly and do not have to be custom developed. An ESB is a specialist in integration and choreography. It allows integration logic to move out of each client, service or service container and reside in a Service Oriented Integration container.

An example may help. A process flow in this case can be a pathway for content to travel from acquisition, to management, to processing, to delivery. Each of these stages in the content's life-cycle might require integrating with many distributed services. Suppose I am automating the process of acquiring Word documents to index and post on an intranet.
  • For acquisition, I may want to poll an external FTP site to see if new content has arrived
  • For processing, I may want to enrich this content with computed metadata and store it in the content repository for retention
  • I may also want to transform this content into a different representation (e.g. Word to PDF)
  • For delivery, I may want to deliver this PDF to a website
  • Further, I want to update an index page to link to this document
One approach is to build this as a customization of a content manager. Most content managers offer a way to FTP content to their content repository. But to 'poll' an FTP site would require custom coding. Once the content is in the repository, a workflow or rules can be triggered that integrate automated actions and human tasks. The first step would be to create a PDF copy. Once a PDF copy is created, a job can be scheduled to build a web page index of the PDFs, and then publish the index and the latest PDF files to the site.
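To give a feel for the custom polling code mentioned above, here is a minimal sketch in Python. It assumes the only requirement is to report files that have not been seen before; the `Poller` class and the in-memory `listing` are hypothetical stand-ins, and in practice the lister could wrap something like `ftplib.FTP.nlst()`.

```python
from typing import Callable, Iterable, List, Set


class Poller:
    """Reports only the files that appeared since the previous poll."""

    def __init__(self, list_files: Callable[[], Iterable[str]]):
        # list_files is any callable returning the current remote listing.
        self._list_files = list_files
        self._seen: Set[str] = set()

    def poll(self) -> List[str]:
        # Compare the current listing with what was already seen.
        new = sorted(set(self._list_files()) - self._seen)
        self._seen.update(new)
        return new


# Hypothetical in-memory listing standing in for the external FTP site.
listing = ["a.doc"]
poller = Poller(lambda: listing)

first = poller.poll()    # a.doc is new
listing.append("b.doc")
second = poller.poll()   # only b.doc is new
third = poller.poll()    # nothing new
```

Even this small amount of state (which files have been seen) has to be written, persisted and maintained when polling is built as a customization.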

Another approach would be to leverage an ESB, which may improve the agility of developing these processes. For instance, ServiceMix has pre-developed and configurable ways to 'poll' and 'send' messages via an FTP channel. Once a file is found on the FTP site via polling, a route can be configured in ServiceMix using various Enterprise Integration Pattern implementations. In this case, a 'pipeline' can be configured to choreograph services:
  1. Store acquired Word doc via content service
  2. Transform Word to PDF via transformation service
  3. Store PDF via content service
  4. Generate index by invoking a template service
Finally, the PDF and the index page can be stored to the site's file system via the file sender (another configuration).
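The four pipeline steps can be sketched in a few lines of Python. This is only an illustration of the 'pipeline' pattern, not ServiceMix configuration: the `Message` class, the in-memory `REPOSITORY` and the three service functions are hypothetical stand-ins for remote services that would really be reached over REST, SOAP or JMS.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    """A content-oriented message: a named payload plus headers."""
    name: str
    body: bytes
    headers: Dict[str, str] = field(default_factory=dict)


Step = Callable[[Message], Message]

# Hypothetical in-memory stand-in for the content repository.
REPOSITORY: Dict[str, bytes] = {}


def store_content(msg: Message) -> Message:
    # Content service: keep a copy of whatever passes through.
    REPOSITORY[msg.name] = msg.body
    return msg


def word_to_pdf(msg: Message) -> Message:
    # Transformation service: placeholder conversion, no real rendering.
    pdf_name = msg.name.rsplit(".", 1)[0] + ".pdf"
    return Message(pdf_name, b"%PDF-stub:" + msg.body, dict(msg.headers))


def generate_index(msg: Message) -> Message:
    # Template service: list every stored PDF as a link.
    links = "\n".join(
        f'<a href="{n}">{n}</a>' for n in sorted(REPOSITORY) if n.endswith(".pdf")
    )
    msg.headers["index.html"] = links
    return msg


def pipeline(steps: List[Step]) -> Step:
    """Chain the steps so each one receives the previous step's output."""
    def run(msg: Message) -> Message:
        for step in steps:
            msg = step(msg)
        return msg
    return run


route = pipeline([store_content, word_to_pdf, store_content, generate_index])
result = route(Message("report.doc", b"hello"))
```

The point of the sketch is that the route is just an ordered list: reordering or adding a step changes the list, not the services.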

There are further benefits to leveraging an ESB. Each step in this process runs as a stage in a SEDA (Staged Event-Driven Architecture) design using durable queues. That means that if any of the choreographed services breaks, takes longer than expected or otherwise behaves unexpectedly, its messages wait in a queue and the rest of the process continues unabated. The ESB route can be configured to handle error cases such as a service returning an error result. The processing routes can be transactional, and roll back all changes if one step fails. The ESB can be clustered to support high availability, and can be configured to route based on the type of processing. Further, the choreographed services can be accessed over numerous channels: REST (HTTP), FTP, file, JMS, Jabber, SOAP (HTTP). Modifying the process in many cases requires reconfiguring, not recoding. And the process configurations are not scattered across many services or clients, but centralized on the ESB.
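The staged, queue-based idea can be illustrated with a small Python sketch. It makes several simplifying assumptions: the queues are in-memory `queue.Queue` objects (a real ESB would back them with a durable broker), the stages are drained one after another rather than concurrently, and the `Stage` class and handler functions are hypothetical.

```python
import queue
from typing import Callable, List


class Stage:
    """One processing stage with its own inbox queue."""

    def __init__(self, name: str, handler: Callable[[str], str]):
        self.name = name
        self.handler = handler
        self.inbox: "queue.Queue[str]" = queue.Queue()


def run_stages(stages: List[Stage], errors: "queue.Queue") -> List[str]:
    """Drain each stage in order, forwarding results to the next inbox."""
    delivered: List[str] = []
    for i, stage in enumerate(stages):
        while not stage.inbox.empty():
            msg = stage.inbox.get()
            try:
                out = stage.handler(msg)
            except Exception as exc:
                # A failing message goes to the error queue; others continue.
                errors.put((stage.name, msg, str(exc)))
                continue
            if i + 1 < len(stages):
                stages[i + 1].inbox.put(out)
            else:
                delivered.append(out)
    return delivered


def to_pdf(name: str) -> str:
    if name.startswith("corrupt"):
        raise ValueError("cannot convert")
    return name.rsplit(".", 1)[0] + ".pdf"


def publish(name: str) -> str:
    return "/site/" + name


stages = [Stage("transform", to_pdf), Stage("publish", publish)]
errors: "queue.Queue" = queue.Queue()
for doc in ["a.doc", "corrupt.doc", "b.doc"]:
    stages[0].inbox.put(doc)

delivered = run_stages(stages, errors)
```

Here the corrupt document ends up on the error queue while the other two documents are still published, which is the behavior the queues are meant to provide.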

While there are many ways to choreograph services, an ESB approach may improve agility by leveraging a set of configurable components specialized in integration, orchestration and choreography and that can speak many languages to different distributed systems. The purpose of the ESB is not to take over services from the content manager or other systems but to leverage them. Moving choreography of services to a specialist like an ESB removes the need to create a lot of custom scripting in a content manager which may not be as good at these tasks.

But existing ESB implementations are focused on choreographing messages, not content. Existing ESBs don't have configurable components around processing of content, and may not do well passing around large content. An ESB needs to be customized to manage content and provide configurable components to enhance content processing. Thus, we are constructing an ECB (Enterprise Content Bus) that builds these content centric capabilities on top of an Enterprise Service Bus.
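One common way to avoid pushing large content through a message bus is the Claim Check pattern: the body is checked in to a store and only a small ticket travels in the message. The Python sketch below illustrates that general pattern; it is an assumption for illustration, not a description of how our ECB is built, and `ContentStore` and `ContentMessage` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


class ContentStore:
    """In-memory stand-in for a content repository."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def check_in(self, body: bytes) -> str:
        # Use a content hash as the ticket that identifies the stored body.
        ticket = hashlib.sha256(body).hexdigest()
        self._blobs[ticket] = body
        return ticket

    def check_out(self, ticket: str) -> bytes:
        return self._blobs[ticket]


@dataclass(frozen=True)
class ContentMessage:
    """The lightweight message that travels on the bus."""
    name: str
    ticket: str
    size: int


store = ContentStore()
body = b"x" * 5_000_000  # a large document body

# Only the name, ticket and size are routed; the body stays in the store.
msg = ContentMessage("big.doc", store.check_in(body), len(body))

# A downstream service redeems the ticket when it needs the content.
restored = store.check_out(msg.ticket)
```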

Choreography using enterprise integration patterns provides a lot of flexibility in combining many services. Although not mentioned in the example, Business Process Management (BPM) allows these services to be orchestrated according to configurable business processes, and is a great addition to the ESB. Simple processes from acquisition to management to deployment can be implemented via pipelines, wiretaps and content switches. But processing content often requires a business process that integrates invocation of services, integration of systems and human tasks, and provides visibility into the content processing pipeline. (See following posts on our approach to integrating BPM with our ECB.)
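For readers unfamiliar with wiretaps and content switches, here is a minimal Python sketch of the two patterns. The function names and the `type` field used for routing are assumptions made for illustration; an ESB would express the same thing as configuration.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
Handler = Callable[[Message], None]


def wiretap(tap: Handler, target: Handler) -> Handler:
    """Send a copy of every message to `tap`, then pass it on to `target`."""
    def handle(msg: Message) -> None:
        tap(dict(msg))  # copy, so the tap cannot alter the main flow
        target(msg)
    return handle


def content_switch(key: str, routes: Dict[str, Handler], default: Handler) -> Handler:
    """Route each message by the value of one of its fields."""
    def handle(msg: Message) -> None:
        routes.get(msg.get(key, ""), default)(msg)
    return handle


# Lists standing in for downstream services.
audit_log: List[Message] = []
converted: List[Message] = []
published: List[Message] = []
unhandled: List[Message] = []

route = wiretap(
    audit_log.append,
    content_switch(
        "type",
        {"doc": converted.append, "pdf": published.append},
        unhandled.append,
    ),
)

for m in [
    {"name": "a.doc", "type": "doc"},
    {"name": "b.pdf", "type": "pdf"},
    {"name": "c.txt", "type": "txt"},
]:
    route(m)
```

Every message is copied to the audit log, while the switch sends Word documents, PDFs and everything else to different destinations.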

The details to follow in the next post...
