Monday, September 22, 2008

Enterprise 2.0

I recently attended a workshop sponsored by the Chicago chapter of AIIM about Enterprise 2.0. The audience was a typical spread of IT professionals interested in document management, image scanning and management, and records management, and of course consultants looking to make contacts. My team of four were first-time attendees, interested in getting ideas relating to our web content management applications.

Two presentations were given: Jim Vaselopulos (VP and Partner of PSC Group) focused on the human perspective in collaboration, a key aspect of Enterprise 2.0, and Andrew MacMillan (VP of Product Mgmt at Oracle ECM, formerly Stellent) focused on its technology aspects. Both presentations were compelling: Jim's brought home the point that Enterprise 2.0 technologies are focused on enabling collaboration among content stakeholders, while Andrew's traced the evolving nature of technology through Web 1.0, Web 1.5 and Web 2.0-like technologies. After the presentations, they were joined by others in a panel discussion.

The first question on the tips of participants' tongues was 'What is Enterprise 2.0?' This was answered variously as Web 2.0 technologies applied to the enterprise, and as the enablement of more ad-hoc collaboration to manage content (the 'adhocracy,' as Jim put it). Web 2.0 brings to the table technologies like blogs, wikis, messaging and tagging ('tagonomy' and 'folksonomy' were entered into the lexicon). Other Web 2.0 capabilities like 'mashups' were also mentioned. These Web 2.0 technologies are user directed and evolve in an ad-hoc manner. This differs from older technology enablers such as workflows and taxonomies, which are defined by business analysts and direct collaboration from the top down.

My take... So what is E20? E20 is Web 2.0 for the enterprise. This soundbite will help explain it to curious management. To technologists, it means bringing Web 2.0 technologies like wikis, blogs and mashups, which evolved on the Internet, to web-based business applications behind the firewall. For ECM, E20 is the addition of purely web-based services and collaboration tools to enable user-directed, ad-hoc collaboration and use.

The next obvious question of the group was 'What does E20 mean to me?' The group varied in sophistication and technical background, but a rift became obvious. This rift was pointed out very succinctly by Jim Vaselopulos in his talk. As he defined it, there are 'digital immigrants' and 'digital natives.' Jim looked at the generation gap between those who were born into digital technologies and those who were introduced to them much later. Their sensibilities are different. The digital native's acceptance of (and demand for) truly self-controlled digital collaboration far exceeds the digital immigrant's desire and tolerance for the same. From a generational perspective, baby boomers look at technology as an addition to traditional communication and formality, necessarily organized in hierarchies. To millennials (born 1980 to the present), technology enables 'flat,' immediate collaboration and informality.

So, from the above distinctions, users may adapt to E20 differently. A pure 'adhocracy,' where users have complete freedom to structure information and collaboration, may be perceived as a threat to formal policy and hierarchical organization. A wiki, for instance, is controlled by no one. Everyone is a contributor and a consumer. There is no formal structure and no formal control. It is subject to the whims of its users. To digital natives (typified by the 20-somethings) this isn't a problem; it is essential. To digital immigrants (the gray hairs), it is an out-of-control mess. How can you manage important business information this way and expect the information and its structure to just 'evolve'? Playing off of the generation gap isn't entirely fair or accurate, but it does capture the problem.

The room tended toward the 'gray hairs.' Most in the room came from IT shops where information management policies were more formal and information was organized into well-structured taxonomies. Introducing a wiki to manage important business information, for instance, was viewed with suspicion. 'How do we control it? How do we organize it?' they would ask. Our contingent tended toward the other perspective. To a digital native, this is the very point of a wiki: no one controls or manages it, and everyone controls and manages it. To the digital immigrant, that can only lead to a mess, and its ad-hoc nature would limit usefulness and frustrate collaboration. To the digital native, it is control and formal organization that limit usefulness and frustrate collaboration. Clearly, different users have different perspectives.

In my company, we are wiki happy. Everything goes on the wiki. There are no controls and no formal organization; the organization evolves. The integrity of the information is the responsibility of all, and occasionally some refactoring takes place. There is an element of security: groups of wiki pages are organized into departments, users within a department are free to contribute, and others may be restricted to viewing. Personal pages remain in the control of the author, but anyone can read them. Inappropriate content would cause some to complain to our tools team, although this has rarely happened. Organizational structure has evolved, but it is still hard to find what you are looking for. Index pages are created occasionally to organize information into loose taxonomies, though not as a formal exercise--someone gets tired of hunting for things and creates a page that organizes them.

For our company, this has worked for most people. But we have also recognized that others are not that comfortable, and they have tasked our internal tools team with reconstituting a more formal intranet. This intranet will be controlled and organized formally; information will be authored by our tech writers at the direction of the management hierarchy. The very thought of this has caused a furor among some in our workforce. Some were vehemently opposed to this 'throwback' to the old, while others, baffled and dismayed by the wild west of the wiki, were relieved and fought vigorously for the formal approach. It was an interesting example of the digital generation divide pointed out so deftly by Jim Vaselopulos, and a reminder that there are more viewpoints to serve. Perhaps the lesson is that Web 2.0 is not for everyone. Perhaps E20 adds elements of control and formality that businesses need. Perhaps sometimes the wild west needs a sheriff.

Tuesday, September 16, 2008

Enhancing ECM-SOA with an Enterprise Content Bus

Other capabilities we are looking for in our SOA approach are the ability to leverage an Event Driven Architecture (EDA) focused on Document Oriented Messages (really Content Oriented Messages) and the ability to choreograph many content-oriented services. This is the job of a message broker, or more specifically an Enterprise Service Bus (ESB).

We wish to build configurations oriented to choreographing processes for:
  • Acquisition - acquire content from many sources and many modes (push, pull, scheduled)
  • Processing - transform, aggregate, index, repurpose, replicate content
  • Delivery - syndicate or publish in multiple formats over multiple channels
  • Integration - integrate with other information sources to enrich content
We believe that to fully integrate with the rest of the enterprise, we will need the flexibility that an ESB brings to a SOA. It may be helpful to begin by explaining what an ESB is (at least from my perspective). To truly leverage a multitude of services distributed throughout the enterprise, we need to choreograph multiple services together in a process flow. These services may need to be accessed differently, using REST, SOAP, JMS or FTP. An ESB centralizes the integration and choreography of services and provides many capabilities to talk over multiple channels and to route and process messages in many different ways, mostly through configuration rather than code. An ESB can also be considered a container for integration-oriented services such as routing and transformation. It brings shared capabilities like security, transformation, routing, transactionality and high availability, so these cross-cutting concerns can be applied uniformly rather than custom developed. An ESB is a specialist in integration and choreography: it allows integration logic to move out of each client, service or service container and reside in a Service Oriented Integration container.
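
To make "configuration rather than code" concrete, here is a toy sketch of a content-based router (one of the Enterprise Integration Patterns an ESB provides). It is plain Python for brevity, not a real ESB, and the endpoint names and rules are invented for illustration: the point is that the routing logic is a data table the bus interprets, not code scattered across clients.

```python
# Toy sketch of a content-based router: routing rules are configuration,
# not code. Endpoint names like "dam:store" are invented for this example.

ROUTING_RULES = [
    # (predicate on the message, destination endpoint)
    (lambda m: m.get("type") == "image", "dam:store"),
    (lambda m: m.get("type") == "document", "docrepo:store"),
]
DEFAULT_DESTINATION = "audit:quarantine"

def route(message):
    """Return the destination endpoint for a content-oriented message."""
    for predicate, destination in ROUTING_RULES:
        if predicate(message):
            return destination
    return DEFAULT_DESTINATION
```

Adding a new content type or destination means editing the rules table, which is the agility argument for centralizing integration logic on the bus.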

An example may help. A process flow in this case is a pathway for content to travel from acquisition, through management and processing, to delivery. Each of these stages in the content's life-cycle might require integrating with many distributed services. Suppose I am automating the process of acquiring Word documents to index and post on an intranet.
  • For acquisition, I may want to poll an external FTP drive to see if new content has arrived
  • For processing, I may want to enrich this content with computed metadata and store it in the content repository for retention
  • I may also want to transform this content into a different representation (i.e. Word to PDF)
  • For delivery, I may want to deliver this PDF to a website
  • Further, I want to update an index page for this document
One approach is to build this as a customization of a content manager. Most content managers offer a way to ftp content to their content repository. But to 'poll' an ftp site would require custom coding. Once the content is in the repository, a workflow or rules can be triggered that integrates automated actions and human tasks. The first step would be to create a PDF copy. Once a PDF copy is created, a job can be scheduled to index the PDFs as a web page index, and then publish the index and latest PDF files to the site.

Another approach would be to leverage an ESB, which may improve the agility of developing these processes. For instance, ServiceMix has pre-developed, configurable ways to 'poll' and 'send' messages via an FTP channel. Once a file is found via polling, a route can be configured using ServiceMix's various Enterprise Integration Pattern implementations. In this case, a 'pipeline' can be configured to choreograph services:
  1. Store the acquired Word doc via the content service
  2. Transform Word to PDF via the transformation service
  3. Store the PDF via the content service
  4. Generate an index by invoking a template service
Finally, the PDF and the index page can be stored to the site's file system via the file sender (another configuration).
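
The pipeline idea above can be sketched in a few lines of Python (this is an illustration, not ServiceMix; each function is an invented stub standing in for one of the choreographed services in steps 1-4):

```python
# Each function stands in for a choreographed service; the pipeline itself
# is just an ordered configuration mirroring steps 1-4 above.

def store_word_doc(msg):
    msg.setdefault("stored", []).append(msg["name"])   # step 1: content service
    return msg

def word_to_pdf(msg):
    msg["pdf"] = msg["name"].rsplit(".", 1)[0] + ".pdf"  # step 2: transformation
    return msg

def store_pdf(msg):
    msg["stored"].append(msg["pdf"])                   # step 3: content service
    return msg

def generate_index(msg):
    msg["index"] = "<ul><li>%s</li></ul>" % msg["pdf"]  # step 4: template service
    return msg

PIPELINE = [store_word_doc, word_to_pdf, store_pdf, generate_index]

def run_pipeline(msg):
    """Pass a content-oriented message through each configured service."""
    for service in PIPELINE:
        msg = service(msg)
    return msg
```

Reordering or extending the process is a change to the PIPELINE list, not to any of the services, which is the agility claim made above.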

Further benefits come from leveraging an ESB. Each step in this process is executed via a SEDA architecture using durable queues. That means that if any of the choreographed services breaks, takes longer than expected or otherwise behaves unexpectedly, messages are not lost and the rest of the process continues unabated. The ESB route can be configured to handle error cases, such as a service returning an error result. The processing routes can be transactional, rolling back all changes if one step fails. The ESB can be clustered to support high availability, and can be configured to route based on the type of processing. Further, the choreographed services can be accessed over numerous channels: REST (HTTP), FTP, file, JMS, Jabber, SOAP (HTTP). Modifying the process in many cases requires reconfiguring, not recoding. And the process configurations are not scattered across many services or clients, but centralized on the ESB.
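
A toy sketch of the SEDA idea (again plain Python, not a real ESB, and the stage and queue names are invented): each stage pulls from its own queue, and a failing message is shunted to a dead-letter queue instead of halting the whole flow.

```python
from queue import Queue

# Toy SEDA sketch: each stage reads from its own queue; a failing message
# goes to a dead-letter queue (the 'error route') and the flow continues.

def make_stage(work, out_q, dead_q):
    in_q = Queue()
    def drain():
        while not in_q.empty():
            msg = in_q.get()
            try:
                result = work(msg)
            except Exception as exc:
                dead_q.put((msg, str(exc)))   # error route, flow continues
            else:
                out_q.put(result)
    return in_q, drain

dead_letters = Queue()
done = Queue()

def to_pdf(msg):
    if not msg.endswith(".doc"):
        raise ValueError("unsupported format")
    return msg.replace(".doc", ".pdf")

stage_in, drain_stage = make_stage(to_pdf, done, dead_letters)
for name in ["a.doc", "b.txt", "c.doc"]:
    stage_in.put(name)
drain_stage()
```

In a real ESB the queues would be durable and the stages would run concurrently, but the decoupling shown here is the essential property: one bad message does not stop the pipeline.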

While there are many ways to choreograph services, an ESB approach may improve agility by leveraging a set of configurable components specialized in integration, orchestration and choreography and that can speak many languages to different distributed systems. The purpose of the ESB is not to take over services from the content manager or other systems but to leverage them. Moving choreography of services to a specialist like an ESB removes the need to create a lot of custom scripting in a content manager which may not be as good at these tasks.

But existing ESB implementations are focused on choreographing messages, not content. Existing ESBs don't have configurable components around processing of content, and may not do well passing around large content. An ESB needs to be customized to manage content and provide configurable components to enhance content processing. Thus, we are constructing an ECB (Enterprise Content Bus) that builds these content centric capabilities on top of an Enterprise Service Bus.

Although not shown in the example, choreography using enterprise integration patterns provides a lot of flexibility in combining many services, but the addition of Business Process Management allows these services to be orchestrated according to configurable business processes, and is a great addition to the ESB. Simple processes from acquisition to management to deployment can be implemented via pipelines, wiretaps and content switches. But processing content often requires a business process that integrates the invocation of services, the integration of systems and human tasks, and provides visibility into the content processing pipeline. (See following posts on our approach to integrating BPM into our ECB.)


The details to follow in the next post...

ECM SOA standards with CMIS

Selecting the 'content services' layer -

One of the chief complaints has been the lack (or multitude) of multi-vendor standards for ECM that can fit into a SOA architecture. Recently, EMC, IBM and Microsoft announced a new standard, CMIS, that intends to address this. The standard will be submitted to OASIS for further work. In the meantime, several vendors such as Alfresco, Oracle and others are developing implementations of it.

In my previous post, I described the components I thought were needed to support a fully functional SOA approach to ECM. Included was an item 'content services'. I believe that this standard fits well into this category. Other 'standards' also exist that we have used:
* WebDAV offers simple access to content and metadata over http.
* JSR 170 offers more sophisticated content model access for Java based clients.

Typically, writing content-aware applications involves customizing a proprietary content manager application using proprietary APIs and possibly scripting languages. If one is brave enough to attempt to develop an independent application that needs to integrate with a content repository (as we have been), only proprietary APIs exist, and they lock the application into that vendor's content manager. Alternatively, most vendors support some level of WebDAV. So it is possible to develop a content-aware application leveraging WebDAV, but different vendors support WebDAV to different levels of compliance. Also, since there is no typing model, enforcing a content model standard is difficult. To overcome this in the past, we developed a WebDAV client that adds typing capability. This effort drove us to look seriously at using JCR (JSR 170). Using JSR 170 means implementing a session-based approach and, of course, locks you into Java. But it does offer rich content model semantics. However, its session-based, Java-centric nature and its lack of support for RESTful or SOAP channels make it a poor choice for a SOA.
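
To illustrate the idea of layering a type model over WebDAV (a hypothetical sketch in Python, not our actual client; the type names and property namespace are invented), a wrapper can validate required properties against a declared type before emitting the PROPPATCH body it would send:

```python
# Hypothetical sketch of adding typing on top of WebDAV properties.
# The content types and the "urn:example:content" namespace are invented.

CONTENT_TYPES = {
    "press-release": {"required": ["title", "release-date", "locale"]},
    "brand-asset": {"required": ["title", "brand"]},
}

def proppatch_body(type_name, properties):
    """Validate properties against the declared type, then build the
    PROPPATCH XML body that would be sent to the WebDAV server."""
    missing = [p for p in CONTENT_TYPES[type_name]["required"]
               if p not in properties]
    if missing:
        raise ValueError("missing required properties: %s" % missing)
    props = "".join("<x:%s>%s</x:%s>" % (k, v, k)
                    for k, v in sorted(properties.items()))
    return ('<?xml version="1.0"?>'
            '<d:propertyupdate xmlns:d="DAV:" xmlns:x="urn:example:content">'
            '<d:set><d:prop>%s</d:prop></d:set>'
            '</d:propertyupdate>' % props)
```

The server still sees plain WebDAV properties; the type discipline lives entirely in the client layer, which is both the appeal and the weakness of this stopgap.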

Since CMIS offers a stronger type model and can be used in a SOA-friendly SOAP or RESTful manner, it seems to have the right stuff for a standard 'content services' layer to access the Content Repository. However, we need to leverage what exists now. Since we are a Java shop and will be leveraging one content vendor, JSR 170 still seems the best approach. For the RESTful access strongly preferred at our organization, WebDAV may provide a stopgap. We will likely follow CMIS closely and prototype with it.

Next post, a discussion of the Enterprise Content Bus...

Monday, September 8, 2008

ECM-SOA with an Agile Attitude

In this entry, I will identify the attitudes and concepts that define our ECM-SOA vision. In the next entries, I will dive into the technical selections and customizations.

The first challenge is to think of our tooling not as a custom application, but as a set of adaptable services, applications and integrations. This requires a change of thought. Our previous effort was to drop a monolithic application called a Content Manager into the middle of things, and then propose to change the business process around this application, ostensibly reducing the existing applications and ad-hoc processing to customizations within this new application.

During our previous attempt, we underwent a lengthy analysis phase and generated a 500-page requirements document detailing taxonomies, content types, workflows and templates that would solve our (web) content management needs. We then spent lots of time and treasure implementing these requirements. In the end, we built some of them, taking far longer and far more resources than anticipated, and we found that most of the requirements, and subsequently most of the customizations we built, were wrong. The heroic content managers and brand managers made it work anyway, developing more ad-hoc, complicated and time-consuming steps around yet another application that was supposed to help them. This story is not unique.

We must shift our processes as much as our technology. We are focusing on smaller efforts, more agility and more feedback, and moving away from 500-page requirements documents. To do this successfully, our architecture also needs to be agile and amenable to change. Our architecture must be a framework to grow on: to grow useful services, and to grow and integrate with useful applications. It must follow user demand as we learn from using and refining our processes and tools.

At the core of our architecture is the Content Repository. This tool will store content through various stages of use. It must have versioning, search and auditing capabilities; it must provide a flexible way to organize content into collections; and it must provide flexible metadata structures that can apply both simple and hierarchical views of content, cutting across the storage structure.

On this repository, we must define a content model that is flexible and extensible. The cost of changing and refining storage and metadata structures must be tolerable. Our definitions must be incremental, careful not to lock ourselves into bad organization and not to anticipate too far into the future. Changes to the content model are always the most costly and must be considered with the greatest care. The analogy is to a database: many applications can use the database, and it can store and retrieve a lot of information, but defining and changing its structure must be done with care. Eventually, all content must be converted and stored in the repository in order to maximally leverage it throughout the organization.
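
As a sketch of what metadata cutting across storage structure means (invented field names and paths, not a real repository API): the same content can be selected into multiple 'views' by metadata, regardless of where it physically lives.

```python
# Toy sketch: a flat store plus metadata gives multiple views of the same
# content without moving it. The paths and fields here are invented.

REPOSITORY = [
    {"path": "/sites/us/press/a.xml", "locale": "en-US", "brand": "Acme"},
    {"path": "/sites/fr/press/b.xml", "locale": "fr-FR", "brand": "Acme"},
    {"path": "/sites/us/offers/c.xml", "locale": "en-US", "brand": "Bravo"},
]

def view(**criteria):
    """Select content by metadata, ignoring the physical folder structure."""
    return [item["path"] for item in REPOSITORY
            if all(item.get(k) == v for k, v in criteria.items())]
```

A brand view, a locale view and a folder view can then coexist over one store, which is why changes to the metadata model, like changes to a database schema, deserve the greatest care.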

On top of the Content Repository sits a set of content services. These services allow applications that leverage content and content features to be quickly developed and adopted. These content services should be built on standards, including JSR 170 and WebDAV, to make them accessible to existing and future applications. Typically, a Content Manager user interface comes with the content repository and provides a clear view of, and access to, the content. This application can be customized, and with enough effort it can host all application needs, but we are wary of this monolithic approach. Current applications exist that may be better suited to business needs. They must be adapted at least to leverage a common repository by using these content services, but they will continue to supply unique views and functions. Later, application integration through portalization may help to alleviate the chaos of multiple applications and views.

Accepting that content exists in many places and is required by many applications, we look to the concept of an Enterprise Service Bus. We will have many integrations to other applications, and content must be acquired, processed, and delivered in unique ways. An ESB allows the centralization of this integration and processing, reducing coupling to the rest of the organization and potentially increasing agility by being more adaptable to change. However, the concept of the ESB focuses on messaging, not content. We will adapt this to provide content processing capabilities and tag it as an Enterprise Content Bus (ECB). This ECB concept will evolve over time and will be detailed here over many posts.

Finally, choreographing the acquisition, processing and delivery of content becomes a challenge. Both automated and manual steps must be organized into business processes that adapt and grow over time. Over-engineering the business process can be just as damaging as not managing it at all. The content architecture must support a Business Process Management (BPM) application capable of orchestrating both manual and automated steps. It must integrate well with the content management application and repository, and with the Enterprise Content Bus (ECB). The use of a configurable Business Rules Engine (BRE) will help define controls throughout the steps, enforcing standards, validating processing and ensuring high quality. This capability will allow content management to grow in a managed way over time, in a fully exposed and auditable fashion.
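
A toy sketch of these ideas together (Python for brevity; the step names, rules and trace format are invented, not our actual BPM or BRE): a process mixes an automated step, a human task and a rules gate before publish.

```python
# Toy sketch: a BPM-style process mixing automated steps, a human task,
# and a rules-engine gate before publish. All names here are invented.

def rule_has_title(content):
    return bool(content.get("title"))

def rule_translated(content):
    return "fr-FR" in content.get("translations", [])

PUBLISH_RULES = [rule_has_title, rule_translated]

def process(content, human_approval):
    """Run the content through the process; return the audit trace."""
    trace = ["acquired"]
    content.setdefault("translations", []).append("fr-FR")  # automated step
    trace.append("translated")
    if not human_approval:          # human task: editorial review
        trace.append("rejected-by-editor")
        return trace
    trace.append("approved")
    if all(rule(content) for rule in PUBLISH_RULES):  # rules engine gate
        trace.append("published")
    else:
        trace.append("failed-validation")
    return trace
```

The trace returned by each run is the 'fully exposed and auditable' part: every piece of content carries a record of the steps and decisions it passed through.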

This architecture is a suite of application capabilities, each highly adaptable and customizable.

To sum up:
  • The Content Repository stores all content, applying standard features
  • The Content Model enforces standards and increases usefulness across usages
  • The Content Services allow access to the content repository by many applications
  • The Content Manager provides a user interface to manage content
  • The Enterprise Content Bus (ECB) allows content to be acquired, processed and delivered
  • The Business Process Management (BPM) framework allows processing to be orchestrated
  • The Business Rules Engine (BRE) enforces consistency and quality throughout processing
This is our ECM-SOA vision. How do we implement it? The next posts will tell...

Thursday, September 4, 2008

The answer is ECM-SOA

Of course the answer (to the previous post's concerns) is to try and leverage a Service Oriented Architecture as part of the ECM strategy. So that starts me on my way.

In order to understand my approach, let me share with you some articles and books that have guided me.

Some articles of interest focus on ECM-SOA. But more specifically, my guidance has come from more general SOA/Web Services books and guides; my sense of what SOA is and isn't has been influenced by these books.

So what is the scope of our work?

The part of ECM we are building now is focused on Web Content Management. We have a common platform on which we can configure, skin and run many travel sites. We are a global company, and our sites must support many languages. The platform must host many brands, so each site must be able to be skinned, configured and customized as if it were purpose built.

A key challenge for this is managing web content in multiple languages and for multiple brands. Each type of content we publish to the site needs to be translated to local languages and adapted to fit the brands. Each type of content is currently managed in unique applications, processed with a cadre of custom code and delivered in unique formats.

With all of this custom code, storage and formats, it is difficult to create functions that extend to many types of content, such as those to handle translation and branding. Our challenge is to unify content management and extend our capabilities to efficiently stand up new brands and languages without sinking the content managers and brand managers in a mire of complex, ad-hoc tools, technologies and practices. So our rally cry is "Stand up a new site in 1 day! (any language, any brand)." At least the tools shouldn't be a bottleneck!

Next, our vision...


Wednesday, September 3, 2008

ECM SOA Strategy

I am attempting to build an ECM-SOA strategy at my company. This blog will identify key issues we have run across and the solutions we are attempting. I will get very technical, since I am the primary technologist for content technologies at my company, but I will also maintain a higher-level focus on the business aspects of this initiative, since the two are inextricably tied together.

At my company, we have many applications that manipulate content, both for site management and for document management. Each application was built as a silo, unique technology to suit a unique job -- no sharing, no reuse. This is typical for many companies, whether content management applications or other types of applications. Service oriented architectures (SOA) try to address this issue by placing a service layer between applications that consume services and service provider applications, allowing many applications to leverage shared services. Implementing this forces the organization to think about standards and common facilities.

Enterprise Content Management (ECM) is a term referring to a strategy to manage content across the enterprise in a common way, leveraging common facilities. ECM covers types of content applications including Web Content Management (WCM), Document Management (DM), Records Management (RM), Digital Asset Management (DAM) and others. Companies like mine typically have all of these needs. And we have applications that address them, but, like many companies', they are unique purpose-built applications with no overarching strategy or technology. Vast holes also exist in this web of applications, necessitating ad-hoc approaches and manual intervention, costing labor and quality. This silo approach is costly and limiting.

Vendors attempt to address this need by selling monolithic systems that replace these many purpose-built applications with customizations and tailoring of their product. Aggressive implementations of these systems force the organization to change and adapt in order to be successful. Failure typically happens because the organization is unable or unwilling to make these changes, and because the customization effort these products require is vastly underestimated. Much is promised by these solutions, but little is delivered. Sometimes millions of dollars of 'shelfware' or hobbled, cobbled-together systems limp along, adding little and causing frustration and apathy. This is nothing unique to ECM: try CRM, ERP and others. Consultants love them. Companies hate them. We can do better.

My approach is to be more evolutionary. We cannot rip out the many processes and programs that make up our web of content management. They work (more or less), and they focus on solving specific problems. There is a lot of knowledge embedded in each application, and a lot invested in them: both mindshare and monetary. But circumstances are forcing us to change. We have to improve our efficiency at the same time as we expand to a global presence. We have to adapt our management systems to serve our new global platforms. In particular, site content management (Web Content Management and Digital Asset Management) must improve, or we will sink under the weight of all the content management needs. In short, we have to change everything to realize our vision. But how can we be evolutionary and still change everything?

I'll leave with this cliff hanger for now...