I’ve talked to many developers who have recently started doing some SharePoint programming and have gotten hooked. In fact, they like SharePoint development so much, especially the event receivers, that these developers are starting to think about using SharePoint as a central clearinghouse for document processing.
SharePoint event receivers allow developers to programmatically interact with and alter data before or after the data is saved into, updated in, or removed from SharePoint. Event receivers can work on list items as well as documents, including Office documents.
A world of possibilities
Combined with SharePoint’s metadata storage and workflow capabilities, you can imagine some pretty compelling applications built using event receivers. For example, you could create some code to run as a document is saved into SharePoint. If the document is in a draft state according to its metadata, the code would alter the document to include a “Draft” watermark right in the contents of the document. Once the metadata changes to a published status, the code could remove the watermark and also save a PDF copy of the document.
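To make the example concrete, here is a minimal sketch of the decision logic, written in Python purely for illustration (real SharePoint code would be .NET). The status values and action names are hypothetical placeholders I made up for this sketch; they are not SharePoint field values or API calls.

```python
from typing import List


def plan_actions(status: str) -> List[str]:
    """Map a document's status metadata to the processing steps it needs.

    The status strings and action names are assumptions for illustration.
    """
    normalized = status.strip().lower()
    if normalized == "draft":
        # Draft documents get a visible watermark.
        return ["add_watermark"]
    if normalized == "published":
        # Published documents lose the watermark and get a PDF copy.
        return ["remove_watermark", "export_pdf"]
    # Any other status needs no processing.
    return []
```

Keeping this decision separate from the code that actually touches the document makes it easy to move the heavy work somewhere else later.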
What not to do
A common implementation strategy is to use a SharePoint event receiver or workflow to invoke the Word object model through the Office primary interop assemblies (OPIA), and then use the OPIA to add or remove the watermark or otherwise alter the contents of the document according to the metadata associated with the document.
This is a bad architecture.
For details on why this does not work, please see my post: “AnyCPU, x86, x64 – What’s the Difference?” To summarize that post, when SharePoint is installed on a 64-bit operating system, the various SharePoint processes run in 64-bit mode, and therefore your assemblies will run in 64-bit mode as well. A 64-bit process cannot load a 32-bit DLL. Office is a 32-bit application, and so are the primary interop assemblies most .NET developers use to automate Office. So you will receive a BadImageFormatException when trying to invoke any of the Office applications from your SharePoint code when SharePoint is running on an x64 operating system.
Beyond the issue of x86 vs. x64 interoperability, there are other reasons to avoid this software design. First, to save money many developers will implement the software in a way that requires installing Office on the SharePoint web front end servers. That’s a very bad idea, period. Security, threading, and DCOM issues will make the software a nightmare to maintain.
Even if the approach functions on the developer’s x86 development SharePoint machine with a single user and the test documents are 20 KB, what will happen in a production environment where there are thousands of users and 10 MB documents? What happens when 20 documents per minute are being checked in and need to be processed?
A better idea
Don’t get me wrong. I am not saying that you should not try to write an application that does document processing using the SharePoint and Office technology stacks. Document processing applications can improve the data quality of documents and increase worker productivity by reducing the labor involved in document maintenance. Just don’t use the approach outlined earlier. Use the shared services approach instead.
Document processing as a shared service
The concept of shared services was introduced in SharePoint Portal Server 2003 (SPS) and greatly expanded upon in Microsoft Office SharePoint Server 2007 (MOSS). In a nutshell, the bulk of SharePoint is content driven. Shared services provide data storage and processing that are available globally to the SharePoint farm and are not tied to a particular SharePoint content repository.
Content indexing is a good example. The content indexer is a process that runs outside the scope of any SharePoint web application. The data the indexer stores is not user content and so does not go into any SharePoint content database. The data the indexer collects and the processing the indexer performs are available to all SharePoint web applications in the farm, and potentially to other farms.
I am advocating that the document processing operations described earlier should be treated the same way. Sure, the documents themselves are user content and reside in a SharePoint content database, but the process of adding a watermark to or removing a watermark from a Word document has nothing to do with a SharePoint web application and thus should not be performed in an event receiver, or from anywhere within the w3wp.exe process.
How do you easily create a SharePoint Shared Service?
There are a couple of constraints our solution needs to adhere to in order to function properly. The solution needs to be able to:
- detect document changes
- process document changes
- process long-running operations
- support both x86 and x64 SharePoint installations
- support both XML (.docx) and legacy file formats (.doc)
Detecting document changes is fairly easy. Use a list item event receiver to catch the changes.
Processing the document changes, especially changes to large documents, can be more involved. If you throw in the requirement to support legacy Office document formats and x64 installations, there seems to be only one way to do it: create a Windows Service (can’t use a timer job) that is compiled with the x86 option. See my post on the differences between AnyCPU, x86, and x64 compiler options for details on why we need to be explicit about the target platform. This service can reside on one of the SharePoint application servers in the farm, or on a server that is outside of the farm. How?
Create a web service interface that accepts document processing requests and triggers the processing service to do its work. The web service doesn’t actually do any work; it simply accepts documents as input and drops them into a folder on the file system that the processing service is watching. The web service must be compiled with the AnyCPU option so that it will run properly on IIS on both x86 and x64 systems. Because the document acceptor web service and the document processing Windows service are communicating out of process, the web service can run as an x64 process for compatibility with x64 IIS, and the Windows service can run as an x86 process for compatibility with Office.
That reminds me, Office will need to be installed on the server that is running the document processing service. While it is easy to work with the XML Office file formats without Office, working with the legacy file formats requires either that you have Office installed or that you use third-party middleware.
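The hand-off through a watched folder can be sketched as follows. This is Python used only to illustrate the pattern; an actual acceptor would be a .NET web service. The function name, the `.part` suffix, and the unique-prefix naming scheme are my own assumptions for this sketch, not anything prescribed by SharePoint or IIS.

```python
import os
import tempfile
import uuid
from pathlib import Path


def accept_document(drop_dir: Path, original_name: str, content: bytes) -> Path:
    """Save an incoming document into the watched folder and return its path."""
    drop_dir.mkdir(parents=True, exist_ok=True)
    # Keep only the base name so a caller cannot write outside the drop folder.
    safe_name = Path(original_name).name
    # A unique prefix avoids collisions when two users submit the same file name.
    final_path = drop_dir / f"{uuid.uuid4().hex}_{safe_name}"
    # Write under a temporary ".part" name first, then rename, so the
    # processing service never picks up a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=drop_dir, suffix=".part")
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(content)
    os.replace(tmp_name, final_path)
    return final_path
```

The write-then-rename step matters most with large documents, where the processing service could otherwise see the file before the acceptor has finished writing it.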
Putting everything together
What does the final solution look like and how does it operate? The list item event receiver detects a change to a document that needs processing. The event receiver invokes the document acceptor web service running on a SharePoint application server or on an external server. The document acceptor service receives the file to be processed and saves the file to a file system folder. The document processor service sees the saved file on the file system and processes the file. Once processing is complete, the document processing service saves the file back to the SharePoint document library.
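The processing side of that flow might look something like the sketch below. Again, this is illustrative Python rather than the x86 Windows service itself. The `handler` callback is a hypothetical stand-in for the Office automation and the save back to SharePoint, and skipping `.part` files assumes the acceptor writes incomplete files under that suffix.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def process_pending(drop_dir: Path, done_dir: Path,
                    handler: Callable[[Path], None]) -> List[str]:
    """Run handler on each complete file in drop_dir, then move it to done_dir."""
    done_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for path in sorted(drop_dir.iterdir()):
        # Skip subfolders and files the acceptor is still writing.
        if not path.is_file() or path.suffix == ".part":
            continue
        # The handler does the real work (watermarking, PDF export, upload).
        handler(path)
        # Move the file out of the drop folder so it is not processed twice.
        shutil.move(str(path), str(done_dir / path.name))
        processed.append(path.name)
    return processed
```

A service would call something like this on a timer or in response to file system notifications. Because the work happens in its own process, a long-running operation on a 10 MB document never blocks a SharePoint web request.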
For a more in-depth discussion of using Shared Service architectures in a SharePoint environment, join me at the Best Practices Conference. I will be covering this and related topics in my presentation: Using Service Oriented Methodologies to Create SharePoint Products.