Standing on the shoulders of giants.
# Monday, April 28, 2008

Recently a client wanted to import binary files with metadata from a directory into SharePoint 2007. Because BizTalk was available, and I suspected that a custom NT service would run into scaling issues, I decided to prototype the solution in BizTalk first. The prototype was finished in just 2 days, a lot faster than building a custom solution that is equally scalable and reliable.

Customer question

The customer wanted to import scanned documents and their metadata from OCR scanners into a SharePoint document library. A scanned document consists of a PDF file and an XML file with the OCR results. Both files have the same name (with a different extension) and are uploaded to a file share.

Solution

The solution consists of a BizTalk server that uses the File receive adapter to monitor a specific directory for a matching PDF and XML file. An orchestration combines the two files and sends the result to a document library using the WSS adapter.

[Figure: Import process]

The BizTalk solution consists of several pieces:

  1. A schema to describe the received XML file,
  2. An orchestration, to receive the files, create the new message and send that to the document library,
  3. A pipeline component to rename the received files (and 2 pipelines to host it),
  4. A custom component to encode the values sent in the metadata.

The schema

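The schema can be any valid XML schema, but to make it easier to add the properties to the WSS message I set the type of the nodes to string and promoted them (as distinguished fields, which is what allows the InvoiceXml.fields.Team syntax in the orchestration). Note that some of the fields are optional and may be missing from a given file, so they need a null check before they are used.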

The orchestration

The orchestration consists of 3 steps:

  1. Receiving the file,
  2. Creating the new message,
  3. Sending the message.

Step 1: Receiving the file

[Figure: Receive]

There are 2 receive ports, one for each file type. Each port is bound to its own receive action, and either one can start the orchestration. A correlation between the receive actions matches the received messages on their (normalized) filename. The filenames are normalized by a custom pipeline component, see below. The next step of the orchestration starts only when both ports have received a file.
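To make the effect of the correlation concrete, here is a small in-memory model of it in plain C#. It is only a sketch: the class name ScanPairCollector and its method are made up for illustration, and BizTalk itself does this work through correlation sets and the message box rather than a dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of "wait until both files with the same key have arrived".
// This is not BizTalk code; it only mimics what the correlation achieves.
public sealed class ScanPairCollector
{
    // Partially received scans, keyed by the normalized file name.
    private readonly Dictionary<string, (string Pdf, string Xml)> pending =
        new Dictionary<string, (string Pdf, string Xml)>(StringComparer.OrdinalIgnoreCase);

    // Registers one received file. Returns true and the completed pair
    // once both the PDF and the XML for the same key have been seen.
    public bool TryComplete(string key, string kind, string path,
                            out (string Pdf, string Xml) pair)
    {
        // If the key is unknown, 'current' is the default tuple (null, null).
        pending.TryGetValue(key, out var current);

        if (kind == "pdf") current.Pdf = path;
        else if (kind == "xml") current.Xml = path;
        else throw new ArgumentException("Expected \"pdf\" or \"xml\".", nameof(kind));

        if (current.Pdf != null && current.Xml != null)
        {
            pending.Remove(key);   // the pair is complete, stop tracking it
            pair = current;
            return true;
        }

        pending[key] = current;    // keep waiting for the other half
        pair = default;
        return false;
    }
}
```

Either file may arrive first: the first call for a key returns false, the second call returns true with both paths filled in.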

Step 2: Creating the new message

[Figure: Create message]

Creating the message is split into 2 steps to keep the code organized. In the first step, "SetWssProperties", a global variable is filled with a string containing an XML node with the custom properties sent to SharePoint. All values are encoded using a custom component, the PropertyEncoder, see below.

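// Element names follow the WSS adapter's ConfigPropertiesXml format (PropertyNameN / PropertySourceN).
WssProperties = "<ConfigPropertiesXml>"
+ "<PropertyName1>Team</PropertyName1>"
+     "<PropertySource1>" + Paulb.PropertyEncoder.Encode(InvoiceXml.fields.Team) + "</PropertySource1>"
+     "<PropertyName2>Client</PropertyName2>"
+     "<PropertySource2>" + Paulb.PropertyEncoder.Encode(InvoiceXml.fields.Client) + "</PropertySource2>"
+     "<PropertyName3>Document_x0020_type</PropertyName3>"
+     "<PropertySource3>" + Paulb.PropertyEncoder.Encode(InvoiceXml.fields.DocumentType) + "</PropertySource3>"
+     "<PropertyName4>Gross_x0020_amount</PropertyName4>"
+     "<PropertySource4>" + Paulb.PropertyEncoder.Encode(InvoiceXml.fields.TotalAmount) + "</PropertySource4>";

// InvoiceDate is optional, so only add it when it is present.
if (false == System.String.IsNullOrEmpty(InvoiceXml.fields.InvoiceDate)) {
      WssProperties = WssProperties + "<PropertyName5>Invoice_x0020_Date</PropertyName5>"
      + "<PropertySource5>" + InvoiceXml.fields.InvoiceDate + "</PropertySource5>";
}

WssProperties = WssProperties + "<PropertyName6>Invoice_x0020_Number</PropertyName6>"
+     "<PropertySource6>" + Paulb.PropertyEncoder.Encode(InvoiceXml.fields.InvoiceNumber) + "</PropertySource6>"
+     "<PropertyName7>ScanID</PropertyName7>"
+     "<PropertySource7>" + Paulb.PropertyEncoder.Encode(InvoiceXml.fields.ScanID) + "</PropertySource7>"
+     "<PropertyName8>Vat_x0020_number</PropertyName8>"
+     "<PropertySource8>" + Paulb.PropertyEncoder.Encode(InvoiceXml.fields.VatNumber) + "</PropertySource8>"
+ "</ConfigPropertiesXml>";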
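The same kind of string could also be assembled by a small helper in a referenced .NET assembly, which keeps the numbering of the property elements in one place. The sketch below is an illustration only: WssPropertiesBuilder is a made-up name, it assumes the WSS adapter's PropertyNameN / PropertySourceN element naming inside a ConfigPropertiesXml root, and it uses SecurityElement.Escape as a stand-in for the custom PropertyEncoder.

```csharp
using System.Collections.Generic;
using System.Security;
using System.Text;

// Hypothetical helper; not part of the original solution.
public static class WssPropertiesBuilder
{
    // Builds the properties XML from column-name/value pairs.
    // Pairs with a null or empty value are skipped (a choice of this sketch),
    // so optional fields need no separate 'if' at the call site.
    public static string Build(IEnumerable<KeyValuePair<string, string>> properties)
    {
        var xml = new StringBuilder("<ConfigPropertiesXml>");
        int index = 1;

        foreach (var property in properties)
        {
            if (string.IsNullOrEmpty(property.Value))
            {
                continue;
            }

            // SecurityElement.Escape replaces <, >, &, " and ' with XML entities.
            xml.Append("<PropertyName").Append(index).Append('>')
               .Append(SecurityElement.Escape(property.Key))
               .Append("</PropertyName").Append(index).Append('>');
            xml.Append("<PropertySource").Append(index).Append('>')
               .Append(SecurityElement.Escape(property.Value))
               .Append("</PropertySource").Append(index).Append('>');

            index++;
        }

        return xml.Append("</ConfigPropertiesXml>").ToString();
    }
}
```

Because skipped pairs do not consume an index, the numbering stays contiguous whether or not an optional field such as the invoice date is present.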

In the second step, the incoming PDF message is copied to the outgoing WssMessage and its properties are set. The filename is encoded using the custom PropertyEncoder component to meet the requirements for a filename in a SharePoint document library.

InvoicePdfMetaData = InvoicePdf;
InvoicePdfMetaData(WSS.Filename) = "FLT" + Paulb.PropertyEncoder.EncodeFileName(InvoiceXml.fields.ScanID) + ".pdf";

InvoicePdfMetaData(WSS.ConfigPropertiesXml) = WssProperties;

Step 3: Sending the message

[Figure: Send message]

Sending the message is the final and simplest step: create a Send shape and link it to the port. The port is configured in the BizTalk Administration Console after the solution has been deployed to the BizTalk server.

The Pipeline

The solution involves 2 receive pipelines, one for each of the file types (XML and PDF). Two pipelines are necessary because the pipeline for the XML file requires the XML disassembler to recognize the schema and match it to the receive action. Both pipelines contain the custom FixFileName component, which normalizes the filename of the incoming file (it removes the path and the extension).

The actual work is done in the Execute method of the IComponent interface. The remaining interfaces either must be implemented but are not used (IPersistPropertyBag), or are used by the designer environment (IComponentUI). The IBaseComponent interface supplies basic information about the component, such as Name and Version.

IBaseMessage IComponent.Execute(IPipelineContext pContext, IBaseMessage pInMsg)
{

    // make sure we have something to work with
    if (pInMsg == null)
    {
        return pInMsg;
    }

    string receivedFileName = pInMsg.Context.Read("ReceivedFileName", "http://schemas.microsoft.com/BizTalk/2003/file-properties") as string;
    if (string.IsNullOrEmpty(receivedFileName))
    {
        // nothing we can do
        return pInMsg;
    }

    string newFileName = Path.GetFileNameWithoutExtension(receivedFileName);

    pInMsg.Context.Promote("ReceivedFileName", "http://schemas.microsoft.com/BizTalk/2003/file-properties", newFileName);

    return pInMsg;
}

One thing to remember is to Promote the property you're writing; otherwise you're not able to use it for routing.
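Leaving BizTalk aside, the normalization inside Execute can be tried out in isolation. The sketch below wraps the same Path.GetFileNameWithoutExtension call in a standalone helper; the class name FileNameNormalizer is an assumption made for this example and does not appear in FixFileName.cs.

```csharp
using System.IO;

// Hypothetical standalone version of the normalization done in Execute.
public static class FileNameNormalizer
{
    // Strips the directory and the (last) extension from a received file name.
    // Null or empty input is returned unchanged, mirroring the early return
    // in the pipeline component.
    public static string Normalize(string receivedFileName)
    {
        if (string.IsNullOrEmpty(receivedFileName))
        {
            return receivedFileName;
        }

        return Path.GetFileNameWithoutExtension(receivedFileName);
    }
}
```

With this helper, Normalize(Path.Combine("scans", "0001.pdf")) and Normalize(Path.Combine("scans", "0001.xml")) both yield "0001", which is exactly what lets the two messages correlate. Only the last extension is removed, so "scan.v2.pdf" becomes "scan.v2".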

Another thing about pipeline components: BizTalk expects them to be deployed to a specific location on the filesystem, the Pipeline Components directory under the BizTalk installation directory (the default location is C:\Program Files\Microsoft BizTalk Server 2006\Pipeline Components).

FixFileName.cs

Custom component for encoding properties

The component for encoding the properties is a standard .NET assembly. It is not necessary to implement any interfaces; any assembly works once it is referenced in the BizTalk orchestration project. The only requirement is that the assembly is deployed to the GAC.

This project required some additional encoding of the XML values besides the normal HTML encoding: the characters used by BizTalk macros have to be encoded, and characters that are illegal in filenames in a document library have to be stripped.
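As a rough idea of what such an encoder can look like, here is a minimal sketch. It is not the code from PropertyEncoder.cs: the class name, the list of characters treated as illegal in SharePoint 2007 filenames, and the decision to simply drop the '%' macro delimiter are all assumptions made for this example. It uses System.Net.WebUtility, which is only available in newer .NET versions; on the .NET version that ships with BizTalk 2006, System.Web.HttpUtility.HtmlEncode would be the equivalent call.

```csharp
using System.Net;
using System.Text;

// Hypothetical stand-in for the PropertyEncoder component.
public static class PropertyEncoderSketch
{
    // Characters assumed to be rejected in document library file names.
    private const string IllegalFileNameChars = "~\"#%&*:<>?/\\{|}";

    // Encodes a metadata value for use inside the properties XML.
    public static string Encode(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return string.Empty;
        }

        // '%' delimits BizTalk macros; this sketch drops it (an assumption),
        // then HTML-encodes what is left.
        return WebUtility.HtmlEncode(value.Replace("%", string.Empty));
    }

    // Strips characters that are not allowed in a file name.
    public static string EncodeFileName(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return string.Empty;
        }

        var result = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            if (IllegalFileNameChars.IndexOf(c) < 0 && !char.IsControl(c))
            {
                result.Append(c);
            }
        }

        // Leading or trailing dots and spaces are trimmed as well.
        return result.ToString().Trim('.', ' ');
    }
}
```

Stripping rather than replacing characters keeps the sketch short, but it can map two different ScanID values to the same filename; whether that matters depends on the data the scanners produce.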

PropertyEncoder.cs

See also:

Information about convoys

Developing pipeline components

All Content © 2009, Paul van Brenk