Recently a client wanted to import binary files with metadata from a directory into SharePoint 2007. Because BizTalk was available and I suspected a custom NT service would run into scaling issues, I decided to prototype the solution in BizTalk first. The prototype was finished in just 2 days, much faster than building a custom solution with comparable scalability and reliability.
Customer question
The customer wanted to import scanned documents and metadata from their OCR scanners into a SharePoint document library. A scanned document consists of a PDF file and an XML file with the OCR results. Both files have the same name (with a different extension) and are uploaded to a fileshare.
Solution
The solution consists of a BizTalk server which uses a File Receive Adapter to monitor a specific directory for a matching pair of PDF and XML files. The two files are combined in an orchestration and sent to a document library using the WSS Adapter.
The BizTalk solution consists of several pieces:
The schema
The schema can be any valid XML-schema, but to make it easier to add the properties to the WSS message I set the type of the nodes to string and promoted them. Note that if some of the fields are nullable, we have to take care when accessing those.
The orchestration
The orchestration consists of 3 steps:
Step 1: Receiving the file
There are 2 receive ports, one for each of the file types. Each port is bound to its own receive action, and either of them is able to start the orchestration. A correlation between the two receive actions matches the received messages on their (normalized) filename. The filenames are normalized using a custom pipeline component, see below. The next step of the orchestration starts only when both ports have received a file.
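To make the correlation idea concrete, here is a minimal sketch in plain C#, outside BizTalk, of pairing two files on a shared normalized name. The class FilePairCollector and its method are hypothetical names used only for illustration. BizTalk does this work itself through the correlation, so nothing like this has to be written for the actual solution.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the correlation: pairs two files on a shared normalized name.
public sealed class FilePairCollector
{
    // First half of each pair seen so far, keyed by normalized name.
    // The comparison ignores case, as Windows file names do.
    private readonly Dictionary<string, string> pending =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    // Returns false while a pair is incomplete. Returns true when the second
    // file with the same normalized name arrives, and hands back the first one.
    // Simplification: it does not check that the two files have different extensions.
    public bool TryAdd(string normalizedName, string fileName, out string firstFile)
    {
        if (pending.TryGetValue(normalizedName, out firstFile))
        {
            pending.Remove(normalizedName); // pair complete, forget it
            return true;
        }

        pending[normalizedName] = fileName; // wait for the other half
        firstFile = null;
        return false;
    }
}
```

Adding "scan001.pdf" under the key "scan001" returns false. Adding "scan001.xml" under the same key then returns true and yields the PDF name, which is the point at which the orchestration would continue.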
Step 2: Creating the new message
Creating the message consists of 2 steps to make it easier to organize the code. In the first step, "SetWssProperties", a global variable is filled with a string containing an XML node with the custom properties sent to SharePoint. All values are encoded using a custom component, the PropertyEncoder, see below.
    WssProperties = "" +
        "Team" + "" + Paulb.PropertyEncoder.Encode(InvoiceXml.fields.Team) + "" +
        "Client" + "" + Paulb.PropertyEncoder.Encode(InvoiceXml.fields.Client) + "" +
        "Document_x0020_type" + "" + Paulb.PropertyEncoder.Encode(InvoiceXml.fields.DocumentType) + "" +
        "Gross_x0020_amount" + "" + Paulb.PropertyEncoder.Encode(InvoiceXml.fields.TotalAmount) + "";

    if (false == System.String.IsNullOrEmpty(InvoiceXml.fields.InvoiceDate))
    {
        WssProperties = WssProperties +
            "Invoice_x0020_Date" + "" + InvoiceXml.fields.InvoiceDate + "";
    }

    WssProperties = WssProperties +
        "Invoice_x0020_Number" + "" + Paulb.PropertyEncoder.Encode(InvoiceXml.fields.InvoiceNumber) + "" +
        "ScanID" + "" + Paulb.PropertyEncoder.Encode(InvoiceXml.fields.ScanID) + "" +
        "Vat_x0020_number" + "" + Paulb.PropertyEncoder.Encode(InvoiceXml.fields.VatNumber) + "" +
        "";
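The expression above is meant to produce an XML fragment of column names and values. Below is a minimal sketch of the same idea as a plain C# helper. It assumes the WSS adapter expects numbered PropertyName/PropertySource element pairs inside a ConfigPropertiesXml root; check that layout against the adapter documentation before relying on it. The class WssPropertiesSketch is a hypothetical name, and SecurityElement.Escape is a stand-in for the PropertyEncoder.

```csharp
using System;
using System.Collections.Generic;
using System.Security;
using System.Text;

// Hypothetical helper; the element layout is an assumption about the WSS adapter.
public static class WssPropertiesSketch
{
    public static string Build(IEnumerable<KeyValuePair<string, string>> columns)
    {
        var xml = new StringBuilder("<ConfigPropertiesXml>");
        int index = 1;

        foreach (KeyValuePair<string, string> column in columns)
        {
            // Design choice in this sketch: skip empty values (such as a missing
            // invoice date) and keep the numbering consecutive.
            if (string.IsNullOrEmpty(column.Value))
            {
                continue;
            }

            // SecurityElement.Escape replaces <, >, &, " and ' with XML entities.
            xml.AppendFormat("<PropertyName{0}>{1}</PropertyName{0}>",
                index, SecurityElement.Escape(column.Key));
            xml.AppendFormat("<PropertySource{0}>{1}</PropertySource{0}>",
                index, SecurityElement.Escape(column.Value));
            index++;
        }

        return xml.Append("</ConfigPropertiesXml>").ToString();
    }
}
```

Generating the numbering in a loop avoids renumbering by hand when an optional column is left out, which is easy to get wrong in a long concatenation.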
In the second step, the incoming PDF message is copied to the outgoing WssMessage and its properties are set. The filename is encoded using the custom PropertyEncoder component to meet the requirements for a filename in a SharePoint Document Library.
    InvoicePdfMetaData = InvoicePdf;
    InvoicePdfMetaData(WSS.Filename) = "FLT" + Paulb.PropertyEncoder.EncodeFileName(InvoiceXml.fields.ScanID) + ".pdf";
    InvoicePdfMetaData(WSS.ConfigPropertiesXml) = WssProperties;
Step 3: Sending the message
Sending the message is the final and simplest step: create a Send object and link it to the port. The port itself is configured in the BizTalk Administration Console after the solution has been deployed to the BizTalk server.
The Pipeline
The solution involves 2 receive pipelines, one for each of the filetypes (xml and pdf). It is necessary to create 2 pipelines because the pipeline for the xml file requires the XML disassembler to recognize the schema and match it to the receive action. Both pipelines contain the custom FixFileName component, which normalizes the filename for the incoming file (it removes the path and the extension).
The actual work is done in the Execute method of the IComponent interface. The remaining interfaces either must be implemented but are not used (IPersistPropertyBag), or are used by the designer environment (IComponentUI). The IBaseComponent interface supplies basic information about the component, such as Name and Version.
    IBaseMessage IComponent.Execute(IPipelineContext pContext, IBaseMessage pInMsg)
    {
        // make sure we have something to work with
        if (pInMsg == null)
        {
            return pInMsg;
        }

        string receivedFileName = pInMsg.Context.Read("ReceivedFileName", "http://schemas.microsoft.com/BizTalk/2003/file-properties") as string;
        if (string.IsNullOrEmpty(receivedFileName))
        {
            // nothing we can do
            return pInMsg;
        }

        string newFileName = Path.GetFileNameWithoutExtension(receivedFileName);
        pInMsg.Context.Promote("ReceivedFileName", "http://schemas.microsoft.com/BizTalk/2003/file-properties", newFileName);
        return pInMsg;
    }
One thing to remember is to Promote the property you're writing; otherwise you cannot use it for routing.
Another thing about pipeline components is: BizTalk expects them to be deployed to a specific location on the filesystem, the Pipeline Components directory of the BizTalk installation directory (the default location is: C:\Program Files\Microsoft BizTalk Server 2006\Pipeline Components).
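The normalization that FixFileName performs (dropping the path and the extension so that both halves of a scan share one key) can be tried outside BizTalk. The sketch below is a standalone illustration, not the component itself. The class FileNameNormalizer is a hypothetical name. It splits on both path separators by hand, so that a Windows path gives the same result whichever operating system runs the code.

```csharp
using System;

// Hypothetical standalone version of the normalization step.
public static class FileNameNormalizer
{
    public static string Normalize(string receivedFileName)
    {
        if (string.IsNullOrEmpty(receivedFileName))
        {
            return receivedFileName; // nothing to normalize
        }

        // Drop everything up to and including the last path separator.
        int lastSeparator = receivedFileName.LastIndexOfAny(new[] { '\\', '/' });
        string fileName = receivedFileName.Substring(lastSeparator + 1);

        // Drop the extension, meaning everything from the last dot onwards.
        int lastDot = fileName.LastIndexOf('.');
        return lastDot >= 0 ? fileName.Substring(0, lastDot) : fileName;
    }
}
```

With this, C:\drop\scan001.pdf and C:\drop\scan001.xml both normalize to scan001, which is the value the correlation needs to see on both messages.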
FixFileName.cs
Custom component for encoding properties
The component for encoding the properties is a standard .NET assembly. It is not necessary to implement any interfaces; any assembly works as long as it is referenced in the BizTalk orchestration project. The only requirement is that the assembly is deployed to the GAC.
This project required some additional encoding of the XML values, besides the normal HTML encoding, to encode the characters used by BizTalk macros and to strip characters which are illegal in filenames in a document library.
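As a rough illustration of what such an encoder can look like, here is a minimal sketch. It is not the contents of PropertyEncoder.cs. The class PropertyEncoderSketch is a hypothetical name, and two choices are assumptions: the percent sign used by BizTalk macros is simply dropped, and the list of characters stripped from filenames is the set SharePoint is generally known to reject.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical sketch of an encoder; the real PropertyEncoder may differ.
public static class PropertyEncoderSketch
{
    // Assumed set of characters that SharePoint rejects in file names.
    private const string IllegalFileNameChars = "~\"#%&*:<>?/\\{|}";

    // Encodes a value for use inside the properties XML.
    public static string Encode(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return string.Empty;
        }

        // '%' delimits BizTalk macros; this sketch drops it (assumption),
        // then HTML-encodes the rest.
        return WebUtility.HtmlEncode(value.Replace("%", string.Empty));
    }

    // Removes characters that are not allowed in a document library file name.
    public static string EncodeFileName(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return string.Empty;
        }

        var result = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            if (IllegalFileNameChars.IndexOf(c) < 0)
            {
                result.Append(c);
            }
        }

        // Leading and trailing dots or spaces are trimmed as well.
        return result.ToString().Trim('.', ' ');
    }
}
```

Stripping characters can make two different scan IDs collapse into the same filename, so replacing each illegal character with a placeholder such as an underscore may be the safer choice when IDs are not strictly alphanumeric.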
PropertyEncoder.cs
See also:
Information about convoys
Developing pipeline components