Friday, November 12, 2010

Debatching a flat file with multiple lines per identifier

If you're new to debatching a flat file (i.e. you receive one flat file containing multiple messages, all of which you have to process separately in BizTalk), there are plenty of good blogs around to get you started. One good example is this one: http://geekswithblogs.net/benny/archive/2006/02/05/68249.aspx. That post describes how to debatch a flat file where every line represents one message. Now let's take it a step further...

First let me describe the data for you: let's say it's someone's birthday and my friends and I want to give her presents. And obviously, being a good nerd, I have written an orchestration that processes the data (For what purpose? Use your imagination! I'm already happy I came up with this example!). This orchestration takes as input one XML message per friend, like this:

<Friend xmlns="http://ronaldlokers.blogspot.com/birthdaypresents">
    <Name>John</Name>
    <Presents>
        <Present>
            <Name>Flowers</Name>
            <Price>5.99</Price>
        </Present>
        <Present>
            <Name>Teapot</Name>
            <Price>11.99</Price>
        </Present>
    </Presents>
</Friend>
The input for my program

Now a friend of mine heard about my plans and to be helpful he started to collect the data in a flat file that looks like this:

Mary;Shawl;14.50
Mary;Headphones;12.99
Christian;Ring;99.50
John;Flowers;5.99
John;Teapot;11.99
Marcy;Red lipstick;4.99
Marcy;Purple lipstick;4.99
Marcy;Black lipstick;4.99
Flat file format 1

There are multiple lines per friend! The standard flat file disassembler has no way to group lines by the friend's name, so on its own it can't convert this flat file into the XML I need!

Of course I could write a custom pipeline component that replaces the standard flat file disassembler component, but another thought kept spinning in my mind... I could easily use the standard flat file disassembler component if my input file looked like this:

Mary|Shawl;14.50|Headphones;12.99
Christian|Ring;99.50
John|Flowers;5.99|Teapot;11.99
Marcy|Red lipstick;4.99|Purple lipstick;4.99|Black lipstick;4.99
Flat file format 2

So every line holds one friend: the name and each present are separated by a '|', while the fields of each present (name and price) are separated by a semicolon.
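To make the two delimiter levels concrete, here is a minimal sketch that splits one line of flat file format 2 by hand. The class and method names are hypothetical and only for illustration; in the actual pipeline this parsing is done by the standard flat file disassembler, driven by a flat file schema.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of flat file format 2. In the pipeline the
// standard flat file disassembler does this work; this class only shows
// how the '|' and ';' delimiters nest.
public static class Format2Sketch
{
    // Splits one line into the friend's name (Key) and the presents (Value).
    // Each present is a string array: [0] = present name, [1] = price.
    public static KeyValuePair<string, List<string[]>> ParseLine(string line)
    {
        // The first '|'-separated entry is the friend, the rest are presents
        string[] records = line.Split('|');

        List<string[]> presents = new List<string[]>();
        for (int i = 1; i < records.Length; i++)
            presents.Add(records[i].Split(';'));

        return new KeyValuePair<string, List<string[]>>(records[0], presents);
    }
}
```

Calling ParseLine with the line for John gives the key "John" and two presents, Flowers and Teapot, which is exactly the content of the XML message shown earlier.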

With this in mind, it was quite easy to write a custom pipeline component for the Decode stage of a pipeline. This component reads flat file format 1, joins the consecutive lines that share the same key, and produces flat file format 2 as I described above. Here is the code of the component as I wrote it. Just disregard all the 'component' stuff and scroll right down to the Execute method, where all the hard work is done.

using System;
using System.IO;
using System.Text;
using System.Drawing;
using System.Resources;
using System.Reflection;
using System.Diagnostics;
using System.Collections;
using System.ComponentModel;
using Microsoft.BizTalk.Message.Interop;
using Microsoft.BizTalk.Component.Interop;
using Microsoft.BizTalk.Component;
using Microsoft.BizTalk.Messaging;
using Microsoft.BizTalk.Component.Utilities;

namespace RonaldLokers.Blogspot.Com
{
    [ComponentCategory(CategoryTypes.CATID_PipelineComponent)]
    [System.Runtime.InteropServices.Guid("79e5a9b2-9abf-40d8-aa02-d4c15b218f88")]
    [ComponentCategory(CategoryTypes.CATID_Decoder)]
    public class MultiLineJoiner : Microsoft.BizTalk.Component.Interop.IComponent, IBaseComponent, IPersistPropertyBag, IComponentUI
    {
        private System.Resources.ResourceManager resourceManager = new System.Resources.ResourceManager("RonaldLokers.Blogspot.Com.MultiLineJoiner", Assembly.GetExecutingAssembly());


        private int _NumberOfKeyFields;

        public int NumberOfKeyFields
        {
            get
            {
                return _NumberOfKeyFields;
            }
            set
            {
                _NumberOfKeyFields = value;
            }
        }




        #region IBaseComponent members
        /// <summary>
        /// Name of the component
        /// </summary>
        [Browsable(false)]
        public string Name
        {
            get
            {
                return resourceManager.GetString("COMPONENTNAME", System.Globalization.CultureInfo.InvariantCulture);
            }
        }

        /// <summary>
        /// Version of the component
        /// </summary>
        [Browsable(false)]
        public string Version
        {
            get
            {
                return resourceManager.GetString("COMPONENTVERSION", System.Globalization.CultureInfo.InvariantCulture);
            }
        }

        /// <summary>
        /// Description of the component
        /// </summary>
        [Browsable(false)]
        public string Description
        {
            get
            {
                return resourceManager.GetString("COMPONENTDESCRIPTION", System.Globalization.CultureInfo.InvariantCulture);
            }
        }
        #endregion

        #region IPersistPropertyBag members
        /// <summary>
        /// Gets class ID of component for usage from unmanaged code.
        /// </summary>
        /// <param name="classid">
        /// Class ID of the component
        /// </param>
        public void GetClassID(out System.Guid classid)
        {
            classid = new System.Guid("79e5a9b2-9abf-40d8-aa02-d4c15b218f88");
        }

        /// <summary>
        /// not implemented
        /// </summary>
        public void InitNew()
        {
        }

        /// <summary>
        /// Loads configuration properties for the component
        /// </summary>
        /// <param name="pb">Configuration property bag</param>
        /// <param name="errlog">Error status</param>
        public virtual void Load(Microsoft.BizTalk.Component.Interop.IPropertyBag pb, int errlog)
        {
            object val = null;
            val = this.ReadPropertyBag(pb, "NumberOfKeyFields");
            if (val != null)
            {
                this._NumberOfKeyFields = ((int)(val));
            }
        }

        /// <summary>
        /// Saves the current component configuration into the property bag
        /// </summary>
        /// <param name="pb">Configuration property bag</param>
        /// <param name="fClearDirty">not used</param>
        /// <param name="fSaveAllProperties">not used</param>
        public virtual void Save(Microsoft.BizTalk.Component.Interop.IPropertyBag pb, bool fClearDirty, bool fSaveAllProperties)
        {
            this.WritePropertyBag(pb, "NumberOfKeyFields", this.NumberOfKeyFields);

        }

        #region utility functionality
        /// <summary>
        /// Reads property value from property bag
        /// </summary>
        /// <param name="pb">Property bag</param>
        /// <param name="propName">Name of property</param>
        /// <returns>Value of the property</returns>
        private object ReadPropertyBag(Microsoft.BizTalk.Component.Interop.IPropertyBag pb, string propName)
        {
            object val = null;
            try
            {
                pb.Read(propName, out val, 0);
            }
            catch (System.ArgumentException)
            {
                return val;
            }
            catch (System.Exception e)
            {
                throw new System.ApplicationException(e.Message);
            }
            return val;
        }

        /// <summary>
        /// Writes property values into a property bag.
        /// </summary>
        /// <param name="pb">Property bag.</param>
        /// <param name="propName">Name of property.</param>
        /// <param name="val">Value of property.</param>
        private void WritePropertyBag(Microsoft.BizTalk.Component.Interop.IPropertyBag pb, string propName, object val)
        {
            try
            {
                pb.Write(propName, ref val);
            }
            catch (System.Exception e)
            {
                throw new System.ApplicationException(e.Message);
            }
        }
        #endregion
        #endregion

        #region IComponentUI members
        /// <summary>
        /// Component icon to use in BizTalk Editor
        /// </summary>
        [Browsable(false)]
        public IntPtr Icon
        {
            get
            {
                return ((System.Drawing.Bitmap)(this.resourceManager.GetObject("COMPONENTICON", System.Globalization.CultureInfo.InvariantCulture))).GetHicon();
            }
        }

        /// <summary>
        /// The Validate method is called by the BizTalk Editor during the build
        /// of a BizTalk project.
        /// </summary>
        /// <param name="obj">An Object containing the configuration properties.</param>
        /// <returns>The IEnumerator enables the caller to enumerate through a collection of strings containing error messages. These error messages appear as compiler error messages. To report successful property validation, the method should return an empty enumerator.</returns>
        public System.Collections.IEnumerator Validate(object obj)
        {
            // example implementation:
            // ArrayList errorList = new ArrayList();
            // errorList.Add("This is a compiler error");
            // return errorList.GetEnumerator();
            return null;
        }
        #endregion


        #region IComponent members
        /// <summary>
        /// Implements IComponent.Execute method.
        /// </summary>
        /// <param name="pc">Pipeline context</param>
        /// <param name="inmsg">Input message</param>
        /// <returns>Original input message</returns>
        /// <remarks>
        /// IComponent.Execute method is used to initiate
        /// the processing of the message in this pipeline component.
        /// </remarks>
        public Microsoft.BizTalk.Message.Interop.IBaseMessage Execute(Microsoft.BizTalk.Component.Interop.IPipelineContext pc, Microsoft.BizTalk.Message.Interop.IBaseMessage inmsg)
        {
            StreamReader strIn = new StreamReader(inmsg.BodyPart.GetOriginalDataStream());
            MemoryStream memStr = new MemoryStream();
            StreamWriter strOut = new StreamWriter(memStr);


            // No key yet; null marks that no record has been started
            string key = null;
            string line;
            // Read the source line by line until the end of the file
            while ((line = strIn.ReadLine()) != null)
            {
                // Skip blank lines; they carry no key and no data
                if (line.Length == 0)
                    continue;

                if (key == null || !line.StartsWith(key, StringComparison.Ordinal))
                {
                    // This line has a different key: go to the next line in our destination (unless this is the first record)
                    if (key != null)
                        strOut.WriteLine();

                    // Start out by writing the key to the destination, and store the key in the 'key' variable
                    string[] parts = line.Split(new char[] { ';' });
                    key = parts[0];
                    for (int i = 1; i < NumberOfKeyFields; i++)
                        key += ";" + parts[i];
                    strOut.Write(key.Replace(";", "|"));
                    key += ";";
                }
                // Write the data of the non-key part of the line
                strOut.Write("|" + line.Substring(key.Length));
            }
            // Close the last record, if there was any input at all
            if (key != null)
                strOut.WriteLine();
            strOut.Flush();
            memStr.Position = 0;
            pc.ResourceTracker.AddResource(memStr);
            inmsg.BodyPart.Data = memStr;
            return inmsg;
        }
        #endregion
    }
}

As you can see, there is one property called 'NumberOfKeyFields'. In my example, there is only 1 key field (the friend's name), but if you have a 'firstname;lastname;' key, then setting this property to 2 lets the MultiLineJoiner still do the work. Note that the component only joins consecutive lines, so all lines with the same key have to be adjacent in the input file.
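If you want to try the joining logic without deploying anything to BizTalk, here is a minimal sketch of the same approach as a plain static method. The class name MultiLineJoinerSketch and the Join method are hypothetical and not part of the component; the sketch works on strings instead of message streams, uses null as the initial key, skips blank lines and always writes "\n" as the line ending.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the pipeline component: the same joining
// approach as the Execute method, but on plain strings so it can be tried
// in a console application or a unit test.
public static class MultiLineJoinerSketch
{
    public static string Join(string input, int numberOfKeyFields)
    {
        StringReader reader = new StringReader(input);
        StringWriter writer = new StringWriter();
        writer.NewLine = "\n";   // fixed line ending so the output is the same everywhere

        string key = null;       // key of the current record, including its trailing ';'
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
                continue;        // ignore blank lines

            if (key == null || !line.StartsWith(key, StringComparison.Ordinal))
            {
                if (key != null)
                    writer.WriteLine();   // finish the previous record

                // Build the key from the first numberOfKeyFields fields
                string[] parts = line.Split(';');
                key = parts[0];
                for (int i = 1; i < numberOfKeyFields; i++)
                    key += ";" + parts[i];

                // Key fields are separated by '|' in the output
                writer.Write(key.Replace(";", "|"));
                key += ";";
            }

            // Append the non-key part of the line to the current record
            writer.Write("|" + line.Substring(key.Length));
        }

        if (key != null)
            writer.WriteLine();           // finish the last record

        return writer.ToString();
    }
}
```

With numberOfKeyFields set to 2, the lines 'Smith;John;Flowers;5.99' and 'Smith;John;Teapot;11.99' are joined into 'Smith|John|Flowers;5.99|Teapot;11.99'. Like the component, the sketch assumes every line contains at least the key fields followed by some data; it does not validate the input.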
