Working with MongoDB and SSIS

A little over a year ago I saw a presentation on NoSQL (i.e. Not Only SQL), and specifically MongoDB. MongoDB is an open source document oriented database, and like many things related to NoSQL it seems to have a buzz about it. In this blog post I will demonstrate using SSIS and MongoDB.

The Basics

I am not going to detail installing MongoDB and related drivers, the MongoDB already does fair job and I recommend following their documentation. For the purposes of this demonstration I installed MongoDB and the C# drivers.

Once the MongoDB prerequisites have been installed, create an SSIS package and add a Data Flow. In the data flow I added a flat file source that will extract a list of countries from a CSV file. Then add a Script Component and select “Destination” as the Script Component Type.

Data Flow

Next open the script editor and add the following references and usings:

  1. MongoDB.Bson
  2. MongoDB.Driver

The Code

I created a Country class as part of my ETL, but this is not necessary and I would refer one to the C# driver tutorial for other approaches. In a pretty concise manner, I was able to connection the server, database and add the country data.

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

using MongoDB.Bson;
using MongoDB.Driver;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{

    public override void InputCountry_ProcessInputRow(InputCountryBuffer Row)
    {
        var connectionString = "mongodb://localhost";
        var mongo = MongoServer.Create(connectionString);

        var db = mongo.GetDatabase("MyMongoDb");

        MongoCollection<Country> countries = db.GetCollection<Country>("Countries");

        var country = new Country { CountryCode = Row.CountryCode, CountryName = Row.CountryName };

        countries.Insert(country);
    }

}

public class Country
{
    public string CountryCode { get; set; }
    public string CountryName { get; set; }
}

In this demonstration I showed how to use SSIS and create an ETL with MongoDB. This turned out pretty easy as I was able to download, install, and create this proof of concept on a Sunday. Not to mention juggling other weekend chores.

Posted in SSIS, Uncategorized | Tagged , | Comments Off

Using Windows Azure Marketplace Data with SSIS

In this blog post I will describe using Microsoft Windows Azure Marketplace Data and SSIS. The Azure Marketplace allows one to buy and sell Software-as-a-Service applications and datasets. Most of the services are available for a subscription fee, but there are also several offered for free.

For the purposes of this demonstration I will use data from the Practice Fusion Medical Research Data, which is available for free. This data is available for research purposes and contains HIPPA compliant de-identified data. I have a personal interest in Autism and want to direct some after-work energy seeking out and analyzing Autism data. Unfortunately what I extracted from the sample dataset thus far does not have any Autism related data. Just the same it has been a good exercise in doing ETL with data services.

Add variables

Variables

Add three variables to your package. One is for the account key and one for the User ID, which can found in the My Account section. The third variable is the service root. This can be found in the details section of the service one is subscribing to.

Add Data Flow & Script Component

Add a Data Flow to a package. Inside the Data Flow, add a Script Component and select Script Component Type of Source.

Add Script Component

Add the appropriate columns you want to extract from the service.

Add Columns

Add a service reference and use the Service Root URL.

Add service reference.

Consume the Data with Managed Code.

At this point the data can be queried with a Linq expression and output rows generated. For more information see the MSDN documentation on creating application with Azure Marketplace Data.

using System;
using System.Collections;
using System.Data;
using System.Linq;
using System.Net;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

using SC_c0e4421be5a54826af3fe68d3227f2be.PracticeFusion;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void CreateNewOutputRows()
    {
        var serviceUri = new Uri(Variables.ServiceRootUrl);
        var context = new PracticeFusionMedicalResearchDataContainer(serviceUri);

        context.Credentials = new NetworkCredential(Variables.UserID, Variables.Key);
        var diagnosis = context.SyncDiagnosis.ToList();

        foreach (SyncDiagnosis d in diagnosis)
        {
            OutputDiagnosisBuffer.AddRow();
            OutputDiagnosisBuffer.DiagnosisGuid = d.DiagnosisGuid;
            OutputDiagnosisBuffer.PatientGuid = d.PatientGuid;
            OutputDiagnosisBuffer.ICD9Code = d.ICD9Code;
            OutputDiagnosisBuffer.DiagnosisDescription = d.DiagnosisDescription;
            OutputDiagnosisBuffer.StartYear = (int)(d.StartYear > 0 ? d.StartYear : 0);
            OutputDiagnosisBuffer.StopYear = (int)(d.StopYear > 0 ? d.StopYear : 0);
            OutputDiagnosisBuffer.Acute = d.Acute == true ? true : false;
        }
    }
}

In this blog post I demonstrated how to consume data made available using the Window Azure Marketplace. I see this as useful where one would like to extract external data for the Business intelligence needs.

Posted in Business Intelligence, Data Integration, Service Oriented Architecture, SQL Azure, SSIS, Uncategorized | Tagged , , , , , | Comments Off