Wednesday, October 8, 2008

How to Create an SSIS Package


Importing data from Excel into a SQL Server database using an SSIS 2005 package

Integration Services, which replaces Data Transformation Services (DTS) in SQL Server 2005, is a wonderful tool for extracting, transforming, and loading data. Common uses for Integration Services include loading data into the database; transforming data into or out of your relational database structures; loading your data warehouse; and moving data out of your database to other databases or types of storage. This article describes how you can use SQL Server 2005 Integration Services (SSIS) to load an Excel file into your database.
Note: There are several wizards that come with SQL Server Management Studio to aid you in the import and export of data into and out of your database. I will not look at those wizards; I will focus on how you can build a package from scratch so that you don’t have to rely on the wizards.
To begin, I open SQL Server Business Intelligence (BI) Development Studio, a front-end tool that is installed along with SQL Server 2005. BI Development Studio is a scaled-down version of Visual Studio. I then select New Integration Services Project and give the project a name.

When the project opens, you will see an environment that may look familiar if you have used SQL Server DTS; some of the items in the Toolbox are the same. For this project, I drag the Data Flow task item from the Toolbox onto the Control Flow tab. (The Data Flow task is one of the major differences between DTS and SSIS packages. In an SSIS package, you control the order in which your package logic runs on the Control Flow tab. When you need to manage the data aspects of your project, you use a Data Flow task. You can have several Data Flow tasks in your project, all of which reside on the Control Flow tab.)


Double-click the Data Flow task that you dragged onto the Control Flow tab. This opens the Data Flow tab, and the options in the Toolbox change: Data Flow Sources, Data Flow Destinations, and Data Flow Transformations are now available. Since I am going to import an Excel file into the database, I drag the Excel Source item from the Toolbox onto the design surface.


The Excel Source item represents an Excel file that I will import from somewhere on my network. Now I need somewhere to put the data. Since my plan is to put the data into the database, I need a Data Flow Destination. For this example, I choose SQL Server Destination from the Data Flow Destinations section of the Toolbox and drag it onto the Data Flow tab.

To designate which Excel file I want to import, I double-click the Excel Source item that I moved onto the screen. From there, I find the Excel file on the network that I want to import.


I also need to designate the sheet from the Excel file that I want to import, along with the columns from the sheet that I want to use.



Now that I have defined my Excel source, I need to define my SQL Server destination. Before doing that, I need to indicate the Data Flow Path from the Excel file to the SQL Server destination; this will allow me to use the structure of the data defined in the Excel Source to model my SQL Server table that I will import the data into. To do this, I click the Excel Source item and drag the green arrow onto the SQL Server Destination item.


To define the destination, I double-click the SQL Server Destination item. There I specify the server I will import the data into, along with the database in which the data will reside.

I also need to define the table that I will insert the Excel data into. I will create a new table named SalesHistoryExcelData.

Under the Mappings section, I define the relationship between the Input Columns (the Excel data) and the Destination Columns (my new SQL Server table).
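The mapping step can be pictured outside the designer as a lookup from source column names to destination column names. The Python sketch below only illustrates that idea and is not how SSIS works internally: an in-memory SQLite database stands in for SQL Server, and the column names and sample rows are hypothetical, since this article does not list the spreadsheet's columns.

```python
import sqlite3

# Hypothetical mapping: Excel (input) column -> destination table column.
COLUMN_MAP = {
    "Product": "Product",
    "Sale Date": "SaleDate",
    "Sale Price": "SalePrice",
}


def load_rows(conn, table, rows, column_map):
    """Insert source rows into `table`, renaming columns via `column_map`."""
    dest_cols = list(column_map.values())
    placeholders = ", ".join("?" for _ in dest_cols)
    sql = f"INSERT INTO {table} ({', '.join(dest_cols)}) VALUES ({placeholders})"
    # Pull values out of each source row in the same order as dest_cols.
    params = [tuple(row[src] for src in column_map) for row in rows]
    conn.executemany(sql, params)
    conn.commit()
    return len(params)


def demo():
    # SQLite stands in for the SQL Server destination (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SalesHistoryExcelData "
        "(Product TEXT, SaleDate TEXT, SalePrice REAL)"
    )
    # Made-up rows playing the role of the Excel sheet.
    excel_rows = [
        {"Product": "Computer", "Sale Date": "2008-01-15", "Sale Price": 999.0},
        {"Product": "Printer", "Sale Date": "2008-02-03", "Sale Price": 149.5},
    ]
    inserted = load_rows(conn, "SalesHistoryExcelData", excel_rows, COLUMN_MAP)
    return conn, inserted
```

Calling demo() returns the connection and the number of rows inserted; a source column that is missing from a row raises a KeyError, much as an unmapped required column stops the package in the designer.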

Once I successfully define the inputs and outputs, my screen will look like the one below. All I need to do now is run the package and import the data into the new table by clicking the green arrow in the top-middle of the screen, which executes my package.


The figure below shows that my package executed successfully and that 30,000 records from my Excel Source item were transferred to my SQL Server destination.


You can download the Excel file I used for this article.

Monday, August 25, 2008

Creating an SSIS Project and adding a Package

The starting point in creating a package is to create a Business Intelligence project using the Integration Services Project template, a standard template installed with Visual Studio 2005, as shown in Figure 3. Here the project has been given the name Editor Basics.
Figure 3

This creates the EditorBasics project in the Solution Explorer as shown below. It comes with the folders Data Sources, Data Source Views, and SSIS Packages. You can expand the SSIS Packages node to see its contents. Right-clicking this node reveals a context menu from which you can do a number of things. Click the New SSIS Package menu item.
Figure 4

You could also create a new package as shown in Figure 5. This figure shows other details for the Package1.dtsx [Design] tab as well.

Figure 5

The package designer consists of the following tabs: Control Flow, Data Flow, Event Handlers, and Package Explorer. Other items may appear during package development. The grayed text on the Control Flow tab explains how to configure this part of the package. The pane at the bottom, called Connection Managers, is where the connections are placed.

Figure 6

The Data Flow page shows the data flow tasks that are needed by the package. You can click on the link to add the Data Flow task(s) as shown below.

Figure 7


On the Event Handlers page, you can attach an event handler to each configured task (the default event is OnError).

Figure 8

The Package Explorer is an explorer-style list of all items in the package.

Figure 9


The basic steps consist of configuring the Control Flow and Data Flow pages in the designer. The control flow configuration starts by creating a table with the same schema as Oracle 10g XE's "Departments" table using a SQL statement. Once this SQL statement is created and executed, the table is created (with no data) in the SsisEditor database on SQL Server 2005.

The Data Flow is configured through the OLE DB editors for the two servers. To complete that configuration, however, all information, including the tables in the two databases, must be specified. Unless an empty table with the same structure as the Oracle table already exists in SQL Server 2005, the configuration cannot be completed.

To complete the package, the Control Flow task is therefore executed first to create the table in SQL Server 2005, and then the design of the Data Flow task is completed. This method is used in this tutorial to keep the procedure simple.
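The ordering described above (create the empty table first, then run the data flow) can be sketched in Python. This is a conceptual stand-in rather than SSIS itself: two in-memory SQLite databases play the roles of the Oracle source and the SQL Server destination, and the Departments column list and sample rows are assumptions modelled on Oracle's HR sample schema, not taken from this tutorial.

```python
import sqlite3

# Assumed column layout for the Departments table (hypothetical).
DEPARTMENT_COLUMNS = ["DEPARTMENT_ID", "DEPARTMENT_NAME", "MANAGER_ID", "LOCATION_ID"]


def create_empty_table(db):
    """Step 1 (control flow): create the Departments table with no data."""
    db.execute(
        "CREATE TABLE Departments ("
        "DEPARTMENT_ID INTEGER PRIMARY KEY, "
        "DEPARTMENT_NAME TEXT NOT NULL, "
        "MANAGER_ID INTEGER, "
        "LOCATION_ID INTEGER)"
    )


def copy_rows(source, dest):
    """Step 2 (data flow): read rows from the source, write them to the destination."""
    cols = ", ".join(DEPARTMENT_COLUMNS)
    rows = source.execute(f"SELECT {cols} FROM Departments").fetchall()
    marks = ", ".join("?" for _ in DEPARTMENT_COLUMNS)
    dest.executemany(f"INSERT INTO Departments ({cols}) VALUES ({marks})", rows)
    dest.commit()
    return len(rows)


def run_package():
    source = sqlite3.connect(":memory:")  # stand-in for Oracle 10g XE
    dest = sqlite3.connect(":memory:")    # stand-in for the SsisEditor database
    # Build a small sample source table (made-up rows).
    create_empty_table(source)
    source.executemany(
        "INSERT INTO Departments VALUES (?, ?, ?, ?)",
        [(10, "Administration", 200, 1700), (20, "Marketing", 201, 1800)],
    )
    create_empty_table(dest)              # must run before the data flow
    return dest, copy_rows(source, dest)
```

In this sketch, calling copy_rows before create_empty_table on the destination raises sqlite3.OperationalError because the table does not exist yet, which mirrors why the designer cannot finish the Data Flow configuration until the empty table is present.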

Here is a list of preparatory items that are needed for this project to succeed.

Both SQL Server 2005 and Oracle 10g XE should be functioning correctly.

SQL Server 2005 should be properly configured and tested.

Authentication and permissions should be in place for the objects accessed.

An instance of Visual Studio 2005 should be available.

Since SQL Server 2005 is the destination, the database that will receive the table should be in place; create an empty database named SsisEditor.

The T-SQL script to create a table in SQL Server 2005, which is a copy of the table from Oracle 10g XE, should be available or capable of being created in the IDE. For both Oracle and SQL Server 2005, OLE DB providers will be used.
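As a rough aid for preparing that T-SQL script, the Python sketch below generates a CREATE TABLE statement from a list of Oracle column definitions. Both the type mapping and the Departments column list are assumptions (the columns follow Oracle's HR sample schema; this tutorial does not list them), so check the output against the actual source table before running it.

```python
import re


def to_sqlserver_type(oracle_type):
    """Map a few common Oracle types to SQL Server 2005 types (assumed mapping)."""
    t = oracle_type.strip().upper()
    m = re.fullmatch(r"VARCHAR2\((\d+)\)", t)
    if m:
        return f"varchar({m.group(1)})"
    m = re.fullmatch(r"NUMBER\((\d+)\)", t)
    if m:
        return f"numeric({m.group(1)}, 0)"
    m = re.fullmatch(r"NUMBER\((\d+),\s*(\d+)\)", t)
    if m:
        return f"numeric({m.group(1)}, {m.group(2)})"
    if t == "DATE":
        return "datetime"
    raise ValueError(f"No mapping defined for Oracle type: {oracle_type}")


def build_create_table(table, columns):
    """Build T-SQL DDL; `columns` is a list of (name, oracle_type, nullable)."""
    lines = []
    for name, oracle_type, nullable in columns:
        null_sql = "NULL" if nullable else "NOT NULL"
        lines.append(f"    [{name}] {to_sqlserver_type(oracle_type)} {null_sql}")
    return f"CREATE TABLE [{table}] (\n" + ",\n".join(lines) + "\n);"


# Hypothetical column list, modelled on Oracle's HR sample schema.
DEPARTMENTS = [
    ("DEPARTMENT_ID", "NUMBER(4)", False),
    ("DEPARTMENT_NAME", "VARCHAR2(30)", False),
    ("MANAGER_ID", "NUMBER(6)", True),
    ("LOCATION_ID", "NUMBER(4)", True),
]

ddl = build_create_table("Departments", DEPARTMENTS)
print(ddl)
```

Types that the mapping does not cover raise a ValueError rather than being guessed, so an unexpected column type is noticed before the script reaches the server.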

Saturday, July 12, 2008