Member Avatar for kurogumo

Trying to import a flat file into SQL Server using SSIS.

The file format is like this:

Image: C:\folder1\1234\5678.jpg

ColHead1 ColHead2 ColHead3
Data1    Data2    Data3

I need a result that looks like this:

ColHead1|ColHead2|ColHead3|ColHead4
-----------------------------------
5678    |Data1   |Data2   |Data3

So the idea is that we extract "5678" from the header portion of the flat file, and put it into the resulting SQL table.

Any ideas?

SSIS tends not to do well with multi-format input lines from text files. It can be done, but frankly if this is a one-time load it's not worth the effort. You should just manually change the file to incorporate the additional column and be done with it.

However, if this is something where you have lots of files in the same directory that all need to be loaded periodically? Different story. And are the formats of the files consistent? If so, you can use this little trick: when you are creating your Flat File Connection, in the Flat File Connection Manager Editor, there's this handy little item in the "General" area called "Header rows to skip". This will allow you to bypass the funky stuff at the beginning when you're ready to get the tabular data.

So, inside a "for each loop" container that iterates through the correct files in your directory, create a Script task to read the first line of the file and parse out the value for the first result column, then use your skip-lines flat file connection manager to then read through the rest of the file, skipping the appropriate number of lines to get to the meat of the data. Use the populated variable from above to set the value for the first column and you're all set.

Hope this gives you enough to go on!

Member Avatar for kurogumo

Thanks for the response.

Yeah, the idea is that there are lots of files using this style of formatting that need to be imported periodically. They are in a consistent format, so each file will have the same number of lines in the header with a lot of tab-delimited data following it.

I somehow managed to pull in all the data from the skip-lines flat file connection manager, but I'm having trouble getting the "5678" out.

My idea was to basically use 2 flat file connections, one that grabs the data skipping the header, and the other loading the header in to the rows and using a script component to extract the header data. After that, do a union all on the resulting sets, but its not working at the moment.

Do you have a better approach to this?

The only thing I would change is use the script object to open the flat file directly to get the "5678" rather than use a second flat file connection, populate a variable with the "5678" for inclusion when you get the tab-delimited data, then put it into the destination table during a single iteration of the "foreach loop" rather than try to union everything at the end.

Does that make sense? It sounds more convoluted than it actually is.

Member Avatar for kurogumo

Not really. ;)

What objects would you create?

Script Task -> Destination

Or

Flat File -> Script Task -> Destination

I'm confused as to how you read in the header data first, then switch to the tab-delimited read-ins.

Script Task -> Variable to get the first row of the file from the Foreach into a string (use OpenFile, FileGet to get the first row)

Variable + Flat File -> Destination to get the tabular data

Member Avatar for kurogumo

Thanks for you help again.

Do you have some code sample maybe that I can go by? I'm not exactly sure how to use the Script Task to extract the name to a variable.

Script Task -> Variable to get the first row of the file from the Foreach into a string (use OpenFile, FileGet to get the first row)

Variable + Flat File -> Destination to get the tabular data

Assuming you have set up the variables "SourceFileName" and "FirstRowOfData" already, and assuming that "SourceFileName" gets populated by your "Foreach Loop Container", use the following construct to get the "SourceFileName" value for use in your script:

dim myValue as string
myValue = CType(Dts.Variables("SourceFileName").Value, String)

Then, open the file and set Dts.Variables("FirstRowOfData").Value equal whatever you read out of the file. That should do the trick.

<edit>
Oh, almost forgot...you have to include Microsoft.SqlServer.Dts.Runtime in your imports at the beginning of your script.

Good luck
</edit>

Member Avatar for kurogumo

Would you happen to know C#? The scripts right now are written in C#.

Alas, I do not.

Member Avatar for kurogumo

Alas, I do not.

Can't ask for too much. ;)

Ok how about this, lets try step by step.

Should I start with making a variable or a script task? If its a script task, should it be a source/destination/transformation?

Let's get even more basic. What version of SQL Server Integration Services are you using?

Member Avatar for kurogumo

Microsoft SQL Server Integration Services Designer
Version 10.0.2531.0

This is what it says in Visual Studio.

The of the software includes Windows Server 2008 R2, along with SQL Server 2008 Standard Edition.

Let's get even more basic. What version of SQL Server Integration Services are you using?

Hm...then I don't quite understand your question. A script task is a script task. It's just script.

Create the ForEach Loop Container. Be sure under the "Collection" category, set the Enumerator type to "ForEach File Enumerator". Set the source folder and the file spec (e.g. "*.txt"). Under the "Variable Mappings" category, create the variables I said from above (User:FileName. You can make them Package level or scope them to the ForEach Loop Container, it doesn't matter. I'm not at work and I don't have my tools, so I'm kind of going by memory. Anyway, drop a Script Task into the ForEachLoopContainer. Leave it alone for now. Now drop a DataFlow task into the ForEachLoopContainer. Click on the Script Task, then drag the GREEN arrow until it snaps to the DataFlow Task. Now you have it set up so you can use the Script task to open the file, read the first line, populate the variable, then pass control to the Data Flow task. Inside the Data Flow task, you have to configure the Connection Manager to use the FileName variable (the iterator for the ForEachLoopContainer) to populate the connection manager. If you already know how to do this, great. If not, I'll post it later.
Anyway, next drop a DerivedColumn task in, connect the FlatFileConnectionManager to it, open it up, select the columns, then add a new column based on the Variable you populated in the Script task. Then create an output connection manager to point to your target database, connect it up and voila! you're done.
Sorry if it's not quite clear at the end...it's getting late. If you are still confused, post again and I'll try to clarify.

Member Avatar for kurogumo

Inside the Data Flow task, you have to configure the Connection Manager to use the FileName variable (the iterator for the ForEachLoopContainer) to populate the connection manager. Anyway, next drop a DerivedColumn task in, connect the FlatFileConnectionManager to it, open it up, select the columns, then add a new column based on the Variable you populated in the Script task. Then create an output connection manager to point to your target database, connect it up...

I'm up to here now. I don't know how to configure the connection manager to support the FileName variable.

Another problem I'm having is the first part where I have to extract the word out of the script. Would it be easier for you if I tried it writing the extraction code in VB? Right now, my script is doing nothing.

Member Avatar for kurogumo

Actually, can I update the structure of the flat file so its more clear for you?

Its more like this:

Begin Header
[tab]JunkJunkJunk
[tab]Image File[tab]C:\Path\To\CodeToExtract.jpg
[tab]JunkJunkJunk
End Header
Begin Raw Data
[tab]ColHead1 [tab]ColHead2 [tab]ColHead3
[tab]Row1Data1[tab]Row1Data2[tab]Row1Data3
[tab]Row2Data1[tab]Row2Data2[tab]Row2Data3
[tab]Row3Data1[tab]Row3Data2[tab]Row3Data3
End Raw Data

Quite ugly. So what I need to do is turn that into this table in a database:

NewColHead|ColHead1 |ColHead2 |ColHead3
-------------------------------------------
CodeToExtract|Row1Data1|Row1Data2|Row1Data3
CodeToExtract|Row2Data1|Row2Data2|Row2Data3
CodeToExtract|Row3Data1|Row3Data2|Row3Data3

I hope that helps a bit more.

I'm up to here now. I don't know how to configure the connection manager to support the FileName variable.

Sorry for the delay, here's what you're looking for:
After you've created your Flat File connection manager, you'll need to go to the Properties and expand the Expressions line. There should be nothing in it, so click on the little build button on the right to create a new one. When the dialog pops up, click in the data area of the "Property" column and select "ConnectionString" from the drop-down list. Then, in the Expression column, click on the little build button to get the Expression Builder dialog. Expand the list of Variables and find your variable name that you use as your FileName variable. Drag it to the "Expression:" edit area. It should come out looking like
"@[User::FileName]". Once you have that, then every time the ForEach loop iterates, it will replace the ConnectionString value with the new FileName value.
To prove to yourself it is working, you can go into the Script object, edit the script, and just hoist a MsgBox that displays the value of that variable. Once you have that, you can use that value to open files just like I said above. Use this syntax:

public void Main()
        {
            // TODO: Add your code here
            MessageBoxButtons buttons = MessageBoxButtons.YesNo;

            string message; 
            message = Dts.Variables["FileName"].Value.ToString();
            string caption = "Here is my File";
            DialogResult result;
            result = MessageBox.Show(message, caption, buttons , MessageBoxIcon.Question, MessageBoxDefaultButton.Button1);
            Dts.TaskResult = (int)ScriptResults.Success;
        }

So, the important coordination points:
Connection Manager -- Expressions, ConnectionString
ForEach Loop Container -- Variable Mappings
Script Task -- ReadOnlyVariables
Inside the Script -- see the code snippet above to get the file name.

Now, once you have the file name, you'll have to do your C# magic to open the file, read as far as you need to get the "Image" line, then the rest should work as planned.

Whew! I hope this solves your need. I'm getting tired! :)

commented: Great stuff. ;) I hope you get more rep points. ;) +1
Member Avatar for kurogumo

Thanks BitBlt. I got it solved yesterday, but I will take a look at your code to see if its similar. ;) All the discussions we had previously gave me some direction so I managed to figure it out from there. ;) I'll mark this as solved. ^^

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.