Hi all,

I have a csv that has over 800,000 records that gets loaded into my db via SSIS package routines. However, there are about 190 records that I do not want to be loaded into the the database. The csv comes from an external source and is unaltered at package runtime. In order to not load these records, I made a table to control load variables but I'm not sure how to use it nor do I know if it is the right approach. Basically my thoughts are:`

 if a record identifier from the csv file matches the record identifier from the Exclusion table, 
 exclude it, 
 else load db. 

If someone can set me straight on a proper approach I would truly appreciate it.

There are several approaches you could take. It all depends if you want to exclude the rows before the load. Given the small proportion of rows to exclude, I would suggest you just go ahead and load the entire CSV into the database, then as a next step, join that table to your exclusion list and delete.

The statement might look something like this:

DELETE x
FROM dbo.my_Loaded_CSV_Data_Table x
INNER JOIN dbo.my_Exclusion_Table y
on x.myTableId = y.excludeId

That way you don't take the performance hit of doing a lookup inside the SSIS package.

commented: Awesome suggestion, thank you so much! +2

Thank you so much BitBit. I'm going to give your suggestion a go and see the logic of the statement. I will be forever learning and I am so glad that there are sites like this to aid one's learning.

Glad to help. Happy coding!

Hi BitBit, I know this is just a staging table but I am not wanting to delete rows that shouldn't be deleted. I've used the following to select the rows in the staging table that shouldn't be there

select  x.vSeries_Geography,x.vSeries_Type
from dbo.tblCPIStaging x
right outer join dbo.tblVSeries_Exclusion y
on x.vSeries_Geography=y.Exclusion_GEO
and x.vSeries_Type = ltrim(rtrim(y.Exclusion_Type))

Does this seem right? If I understand my revision, I am selecting all geography and type records from tblCPIStaging that match the geography and type records from tblVSeries_Exclusion. If I use an inner join, it seems that the same number of rows are selected as well.

So, do you suppose the following would work right?

DELETE x
FROM dbo.tblCPIStaging x
INNER JOIN d bo.tblVSeries_Exclusion y
ON  x.vSeries_Geography=y.Exclusion_GEO
AND x.vSeries_Type = ltrim(rtrim(y.Exclusion_Type))

Well, I went for it and everything appears to be good.

Yes, you did great! The good news is that since it was a staging table, even if you'd completely trashed it, you could still just reload from the original source.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.