How can I do a bulk insert of rows but only if specific combinations of columns don't already exist?

Those columns are not unique to the table. (Otherwise I could easily use INSERT IGNORE). Rather, I just need this to happen for one particular SQL query.

As far as I know you cannot do this with a bulk insert, only a regular insert.

How could I do it with a regular insert?

Download SQLDeveloper (its free; approx. 2GB).
From there, create connection to DB, connect and workbench will open.
Try from there

Some background information:

This is going back a few years so I'll have to dig. In my former life I set up a series of maintenance scripts to manage our EMS/SCADA data bases. I had to bulk insert about 8000 records once an hour. The unique key consisted of a combination of station name, point name and point type and new points were only occasionally added. For space and efficiency reasons I generated a unique point id (numeric) for each point and kept the original key in a separate table. So every hour I had to process a new batch of points

  1. insert any new points into the pointid table (generating new IDs)
  2. create a new bulk insert text file with the original key replaced by the generated key
  3. bulk insert the new text file

I've looked around to see if I have a copy of the original code but I do not so I emailed a former co-worker to see if I can get a copy of the code (the last I heard it was still in service after more than 15 years). If I hear back I'll post the code.

Download SQLDeveloper (its free; approx. 2GB).

That's for Oracle. I'm using MySQL. I don't need an IDE though. I consider myself pretty advanced with MySQL. Just stuck on this one problem.

Jim ... no code necessary, I think I follow your logic. I think you might be misunderstanding me though, because my problem wouldn't have adding a second table as a solution. So basically right now I have a table with the columns:

id (PRIMARY, AUTO INCREMENT) | timestamp (DEFAULT CURRENT TIMESTAMP) | col_a | col_b | col_c

The table has id as a primary key and a regular index on (col_a, col_b). In one specific situation that will occur reasonably often (not via a cron job), I want to insert a row but only when (col_a, col_b) don't already exist. However, there is not a unique constraint on those columns, and there are plenty of other situations where duplicates are desired.

I could easily do this in two separate queries. First search for col_a and col_b being the values I want to add. If they don't already exist, do the insert in a separate query. I was hoping maybe I could get around it by doing something clever in a single query.

Keep in mind that as posted in my original question, I'm currently doing a batch insert. Let's say 20 rows. Instead of one batch insert of 20 rows, I would now have to do something like this:

  • Attempt to retrieve all 20 rows that have the column constraints I'm looking for
  • Loop through the results and determine which ones exist
  • Do a batch insert of the ones that are left

As I said, I was hoping there was some clever SQL solution that allowed me to do it all inside MySQL instead of having to break it out into two separate SQL queries with PHP logic inbetween.

It's possible the capabilities of bulk insert have improved since I last used them (remember I retired in 2008) but back then all you could really do was dump records into the table(s) without any bells and whistled. Sorry I couldn't have been of more help.

FYI, the query to insert only new values was

   'I know the vbCrLf is not required at the end of each line but it makes it
   'so much easier to see what's going on when debugging.

   query = " insert into Analog_Key (                                      " & vbCrLf _
         & "   Station_Name,                                               " & vbCrLf _
         & "   Device_Type,                                                " & vbCrLf _
         & "   Point_Name,                                                 " & vbCrLf _
         & "   Point_Type)                                                 " & vbCrLf _
         & " select                                                        " & vbCrLf _
         & "   WorkingTable.Station_Name,                                  " & vbCrLf _
         & "   WorkingTable.Device_Type,                                   " & vbCrLf _
         & "   WorkingTable.Point_Name,                                    " & vbCrLf _
         & "   WorkingTable.Point_Type                                     " & vbCrLf _
         & " from                                                          " & vbCrLf _
         & "   WorkingTable                                                " & vbCrLf _
         & " where not exists (                                            " & vbCrLf _
         & "   select * from Analog_Key                                    " & vbCrLf _
         & "     where                                                     " & vbCrLf _
         & "       Analog_Key.Station_Name = WorkingTable.Station_Name and " & vbCrLf _
         & "       Analog_Key.Device_Type  = WorkingTable.Device_type  and " & vbCrLf _
         & "       Analog_Key.Point_Name   = WorkingTable.Point_Name   and " & vbCrLf _
         & "       Analog_Key.Point_Type   = WorkingTable.Point_Type       " & vbCrLf _
         & "    )                                                          "

The above query relies on a previous step in which the raw data was bulk loaded into an empty working table.

The engineer who got my job when I retired (and who has moved up the corporate ladder since) was kind enough to send me my old code. I was pleased to see it is still in service and still unmodified since I left.

If you are doing some sort of script that outputs values for an insert, eg...

insert table
(
    a,
    b
) --start script -->
values
(
    1,
    2
),
(
    3,
    4
)....

then your options are likely limited.

Even if that was the case, or you have another setup in mind, if I were to tackle this, I would probably insert into a temp table, then do an insert where I join against my target table excluding the value combinations I don't want.

Otherwise, I may not be understanding what you are trying to do :-/

You can create stored procedure eg

delimiter $$
drop procedure if exists `check_before_insert`$$
create procedure `check_before_insert`(
    in p_param_1 varchar(30)
    ,in p_param_2 varchar(30)
    ,in p_param_3 varchar(30)
) begin
declare v_exists boolean default false;
select t.`p_field_1` = p_param_1 into v_exists from `some_table` t where t.`p_field_2` =  p_param_2;
if v_exists = false then
    insert into `some_table`(
        `p_field_1`
        ,`p_field_2`
        ,`p_field_3`
    ) values (
        p_param_1
        ,p_param_2
        ,p_param_3
    );
    commit;
end if;
end; $$
delimiter ;

and then call procedure for insert if not exist

call `check_before_insert`('some_value_1','some_value_2','some_value_3');
Member Avatar for Geek-Master

Dani, you might try using a combination of AndrisP's solution with an INSERT BEFORE trigger. The trigger would call his sproc and check each row before it was added for the specific column combination. Once the trigger is in place you only need to use a single query which would satisfy your origianl specification to use the bulk insert.

-Daniel

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.