Hello everybody,
so, I realized that when I try to load some XML into a MySQL table through the LOAD XML LOCAL INFILE ...
statement, if there is a parent with the same name of the child, MySQL will insert an extra row. Here's the example to reproduce the issue:
<fruit>
<fruit>
<name>Orange</name>
<variety>Valencia</variety>
</fruit>
<fruit>
<name>Apple</name>
<variety>Fuji</variety>
</fruit>
<fruit>
<name>Lemon</name>
<variety>Eureka</variety>
</fruit>
</fruit>
And the SQL:
CREATE TABLE IF NOT EXISTS `tmp_fruits`(
`name` VARCHAR(50) NOT NULL,
`variety` VARCHAR(50) NOT NULL
) ENGINE = MEMORY;
LOAD XML LOCAL INFILE 'data.xml' INTO TABLE tmp_fruits ROWS IDENTIFIED BY '<fruit>';
SELECT * FROM `tmp_fruits`;
I got four rows instead of three:
+--------+----------+
| name | variety |
+--------+----------+
| Orange | Valencia |
| Apple | Fuji |
| Lemon | Eureka |
| Lemon | Eureka |
+--------+----------+
4 rows in set (0.00 sec)
I think it happens because it matches the parent node (1) and the children nodes (3):
<fruit> <!-- parent -->
<fruit> <!-- child -->
<name>Orange</name>
<variety>Valencia</variety>
</fruit>
<fruit> <!-- child -->
<name>Apple</name>
<variety>Fuji</variety>
</fruit>
<fruit> <!-- child -->
<name>Lemon</name>
<variety>Eureka</variety>
</fruit>
</fruit>
How this could be solved? The XML is an example, I cannot change the parent to a plural <fruits>
at the origin, as it's generated by an external source and defined by a namespace, maybe I could alter the received file or extract those nodes, but I'm not happy with these solutions.
NOTE
ExtractValue()
works fine and matches only three records, example:
SET @xml_data := load_file('data.xml');
SELECT replace(ExtractValue(@xml_data, '//fruit/child::name'), ' ', ',') AS `name`
, replace(ExtractValue(@xml_data, '//fruit/child::variety'), ' ', ',') AS `variety`;
+--------------------+----------------------+
| name | variety |
+--------------------+----------------------+
| Orange,Apple,Lemon | Valencia,Fuji,Eureka |
+--------------------+----------------------+
1 row in set (0.03 sec)
So this is already something to consider. Currently I'm inclined to submit this to the MySQL bug list, but I would like to know what you think about it.
Thanks, for your time.