Before I really dig in ----- HAPPY NEW YEAR EVERYONE!
A few of you may remember that I have been working on producing a large wood knowledge base (woodsoftheworld.org). It is totally none profit. I am getting much closer to having enough working that soon I hope to be able to declare it at version 1 stage.
I have a large respect on how powerful SQL can be but I still have difficulties to
get my statements right, undoubtedly syntax errors. Hope someone can help with
something a few may look at as elementary. This year I want to get better at writing successful statements. Some things are hard to find exactly what a person is looking for in books or online.
--------------------------
Back months ago, I was able to get a large list of existing woods (species) from the second largest wood collection in the world, the Tervuren Xylarium in the Netherlands, over 17,000 records. I record all botanical names of wood in the 'species' table. All tables used for the project are in the 'TAXA" database.
With the gracious help from others, I was able to import the refined Tervuren list without producing a huge number of duplicates, the new total being just over 15,000 woody species. That was a huge jump in the success of the project!
In fact, I have since proven that the species brought in were so numerous that the
genus table is now missing 669 new genera (genera is plural) that are in the species file.
For those that perhaps do not know, botanical species names are made of two parts, the genus and the epithet. For instance, White Oak has a botanical name of
'Quercus alba'. The 'Quercus' is the genus name, so even if there are some missing
genera, they can be generated out of the species name. The following script does that plus shows that all missing genera in the sci_genera table have no index values .. to be expected at this stage. So far, I have been able to list all the missing genera with this script:
<code>
SELECT DISTINCT
left(species_name,locate(" ", `species_name`)-1) ,`generaID`
FROM `species`
WHERE `generaID`=0
ORDER BY `genus_name`
LIMIT 0 , 5000
</code>
Now that I can have a list of all NEW species not yet reported in the genus table (sci_genera), I need to copy all of these into the genus table so it will be complete. I will expect that the insertion process will automatically generate the
generaID values.
I have tried different things but without required results. Last night when I tried to use a 'INSERT" style of script, it came disappointingly back gave an error saying the two files were of different lengths (end of effort). Now that leaves me wondering whether the best approach is the use "INSERT", 'UPDATE, 'ALTER'(less likely) some other command, or if there is an APPEND command good to use.
If I was to generalize the needed procedure, I need to copy over to a column in one table missing data that is embodied in another column in another table. ...... So, what is the best approach to do this? The parameters that should be needed include:
'species' - The table where all species are listed.
'species_name" - The column where all species are stored.
'sci_genera' - The table where the genera names are stored.
'genus_name' - The column where the genera names are stored ((and the target to
append the derived genera from the species listing.
Remember .... you cannot equate anything to the genus prime key (generaID) because they are not yet generated for the missing genera. I especially look forward to getting a working SQL solution :-).
(Later, once the new generaID values get generated, I will want to copy them back into the species table as a foreign key --- but lets not risk confusing things. That can wait for another day).
Much thanks. My efforts in creating the TAXA Wood Knowledge Base site would not be possible without the help of the programming forums I use.
Bill Mudry
Mississauga, Ontario Canada