I am new to Python programming and am struggling with a problem I would like help with. I have multiple text files that I would like to join, using the first column of each file as the key to align the rows. Each file could be several hundred lines long, and the files SHOULD all have the same number of lines. The first line of each file can be omitted from the output file. There may be extra text ("Possible superstructure of XX") in one or both files, which can also be omitted. The first character of the first column (the ">") can be dropped as well.
File 1 looks like
>SU
>PD-98059 PD-98059 Tanimoto from SU = 0.129213
>BML-265 BML-265 Tanimoto from SU = 0.163743
>BML-257 BML-257 Tanimoto from SU = 0.156627
>SU 4312 SU 4312 Tanimoto from SU = 1
Possible superstructure of SU
>AG-370 AG-370 Tanimoto from SU = 0.264286
>AG-490 AG-490 Tanimoto from SU = 0.347826
File 2 looks like
>GF
>PD-98059 PD-98059 Tanimoto from GF = 0.118483
>BML-265 BML-265 Tanimoto from GF = 0.164179
>BML-257 BML-257 Tanimoto from GF = 0.213904
>SU 4312 SU 4312 Tanimoto from GF = 0.436364
>AG-370 AG-370 Tanimoto from GF = 0.284848
>AG-490 AG-490 Tanimoto from GF = 0.307692
The output file including headers would look like
ID SU GF
PD-98059 0.129213 0.118483
BML-265 0.163743 0.164179
BML-257 0.156627 0.213904
SU 4312 1 0.436364
AG-370 0.264286 0.284848
AG-490 0.347826 0.307692
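The join described above can be sketched roughly as follows. This is only a minimal sketch, and it rests on assumptions taken from the sample data: every data line has the shape ">ID ID Tanimoto from NAME = VALUE" with the ID written twice, the first line of each file (">SU", ">GF") supplies the column header, and any line that does not match that shape (such as "Possible superstructure of SU") is skipped. The names parse_scores and join_two, the tab delimiter (chosen because IDs like "SU 4312" contain a space), and the "NA" placeholder for a missing ID are hypothetical choices made for illustration.

```python
import re

# Assumed line shape: ">ID ID Tanimoto from NAME = VALUE" (the ID appears twice).
LINE_RE = re.compile(r"^>(.+?)\s+\1\s+Tanimoto from\s+.+?\s*=\s*(\S+)\s*$")


def parse_scores(lines):
    """Return (column_name, {id: value}) for the lines of one file."""
    lines = [line.rstrip("\n") for line in lines]
    # The first line (e.g. ">SU") is used only as the column header.
    column_name = lines[0].lstrip(">").strip()
    scores = {}
    for line in lines[1:]:
        match = LINE_RE.match(line)
        if match:  # non-matching lines (extra text, blanks) are skipped
            scores[match.group(1)] = match.group(2)  # value kept as text
    return column_name, scores


def join_two(lines_a, lines_b):
    """Join two files on the ID; row order follows the first file."""
    name_a, scores_a = parse_scores(lines_a)
    name_b, scores_b = parse_scores(lines_b)
    rows = ["\t".join(["ID", name_a, name_b])]
    for key, value in scores_a.items():
        # "NA" is an assumed placeholder for an ID missing from the second file.
        rows.append("\t".join([key, value, scores_b.get(key, "NA")]))
    return rows
```

Because parse_scores accepts any iterable of lines, an open file object can be passed to it directly, for example inside `with open(path) as handle:`.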
At this point I would like to join another file to this output file, adding a third column and header. I will need to repeat this process many times, building a large text file in which the number of columns equals the number of lines. I am trying to build a distance matrix for another application. I hope someone finds this a worthwhile challenge and can offer a solution. Any help will be appreciated.
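For the repeated step, one possible approach (an assumption, not the only way to do it) is to keep every file's values in memory and write the wide table once at the end, rather than re-reading and extending the output file each time. The sketch below assumes each input file has already been reduced to a (column name, {ID: value}) pair. The names build_table and write_table, the tab delimiter, and the "NA" placeholder are hypothetical.

```python
import csv


def build_table(columns):
    """Build rows from a list of (name, {id: value}) pairs, one pair per file.

    Row order follows the first file; an ID missing from a later file
    is filled with "NA" (an assumed placeholder).
    """
    if not columns:
        return []
    ids = list(columns[0][1])  # dicts preserve insertion order
    table = [["ID"] + [name for name, _ in columns]]
    for key in ids:
        table.append([key] + [scores.get(key, "NA") for _, scores in columns])
    return table


def write_table(table, path):
    """Write the rows as tab-delimited text (IDs such as "SU 4312" contain spaces)."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(table)
```

With one pair per input file, the finished table has one column per file plus the ID column, so as many files as there are IDs gives the square layout a distance matrix needs.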