Hello experts,
I am trying to extract data from an XML file using XSLT. I want to write a general stylesheet that can handle similar XML files that differ slightly from each other.
The XML I am working with can have the following four scenarios for the [FUNCTION] field, which is the part I am trying to extract. [FUNCTION] may be at the start, in the middle, or at the end of the comment. If I use tokenize with the delimiter ';', the problem is that a ';' sometimes occurs inside the [FUNCTION] text itself, as it does here:
<GBSeq_comment>On or before Feb 16, 2007 this sequence version replaced gi:121945493, gi:121751.; [FUNCTION] Facilitative glucose transporter. This isoform may be responsible for constitutive or basal glucose uptake. Has a very broad substrate specificity; can transport a wide range of aldoses including both pentoses and hexoses.; [SUBCELLULAR LOCATION] Cell membrane; Multi-pass </GBSeq_comment>
or
<GBSeq_comment>On or before Feb 16, 2007 this sequence version replaced gi:121945493, gi:121751.; [FUNCTION] Facilitative glucose transporter. This isoform may be responsible for constitutive or basal glucose uptake. Has a very broad substrate specificity; can transport a wide range of aldoses including both pentoses and hexoses.</GBSeq_comment>
or
<GBSeq_comment>[FUNCTION] Facilitative glucose transporter. This isoform may be responsible for constitutive or basal glucose uptake. Has a very broad substrate specificity; can transport a wide range of aldoses including both pentoses and hexoses.</GBSeq_comment>
or
<GBSeq_comment>[FUNCTION] Facilitative glucose transporter. This isoform may be responsible for constitutive or basal glucose uptake. Has a very broad substrate specificity; can transport a wide range of aldoses including both pentoses and hexoses.; [SUBCELLULAR LOCATION] Cell membrane; Multi-pass </GBSeq_comment>
I want to write code that works for all four of these. I have the following XSLT code, which works for scenarios 1 and 3 (thanks to xml_looser) but not for 2 and 4.
The code is:
<xsl:for-each select="GBSeq_comment">
  <field name="protein_function">
    <xsl:choose>
      <xsl:when test="contains(.,'[FUNCTION]') and contains(.,'; [')">
        <xsl:value-of select="substring-before(substring-after(.,'; [FUNCTION] '),'; [')"/>
      </xsl:when>
      <xsl:when test="contains(.,'[FUNCTION] ')">
        <xsl:value-of select="substring-after(.,'[FUNCTION] ')"/>
      </xsl:when>
    </xsl:choose>
  </field>
</xsl:for-each>
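One possible direction, given only as a sketch: take everything after the first '[FUNCTION] ' marker, and only then check whether another '; [' section follows. This assumes the real data contains the plain '[FUNCTION] ' marker with no extra markup around it, and that the function text never contains the sequence '; [' itself. The `<doc>` wrapper, the `//GBSeq_comment` path, and the `tail` variable name are illustrative placeholders, not part of the stylesheet above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical entry point; adapt the match and path to the real document -->
  <xsl:template match="/">
    <doc>
      <!-- Only process comments that actually contain the marker -->
      <xsl:for-each select="//GBSeq_comment[contains(., '[FUNCTION] ')]">
        <!-- Everything after the first '[FUNCTION] ', wherever it sits -->
        <xsl:variable name="tail" select="substring-after(., '[FUNCTION] ')"/>
        <field name="protein_function">
          <xsl:choose>
            <!-- Another bracketed section follows: cut at its '; [' delimiter -->
            <xsl:when test="contains($tail, '; [')">
              <xsl:value-of select="substring-before($tail, '; [')"/>
            </xsl:when>
            <!-- [FUNCTION] is the last section: keep the whole tail -->
            <xsl:otherwise>
              <xsl:value-of select="$tail"/>
            </xsl:otherwise>
          </xsl:choose>
        </field>
      </xsl:for-each>
    </doc>
  </xsl:template>
</xsl:stylesheet>
```

Because the search for '; [' runs only on the text after '[FUNCTION] ', a '; [' that appears earlier in the comment (as in scenario 2) no longer matters, and a plain ';' inside the function text is left alone. It uses only XSLT 1.0 string functions, so tokenize is not needed.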
Could anyone please help me? I greatly appreciate your help and your time.
Thank you,
Sammed