I am trying to parse the HTML tags listed below to get the desired output shown at the end. So far I have tried `sed` and `grep`, but I was only able to extract the content between the opening and closing tags. I want to be more specific, as described in each item below. Any help would be really appreciated. Thanks!
1. Filter:
`<a href="http://www.gradle.org">Gradle 2.4</a> at Aug 8, 2015 6:38:46 PM</p>`
--> I want to fetch the date and time from this. My idea was to extract the text between `at` and `</p>`, but I am not sure whether that is correct.
2. Filter `<div class="percent">50%</div>` to get that 50%
--> My approach: if the tag has class="percent", extract the number inside it
3. Filter `<a href="packages/com.pratik.testing.html">com.pratik.testing</a>` to get the package name, i.e. com.pratik.testing
4. Filter `<a href="classes/com.pratik.testing.UserTest.html">UserTest</a>`
--> I want to fetch the test class that was run, i.e. UserTest (the content between the tags)
5. Filter time taken out of `<td>5.308s</td>`
6. Filter failure % out of `<td class="failures">50%</td>`
7. Filter the name of the test that failed (here it is "failingTest") out of `<h3 class="failures">failingTest</h3>`
I want to come up with a bash script that does all of the above filtering on index.html (index.html contains the tags above plus additional tags that I do not need to filter).
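One way to express the seven filters is as one regular expression per tag, each capturing the wanted text in its first group. The sketch below uses Python's standard `re` module instead of `sed`/`grep`. The field names and the `extract_fields` helper are made up for illustration, and the patterns assume the tags look exactly like the ones listed above. In the index.html posted here the failing test name appears in a link whose href contains `#`, so the last pattern accepts that form as well as the `<h3>` form.

```python
import re

# Hypothetical field names; each pattern captures the wanted text in group 1.
PATTERNS = {
    "generated_at": r'Gradle [^<]*</a>\s*at\s+(.*?)\s*</p>',
    "success_rate": r'<div class="percent">\s*(\d+%)\s*</div>',
    "package":      r'<a href="packages/[^"]*">\s*([^<]+?)\s*</a>',
    "test_class":   r'<a href="classes/[^"#]*">\s*([^<]+?)\s*</a>',
    "duration":     r'<td>\s*(\d+(?:\.\d+)?s)\s*</td>',
    "failure_rate": r'<td class="failures">\s*(\d+%)\s*</td>',
    "failed_test":  r'(?:<h3 class="failures">|<a href="classes/[^"]*#[^"]*">)\s*([^<]+?)\s*<',
}


def extract_fields(html):
    """Return the first match for each pattern, or None when nothing matches."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, html, re.S)  # re.S lets a match span line breaks
        fields[name] = match.group(1) if match else None
    return fields
```

Regular expressions like these are tied to the exact attribute order and quoting in the report, so they are likely to need adjusting if the report layout changes.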
Input:
<!DOCTYPE html> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> <meta http-equiv="x-ua-compatible" content="IE=edge"/> <title>Test results - Test Summary</title> <link href="css/base-style.css" rel="stylesheet" type="text/css"/> <link href="css/style.css" rel="stylesheet" type="text/css"/> <script src="js/report.js" type="text/javascript"></script> </head> <body> <div id="content"> <h1>Test Summary</h1> <div id="summary"> <table> <tr> <td> <div class="summaryGroup"> <table> <tr> <td> <div class="infoBox" id="tests"> <div class="counter">2</div> <p>tests</p> </div> </td> <td> <div class="infoBox" id="failures"> <div class="counter">1</div> <p>failures</p> </div> </td> <td> <div class="infoBox" id="ignored"> <div class="counter">0</div> <p>ignored</p> </div> </td> <td> <div class="infoBox" id="duration"> <div class="counter">5.308s</div> <p>duration</p> </div> </td> </tr> </table> </div> </td> <td> <div class="infoBox failures" id="successRate"> <div class="percent">50%</div> <p>successful</p> </div> </td> </tr> </table> </div> <div id="tabs"> <ul class="tabLinks"> <li> <a href="#tab0">Failed tests</a> </li> <li> <a href="#tab1">Packages</a> </li> <li> <a href="#tab2">Classes</a> </li> </ul> <div id="tab0" class="tab"> <h2>Failed tests</h2> <ul class="linkList"> <li> <a href="classes/com.pratik.testing.UserTest.html">UserTest</a>.
<a href="classes/com.pratik.testing.UserTest.html#failingTest">failingTest</a> </li> </ul> </div> <div id="tab1" class="tab"> <h2>Packages</h2> <table> <thead> <tr> <th>Package</th> <th>Tests</th> <th>Failures</th> <th>Ignored</th> <th>Duration</th> <th>Success rate</th> </tr> </thead> <tbody> <tr> <td class="failures"> <a href="packages/com.pratik.testing.html">com.pratik.testing</a> </td> <td>2</td> <td>1</td> <td>0</td> <td>5.308s</td> <td class="failures">50%</td> </tr> </tbody> </table> </div> <div id="tab2" class="tab"> <h2>Classes</h2> <table> <thead> <tr> <th>Class</th> <th>Tests</th> <th>Failures</th> <th>Ignored</th> <th>Duration</th> <th>Success rate</th> </tr> </thead> <tbody> <tr> <td class="failures"/> <a href="classes/com.pratik.testing.UserTest.html">com.pratik.testing.UserTest</a> <td>2</td> <td>1</td> <td>0</td> <td>5.308s</td> <td class="failures">50%</td> </tr> </tbody> </table> </div> </div> <div id="footer"> <p> <div> <label class="hidden" id="label-for-line-wrapping-toggle" for="line-wrapping-toggle">Wrap lines
<input id="line-wrapping-toggle" type="checkbox" autocomplete="off"/> </label> </div>Generated by
<a href="http://www.gradle.org">Gradle 2.4</a> at Aug 8, 2015 6:38:46 PM</p> </div> </div> </body> </html>
Desired output:
JSON:
Aug 8 2015 6:38:46 PM, 50%, com.pratik.testing, UserTest, 5.308s, failingTest,..
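A minimal sketch of producing that comma-separated line from the whole page, assuming the report keeps the structure of the input above. It uses Python's standard `html.parser` rather than `sed`/`grep`, so the extra tags in index.html are simply ignored. `ReportParser`, `summarize`, and the field names are hypothetical. The comma inside the date is dropped to match the desired output, and the duration is read from the `counter` div inside `<div id="duration">`.

```python
from html.parser import HTMLParser


class ReportParser(HTMLParser):
    """Collect the first text seen inside each element of interest."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._pending = None       # field name waiting for its text
        self._in_duration = False  # True after <div id="duration"> was seen

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class") or ""
        href = attrs.get("href") or ""
        if tag == "div" and attrs.get("id") == "duration":
            self._in_duration = True
        elif tag == "div" and cls == "counter" and self._in_duration:
            self._pending, self._in_duration = "duration", False
        elif tag == "div" and cls == "percent":
            self._pending = "success_rate"
        elif tag == "a" and href.startswith("packages/"):
            self._pending = "package"
        elif tag == "a" and href.startswith("classes/"):
            # Links with a #fragment point at an individual failed test.
            self._pending = "failed_test" if "#" in href else "test_class"
        elif tag == "a" and "gradle.org" in href:
            self._pending = "gradle_link"

    def handle_data(self, data):
        text = data.strip()
        if not text or self._pending is None:
            return
        name, self._pending = self._pending, None
        if name == "gradle_link":
            # The timestamp is the text that follows the closing </a>.
            self._pending = "generated_at"
            return
        if name == "generated_at":
            if text.startswith("at "):
                text = text[3:]
            text = text.replace(",", "")  # keep the comma-separated line unambiguous
        self.fields.setdefault(name, text)  # keep only the first occurrence


def summarize(html):
    """Return the extracted fields as one comma-separated line."""
    parser = ReportParser()
    parser.feed(html)
    parser.close()
    order = ["generated_at", "success_rate", "package",
             "test_class", "duration", "failed_test"]
    return ", ".join(parser.fields.get(name, "") for name in order)
```

From a bash script, this could be saved to a file and run against index.html with `python3`, reading the file contents and printing the result of `summarize`.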
jpratik21 0
Newbie Poster