Hello,

I have just completed a rather large project that involved data with less-than-optimal HTML. Our biggest problem currently is that we have a structure like this:

 <ul id="postblockinfo"> 
 <li>List One</li> 
 <li>List Two</li> 
 <li>List Three</li> 
 <li>List Four</li> 
 <li>List Five</li>
 </ul> 

Text that is suppose to be here and formatted correctly.

<ul> <li>Text that is suppose to be void of html tags around it but has them<br /> <br />  More more more <br /> <br />  and more more more <br /> <br /> More before it ends this way  </li> </ul> 

The first list, with id postblockinfo, is correct. It is also the only list that should be in the HTML. Any list markup after the first list needs to be completely stripped from the content.

My question to you all: what is the best way to achieve this?

You can use a div in place of the list, like this:
<div>Text that is suppose to be void of html tags around it but has them<br /> <br /> More more more <br /> <br /> and more more more <br /> <br /> More before it ends this way </div>
and format this div using CSS.

To provide more information: the data is stored in a database exactly as shown in the first post. There are 15,000 such entries, all formatted the same way, that need to be fixed, so I need to do this in bulk.

I need to strip everything but the first list and the <br /> tags.

OK. Unfortunately, MySQL has very limited regexp functions, so an UPDATE query that does this would be quite complicated and could cause havoc if some of the entries are formatted differently.

The basic steps would be:

1. Find the position of the first instance of </ul> (or of the second instance of <ul>).
2. From that point onwards, strip all instances of <li> and </li>.
3. From that point onwards (as mentioned by arti), replace <ul> with <div> and </ul> with </div>, or even with <p> and </p>, depending on your preferences.
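
The three steps above can be sketched in PHP roughly as follows. This is only a sketch under some assumptions: the function name flattenTrailingLists is made up for illustration, the tags are assumed to appear exactly as <ul>, </ul>, <li> and </li> after the first list (no attributes, no unusual spacing or capitalisation), and a string with no closing </ul> is returned unchanged.

```php
<?php
// Hypothetical helper (name not from the thread): keeps the first list
// intact and flattens any list markup that follows it.
function flattenTrailingLists($html)
{
    // Step 1: find where the first list ends.
    $pos = strpos($html, '</ul>');
    if ($pos === false) {
        return $html; // no list at all, nothing to do
    }
    $cut  = $pos + strlen('</ul>');
    $head = substr($html, 0, $cut);
    // Cast because older PHP versions can return false for an empty tail.
    $tail = (string) substr($html, $cut);

    // Step 2: strip <li> and </li> from the tail only.
    $tail = str_replace(array('<li>', '</li>'), '', $tail);

    // Step 3: swap the remaining <ul>/</ul> for <div>/</div>.
    $tail = str_replace(array('<ul>', '</ul>'), array('<div>', '</div>'), $tail);

    return $head . $tail;
}
```

Because only plain string functions are used, anything the patterns do not match exactly (for example <UL> or <li class="x">) is left as it was, which is the safer failure mode for a bulk update.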

//EDIT

Had a think - there are many, many ways to do this, but perhaps a simple method would be:

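while($data = mysql_fetch_array($result)){
/*  $data['text'] = '<ul id="postblockinfo"> 
     <li>List One</li> 
     <li>List Two</li> 
     <li>List Three</li> 
     <li>List Four</li> 
     <li>List Five</li>
     </ul> 
    Text that is suppose to be here and formatted correctly.
    <ul> <li>Text that is suppose to be void of html tags around it but has them<br /> <br />  More more more <br /> <br />  and more more more <br /> <br /> More before it ends this way  </li> </ul> ';
*/

    $id = (int) $data['id'];

    // Split at the first </ul>: the list to keep, then everything after it.
    $parts = explode('</ul>', $data['text'], 2);
    if (count($parts) < 2) {
        continue; // no closing </ul> in this row, so leave it untouched
    }
    list($first, $second) = $parts;

    // In the tail only: turn <ul>/</ul> into <div>/</div> and drop <li>/</li>.
    $second = str_replace(array("<ul>","</ul>","<li>","</li>"),array("<div>","</div>","",""),$second);
    $patched =  $first . "</ul>" . $second;

    // Escape before interpolating, otherwise any quote in the text breaks the query.
    // `table` is a placeholder name and a reserved word, hence the backticks.
    // Note: the mysql_* functions were removed in PHP 7; use mysqli or PDO there.
    $safe = mysql_real_escape_string($patched);
    $run = mysql_query("UPDATE `table` SET `text` = '$safe' WHERE `id` = $id");

}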

I tried the above on your example and it seemed to work. The only thing is that the first bit of text (between the first list and the second) isn't wrapped in a div.
I strongly suggest that you create a duplicate table of your data and experiment with that, NOT the original table. If you find the process works, fine.
There are many variations on this theme as I mentioned, so have a fiddle. ;)
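
One such variation, closer to the original request to keep only the first list and the <br /> tags, is to pass everything after the first </ul> through PHP's strip_tags with <br> as the only allowed tag. A rough sketch, not tested against the real data; the function name keepFirstListAndBreaks is made up for illustration. Be aware that this removes every other tag after the first list (links, bold and so on), not just the list markup.

```php
<?php
// Hypothetical helper (name not from the thread): keeps the first list
// as-is and reduces everything after it to text plus <br /> tags.
function keepFirstListAndBreaks($html)
{
    // Split once, at the end of the first list.
    $parts = explode('</ul>', $html, 2);
    if (count($parts) < 2) {
        return $html; // no closing </ul>, leave the input alone
    }

    // strip_tags() drops every tag in the tail except the allowed <br>,
    // which also covers the self-closing <br /> form.
    return $parts[0] . '</ul>' . strip_tags($parts[1], '<br>');
}
```

Unlike the str_replace version, this leaves no <div> wrapper around the trailing text, so it suits the case where that text should carry no markup at all.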

Thank you, that worked great!
