get all paragraph in <p> tag

Question

xzero1 0 Newbie Poster

12 Years Ago

Hi, I am using a regular expression to extract paragraphs from HTML code, but it gives me only one line (the first one). I want the whole article in my string. Here is my code.

Match m = Regex.Match(htmlstring, @"<p>\s*(.+?)\s*</p>");
htmlstring = m.Groups[1].Value;

where "htmlstring" holds all the HTML code in text form.


All 4 Replies

nmaillet 97 Posting Whiz in Training

12 Years Ago

Well, I can't vouch for HAP, but I do agree with __avd. In any case, by default the dot operator (.) matches any character except the line feed character (\n). You can change the default behaviour by doing something like this:

Regex regex = new Regex(@"<p>\s*(.+?)\s*</p>", RegexOptions.Singleline);
Match m = regex.Match(htmlstring);

Just note that there are a lot of things that can go wrong when doing this. If you want to capture multiple paragraphs, the other option is to wrap the whole thing in parentheses () and add * after it, so that the repeated group captures each paragraph.
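A different way to collect every paragraph, instead of the repeated group mentioned above, is to call Regex.Matches and loop over the results. The following is only a minimal sketch under some assumptions: the class name ParagraphExtractor, the method ExtractParagraphs, and the sample HTML are made up for illustration, and the pattern is widened slightly (attributes on the tag, case-insensitive) compared to the one in this thread. Any tags nested inside a paragraph stay in the captured text.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original post.
public static class ParagraphExtractor
{
    // Singleline makes '.' match '\n' as well, so a paragraph may span lines.
    // "<p\b[^>]*>" also accepts attributes (<p class="x">) but not <pre>.
    private static readonly Regex ParagraphPattern = new Regex(
        @"<p\b[^>]*>\s*(.*?)\s*</p>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static List<string> ExtractParagraphs(string html)
    {
        var paragraphs = new List<string>();
        // Matches returns every non-overlapping match, not just the first one.
        foreach (Match m in ParagraphPattern.Matches(html))
        {
            paragraphs.Add(m.Groups[1].Value);
        }
        return paragraphs;
    }

    public static void Main()
    {
        // Sample input, assumed for the sake of the example.
        string htmlstring =
            "<html><body><p>First\nparagraph.</p><div>skip me</div>" +
            "<p class=\"x\"> Second one. </p></body></html>";

        // Join the paragraphs into one article string.
        string article = string.Join(
            Environment.NewLine + Environment.NewLine,
            ExtractParagraphs(htmlstring));
        Console.WriteLine(article);
    }
}
```

This still inherits the usual weaknesses of parsing HTML with a regex (unclosed tags, nested paragraphs, comments), so treat it as a quick fix rather than a robust parser.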

F**ks sake, I am really starting to hate this new text editor. Writing this one short post was painful.


kvprajapati 1,826 Posting Genius Team Colleague · Answer 1 · 2012-06-21T07:29:11+00:00

Don't reinvent the wheel! Use Html Agility Pack

This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams).
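For the question in this thread, a rough sketch of what that could look like is shown here. It assumes the HtmlAgilityPack NuGet package is referenced; the class name HapParagraphExtractor, the method ExtractParagraphs, and the sample HTML are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper, not taken from the Html Agility Pack documentation.
public static class HapParagraphExtractor
{
    public static List<string> ExtractParagraphs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var paragraphs = new List<string>();

        // "//p" selects every <p> element in the document.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//p");
        if (nodes == null)
        {
            return paragraphs;
        }

        foreach (HtmlNode node in nodes)
        {
            // InnerText drops nested tags; DeEntitize turns &amp; into & and so on.
            string text = HtmlEntity.DeEntitize(node.InnerText).Trim();
            if (text.Length > 0)
            {
                paragraphs.Add(text);
            }
        }
        return paragraphs;
    }

    public static void Main()
    {
        // Sample input, assumed for the sake of the example.
        string htmlstring =
            "<html><body><p>First paragraph.</p><div>skip me</div>" +
            "<p>Second <b>one</b>.</p></body></html>";

        string article = string.Join(
            Environment.NewLine + Environment.NewLine,
            ExtractParagraphs(htmlstring));
        Console.WriteLine(article);
    }
}
```

Compared to a regular expression, this leaves the handling of attributes, nested tags, and line breaks to the parser.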

ChrisHunter 152 Posting Whiz in Training Featured Poster · Answer 2 · 2012-06-21T07:37:44+00:00

Can you not give the tag an ID attribute and then use that ID to reference the text attribute of the tag?

xzero1 0 Newbie Poster · Answer 3 · 2012-06-21T08:11:26+00:00

@Chris I can't understand that. Actually, I have all the HTML code in a string, and now I just want to pull out the paragraphs written between the "p" tags. I am trying to fetch the whole article.
