Using StreamReader to find a string

Question

paulious1983 0 Newbie Poster

15 Years Ago

Hi,
I am new to C# but have experience in Java and HTML, and I am having problems using the StreamReader class.

I am trying to run a C# script on my web server that reads a target website and extracts particular links, then either saves them to a local XML file or temporarily displays them as clickable links on my generated page. The problem I am having is that the link I want to extract is not conveniently on a separate line in the HTML code.

//xhtml

<h3><a href="http://tech-reviews.co.uk/reviews/prolimatech-megahalems-cpu-cooler/" rel="bookmark" title="Permanent Link to Prolimatech Megahalems CPU Cooler">Prolimatech Megahalems CPU Cooler</a></h3>

//xhtml

I want the reader class to extract the following from the above:

'http://tech-reviews.co.uk/reviews/prolimatech-megahalems-cpu-cooler/'

Below is the code I am using so far:

C#

<%@ Page language="c#"%>
<%@ Import Namespace="System.Net" %>
<%@ Import Namespace="System.IO" %>
<script runat="server" lang="c#">

private void Page_Load(object sender, System.EventArgs e)
{
    // Retrieve the URL from the user input box on postback
    if (Page.IsPostBack)
        litHTMLfromScrapedPage.Text = GetHtmlPage(tbURL.Text);
}

public String GetHtmlPage(string strURL)
{
    // The HTML retrieved from the page
    String strResult;
    WebRequest objRequest = WebRequest.Create(strURL);

    // The using statements dispose the response and the reader once complete,
    // so no explicit Close() call is needed
    using (WebResponse objResponse = objRequest.GetResponse())
    using (StreamReader sr = new StreamReader(objResponse.GetResponseStream()))
    {
        strResult = sr.ReadToEnd();
    }
    return strResult;
}
</script>

//c#

Any help would be greatly appreciated, and apologies if I have not adhered to the posting rules (this is my first post).

sknake 1,622 Senior Poster

15 Years Ago

Please use code tags when posting on DaniWeb:

[code=c#] ...code here...

[/code]

Also, this is a widely discussed topic known as "scraping". If you search Google for "C# Scraping" or "c# scrapers", you will find many example projects that do exactly this.

kvprajapati 1,826 Posting Genius Team Colleague · Answer 1 · 2009-08-28T17:20:37+00:00

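using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Download the page source as a single string
        string s;
        using (WebClient wc = new WebClient())
        using (Stream stream = wc.OpenRead("http://www.google.co.in/search?hl=en&q=regular+expression+in+.net&meta=&aq=7&oq=regular+expression+in+.ne"))
        using (StreamReader reader = new StreamReader(stream))
            s = reader.ReadToEnd();
        // Capture each href value (quoted or unquoted) in group 1
        Regex r = new Regex(@"href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))", RegexOptions.IgnoreCase | RegexOptions.Compiled);
        // Print every URL found, followed by its character offset in the page
        Match m = r.Match(s);
        while (m.Success)
        {
            Console.WriteLine(m.Groups[1].Value + " " + m.Groups[1].Index);
            m = m.NextMatch();
        }
    }
}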
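
For the specific markup in the question, the same idea can be narrowed so that only the post links are picked up rather than every href on the page. What follows is a minimal sketch, not a robust scraper. It assumes the links of interest are always <a> tags with double-quoted attributes and rel="bookmark" appearing after href, as in the sample from the question. The names LinkExtractor, ExtractBookmarkLinks and ToXml, and the XML element names "links" and "link", are made up for illustration. Matching HTML with a regex is fragile, so a change in the site's markup could easily break it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class LinkExtractor
{
    // Assumption: href comes before rel="bookmark" and both are double-quoted,
    // as in the sample markup from the question.
    private static readonly Regex BookmarkLink = new Regex(
        @"<a\s[^>]*?href\s*=\s*""(?<url>[^""]*)""[^>]*?rel\s*=\s*""bookmark""[^>]*>",
        RegexOptions.IgnoreCase);

    // Returns the href value of every matching anchor, in document order.
    public static List<string> ExtractBookmarkLinks(string html)
    {
        List<string> urls = new List<string>();
        foreach (Match m in BookmarkLink.Matches(html))
            urls.Add(m.Groups["url"].Value);
        return urls;
    }

    // Wraps the URLs in a small XML tree (element names are hypothetical).
    public static XElement ToXml(IEnumerable<string> urls)
    {
        return new XElement("links",
            urls.Select(u => new XElement("link", new XAttribute("href", u))));
    }
}

public static class Program
{
    public static void Main()
    {
        // The sample markup from the question
        string sample =
            "<h3><a href=\"http://tech-reviews.co.uk/reviews/prolimatech-megahalems-cpu-cooler/\" " +
            "rel=\"bookmark\" title=\"Permanent Link to Prolimatech Megahalems CPU Cooler\">" +
            "Prolimatech Megahalems CPU Cooler</a></h3>";

        List<string> urls = LinkExtractor.ExtractBookmarkLinks(sample);
        foreach (string url in urls)
            Console.WriteLine(url);

        // Print the XML form of the same list
        Console.WriteLine(LinkExtractor.ToXml(urls));
    }
}
```

In the page from the question, the input would be the string returned by GetHtmlPage instead of the hard-coded sample, and calling Save with a file path of your choosing on the XElement returned by ToXml would write the list to a local XML file.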