Hello DaniWebbers,
So I have been hard at work on a new program that reads in a webpage every X seconds, detects whether anything on the page has changed, and if so updates a form and lets me know about the change.
Well, I have been running into a snag. I recently got a webpage reader class that works perfectly for me ... or so I thought. I have since learned that WebBrowser keeps a cache of recently visited sites, and if it detects the same site it will load it from the cache (if it hasn't expired).
One of the webpages I have been testing this code on is causing a problem. The webpage updates with new data, but my WebBrowser keeps reading in the old data from its cache.
Here's my code
using System;
using System.Windows.Forms;

namespace ScoreTableDetector_2v2
{
    //===================================================================================================================
    class readInWebpage_v3 : IDisposable
    {
        WebBrowser wb;
        bool timerTriggered;

        //-------------------------------------------------------------------------------------------------------------------
        public readInWebpage_v3 ()
        {
            wb = new WebBrowser();
            wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);
            timerTriggered = false;
        }

        //-------------------------------------------------------------------------------------------------------------------
        public string downloadedData
        {
            get;
            private set;
        }

        //-------------------------------------------------------------------------------------------------------------------
        public void readIn (Uri webLink, int secs)
        {
            timerTriggered = false; //reset so a timeout from a previous call doesn't carry over
            DateTime timeNow = DateTime.Now;
            wb.Navigate(webLink);
            //wb.Refresh(WebBrowserRefreshOption.Normal); //DOESN'T fix the problem

            TimeSpan elapsedTime;
            while (wb.ReadyState != WebBrowserReadyState.Complete)
            {
                elapsedTime = DateTime.Now - timeNow;
                if (elapsedTime.TotalSeconds > secs) //TotalSeconds (not Seconds, which wraps at 60); the timeout does work, and Application.DoEvents() seems to be needed for wb_DocumentCompleted to fire
                {
                    timerTriggered = true;
                    break;
                }
                Application.DoEvents(); //pump messages so the DocumentCompleted event can come through
            }

            if (timerTriggered == false)
            {
                //downloadedData = wb.Document.Body.InnerHtml; //Added this line, because the final HTML takes a while to show up
                //!!!! This seems redundant, so holding off on it
                if (downloadedData != null && downloadedData.Contains("Navigation to the webpage was canceled")) //The URL led to an invalid webpage
                {
                    downloadedData = "Invalid Webpage";
                }
            }
            else //timed out
            {
                wb.Stop();
                downloadedData = "Timed Out";
            }
            //wb.Dispose();
        }

        //-------------------------------------------------------------------------------------------------------------------
        void wb_DocumentCompleted (object sender, WebBrowserDocumentCompletedEventArgs e) //fires when the webpage has finished loading (read it)
        {
            WebBrowser webBrows = (WebBrowser) sender;
            downloadedData = webBrows.Document.Body.InnerHtml;
        }

        //-------------------------------------------------------------------------------------------------------------------
        public void Dispose () //used for disposing items
        {
            if (wb != null)
            {
                wb.Dispose();
                wb = null;
            }
            if (downloadedData != null)
            {
                downloadedData = "";
            }
        }
        //-------------------------------------------------------------------------------------------------------------------
    }
    //===================================================================================================================
}
Now, I read online about the cache WebBrowser keeps (to be honest, I guessed it did that at first; what do you know, a lucky guess), and that calling Refresh() is supposed to force it to re-read the webpage in its current state.
Well, I tried this and no matter what I do I can't get it to work for me. At one point I added a cold-start if() statement: the first time the class was called it would use wb.Navigate(), and every call after that would use wb.Refresh() instead. I tried different Refresh options too, not just the commented-out one above. But no matter what I tried, Refresh on its own never fired the DocumentCompleted event (which I rely on).
So what I am trying to figure out is: how can I force my WebBrowser to always fetch fresh data from the webpage and stop relying on the cache?
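One idea I keep seeing mentioned (I haven't gotten it working in my class yet, so treat this as a rough sketch rather than something I know fixes it) is that since the cache is keyed on the URL, you can tack a throwaway query-string value onto the address so every poll looks like a brand new page to the browser. The "nocache" name below is just something I made up:
//rough sketch: give every request a unique URL so the cached copy never matches
//"nocache" is an arbitrary, made-up parameter name; the server should just ignore it
Uri cacheBusted = new Uri(webLink.AbsoluteUri
    + (string.IsNullOrEmpty(webLink.Query) ? "?" : "&")
    + "nocache=" + DateTime.Now.Ticks);
wb.Navigate(cacheBusted);
If anyone knows whether that actually beats the WebBrowser cache, or if there's a cleaner way, I'd love to hear it.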
Oh yeah, this is how I call the class:
readInData = new readInWebpage_v3();
readInData.readIn(webpageURI, refreshTimer);
//does a bunch of stuff with the data
readInData.Dispose();
That's all tucked inside a backgroundWorker_ProgressChanged handler (I was hoping that disposing and recreating the WebBrowser would do the trick, but it doesn't). I also know there are sites this works fine on, but it doesn't work on the site I'm reading, and that's the one I need it for.
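The other thing still on my to-try list is the Navigate overload that takes extra HTTP headers, sending a no-cache header with each request. I haven't confirmed this gets past IE's cache either, so again this is only a sketch of the call:
//untested sketch: the (url, targetFrameName, postData, additionalHeaders) overload,
//passing no-cache headers in the hope the cached copy gets skipped
wb.Navigate(webLink, null, null, "Pragma: no-cache\r\nCache-Control: no-cache\r\n");
No idea yet if either of those is the "right" way to go about it, which is partly why I'm asking.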
Thanks in advance for any help