So a while back I came here asking for help reading in data from a web page, specifically its HTML. I was running into an issue where the data wasn't actually being read (as if it hadn't loaded). I had to put the program aside for a while as life got busy, but I finally have time to come back to it.
The page I am trying to read looks like this (it's a roster, and there are many of them, one for each clan):
http://worldoftanks.com/community/clans/1000000954-SAC/
Now, I initially tried using code like the following:
WebClient myWebClient = new WebClient(); // Create a new WebClient instance.
string dataDownloaded = "";
try
{
    // Download the web resource and save it into a data buffer.
    byte[] myDataBuffer = myWebClient.DownloadData(input[i]);
    // Decode the downloaded bytes into the string variable.
    dataDownloaded = Encoding.UTF8.GetString(myDataBuffer);
}
catch (Exception error)
{
    MessageBox.Show(error.ToString());
}
Unfortunately, this did not work, and instead of reading in HTML like this (this is just what one row looks like; there are up to 100 players per clan, so up to 100 of these chunks):
<TD class="number t-number">1</TD>
<TD class="name t-name b-user js-rendered-template"><A href="/community/accounts/1000391751-AgentAlaskan/">AgentAlaskan</A></TD>
<TD class="role js-role js-rendered-template">Soldier</TD>
<TD class="member_since js-member-since js-rendered-template">21.06.2011</TD></TR>
<TR class="even clan-role-commander">
all I would get was something like this (note: there should be a space after "member_since ...", but pasting it into the quote messed that up):
<tbody id="member_table_container"> <tr class="js-template js-hidden"> <td class="number t-number"></td> <td class="name t-name b-user"></td> <td class="role js-role"></td> <td class="member_since js-member-since"></td> </tr> </tbody>
Well, I searched around the web trying to find a way to read this data. I was pretty sure it had something to do with JavaScript filling in the table after the page loads. Then I finally stumbled on a piece of code that worked for my needs. The code looks like this (sorry for the formatting, it is copied straight from my program):
using System.Threading;
using System.Windows.Forms;

namespace WoTClanRead_3v3
{
    //==========================================================================================
    class readInClanRoster
    {
        //------------------------------------------------------------------------------------------
        private string GeneratedSource
        {
            get;
            set;
        }
        //------------------------------------------------------------------------------------------
        private string URL
        {
            get;
            set;
        }
        //------------------------------------------------------------------------------------------
        public string GetGeneratedHTML (string url)
        {
            URL = url;
            Thread t = new Thread(new ThreadStart(WebBrowserThread));
            t.SetApartmentState(ApartmentState.STA);
            t.Start();
            Cursor.Current = Cursors.WaitCursor;
            t.Join();
            Cursor.Current = Cursors.Default;
            return GeneratedSource;
        }
        //------------------------------------------------------------------------------------------
        private void WebBrowserThread ()
        {
            WebBrowser wb = new WebBrowser();
            wb.Navigate(URL);
            wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);
            while (wb.ReadyState != WebBrowserReadyState.Complete)
                Application.DoEvents();
            //Added this line, because the final HTML takes a while to show up
            GeneratedSource = wb.Document.Body.InnerHtml;
            wb.Dispose();
        }
        //------------------------------------------------------------------------------------------
        private void wb_DocumentCompleted (object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            WebBrowser wb = (WebBrowser) sender;
            GeneratedSource = wb.Document.Body.InnerHtml;
        }
        //------------------------------------------------------------------------------------------
    }
    //==========================================================================================
}
Well, initially the code worked great: it was finally reading in the data. Before that, for the longest time, I had to go to the developer tools in IE and save a .txt file for the program to parse to get the data I required.
There are two problems with this code, however. First, I don't fully understand how it works, which leads to my second problem: for some reason, the page sometimes still doesn't load properly and I don't get the HTML I need (I get something like the second quote I pasted above).
So I assumed, "Oh, the page didn't load, let's run it again," and then I stumbled on another problem. To detect a failed load, I parse the string that was read in; if the parser can't find any player links, the full page wasn't retrieved, so I stop parsing and ask the user to try again.
Trying again means clicking the button that starts the whole process over. The problem I have noticed lately is that if it fails once, it will fail again, and on the third try the program pretty much locks up.
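For reference, the "did the page load" check described above is along these lines. This is a simplified sketch rather than my exact code: the RosterCheck class, the HasPlayerLinks name, and the link pattern are illustrative, and the pattern is only inferred from the sample row shown earlier.

```csharp
using System.Text.RegularExpressions;

static class RosterCheck
{
    // Assumed pattern, based on the sample row above:
    // /community/accounts/1000391751-AgentAlaskan/
    private static readonly Regex PlayerLink =
        new Regex(@"/community/accounts/\d+-[^/""\s]+/", RegexOptions.IgnoreCase);

    // Returns true if the HTML contains at least one player account link,
    // which I treat as a sign that the roster rows were actually rendered.
    public static bool HasPlayerLinks(string html)
    {
        return !string.IsNullOrEmpty(html) && PlayerLink.IsMatch(html);
    }
}
```

When this returns false for the string that came back from GetGeneratedHTML, I treat the load as failed and ask the user to retry.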
If you look at the code, it sets a wait cursor, which works kind of like a debugging tool. On that third try the cursor never goes back to the default, so as far as I can tell the program is getting stuck in the while loop.
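To confirm that the while loop is where it hangs, one thing I am considering is bounding the wait with a timeout. The following is only a rough sketch under my own assumptions: the BrowserWait class, the WaitUntil name, and the 30-second limit are placeholders I made up, not part of the code I found, and I don't know whether this addresses the underlying cause.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Windows.Forms;

static class BrowserWait
{
    // Hypothetical helper: pumps the message loop until isDone() returns true
    // or the timeout elapses. Returns false on timeout instead of spinning forever.
    public static bool WaitUntil(Func<bool> isDone, TimeSpan timeout)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (!isDone())
        {
            if (clock.Elapsed >= timeout)
                return false;
            Application.DoEvents(); // let pending window messages be processed
            Thread.Sleep(10);       // avoid pegging a CPU core while waiting
        }
        return true;
    }
}
```

Inside WebBrowserThread, the loop would then become something like `bool loaded = BrowserWait.WaitUntil(() => wb.ReadyState == WebBrowserReadyState.Complete, TimeSpan.FromSeconds(30));`, called on the same STA thread that created the WebBrowser. If it returns false, at least the thread ends, t.Join() returns, and the cursor gets reset.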
So my question now is: why is this happening? And can someone help explain this code I found on the web? Thanks in advance for the help.
(Sorry this was a lot, but I feel good details help explain the situation better.)