0
down vote
favorite
So you have these Microsoft KB support articles. Example: https://support.microsoft.com/en-us/help/871122

My goal is to be able to iterate through a list of these pages and see if the articles actually exist. As you can imagine, it's hard to do it manually, so I need a program to do that for me. I have tried loading a page with .NET HTML Agility Pack and also some PHP code posted below. Both result in a broken and useless page. Namely, this is what I get: https://i.stack.imgur.com/mKoED.png

Any ideas on why this is happening and what's something I can do to fix this?

Here is the PHP code I have tried. Nothing too special.

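<?php
// Fetch the raw HTML only; no JavaScript on the page is executed.
$url = 'https://support.microsoft.com/en-us/help/871122';
$htm = file_get_contents($url); // returns false on failure
echo $htm;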

Thanks!

What's with the first lines of your post with 0, down vote and favorite?

Ahh, I see it. Reading https://stackoverflow.com/questions/44479738/load-a-microsoft-support-articles-outside-a-browser they wanted you to post more code.

With that out of the way, I looked into wget and similar tools using this search: https://www.google.com/search?q=wget&ie=utf-8&oe=utf-8#q=wget+a+microsoft+kb+number

This thread looked interesting, and you could adapt it to fit your case: https://www.sevenforums.com/windows-updates-activation/402888-wget-curl-fail-download-kb-html-page.html

// EDIT

In addition to rproffitt's answer: the contents of the MS knowledge base pages are loaded through JavaScript, so file_get_contents() won't load them. It only returns the raw HTML; you need a rendering engine (as browsers have) to run the page's scripts. So you need something like PhantomJS: http://phantomjs.org/
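To see whether a fetched page is one of these script-driven shells, you could inspect the raw HTML before reaching for a rendering engine. The following is only a rough heuristic sketch: the function name looks_script_rendered() and the 200-character threshold are my own assumptions, not anything Microsoft documents, and it needs PHP's DOM extension.

```php
<?php
// Heuristic sketch: does this HTML look like an empty shell that relies on
// JavaScript to fill in its content? The function name and the default
// threshold of 200 characters are assumptions made for illustration.
function looks_script_rendered(string $html, int $minTextLength = 200): bool
{
    if (trim($html) === '') {
        return false; // nothing to inspect
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on messy HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Count the scripts before removing them.
    $scriptCount = $doc->getElementsByTagName('script')->length;

    // Drop script and style elements so only reader-visible text remains.
    foreach (['script', 'style'] as $tag) {
        $nodes = iterator_to_array($doc->getElementsByTagName($tag));
        foreach ($nodes as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    $body = $doc->getElementsByTagName('body')->item(0);
    $text = $body ? trim(preg_replace('/\s+/', ' ', $body->textContent)) : '';

    // Scripts present but almost no visible text: probably rendered client-side.
    return $scriptCount > 0 && strlen($text) < $minTextLength;
}
```

If this returns true for the output of file_get_contents(), the article body is most likely injected by scripts, and a headless browser is the way to go.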

A few months ago I posted a usage example here:

It generates a screenshot of the page. The discussion was about testing the existence of a page with a HEAD request, which MS drops, and, on success, performing a GET request.
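Since MS drops HEAD requests, a variation on that idea is to try HEAD first and fall back to GET when no response comes back. Below is a minimal sketch of that fallback, not the code from the linked discussion. The function names and the 10-second timeout are hypothetical, and I have not verified which status code the support site returns for a missing article, so treat the result as a hint only.

```php
<?php
// Return the last HTTP status code found in a list of raw header lines.
// With redirects there can be several status lines; the last one wins.
function last_status_code(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Send one request with the given method and return its status code,
// or null when no usable response arrived (e.g. the request was dropped).
function request_status(string $url, string $method, float $timeout = 10.0): ?int
{
    $context = stream_context_create(['http' => [
        'method'  => $method,
        'timeout' => $timeout, // assumed value, tune as needed
    ]]);
    $headers = @get_headers($url, false, $context);
    return is_array($headers) ? last_status_code($headers) : null;
}

// Try HEAD first; if the server gives no answer or rejects the method (405),
// retry with GET. $request is injectable so the logic can be tested offline.
function article_status(string $url, ?callable $request = null): ?int
{
    $request = $request ?? 'request_status';
    $status = $request($url, 'HEAD');
    if ($status === null || $status === 405) {
        $status = $request($url, 'GET');
    }
    return $status;
}
```

You would call article_status('https://support.microsoft.com/en-us/help/871122') for each article in your list. Keep in mind that a page whose content is loaded through JavaScript may answer 200 even when the article itself is missing, so the status code alone may not settle it.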
