Folks,

Look at this YouTube SERP:
https://www.youtube.com/results?search_query=make+money+online+in+2022
You are shown 10 search results.
Now, if you want to see more results, you have to scroll to the bottom of the page. Only then are more search results presented.

Q1.
My question is: which PHP or JavaScript function is YouTube using here to feed you new search results like this?

Q2.
With PHP or PHP cURL, how would you load the page (with what code) and get it to scroll to the bottom of the page so that more results are presented?
Which PHP functions would you use to do this?
And which PHP cURL functions would you use to do this?
I need to know the function names so I can look them up in the php.net manual. I do not know where to start unless I learn the function names first.

Yes, you have guessed it. I am trying to get PHP and PHP cURL to fetch pages (YouTube search result pages) and scrape the search result links YouTube presents. YouTube only presents 10 links unless you scroll to the bottom of the page for more results to load. I need to teach my web agent to scroll to the bottom of the page after scraping the links on the page. That is why I opened this thread.

Any code snippets to get me started would be most appreciated.

Thanks

Hello and welcome to DaniWeb!!

What you are describing is a JavaScript technique called infinite scrolling. Here at DaniWeb, we use this technique on pages such as https://www.daniweb.com/programming/web-development/6 and the JavaScript library that we use can be found at https://infinite-scroll.com/

Depending on how the infinite scroll is generated, you may or may not be able to use PHP-issued cURL requests to scrape the contents.

For example, the way that we do it here at DaniWeb is that our pages are actually paginated (into page 1, page 2, page 3, etc.). We then have some JavaScript that tells the web browser that when the end-user scrolls to the bottom of page 1, it should load up the contents of page 2 and inject them right into the HTML at the bottom of the page 1 that the user is viewing. Then, as they continue scrolling down, it loads up the contents of page 3, and so on and so forth. With that type of setup, one could easily issue cURL requests individually to page 1, page 2, and page 3.
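The page-by-page idea above can be sketched as follows. This is only an illustration, written in Node.js (the runtime Puppeteer uses) rather than PHP; in PHP, each request would typically go through `curl_init`, `curl_setopt` and `curl_exec`. The `?page=N` query parameter and the `pageUrl` and `fetchPages` helpers are assumptions made for the example, not DaniWeb's actual URL scheme, and the built-in `fetch` assumes Node 18 or newer.

```javascript
// Build the URL for one page of a paginated listing.
// Assumption: the site exposes pages through a hypothetical `?page=N` parameter.
function pageUrl(baseUrl, pageNumber) {
  const url = new URL(baseUrl);
  url.searchParams.set('page', String(pageNumber));
  return url.toString();
}

// Request pages 1..pageCount one at a time and collect their HTML.
// `fetchImpl` defaults to the global fetch available in Node 18+.
async function fetchPages(baseUrl, pageCount, fetchImpl = fetch) {
  const pages = [];
  for (let n = 1; n <= pageCount; n++) {
    const response = await fetchImpl(pageUrl(baseUrl, n));
    if (!response.ok) break; // stop at the first page that does not exist
    pages.push(await response.text());
  }
  return pages;
}
```

Each entry in the returned array is the raw HTML of one page, which can then be parsed for links in the usual way. No scrolling is involved, because every page has its own URL.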

However, that does not appear to be what YouTube is doing. (Alas, they are much more sophisticated than my measly method.) The only way I think you can retrieve the contents beyond the initial load of the search query would be to use a headless web browser that is capable of emulating scrolling down a page.

A headless web browser is basically an automated web browser that is not interacted with, but that is capable of understanding web pages. We use the google-chrome headless web browser to generate the screenshots that appear (for example, when you hover over the link to YouTube in the post you just made).

Puppeteer is a Node.js-based library for the google-chrome headless browser, written and supported by Google engineers.

Here are some links on how you can use Puppeteer to scrape infinite scroll webpages:

And, of course, there are a lot more tutorials and resources on the web. From the little research I've done, Puppeteer seems to be the most popular way to scrape infinite scrolling webpages.
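To give a rough idea of what the Puppeteer approach looks like, here is a minimal sketch. It assumes Puppeteer has been installed with `npm install puppeteer`. The `scrollToEnd` and `scrapeLinks` helpers, the scroll limit, the pause length, and the `linkSelector` argument are all placeholders for illustration; the right selector and timings would have to be worked out from the actual page.

```javascript
// Keep scrolling to the bottom until the page height stops growing
// or maxScrolls is reached. `page` only needs an async evaluate(fn)
// method, which Puppeteer's Page object provides.
// Returns how many scrolls made the page grow.
async function scrollToEnd(page, { maxScrolls = 20, pauseMs = 1000 } = {}) {
  let previousHeight = 0;
  for (let i = 0; i < maxScrolls; i++) {
    // This callback runs inside the browser, not in Node.js.
    const height = await page.evaluate(() => {
      window.scrollTo(0, document.documentElement.scrollHeight);
      return document.documentElement.scrollHeight;
    });
    if (height === previousHeight) return i; // nothing new was loaded
    previousHeight = height;
    // Give the page's own JavaScript time to fetch and inject more results.
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
  return maxScrolls;
}

// Open the URL in headless Chrome, scroll, then collect link targets.
// Assumption: `linkSelector` is a CSS selector matching the result links.
async function scrapeLinks(url, linkSelector) {
  const puppeteer = require('puppeteer'); // loaded lazily; must be installed
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    await scrollToEnd(page);
    // $$eval runs the callback in the page over every matching element.
    return await page.$$eval(linkSelector, (anchors) => anchors.map((a) => a.href));
  } finally {
    await browser.close();
  }
}
```

Calling something like `scrapeLinks(searchUrl, 'a')` would return every link on the page after scrolling. A narrower selector would be needed to keep only the search results, and a fixed pause is a crude way to wait for new content, so treat this as a starting point rather than a finished scraper.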

Again, because YouTube's infinite scrolling is something that happens via client-side JavaScript, this isn't something you can do with PHP-issued cURL requests. You can think of cURL as a very rudimentary headless web browser that is capable of retrieving the HTML for a webpage and spitting it out into a file or a PHP variable, but it's not sophisticated enough to emulate scrolling or a mouse cursor, or to process JavaScript on the page.

commented: Thank you very much for your valuable reply. It was more than I expected. +0