Discussion on: Web Scraping in PHP using Goutte - part 2

What about scraping single-page apps, like Angular or React apps? Does Goutte support this? Is this even possible using PHP? Is there anything that can do this? I've been looking for information on scraping client-side rendered pages, but there is very little out there.

Sayo Paul • Mar 4 '20

Yes, it is in fact possible with PHP. The tools used for this are called headless browsers. A headless browser acts like a regular browser (running JavaScript, etc.) but without a visible window, so JavaScript-rendered pages can be scraped with it. We combine Goutte's crawler with the rendered HTML returned by a headless browser, such as PhantomJS or a browser driven through Selenium, and are then able to use all of Goutte's crawling functions. This is personally what I use for scraping those types of sites.

Peter Rauscher • Jan 23

At scale, you're almost always better off avoiding headless browsers. Try plain HTTP requests and parse the HTML instead: the data an SPA renders is usually embedded as a JSON object in a script tag somewhere in the page. I wrote this extension that extracts the data for you: https://chromewebstore.google.com/detail/kjlhnflincmlpkgahnidgebbngieobod
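
For anyone who wants to try the plain-HTTP approach in PHP, here is a minimal sketch using only the built-in DOM extension. The sample HTML, the script id `__DATA__`, and the helper name `extractEmbeddedJson` are all assumptions made up for illustration; a real site will use its own tag id and JSON shape, so inspect the page source first.

```php
<?php
// Hypothetical sample page: the app's initial state is shipped as JSON
// inside a script tag (the id "__DATA__" is an assumption for this demo).
$html = <<<'HTML'
<!DOCTYPE html>
<html>
<body>
<div id="app"></div>
<script id="__DATA__" type="application/json">{"products":[{"name":"Widget","price":9.99},{"name":"Gadget","price":19.5}]}</script>
</body>
</html>
HTML;

/**
 * Return the decoded JSON found in the <script> tag with the given id,
 * or null if the tag is missing or its content is not a JSON object/array.
 */
function extractEmbeddedJson(string $html, string $scriptId): ?array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally; real-world HTML is rarely strict.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Note: $scriptId is interpolated as-is, so it must not contain quotes.
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query(sprintf('//script[@id="%s"]', $scriptId));
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $data = json_decode($nodes->item(0)->textContent, true);
    return is_array($data) ? $data : null;
}

$data = extractEmbeddedJson($html, '__DATA__');
foreach ($data['products'] ?? [] as $product) {
    printf("%s: %.2f\n", $product['name'], $product['price']);
}
```

In practice `$html` would come from an ordinary HTTP request (for example `file_get_contents($url)` or an HTTP client) rather than an inline string. If the page has no such tag, the data is probably fetched from a JSON endpoint after load, and the browser's network tab should show which one.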