DEV Community

Discussion on: Web Scraping in PHP using Goutte - part 2

Collapse
 
marcohern profile image
Marco Hernandez

What about Scraping Single Page Apps like angular or react apps? Does Goutte support's this? is this even possible using PHP? Is there anything that can do this? I've been looking for info in Client Side Rendered Scraping but there is little information.

Collapse
 
sayopaul profile image
Sayo Paul

Yes, it is in fact possible with PHP. The tools use for this are called headless browsers. Headless browsers act as regular browsers ( running javascript, etc. ) Using a headless browser, javascript rendered pages can be scraped. We combine Goutte's crawler with the response from a headless browser such as Selenium or PhantomJS and we are able to use all of Goutte's crawling functions. This is personally what I use for scraping those type of sites.

Collapse
 
peterrauscher profile image
Peter Rauscher

At scale, you're almost always better off avoiding headless browsers. Try using plain HTTP requests and parsing the HTML, the data loaded in SPAs is usually loaded from a JSON object in a tag somewhere. I wrote this extension that extracts the data for you:<br> <a href="https://chromewebstore.google.com/detail/kjlhnflincmlpkgahnidgebbngieobod" rel="nofollow">https://chromewebstore.google.com/detail/kjlhnflincmlpkgahnidgebbngieobod</a></p>