I've been working on some other projects for the past week or so, and I've gone back to work while still looking for my first developer role. I have made some progress on this one, just not enough to really talk about yet. I'm still having some issues getting ffmpeg to add audio to video, but I'm still optimistic about this project and hope to wrap up the last few steps soon. Today I'm going to explain how I'm using Puppeteer to grab screenshots from Chrome.
The screenshot.js file
This file exports the screenshot() function, which uses Puppeteer to grab screenshots of the question. This was surprisingly easy to do, and was really satisfying to get working. The function takes in the question's URL as well as the questionDataObj global variable.
First, url is set equal to the question's URL. Then, inside a function called getScreenShot(), Puppeteer is used to launch a headless Chrome browser, navigate to the URL, and screenshot different divs based on CSS selectors that contain either keywords or IDs grabbed from the API call that I talked about last week.
Some of this process is repeated a few times because of the way the divs are selected, but everything starts with this code block:
// open browser and navigate to questionURL
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url);
// if page has cookies prompt, close prompt
const [cookieButton] = await page.$x(
  "//button[contains(., 'Accept all cookies')]"
);
if (cookieButton) {
  await cookieButton.click();
}
This starts Chrome, goes to the URL, and then closes the 'Accept all cookies' prompt if it shows up, which it often does. I was pleasantly surprised with how easy it was to simulate clicks with Puppeteer.
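Since a second prompt gets handled in a similar way later on, the find-and-click step could be pulled into a small helper. This is only a sketch under a few assumptions: buttonXPath() and clickButtonIfPresent() are hypothetical names that are not in the project, the label is assumed to contain no single quotes, and the Puppeteer version is assumed to still provide page.$x(), as the code above does.

```javascript
// Build the XPath used to find a button by its visible text.
// Assumption: the label contains no single quotes.
function buttonXPath(label) {
  return `//button[contains(., '${label}')]`;
}

// Click the first matching button if the page has one; do nothing otherwise.
// `page` is expected to be a Puppeteer Page that still provides page.$x().
// Resolves to true when a button was clicked, false when none was found.
async function clickButtonIfPresent(page, label) {
  const [button] = await page.$x(buttonXPath(label));
  if (!button) {
    return false;
  }
  await button.click();
  return true;
}

// Hypothetical usage inside an async function:
//   await clickButtonIfPresent(page, "Accept all cookies");
```

Returning a boolean is a small design choice that makes it easy to log whether a prompt actually appeared on a given run.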
Next is the code that screenshots the question's title:
// find question title and screenshot it
await page.waitForSelector("#question-header");
const questionTitle = await page.$("#question-header");
await questionTitle.screenshot({
  path: "./screenshots/question-title.png",
});
This finds the selector that is used for the title and screenshots just the div that contains the title. After that, the question's body is found and a screenshot is taken of it.
// find question body and screenshot it
await page.waitForSelector(
  "#question > div.post-layout > div.postcell.post-layout--right"
);
const questionBody = await page.$(
  "#question > div.post-layout > div.postcell.post-layout--right"
);
await questionBody.screenshot({
  path: "./screenshots/question-body.png",
});
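The question's body uses a different syntax than the title because of how the div it's in is nested: instead of a single ID, the selector chains child combinators (>) down to the right div. Getting this selector right was probably the biggest struggle for me with screenshots, but it was still much easier than I thought the whole thing was going to be.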
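The title and the body follow the same wait-then-screenshot pattern, so the repetition could be folded into one helper. Here is a minimal sketch, assuming a Puppeteer Page is passed in; screenshotElement() is a hypothetical name and not part of the project. It relies on page.waitForSelector() resolving to the element handle, which makes the second page.$() lookup unnecessary.

```javascript
// Wait for `selector` to appear, then screenshot only that element.
// `page` is a Puppeteer Page; `path` is where the PNG is written.
async function screenshotElement(page, selector, path) {
  // waitForSelector resolves to the element handle once the element exists.
  const element = await page.waitForSelector(selector);
  await element.screenshot({ path });
  return path;
}

// Hypothetical usage inside an async function:
//   await screenshotElement(page, "#question-header", "./screenshots/question-title.png");
//   await screenshotElement(
//     page,
//     "#question > div.post-layout > div.postcell.post-layout--right",
//     "./screenshots/question-body.png"
//   );
```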
Then I use a for loop to go over the array of answer IDs that the API call stored in questionDataObj. There's another prompt that likes to pop up here sometimes, and it needed to be handled as well.
// loop through answer IDs
for (let i = 0; i < questionDataObj.answerIds.length; i++) {
  // find answer and screenshot it
  await page.waitForSelector(`#answer-${questionDataObj.answerIds[i]}`);
  const answerText = await page.$(`#answer-${questionDataObj.answerIds[i]}`);
  // close prompt if it exists
  const [button] = await page.$x("//button[contains(., 'Dismiss')]");
  if (button) {
    await button.evaluate((b) => b.click());
  }
  await answerText.screenshot({
    path: `./screenshots/answer${questionDataObj.answerIds[i]}.png`,
  });
}
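To make the ID-to-selector mapping in the loop easier to see, here is a small sketch that builds the selector and file path for each answer up front. answerTargets() is a hypothetical helper, and the IDs in the comment are made-up values for illustration only.

```javascript
// Map each answer ID to the CSS selector and output path used in the loop above.
function answerTargets(answerIds) {
  return answerIds.map((id) => ({
    selector: `#answer-${id}`,
    path: `./screenshots/answer${id}.png`,
  }));
}

// Hypothetical usage with made-up IDs:
//   answerTargets([101, 202])
//   gives selectors "#answer-101" and "#answer-202"
//   and paths "./screenshots/answer101.png" and "./screenshots/answer202.png"
```

Keeping this part free of any browser calls means it can be checked without launching Chrome.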
That grabs all of the screenshots we need, and all that's left is to close the page and Chrome.
await page.close();
await browser.close();
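One thing to watch: if any step before these two lines throws, they never run, and the headless Chrome process can be left running. A try/finally block is one way to guard against that. The sketch below is an assumption about how it could be structured, not how the project does it; withBrowser() is a hypothetical name, and `launch` stands in for a call such as () => puppeteer.launch().

```javascript
// Run `work(page)` in a fresh browser and always close the browser afterwards.
// `launch` is any function that resolves to a browser object,
// for example () => puppeteer.launch() (assumed, not shown here).
async function withBrowser(launch, work) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    return await work(page);
  } finally {
    // Runs on success and on error; closing the browser also closes its pages.
    await browser.close();
  }
}

// Hypothetical usage inside an async function:
//   await withBrowser(() => puppeteer.launch(), async (page) => {
//     await page.goto(url);
//     // ...take screenshots...
//   });
```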
I found this to be much easier than I had initially expected, and I want to try using Puppeteer again for other projects in the future. Thanks for reading, and feel free to check out some of my other posts. If you have any questions for me, I'll do my best to answer them in the comments.