In this blog post, I’ll walk you through my recent contribution to the scrape-it repository, where I resolved a type error in the ScrapeOptionElement interface. The fix addresses the following issue, quoted below:
Type error with the how field of type ScrapeOptionElement #193
I recently started using scrape-it to extract data from HTML pages, but there were cases where I needed to intervene in the extraction using the Cheerio API.
// test example
const anyHTML = '<html>...</html>'
const { data } = scrapeIt.scrapeHTML<{ data: unknown }>(anyHTML, {
    data: {
        listItem: 'main',
        data: {
            items: {
                selector: 'article',
                how: (element) => {
                    const $items = element.find('p:nth-child(n+2)')
                    // more cheerio methods
                    return $items.text()
                }
            }
        }
    }
})
TypeScript reports a type error. The code still runs, but the error is a nuisance, and there is no autocompletion for the Cheerio object passed to the function parameter.
Looking into the types of scrape-it, the how field is typed as a function whose parameter is a cheerio.Selector, which may be the cause of the problem.
export interface ScrapeOptionElement {
    selector?: string;
    convert?: (value: any) => any;
    // Change cheerio.Selector to cheerio.Cheerio
    how?: string | ((element: cheerio.Selector) => any);
    attr?: string;
    trim?: boolean;
    closest?: string;
    eq?: number;
    texteq?: number;
}
This issue highlighted outdated type definitions for the how function in TypeScript. Here’s the journey from identifying the problem to submitting a successful pull request.
The Issue: Outdated Type
- The issue was centered around the ScrapeOptionElement interface, specifically the how field. The problem was that the element parameter of the how function was not correctly typed in TypeScript. As a result, developers were not getting proper type hints or autocompletion when using Cheerio methods.
Issue URL: Type error with the how field of type ScrapeOptionElement #193
The how field in ScrapeOptionElement was typed as string | ((element: cheerio.Selector) => any). cheerio.Selector describes the callable selector function rather than a set of matched elements, so TypeScript did not recognize methods such as find() or text() on the element parameter. What we needed was to type the element parameter as Cheerio, so that TypeScript could recognize Cheerio methods and provide better tooling support.
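To make the difference concrete, here is a minimal sketch that does not use the real cheerio declarations. CheerioLike, SelectorLike, HowOld, HowNew, and fakeSet are hypothetical stand-ins invented for illustration. It assumes only that a selector type is callable, while a wrapped element set exposes methods such as find() and text().

```typescript
// Hypothetical stand-ins; the real types live in the cheerio declarations.
interface CheerioLike {
  find(selector: string): CheerioLike;
  text(): string;
}

// A selector type is callable: it takes a query and returns a wrapped set.
type SelectorLike = (selector: string) => CheerioLike;

// Old shape: the callback receives the callable selector type.
type HowOld = (element: SelectorLike) => unknown;
// New shape: the callback receives the wrapped element set.
type HowNew = (element: CheerioLike) => unknown;

// With HowOld, `element.find(...)` would not type-check, because a function
// type has no `find` member. The callback can only call `element(...)`.
const howOld: HowOld = (element) => element('p').text();

// With HowNew, `element` is inferred as CheerioLike, so `find` and `text` resolve.
const howNew: HowNew = (element) => element.find('p:nth-child(n+2)').text();

// Tiny fake element set so the sketch runs without cheerio installed.
function fakeSet(content: string): CheerioLike {
  return {
    find: () => fakeSet(content),
    text: () => content,
  };
}
```

Calling howNew(fakeSet('hello')) returns 'hello'. The point is the inference: the editor can only offer find() and text() on element when the parameter is typed as the wrapped set.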
What Did I Do to Fix It?
- To address the issue, I updated the type definition for the ScrapeOptionElement interface in the index.d.ts file.
export interface ScrapeOptionElement {
    selector?: string;
    convert?: (value: any) => any;
    how?: string | ((element: cheerio.Selector) => any); // original
}
How I Updated It
- I updated the how field to explicitly use the Cheerio type for the element parameter, so that TypeScript provides correct typings and autocompletion for all Cheerio methods. A triple-slash reference directive pulls in the Cheerio declarations, so no import statement is needed.
/// <reference types="cheerio" />
export interface ScrapeOptionElement {
    selector?: string;
    convert?: (value: any) => any;
    how?: string | ((element: Cheerio) => any); // Updated type
}
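// Example usage: the element parameter is now inferred as Cheerio
(() => {
    // scrapeHTML parses HTML markup, so pass it markup rather than a URL
    const anyHTML = '<html>...</html>'
    const { data } = scrapeIt.scrapeHTML<{ data: unknown }>(anyHTML, {
        data: {
            listItem: 'main',
            data: {
                items: {
                    selector: 'article',
                    how: (element) => {
                        const $items = element.find('p:nth-child(n+2)')
                        return $items.text()
                    }
                }
            }
        }
    })
    console.log(data)
})() // invoke the function; the original snippet defined it without calling it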
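The how field is a union: it accepts either a string or a function. The sketch below shows one way such a union could be resolved at runtime. It assumes that a string names a method on the element; it is not scrape-it's actual implementation. CheerioLike, How, and applyHow are hypothetical names used only for this illustration.

```typescript
// Hypothetical stand-in for a wrapped element set.
interface CheerioLike {
  text(): string;
  html(): string | null;
}

// Same shape as the updated `how` field, using the stand-in type.
type How = string | ((element: CheerioLike) => unknown);

// Resolve `how`: call a function directly, or look up a method by name.
function applyHow(element: CheerioLike, how: How): unknown {
  if (typeof how === 'function') {
    // The callback sees a fully typed element, so its methods autocomplete.
    return how(element);
  }
  // Assumption: a string is treated as the name of a method on the element.
  const method = (element as unknown as Record<string, unknown>)[how];
  if (typeof method !== 'function') {
    throw new TypeError(`Unknown element method: ${how}`);
  }
  return method.call(element);
}

// Fake element so the sketch runs without cheerio installed.
const fakeElement: CheerioLike = {
  text: () => 'hi',
  html: () => '<b>hi</b>',
};
```

With the fake element, applyHow(fakeElement, 'text') and applyHow(fakeElement, (el) => el.text()) take different branches but both read the element's text. Only the function form benefits from the corrected parameter type.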
Testing the Fix
- After making the necessary changes, I created an example TypeScript file (example/index-type.ts) to demonstrate how the updated ScrapeOptionElement interface works. This example included both Promise-based and async/await usage to showcase how the how function could be used with proper typings in TypeScript.
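For readers unfamiliar with the two calling styles, here is a rough sketch of the difference. It uses a hypothetical scrapeStub function in place of a real scraper call, so it is not the contents of example/index-type.ts. ScrapeResult, viaPromise, and viaAsync are likewise made up for illustration.

```typescript
// Hypothetical result shape and stub; not the scrape-it API.
interface ScrapeResult<T> {
  data: T;
}

// Stand-in for an asynchronous scrape call: resolves with the given data.
function scrapeStub<T>(data: T): Promise<ScrapeResult<T>> {
  return Promise.resolve({ data });
}

// Promise-based style: chain a `.then` callback onto the returned promise.
function viaPromise(): Promise<string> {
  return scrapeStub({ title: 'Hello' }).then(({ data }) => data.title);
}

// async/await style: same logic, written sequentially.
async function viaAsync(): Promise<string> {
  const { data } = await scrapeStub({ title: 'Hello' });
  return data.title;
}
```

Both functions resolve to the same string. In either style, the generic parameter carries the data type through, so data.title is type-checked.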
Research and Challenges
- While updating the type definition itself was straightforward, I had to ensure that the fix didn’t break existing functionality or introduce new errors. My primary challenge was ensuring backward compatibility and understanding how Cheerio’s type definitions worked.
Pull Request and Interaction with Maintainers
- Once I completed the fix, I submitted pull request #194 to the scrape-it repository. The project maintainer, IonicaBizau, reviewed my code and provided feedback. Once all tests passed and the changes were judged satisfactory, the PR was merged.
Conclusion
This contribution helped improve the TypeScript experience for developers using the scrape-it library by providing proper typings for the how function when working with Cheerio elements. By fixing this type error, developers can now enjoy better autocompletion and error-checking in their scraping projects.
If you’re working with web scraping and TypeScript, this fix makes it easier to write robust, type-safe code. You can check out the updated type definitions in version 6.1.3 of scrape-it.