Scrapy JA3 Modification

A third-party library for this already exists, but it is updated slowly and is not very mature.
Library name: scrapy-ja3

Option 1: add the following setting directly to settings.py

# JA3 spoofing
DOWNLOAD_HANDLERS = {
    'http': 'scrapy_ja3.download_handler.JA3DownloadHandler',
    'https': 'scrapy_ja3.download_handler.JA3DownloadHandler'
}

Option 2: configure it inside the spider file via custom_settings (leave settings.py unchanged)


from scrapy import Request, Spider


class Ja3TestSpider(Spider):
    name = 'ja3_test'

    # Per-spider override: only this spider uses the JA3 download handler
    custom_settings = {
        'DOWNLOAD_HANDLERS': {
            'http': 'scrapy_ja3.download_handler.JA3DownloadHandler',
            'https': 'scrapy_ja3.download_handler.JA3DownloadHandler',
        }
    }

    def start_requests(self):
        # This endpoint echoes back the TLS fingerprint of the connecting client
        start_urls = [
            'https://tls.browserleaks.com/json',
        ]
        for url in start_urls:
            yield Request(url=url, callback=self.parse_ja3)

    def parse_ja3(self, response):
        # Log the full JSON payload, then the JA3 hash on its own line
        self.logger.info(response.text)
        self.logger.info("ja3_hash: " + response.json()['ja3_hash'])
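The ja3_hash value logged by the spider is, per the JA3 method, the MD5 hex digest of a string built from five ClientHello fields: TLS version, cipher suites, extensions, elliptic curves, and elliptic curve point formats. The fields are separated by commas, the values inside each list are joined with dashes, and GREASE values are excluded. The sketch below rebuilds that string and its hash using only the standard library, which makes it easy to sanity-check a fingerprint by hand. It assumes GREASE values have already been filtered out, the numbers in the demo are purely illustrative (not a real browser's ClientHello), and `check_reported_hash` assumes the response JSON also carries a `ja3_text` field next to `ja3_hash`.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build a JA3 string: five comma-separated fields, list values joined by dashes.

    Assumes GREASE values were already removed from every list.
    """
    fields = [str(version)] + [
        "-".join(str(v) for v in values)
        for values in (ciphers, extensions, curves, point_formats)
    ]
    return ",".join(fields)


def ja3_hash(ja3_text):
    """The JA3 fingerprint is the MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3_text.encode("ascii")).hexdigest()


def check_reported_hash(payload):
    """Recompute the hash from a decoded JSON payload.

    Assumption: the payload contains both 'ja3_text' and 'ja3_hash' keys.
    """
    return ja3_hash(payload["ja3_text"]) == payload["ja3_hash"]


if __name__ == "__main__":
    # Illustrative values only
    text = ja3_string(771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0])
    print(text)  # 771,4865-4866-4867,0-23-65281,29-23-24,0
    print(ja3_hash(text))
```

If the hash from a run with the JA3 download handler matches a real browser's hash, the TLS handshake is being impersonated as intended; if it still matches plain Scrapy's hash, the handler is not in effect.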

Installing the dependencies:

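Because scrapy-ja3 does not support the latest version of Scrapy, the first two dependencies must be pinned to the versions below; otherwise you will run into various dependency conflicts.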

pip install Twisted==22.10.0
pip install Scrapy==2.9.0
pip install scrapy-ja3
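Since a wrong Twisted or Scrapy version is the most likely cause of trouble, a quick post-install check can save time. The sketch below compares the installed versions against the pins from the pip commands, using the standard library's `importlib.metadata` (Python 3.8+). `PINS`, `installed_version`, and `pin_mismatches` are hypothetical helper names for illustration, not part of scrapy-ja3.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the install commands; scrapy-ja3 itself is left unpinned there
PINS = {"Twisted": "22.10.0", "Scrapy": "2.9.0"}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def pin_mismatches(pins, lookup=installed_version):
    """Return {name: (expected, found)} for every pinned package that does not match."""
    problems = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    mismatches = pin_mismatches(PINS)
    for name, (expected, found) in mismatches.items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
    if not mismatches:
        print("Twisted and Scrapy match the pinned versions")
```

The `lookup` parameter defaults to the real installed-version query but can be swapped for a plain dict's `.get`, which keeps the comparison logic easy to test without touching the environment.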

DEV Community
