DEV Community

drake

Scrapy JA3 Modification

  • A third-party library already exists, but it is updated slowly and is not very mature
  • Library name: scrapy-ja3

  • Usage 1: add the following setting directly to the settings.py configuration file
# JA3 spoofing
DOWNLOAD_HANDLERS = {
    'http': 'scrapy_ja3.download_handler.JA3DownloadHandler',
    'https': 'scrapy_ja3.download_handler.JA3DownloadHandler'
}
  • Usage 2: configure it in the spider file (with nothing configured in settings.py)

from scrapy import Request, Spider


class Ja3TestSpider(Spider):
    name = 'ja3_test'

    custom_settings = {
        'DOWNLOAD_HANDLERS': {
            'http': 'scrapy_ja3.download_handler.JA3DownloadHandler',
            'https': 'scrapy_ja3.download_handler.JA3DownloadHandler',
        }
    }

    def start_requests(self):
        start_urls = [
            'https://tls.browserleaks.com/json',
        ]
        for url in start_urls:
            yield Request(url=url, callback=self.parse_ja3)

    def parse_ja3(self, response):
        self.logger.info(response.text)
        self.logger.info("ja3_hash: " + response.json()['ja3_hash'])
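For context on what the `ja3_hash` value logged by the spider represents: a JA3 fingerprint is the MD5 digest of a comma-separated string built from five TLS ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats), with the values inside each field joined by hyphens. A minimal sketch using only the standard library follows. The function names and the sample values are illustrative assumptions, not part of scrapy-ja3 and not captured from a real handshake.

```python
import hashlib
from typing import Sequence


def build_ja3_string(
    tls_version: int,
    ciphers: Sequence[int],
    extensions: Sequence[int],
    curves: Sequence[int],
    point_formats: Sequence[int],
) -> str:
    """Join the five ClientHello fields into the JA3 text form."""
    fields = [
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return ",".join(fields)


def ja3_hash(ja3_string: str) -> str:
    """Return the MD5 hex digest of a JA3 string."""
    return hashlib.md5(ja3_string.encode("ascii")).hexdigest()


# Illustrative values only (hypothetical, not from a real client).
sample = build_ja3_string(771, [4865, 4866, 49195], [0, 23, 65281], [29, 23], [0])
print(sample)
print(ja3_hash(sample))
```

Because the hash covers the order and content of these fields, changing the cipher list or its ordering yields a different fingerprint. A download handler that alters the TLS handshake can therefore produce a different `ja3_hash` from Scrapy's default.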

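  • How to install the dependencies:

scrapy-ja3 does not support the latest version of Scrapy.
The first two dependencies must be pinned to the versions below; otherwise you will run into dependency conflicts.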

pip install Twisted==22.10.0
pip install Scrapy==2.9.0
pip install scrapy-ja3
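
After installing, it can help to confirm that the pinned versions are the ones actually present in the environment. The following is a rough sketch using the standard library's `importlib.metadata`. The helper name `find_mismatches` and the `REQUIRED` mapping are hypothetical, and the pins are simply copied from the commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict

# Pins copied from the install commands above.
REQUIRED = {"Twisted": "22.10.0", "Scrapy": "2.9.0"}


def find_mismatches(
    required: Dict[str, str],
    lookup: Callable[[str], str] = version,
) -> Dict[str, str]:
    """Return {package: problem} for every pin that is not satisfied."""
    problems = {}
    for name, wanted in required.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            problems[name] = f"not installed (want {wanted})"
            continue
        if installed != wanted:
            problems[name] = f"found {installed} (want {wanted})"
    return problems


if __name__ == "__main__":
    # Prints nothing when both pins match the installed versions.
    for name, problem in find_mismatches(REQUIRED).items():
        print(f"{name}: {problem}")
```

The `lookup` parameter exists only so the check can be exercised without touching the real environment. In normal use the default `importlib.metadata.version` is enough.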
