
Ordinary Coders

Originally published at ordinarycoders.com

How to Scrape Websites Using Python

1) Create a Python virtual environment

C:\Users\Owner> cd desktop
C:\Users\Owner\desktop> py -m venv scrap
C:\Users\Owner\desktop> cd scrap
C:\Users\Owner\desktop\scrap> Scripts\activate
(scrap)C:\Users\Owner\desktop\scrap>
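If it is unclear whether the environment was activated, a quick check from Python can confirm it. This is a minimal sketch using only the standard library; the helper name in_virtualenv is made up for illustration.

```python
import sys


def in_virtualenv():
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```

Run it with the same interpreter you plan to use for Scrapy, for example by saving it to a file and calling py on that file from the (scrap) prompt.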

2) Install Scrapy

(scrap)C:\Users\Owner\desktop\scrap>pip install scrapy
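To confirm that the install landed in the active environment, the package version can be looked up from Python. The sketch below relies on the standard library's importlib.metadata (Python 3.8+); installed_version is a hypothetical helper name, not part of Scrapy.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if Scrapy is installed in this environment, else None.
print("scrapy:", installed_version("scrapy"))
```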

3) Create a Scrapy project

scrapy startproject myproject

4) Create a basic spider
Create a file named spider1.py in the myproject > myproject > spiders folder.
Define a scrapy.Spider subclass, set its name and start_urls, and in parse() collect the text of every <p> tag inside elements with the class readmoreInner.

import scrapy


class ReviewSpider(scrapy.Spider):
    # Name used on the command line: scrapy crawl quicken
    name = "quicken"
    start_urls = [
        "https://www.creditkarma.com/reviews/mortgage/single/id/quicken-loans-mortgage/",
    ]

    def parse(self, response):
        # Collect the text of every <p> inside elements with class "readmoreInner"
        reviews = response.css(".readmoreInner p::text").getall()
        yield {"text": reviews}

5) Run the spider

(scrap) C:\Users\Owner\Desktop\code\scrap\myproject\myproject\spiders>scrapy crawl quicken

6) Save the data

(scrap) C:\Users\Owner\Desktop\code\scrap\myproject\myproject\spiders>scrapy crawl quicken -o reviews.json
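Once reviews.json exists, it can be read back with the standard library. This is a minimal sketch that assumes the file holds a JSON list of items shaped like {"text": [...]}, as the spider above yields; load_reviews is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_reviews(path):
    """Return a flat list of non-empty review strings from the JSON feed."""
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    reviews = []
    for item in items:
        # Each item is expected to look like {"text": ["...", "..."]}.
        for text in item.get("text", []):
            cleaned = text.strip()
            if cleaned:
                reviews.append(cleaned)
    return reviews


feed = Path("reviews.json")
if feed.exists():
    for review in load_reviews(feed):
        print(review)
```

Note that -o appends to an existing file, so running the crawl twice can leave reviews.json as invalid JSON; deleting the file between runs avoids that.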

Beginner's Guide to Scrapy for Python
