DEV Community

notHanii
notHanii

Posted on

API Scraping - E-Commerce Data - Excel & PostgreSQL Using Python

PAckages FOR This Process is

pip install jupyter
pip install requests
pip install pandas
pip install sqlalchemy
Enter fullscreen mode Exit fullscreen mode

After Installation given Packages
We USe THis E-Commerce Site for data

https://www.currys.co.uk/
Enter fullscreen mode Exit fullscreen mode

Now load this site and after That
USe this cmd On Chrome ANd inspect Given page
ctrl+shift+s
And Click on Network
And after it Look For File start with name navigation
and copy as a CURL By Right clicking on it

now go to google and search for

https://www.onlinedevtools.in/curl
Enter fullscreen mode Exit fullscreen mode

and paste the given file and convert It in Python
and AFter that Open Jupyter import Given Libraries We installed earlier

import requests
import pandas as pd
import sqlalchemy
Enter fullscreen mode Exit fullscreen mode

after that Paste the Given Code that we get from devtools site
Now, Create a json object and output data using this code

result_json  =  response.json()
result_json.keys()
listing =  result_json['resultsets']['default']['results']

Enter fullscreen mode Exit fullscreen mode

After all We Put everything togather That the KEys WE HAve By runnung Upper Code
`product_title = []
brand= []
price= []
review_score= []
review_count= []
description= []

for result in listing:
product_title.append(result['title'])
brand.append(result['brand'])
price.append(result['price'])
review_score.append(result['reevoo_score'])
review_count.append(result['reevoo_count'])
description.append(result['short_description'])
speakers_df = pd.DataFrame({'product_title':product_title,'brand':brand,'price':price,'review_score':review_score,
'review_count':review_count,'description':description})
speakers_df
`
after running all these cell you get Data from only One Page

So What we have t do if we need data from multiple pages

for that you have to make for loop
suppose i want data from 20 pages so we can us eloop that look like this

for i in range(1,21):
print (i)
Enter fullscreen mode Exit fullscreen mode

Here the Complete Code for Multiple Pages
`

headers = {
'authority': 'api.cdn.dcg-search.com',
'sec-ch-ua': '^\^',
'sec-ch-ua-mobile': '?0',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
'sec-ch-ua-platform': '^\^Windows^\^',
'accept': '/',
'origin': 'https://www.currys.co.uk',
'sec-fetch-site': 'cross-site',
'sec-fetch-mode': 'cors',
'sec-fetch-dest': 'empty',
'referer': 'https://www.currys.co.uk/',
'accept-language': 'en-US,en;q=0.9',
}
product_title = []
brand= []
price= []
review_score= []
review_count= []
description= []

for i in range(1,21):
params = (
('do', 'json'),
('m_results_per_page', '20'),
('page', str(i)),
('q', 'speaker^%%5E20docks'),
('sort', 'relevance'),
('sp_cs', 'UTF-8'),
)

response = requests.get('https://api.cdn.dcg-search.com/currys/navigation', headers=headers, params=params)

result_json  =  response.json()
listing =  result_json['resultsets']['default']['results']

for result in listing:
    #Product title
    try:
     product_title.append(result['title'])
    except:
        product_title.append('')

    #brand
    try:
        brand.append(result['brand'])
    except:
        brand.append('')
    try:

        price.append(float (result['price']))
    except:
        price.append('')
    try:
        review_score.append(float (result['reevoo_score']))
    except:
        review_score.append('')

    try:

        review_count.append(int (result['reevoo_count']))
    except:
        review_count.append('')
    try:

        description.append(result['short_description'])
    except:
        description.append('')
Enter fullscreen mode Exit fullscreen mode

speakers_multiple_df = pd.DataFrame({'product_title':product_title,'brand':brand,'price':price,'review_score':review_score,
'review_count':review_count,'description':description})
speakers_multiple_df

`

after all we have to save data in excel file so for that we use this logic

speakers_multiple_df.to_excel('speakers_multiple.xlsx',index = False)
Enter fullscreen mode Exit fullscreen mode

ANd also if we want to Save data and use in PostgreSQL
for that we use this logic For Multiple Pages

engine = sqlalchemy.create_engine('postgresql://postgres:"yourPassword"@localhost:5432')
#hAnii :(
speakers_multiple_df.to_sql('e-commerce',engine, index=False)
Enter fullscreen mode Exit fullscreen mode

for getting this Data In PostgreSQL You HAve To Install this In Your Machine

That's It For Now...............

Top comments (0)