PAckages FOR This Process is
pip install jupyter
pip install requests
pip install pandas
pip install sqlalchemy
After Installation given Packages
We USe THis E-Commerce Site for data
https://www.currys.co.uk/
Now load this site and after That
USe this cmd On Chrome ANd inspect Given page
ctrl+shift+s
And Click on Network
And after it Look For File start with name navigation
and copy as a CURL By Right clicking on it
now go to google and search for
https://www.onlinedevtools.in/curl
and paste the given file and convert It in Python
and AFter that Open Jupyter import Given Libraries We installed earlier
import requests
import pandas as pd
import sqlalchemy
after that Paste the Given Code that we get from devtools site
Now, Create a json object and output data using this code
result_json = response.json()
result_json.keys()
listing = result_json['resultsets']['default']['results']
After all We Put everything togather That the KEys WE HAve By runnung Upper Code
`product_title = []
brand= []
price= []
review_score= []
review_count= []
description= []
for result in listing:
product_title.append(result['title'])
brand.append(result['brand'])
price.append(result['price'])
review_score.append(result['reevoo_score'])
review_count.append(result['reevoo_count'])
description.append(result['short_description'])
speakers_df = pd.DataFrame({'product_title':product_title,'brand':brand,'price':price,'review_score':review_score,
'review_count':review_count,'description':description})
speakers_df
`
after running all these cell you get Data from only One Page
So What we have t do if we need data from multiple pages
for that you have to make for loop
suppose i want data from 20 pages so we can us eloop that look like this
for i in range(1,21):
print (i)
Here the Complete Code for Multiple Pages
`
headers = {
'authority': 'api.cdn.dcg-search.com',
'sec-ch-ua': '^\^',
'sec-ch-ua-mobile': '?0',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
'sec-ch-ua-platform': '^\^Windows^\^',
'accept': '/',
'origin': 'https://www.currys.co.uk',
'sec-fetch-site': 'cross-site',
'sec-fetch-mode': 'cors',
'sec-fetch-dest': 'empty',
'referer': 'https://www.currys.co.uk/',
'accept-language': 'en-US,en;q=0.9',
}
product_title = []
brand= []
price= []
review_score= []
review_count= []
description= []
for i in range(1,21):
params = (
('do', 'json'),
('m_results_per_page', '20'),
('page', str(i)),
('q', 'speaker^%%5E20docks'),
('sort', 'relevance'),
('sp_cs', 'UTF-8'),
)
response = requests.get('https://api.cdn.dcg-search.com/currys/navigation', headers=headers, params=params)
result_json = response.json()
listing = result_json['resultsets']['default']['results']
for result in listing:
#Product title
try:
product_title.append(result['title'])
except:
product_title.append('')
#brand
try:
brand.append(result['brand'])
except:
brand.append('')
try:
price.append(float (result['price']))
except:
price.append('')
try:
review_score.append(float (result['reevoo_score']))
except:
review_score.append('')
try:
review_count.append(int (result['reevoo_count']))
except:
review_count.append('')
try:
description.append(result['short_description'])
except:
description.append('')
speakers_multiple_df = pd.DataFrame({'product_title':product_title,'brand':brand,'price':price,'review_score':review_score,
'review_count':review_count,'description':description})
speakers_multiple_df
`
after all we have to save data in excel file so for that we use this logic
speakers_multiple_df.to_excel('speakers_multiple.xlsx',index = False)
ANd also if we want to Save data and use in PostgreSQL
for that we use this logic For Multiple Pages
engine = sqlalchemy.create_engine('postgresql://postgres:"yourPassword"@localhost:5432')
#hAnii :(
speakers_multiple_df.to_sql('e-commerce',engine, index=False)
for getting this Data In PostgreSQL You HAve To Install this In Your Machine
That's It For Now...............
Top comments (0)