This post was cross-published with OnePublish.
Welcome to the first post of Upwork Series. In this series, we are going to work on real-world applicati...
Just a couple of minor points, which might make the code look clean and neat.
If you define a method like the following
can become
Of course, you need to declare `information` as a global variable.
Also, if you want to use pandas (pandas.pydata.org/), its `DataFrame.to_csv` method would directly write the CSV file, without the need for additional (header) kung fu.
This is a really interesting concept for a series!
Thanks
For some reason, this code gave me an AttributeError when a DoT number was not found. I figured out that this was because bs.find('center') did not find the expected field (it doesn't exist on the page for a non-existent or outdated DoT number). I solved the problem by changing this:
to
so that instead of doing nothing (`pass`) I'd switch to the next DoT number. I also had to move the whole block of code starting with "information" one tab to the right so that it's only executed when the `try` statement executes without errors. This way only valid DoT numbers are crawled and saved. Hope this helps!
Here's how the code looks in the final form:
Also, it'd be convenient to add some sort of progress bar that would state which DoT number is being crawled at the moment and how many are left, as well as a short message when a DoT number is not found.
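The original snippets from this comment did not survive, so here is a minimal sketch of the idea rather than the commenter's actual code: skip to the next DoT number on AttributeError, and print a simple progress line. The names `parse_company`, `crawl`, and `fetch_page` are hypothetical, and the regex lookup is only a standard-library stand-in for bs.find('center'), chosen because it fails the same way (attribute access on None) when the field is missing.

```python
import re


def parse_company(html):
    """Stand-in for bs.find('center').text: return the text inside <center>."""
    match = re.search(r"<center>(.*?)</center>", html, re.S)
    # When the tag is missing, match is None and .group raises AttributeError,
    # which mirrors calling an attribute on the None returned by bs.find.
    return match.group(1).strip()


def crawl(dot_numbers, fetch_page):
    """Collect data for valid DoT numbers only, reporting progress as it goes.

    fetch_page is a caller-supplied function (an assumption of this sketch)
    that takes a DoT number and returns the page HTML as a string.
    """
    results = []
    total = len(dot_numbers)
    for position, dot in enumerate(dot_numbers, start=1):
        print(f"[{position}/{total}] crawling DoT {dot}, {total - position} left")
        try:
            company = parse_company(fetch_page(dot))
        except AttributeError:
            # Field not found: report it and move on to the next DoT number.
            print(f"DoT {dot} not found, skipping")
            continue
        # Only reached when the try block succeeded.
        results.append({"dot": dot, "company": company})
    return results
```

Using `continue` instead of `pass` is what keeps invalid numbers out of the results, since nothing after the `except` block runs for that iteration.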
Good post, with examples and explanations, this can be interesting.
I invite you to try the new lib dev.to/juancarlospaco/faster-than-...
I'm looking to hire a dev who can create a web scraper that outputs .csv and runs daily at 9 a.m. EST. It can save to a Google Sheet once run.
Each day a new PDF is published, e.g. li-public.fmcsa.dot.gov//lihtml/rp...
Only the digits change, corresponding to the date.
I need to pull the following per row:
MC Number
Company Name
Name
Address
Phone
Email me to discuss budget; I can pay through PayPal G&S quickly. Looking for help immediately.
My email is info@dnxint.com
Hi! Great article, thanks for posting.
Just one question: how does this relate to Upwork? Just curious, as a former freelancer on the platform.
Nice post. Thanks for making this one. The explanation is so clear. Happy to read it. :)
2 issues:
1) The code is creating a separate CSV file for each line.
2) My dots.xls file has 100 DoT numbers, but it only searches 2 and then ends the process.
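On the first issue, one possible cause (an assumption, since the failing code is not shown) is that the output file is opened inside the loop, once per record. A minimal sketch of the alternative: open the file once, write the header once, then write one row per record. The function name `save_records` and the column names in `FIELDS` are hypothetical.

```python
import csv

# Hypothetical column names; replace with the fields actually scraped.
FIELDS = ["dot", "company"]


def save_records(records, path, fieldnames=FIELDS):
    """Write all records to a single CSV file and return the row count."""
    # Opening the file once, outside the loop, keeps every row in one file.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()  # header is written exactly once
        count = 0
        for record in records:
            writer.writerow(record)
            count += 1
    return count
```

Calling `save_records(results, "output.csv")` after the crawl finishes would then produce a single file, where `results` is a list of dictionaries keyed by the field names.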