DEV Community

Day 18 of 100Days Of Code: Following Links in HTML Using BeautifulSoup

Durga Pokharel (iamdurga) ・2 min read

Today is my 18th day of #100DaysOfCode and #python. I continued learning how to access web data with Python on Coursera, and I did the Day 2 challenge of Advent of Code 2020.

On freeCodeCamp, I worked on the CSS flexbox challenges: using the display property, adding flex superpowers to the tweet embed, using the flex-direction property to make a row, applying the flex-direction property to create rows in the tweet embed, using the flex-direction property to make a column, and so on.

Following Links in HTML Using BeautifulSoup

The program starts by importing the libraries urllib.request, urllib.parse, urllib.error, and BeautifulSoup.

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup

The program uses urllib to read the HTML from the data file below, extracts the href= values from the anchor tags, scans for the tag at a particular position relative to the first name in the list, follows that link, repeats the process a number of times, and reports the last name it finds.
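
Before looking at the full program, here is a small sketch of just the extraction step, run on an in-memory string instead of a downloaded page. The HTML sample and the example.com addresses are made up for illustration and are not part of the course data.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, not one of the course data files.
sample_html = """
<ul>
  <li><a href="http://example.com/known_by_Ann.html">Ann</a></li>
  <li><a href="http://example.com/known_by_Bob.html">Bob</a></li>
  <li><a href="http://example.com/known_by_Cid.html">Cid</a></li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")
tags = soup("a")  # calling the soup is shorthand for soup.find_all("a")

# Collect every href value; tag.get returns None if the attribute is missing.
hrefs = [tag.get("href") for tag in tags]

# Positions in the exercise are 1-based, so position 2 is list index 1.
link_position = 2
target = tags[link_position - 1]
print(target.get("href"))   # http://example.com/known_by_Bob.html
print(target.get_text())    # Bob
```

The same two calls, soup('a') and tag.get('href'), are what the program applies to each page it downloads.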


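url = "http://py4e-data.dr-chuck.net/known_by_Danni.html"
link_position = 18   # 1-based position of the link to follow on each page
process_repeat = 3   # number of times to follow a link

# Read the starting page and collect its anchor tags.
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
tags = soup('a')

for round_number in range(1, process_repeat + 1):
    print('process round', round_number)
    # Positions are 1-based, list indexes are 0-based.
    target = tags[link_position - 1]
    print('target:', target)
    url = target.get('href')
    print('Current url', url)
    # Follow the link and collect the anchor tags of the next page.
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')

# The text of the last link followed is the name to report.
print('Last name:', target.get_text())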
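
The same loop can also be wrapped in a function so it can be tried without a network connection. This is only a sketch: the function name follow_links, its fetch parameter, and the tiny PAGES dictionary are hypothetical names chosen for this example, not part of the course material.

```python
from bs4 import BeautifulSoup


def follow_links(start_url, link_position, repeats, fetch):
    """Follow the link at a 1-based position `repeats` times.

    `fetch` is any callable that takes a URL and returns its HTML.
    Returns the text of the last anchor tag that was followed.
    """
    url = start_url
    last_name = None
    for _ in range(repeats):
        soup = BeautifulSoup(fetch(url), "html.parser")
        tags = soup("a")
        if len(tags) < link_position:
            raise ValueError(f"{url} has only {len(tags)} links")
        target = tags[link_position - 1]
        href = target.get("href")
        if href is None:
            raise ValueError(f"link {link_position} on {url} has no href")
        url = href
        last_name = target.get_text(strip=True)
    return last_name


# Hypothetical in-memory "site" standing in for real pages.
PAGES = {
    "a.html": '<a href="b.html">Bea</a><a href="c.html">Cal</a>',
    "b.html": '<a href="a.html">Abe</a><a href="c.html">Cal</a>',
    "c.html": '<a href="a.html">Abe</a><a href="b.html">Bea</a>',
}

# a.html -> Cal (c.html) -> Bea (b.html) -> Cal (c.html)
print(follow_links("a.html", link_position=2, repeats=3, fetch=PAGES.__getitem__))  # Cal
```

To run it against real pages, fetch could be something like lambda u: urllib.request.urlopen(u).read(), which matches what the program above does on each round.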
