Discussion on: Do you use BeautifulSoup or LXML to parse your HTML markup in Python?


I'm a fan of lxml, but I haven't done any HTML parsing in a while. IIRC lxml is built on C code (it wraps libxml2) while BeautifulSoup is written in Python, so BeautifulSoup tends to be the slower of the two.

I think your best bet is to write a pet project, feed the same HTML to both, and measure performance, but also check whether they behave the same way. Different parsers sometimes behave differently on corner cases or malformed input.
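
Something like this could be a starting point for that pet project. It's only a rough sketch: the sample markup and the names `SAMPLE_HTML`, `available_parsers`, and `compare` are made up for illustration, and it assumes you have `beautifulsoup4` and/or `lxml` installed (it skips whichever one is missing). It pulls the link targets out of the same slightly malformed HTML with each library and times the work with `timeit`.

```python
import timeit

# Deliberately sloppy markup (unclosed <p> and <b>) to exercise corner cases.
SAMPLE_HTML = (
    "<html><body><p>Hello<p>unclosed <b>bold</p>"
    "<a href='/x'>link</a></body></html>"
)


def available_parsers():
    """Return {name: function(html) -> list of hrefs} for installed libraries."""
    parsers = {}
    try:
        from bs4 import BeautifulSoup

        parsers["BeautifulSoup (html.parser)"] = lambda html: [
            a.get("href") for a in BeautifulSoup(html, "html.parser").find_all("a")
        ]
    except ImportError:
        pass  # beautifulsoup4 is not installed; skip it
    try:
        import lxml.html

        parsers["lxml"] = lambda html: [
            a.get("href") for a in lxml.html.fromstring(html).iter("a")
        ]
    except ImportError:
        pass  # lxml is not installed; skip it
    return parsers


def compare(html, parsers, repeat=100):
    """Run every parser on the same input; return {name: (result, seconds)}."""
    results = {}
    for name, extract in parsers.items():
        seconds = timeit.timeit(lambda: extract(html), number=repeat)
        results[name] = (extract(html), seconds)
    return results


if __name__ == "__main__":
    for name, (links, seconds) in compare(SAMPLE_HTML, available_parsers()).items():
        print(f"{name}: {links} in {seconds:.4f}s")
```

If the extracted lists differ between the two libraries, that's exactly the kind of behavioral difference worth looking into before you pick one. For timings that mean anything you'd want to feed it real pages from your own use case rather than a toy string.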