While learning web scraping I worked through many tutorials. Some used XPath and others used good ol' CSS selectors. Until today I always resorted to CSS because it was the familiar choice. All I knew about XPath was that Scrapy uses it by default and that any CSS selectors are converted to XPath behind the scenes. Being a complete noob at the time, I did not give it much thought. Alas, the time has come for me to sink my teeth into this topic and understand the difference between these two types of selectors.
XPath stands for XML Path Language. It treats an XML (or HTML) document as a tree and queries that tree to identify elements within it. The path part of XPath means that we describe the route through the tree to the desired element, either from the root or relative to another node.
- Allows navigation up the DOM when looking for elements
- More flexible than CSS Selectors
- Allows searching for full or partial text in element names with
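To make the points above a bit more concrete, here is a minimal sketch using Python's built-in xml.etree.ElementTree module. The markup and variable names are made up for illustration. Keep in mind that ElementTree only supports a limited subset of XPath: it can climb to a parent with ".." and match an element's full text, but partial text matching with the XPath contains() function needs a fuller engine such as lxml (which is what Scrapy builds on).

```python
import xml.etree.ElementTree as ET

# Hypothetical product listing, kept well-formed so the XML parser accepts it.
DOC = """
<html>
  <body>
    <div id="catalog">
      <div class="product">
        <h2>Laptop</h2>
        <span class="price">999</span>
      </div>
      <div class="product">
        <h2>Phone</h2>
        <span class="price">499</span>
      </div>
    </div>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# Path-style query: every <h2> anywhere below the root element.
titles = [h2.text for h2 in root.findall(".//h2")]

# Find the <h2> whose full text is "Phone", climb to its parent <div>
# with "..", then step back down to the price inside that same <div>.
phone_price = root.find(".//h2[.='Phone']/../span[@class='price']").text

print(titles)
print(phone_price)
```

The second query is the kind of thing CSS selectors cannot express: it starts from a child element, goes up to the parent, and comes back down to a sibling.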
A CSS selector uses the same pattern syntax that Cascading Style Sheets (CSS) use to pick which elements a style applies to. Most web pages are styled using CSS, which makes CSS selectors a popular choice for a lot of people.
CSS selectors rely on tags, class names, and ids, among other things, to select what we want. This is in contrast with XPath, which walks a tree-like structure to reach the element.
- Simplicity because CSS is easy to pick up
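- Often considered faster than XPath in browser-based tools, because we can target the exact element and disregard everything else on the page (in Scrapy the gap closes, since CSS selectors are converted to XPath anyway)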
- Allows attribute selection based on values assigned to them
- Allows pseudo-classes for elements in a particular state, such as :hover for hovered elements and :checked for ticked checkboxes
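Since CSS selectors end up as XPath behind the scenes in Scrapy anyway, it helps to see a few of them side by side. The sketch below pairs some common CSS selectors with hand-written XPath equivalents and runs the XPath side with Python's built-in xml.etree.ElementTree. The markup and the EQUIVALENTS mapping are my own illustration, not the actual output of Scrapy's translator, and the class match is deliberately simplified.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, kept well-formed so the stdlib XML parser accepts it.
DOC = """
<html>
  <body>
    <div id="catalog">
      <div class="product"><h2>Laptop</h2></div>
      <div class="product"><h2>Phone</h2></div>
      <input type="checkbox" name="newsletter" checked="checked"/>
      <input type="checkbox" name="terms"/>
    </div>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# CSS selector -> hand-written XPath with roughly the same meaning.
# Simplification: [@class='product'] only matches when the class attribute
# is exactly "product", whereas CSS .product also matches "product featured".
EQUIVALENTS = {
    "#catalog": ".//*[@id='catalog']",
    "div.product": ".//div[@class='product']",
    "div.product h2": ".//div[@class='product']//h2",
    "input[name='terms']": ".//input[@name='terms']",
    "input:checked": ".//input[@checked]",
}

# Count how many elements each XPath expression finds in the sample document.
match_counts = {css: len(root.findall(xpath)) for css, xpath in EQUIVALENTS.items()}

for css, xpath in EQUIVALENTS.items():
    print(f"{css:22} {xpath:32} {match_counts[css]} match(es)")
```

The CSS column is noticeably shorter, which is a big part of why it feels simpler. In a Scrapy spider the left-hand strings would go to response.css(...) and the right-hand ones to response.xpath(...).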
This is a great table made by Slotix that shows the differences between XPath and CSS selectors: