DEV Community

fooooo-png

XPath Tool Introduction

This article gives you a quick, basic understanding of XPath and introduces how it is used in the web scraping tool Octoparse.

Table of contents:

  1. What is XPath?
  2. How to write an XPath?
  3. What is an XPath Tool?

1. What is XPath?

XPath (XML Path Language) is a query language for selecting elements from an XML/HTML document. It helps you locate an element within the whole document precisely and quickly.

Web pages are generally written in a language called HTML. If you load a web page in a browser (Chrome, Firefox, etc.), you can view the corresponding HTML doc by pressing the F12 key. Everything you see on the web page, such as images, blocks of text, and links, can be found within the HTML.
1.png

Let's look at the following example to see how XPath works.
2.png

This image shows part of an HTML doc. The elements in this section are nested in 3 levels.

Level 1: bookstore

Level 2: book

Level 3: title, author, year, and price

Text enclosed in angle brackets (< >) is called a tag. An HTML element usually consists of a start tag and an end tag, with the content in between:

<tagname>Content goes here...</tagname>

XPath uses "/" to connect tags of different levels, from top to bottom, to specify the location of an element. In our example, to locate the element "author", the XPath would be:

/bookstore/book/author

This is very similar to a path in a file structure, as the image below shows.

3.png

In short, an XPath works like an address that pinpoints a precise place in an HTML doc.
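
To make the "address" idea concrete, here is a minimal Python sketch that evaluates /bookstore/book/author against a small sample document. The sample document and the helper name `select` are assumptions made for illustration, not part of the original example. Python's built-in `xml.etree.ElementTree` only supports a subset of XPath and evaluates paths relative to an element, so the helper drops the leading root step before searching.

```python
import xml.etree.ElementTree as ET

# Made-up sample data that mirrors the bookstore > book > author structure.
DOC = """
<bookstore>
  <book>
    <title>Example Title One</title>
    <author>Author One</author>
    <year>2001</year>
    <price>10.00</price>
  </book>
  <book>
    <title>Example Title Two</title>
    <author>Author Two</author>
    <year>2002</year>
    <price>20.00</price>
  </book>
</bookstore>
"""


def select(root, absolute_path):
    """Evaluate a simple absolute path such as /bookstore/book/author.

    Hypothetical helper: it handles only plain tag steps joined by "/".
    """
    steps = absolute_path.strip("/").split("/")
    if steps[0] != root.tag:
        # The first step must name the root element, otherwise nothing matches.
        return []
    # ElementTree searches relative to `root`, so the root step is dropped.
    relative = "./" + "/".join(steps[1:]) if len(steps) > 1 else "."
    return root.findall(relative)


root = ET.fromstring(DOC)
authors = [el.text for el in select(root, "/bookstore/book/author")]
print(authors)
```

Each step in the path narrows the search by one level, the same way each folder name does in a file path.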

2. How to write an XPath?

Writing an XPath is easy if you understand the structure of HTML and the syntax of XPath.

Sounds easy? It still takes some time to learn. Here are some tutorials for beginners that I found useful.

HTML Tutorial

XPath Tutorial

XPath Basic

To make things easier, here is a cheat sheet of XPath expressions to help you quickly target any element in the HTML.

01.png
02.png
03.png

*Note that attribute and text values are case-sensitive.
*For a more exhaustive list of XPath expressions, check this out.
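
As a rough illustration of the case-sensitivity note, the sketch below runs a few common XPath patterns with Python's built-in `xml.etree.ElementTree`. The sample page is made up, the expressions are not necessarily the ones in the cheat sheet, and ElementTree supports only a subset of XPath (the `[.='text']` form needs Python 3.7 or later).

```python
import xml.etree.ElementTree as ET

# Made-up page; it must be well-formed XML for ElementTree to parse it.
PAGE = """
<html>
  <body>
    <div class="item"><a href="/first">First</a></div>
    <div class="item"><a href="/second">Second</a></div>
    <div class="Item"><a href="/third">Third</a></div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# .//a selects every <a> element anywhere below the root.
all_links = [a.get("href") for a in root.findall(".//a")]

# [@class='item'] keeps only elements whose class attribute is exactly 'item',
# so the div with class="Item" is not matched.
item_divs = root.findall(".//div[@class='item']")

# Text comparison is case-sensitive too: 'First' matches, 'first' does not.
exact = root.findall(".//a[.='First']")
wrong_case = root.findall(".//a[.='first']")

print(all_links)
print(len(item_divs), len(exact), len(wrong_case))
```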

3. What is an XPath Tool?

Now that we know the basic rules, we can start writing XPath. Congratulations!

But how do we know whether an XPath is correct? This is where an XPath tool helps: it lets us verify the expression.

I would recommend 2 XPath tools.

  • Octoparse XPath Tool

Octoparse offers an XPath tool to help you write XPath easily.

  • Chrome Add-on: XPath Helper

XPath Helper is a superb Chrome extension that lets you look up the XPath of an element simply by hovering over it in the browser. You can also edit the XPath query directly in the console. The result(s) appear immediately, so you know whether your XPath is working correctly.

1585901457783.png
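
The same kind of check can also be scripted. The sketch below is a hypothetical helper, `check_xpath`, that reports how many elements a path matches and previews their text. It is not how Octoparse or XPath Helper work internally, and it relies on the limited XPath subset in Python's built-in `xml.etree.ElementTree`, so it only accepts well-formed markup and relative paths.

```python
import xml.etree.ElementTree as ET


def check_xpath(markup, path, preview=3):
    """Return (match_count, texts of the first few matches) for a path.

    Hypothetical helper for illustration; `preview` limits how many
    matched elements are shown.
    """
    root = ET.fromstring(markup)
    matches = root.findall(path)
    # Join all nested text so elements with child tags still read sensibly.
    texts = ["".join(el.itertext()).strip() for el in matches[:preview]]
    return len(matches), texts


# Made-up snippet to test against.
SNIPPET = "<ul><li>Apple</li><li>Banana</li><li><b>Cherry</b> pie</li></ul>"

count, texts = check_xpath(SNIPPET, "./li")
print(count, texts)
```

A count of zero usually means the path is wrong, for example a misspelled tag or the wrong letter case.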

That's the end of the article. If you have ideas on how to learn XPath more effectively, please leave a comment below!

Top comments (1)

Nejc

Great article, will come in handy. I can also recommend freeformatter.com, which helped me a lot. It includes plenty of useful features for dealing with XML and HTML.