Aniket Satbhai
Using htmlq to filter web data

Like jq for JSON, htmlq filters HTML data, using CSS selectors to pick out parts of a page. It works well in a pipeline with curl.

To select the element whose id is article-body:

$ curl -s https://dev.to/anks/using-jq-to-filter-json-data-36c5 | htmlq '#article-body'
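
The same selector can be tried offline before pointing it at a live page. This is a minimal sketch: the HTML snippet and the `sample` variable are made up for illustration (they are not the real dev.to markup), and it assumes only that htmlq reads HTML from standard input, as in the curl pipeline above.

```shell
# Hypothetical sample page; not the real dev.to markup.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<html><body>
<div id="article-body"><p>Hello</p></div>
<div id="sidebar"><p>Ignored</p></div>
</body></html>
EOF

# '#article-body' is a CSS id selector: it matches the element
# whose id attribute is "article-body" and nothing else.
if command -v htmlq >/dev/null 2>&1; then
  htmlq '#article-body' < "$sample"
else
  echo "htmlq is not installed; skipping" >&2
fi
```

Only the first div should be printed; the sidebar div has a different id and is not selected.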

To select all code blocks on a given dev.to page:

$ curl -s https://dev.to/anks/using-jq-to-filter-json-data-36c5 | htmlq '[class="highlight js-code-highlight"]'
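
The attribute selector above matches only when the class attribute is exactly that string. A class selector is a looser alternative that may be worth considering if the markup ever gains extra classes or reorders them. The sketch below compares the two on made-up sample markup; the `sample` variable and the `extra` class are assumptions for illustration.

```shell
# Hypothetical sample markup; the class names mirror the selector used above.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<div class="highlight js-code-highlight"><pre>first</pre></div>
<div class="js-code-highlight highlight extra"><pre>second</pre></div>
EOF

if command -v htmlq >/dev/null 2>&1; then
  # Exact attribute match: selects only the first div, whose class
  # attribute is exactly "highlight js-code-highlight".
  htmlq -t '[class="highlight js-code-highlight"]' < "$sample"

  # Class selector: selects both divs, since each carries both classes
  # regardless of order or additional classes.
  htmlq -t '.highlight.js-code-highlight' < "$sample"
else
  echo "htmlq is not installed; skipping" >&2
fi
```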

To extract only the non-code text, select the paragraphs that are direct children of the article body:

$ curl -s https://dev.to/anks/using-jq-to-filter-json-data-36c5 | htmlq '#article-body>p'
<p>Basic Elements</p>
<p>n ∉ [0, ∞), int</p>
<p>Ex.</p>
<p>file.json<br>
</p>
<p>To filter ids:<br>
</p>
<p>To return value of <code>name</code> key when id is 1<br>
</p>
<p>To filter ids as json<br>
</p>
<p>Ref. :<br>
<a href="https://stedolan.github.io/jq/">https://stedolan.github.io/jq/</a><br>
<a href="https://programminghistorian.org/en/lessons/json-and-jq">https://programminghistorian.org/en/lessons/json-and-jq</a></p>
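
Selecting `#article-body>p` keeps the paragraphs and drops everything else. A different approach, sketched here on the assumption that htmlq's `--remove-nodes` option behaves as described in its README, is to keep the whole article body and strip only the code blocks. The sample markup, the `sample` variable, and the jq command inside it are made up for illustration.

```shell
# Hypothetical sample markup resembling an article with one code block.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<div id="article-body">
<p>To filter ids:</p>
<div class="highlight js-code-highlight"><pre>jq '.[].id' file.json</pre></div>
<p>Ref.</p>
</div>
EOF

if command -v htmlq >/dev/null 2>&1; then
  # Select the article body, then remove every node matching '.highlight'
  # before printing. The main selector comes first, as in the README example.
  htmlq '#article-body' --remove-nodes '.highlight' < "$sample"
else
  echo "htmlq is not installed; skipping" >&2
fi
```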

To return the same paragraphs as plain text instead of HTML, add the -t option:

$ curl -s https://dev.to/anks/using-jq-to-filter-json-data-36c5 | htmlq -t '#article-body>p'
Basic Elements
n ∉ [0, ∞), int
Ex.
file.json

To filter ids:

To return value of name key when id is 1

To filter ids as json

Ref. :
https://stedolan.github.io/jq/
https://programminghistorian.org/en/lessons/json-and-jq
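
The reference links in this output can also be pulled out as bare URLs by asking for an attribute instead of text. This is a sketch that assumes htmlq's `--attribute` option as documented in its README; the sample markup reuses the links from the output above, and the `sample` variable is made up for illustration.

```shell
# Sample markup built from the reference links shown in the output above.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<div id="article-body">
<p>Ref. :<br>
<a href="https://stedolan.github.io/jq/">https://stedolan.github.io/jq/</a><br>
<a href="https://programminghistorian.org/en/lessons/json-and-jq">https://programminghistorian.org/en/lessons/json-and-jq</a></p>
</div>
EOF

if command -v htmlq >/dev/null 2>&1; then
  # Print the href attribute of every link inside the article body.
  htmlq --attribute href '#article-body a' < "$sample"
else
  echo "htmlq is not installed; skipping" >&2
fi
```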

Ref.
https://github.com/mgdm/htmlq
