DEV Community

James Candan


Use JavaScript Regex to Find All IDs That Contain a String and Copy the Text to the Clipboard

Web scraping is a powerful tool, but I sometimes find that spinning up a full-fledged Beautiful Soup Python script for a short task is unnecessary. Today I had an issue with a web page that would not let me select items in a table to copy, and, even if it had, I would have ended up with unwanted data from the other columns in my clipboard.

Solution: Console web scraping

Let's break this down.

First, I wanted a way to capture each element. Each piece of text I wanted from the table was wrapped in a tag like <div id="edit-tid-24-view"></div>. I first tried targeting these with a "begins with" attribute selector:

document.querySelectorAll('[id^="edit-tid"]');

This got me part of the way there, but I needed to target ID attribute values that not only started with this, but also ended with -view. In a typical regex, you might write something like /^edit-tid.*-view$/. The .* is a bit greedy, but it would have done the trick in my case. However, querySelectorAll takes CSS selectors, not regular expressions. So, I combined two attribute selectors: ^= for the beginning portion and $= for the ending portion.

document.querySelectorAll('[id^="edit-tid"][id$="-view"]');
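
If a regex really is the better fit, one option is to select every element that has an id and filter the results in JavaScript instead. The sketch below assumes the anchored pattern /^edit-tid.*-view$/; the names VIEW_ID, isViewId, and findViewElements are hypothetical helpers made up for illustration, not part of the page or any library.

```javascript
// Hypothetical pattern: id starts with "edit-tid" and ends with "-view".
const VIEW_ID = /^edit-tid.*-view$/;

// Returns true when an id string matches the pattern.
function isViewId(id) {
  return VIEW_ID.test(id);
}

// Filters any array-like of elements (for example a NodeList) by id.
function findViewElements(elements) {
  return Array.from(elements).filter(function (el) {
    return isViewId(el.id);
  });
}

// In a browser console, pass in every element that has an id attribute.
if (typeof document !== 'undefined') {
  console.log(findViewElements(document.querySelectorAll('[id]')));
}
```

For a simple prefix and suffix match, the two combined attribute selectors are shorter; the regex route mainly pays off when the middle of the ID also needs checking.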

After that, it was quite simple. querySelectorAll returns a NodeList, which lacks array methods such as map, so I first converted it to an Array.

Array.from(someObject);

Once there, I could have mapped the innerText of each Node from the DOM to an array of the desired strings.

Array.from(someObject).map(function(item) { return item.innerText; });
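
Array.from also accepts a mapping function as its second argument, which folds the convert and map steps into a single call. Here is a minimal sketch, assuming each element exposes innerText; collectText is a hypothetical helper name, and fakeNodeList is only a stand-in for a real NodeList so the snippet can run outside a browser.

```javascript
// Hypothetical helper: turn any array-like of elements into an array of strings.
function collectText(nodes) {
  return Array.from(nodes, function (node) {
    return node.innerText;
  });
}

// Stand-in for a NodeList: an array-like object with a length property.
const fakeNodeList = {
  0: { innerText: 'First row' },
  1: { innerText: 'Second row' },
  length: 2
};

// Logs an array holding the two strings above.
console.log(collectText(fakeNodeList));
```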

However, I was not satisfied with that.

I wanted my list cleanly output and piped directly to my clipboard. JavaScript lets you select text and execute a copy command on the document object. However, I was working in the console, and found something much simpler: the console provides its own copy() utility function.

I simply concatenated the strings together, separated by newline characters, and copied the result to my clipboard.
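
One way to sketch that concatenation step is with Array.prototype.join, which puts the separator only between items and so leaves no trailing newline. toClipboardText and the sample data are hypothetical, and the copy() call is guarded because that function is assumed to exist only inside the browser's developer console.

```javascript
// Hypothetical helper: join an array of strings into newline-separated text.
function toClipboardText(lines) {
  return lines.join('\n');
}

// Made-up sample values standing in for the scraped strings.
const sample = ['Apples', 'Oranges', 'Pears'];
const text = toClipboardText(sample);

// copy() is a DevTools console utility; fall back to logging anywhere else.
if (typeof copy === 'function') {
  copy(text);
} else {
  console.log(text);
}
```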

Conclusion

Here's my Developer Tools Console web scraper in all its glory.

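var copyText = '';
Array.from(document.querySelectorAll('[id^="edit-tid"][id$="-view"]'))
    .forEach(function (x) {
        // innerText is the rendered text of each matched element
        copyText += x.innerText + '\n';
    });
// copy() is a Developer Tools console utility, not a page-level function
copy(copyText);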
