If you need to remove every link from a webpage and keep only the link text, you can use one of two methods:
Basic method
If you know the content of all your hrefs, you can use this basic way. In my example, the href content is a simple hash, #.
- get the page content as a string
- replace all start anchor tags using the JavaScript method String.prototype.replace() with a RegEx. The /g global flag makes it replace every match instead of stopping after the first one
.replace(/<a href="#">/g, '')
- replace all end anchor tags using RegEx
.replace(/<\/a>/g, '')
- chain the two replace calls
const mystring = `<a href="#">The cat</a> (Felis catus) is a domestic species of small carnivorous mammal. It is the only domesticated species in the family <a href="#">Felidae</a> and is often referred to as the domestic cat to distinguish it from the wild members of the family.`
mystring.replace(/<a href="#">/g, '').replace(/<\/a>/g, '')
Output
The cat (Felis catus) is a domestic species of small carnivorous mammal. It is the only domesticated species in the family Felidae and is often referred to as the domestic cat to distinguish it from the wild members of the family.
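To make the role of the g flag concrete, here is a minimal sketch comparing a replace with and without it. It also shows how the pattern could be loosened when the href values differ. The helper name stripAnchorTags and the looser pattern are assumptions for illustration, not part of the original method, and a pattern like this can still break on unusual markup (for example an attribute value that contains a > character).

```javascript
const html = '<a href="#">The cat</a> and <a href="#">Felidae</a>';

// Without the g flag, only the first opening tag is removed.
const firstOnly = html.replace(/<a href="#">/, '');
// 'The cat</a> and <a href="#">Felidae</a>'

// With the g flag, every opening tag is removed.
const allOpening = html.replace(/<a href="#">/g, '');
// 'The cat</a> and Felidae</a>'

// Hypothetical helper (an assumption, not from the original post):
// matches an opening <a> tag with any attributes, then the closing tag.
// \b keeps it from matching other tags that start with "a", such as <abbr>.
function stripAnchorTags(input) {
  return input
    .replace(/<a\b[^>]*>/gi, '')
    .replace(/<\/a\s*>/gi, '');
}

const mixed = '<a href="https://en.wikipedia.org/wiki/Cat" target="_blank">The cat</a> is a <a href="#">mammal</a>.';
const plain = stripAnchorTags(mixed);
// 'The cat is a mammal.'
console.log(plain);
```

This stays a string-level trick: it does not parse the HTML, so it is best kept for markup you control.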
Advanced method
In this way, we can remove all anchor tag instances from our string, even those whose href we do not know.
Thanks to @shadowtime2000 for pointing that out.
- create a new DOM element
let elem = document.createElement('div')
- add the inner HTML passed as string
elem.innerHTML = `<a href="https://en.wikipedia.org/wiki/Cat">The cat</a> (Felis catus) is a domestic species of small carnivorous mammal. It is the only domesticated species in the family <a href="#">Felidae</a> and is often referred to as the <strong>domestic cat</strong> to distinguish it from the wild members of the family.`
- loop through all the direct child elements of the <div> (so we are looping over the HTML string we have just passed) and check whether any of them is an <a> tag.
Important: tagName returns the tag name of the element, and for HTML elements it is in uppercase.
Array.from(elem.children).forEach(child => {
  // if child is an HTML tag different from an anchor <a>, then skip it
  if (child.tagName !== 'A') return
  // else if child is an anchor tag,
  // then replace the current node with a new textNode containing the anchor text content
  // <a href="#">wow!</a> -> wow!
  child.replaceWith(document.createTextNode(child.textContent))
})
To sum up
let elem = document.createElement('div')
elem.innerHTML = `<a href="https://en.wikipedia.org/wiki/Cat">The cat</a> (Felis catus) is a domestic species of small carnivorous mammal. It is the only domesticated species in the family <a href="#">Felidae</a> and is often referred to as the <strong>domestic cat</strong> to distinguish it from the wild members of the family.`
Array.from(elem.children).forEach(child => {
  if (child.tagName !== 'A') return
  child.replaceWith(document.createTextNode(child.textContent))
})
Output of console.log(elem):
<div>The cat (Felis catus) is a domestic species of small carnivorous mammal. It is the only domesticated species in the family Felidae and is often referred to as the <strong>domestic cat</strong> to distinguish it from the wild members of the family.</div>
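One thing to keep in mind: elem.children only contains the direct child elements of the <div>, so an anchor nested inside another element (for example inside a <p> or a <strong>) would be left in place. A possible variation, sketched here under that assumption, uses querySelectorAll to reach anchors at any depth. The helper name unwrapAnchors is hypothetical and not part of the original post.

```javascript
// Hypothetical helper (an assumption, not from the original post):
// replaces every <a> inside `root`, at any depth, with a text node
// holding the anchor's text content.
function unwrapAnchors(root) {
  // querySelectorAll returns a static list, so replacing nodes while
  // iterating does not skip any anchor.
  root.querySelectorAll('a').forEach(anchor => {
    const text = root.ownerDocument.createTextNode(anchor.textContent);
    anchor.replaceWith(text);
  });
  return root;
}

// Browser-only usage; skipped where no DOM is available (for example plain Node.js).
if (typeof document !== 'undefined') {
  const elem = document.createElement('div');
  elem.innerHTML = '<p>The <a href="#">cat</a> is a <strong><a href="#">mammal</a></strong>.</p>';
  unwrapAnchors(elem);
  console.log(elem.innerHTML);
  // <p>The cat is a <strong>mammal</strong>.</p>
}
```

Like the loop above, this keeps only the text of each anchor, so any markup nested inside a link is flattened to plain text.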
Top comments (2)
Using Regex might be one of the worst ways to do this. A better way is to modify a DOM tree instead.
Thank you @shadowtime2000, your way is definitely better. I updated the post adding your solution. Thanks!