Giulia Chiola

Posted on Jan 16, 2021 • Edited on Jan 26, 2021 • Originally published at giuliachiola.dev

How to remove all links in JavaScript

#javascript #regex

If you need to remove every link from a web page and keep the link text as plain text, there are two ways to do it:

Basic method

If you already know the exact href value of every link, this basic approach is enough. In my example, every href is a simple hash, #.

get the page content as a string
replace all opening anchor tags using the JavaScript method String.prototype.replace() with a RegEx that has the g (global) flag: with this flag, replace() substitutes every match instead of stopping after the first one

.replace(/<a href="#">/g, '')

replace all closing anchor tags using a RegEx

.replace(/<\/a>/g, '')

chain the two replace() calls

const mystring = `<a href="#">The cat</a> (Felis catus) is a domestic species of small carnivorous mammal. It is the only domesticated species in the family <a href="#">Felidae</a> and is often referred to as the domestic cat to distinguish it from the wild members of the family.`

mystring.replace(/<a href="#">/g, '').replace(/<\/a>/g, '')

Output

The cat (Felis catus) is a domestic species of small carnivorous mammal. It is the only domesticated species in the family Felidae and is often referred to as the domestic cat to distinguish it from the wild members of the family.

Live RegEx example
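
When the href values are not all the same, the opening-tag pattern can be loosened so that it matches an <a> tag with any attributes. The following is only a sketch: the helper name stripAnchorTags is made up for this example, and a RegEx like this is not a real HTML parser (it can be confused by a > character inside an attribute value, for instance), so it is best kept for simple, trusted markup.

```javascript
// Hypothetical helper (not from the original post): removes opening and closing
// <a> tags regardless of their attributes, keeping the text between them.
function stripAnchorTags(html) {
  return html
    // "<a", then a word boundary (so <abbr> or <article> are not matched),
    // then everything up to the first ">"
    .replace(/<a\b[^>]*>/gi, '')
    // closing tag, tolerating optional whitespace before ">"
    .replace(/<\/a\s*>/gi, '')
}

const sample = '<a href="https://en.wikipedia.org/wiki/Cat" target="_blank">The cat</a> is in the family <a href="#">Felidae</a>, see <abbr title="Felis catus">F. catus</abbr>.'

console.log(stripAnchorTags(sample))
// The cat is in the family Felidae, see <abbr title="Felis catus">F. catus</abbr>.
```

The i flag is added next to g so that uppercase tags such as <A HREF="#"> are removed too.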

Advanced method

This way, we can remove every anchor tag from our string, even those whose hrefs we do not know.
Thanks to @shadowtime2000 for pointing that out 🙂

create a new DOM element

let elem = document.createElement('div')

add the inner HTML passed as string

elem.innerHTML = `<a href="https://en.wikipedia.org/wiki/Cat">The cat</a> (Felis catus) is a domestic species of small carnivorous mammal. It is the only domesticated species in the family <a href="#">Felidae</a> and is often referred to as the <strong>domestic cat</strong> to distinguish it from the wild members of the family.`

loop through all the <div> children (that is, the elements parsed from the HTML string we have just passed) and check whether each one is an <a> tag.

🧨 !important

tagName returns the tag name of the element, and for elements in an HTML document it is always uppercase, so the comparison must be against 'A', not 'a'.

Array.from(elem.children).forEach(child => {
  // if child is an HTML tag different from an anchor <a>, then skip it
  if (child.tagName !== 'A') return
  // else if child is an anchor tag,
  // then replace the current node with a new textNode containing the anchor text content
  // <a href="#">wow!</a> -> wow!
  child.replaceWith(document.createTextNode(child.textContent))
})
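
The uppercase comparison in the loop above holds for HTML elements in an HTML document. In XHTML documents, or for SVG elements, tagName keeps the case used in the markup, so a case-insensitive check can be a bit more forgiving. A minimal sketch, where isAnchor is a hypothetical helper name:

```javascript
// Hypothetical helper (not from the original post): true when the element is an
// anchor, whatever case tagName is reported in.
function isAnchor(element) {
  return element.tagName.toUpperCase() === 'A'
}

// Plain objects stand in for DOM elements here, so the snippet also runs outside a browser
console.log(isAnchor({ tagName: 'A' }))      // true
console.log(isAnchor({ tagName: 'a' }))      // true
console.log(isAnchor({ tagName: 'STRONG' })) // false
```

With this helper, the check inside the loop could be written as if (!isAnchor(child)) return.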

To sum up

let elem = document.createElement('div')

elem.innerHTML = `<a href="https://en.wikipedia.org/wiki/Cat">The cat</a> (Felis catus) is a domestic species of small carnivorous mammal. It is the only domesticated species in the family <a href="#">Felidae</a> and is often referred to as the <strong>domestic cat</strong> to distinguish it from the wild members of the family.`

Array.from(elem.children).forEach(child => {
  if (child.tagName !== 'A') return
  child.replaceWith(document.createTextNode(child.textContent))
})

Output: console.log(elem)

<div>The cat (Felis catus) is a domestic species of small carnivorous mammal. It is the only domesticated species in the family Felidae and is often referred to as the <strong>domestic cat</strong> to distinguish it from the wild members of the family.</div>
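
One limitation worth knowing: elem.children only contains the direct children of the <div>, so an anchor wrapped in another element (for example <strong><a href="#">cat</a></strong>) is left untouched by the loop above. A possible variation, sketched here under the assumption that every <a> descendant should be unwrapped, uses querySelectorAll to reach anchors at any depth. The function name unwrapAnchors is made up for this example.

```javascript
// Hypothetical helper (not from the original post): replaces every <a> descendant
// of root, at any nesting depth, with a text node holding its text content.
function unwrapAnchors(root) {
  // querySelectorAll returns a static list, so replacing nodes while looping is safe
  root.querySelectorAll('a').forEach(anchor => {
    // replaceWith accepts a string and inserts it as a text node
    anchor.replaceWith(anchor.textContent)
  })
  return root
}

// Browser-only usage; skipped where no DOM is available (for example in Node.js)
if (typeof document !== 'undefined') {
  const elem = document.createElement('div')
  elem.innerHTML = '<p><a href="https://en.wikipedia.org/wiki/Cat">The cat</a> is a <strong><a href="#">domestic</a> species</strong>.</p>'
  console.log(unwrapAnchors(elem).innerHTML)
  // <p>The cat is a <strong>domestic species</strong>.</p>
}
```

As in the original loop, textContent flattens any markup inside the anchor, so <a><em>wow</em></a> becomes plain wow.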

📚 More info

RegEx 101 playground

A Practical Guide to Regular Expressions (RegEx) In JavaScript

tagName - MDN

Top comments (2)

shadowtime2000 • Jan 16 '21

Using Regex might be one of the worst ways to do this. A better way is to modify a DOM tree instead.

var elem = document.createElement("div");

elem.innerHTML = "ur text here";

Array.from(elem.children).forEach(child => {
    if (child.tagName !== "A") return;
    child.replaceWith(document.createTextNode(child.textContent));
})

Giulia Chiola • Jan 26 '21

Thank you @shadowtime2000 , your way is definitely better 💪🏻 I updated the post adding your solution. Thanks!