DEV Community

Chris Bongers

Posted on • Originally published at daily-dev-tips.com

JavaScript removing HTML tags

I recently needed to remove all HTML from the content of my own application.

In this case, I needed a plain-text version for meta descriptions, but the same approach works for other outputs as well.

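Today I'll show you two ways of doing this. Be aware that neither is fully safe if your application accepts user input.

Users love to break scripts like this, and method one in particular can open you up to vulnerabilities. Assigning to innerHTML makes the browser parse the markup, so an event handler such as the onerror of an img element can run even though the temporary element is never added to the page.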

1. JavaScript removing HTML tags with innerHTML

One method is to create a temporary HTML element, set its innerHTML to the markup, and read back its textContent (falling back to innerText).

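const original = `<h1>Welcome to my blog</h1>
<p>Some more content here</p><br /><img alt="a > 2" src="img.jpg" />`;

const removeHTML = (input) => {
    // The browser parses the string as HTML inside a detached <div>.
    // Parsing still happens, so handlers like <img onerror> can run on untrusted input.
    const tmp = document.createElement('div');
    tmp.innerHTML = input;
    // textContent joins all descendant text; innerText is a fallback for very old browsers.
    return tmp.textContent || tmp.innerText || '';
};
console.log(removeHTML(original));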

This will result in the following:

'Welcome to my blog
Some more content here'

As you can see, we removed every HTML tag, including the bogus image with a > in its alt text.

2. JavaScript removing HTML tags with regex

My personal favourite for my own applications is a regex. It's a cleaner solution, and I trust my own input to be valid HTML.

How it works:

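const original = `<h1>Welcome to my blog</h1>
<p>Some more content here</p><br /><img src="img.jpg" />`;

// Match "<", then any characters except ">", then ">", and replace every match with nothing
const plainText = original.replace(/<[^>]*>/g, '');
console.log(plainText);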

This will result in:

'Welcome to my blog
Some more content here'

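As you can see, we removed the heading, paragraph, line break and image tags.
This works because the regex matches a <, followed by any run of characters that are not >, up to the next >. It then replaces each match with an empty string.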
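To see exactly what the pattern removes, you can call match() with the same regex. Because of the g flag, it returns every substring that replace() would strip. This is a small sketch, and tagPattern, matchedTags and plainText are illustrative names rather than part of the original code.

```javascript
// Same pattern as above: "<", then any run of characters that are not ">", then ">".
const tagPattern = /<[^>]*>/g;

const original = `<h1>Welcome to my blog</h1>
<p>Some more content here</p><br /><img src="img.jpg" />`;

// With the g flag, match() returns every substring the pattern finds,
// i.e. exactly what replace() will remove:
// ['<h1>', '</h1>', '<p>', '</p>', '<br />', '<img src="img.jpg" />']
const matchedTags = original.match(tagPattern);
console.log(matchedTags);

// Everything outside those matches (including the newline) is kept.
const plainText = original.replace(tagPattern, '');
console.log(plainText);
```

Only the text between the tags survives, which is why the newline after the heading is still there in the result.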

It can be broken by something as simple as a > inside an attribute value:

const original = `<h1>Welcome to my blog</h1>
<p>Some more content here</p><br /><img alt="a > 2" src="img.jpg" />`;

Strictly speaking, a > inside a quoted attribute value is allowed in HTML, but writing it as &gt; avoids this problem.

But running this will result in:

'Welcome to my blog
Some more content here 2" src="img.jpg" />'

It's just something to be aware of.
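If you want to stay with a regex, one option is to let the pattern skip over quoted attribute values. That way, a > inside alt="..." no longer ends the match. This is a minimal sketch, not the method used above. The pattern and the stripTags helper are my own illustration, and it assumes attributes use balanced single or double quotes.

```javascript
// Hypothetical variant of the regex from this post (not the original method):
// quoted attribute values ("..." or '...') are consumed as a whole, so a ">"
// inside them no longer ends the tag. Any other character except quotes and ">"
// is matched one at a time. This is still not an HTML parser: comments,
// <script> contents, unbalanced quotes and a bare "<" in text are not handled correctly.
const tagWithQuotedAttrs = /<(?:"[^"]*"|'[^']*'|[^'">])*>/g;

// Remove every tag matched by the pattern above.
const stripTags = (input) => input.replace(tagWithQuotedAttrs, '');

const original = `<h1>Welcome to my blog</h1>
<p>Some more content here</p><br /><img alt="a > 2" src="img.jpg" />`;

console.log(stripTags(original));
```

The trade-off is a harder-to-read pattern. For anything user-supplied, a real parser or sanitizer remains the safer choice.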

You can have a play with both methods in this Codepen.

Thank you for reading, and let's connect!

Thank you for reading my blog. Feel free to subscribe to my email newsletter and connect on Facebook or Twitter

Top comments (1)

Alex Lohr

Back in the old jQuery days, there was a long discussion about whether parent.innerHTML = '' or parent.parentNode.replaceChild(parent.cloneNode(), parent) was the better way to remove all children from a node. I believe they went with the latter back then.