I recently noticed that the rails-html-sanitizer and loofah gems can sanitize XML tags as well as HTML, and I want to share what I learned.
As a starting point, suppose we have a string that contains some HTML tags.
html_string = <<-STR
<p>
<span>some text is here</span>
<a><img src="lala.png" /></a>
</p>
STR
We want to sanitize the string but keep the <img> tag.
require 'loofah'
require 'rails-html-sanitizer'

# Allow only <img> tags and their src attribute.
scrubber = Rails::Html::PermitScrubber.new
scrubber.tags = ['img']
scrubber.attributes = ['src']
html_fragment = Loofah.fragment(html_string)
html_fragment.scrub!(scrubber)
puts html_fragment.to_s
This works as expected. Here is the result:
# some text is here
# <img src="lala.png">
Unfortunately, this doesn't work for tags whose names contain characters such as : or -. XML tag names often contain them.
xml_string = <<-STR
<item>
<title>A Life in Russia</title>
<description>What do you knot about Russia?</description>
<dc:creator>Sasha Troianovski</dc:creator>
<media:content height="150" medium="image" url="https://static.worldtimes.com/images/2099/02/13/world/some_photo.jpg" width="151"/>
<media:credit>Sasha Troianovski for The World Times</media:credit>
<media:description>Amazing travel to Russia</media:description>
</item>
STR
Suppose we want to sanitize this new string but keep the media:content, media:credit and media:description tags.
scrubber = Rails::Html::PermitScrubber.new
scrubber.tags = ['media:content', 'media:credit', 'media:description']
html_fragment = Loofah.fragment(xml_string)
html_fragment.scrub!(scrubber)
puts html_fragment.to_s
Unfortunately, this doesn't work: every tag is stripped, including the ones we allowed. Here is the result:
# A Life in Russia
# What do you knot about Russia?
# Sasha Troianovski
# Sasha Troianovski for The World Times
# Amazing travel to Russia
How do we solve the problem? Loofah can work with XML, but we have to switch to its XML parser by calling .xml_fragment instead of .fragment.
scrubber = Rails::Html::PermitScrubber.new
scrubber.tags = ['media:content', 'media:credit', 'media:description']
xml_fragment = Loofah.xml_fragment(xml_string)
xml_fragment.scrub!(scrubber)
puts xml_fragment.to_s
And here is our result.
# A Life in Russia
# What do you knot about Russia?
# Sasha Troianovski
# <media:content height="150" width="151"/>
# <media:credit>Sasha Troianovski for The World Times</media:credit>
# <media:description>Amazing travel to Russia</media:description>
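It works: the three media: tags are kept 😊 Note that the medium and url attributes were dropped from media:content, presumably because we did not set scrubber.attributes and they are not in the default attribute allow-list.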