
Vladislav Kopylov


How to sanitize XML tags in Rails

I once noticed that the rails-html-sanitizer and loofah gems can sanitize XML tags as well as HTML, and I want to share how.

For example, suppose we have a string that contains some HTML tags:

html_string = <<-STR
<p>
  <span>some text is here</span>
  <a><img src="lala.png" /></a>
</p>
STR

We want to sanitize the string but keep the <img> tag.

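# Allow only the <img> tag and its src attribute
scrubber = Rails::Html::PermitScrubber.new
scrubber.tags = ['img']
scrubber.attributes = ['src']
# Parse the string with the HTML parser and scrub it in place
html_fragment = Loofah.fragment(html_string)
html_fragment.scrub!(scrubber)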

puts html_fragment.to_s

This works as expected. Here is the result:

# some text is here
# <img src="lala.png">

Unfortunately, this approach does not work for tags whose names contain the characters : or -, and XML tag names often do.

xml_string = <<-STR
<item>
  <title>A Life in Russia</title>
  <description>What do you knot about Russia?</description>
  <dc:creator>Sasha Troianovski</dc:creator>
  <media:content height="150" medium="image" url="https://static.worldtimes.com/images/2099/02/13/world/some_photo.jpg" width="151"/>
  <media:credit>Sasha Troianovski for The World Times</media:credit>
  <media:description>Amazing travel to Russia</media:description>
</item>
STR
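To check quickly whether a given string contains such tag names, a rough scan like the one below may help. The helper special_tag_names and its regular expression are hypothetical, written only for this illustration; they are not part of Loofah or Rails, and a regular expression is no substitute for a real parser.

```ruby
# Hypothetical helper: lists the tag names in a string that contain ":" or "-".
# The pattern only looks at opening tags, so closing tags are ignored.
TAG_NAME = /<([A-Za-z_][\w.:-]*)/

def special_tag_names(markup)
  markup.scan(TAG_NAME).flatten.uniq.select { |name| name.match?(/[:-]/) }
end

sample = '<item><title>Hi</title><dc:creator>A</dc:creator><media:content url="x"/></item>'
p special_tag_names(sample)
# => ["dc:creator", "media:content"]
```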

Suppose we want to sanitize xml_string but keep the media:content, media:credit and media:description tags.

scrubber = Rails::Html::PermitScrubber.new
scrubber.tags = ['media:content', 'media:credit', 'media:description']
html_fragment = Loofah.fragment(xml_string)
html_fragment.scrub!(scrubber)

puts html_fragment.to_s

Unfortunately, this does not work properly: every tag is stripped, including the ones we wanted to keep. Here is the result:

# A Life in Russia
# What do you knot about Russia?
# Sasha Troianovski

# Sasha Troianovski for The World Times
# Amazing travel to Russia

How do we solve the problem? Loofah can work with XML, but we have to switch parsers by calling .xml_fragment instead of .fragment.

scrubber = Rails::Html::PermitScrubber.new
scrubber.tags = ['media:content', 'media:credit', 'media:description']
xml_fragment = Loofah.xml_fragment(xml_string)
xml_fragment.scrub!(scrubber)

puts xml_fragment.to_s

Here is the result:

# A Life in Russia
# What do you knot about Russia?
# Sasha Troianovski
# <media:content height="150" width="151"/>
# <media:credit>Sasha Troianovski for The World Times</media:credit>
# <media:description>Amazing travel to Russia</media:description>

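It works 😊 All three media tags survive. Note that the medium and url attributes of media:content were dropped: we did not set scrubber.attributes, so it appears that only attributes from Loofah's default safe list (such as height and width) are kept.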
