XSS
Description
The commit adds escaping of CommonMark markdown metacharacters in Action Text markdown generation to ensure text nodes are treated as literal text by Markdown renderers, reducing the risk of XSS via injected markdown formatting. It introduces ActionText::MarkdownConversion.escape_markdown_text, updates Content#to_markdown to wrap attachment markdown in an action-text-markdown element (so content is passed through without escaping), and adjusts how markdown links are rendered to escape their text. Overall, it hardens markdown rendering against user-supplied content that could otherwise be interpreted as formatting (e.g., headings, links, or other markdown constructs) and potentially lead to XSS when rendered by a Markdown processor.
Proof of Concept
PoC (conceptual and executable steps):
- Rationale: Before this patch, text nodes inside HTML that contain CommonMark metacharacters could be treated as markdown formatting when converted to Markdown, potentially enabling markdown-driven injection or visual-delimiter-based manipulation in rendered output. The patch escapes these metacharacters so they render as literal text instead of formatting.
- PoC code (Rails console):
# Demonstrates escaping of common metacharacters in a string that would be interpreted by a Markdown renderer
html_with_script = "<p>Hello</p><script>alert('XSS')</script>"
# Post-patch: escaping is applied to text nodes, so the script tag would be escaped rather than executed.
escaped = ActionText::MarkdownConversion.escape_markdown_text("<script>alert('XSS')</script>")
puts escaped
# Expected output (shown as Ruby string):
# \<script\>alert('XSS')\</script\>
# To illustrate the render effect, if you pass the original HTML through the markdown converter in a post-patch app,
# the resulting markdown would contain the escaped characters, rendering literally as "<script>alert('XSS')</script>" in HTML,
# rather than executing a script.
- Expected outcome: After the patch, rendering the same content via to_markdown would escape the < and > characters in text nodes,
preventing a Markdown renderer from interpreting the content as HTML/formatting and mitigating XSS risks.
Note: The exact pre-patch behavior would depend on the Markdown renderer in use; the PoC focuses on demonstrating that escaping occurs for metacharacters and prevents execution when rendered.
Commit Details
Author: Mike Dalessio
Date: 2026-02-25 21:51 UTC
Message:
Improve character escaping in Action Text markdown generation (#56873)
Text nodes in HTML that contain CommonMark metacharacters (like `*`,
`[`, `]`, `#`, `<`, `>`, etc.) now have those characters backslash-escaped
so markdown renderers treat them as literal text rather than
formatting.
So, for example, Rich Text content like this:
<p>## Look at *this*</p>
does not become a markdown-formatted heading, instead it renders as
ordinary text:
\## Look at \*this\*
To preserve markdown generated by attachments'
`attachable_markdown_representation` method, `Content#to_markdown`
wraps the representation in <action-text-markdown> elements. The tree
walker recognizes these and passes their content through without
escaping.
Captions and filenames in `ActiveStorage::Blob` and `RemoteImage`
attachment representations are also escaped, and
`MarkdownConversion#escape_markdown_text` is now public for use by
application custom attachables.
This PR is likely escaping Markdown metacharacters where it's not
absolutely necessary. We tried to avoid escaping in some obvious
cases, but we can continue to refine this in the future by making the
regular expression more complex.
Co-authored-by: Jeremy Daer <jeremy@rubyonrails.org>
Triage Assessment
Vulnerability Type: XSS
Confidence: MEDIUM
Reasoning:
The commit adds escaping for CommonMark metacharacters in markdown output and adjusts how attachments are rendered to ensure markdown content is treated as literal text rather than interpreted as formatting. This reduces the risk that user-provided rich-text content could execute or render unintended HTML/markdown, which can lead to XSS or injection-like issues in Markdown renderers.
Verification Assessment
Vulnerability Type: XSS
Confidence: MEDIUM
Affected Versions: 8.1.3 and earlier
Code Diff
diff --git a/actiontext/lib/action_text/attachment.rb b/actiontext/lib/action_text/attachment.rb
index afbe151db5809..971e6872587bb 100644
--- a/actiontext/lib/action_text/attachment.rb
+++ b/actiontext/lib/action_text/attachment.rb
@@ -139,7 +139,7 @@ def to_plain_text
# include ActionText::Attachable
#
# def attachable_markdown_representation(caption)
- # "[@#{name}](#{Rails.application.routes.url_helpers.person_url(self)})"
+ # MarkdownConversion.markdown_link("@#{name}", Rails.application.routes.url_helpers.person_url(self))
# end
# end
#
diff --git a/actiontext/lib/action_text/content.rb b/actiontext/lib/action_text/content.rb
index 5490d0948069b..6fb9becf1b3df 100644
--- a/actiontext/lib/action_text/content.rb
+++ b/actiontext/lib/action_text/content.rb
@@ -143,7 +143,11 @@ def to_plain_text
# NOTE: that the returned string is not HTML safe and should not be rendered in
# browsers without additional sanitization.
def to_markdown
- render_attachments(with_full_attributes: false, &:to_markdown).fragment.to_markdown
+ render_attachments(with_full_attributes: false) { |attachment|
+ ActionText::HtmlConversion.create_element("action-text-markdown").tap do |node|
+ node.content = attachment.to_markdown
+ end
+ }.fragment.to_markdown
end
def to_trix_html
diff --git a/actiontext/lib/action_text/engine.rb b/actiontext/lib/action_text/engine.rb
index 86b38589efdd9..64061c8311ddb 100644
--- a/actiontext/lib/action_text/engine.rb
+++ b/actiontext/lib/action_text/engine.rb
@@ -61,7 +61,7 @@ def attachable_plain_text_representation(caption = nil)
end
def attachable_markdown_representation(caption = nil)
- "[#{caption || filename}]"
+ "[#{MarkdownConversion.escape_markdown_text((caption || filename).to_s)}]"
end
def to_trix_content_attachment_partial_path
diff --git a/actiontext/lib/action_text/markdown_conversion.rb b/actiontext/lib/action_text/markdown_conversion.rb
index 762c0139d5aea..3fce10ef272a3 100644
--- a/actiontext/lib/action_text/markdown_conversion.rb
+++ b/actiontext/lib/action_text/markdown_conversion.rb
@@ -9,6 +9,12 @@ module ActionText
#
# Converts an HTML fragment into a Markdown string. Used by `ActionText::Content#to_markdown`
# and `ActionText::Fragment#to_markdown` to produce Markdown representations of rich text.
+ #
+ # Example: `<h1>Release Notes</h1>` => `# Release Notes`, a markdown heading.
+ #
+ # Note that this converter escapes text nodes so it won't render as markdown.
+ #
+ # Example: `<p># Release Notes</p>` => `\# Release Notes`, not a heading.
module MarkdownConversion
extend self
@@ -29,7 +35,16 @@ def node_to_markdown(node)
# MarkdownConversion.markdown_link("photo", "https://example.com/photo_(large).png")
# # => "[photo](https://example.com/photo_%28large%29.png)"
def markdown_link(title, url)
- "[#{escape_link_text(title)}](#{encode_href(url)})"
+ "[#{escape_markdown_text(title)}](#{encode_href(url)})"
+ end
+
+ # Backslash-escapes CommonMark metacharacters in +text+ so they are treated
+ # as literal characters by Markdown renderers.
+ #
+ # MarkdownConversion.escape_markdown_text("**Important**")
+ # # => "\\*\\*Important\\*\\*"
+ def escape_markdown_text(text)
+ text.gsub(MARKDOWN_METACHARACTERS) { |c| "\\#{c}" }
end
private
@@ -38,14 +53,31 @@ def markdown_link(title, url)
LIST_BULLET = /\A(-|\d+\.) /
LIST_INDENT = " "
ENCODE_HREF_CHARS = /[() <>\n\r\t]/
- private_constant :BOLD_TAGS, :ITALIC_TAGS, :LIST_BULLET, :LIST_INDENT, :ENCODE_HREF_CHARS
+ MARKDOWN_METACHARACTERS = /
+ [\\`*_{}\[\]|~<>] # metacharacters that should be escaped generally
+ | \A\#(?=[\s\#]|\z) # leading hash before space or another hash: ATX heading
+ | \A=(?=[=\s]|\z) # leading equals before space or another equals: setext heading
+ | \A- # leading hyphen: list item, thematic break, or setext heading
+ | \A\+(?=\s|\z) # leading plus before space: list item
+ | \A\d+\K\.(?=\s|\z) # leading "1." with trailing space: ordered list item (only the dot is matched)
+ /x
+ SKIP_ESCAPING_PARENTS = %w[ action-text-markdown code pre ].freeze
+ INLINE_ELEMENTS = %w[
+ action-text-markdown
+ a abbr b bdi bdo cite code data dfn em i kbd mark q
+ rp rt ruby s samp small span strong sub sup time u var
+ ].freeze
+ private_constant :BOLD_TAGS, :ITALIC_TAGS, :LIST_BULLET, :LIST_INDENT, :ENCODE_HREF_CHARS,
+ :MARKDOWN_METACHARACTERS, :SKIP_ESCAPING_PARENTS, :INLINE_ELEMENTS
def markdown_for_node(node, child_values)
if node.text?
if node.content.blank? && !significant_whitespace?(node)
""
- else
+ elsif skip_markdown_escaping?(node)
node.content
+ else
+ escape_markdown_text(node.content)
end
elsif node.element?
method_name = :"visit_#{node.name.tr("-", "_")}"
@@ -134,7 +166,7 @@ def visit_ol(node, child_values)
def visit_a(node, child_values)
inner = join_children(child_values)
if (href = node["href"]) && Rails::HTML::Sanitizer.allowed_uri?(href)
- markdown_link(inner, href)
+ "[#{inner}](#{encode_href(href)})"
else
inner
end
@@ -162,6 +194,12 @@ def visit_hr(_node, _child_values)
"---\n\n"
end
+ # Attachment markdown is wrapped in <action-text-markdown> by Content#to_markdown so it passes
+ # through without text escaping.
+ def visit_action_text_markdown(_node, child_values)
+ join_children(child_values)
+ end
+
# Avoid including content from elements that aren't meaningful for markdown output
def visit__unsupported(_node, _child_values)
""
@@ -276,7 +314,12 @@ def inline_code(content)
end
def significant_whitespace?(node)
- node.previous_sibling&.text? && node.next_sibling&.text?
+ significant_inline_sibling?(node.previous_sibling) &&
+ significant_inline_sibling?(node.next_sibling)
+ end
+
+ def significant_inline_sibling?(sibling)
+ sibling&.text? || sibling&.name&.in?(INLINE_ELEMENTS)
end
def ancestor_named?(node, names, max_depth:)
@@ -293,8 +336,8 @@ def encode_href(href)
URI::RFC2396_PARSER.escape(href, ENCODE_HREF_CHARS)
end
- def escape_link_text(text)
- text.gsub(/[\\\[\]]/) { |c| "\\#{c}" }
+ def skip_markdown_escaping?(node)
+ node.parent&.name.in?(SKIP_ESCAPING_PARENTS)
end
end
end
diff --git a/actiontext/test/unit/markdown_conversion_test.rb b/actiontext/test/unit/markdown_conversion_test.rb
index 959bec6836ef5..409010fafd7fe 100644
--- a/actiontext/test/unit/markdown_conversion_test.rb
+++ b/actiontext/test/unit/markdown_conversion_test.rb
@@ -3,13 +3,144 @@
require "test_helper"
class ActionText::MarkdownConversionTest < ActiveSupport::TestCase
+ # --- Text tests ---
+
test "plain text passes through unchanged" do
assert_converted_to(
"hello world",
"hello world"
)
+ assert_converted_to(
+ "hello world",
+ "<p>hello world</p>"
+ )
+ end
+
+ test "empty content" do
+ assert_converted_to("", "")
+ end
+
+ test "leading and trailing whitespace is stripped" do
+ assert_converted_to("hello", "<p> hello </p>")
+ end
+
+ test "HTML entities are decoded" do
+ assert_converted_to(
+ 'asdf \< asdf & asdf \> asdf',
+ "<p>asdf < asdf & asdf > asdf</p>"
+ )
+ end
+
+ test "text escaping: angle bracket (HTML block)" do
+ assert_converted_to('\<!-- hidden --\>', "<p><!-- hidden --></p>")
+ end
+
+ test "text escaping: backslash (escape character)" do
+ assert_converted_to('a \\\\ b', '<p>a \\ b</p>')
+ end
+
+ test "text escaping: backtick (code span)" do
+ assert_converted_to('a \\` b', "<p>a ` b</p>")
+ end
+
+ test "text escaping: asterisk (emphasis)" do
+ assert_converted_to('\\*not italic\\*', "<p>*not italic*</p>")
+ assert_converted_to('\\*\\*not bold\\*\\*', "<p>**not bold**</p>")
+ end
+
+ test "text escaping: underscore (emphasis)" do
+ assert_converted_to('\\_not italic\\_', "<p>_not italic_</p>")
+ assert_converted_to('\\_\\_not bold\\_\\_', "<p>__not bold__</p>")
+ end
+
+ test "text escaping: curly braces (reserved for extensions)" do
+ assert_converted_to('\\{attr\\}', "<p>{attr}</p>")
+ end
+
+ test "text escaping: square brackets (link text)" do
+ assert_converted_to('\\[not a link\\]', "<p>[not a link]</p>")
+ end
+
+ test "text escaping: parentheses are not escaped because bracket escaping prevents link syntax" do
+ assert_converted_to("(aside)", "<p>(aside)</p>")
+ assert_converted_to("f(x)", "<p>f(x)</p>")
+ end
+
+ test "text escaping: hash (ATX heading)" do
+ assert_converted_to('\# not a heading', "<p># not a heading</p>")
+ assert_converted_to('\## not a heading', "<p>## not a heading</p>")
+ assert_converted_to('\#', "<p>#</p>")
+ assert_converted_to("> \\# not a heading", "<blockquote># not a heading</blockquote>")
+ end
+
+ test "text escaping: hash is not escaped in normal prose" do
+ assert_converted_to("C# is a language", "<p>C# is a language</p>")
+ assert_converted_to("issue #123", "<p>issue #123</p>")
+ assert_converted_to("#hashtag", "<p>#hashtag</p>")
end
+ test "text escaping: plus (unordered list item)" do
+ assert_converted_to('\+ not a list', "<p>+ not a list</p>")
+ end
+
+ test "text escaping: plus is not escaped in normal prose" do
+ assert_converted_to("C++ is a language", "<p>C++ is a language</p>")
+ assert_converted_to("a + b = c", "<p>a + b = c</p>")
+ assert_converted_to("+1 for this", "<p>+1 for this</p>")
+ end
+
+ test "text escaping: hyphen (list item, thematic break, setext heading)" do
+ assert_converted_to('\- not a list', "<p>- not a list</p>")
+ assert_converted_to('\---', "<p>---</p>")
+ assert_converted_to("> \\- not a list", "<blockquote>- not a list</blockquote>")
+ assert_converted_to('> \---', "<blockquote>---</blockquote>")
+ end
+
+ test "text escaping: equals (setext heading)" do
+ assert_converted_to('\===', "<p>===</p>")
+ assert_converted_to('\=', "<p>=</p>")
+ assert_converted_to('> \===', "<blockquote>===</blockquote>")
+ end
+
+ test "text escaping: equals is not escaped in normal prose" do
+ assert_converted_to("a = b", "<p>a = b</p>")
+ assert_converted_to("a == b", "<p>a == b</p>")
+ assert_converted_to("=foo", "<p>=foo</p>")
+ end
+
+ test "text escaping: hyphen is not escaped in normal prose" do
+ assert_converted_to("well-known fact", "<p>well-known fact</p>")
+ assert_converted_to("foo -- bar", "<p>foo -- bar</p>")
+ assert_converted_to("foo --- bar", "<p>foo --- bar</p>")
+ end
+
+ test "text escaping: dot (ordered list item)" do
+ assert_converted_to('1\. not a list', "<p>1. not a list</p>")
+ assert_converted_to('1\.', "<p>1.</p>")
+ end
+
+ test "text escaping: dot is not escaped in normal prose" do
+ assert_converted_to("end of sentence. start of next", "<p>end of sentence. start of next</p>")
+ assert_converted_to("e.g. for example", "<p>e.g. for example</p>")
+ assert_converted_to("Agent 007. Licensed to kill.", "<p>Agent 007. Licensed to kill.</p>")
+ assert_converted_to("3.1 this is not a list item", "<p>3.1 this is not a list item</p>")
+ end
+
+ test "text escaping: pipe (table delimiter)" do
+ assert_converted_to('a \\| b', "<p>a | b</p>")
+ end
+
+ test "text escaping: tilde (strikethrough)" do
+ assert_converted_to('\\~\\~not struck\\~\\~', "<p>~~not struck~~</p>")
+ end
+
+ test "text escaping: greater-than (block quote)" do
+ assert_converted_to('\> not a quote', "<p>> not a quote</p>")
+ assert_converted_to('\> not a quote', "<p>> not a quote</p>")
+ end
+
+ # --- Tag tests ---
+
test "<strong> tags are converted to bold" do
assert_converted_to("**hello**", "<strong>hello</strong>")
end
@@ -295,14 +426,14 @@ class ActionText::MarkdownConversionTest < ActiveSupport::TestCase
test "<a> tags with bracket injection in link text are escaped" do
assert_converted_to(
- "[text\\](https://evil.com)](https://example.com)",
+ '[text\](https://evil.com)](https://example.com)',
'<a href="https://example.com">text](https://evil.com)</a>'
)
end
test "<a> tags with backslash+bracket injection in link text are escaped" do
assert_converted_to(
- "[text\\\\\\](https://evil.com)](https://example.com)",
+ '[text\\\\\](https://evil.com)](https://example.com)',
'<a href="https://example.com">text\](https://evil.com)</a>'
)
end
@@ -402,21 +533,6 @@ class ActionText::MarkdownConversionTest < ActiveSupport::TestCase
)
end
- test "empty content" do
- assert_converted_to("", "")
- end
-
- test "leading and trailing whitespace is stripped" do
- assert_converted_to("hello", "<p> hello </p>")
- end
-
- test "HTML entities are decoded" do
- assert_converted_to(
- "asdf < asdf & asdf > asdf",
- "<p>asdf < asdf & asdf > asdf</p>"
- )
- end
-
test "unknown elements pass through their content" do
assert_converted_to("hello", "<asdf>hello</asdf>")
end
@@ -453,6 +569,57 @@ class ActionText::MarkdownConversionTest < ActiveSupport::TestCase
)
end
+ test "attachment with surrounding text" do
+ assert_converted_to(
+ "Hello world! ",
+ 'Hello world! <action-text-attachment url="http://example.com/cat.jpg" content-type="image/jpeg" caption="Cat"></action-text-attachment>'
+ )
+ end
+
+ test "multiple attachments separated by whitespace preserve the whitespace" do
+ assert_converted_to(
+ " ",
+ '<action-text-attachment content-type="image/jpeg" url="https://example.com/a.jpg" caption="A"></action-text-attachment> <action-text-attachment content-type="image/jpeg" url="https://example.com/b.jpg" caption="B"></action-text-attachment>'
+ )
+ end
+
+ test "multiple attachments and text separated by whitespace preserve the whitespace" do
+ assert_converted_to(
+ " and ",
+ '<action-text-attachment content-type="image/jpeg" url="https://example.com/a.jpg" caption="A"></action-text-attachment> and <action-text-attachment content-type="image/jpeg" url="https://example.com/b.jpg" caption="B"></action-text-attachment>'
+ )
+ end
+
+ test "attachment with whitespace before inline element preserves the whitesp
... [truncated]