Commit 4122362

Merge pull request #158 from rails/flavorjones-support-html5-parsing
support html5 parsing
2 parents 5e3bc32 + 53e9aa8 commit 4122362

13 files changed, +1406 -972 lines changed

.github/workflows/ci.yml

+10
@@ -16,6 +16,16 @@ on:
       - '*'

 jobs:
+  rubocop:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - uses: ruby/setup-ruby@v1
+        with:
+          ruby-version: "3.2"
+          bundler-cache: true
+      - run: bundle exec rubocop
+
   cruby:
     strategy:
       fail-fast: false

.rdoc_options

+30
@@ -0,0 +1,30 @@
+---
+encoding: UTF-8
+static_path: []
+rdoc_include: []
+page_dir:
+charset: UTF-8
+exclude:
+- "~\\z"
+- "\\.orig\\z"
+- "\\.rej\\z"
+- "\\.bak\\z"
+- "\\.gemspec\\z"
+- "issues"
+- "Gemfile*"
+- "Rakefile"
+hyperlink_all: false
+line_numbers: false
+locale:
+locale_dir: locale
+locale_name:
+main_page: "README.md"
+markup: rdoc
+output_decoration: true
+show_hash: false
+skip_tests: true
+tab_width: 8
+template_stylesheets: []
+title:
+visibility: :protected
+webcvs:

CHANGELOG.md

+5
@@ -9,6 +9,11 @@

   *Mike Dalessio*

+* `Rails::Html` has been renamed to `Rails::HTML`, but this module is aliased to `Rails::Html` for
+  backwards compatibility.
+
+  *Mike Dalessio*
+

 ## 1.5.0 / 2023-01-20

Gemfile

-1
@@ -6,7 +6,6 @@ gemspec

 gem "rake"
 gem "minitest"
-gem "rails-dom-testing"

 group :rubocop do
   gem "rubocop", ">= 1.25.1", require: false

README.md

+88 -43
@@ -1,29 +1,15 @@
-# Rails Html Sanitizers
+# Rails HTML Sanitizers

 In Rails 4.2 and above this gem will be responsible for sanitizing HTML fragments in Rails
 applications, i.e. in the `sanitize`, `sanitize_css`, `strip_tags` and `strip_links` methods.

-Rails Html Sanitizer is only intended to be used with Rails applications. If you need similar functionality in non Rails apps consider using [Loofah](https://github.com/flavorjones/loofah) directly (that's what handles sanitization under the hood).
-
-## Installation
-
-Add this line to your application's Gemfile:
-
-    gem 'rails-html-sanitizer'
-
-And then execute:
-
-    $ bundle
-
-Or install it yourself as:
-
-    $ gem install rails-html-sanitizer
+Rails HTML Sanitizer is only intended to be used with Rails applications. If you need similar functionality in non Rails apps consider using [Loofah](https://github.com/flavorjones/loofah) directly (that's what handles sanitization under the hood).

 ## Usage

 ### A note on HTML entities

-__Rails::HTML sanitizers are intended to be used by the view layer, at page-render time. They are *not* intended to sanitize persisted strings that will sanitized *again* at page-render time.__
+__Rails HTML sanitizers are intended to be used by the view layer, at page-render time. They are *not* intended to sanitize persisted strings that will sanitized *again* at page-render time.__

 Proper HTML sanitization will replace some characters with HTML entities. For example, `<` will be replaced with `&lt;` to ensure that the markup is well-formed.

@@ -47,62 +33,101 @@ You might simply choose to persist the untrusted string as-is (the raw input), a

 That raw string, if rendered in an non-HTML context (like SMS), must also be sanitized by a method appropriate for that context. You may wish to look into using [Loofah](https://github.com/flavorjones/loofah) or [Sanitize](https://github.com/rgrove/sanitize) to customize how this sanitization works, including omitting HTML entities in the final string.

-If you really want to sanitize the string that's stored in your database, you may wish to look into [Loofah::ActiveRecord](https://github.com/flavorjones/loofah-activerecord) rather than use the Rails::HTML sanitizers.
+If you really want to sanitize the string that's stored in your database, you may wish to look into [Loofah::ActiveRecord](https://github.com/flavorjones/loofah-activerecord) rather than use the Rails HTML sanitizers.
+
+
+### A note on module names
+
+In versions < 1.6, the only module defined by this library was `Rails::Html`. Starting in 1.6, we define three additional modules:
+
+- `Rails::HTML` for general functionality (replacing `Rails::Html`)
+- `Rails::HTML4` containing sanitizers that parse content as HTML4
+- `Rails::HTML5` containing sanitizers that parse content as HTML5
+
+The following aliases are maintained for backwards compatibility:
+
+- `Rails::Html` points to `Rails::HTML`
+- `Rails::HTML::FullSanitizer` points to `Rails::HTML4::FullSanitizer`
+- `Rails::HTML::LinkSanitizer` points to `Rails::HTML4::LinkSanitizer`
+- `Rails::HTML::SafeListSanitizer` points to `Rails::HTML4::SafeListSanitizer`


 ### Sanitizers

-All sanitizers respond to `sanitize`.
+All sanitizers respond to `sanitize`, and are available in variants that use either HTML4 or HTML5 parsing, under the `Rails::HTML4` and `Rails::HTML5` namespaces, respectively.

 #### FullSanitizer

 ```ruby
-full_sanitizer = Rails::Html::FullSanitizer.new
+full_sanitizer = Rails::HTML5::FullSanitizer.new
 full_sanitizer.sanitize("<b>Bold</b> no more! <a href='more.html'>See more here</a>...")
 # => Bold no more! See more here...
 ```

+or, if you insist on parsing the content as HTML4:
+
+```ruby
+full_sanitizer = Rails::HTML4::FullSanitizer.new
+full_sanitizer.sanitize("<b>Bold</b> no more! <a href='more.html'>See more here</a>...")
+# => Bold no more! See more here...
+```
+
+HTML5 version:
+
+
+
 #### LinkSanitizer

 ```ruby
-link_sanitizer = Rails::Html::LinkSanitizer.new
+link_sanitizer = Rails::HTML5::LinkSanitizer.new
 link_sanitizer.sanitize('<a href="example.com">Only the link text will be kept.</a>')
 # => Only the link text will be kept.
 ```

+or, if you insist on parsing the content as HTML4:
+
+```ruby
+link_sanitizer = Rails::HTML4::LinkSanitizer.new
+link_sanitizer.sanitize('<a href="example.com">Only the link text will be kept.</a>')
+# => Only the link text will be kept.
+```
+
+
 #### SafeListSanitizer

+This sanitizer is also available as an HTML4 variant, but for simplicity we'll document only the HTML5 variant below.
+
 ```ruby
-safe_list_sanitizer = Rails::Html::SafeListSanitizer.new
+safe_list_sanitizer = Rails::HTML5::SafeListSanitizer.new

 # sanitize via an extensive safe list of allowed elements
 safe_list_sanitizer.sanitize(@article.body)

-# safe list only the supplied tags and attributes
+# sanitize only the supplied tags and attributes
 safe_list_sanitizer.sanitize(@article.body, tags: %w(table tr td), attributes: %w(id class style))

-# safe list via a custom scrubber
+# sanitize via a custom scrubber
 safe_list_sanitizer.sanitize(@article.body, scrubber: ArticleScrubber.new)

-# safe list sanitizer can also sanitize css
-safe_list_sanitizer.sanitize_css('background-color: #000;')
+# prune nodes from the tree instead of stripping tags and leaving inner content
+safe_list_sanitizer = Rails::HTML5::SafeListSanitizer.new(prune: true)

-# fully prune nodes from the tree instead of stripping tags and leaving inner content
-safe_list_sanitizer = Rails::Html::SafeListSanitizer.new(prune: true)
+# the sanitizer can also sanitize css
+safe_list_sanitizer.sanitize_css('background-color: #000;')
 ```

 ### Scrubbers

 Scrubbers are objects responsible for removing nodes or attributes you don't want in your HTML document.

-This gem includes two scrubbers `Rails::Html::PermitScrubber` and `Rails::Html::TargetScrubber`.
+This gem includes two scrubbers `Rails::HTML::PermitScrubber` and `Rails::HTML::TargetScrubber`.

-#### `Rails::Html::PermitScrubber`
+#### `Rails::HTML::PermitScrubber`

 This scrubber allows you to permit only the tags and attributes you want.

 ```ruby
-scrubber = Rails::Html::PermitScrubber.new
+scrubber = Rails::HTML::PermitScrubber.new
 scrubber.tags = ['a']

 html_fragment = Loofah.fragment('<a><img/ ></a>')
@@ -113,31 +138,31 @@ html_fragment.to_s # => "<a></a>"
 By default, inner content is left, but it can be removed as well.

 ```ruby
-scrubber = Rails::Html::PermitScrubber.new
+scrubber = Rails::HTML::PermitScrubber.new
 scrubber.tags = ['a']

 html_fragment = Loofah.fragment('<a><span>text</span></a>')
 html_fragment.scrub!(scrubber)
 html_fragment.to_s # => "<a>text</a>"

-scrubber = Rails::Html::PermitScrubber.new(prune: true)
+scrubber = Rails::HTML::PermitScrubber.new(prune: true)
 scrubber.tags = ['a']

 html_fragment = Loofah.fragment('<a><span>text</span></a>')
 html_fragment.scrub!(scrubber)
 html_fragment.to_s # => "<a></a>"
 ```

-#### `Rails::Html::TargetScrubber`
+#### `Rails::HTML::TargetScrubber`

 Where `PermitScrubber` picks out tags and attributes to permit in sanitization,
-`Rails::Html::TargetScrubber` targets them for removal. See https://github.com/flavorjones/loofah/blob/main/lib/loofah/html5/safelist.rb for the tag list.
+`Rails::HTML::TargetScrubber` targets them for removal. See https://github.com/flavorjones/loofah/blob/main/lib/loofah/html5/safelist.rb for the tag list.

 **Note:** by default, it will scrub anything that is not part of the permitted tags from
 loofah `HTML5::Scrub.allowed_element?`.

 ```ruby
-scrubber = Rails::Html::TargetScrubber.new
+scrubber = Rails::HTML::TargetScrubber.new
 scrubber.tags = ['img']

 html_fragment = Loofah.fragment('<a><img/ ></a>')
@@ -148,14 +173,14 @@ html_fragment.to_s # => "<a></a>"
 Similarly to `PermitScrubber`, nodes can be fully pruned.

 ```ruby
-scrubber = Rails::Html::TargetScrubber.new
+scrubber = Rails::HTML::TargetScrubber.new
 scrubber.tags = ['span']

 html_fragment = Loofah.fragment('<a><span>text</span></a>')
 html_fragment.scrub!(scrubber)
 html_fragment.to_s # => "<a>text</a>"

-scrubber = Rails::Html::TargetScrubber.new(prune: true)
+scrubber = Rails::HTML::TargetScrubber.new(prune: true)
 scrubber.tags = ['span']

 html_fragment = Loofah.fragment('<a><span>text</span></a>')
@@ -167,7 +192,7 @@ html_fragment.to_s # => "<a></a>"
 You can also create custom scrubbers in your application if you want to.

 ```ruby
-class CommentScrubber < Rails::Html::PermitScrubber
+class CommentScrubber < Rails::HTML::PermitScrubber
   def initialize
     super
     self.tags = %w( form script comment blockquote )
@@ -180,7 +205,7 @@ class CommentScrubber < Rails::Html::PermitScrubber
 end
 ```

-See `Rails::Html::PermitScrubber` documentation to learn more about which methods can be overridden.
+See `Rails::HTML::PermitScrubber` documentation to learn more about which methods can be overridden.

 #### Custom Scrubber in a Rails app

@@ -190,18 +215,36 @@ Using the `CommentScrubber` from above, you can use this in a Rails view like so
 <%= sanitize @comment, scrubber: CommentScrubber.new %>
 ```

+## Installation
+
+Add this line to your application's Gemfile:
+
+    gem 'rails-html-sanitizer'
+
+And then execute:
+
+    $ bundle
+
+Or install it yourself as:
+
+    $ gem install rails-html-sanitizer
+
+
 ## Read more

 Loofah is what underlies the sanitizers and scrubbers of rails-html-sanitizer.
+
 - [Loofah and Loofah Scrubbers](https://github.com/flavorjones/loofah)

 The `node` argument passed to some methods in a custom scrubber is an instance of `Nokogiri::XML::Node`.
+
 - [`Nokogiri::XML::Node`](https://nokogiri.org/rdoc/Nokogiri/XML/Node.html)
 - [Nokogiri](http://nokogiri.org)

-## Contributing to Rails Html Sanitizers

-Rails Html Sanitizers is work of many contributors. You're encouraged to submit pull requests, propose features and discuss issues.
+## Contributing to Rails HTML Sanitizers
+
+Rails HTML Sanitizers is work of many contributors. You're encouraged to submit pull requests, propose features and discuss issues.

 See [CONTRIBUTING](CONTRIBUTING.md).

@@ -211,5 +254,7 @@ Trying to report a possible security vulnerability in this project? Please
 check out our [security policy](https://rubyonrails.org/security) for
 guidelines about how to proceed.

+
 ## License
-Rails Html Sanitizers is released under the [MIT License](MIT-LICENSE).
+
+Rails HTML Sanitizers is released under the [MIT License](MIT-LICENSE).
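
The README's "A note on HTML entities" warns against sanitizing a persisted string that will be sanitized again at page-render time. A minimal sketch of why that matters, using the standard library's `ERB::Util.html_escape` as a stand-in for an entity-encoding pass. This is an assumption for illustration only: it is not this gem's sanitizer, which does considerably more than escaping.

```ruby
require "erb"

raw = "1 < 2 & 3"

# First pass: what would be persisted if the input were escaped before storage.
stored = ERB::Util.html_escape(raw)

# Second pass: what the view layer would emit if it escaped the stored value again.
rendered = ERB::Util.html_escape(stored)

puts stored    # 1 &lt; 2 &amp; 3
puts rendered  # 1 &amp;lt; 2 &amp;amp; 3
```

The second pass re-encodes the `&` of each entity, so a browser would display the literal text `&lt;` rather than `<`. Persisting the raw input and encoding once, at render time, avoids this.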

lib/rails-html-sanitizer.rb

+1 -21
@@ -8,27 +8,7 @@
 require_relative "rails/html/sanitizer"

 module Rails
-  module Html
-    class Sanitizer
-      class << self
-        def full_sanitizer
-          Html::FullSanitizer
-        end
-
-        def link_sanitizer
-          Html::LinkSanitizer
-        end
-
-        def safe_list_sanitizer
-          Html::SafeListSanitizer
-        end
-
-        def white_list_sanitizer
-          safe_list_sanitizer
-        end
-      end
-    end
-  end
+  Html = HTML # :nodoc:
 end

 module ActionView
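
The `Html = HTML` line relies on plain Ruby constant assignment: both names end up referring to the same module object, so code written against the old name keeps working. A minimal, self-contained sketch of that pattern follows. The `Demo` namespace and the regex-based `sanitize` body are hypothetical placeholders, not the gem's code; only the aliasing shape mirrors the diff and the alias list in the README.

```ruby
# Hypothetical namespace standing in for Rails; not the gem's real code.
module Demo
  module HTML4
    class FullSanitizer
      def sanitize(html)
        # Placeholder logic for this sketch only: drop anything that looks like a tag.
        # Real sanitization needs an HTML parser, not a regex.
        html.gsub(/<[^>]*>/, "")
      end
    end
  end

  module HTML
    # Class-level alias: HTML::FullSanitizer resolves to the HTML4 class.
    FullSanitizer = HTML4::FullSanitizer
  end

  # Module-level alias, mirroring `Html = HTML` in the diff above.
  Html = HTML
end

legacy  = Demo::Html::FullSanitizer
current = Demo::HTML4::FullSanitizer

puts legacy.equal?(current)                       # true: both constants name the same class object
puts legacy.name                                  # Demo::HTML4::FullSanitizer
puts legacy.new.sanitize("<b>Bold</b> no more!")  # Bold no more!
```

Because the alias is the same object rather than a subclass or a copy, `is_a?` checks and `rescue` clauses written against either name behave identically.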
