<pre class="metadata">
Title: HTML Sanitizer API
Status: CG-DRAFT
Group: WICG
URL: https://wicg.github.io/sanitizer-api/
Repository: WICG/sanitizer-api
Shortname: sanitizer-api
Level: 1
Editor: Frederik Braun 68466, Mozilla, [email protected], https://frederik-braun.com
Editor: Mario Heiderich, Cure53, [email protected], https://cure53.de
Editor: Daniel Vogelheim, Google LLC, [email protected], https://www.google.com
Abstract:
This document specifies a set of APIs which allow developers to take
untrusted HTML input and sanitize it for safe insertion into a document's
DOM.
Indent: 2
Work Status: exploring
Boilerplate: omit conformance
Markup Shorthands: css off, markdown on
</pre>
<pre class="link-defaults">
spec:html; type:attribute; text: innerHTML
spec:dom; type:method; text: createDocumentFragment
spec:html; type:dfn; text: template contents
</pre>
<pre class="anchors">
text: window.toStaticHTML(); type: method; url: https://msdn.microsoft.com/en-us/library/cc848922(v=vs.85).aspx
text: parse HTML from a string; type: dfn; url: https://html.spec.whatwg.org/#parse-html-from-a-string
</pre>
<pre class="biblio">
{
"DOMPURIFY": {
"href": "https://github.com/cure53/DOMPurify",
"title": "DOMPurify",
"publisher": "Cure53"
},
"MXSS": {
"href": "https://cure53.de/fp170.pdf",
"title": "mXSS Attacks: Attacking well-secured Web-Applications by using innerHTML Mutations",
"publisher": "Ruhr-Universität Bochum"
}
}
</pre>
<style>
/* Boxes around algorithms. */
[data-algorithm]:not(.heading) {
padding: .5em;
border: thin solid #ddd; border-radius: .5em;
margin: .5em calc(-0.5em - 1px);
}
[data-algorithm]:not(.heading) > :first-child { margin-top: 0; }
[data-algorithm]:not(.heading) > :last-child { margin-bottom: 0; }
[data-algorithm] [data-algorithm] { margin: 1em 0; }
</style>
# Introduction # {#intro}
<em>This section is not normative.</em>
Web applications often need to work with strings of HTML on the client side,
perhaps as part of a client-side templating solution, perhaps as part of
rendering user generated content, etc. It is difficult to do so in a safe way.
The naive approach of joining strings together and stuffing them into
an {{Element}}'s {{Element/innerHTML}} is fraught with risk, as it can cause
JavaScript execution in a number of unexpected ways.
Libraries like [[DOMPURIFY]] attempt to manage this problem by carefully
parsing and sanitizing strings before insertion, by constructing a DOM and
filtering its members through an allow-list. This has proven to be a fragile
approach, as the parsing APIs exposed to the web don't always map in
reasonable ways to the browser's behavior when actually rendering a string as
HTML in the "real" DOM. Moreover, the libraries need to keep on top of
browsers' changing behavior over time; things that once were safe may turn
into time-bombs based on new platform-level features.
The browser has a fairly good idea of when it is going to
execute code. We can improve upon the user-space libraries by teaching the
browser how to render HTML from an arbitrary string in a safe manner, and do
so in a way that is much more likely to be maintained and updated along with
the browser's own changing parser implementation. This document outlines an
API which aims to do just that.
## Goals ## {#goals}
* Mitigate the risk of DOM-based cross-site scripting attacks by providing
developers with mechanisms for handling user-controlled HTML which prevent
direct script execution upon injection.
* Make HTML output safe for use within the current user agent, taking into
account its current understanding of HTML.
* Allow developers to override the default set of elements and attributes.
Adding certain elements and attributes can prevent
<a href="https://github.com/google/security-research-pocs/tree/master/script-gadgets">script gadget</a>
attacks.
## API Summary ## {#api-summary}
The Sanitizer API offers functionality to parse a string containing HTML into
a DOM tree, and to filter the resulting tree according to a user-supplied
configuration. The methods come in two by two flavours:
* <dfn>Safe and unsafe</dfn>: The "safe" methods will not generate any markup that
executes script. That is, they should be safe from XSS. The "unsafe" methods
parse and filter strictly according to the supplied configuration, and may
therefore produce markup that executes script.
See also: [[#security-considerations]].
* Context: Methods are defined on {{Element}} and {{ShadowRoot}} and will
replace these {{Node}}'s children, and are largely analogous to {{Element/innerHTML}}.
There are also static methods on the {{Document}}, which parse an entire
document and are largely analogous to {{DOMParser}}.{{parseFromString()}}.
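
As a non-normative sketch of how a page might call the "safe" flavour, the helper below feature-detects {{Element/setHTML()}} and falls back to inert text when it is missing. The names `HtmlSinkSketch` and `renderUntrusted` are hypothetical and exist only for this illustration; the structural interface stands in for an {{Element}} so that the sketch does not depend on a DOM being present.

```typescript
// Structural stand-in for an Element: only the members this sketch touches.
// `setHTML` is optional because a given engine might not implement it.
interface HtmlSinkSketch {
  textContent: string | null;
  setHTML?: (html: string, options?: { sanitizer?: unknown }) => void;
}

// Hypothetical helper, not defined by this specification.
function renderUntrusted(
  target: HtmlSinkSketch,
  html: string,
): "sanitized" | "text" {
  if (typeof target.setHTML === "function") {
    // "Safe" flavour: parse, filter, and replace the children of `target`.
    target.setHTML(html);
    return "sanitized";
  }
  // Fallback: display the markup as inert text instead of parsing it.
  target.textContent = html;
  return "text";
}
```

In a browser, the first argument would typically be an element obtained from the page, and the second the untrusted string.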
# Framework # {#framework}
## Sanitizer API ## {#sanitizer-api}
The {{Element}} interface defines two methods, {{Element/setHTML()}} and
{{Element/setHTMLUnsafe()}}. Both of these take a {{DOMString}} with HTML
markup, and an optional configuration.
<pre class="idl extract">
partial interface Element {
[CEReactions] undefined setHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLUnsafeOptions options = {});
[CEReactions] undefined setHTML(DOMString html, optional SetHTMLOptions options = {});
};
</pre>
<div algorithm>
{{Element}}'s <dfn for="Element" export>setHTMLUnsafe</dfn>(|html|, |options|) method steps are:
1. Let |compliantHTML| be the result of invoking the [$Get Trusted Type compliant string$] algorithm with
{{TrustedHTML}}, [=this=]'s [=relevant global object=], |html|, "Element setHTMLUnsafe", and "script".
1. Let |target| be [=this=]'s [=template contents=] if [=this=] is a
{{HTMLTemplateElement|template}} element; otherwise [=this=].
1. [=Set and filter HTML=] given |target|, [=this=], |compliantHTML|, |options|, and false.
</div>
<div algorithm>
{{Element}}'s <dfn for="Element" export>setHTML</dfn>(|html|, |options|) method steps are:
1. Let |target| be [=this=]'s [=template contents=] if [=this=] is a
{{HTMLTemplateElement|template}}; otherwise [=this=].
1. [=Set and filter HTML=] given |target|, [=this=], |html|, |options|, and true.
</div>
<pre class="idl extract">
partial interface ShadowRoot {
[CEReactions] undefined setHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLUnsafeOptions options = {});
[CEReactions] undefined setHTML(DOMString html, optional SetHTMLOptions options = {});
};
</pre>
These methods are mirrored on the {{ShadowRoot}}:
<div algorithm>
{{ShadowRoot}}'s <dfn for="ShadowRoot" export>setHTMLUnsafe</dfn>(|html|, |options|) method steps are:
1. Let |compliantHTML| be the result of invoking the [$Get Trusted Type compliant string$] algorithm with
{{TrustedHTML}}, [=this=]'s [=relevant global object=], |html|, "ShadowRoot setHTMLUnsafe", and "script".
1. [=Set and filter HTML=] using [=this=],
[=this=]'s [=shadow host=] (as context element),
|compliantHTML|, |options|, and false.
</div>
<div algorithm>
{{ShadowRoot}}'s <dfn for="ShadowRoot" export>setHTML</dfn>(|html|, |options|) method steps are:
1. [=Set and filter HTML=] using [=this=] (as target), [=this=] (as context element),
|html|, |options|, and true.
</div>
The {{Document}} interface gains two new methods which parse an entire {{Document}}:
<pre class="idl extract">
partial interface Document {
static Document parseHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLUnsafeOptions options = {});
static Document parseHTML(DOMString html, optional SetHTMLOptions options = {});
};
</pre>
<div algorithm>
The <dfn for="Document" export>parseHTMLUnsafe</dfn>(|html|, |options|) method steps are:
1. Let |compliantHTML| be the result of invoking the [$Get Trusted Type compliant string$] algorithm with
{{TrustedHTML}}, [=this=]'s [=relevant global object=], |html|, "Document parseHTMLUnsafe", and "script".
1. Let |document| be a new {{Document}}, whose [=Document/content type=] is "text/html".
Note: Since |document| does not have a browsing context, scripting is disabled.
1. Set |document|'s [=allow declarative shadow roots=] to true.
1. [=Parse HTML from a string=] given |document| and |compliantHTML|.
1. Let |sanitizer| be the result of calling [=get a sanitizer instance from options=]
with |options|.
1. Call [=sanitize=] on |document|'s [=tree/root|root node=] with |sanitizer| and false.
1. Return |document|.
</div>
<div algorithm>
The <dfn for="Document" export>parseHTML</dfn>(|html|, |options|) method steps are:
1. Let |document| be a new {{Document}}, whose [=Document/content type=] is "text/html".
Note: Since |document| does not have a browsing context, scripting is disabled.
1. Set |document|'s [=allow declarative shadow roots=] to true.
1. [=Parse HTML from a string=] given |document| and |html|.
1. Let |sanitizer| be the result of calling [=get a sanitizer instance from options=]
with |options|.
1. Call [=sanitize=] on |document|'s [=tree/root|root node=] with |sanitizer| and true.
1. Return |document|.
</div>
## SetHTML options and the configuration object. ## {#configobject}
The family of {{Element/setHTML()}}-like methods all accept an options
dictionary. Right now, only one member of this dictionary is defined:
<pre class=idl>
enum SanitizerPresets { "default" };
dictionary SetHTMLOptions {
(Sanitizer or SanitizerConfig or SanitizerPresets) sanitizer = "default";
};
dictionary SetHTMLUnsafeOptions {
(Sanitizer or SanitizerConfig or SanitizerPresets) sanitizer = {};
};
</pre>
The {{Sanitizer}} configuration object encapsulates a filter configuration.
The same configuration can be used with both <a lt="safe and unsafe">"safe"
or "unsafe"</a> methods, where the "safe" methods perform an implicit
{{removeUnsafe}} operation on the passed in configuration and have a default
configuration when none is passed. The intent is
that one (or a few) configurations will be built-up early on in a page's
lifetime, and can then be used whenever needed. This allows implementations
to pre-process configurations.
The configuration object can be queried to return a configuration dictionary.
It can also be modified directly.
<pre class=idl>
[Exposed=(Window,Worker)]
interface Sanitizer {
constructor(optional (SanitizerConfig or SanitizerPresets) configuration = "default");
// Query configuration:
SanitizerConfig get();
// Modify a Sanitizer's lists and fields:
undefined allowElement(SanitizerElementWithAttributes element);
undefined removeElement(SanitizerElement element);
undefined replaceElementWithChildren(SanitizerElement element);
undefined allowAttribute(SanitizerAttribute attribute);
undefined removeAttribute(SanitizerAttribute attribute);
undefined setComments(boolean allow);
undefined setDataAttributes(boolean allow);
// Remove markup that executes script. May modify multiple lists:
undefined removeUnsafe();
};
</pre>
A {{Sanitizer}} has an associated <dfn for="Sanitizer">configuration</dfn>, a {{SanitizerConfig}}.
<div algorithm>
The <dfn for="Sanitizer" export>constructor</dfn>(|configuration|)
method steps are:
1. If |configuration| is a {{SanitizerPresets}} [=string=], then:
  1. [=Assert=]: |configuration| [=is=] {{SanitizerPresets/default}}.
  1. Set |configuration| to the [=built-in safe default configuration=].
1. Let |valid| be the return value of [=set a configuration|setting=] |configuration| on [=this=].
1. If |valid| is false, then throw a {{TypeError}}.
</div>
<div algorithm>
The <dfn for="Sanitizer" export>get</dfn>() method steps are to return the value of [=this=]'s [=Sanitizer/configuration=].
</div>
<div algorithm>
The <dfn for="Sanitizer" export>allowElement</dfn>(|element|) method steps are to [=allow an element=] with |element| and [=this=]'s [=Sanitizer/configuration=].
</div>
<div algorithm>
The <dfn for="Sanitizer" export>removeElement</dfn>(|element|) method steps are
to [=remove an element=] with |element| and [=this=]'s [=Sanitizer/configuration=].
</div>
<div algorithm>
The <dfn for="Sanitizer" export>replaceElementWithChildren</dfn>(|element|) method steps are to [=replace an element with its children=] with |element| and [=this=]'s [=Sanitizer/configuration=].
</div>
<div algorithm>
The <dfn for="Sanitizer" export>allowAttribute</dfn>(|attribute|) method steps are to [=allow an attribute=] with |attribute| and [=this=]'s [=Sanitizer/configuration=].
</div>
<div algorithm>
The <dfn for="Sanitizer" export>removeAttribute</dfn>(|attribute|) method steps are to [=Sanitizer/remove an attribute=] with |attribute| and [=this=]'s [=Sanitizer/configuration=].
</div>
<div algorithm>
The <dfn for="Sanitizer" export>setComments</dfn>(|allow|) method steps are to [=set comments=] with |allow| and [=this=]'s [=Sanitizer/configuration=].
</div>
<div algorithm>
The <dfn for="Sanitizer" export>setDataAttributes</dfn>(|allow|) method steps are to [=set data attributes=] with |allow| and [=this=]'s [=Sanitizer/configuration=].
</div>
<div algorithm>
The <dfn for="Sanitizer" export>removeUnsafe</dfn>() method steps are to
update [=this=]'s [=Sanitizer/configuration=] with the result of calling [=remove unsafe=]
on [=this=]'s [=Sanitizer/configuration=].
</div>
## The Configuration Dictionary ## {#config}
<pre class=idl>
dictionary SanitizerElementNamespace {
required DOMString name;
DOMString? _namespace = "http://www.w3.org/1999/xhtml";
};
// Used by "elements"
dictionary SanitizerElementNamespaceWithAttributes : SanitizerElementNamespace {
sequence<SanitizerAttribute> attributes;
sequence<SanitizerAttribute> removeAttributes;
};
typedef (DOMString or SanitizerElementNamespace) SanitizerElement;
typedef (DOMString or SanitizerElementNamespaceWithAttributes) SanitizerElementWithAttributes;
dictionary SanitizerAttributeNamespace {
required DOMString name;
DOMString? _namespace = null;
};
typedef (DOMString or SanitizerAttributeNamespace) SanitizerAttribute;
dictionary SanitizerConfig {
sequence<SanitizerElementWithAttributes> elements;
sequence<SanitizerElement> removeElements;
sequence<SanitizerElement> replaceWithChildrenElements;
sequence<SanitizerAttribute> attributes;
sequence<SanitizerAttribute> removeAttributes;
boolean comments;
boolean dataAttributes;
};
</pre>
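As a non-normative illustration, the dictionaries above can be mirrored as TypeScript types and used to write a configuration literal. The type names `AttributeEntry`, `ElementEntry`, `ElementWithAttributesEntry` and `ConfigModel`, and the constant `commentConfig`, are hypothetical stand-ins chosen for this sketch; they are not part of the API.

```typescript
// Illustrative mirrors of the IDL dictionaries (names are hypothetical).
type AttributeEntry = string | { name: string; namespace?: string | null };
type ElementEntry = string | { name: string; namespace?: string | null };
type ElementWithAttributesEntry =
  | string
  | {
      name: string;
      namespace?: string | null;
      attributes?: AttributeEntry[];
      removeAttributes?: AttributeEntry[];
    };

interface ConfigModel {
  elements?: ElementWithAttributesEntry[];
  removeElements?: ElementEntry[];
  replaceWithChildrenElements?: ElementEntry[];
  attributes?: AttributeEntry[];
  removeAttributes?: AttributeEntry[];
  comments?: boolean;
  dataAttributes?: boolean;
}

// An allow-list style configuration. A plain string names an element in the
// HTML namespace (or an attribute in no namespace); a dictionary can name a
// namespace and, for elements, carry per-element attribute lists.
const commentConfig: ConfigModel = {
  elements: [
    "p",
    "em",
    { name: "a", attributes: ["href"] },
    { name: "svg", namespace: "http://www.w3.org/2000/svg" },
  ],
  replaceWithChildrenElements: ["span"],
  attributes: ["title"],
  comments: false,
  dataAttributes: false,
};
```

The literal uses `elements` without `removeElements`, and `attributes` without `removeAttributes`, since a configuration that supplies both members of either pair is rejected when it is set.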
# Algorithms # {#algorithms}
<div algorithm>
To <dfn>set and filter HTML</dfn>, given an {{Element}} or {{DocumentFragment}}
|target|, an {{Element}} |contextElement|, a [=string=] |html|, a
[=dictionary=] |options|, and a [=boolean=] |safe|:
1. If |safe| and |contextElement|'s [=Element/local name=] is "`script`" and
|contextElement|'s [=Element/namespace=] is the [=HTML namespace=] or the
[=SVG namespace=], then return.
1. Let |sanitizer| be the result of calling [=get a sanitizer instance from options=]
with |options|.
1. Let |newChildren| be the result of the HTML [=fragment parsing algorithm steps=]
given |contextElement|, |html|, and true.
1. Let |fragment| be a new {{DocumentFragment}} whose [=node document=] is |contextElement|'s [=node document=].
1. [=list/iterate|For each=] |node| in |newChildren|, [=list/append=] |node| to |fragment|.
1. Run [=sanitize=] on |fragment| using |sanitizer| and |safe|.
1. [=Replace all=] with |fragment| within |target|.
</div>
<div algorithm>
To <dfn for="SanitizerConfig">get a sanitizer instance from options</dfn> from
a [=dictionary=] |options|, do:
Note: This algorithm works for both {{SetHTMLOptions}} and
{{SetHTMLUnsafeOptions}}. They only differ in the defaults.
1. Let |sanitizerSpec| be "{{SanitizerPresets/default}}".
1. If |options|["{{SetHTMLOptions/sanitizer}}"] [=map/exists=], then:
  1. Set |sanitizerSpec| to |options|["{{SetHTMLOptions/sanitizer}}"].
1. [=Assert=]: |sanitizerSpec| is either a {{Sanitizer}} instance,
  a [=string=] which is a {{SanitizerPresets}} member, or a [=dictionary=].
1. If |sanitizerSpec| is a [=string=]:
  1. [=Assert=]: |sanitizerSpec| [=is=] "{{SanitizerPresets/default}}".
  1. Set |sanitizerSpec| to the [=built-in safe default configuration=].
1. [=Assert=]: |sanitizerSpec| is either a {{Sanitizer}} instance,
  or a [=dictionary=].
1. If |sanitizerSpec| is a [=dictionary=]:
  1. Let |sanitizer| be a new {{Sanitizer}} instance.
  1. Let |setConfigurationResult| be the result of [=set a configuration=]
    with |sanitizerSpec| on |sanitizer|.
  1. If |setConfigurationResult| is false, [=throw=] a {{TypeError}}.
  1. Set |sanitizerSpec| to |sanitizer|.
1. [=Assert=]: |sanitizerSpec| is a {{Sanitizer}} instance.
1. Return |sanitizerSpec|.
</div>
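The resolution order above (preset string, then dictionary, then instance) can be sketched non-normatively as follows. `SanitizerModel`, `DEFAULT_CONFIG_STAND_IN`, `OptionsModel` and `sanitizerFromOptions` are hypothetical names used only here; in particular, the stand-in default is a placeholder and not the built-in safe default configuration. Validation of the dictionary, which throws a {{TypeError}} in the algorithm, is omitted.

```typescript
// Hypothetical stand-ins: none of these names are defined by this specification.
class SanitizerModel {
  readonly configuration: object;
  constructor(configuration: object) {
    this.configuration = configuration;
  }
}
const DEFAULT_CONFIG_STAND_IN: object = { comments: false };

interface OptionsModel {
  sanitizer?: SanitizerModel | "default" | object;
}

function sanitizerFromOptions(options: OptionsModel): SanitizerModel {
  // Start from the preset, then let an explicit member override it.
  let spec: SanitizerModel | "default" | object = "default";
  if (options.sanitizer !== undefined) {
    spec = options.sanitizer;
  }
  // A preset string is swapped for the configuration it names.
  if (typeof spec === "string") {
    spec = DEFAULT_CONFIG_STAND_IN;
  }
  // A dictionary is wrapped in a fresh instance; an instance passes through.
  // (The real algorithm also throws a TypeError for an invalid dictionary.)
  return spec instanceof SanitizerModel ? spec : new SanitizerModel(spec);
}
```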
## Sanitization Algorithms ## {#sanitization}
<div algorithm>
For the main <dfn>sanitize</dfn> operation, using a {{ParentNode}} |node|, a
{{Sanitizer}} |sanitizer|, and a [=boolean=] |safe|, run these steps:
1. Let |configuration| be the value of |sanitizer|'s [=Sanitizer/configuration=].
1. If |safe| is true, then set |configuration| to the result of calling [=remove unsafe=] on |configuration|.
1. Call [=sanitize core=] on |node|, |configuration|, and with [=handleJavascriptNavigationUrls=] set to |safe|.
</div>
<div algorithm="sanitize core">
The <dfn>sanitize core</dfn> operation,
using a {{ParentNode}} |node|, a {{SanitizerConfig}} |configuration|, and a
[=boolean=] <var><dfn>handleJavascriptNavigationUrls</dfn></var>, iterates over the DOM tree
beginning with |node|, and may recurse to handle some special cases (e.g.
template contents). It consists of these steps:
1. Let |current| be |node|.
1. [=list/iterate|For each=] |child| in |current|'s [=tree/children=]:
  1. [=Assert=]: |child| [=implements=] {{Text}}, {{Comment}}, or {{Element}}.

    Note: Currently, this algorithm is only called on output of the HTML
    parser for which this assertion should hold. If in the future
    this algorithm will be used in different contexts, this assumption
    needs to be re-examined.
  1. If |child| [=implements=] {{Text}}:
    1. [=continue=].
  1. else if |child| [=implements=] {{Comment}}:
    1. If |configuration|["{{SanitizerConfig/comments}}"] is not true:
      1. [=/remove=] |child|.
  1. else:
    1. Let |elementName| be a {{SanitizerElementNamespace}} with |child|'s
      [=Element/local name=] and [=Element/namespace=].
    1. If |configuration|["{{SanitizerConfig/removeElements}}"] [=SanitizerConfig/contains=] |elementName|, or if |configuration|["{{SanitizerConfig/elements}}"] is not [=list/empty=] and does not [=SanitizerConfig/contain=] |elementName|:
      1. [=/remove=] |child|.
    1. If |configuration|["{{SanitizerConfig/replaceWithChildrenElements}}"] [=SanitizerConfig/contains=] |elementName|:
      1. Call [=sanitize core=] on |child| with |configuration| and
        |handleJavascriptNavigationUrls|.
      1. Call [=replace all=] with |child|'s [=tree/children=] within |child|.
    1. If |elementName| [=equals=] «[ "`name`" → "`template`",
      "`namespace`" → [=HTML namespace=] ]»
      1. Then call [=sanitize core=] on |child|'s [=template contents=] with
        |configuration| and |handleJavascriptNavigationUrls|.
    1. If |child| is a [=shadow host=]:
      1. Then call [=sanitize core=] on |child|'s [=Element/shadow root=] with
        |configuration| and |handleJavascriptNavigationUrls|.
    1. [=list/iterate|For each=] |attribute| in |child|'s [=Element/attribute list=]:
      1. Let |attrName| be a {{SanitizerAttributeNamespace}} with |attribute|'s
        [=Attr/local name=] and [=Attr/namespace=].
      1. If |configuration|["{{SanitizerConfig/removeAttributes}}"]
        [=SanitizerConfig/contains=] |attrName|:
        1. Remove |attribute| from |child|.
      1. If |configuration|["{{SanitizerConfig/elements}}"]["{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"]
        [=SanitizerConfig/contains=] |attrName|:
        1. Remove |attribute| from |child|.
      1. If all of the following are false, then remove |attribute| from |child|.
        - |configuration|["{{SanitizerConfig/attributes}}"] [=list/exists=] and
          [=SanitizerConfig/contains=] |attrName|
        - |configuration|["{{SanitizerConfig/elements}}"]["{{SanitizerElementNamespaceWithAttributes/attributes}}"]
          [=SanitizerConfig/contains=] |attrName|
        - "data-" is a [=code unit prefix=] of [=Attr/local name=] and
          [=Attr/namespace=] is `null` and
          |configuration|["{{SanitizerConfig/dataAttributes}}"] is true
      1. If |handleJavascriptNavigationUrls| and «[|elementName|, |attrName|]» matches an entry in the
        [=built-in navigating URL attributes list=], and if |attribute|'s [=protocol=] is
        "`javascript:`":
        1. Then remove |attribute| from |child|.
</div>
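The overall shape of the tree walk can be sketched non-normatively over a toy node model. This is a simplified reading under stated assumptions, not the algorithm itself: names are plain strings (namespaces are ignored), per-element attribute lists, template contents, shadow roots and `javascript:` URL handling are left out, and an element on the replace-with-children list is modelled as being spliced out in favour of its filtered children. The names `ModelNode`, `FilterModel` and `sanitizeChildren` are hypothetical.

```typescript
// Toy tree model (hypothetical): text, comment, or element nodes.
type ModelNode =
  | { kind: "text"; data: string }
  | { kind: "comment"; data: string }
  | {
      kind: "element";
      name: string;
      attributes: Array<[string, string]>;
      children: ModelNode[];
    };

// Simplified configuration: plain-string names, every list always present.
interface FilterModel {
  elements: string[]; // empty means "no element allow-list"
  removeElements: string[];
  replaceWithChildrenElements: string[];
  attributes: string[];
  removeAttributes: string[];
  comments: boolean;
  dataAttributes: boolean;
}

const has = (list: string[], name: string): boolean => list.indexOf(name) !== -1;

// Returns the filtered copy of `children`; the input is not modified.
function sanitizeChildren(children: ModelNode[], config: FilterModel): ModelNode[] {
  const kept: ModelNode[] = [];
  for (const child of children) {
    if (child.kind === "text") {
      kept.push(child); // text always survives
      continue;
    }
    if (child.kind === "comment") {
      if (config.comments) kept.push(child);
      continue;
    }
    // Element: drop it if block-listed, or if an allow-list exists and omits it.
    const blocked =
      has(config.removeElements, child.name) ||
      (config.elements.length > 0 && !has(config.elements, child.name));
    if (blocked) continue;
    const cleanChildren = sanitizeChildren(child.children, config);
    if (has(config.replaceWithChildrenElements, child.name)) {
      kept.push(...cleanChildren); // keep the content, drop the wrapper
      continue;
    }
    const attributes: Array<[string, string]> = [];
    for (const [attrName, value] of child.attributes) {
      if (has(config.removeAttributes, attrName)) continue;
      const allowed =
        has(config.attributes, attrName) ||
        (config.dataAttributes && attrName.indexOf("data-") === 0);
      if (allowed) attributes.push([attrName, value]);
    }
    kept.push({ kind: "element", name: child.name, attributes, children: cleanChildren });
  }
  return kept;
}
```

Returning a new list rather than mutating in place is a choice made for the sketch; the algorithm above removes nodes and attributes from the live tree.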
## Configuration Processing ## {#configuration-processing}
<div algorithm>
To <dfn for="SanitizerConfig">allow an element</dfn> |element| with a {{SanitizerConfig}} |configuration|, do:
1. Set |element| to the result of [=canonicalize a sanitizer element with attributes=] with |element|.
1. [=SanitizerConfig/Remove=] |element| from |configuration|["{{SanitizerConfig/elements}}"].
1. [=list/Append=] |element| to |configuration|["{{SanitizerConfig/elements}}"].
1. [=SanitizerConfig/Remove=] |element| from |configuration|["{{SanitizerConfig/removeElements}}"].
1. [=SanitizerConfig/Remove=] |element| from |configuration|["{{SanitizerConfig/replaceWithChildrenElements}}"].
NOTE: Handling of [=allowElement=] is a little more complicated than the other
methods, because the element allow list can have per-element allow- and
remove-attribute lists. We first remove the given element from the list
before then adding it, which has the effect of re-setting (rather than
merging or otherwise modifying) the per-element list to whatever is passed
in. In other words, the per-element allow- and remove-lists can only be
set as a whole.
NOTE: [=SanitizerConfig/Remove=] matches on name and namespace, so adding an
element with attributes would still remove the matching element from the
{{SanitizerConfig/removeElements}} and {{SanitizerConfig/replaceWithChildrenElements}} lists.
</div>
<div algorithm>
To <dfn for="Sanitizer">remove an element</dfn> |element| from a {{SanitizerConfig}} |configuration|, do:
1. Set |element| to the result of [=canonicalize a sanitizer element=] with |element|.
1. [=SanitizerConfig/Add=] |element| to |configuration|["{{SanitizerConfig/removeElements}}"].
1. [=SanitizerConfig/Remove=] |element| from |configuration|["{{SanitizerConfig/elements}}"] list.
1. [=SanitizerConfig/Remove=] |element| from |configuration|["{{SanitizerConfig/replaceWithChildrenElements}}"].
</div>
<div algorithm>
To <dfn for="Sanitizer">replace an element with its children</dfn> |element| from a {{SanitizerConfig}} |configuration|, do:
1. Set |element| to the result of [=canonicalize a sanitizer element=] with |element|.
1. [=SanitizerConfig/Add=] |element| to |configuration|["{{SanitizerConfig/replaceWithChildrenElements}}"].
1. [=SanitizerConfig/Remove=] |element| from |configuration|["{{SanitizerConfig/removeElements}}"].
1. [=SanitizerConfig/Remove=] |element| from |configuration|["{{SanitizerConfig/elements}}"] list.
</div>
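The three element operations keep the lists mutually exclusive: placing a name on one list takes it off the other two. The non-normative sketch below models that with immutable updates. It assumes that the "Add" operation appends only when no entry with the same name and namespace is present, and that "Remove" drops every such entry; per-element attribute lists are ignored. All identifiers (`NameModel`, `ElementLists`, `allowElementModel`, and so on) are hypothetical.

```typescript
// Canonicalized name: both fields take part in matching.
interface NameModel { name: string; namespace: string | null }

interface ElementLists {
  elements: NameModel[];
  removeElements: NameModel[];
  replaceWithChildrenElements: NameModel[];
}

const sameName = (a: NameModel, b: NameModel): boolean =>
  a.name === b.name && a.namespace === b.namespace;

// Assumed behaviour of "Remove": drop every matching entry.
const removeName = (list: NameModel[], item: NameModel): NameModel[] =>
  list.filter((entry) => !sameName(entry, item));

// Assumed behaviour of "Add": append unless a matching entry is present.
const addName = (list: NameModel[], item: NameModel): NameModel[] =>
  list.some((entry) => sameName(entry, item)) ? list : [...list, item];

// Allowing removes any earlier entry first, so the new entry replaces it.
function allowElementModel(c: ElementLists, el: NameModel): ElementLists {
  return {
    elements: [...removeName(c.elements, el), el],
    removeElements: removeName(c.removeElements, el),
    replaceWithChildrenElements: removeName(c.replaceWithChildrenElements, el),
  };
}

function removeElementModel(c: ElementLists, el: NameModel): ElementLists {
  return {
    elements: removeName(c.elements, el),
    removeElements: addName(c.removeElements, el),
    replaceWithChildrenElements: removeName(c.replaceWithChildrenElements, el),
  };
}

function replaceWithChildrenModel(c: ElementLists, el: NameModel): ElementLists {
  return {
    elements: removeName(c.elements, el),
    removeElements: removeName(c.removeElements, el),
    replaceWithChildrenElements: addName(c.replaceWithChildrenElements, el),
  };
}
```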
<div algorithm>
To <dfn for="Sanitizer">allow an attribute</dfn> |attribute| on a {{SanitizerConfig}} |configuration|, do:
1. Set |attribute| to the result of [=canonicalize a sanitizer attribute=] with |attribute|.
1. [=SanitizerConfig/Add=] |attribute| to |configuration|["{{SanitizerConfig/attributes}}"].
1. [=SanitizerConfig/Remove=] |attribute| from |configuration|["{{SanitizerConfig/removeAttributes}}"].
</div>
<div algorithm>
To <dfn for="Sanitizer">remove an attribute</dfn> |attribute| from a {{SanitizerConfig}} |configuration|, do:
1. Set |attribute| to the result of [=canonicalize a sanitizer attribute=] with |attribute|.
1. [=SanitizerConfig/Add=] |attribute| to |configuration|["{{SanitizerConfig/removeAttributes}}"].
1. [=SanitizerConfig/Remove=] |attribute| from |configuration|["{{SanitizerConfig/attributes}}"].
</div>
<div algorithm>
To <dfn for="Sanitizer">set comments</dfn> with |allow| on a {{SanitizerConfig}} |configuration|, do:
1. Set |configuration|["{{SanitizerConfig/comments}}"] to |allow|.
</div>
<div algorithm>
To <dfn for="Sanitizer">set data attributes</dfn> with |allow| on a {{SanitizerConfig}} |configuration|, do:
1. Set |configuration|["{{SanitizerConfig/dataAttributes}}"] to |allow|.
</div>
<div algorithm>
Note: While this algorithm is called [=remove unsafe=], we use
<a href="#security-considerations">the term "unsafe" strictly in the sense
of this spec</a>, to denote content that will
execute JavaScript when inserted into the document. In other words, this
method will remove opportunities for XSS.
To <dfn for="SanitizerConfig">remove unsafe</dfn> from a |configuration|, do this:
1. [=Assert=]: The [=built-in safe baseline configuration=] has
  {{SanitizerConfig/removeElements}} and {{SanitizerConfig/removeAttributes}}
  keys set, but not {{SanitizerConfig/elements}},
  {{SanitizerConfig/replaceWithChildrenElements}}, or
  {{SanitizerConfig/attributes}}.
1. Let |result| be a copy of |configuration|.
1. [=list/For each=] |element| in
  [=built-in safe baseline configuration=][{{SanitizerConfig/removeElements}}]:
  1. Call [=remove an element=] with |element| and |result|.
1. [=list/For each=] |attribute| in
  [=built-in safe baseline configuration=][{{SanitizerConfig/removeAttributes}}]:
  1. Call [=Sanitizer/remove an attribute=] with |attribute| and |result|.
1. Return |result|.
</div>
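The effect of this operation on a configuration can be sketched non-normatively as below. The baseline used here, `BASELINE_STAND_IN`, is a hypothetical two-entry placeholder and not the built-in safe baseline configuration; names are plain strings, so namespaces are ignored, and the replace-with-children list is left out for brevity. `ListsModel` and `removeUnsafeModel` are likewise illustrative names.

```typescript
// Plain-string names only; the spec matches on name and namespace.
interface ListsModel {
  elements: string[];
  removeElements: string[];
  attributes: string[];
  removeAttributes: string[];
}

// Hypothetical placeholder for the built-in safe baseline configuration.
const BASELINE_STAND_IN = {
  removeElements: ["script"],
  removeAttributes: ["onclick"],
};

const withoutName = (list: string[], name: string): string[] =>
  list.filter((entry) => entry !== name);
const withName = (list: string[], name: string): string[] =>
  list.indexOf(name) !== -1 ? list : [...list, name];

function removeUnsafeModel(configuration: ListsModel): ListsModel {
  // Work on a copy so the caller's configuration is left untouched.
  const result: ListsModel = {
    elements: [...configuration.elements],
    removeElements: [...configuration.removeElements],
    attributes: [...configuration.attributes],
    removeAttributes: [...configuration.removeAttributes],
  };
  // Each baseline element is block-listed and taken off the allow-list.
  for (const element of BASELINE_STAND_IN.removeElements) {
    result.removeElements = withName(result.removeElements, element);
    result.elements = withoutName(result.elements, element);
  }
  // Same treatment for each baseline attribute.
  for (const attribute of BASELINE_STAND_IN.removeAttributes) {
    result.removeAttributes = withName(result.removeAttributes, attribute);
    result.attributes = withoutName(result.attributes, attribute);
  }
  return result;
}
```

Under these assumptions, a configuration that explicitly allows `script` comes back with `script` moved to the block-list, which is how the "safe" methods override an overly permissive configuration.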
<div algorithm>
To <dfn for="Sanitizer">set a configuration</dfn>, given a [=dictionary=] |configuration| and a {{Sanitizer}} |sanitizer|:
1. [=list/iterate|For each=] |element| of |configuration|["{{SanitizerConfig/elements}}"] do:
  1. Call [=allow an element=] with |element| and |sanitizer|.
1. [=list/iterate|For each=] |element| of |configuration|["{{SanitizerConfig/removeElements}}"] do:
  1. Call [=remove an element=] with |element| and |sanitizer|.
1. [=list/iterate|For each=] |element| of |configuration|["{{SanitizerConfig/replaceWithChildrenElements}}"] do:
  1. Call [=replace an element with its children=] with |element| and |sanitizer|.
1. [=list/iterate|For each=] |attribute| of |configuration|["{{SanitizerConfig/attributes}}"] do:
  1. Call [=allow an attribute=] with |attribute| and |sanitizer|.
1. [=list/iterate|For each=] |attribute| of |configuration|["{{SanitizerConfig/removeAttributes}}"] do:
  1. Call [=Sanitizer/remove an attribute=] with |attribute| and |sanitizer|.
1. Call [=set comments=] with |configuration|["{{SanitizerConfig/comments}}"] and |sanitizer|.
1. Call [=set data attributes=] with |configuration|["{{SanitizerConfig/dataAttributes}}"] and |sanitizer|.
1. Return whether all of the following are true:
  - [=list/size=] of |configuration|["{{SanitizerConfig/elements}}"] equals
    [=list/size=] of [=this=]'s [=Sanitizer/configuration=]["{{SanitizerConfig/elements}}"].
  - [=list/size=] of |configuration|["{{SanitizerConfig/removeElements}}"] equals
    [=list/size=] of [=this=]'s [=Sanitizer/configuration=]["{{SanitizerConfig/removeElements}}"].
  - [=list/size=] of |configuration|["{{SanitizerConfig/replaceWithChildrenElements}}"] equals
    [=list/size=] of [=this=]'s [=Sanitizer/configuration=]["{{SanitizerConfig/replaceWithChildrenElements}}"].
  - [=list/size=] of |configuration|["{{SanitizerConfig/attributes}}"] equals
    [=list/size=] of [=this=]'s [=Sanitizer/configuration=]["{{SanitizerConfig/attributes}}"].
  - [=list/size=] of |configuration|["{{SanitizerConfig/removeAttributes}}"] equals
    [=list/size=] of [=this=]'s [=Sanitizer/configuration=]["{{SanitizerConfig/removeAttributes}}"].
  - Either |configuration|["{{SanitizerConfig/elements}}"] or
    |configuration|["{{SanitizerConfig/removeElements}}"] [=map/exist=],
    or neither, but not both.
  - Either |configuration|["{{SanitizerConfig/attributes}}"] or
    |configuration|["{{SanitizerConfig/removeAttributes}}"] [=map/exist=],
    or neither, but not both.
Note: Previous versions of this spec had elaborate definitions of how to
canonicalize a config. This has now effectively been moved into the method
definitions.
Note: This operation is defined in terms of the manipulation methods on the
{{Sanitizer}}. Those methods remove matching entries from other lists, and
the size equality checks in the last step catch the resulting mismatch.
For example:
`{ elements: ["div", "div"] }` would create a Sanitizer with one element in
the allow list. The final test would then return false, which would cause
the caller to throw an exception.
Issue: This is still missing error checks for the per-element attribute lists
and syntax errors.
</div>
<div algorithm>
In order to <dfn>canonicalize a sanitizer element with attributes</dfn> a {{SanitizerElementWithAttributes}} |element|, do this:
1. Let |result| be the result of [=canonicalize a sanitizer element=] with |element|.
1. If |element| is a [=dictionary=]:
  1. [=list/iterate|For each=] |attribute| in
    |element|["{{SanitizerElementNamespaceWithAttributes/attributes}}"]:
    1. [=SanitizerConfig/Add=] the result of [=canonicalize a sanitizer attribute=] with |attribute| to |result|["{{SanitizerElementNamespaceWithAttributes/attributes}}"].
  1. [=list/iterate|For each=] |attribute| in
    |element|["{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"]:
    1. [=SanitizerConfig/Add=] the result of [=canonicalize a sanitizer attribute=] with |attribute| to |result|["{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"].
1. Return |result|.
</div>
<div algorithm>
In order to <dfn>canonicalize a sanitizer element</dfn> a
{{SanitizerElement}} |element|,
return the result of [=canonicalize a sanitizer name=] with |element| and the [=HTML namespace=] as the default namespace.
</div>
<div algorithm>
In order to <dfn>canonicalize a sanitizer attribute</dfn> a
{{SanitizerAttribute}} |attribute|,
return the result of [=canonicalize a sanitizer name=] with |attribute| and `null` as the default namespace.
</div>
<div algorithm>
In order to <dfn>canonicalize a sanitizer name</dfn> |name|, with a default
namespace |defaultNamespace|, run the following steps:
1. [=Assert=]: |name| is either a {{DOMString}} or a [=dictionary=].
1. If |name| is a {{DOMString}}, then return «[ "`name`" → |name|, "`namespace`" → |defaultNamespace|]».
1. [=Assert=]: |name| is a [=dictionary=] and |name|["name"] [=map/exists=].
1. Return «[ <br>
"`name`" → |name|["name"], <br>
"`namespace`" → ( |name|["namespace"] if it [=map/exists=], otherwise |defaultNamespace| ) <br>
]».
</div>
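As a non-normative illustration, the three canonicalization algorithms above
can be sketched in JavaScript. The function names are hypothetical, and plain
objects stand in for [=ordered maps=]; a member is treated as existing when
it is not `undefined`.

```javascript
// Non-normative sketch; function names are hypothetical.
const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

// Mirrors "canonicalize a sanitizer name": strings get the default
// namespace, dictionaries keep their own namespace if one is given.
function canonicalizeName(name, defaultNamespace) {
  if (typeof name === "string") {
    return { name, namespace: defaultNamespace };
  }
  if (name === null || typeof name !== "object" || name.name === undefined) {
    throw new TypeError("expected a string or a dictionary with a name");
  }
  return {
    name: name.name,
    namespace: name.namespace !== undefined ? name.namespace : defaultNamespace,
  };
}

// Elements default to the HTML namespace, attributes to null.
const canonicalizeElement = (element) => canonicalizeName(element, HTML_NAMESPACE);
const canonicalizeAttribute = (attribute) => canonicalizeName(attribute, null);
```

For instance, `canonicalizeElement("div")` yields a name of `"div"` in the
HTML namespace, while `canonicalizeAttribute("href")` yields a `null`
namespace.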
## Supporting Algorithms ## {#alg-support}
For the [=canonicalize a sanitizer name|canonicalized=]
{{SanitizerElementNamespace|element}} and {{SanitizerAttributeNamespace|attribute name}} lists
used in this spec, list membership is based on matching both "`name`" and "`namespace`"
entries:
<div algorithm>
A Sanitizer name |list| <dfn for="SanitizerConfig">contains</dfn> an |item|
if there exists an |entry| of |list| that is an [=ordered map=], and where
|item|["name"] [=equals=] |entry|["name"] and
|item|["namespace"] [=equals=] |entry|["namespace"].
</div>
<div algorithm>
To <dfn for="SanitizerConfig">remove</dfn> an |item| from a |list| that is an
[=ordered map=], [=list/remove=] all |entry| from |list|
where |item|["name"] [=equals=] |entry|["name"] and
|item|["namespace"] [=equals=] |entry|["namespace"].
</div>
<div algorithm>
To <dfn for="SanitizerConfig">add</dfn> a |name| to a |list|, where |name| is
[=canonicalize a sanitizer name|canonicalized=] and |list| is an [=ordered map=]:
1. If |list| [=SanitizerConfig/contains=] |name|, then return.
1. [=list/Append=] |name| to |list|.
</div>
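The three list operations above can be sketched as follows. This is a
non-normative illustration in which arrays of plain `{ name, namespace }`
objects stand in for the canonicalized lists; the function names are
hypothetical.

```javascript
// Non-normative sketch; function names are hypothetical.

// Two canonicalized names match if both name and namespace are equal.
function sameName(a, b) {
  return a.name === b.name && a.namespace === b.namespace;
}

// "contains": some entry matches the item.
function contains(list, item) {
  return list.some(
    (entry) => typeof entry === "object" && entry !== null && sameName(entry, item)
  );
}

// "remove": delete every matching entry in place.
function remove(list, item) {
  for (let i = list.length - 1; i >= 0; i--) {
    if (sameName(list[i], item)) list.splice(i, 1);
  }
}

// "add": append only if no matching entry is present yet.
function add(list, name) {
  if (contains(list, name)) return;
  list.push(name);
}
```

Because matching uses both entries, a `script` element in the SVG namespace
and one in the HTML namespace are distinct list members.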
<div algorithm>
Equality for [=ordered sets=] is equality of their members, but without
regard to order:
[=Ordered sets=] |A| and |B| are <dfn for=set>equal</dfn> if both |A| is a
[=superset=] of |B| and |B| is a [=superset=] of |A|.
</div>
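A non-normative sketch of this order-insensitive equality, using arrays as
stand-ins for [=ordered sets=]. The function names and the optional `matches`
parameter are hypothetical; by default members are compared with `Object.is`.

```javascript
// Non-normative sketch; function names are hypothetical.

// True if every member of b has a matching member in a.
function isSuperset(a, b, matches = Object.is) {
  return b.every((item) => a.some((candidate) => matches(candidate, item)));
}

// Sets are equal if each is a superset of the other; order is ignored.
function setsEqual(a, b, matches = Object.is) {
  return isSuperset(a, b, matches) && isSuperset(b, a, matches);
}
```

For lists of canonicalized names, a matcher that compares both "`name`" and
"`namespace`" would be passed as `matches`.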
## Defaults ## {#sanitization-defaults}
There are three builtins:
* The [=built-in safe default configuration=],
* the [=built-in safe baseline configuration=], and
* the [=built-in navigating URL attributes list=].
The <dfn>built-in safe default configuration</dfn> is as follows:
```
{
elements: [ ... ],
attributes: [ ... ],
}
```
The <dfn>built-in safe baseline configuration</dfn> is meant to block only
script-content, and nothing else. It is as follows:
```
{
removeElements: [
{ name: "script", namespace: "http://www.w3.org/1999/xhtml" },
{ name: "script", namespace: "http://www.w3.org/2000/svg" }
],
removeAttributes: [....],
}
```
<div>
The <dfn>built-in navigating URL attributes list</dfn>, for which "`javascript:`"
navigations are "unsafe", is as follows:
«[
<br>
[
{ "`name`" → "`a`", "`namespace`" → [=HTML namespace=] },
{ "`name`" → "`href`", "`namespace`" → `null` }
],
<br>
[
{ "`name`" → "`area`", "`namespace`" → [=HTML namespace=] },
{ "`name`" → "`href`", "`namespace`" → `null` }
],
<br>
[
{ "`name`" → "`button`", "`namespace`" → [=HTML namespace=] },
{ "`name`" → "`formaction`", "`namespace`" → `null` }
],
<br>
[
{ "`name`" → "`form`", "`namespace`" → [=HTML namespace=] },
{ "`name`" → "`action`", "`namespace`" → `null` }
],
<br>
[
{ "`name`" → "`iframe`", "`namespace`" → [=HTML namespace=] },
{ "`name`" → "`src`", "`namespace`" → `null` }
],
<br>
[
{ "`name`" → "`input`", "`namespace`" → [=HTML namespace=] },
{ "`name`" → "`formaction`", "`namespace`" → `null` }
],
<br>
]»
</div>
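As a non-normative illustration, a lookup against this list of element and
attribute pairs could be sketched as follows. The constant and function names
are hypothetical; the sketch only shows membership testing, not how the list
is applied during sanitization.

```javascript
// Non-normative sketch; names are hypothetical.
const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

// Each entry pairs an element name with an attribute name, as in the list above.
const NAVIGATING_URL_ATTRIBUTES = [
  ["a", "href"],
  ["area", "href"],
  ["button", "formaction"],
  ["form", "action"],
  ["iframe", "src"],
  ["input", "formaction"],
].map(([element, attribute]) => [
  { name: element, namespace: HTML_NAMESPACE },
  { name: attribute, namespace: null },
]);

function sameName(a, b) {
  return a.name === b.name && a.namespace === b.namespace;
}

// True if the canonicalized (element, attribute) pair appears in the list.
function isNavigatingUrlAttribute(element, attribute) {
  return NAVIGATING_URL_ATTRIBUTES.some(
    ([e, a]) => sameName(e, element) && sameName(a, attribute)
  );
}
```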
# Security Considerations # {#security-considerations}
The Sanitizer API is intended to prevent DOM-based Cross-Site Scripting
by traversing supplied HTML content and removing elements and attributes
according to a configuration. The specified API must not support
the construction of a Sanitizer object that leaves script-capable markup in
place; being able to construct one would be a bug in the threat model.
That being said, there are security issues that correct usage of the
Sanitizer API cannot protect against. These scenarios are
laid out in the following sections.
## Server-Side Reflected and Stored XSS ## {#server-side-xss}
<em>This section is not normative.</em>
The Sanitizer API operates solely in the DOM and adds a capability to traverse
and filter an existing DocumentFragment. The Sanitizer does not address
server-side reflected or stored XSS.
## DOM clobbering ## {#dom-clobbering}
<em>This section is not normative.</em>
DOM clobbering describes an attack in which malicious HTML confuses an
application by naming elements through `id` or `name` attributes such that
properties like `children` of an HTML element in the DOM are overshadowed by
the malicious content.
The Sanitizer API does not protect against DOM clobbering attacks in its
default state, but can be configured to remove `id` and `name` attributes.
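As a non-normative sketch, a configuration that removes both attributes could
look like the following. It assumes that `setHTML()` accepts a configuration
dictionary through a `sanitizer` option; the names `noNamingConfig` and
`setHTMLWithoutNaming` are hypothetical.

```javascript
// Non-normative sketch; names are hypothetical.

// Configuration dictionary that drops the attributes used for DOM clobbering.
const noNamingConfig = { removeAttributes: ["id", "name"] };

// Assumption: the target element implements setHTML() with a `sanitizer` option.
function setHTMLWithoutNaming(target, html) {
  if (typeof target.setHTML !== "function") {
    throw new TypeError("setHTML() is not available on this element");
  }
  target.setHTML(html, { sanitizer: noNamingConfig });
}
```

Removing these attributes can also break legitimate uses such as fragment
links and form submission names, so authors would need to weigh that against
the clobbering risk.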
## XSS with Script gadgets ## {#script-gadgets}
<em>This section is not normative.</em>
Script gadgets are a technique in which an attacker uses existing application
code from popular JavaScript libraries to cause their own code to execute.
This is often done by injecting innocent-looking code or seemingly inert
DOM nodes that are only parsed and interpreted by a framework which then
performs the execution of JavaScript based on that input.
The Sanitizer API cannot prevent these attacks, but it requires page authors to
explicitly allow unknown elements in general. Authors must additionally
explicitly configure unknown attributes and elements, as well as markup that
is known to be widely used for templating and framework-specific code,
such as `data-` and `slot` attributes and the `<slot>` and `<template>` elements.
We believe that these restrictions are not exhaustive and encourage page
authors to examine their third-party libraries for this behavior.
## Mutated XSS ## {#mutated-xss}
<em>This section is not normative.</em>
Mutated XSS or mXSS describes an attack based on parser context mismatches
when parsing an HTML snippet without the correct context. In particular,
when a parsed HTML fragment has been serialized to a string, the string is
not guaranteed to be parsed and interpreted exactly the same when inserted
into a different parent element. One way to carry out such an attack
is to rely on the change of parsing behavior for foreign content or
mis-nested tags.
The Sanitizer API offers only functions that turn a string into a node tree.
The context is supplied implicitly by all sanitizer functions:
`Element.setHTML()` uses the current element; `Document.parseHTML()` creates a
new document. Therefore, the Sanitizer API is not directly affected by mutated XSS.
If a developer were to retrieve a sanitized node tree as a string, e.g. via
`.innerHTML`, and then parse it again, mutated XSS may occur.
We discourage this practice. If processing or passing of HTML as a
string should be necessary after all, then any string should be considered
untrusted and should be sanitized (again) when inserting it into the DOM. In
other words, a sanitized and then serialized HTML tree can no
longer be considered as sanitized.
A more complete treatment of mXSS can be found in [[MXSS]].
# Acknowledgements # {#ack}
Cure53's [[DOMPURIFY]] is a clear inspiration for the API this document
describes, as is Internet Explorer's {{window.toStaticHTML()}}.