From 6a83d50d82f8c396ce09e9a8fa56ef0e2f270d51 Mon Sep 17 00:00:00 2001
From: Greg Weber
Date: Sun, 30 Mar 2014 08:22:48 -0700
Subject: [PATCH] white list wiki source seems defunct

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 11f6953..b85eab7 100644
--- a/README.md
+++ b/README.md
@@ -55,7 +55,7 @@ In the third case, img and br tags will be output as a single self-closing tags.
 
 Ultimately this is where your security comes from. I would expect that a faulty white list would act as a strong deterrent, but this library strives for correctness.
 
-The [source code of html5lib](http://code.google.com/p/html5lib/source/browse/python/html5lib/sanitizer.py) is the source of the white list and my implementation reference. They reference [a wiki page containing a white list](http://wiki.whatwg.org/wiki/Sanitization_rules), and hopefully they are careful of when they import into their code. Working with the maintainers of html5lib may make sense, but it doesn't make sense to merge the projects because sanitization is just one aspect of html5lib (They have a parser also).
+The [source code of html5lib](http://code.google.com/p/html5lib/source/browse/python/html5lib/sanitizer.py) is the source of the white list and my implementation reference. If you feel a tag is missing from the white list, check to see if it has been added there.
 
 If anyone knows of better sources or thinks a particular tag/attribute/value may be vulnerable, please let me know.
 [HTML Purifier](http://htmlpurifier.org/live/smoketests/printDefinition.php) does have a more permissive and configurable (yet safe) white list if you are looking to add anything.
@@ -64,7 +64,7 @@ If anyone knows of better sources or thinks a particular tag/attribute/value may
 
 Original code was taken from John MacFarlane's Pandoc (with permission), but modified by Greg Weber to be faster and with parsing redone using TagSoup, and to use html5lib's white list.
 Michael Snoyman added the balanced tags functionality and released css-text specifically to help with css parsing.
-html5lib's sanitizer.py is used as a reference implementation, and most of the code should look the same. For css parsing, html5lib's regexes were translated to a parser.
+html5lib's sanitizer.py is used as a reference implementation, and most of the code should look the same. The css parsing is different: as mentioned we use a css parser, not regexes like html5lib.
 
 ### style attribute
 