document, test, and name function sanitizeBalance

Greg Weber 2010-11-01 16:21:31 -07:00
parent 018ee4889a
commit d501579948
3 changed files with 31 additions and 19 deletions

@@ -1,17 +1,20 @@
 Summary
 =======
-provides a function Text.HTML.SanitizeXSS.sanitizeXSS that filters html to prevent XSS attacks.
+provides 2 functions in the module Text.HTML.SanitizeXSS
+* sanitizeXSS - filters html to prevent XSS attacks.
+* sanitizeBalance - same as sanitizeXSS but makes sure there are no lone closing tags - this could prevent a user's html from messing up your page
 Use Case
 ========
-All html from an untrusted source (user of a web application) should be ran through this function.
-If you trust the html (you wrote it), you do not need to use this.
+HTML from an untrusted source (user of a web application) should be run through this library.
+If you trust the HTML (you wrote it), you do not need to use this.
+If you don't trust the html you probably also do not trust that the tags are balanced, so you should use sanitizeWithBalancing.
 Detail
 ========
 This is not escaping! Escaping html does prevent XSS attacks. Strings should be html escaped to show up properly and to prevent XSS attacks. However, escaping will ruin the display of the html.
-This function removes any tags or attributes that are not in its white-list of. This may sound picky, but most html should make it through unchanged, making the process unnoticeable to the user but giving us safe html.
+This function removes any tags or attributes that are not in its white-list. This may sound picky, but most html should make it through unchanged, making the process unnoticeable to the user but giving us safe html.
 Integration
 ===========
@@ -19,12 +22,17 @@ It is recommended to integrate this so that it is automatically used whenever an
 Credit
 ===========
-This was taken from John MacFarlane's Pandoc (with permission) modified to be faster and parsing redone with TagSoup. html5lib is also being used as a reference (BSD style license).
+Original code was taken from John MacFarlane's Pandoc (with permission), but modified to be faster and with parsing redone using TagSoup. html5lib is now being used as a reference (BSD style license).
+Michael Snoyman added the balanced tags functionality.
 Limitations
 ===========
+Balancing - sanitizeBalance
+---------------------------------
+The goal of this function is to prevent your html from breaking when unknown html is placed inside it. I would expect it to work very well in practice and don't see a downside to using it unless you have an alternative approach. However, this function does not at all guarantee valid html. In fact, it is likely that the result of balancing will still be invalid HTML. This means there is still no guarantee what a browser will do with the html, so there is no guarantee that it will prevent your html from breaking. Other possible approaches would be to run the html through a library like libxml2 which understands html or to first render the html in a hidden iframe or maybe a hidden div at the bottom of the page so that it is isolated, and then use javascript to insert it into the page where you want it.
 TagSoup Parser
 --------------
 TagSoup is used to parse the HTML, and it does a good job. However TagSoup does not maintain all white space. TagSoup does not distinguish between the following cases:
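The README's Detail section contrasts escaping with white-list filtering. Escaping is safe but shows the markup literally. Filtering drops every tag and attribute that is not explicitly allowed, so ordinary formatting survives. Below is a minimal, self-contained sketch of the filtering idea. It is not the library's `safeTags`. The `Tag` type, the tiny whitelists, the naive URI-scheme check, and the names `whitelistTags`, `safeUri`, `escape` and `render` are all hypothetical and exist only for illustration.

```haskell
import Data.Char (toLower)

-- Hypothetical, simplified tag type standing in for a real HTML parser's
-- output (TagSoup's Tag type is richer than this).
data Tag
  = Open String [(String, String)]  -- element name and attributes
  | Close String                    -- closing tag
  | Text String                     -- text between tags
  deriving (Eq, Show)

-- Tiny illustrative whitelists (assumed); a real sanitizer allows far more.
allowedTags, allowedAttrs, uriAttrs, safeSchemes :: [String]
allowedTags  = ["a", "b", "br", "div", "i", "img", "p"]
allowedAttrs = ["href", "src", "alt", "title"]
uriAttrs     = ["href", "src"]
safeSchemes  = ["http", "https", "mailto"]

lower :: String -> String
lower = map toLower

-- Accept a URI with no scheme (relative link) or a whitelisted scheme.
-- Naive on purpose: anything before the first ':' is treated as the scheme.
safeUri :: String -> Bool
safeUri v = case break (== ':') v of
  (scheme, ':' : _) -> lower scheme `elem` safeSchemes
  _                 -> True

-- Keep an attribute only if its name is whitelisted and, for URI-valued
-- attributes, its value passes the scheme check.
safeAttr :: (String, String) -> Bool
safeAttr (k, v) =
  lower k `elem` allowedAttrs && (lower k `notElem` uriAttrs || safeUri v)

-- Drop non-whitelisted tags and strip non-whitelisted attributes from the
-- tags that remain. Text is always kept; it is escaped when rendered.
whitelistTags :: [Tag] -> [Tag]
whitelistTags = concatMap keep
  where
    keep (Open n attrs)
      | lower n `elem` allowedTags = [Open n (filter safeAttr attrs)]
    keep (Close n)
      | lower n `elem` allowedTags = [Close n]
    keep t@(Text _) = [t]
    keep _ = []

-- Escape the characters that are significant in HTML text and attributes.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Serialise tags back to an HTML string.
render :: [Tag] -> String
render = concatMap r
  where
    r (Open n attrs) = "<" ++ n ++ concatMap attr attrs ++ ">"
    r (Close n)      = "</" ++ n ++ ">"
    r (Text s)       = escape s
    attr (k, v)      = " " ++ k ++ "=\"" ++ escape v ++ "\""
```

Consider `render (whitelistTags [Open "a" [("href", "unsafe://hack.com")], Text "anchor", Close "a"])`. It gives `<a>anchor</a>`, mirroring the `<a>anchor</a>` expected in test.hs. By contrast, `escape "<b>bold</b>"` gives `&lt;b&gt;bold&lt;/b&gt;`: safe, but the reader sees the tags instead of bold text.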

@@ -1,6 +1,6 @@
 module Text.HTML.SanitizeXSS
     ( sanitizeXSS
-    , sanitizeBalanceXSS
+    , sanitizeBalance
     ) where
 import Text.HTML.TagSoup
@@ -14,8 +14,15 @@ import Codec.Binary.UTF8.String ( encodeString )
 import qualified Data.Map as Map
-sanitizeBalanceXSS :: String -> String
-sanitizeBalanceXSS = renderTagsOptions renderOptions {
+-- | sanitize the html to prevent XSS attacks. See README.md <http://github.com/gregwebs/haskell-xss-sanitize> for more details
+sanitizeXSS :: String -> String
+sanitizeXSS = renderTagsOptions renderOptions {
+    optMinimize = \x -> x `elem` ["br","img"] -- <img><img> converts to <img />, <a/> converts to <a></a>
+  } . safeTags . parseTags
+-- same as sanitizeXSS but makes sure there are no lone closing tags. See README.md <http://github.com/gregwebs/haskell-xss-sanitize> for more details
+sanitizeBalance :: String -> String
+sanitizeBalance = renderTagsOptions renderOptions {
     optMinimize = \x -> x `elem` ["br","img"] -- <img><img> converts to <img />, <a/> converts to <a></a>
   } . balance Map.empty . safeTags . parseTags
@@ -43,12 +50,6 @@ balance m (TagOpen name as : tags) =
         Just i -> Map.insert name (i + 1) m
 balance m (t:ts) = t : balance m ts
--- | santize the html to prevent XSS attacks. See README.md <http://github.com/gregwebs/haskell-xss-sanitize> for more details
-sanitizeXSS :: String -> String
-sanitizeXSS = renderTagsOptions renderOptions {
-    optMinimize = \x -> x `elem` ["br","img"] -- <img><img> converts to <img />, <a/> converts to <a></a>
-  } . safeTags . parseTags
 safeTags :: [Tag String] -> [Tag String]
 safeTags [] = []
 safeTags (t@(TagClose name):tags)
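This diff shows only part of the `balance` pass that `sanitizeBalance` runs after `safeTags`. The visible part counts open tags per name in a `Data.Map`. The sketch below illustrates the same goal under stated assumptions, and it is not the library's code:

* It uses a hypothetical, attribute-free `Tag` type.
* It treats an assumed void-element list (`br`, `img`) as never needing a closing tag.
* It keeps a stack of open element names instead of per-name counts.

Its handling of lone closing tags is chosen to match the expected output in test.hs: a lone `</div>` becomes `<div></div>`, and an unclosed `<b>` is closed at the end.

```haskell
import Data.List (delete)

-- Hypothetical, simplified tag type (attributes omitted); not TagSoup's Tag.
data Tag = Open String | Close String | Text String
  deriving (Eq, Show)

-- Assumed void elements that never take a closing tag; this mirrors the
-- "br"/"img" list given to optMinimize in the diff.
voidElements :: [String]
voidElements = ["br", "img"]

-- Balance a tag stream. `open` is a stack of element names that are
-- currently open, most recently opened first.
--   * a closing tag with no matching open element gets an opening tag
--     inserted in front of it, so a lone </div> becomes <div></div>;
--   * closing tags for void elements are dropped;
--   * anything still open at the end is closed, innermost first.
balanceTags :: [Tag] -> [Tag]
balanceTags = go []
  where
    go open [] = map Close open
    go open (t@(Open name) : ts)
      | name `elem` voidElements = t : go open ts
      | otherwise                = t : go (name : open) ts
    go open (t@(Close name) : ts)
      | name `elem` voidElements = go open ts
      | name `elem` open         = t : go (delete name open) ts  -- removes the most recent match
      | otherwise                = Open name : t : go open ts
    go open (t : ts) = t : go open ts

-- Serialise tags back to a string (no escaping; illustration only).
render :: [Tag] -> String
render = concatMap r
  where
    r (Open n)  = "<" ++ n ++ ">"
    r (Close n) = "</" ++ n ++ ">"
    r (Text s)  = s
```

As the README's Limitations section warns, balancing is not validation. `render (balanceTags [Open "b", Open "i", Text "x", Close "b"])` yields `<b><i>x</b></i>`, which has no lone closing tags but is still improperly nested.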

test.hs

@@ -1,8 +1,11 @@
 import Text.HTML.SanitizeXSS
-main = do
-  let test = " <a href='http://safe.com'>safe</a><a href='unsafe://hack.com'>anchor</a> <img src='evil://evil.com' /> <unsafe></foo> <bar /> <br></br> <b>Unbalanced</div><img src='http://safe.com'>"
-  let actual = (sanitizeBalanceXSS test)
-  let expected = " <a href=\"http://safe.com\">safe</a><a>anchor</a> <img /> <br /> <b>Unbalanced<div></div><img src=\"http://safe.com\"></b>"
-  putStrLn $ "testing: " ++ test
+testHTML = " <a href='http://safe.com'>safe</a><a href='unsafe://hack.com'>anchor</a> <img src='evil://evil.com' /> <unsafe></foo> <bar /> <br></br> <b>Unbalanced</div><img src='http://safe.com'>"
+test actual expected = do
+  putStrLn $ "testing: " ++ testHTML
   putStrLn $ if actual == expected then "pass" else "failure\n" ++ "\nexpected:" ++ (show expected) ++ "\nactual: " ++ (show actual)
+main = do
+  test (sanitizeBalance testHTML) " <a href=\"http://safe.com\">safe</a><a>anchor</a> <img /> <br /> <b>Unbalanced<div></div><img src=\"http://safe.com\"></b>"
+  test (sanitizeXSS testHTML) " <a href=\"http://safe.com\">safe</a><a>anchor</a> <img /> <br /> <b>Unbalanced</div><img src=\"http://safe.com\">"