xss:preventing_xss
Differences
This shows you the differences between two versions of the page.
xss:preventing_xss [2016/10/10 10:55] – [Status of CSP] peter | xss:preventing_xss [2020/04/15 08:45] (current) – removed peter | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== XSS - Preventing XSS ====== | ||
- | ===== Methods of preventing XSS ===== | ||
- | |||
- | Recall that an XSS attack is a type of code injection: user input is mistakenly interpreted as malicious program code. In order to prevent this type of code injection, secure input handling is needed. For a web developer, there are two fundamentally different ways of performing secure input handling: | ||
- | |||
- | * **Encoding**, | ||
- | |||
- | * **Validation**, | ||
- | |||
- | While these are fundamentally different methods of preventing XSS, they share several common features that are important to understand when using either of them: | ||
- | |||
- | * **Context**// | ||
- | |||
- | * **Inbound-outbound**// | ||
- | |||
- | * **Client-server**// | ||
- | |||
- | Before explaining in detail how encoding and validation work, we will describe each of these points. | ||
- | |||
- | |||
- | ==== Input handling contexts ==== | ||
- | |||
- | There are many contexts in a web page where user input might be inserted. For each of these, specific rules must be followed so that the user input cannot break out of its context and be interpreted as malicious code. Below are the most common contexts: | ||
- | |||
- | ^Context^Example code^ | ||
- | |HTML element content|< | ||
- | |HTML attribute value|< | ||
- | |URL query value|http:// | ||
- | |CSS value|color: | ||
- | |JavaScript value|var name = "< | ||
- | |||
- | |||
- | === Why context matters === | ||
- | |||
- | In all of the contexts described, an XSS vulnerability would arise if user input were inserted before first being encoded or validated. An attacker would then be able to inject malicious code by simply inserting the closing delimiter for that context and following it with the malicious code. | ||
- | |||
- | For example, if at some point a website inserts user input directly into an HTML attribute, an attacker would be able to inject a malicious script by beginning the input with a quotation mark, as shown below: | ||
- | |||
- | |Application code|< | ||
- | |Malicious string|< | ||
- | |Resulting code|< | ||
- | |||
- | This could be prevented by simply removing all quotation marks in the user input, and everything would be fine—but only in this context. If the same input were inserted into another context, the closing delimiter would be different and injection would become possible. For this reason, secure input handling always needs to be tailored to the context where the user input will be inserted. | ||
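The attribute example can be reproduced with plain string concatenation. The following is a minimal sketch: the `<input>` and `<p>` templates, the function names and the payload are assumptions chosen for illustration, not code taken from a real site. It shows that stripping quotation marks protects the attribute context but not the element content context.

```javascript
// Hypothetical template that inserts user input into an attribute value
// without any encoding or validation.
function renderInput(userInput) {
  return '<input value="' + userInput + '">';
}

// Hypothetical template for a different context: HTML element content.
function renderComment(userInput) {
  return '<p>' + userInput + '</p>';
}

// Naive, context-specific fix: strip every double quote from the input.
function stripQuotes(userInput) {
  return userInput.replace(/"/g, '');
}

const attack = '"><script>alert(1)</script><input value="';

// Unhandled input closes the attribute and the tag, then opens a script element.
const broken = renderInput(attack);

// With quotes stripped, the payload stays inside the attribute value...
const attributeOk = renderInput(stripQuotes(attack));

// ...but the same "cleaned" input still injects a script into element content.
const stillBroken = renderComment(stripQuotes(attack));
```

The only difference between `attributeOk` and `stillBroken` is the context, which is why a single filter cannot serve every insertion point.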
- | |||
- | |||
- | ==== Inbound/ | ||
- | |||
- | Instinctively, | ||
- | |||
- | The problem is that, as described previously, user input can be inserted into several contexts in a page. When user input arrives, there is no easy way of determining which context it will eventually be inserted into, and the same user input often needs to be inserted into different contexts. Relying on inbound input handling to prevent XSS is thus a very brittle solution that will be prone to errors. (The deprecated [[http:// | ||
- | |||
- | Instead, outbound input handling should be your primary line of defense against XSS, because it can take into account the specific context that user input will be inserted into. That being said, inbound validation can still be used to add a secondary layer of protection, as we will describe later. | ||
- | |||
- | |||
- | ==== Where to perform secure input handling ==== | ||
- | |||
- | In most modern web applications, | ||
- | |||
- | * In order to protect against traditional XSS, secure input handling must be performed in server-side code. This is done using any language supported by the server. | ||
- | |||
- | * In order to protect against DOM-based XSS where the server never receives the malicious string (such as the fragment identifier attack described earlier), secure input handling must be performed in client-side code. This is done using JavaScript. | ||
- | |||
- | Now that we have explained why context matters, why the distinction between inbound and outbound input handling is important, and why secure input handling needs to be performed in both client-side code and server-side code, we will go on to explain how the two types of secure input handling (encoding and validation) are actually performed. | ||
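For the client-side case, where the malicious string may arrive in the fragment identifier and never reach the server, the handling has to happen in JavaScript. The sketch below assumes a hypothetical helper named `showFragment` and a hypothetical `#greeting` element; it only illustrates the idea of writing the fragment into the page as text rather than as markup.

```javascript
// Reads the fragment identifier (the part of the URL after '#') and shows it
// as plain text. `node` is assumed to be a DOM element in the browser; any
// object with a writable textContent property works for experimentation.
function showFragment(node, hash) {
  // Drop the leading '#', if present.
  const raw = hash.startsWith('#') ? hash.slice(1) : hash;
  let text;
  try {
    text = decodeURIComponent(raw);
  } catch (e) {
    // Malformed percent-encoding: fall back to the undecoded string.
    text = raw;
  }
  // textContent treats the value as text, so markup in it is never parsed.
  // Assigning the same value to innerHTML would be the vulnerable variant.
  node.textContent = text;
}

// In a browser (element id is a placeholder):
//   showFragment(document.querySelector('#greeting'), window.location.hash);
```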
- | |||
- | |||
- | ===== Encoding ===== | ||
- | |||
- | Encoding is the act of escaping user input so that the browser interprets it only as data, not as code. The most recognizable type of encoding in web development is HTML escaping, which converts characters like < and > into &lt; and &gt;, respectively. | ||
- | |||
- | The following pseudocode is an example of how user input could be encoded using HTML escaping and then inserted into a page by a server-side script: | ||
- | |||
- | <code javascript> | ||
- | print "< | ||
- | print " | ||
- | print encodeHtml(userInput) | ||
- | print "</ | ||
- | </ | ||
- | |||
- | If the user input were the string < | ||
- | |||
- | <code html> | ||
- | < | ||
- | Latest comment: | ||
- | & | ||
- | </ | ||
- | </ | ||
- | |||
- | Because all characters with special meaning have been escaped, the browser will not parse any part of the user input as HTML. | ||
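The pseudocode leaves `encodeHtml` undefined. One possible implementation is sketched below; the exact set of escaped characters and the helper `renderLatestComment` are assumptions, and a server-side framework would normally supply its own escaping function.

```javascript
// Minimal HTML escaping for element content and quoted attribute values.
// Every special character is replaced in a single pass, so the ampersands
// of newly produced entities are not escaped a second time.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function encodeHtml(userInput) {
  return String(userInput).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Hypothetical counterpart of the pseudocode: builds the page as a string.
function renderLatestComment(userInput) {
  return '<html>\nLatest comment:\n' + encodeHtml(userInput) + '\n</html>';
}
```

Calling `renderLatestComment` with a script element as input yields a page in which the angle brackets appear only as entities, so the browser displays the input instead of running it.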
- | |||
- | ==== Encoding in client-side and server-side code ==== | ||
- | |||
- | When performing encoding in your client-side code, the language used is always JavaScript, which has built-in functions that encode data for different contexts. | ||
- | |||
- | When performing encoding in your server-side code, you rely on the functions available in your server-side language or framework. Due to the large number of languages and frameworks available, this tutorial will not cover the details of encoding in any specific server-side language or framework. However, familiarity with the encoding functions used on the client-side in JavaScript is useful when writing server-side code as well. | ||
- | |||
- | === Encoding on the client-side === | ||
- | |||
- | When encoding user input on the client-side using JavaScript, there are several built-in methods and properties that automatically encode all data in a context-aware manner: | ||
- | |||
- | ^Context^Method/ | ||
- | |HTML element content|node.textContent = <color orange> | ||
- | |HTML attribute value|element.setAttribute(attribute, | ||
- | or | ||
- | element[attribute] = <color orange> | ||
- | |URL query value|window.encodeURIComponent(< | ||
- | |CSS value|element.style.property = <color orange> | ||
- | |||
- | The last context mentioned above (JavaScript values) is not included in this list, because JavaScript provides no built-in way of encoding data to be included in JavaScript source code. | ||
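Of the built-in encoders, `encodeURIComponent` is the only one that can be tried outside a browser, since the others are properties and methods of DOM nodes. The sketch below uses a placeholder base URL and a placeholder parameter name `q`.

```javascript
// Builds a search URL with user input as a query value. The base URL and the
// parameter name `q` are placeholders.
function buildSearchUrl(userInput) {
  return 'http://www.example.com/search?q=' + encodeURIComponent(userInput);
}

// Reserved characters such as '&', '?', '=' and '#' are percent-encoded,
// so the input cannot add parameters or start a fragment.
const url = buildSearchUrl('cats & dogs?x=1#top');
```

Parsing `url` again with the standard `URL` class returns the original string as the value of `q`, which confirms that the input stayed within its context.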
- | |||
- | |||
- | ==== Limitations of encoding ==== | ||
- | |||
- | Even with encoding, it is still possible to input malicious strings into some contexts. A notable example of this is when user input is used to provide URLs, such as in the example below: | ||
- | |||
- | <code javascript> | ||
- | document.querySelector(' | ||
- | </ | ||
- | |||
- | Although assigning a value to the href property of an anchor element automatically encodes it so that it becomes nothing more than an attribute value, this in itself does not prevent the attacker from inserting a URL beginning with " | ||
- | |||
- | Encoding is also an inadequate solution when you actually want the user to define part of a page's code. An example is a user profile page where the user can define custom HTML. If this custom HTML were encoded, the profile page could consist only of plain text. | ||
- | |||
- | In situations like these, encoding has to be complemented with validation, which we will describe next. | ||
- | |||
- | ===== Validation ===== | ||
- | |||
- | Validation is the act of filtering user input so that all malicious parts of it are removed, without necessarily removing all code in it. One of the most recognizable types of validation in web development is allowing some HTML elements (such as <em> and < | ||
- | |||
- | There are two main characteristics of validation that differ between implementations: | ||
- | |||
- | * **Classification strategy**// | ||
- | |||
- | * **Validation outcome**// User input identified as malicious can either be rejected or sanitised.// | ||
- | |||
- | |||
- | ==== Classification strategy ==== | ||
- | |||
- | === Blacklisting === | ||
- | |||
- | Instinctively, | ||
- | |||
- | However, blacklisting has two major drawbacks: | ||
- | |||
- | * **Complexity**// | ||
- | |||
- | * **Staleness**// | ||
- | |||
- | Because of these drawbacks, blacklisting as a classification strategy is strongly discouraged. Whitelisting is usually a much safer approach, as we will describe next. | ||
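The complexity drawback is easy to demonstrate. The sketch below assumes a hypothetical blacklist whose only rule is "reject URLs that start with `javascript:`"; the function name and the sample strings are illustrative.

```javascript
// Naive blacklist: flag anything that starts with the literal "javascript:".
function isBlockedByBlacklist(url) {
  return url.startsWith('javascript:');
}

// Variants that URL parsers generally still resolve to the javascript:
// scheme, because scheme names are case-insensitive and parsers tend to
// strip leading whitespace and embedded tabs or newlines.
const variants = [
  'JavaScript:alert(1)',
  ' javascript:alert(1)',
  'java\tscript:alert(1)',
];

// Collect the variants that slip past the blacklist.
const missed = variants.filter((url) => !isBlockedByBlacklist(url));
```

All three variants end up in `missed`. Each could be patched individually, but the list of special cases keeps growing, which is the pattern the drawbacks describe.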
- | |||
- | === Whitelisting === | ||
- | |||
- | Whitelisting is essentially the opposite of blacklisting: | ||
- | |||
- | In contrast with the blacklisting example before, an example of whitelisting would be to allow users to submit custom URLs containing only the protocols http: and https:, nothing else. This approach would automatically mark a URL as invalid if it had the protocol javascript:, | ||
- | |||
- | Compared to blacklisting, | ||
- | |||
- | * **Simplicity**// | ||
- | |||
- | * **Longevity**// | ||
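The URL whitelist described above can be sketched with the standard `URL` class, which is available in browsers and in Node.js. The function name `isAllowedUrl` is an assumption, and treating relative URLs as invalid is a simplification made for this example.

```javascript
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:']);

// Returns true only for absolute URLs whose protocol is on the whitelist.
function isAllowedUrl(userInput) {
  let parsed;
  try {
    parsed = new URL(userInput);
  } catch (e) {
    return false; // not parseable as an absolute URL
  }
  // The parser normalises the protocol to lower case, so "HTTP:" and
  // "JavaScript:" are compared in their canonical form.
  return ALLOWED_PROTOCOLS.has(parsed.protocol);
}
```

Because the check names what is allowed, protocols such as `data:` or any scheme introduced in the future are rejected without the list ever being updated.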
- | |||
- | |||
- | ==== Validation outcome ==== | ||
- | |||
- | When input has been marked as invalid, one of two actions can be taken: | ||
- | |||
- | * **Rejection**// | ||
- | |||
- | * **Sanitisation**// | ||
- | |||
- | Of these two, rejection is the simpler approach to implement. | ||
- | |||
- | If you decide to implement sanitisation, | ||
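The two outcomes can be contrasted in a short sketch. Everything here is hypothetical: the link field, the function names, the regular expression used as a whitelist check, and the choice of `#` as a harmless replacement value.

```javascript
// Hypothetical whitelist check shared by both outcomes.
function isValidLink(url) {
  return /^https?:\/\//i.test(url);
}

// Rejection: refuse the input and report why.
function rejectInvalid(url) {
  if (!isValidLink(url)) {
    throw new Error('Only http: and https: links are accepted');
  }
  return url;
}

// Sanitisation: accept the submission but replace the invalid value with a
// harmless one. The whole value is replaced; removing only the substring
// "javascript:" would be a blacklist in disguise.
function sanitiseLink(url, fallback = '#') {
  return isValidLink(url) ? url : fallback;
}

// Why substring removal is not enough: the removal itself can assemble
// the string it was meant to eliminate.
const stripped = 'javajavascript:script:alert(1)'.replace(/javascript:/g, '');
```

After the last line, `stripped` again begins with `javascript:`. For anything more complex than a single URL field, a well-tested sanitisation library is a safer choice than hand-written rules.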
- | |||
- | |||
- | ==== Which prevention technique to use ==== | ||
- | |||
- | Encoding should be your first line of defense against XSS, because its very purpose is to neutralize data so that it cannot be interpreted as code. In some cases, encoding needs to be complemented with validation, as explained earlier. | ||
- | |||
- | As a second line of defense, you should use inbound validation to sanitise or reject data that is clearly invalid, such as links using the javascript: protocol. | ||
- | |||
- | If these two lines of defense are used consistently, | ||
- | |||
- | |||
- | ===== Content Security Policy (CSP) ===== | ||
- | |||
- | The disadvantage of protecting against XSS by using only secure input handling is that even a single lapse of security can compromise your website. | ||
- | |||
- | CSP is used to constrain the browser viewing your page so that it can only use resources downloaded from trusted sources. | ||
- | |||
- | CSP can be used to enforce the following rules: | ||
- | |||
- | * **No untrusted sources**// External resources can only be loaded from a set of clearly defined trusted sources.// | ||
- | |||
- | * **No inline resources**// | ||
- | |||
- | * **No eval**// The JavaScript eval function cannot be used.// | ||
- | |||
- | ==== CSP in action ==== | ||
- | |||
- | In the following example, an attacker has succeeded in injecting malicious code into a page: | ||
- | |||
- | < | ||
- | Latest comment: | ||
- | <script src=" | ||
- | </ | ||
- | |||
- | With a properly defined CSP policy, the browser would not load and execute malicious‑script.js because http:// | ||
- | |||
- | Even if the attacker had injected the script code inline rather than linking to an external file, a properly defined CSP policy disallowing inline JavaScript would also have prevented the vulnerability from causing any harm. | ||
- | |||
- | |||
- | ==== How to enable CSP ==== | ||
- | |||
- | By default, browsers do not enforce CSP. To enable CSP on your website, pages must be served with an additional HTTP header: Content‑Security‑Policy. | ||
- | |||
- | Since the security policy is sent with every HTTP response, it is possible for a server to set its policy on a page-by-page basis. | ||
- | |||
- | The value of the Content‑Security‑Policy header is a string defining one or more security policies that will take effect on your website. | ||
- | |||
- | The example headers in this section use newlines and indentation for clarity; these should not be present in an actual header. | ||
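How the header is attached depends on the server. As a minimal sketch, assuming a response object that exposes `setHeader(name, value)` in the way the Node.js `http.ServerResponse` does, it could look as follows. The helper name `addCspHeader` and the policy strings are illustrative.

```javascript
// Single-line header value; a real header must not contain newlines.
const POLICY = "default-src 'self'";

// Adds the CSP header to a response. `res` is assumed to expose
// setHeader(name, value).
function addCspHeader(res, policy = POLICY) {
  res.setHeader('Content-Security-Policy', policy);
}

// Because the header travels with each response, a request handler could
// choose a different policy per page, for example:
//   addCspHeader(res, req.url === '/admin' ? "default-src 'none'" : POLICY);
```

Note that the header name and directive names are written with ordinary ASCII hyphens.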
- | |||
- | |||
- | ==== Syntax of CSP ==== | ||
- | |||
- | The syntax of a CSP header is as follows: | ||
- | |||
- | Content‑Security‑Policy: | ||
- | |||
- | < | ||
- | directive source‑expression source‑expression ...; | ||
- | directive ...; | ||
- | ... | ||
- | </ | ||
- | |||
- | This syntax is made up of two elements: | ||
- | |||
- | * **Directives** are strings specifying a type of resource, taken from a predefined list. | ||
- | |||
- | * **Source expressions** are patterns describing one or more servers that resources can be downloaded from. | ||
- | |||
- | For every directive, the given source expressions define which sources can be used to download resources of the respective type. | ||
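In the header value, the source expressions of one directive are separated by spaces and the directives by semicolons. A small serialiser makes the structure concrete; the function name `buildCsp` and the object layout are assumptions made for this sketch.

```javascript
// Serialises a policy object into a header value. Keys are directive names,
// values are arrays of source expressions.
function buildCsp(policy) {
  return Object.entries(policy)
    .map(([directive, sources]) => directive + ' ' + sources.join(' '))
    .join('; ');
}

const headerValue = buildCsp({
  'script-src': ["'self'", 'scripts.example.com'],
  'img-src': ['*'],
});
```

Keeping the policy as data and serialising it in one place avoids hand-editing a long header string, where a missing semicolon silently merges two directives.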
- | |||
- | |||
- | === Directives === | ||
- | |||
- | The directives defined for resource types in the original CSP 1.0 specification are as follows (later versions of CSP add more): | ||
- | |||
- | * connect‑src | ||
- | * font‑src | ||
- | * frame‑src | ||
- | * img‑src | ||
- | * media‑src | ||
- | * object‑src | ||
- | * script‑src | ||
- | * style‑src | ||
- | |||
- | In addition to these, the special directive default‑src can be used to provide a default value for all directives that have not been included in the header. | ||
- | |||
- | |||
- | === Source expressions === | ||
- | |||
- | The syntax of a source expression is as follows: | ||
- | |||
- | * protocol:// | ||
- | |||
- | < | ||
- | |||
- | In addition to the syntax above, a source expression can alternatively be one of four keywords with special meaning (quotes included): | ||
- | |||
- | * **' | ||
- | |||
- | * **' | ||
- | |||
- | * **' | ||
- | |||
- | * **' | ||
- | |||
- | Note that whenever a policy restricts scripts or styles (through script‑src, style‑src or default‑src), inline resources and eval are disallowed unless the policy explicitly allows them. | ||
- | |||
- | ==== An example policy ==== | ||
- | |||
- | Content‑Security‑Policy: | ||
- | |||
- | < | ||
- | script‑src ' | ||
- | media‑src ' | ||
- | img‑src *; | ||
- | default‑src ' | ||
- | </ | ||
- | |||
- | In this example policy, the page is subject to the following restrictions: | ||
- | |||
- | * Scripts can be downloaded only from the host serving the page and from scripts.example.com. | ||
- | |||
- | * Audio and video files cannot be downloaded from anywhere. | ||
- | |||
- | * Image files can be downloaded from any host. | ||
- | |||
- | * All other resources can be downloaded only from the host serving the page and from any subdomain of example.com. | ||
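The role of default‑src as a fallback can be sketched in a few lines. The policy object below is reconstructed from the four restrictions listed above, so the exact source expressions (in particular `*.example.com`) are assumptions, as is the helper name `effectiveSources`.

```javascript
// The example policy, reconstructed from the restrictions listed above.
const examplePolicy = {
  'script-src': ["'self'", 'scripts.example.com'],
  'media-src': ["'none'"],
  'img-src': ['*'],
  'default-src': ["'self'", '*.example.com'],
};

// Returns the source list that governs a directive: its own list if present,
// otherwise the default-src list, otherwise null (no restriction).
function effectiveSources(policy, directive) {
  return policy[directive] || policy['default-src'] || null;
}

const fontSources = effectiveSources(examplePolicy, 'font-src');
const mediaSources = effectiveSources(examplePolicy, 'media-src');
```

Here `fontSources` falls back to the default‑src list because the policy has no font‑src entry, while `mediaSources` uses the explicit `'none'`.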
- | ==== Status of CSP ==== | ||
- | |||
- | As of June 2013, Content Security Policy is [[a W3C candidate recommendation|https:// | ||
- | |||
- | ===== Summary ===== | ||
- | |||
- | ==== Summary: Overview of XSS ==== | ||
- | |||
- | * XSS is a code injection attack made possible through insecure handling of user input. | ||
- | |||
- | * A successful XSS attack allows an attacker to execute malicious JavaScript in a victim' | ||
- | |||
- | * A successful XSS attack compromises the security of both the website and its users. | ||
- | |||
- | ==== Summary: XSS Attacks ==== | ||
- | |||
- | * There are three major types of XSS attacks: | ||
- | |||
- | * Persistent XSS, where the malicious input originates from the website' | ||
- | |||
- | * Reflected XSS, where the malicious input originates from the victim' | ||
- | |||
- | * DOM-based XSS, where the vulnerability is in the client-side code rather than the server-side code. | ||
- | |||
- | * All of these attacks are performed in different ways but have the same effect if they succeed. | ||
- | |||
- | ==== Summary: Preventing XSS ==== | ||
- | |||
- | * The most important way of preventing XSS attacks is to perform secure input handling. | ||
- | |||
- | * Most of the time, encoding should be performed whenever user input is included in a page. | ||
- | |||
- | * In some cases, encoding has to be replaced by or complemented with validation. | ||
- | |||
- | * Secure input handling has to take into account which context of a page the user input is inserted into. | ||
- | |||
- | * To prevent all types of XSS attacks, secure input handling has to be performed in both client-side and server-side code. | ||
- | |||
- | * Content Security Policy provides an additional layer of defense for when secure input handling fails. | ||
- | |||
- | |||
- | ===== Appendix ===== | ||
- | |||
- | ==== Terminology ==== | ||
- | |||
- | It should be noted that there is overlap in the terminology currently used to describe XSS: a DOM-based XSS attack is also either persistent or reflected at the same time; it's not a separate type of attack. There is no widely accepted terminology that covers all types of XSS without overlap. Regardless of the terminology used to describe XSS, however, the most important thing to identify about any given attack is where the malicious input comes from and where the vulnerability is located. | ||
- | |||
- | |||
- | ===== Addendums ===== | ||
- | |||
- | [[How to implement whitelisting securely (July 9th, 2016)|http:// | ||
- | |||
- | |||
- | |||
- | |||
- | |||
xss/preventing_xss.1476096923.txt.gz · Last modified: 2020/07/15 09:30 (external edit)