Javascript Tools
The tools described here can be found in the tdi.tools.javascript module. The module deals a lot with safe javascript manipulation and also provides a minifier.
Escaping Javascript Variables
Escaping is important. And it’s easy to get wrong. Whenever you modify script blocks or attributes, the placed variables need to be properly escaped. Otherwise the result is open to XSS attacks. A large part of the javascript tools deals with this problem. When escaping stuff for javascript, there are various levels of context which need to be taken care of:
- Javascript quotes (" and ', double and single) and escapes (\), of course.
- Unicode, and various issues with encoded characters
- The surrounding HTML (for example, the sequence ]]> is harmless in javascript, but ends the CDATA block, which possibly contains the script itself)
- The replacement implementation. The naïve way to replace multiple placeholders is to call str.replace for each of them. However, if a malicious or unaware user puts a placeholder into the content, you’re getting a mess.
TDI provides two high-level functions, javascript.fill and javascript.fill_attr, which handle all these issues. Both are based on the more generic javascript.replace function, which itself uses more basic escape functions: javascript.escape_string and javascript.escape_inlined.
When placing complex structures (more complex than simple strings) into a script, JSON is the way to go. There are two classes available, which connect your structures or already available JSON input with the fill and replace functions: SimpleJSON and LiteralJSON.
replace
The replace function basically takes a script (as a string) and a mapping of placeholders and returns a new string with placeholders replaced. The “fill” functions described below modify a node in-place. Use those for node manipulations.
The function scans for and replaces the placeholders in a single pass. If a placeholder found that way is not in the mapping, it’s simply left as is. The default placeholder pattern looks like __name__.
You can pass a different pattern if you like. Note, however, that the default pattern makes placeholders look like javascript identifiers, which can be helpful if you run a minifier on the original script. On the other hand, the pattern does not allow for names containing double underscores.
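The difference between chained str.replace calls and a single-pass scan can be sketched in plain Python. This is an illustration only, not TDI's implementation: the regular expression is an assumed stand-in for the default __name__ pattern, the function names are made up for this sketch, and no escaping is applied to the values.

```python
import re

# Assumed stand-in for the default pattern: a name between double
# underscores, with no double underscore inside the name itself.
PLACEHOLDER = re.compile(r'__(?P<name>[^_]+(?:_[^_]+)*)__')


def naive_replace(script, values):
    # One str.replace call per placeholder: later calls also see the
    # text that earlier calls have just inserted.
    for name, value in values.items():
        script = script.replace('__%s__' % name, value)
    return script


def single_pass_replace(script, values):
    # Every placeholder of the ORIGINAL script is visited exactly once.
    # Inserted text is never rescanned; unknown names are left as is.
    def subst(match):
        name = match.group('name')
        if name in values:
            return values[name]
        return match.group(0)
    return PLACEHOLDER.sub(subst, script)


script = "var a = '__first__'; var b = '__second__'; var c = '__other__';"
# The value for "first" happens to look like another placeholder.
values = {'first': '__second__', 'second': 'secret'}

naive = naive_replace(script, values)
safe = single_pass_replace(script, values)
print(naive)
print(safe)
```

In the naive variant the value supplied for one placeholder is itself expanded by the next call, so content leaks between variables. The single-pass variant keeps the inserted text untouched.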
The placeholder values are passed to escape_string before being used for replacement, except when those objects provide an as_json([inlined]) method (a feature that can be turned off using the as_json parameter of the replace and fill functions). If the as_json() method exists and is allowed to be called, it’s used instead of escape_string. Afterwards the result is possibly transcoded to the document’s character set. If the inlined parameter is true, the intermediate result is piped through escape_inlined before eventually being used as replacement value.
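The decision described above can be pictured roughly as follows. This is a loose reading of the paragraph, not TDI's code: sketch_value, stub_escape, stub_escape_inlined and the Numbers class are hypothetical names, the two stubs cover only a fraction of what the real escape functions do, and the transcoding step is left out.

```python
import json


class Numbers(object):
    """Hypothetical value type that knows its own JSON form."""

    def __init__(self, items):
        self.items = items

    def as_json(self, inlined=False):
        return json.dumps(self.items, separators=(',', ':'))


def stub_escape(value):
    # Stand-in for string escaping: backslash first, then the quotes.
    text = str(value)
    for char in ('\\', '"', "'"):
        text = text.replace(char, '\\' + char)
    return text


def stub_escape_inlined(text):
    # Stand-in for inline escaping: only the end tag opener is handled.
    return text.replace('</', '<\\/')


def sketch_value(value, as_json=True, inlined=False):
    # Prefer the object's own as_json() if it exists and is allowed.
    method = getattr(value, 'as_json', None) if as_json else None
    if method is not None:
        result = method(False)
    else:
        result = stub_escape(value)
    # (Transcoding to the document's character set is omitted here.)
    if inlined:
        result = stub_escape_inlined(result)
    return result


plain = sketch_value('my "string"')
structured = sketch_value(Numbers([1, 2]))
inline = sketch_value('</script>', inlined=True)
print(plain, structured, inline)
```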
TDI ships with two container classes that already provide appropriate as_json() methods: LiteralJSON and SimpleJSON. The former is a simple (more or less) pass-through container for a JSON string you already have available. The latter takes a python object which is passed through the json module before being emitted by as_json().
Warning
When writing your own as_json() methods, make sure the result is at least syntactically valid javascript code.
Here’s a simple example:
from tdi.tools import javascript
script = u'''
var a = __var__;
var b = '__str__';
'''.strip()
print javascript.replace(script, dict(
var=javascript.SimpleJSON([1, 2, 3, 4]),
str=u'my "string"'
))
var a = [1,2,3,4];
var b = 'my \"string\"';
fill, fill_attr
In contrast to the replace function, fill and fill_attr modify a node in-place. Both actually call replace, but use different options. The function signatures are also simpler than that of replace, because some options are implicit or can be determined directly from the node. Here is how they are used:
from tdi import html
from tdi.tools import javascript
tpl = html.from_string('''
<button tdi="button" onclick="alert('__alert__')">
Click me!
</button>
<script tdi="script">
var a = __var__;
var b = '__str__';
</script>
'''.lstrip())
class Model(object):
    def render_button(self, node):
        # Attribute context: fill the onclick handler
        javascript.fill_attr(node, 'onclick', dict(
            alert=u'"Hey André! ---]]>"'
        ))

    def render_script(self, node):
        # Block context: fill the script element's content
        javascript.fill(node, dict(
            var=javascript.SimpleJSON([1, 2, 3, 4]),
            str=u'"Hey André! ---]]>"',
        ))
tpl.render(Model())
And here’s the result:
<button onclick="alert('\&quot;Hey Andr\xe9! ---]]&gt;\&quot;')">
Click me!
</button>
<script>
var a = [1,2,3,4];
var b = '\"Hey Andr\xe9! -\-\-]\]>\"';
</script>
Note how the data is escaped for different levels of presentation and how the escaping varies depending on the context:
- Quotes are escaped for javascript using a backslash.
- The same quotes are escaped for HTML using &quot;, but only in attribute context. The same goes for the > character.
- Potentially dangerous sequences like multiple dashes or ]]> are hidden from the containing HTML by applying harmless backslashes. This is not needed for attributes and is applied to blocks only.
- The é of André is escaped to \xe9, which is an encoding-independent representation of the character (JSON content is encoded to the document encoding though; only characters not fitting into this encoding are transformed to \uxxxx escapes).
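The attribute case can be read as two layers: the value is escaped for javascript first, and the result is then escaped for the HTML attribute. A minimal sketch of the second layer follows, using the standard library's html.escape as a stand-in. TDI's own attribute encoding may differ in detail, and the input string is simply the javascript-escaped value from the example above.

```python
import html

# The value after javascript escaping (quotes backslashed, é as \xe9).
js_escaped = '\\"Hey Andr\\xe9! ---]]>\\"'

# Second layer: make the text safe inside a double-quoted HTML attribute.
attr_value = html.escape(js_escaped, quote=True)

print('<button onclick="alert(\'%s\')">' % attr_value)
```

A browser undoes the layers in reverse order: the HTML parser decodes the entities first, and the javascript engine then sees the backslash escapes.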
escape_string, escape_inlined
javascript.escape_string and javascript.escape_inlined are the building blocks of the fill and replace functions described above. For regular placeholder handling, use fill and replace rather than calling the escape functions directly.
escape_string takes a string (or calls str() on the passed object) and:
- escapes \, " and ' (by prepending them with \)
- encodes non-ASCII and non-printable characters to escape sequences understandable by javascript
- passes the result (by default) through escape_inlined
Here’s a little example:
from tdi.tools import javascript
print javascript.escape_string(u"\n - é - € - \U0001d51e")
\n - \xe9 - \u20ac - \ud835\udd1e
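The character-level rules can be approximated in a few lines of standalone Python. This is a sketch under assumptions, not the library's code: sketch_escape_string is a made-up name, the short forms for \r and \t are a guess (the example above only shows \n), and the final escape_inlined pass is left out.

```python
# Short escapes for a few control characters (assumed for this sketch).
_SHORT = {'\n': '\\n', '\r': '\\r', '\t': '\\t'}


def sketch_escape_string(text):
    out = []
    for char in str(text):
        code = ord(char)
        if char in ('\\', '"', "'"):
            # Backslash and both quote types get a leading backslash.
            out.append('\\' + char)
        elif char in _SHORT:
            out.append(_SHORT[char])
        elif 0x20 <= code < 0x7f:
            # Printable ASCII passes through unchanged.
            out.append(char)
        elif code <= 0xff:
            out.append('\\x%02x' % code)
        elif code <= 0xffff:
            out.append('\\u%04x' % code)
        else:
            # Outside the BMP: emit a UTF-16 surrogate pair.
            code -= 0x10000
            out.append('\\u%04x\\u%04x' % (
                0xd800 + (code >> 10), 0xdc00 + (code & 0x3ff)
            ))
    return ''.join(out)


escaped = sketch_escape_string('\n - \xe9 - \u20ac - \U0001d51e')
quoted = sketch_escape_string('my "string"')
print(escaped)
print(quoted)
```

Because the output is pure ASCII, it survives whatever encoding the surrounding document ends up using.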
Now, escape_inlined prepares a string for inclusion in an HTML script block by mangling certain character sequences. These are:
- multiple consecutive dashes (---): two dashes mark the end of an HTML comment (in XHTML they are not allowed in comments at all, but XHTML has different problems, see the cdata function).
- </: this is the end tag opener (ETAGO), which by HTML’s rules may end the script block.
- ]]>: this ends a CDATA section, which possibly contains the script.
escape_inlined destroys these sequences by scattering harmless \ characters inside them. Javascript just strips those backslashes since the characters “escaped” that way simply represent themselves:
from tdi.tools import javascript
print javascript.escape_inlined("----]]> </script>")
-\-\-\-]\]> <\/script>
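The three rules can be mimicked with simple substitutions. The function below is a hypothetical stand-in that reproduces the output shown above; the actual implementation may handle more cases or work differently.

```python
import re


def sketch_escape_inlined(text):
    # Put a backslash before every dash that directly follows a dash.
    text = re.sub(r'(?<=-)-', lambda match: '\\-', text)
    # Break the CDATA end marker.
    text = text.replace(']]>', ']\\]>')
    # Break the end tag opener (ETAGO).
    text = text.replace('</', '<\\/')
    return text


mangled = sketch_escape_inlined('----]]> </script>')
print(mangled)
```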
LiteralJSON and SimpleJSON
The classes SimpleJSON and LiteralJSON are designed as data connectors for the fill and replace functions. They are initialized with some data object and emit this data object via their as_json() method.
SimpleJSON pipes the input through the simplejson library (available as json in the standard library since python 2.6). If there’s neither json nor simplejson available, as_json() raises an ImportError. The JSON encoder is configured to emit the smallest output possible but not modified otherwise. If you need something more fancy, like custom data type support, you need to create your own class; or simply generate the JSON yourself and pass it to LiteralJSON.
LiteralJSON takes some JSON string as input and passes it through to the as_json() method. That’s needed to avoid double-escaping when passing JSON strings to replace.
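To make the difference concrete, here are two minimal stand-ins for the containers. The class names and bodies are assumptions made for illustration; they only mirror the behaviour described above (compact serialization versus pass-through).

```python
import json


class SketchSimpleJSON(object):
    """Stand-in: serializes a python object when asked for JSON."""

    def __init__(self, content):
        self._content = content

    def as_json(self, inlined=False):
        # Compact separators give the smallest output json produces.
        return json.dumps(self._content, separators=(',', ':'))


class SketchLiteralJSON(object):
    """Stand-in: holds JSON text that already exists."""

    def __init__(self, content):
        self._content = content

    def as_json(self, inlined=False):
        return self._content


from_object = SketchSimpleJSON([1, 2, 3, 4]).as_json()
ready_made = SketchLiteralJSON('{"a":"b"}').as_json()
# Serializing ready-made JSON text again wraps it in a second layer.
doubled = SketchSimpleJSON('{"a":"b"}').as_json()
print(from_object, ready_made, doubled)
```

The last line shows the double-escaping problem: JSON text that is treated as an ordinary string ends up as a quoted javascript string instead of an object literal.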
Note
When writing your own classes, note the following:
- as_json() is expected to return unicode
- the result of as_json() is more or less inserted literally (modulo some encoding stuff and inline escaping)
- the signature of as_json() expects an optional boolean inlined argument, indicating whether as_json() itself should do inline escaping. However, replace will always set that to False and do that work by itself.
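A custom container following these rules might look like the sketch below. DateAwareJSON is a hypothetical class, not part of TDI; it adds support for dates, which the plain json encoder rejects, and ignores the inlined flag on the assumption that the caller does the inline escaping itself.

```python
import datetime
import json


class DateAwareJSON(object):
    """Hypothetical container that also serializes dates."""

    def __init__(self, content):
        self._content = content

    def as_json(self, inlined=False):
        def default(obj):
            # datetime.datetime is a subclass of datetime.date.
            if isinstance(obj, datetime.date):
                return obj.isoformat()
            raise TypeError('%r is not JSON serializable' % (obj,))

        # Returns text (unicode); inline escaping is left to the caller.
        return json.dumps(
            self._content, separators=(',', ':'), default=default
        )


payload = DateAwareJSON({'day': datetime.date(2020, 1, 2)}).as_json()
print(payload)
```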
Minifying Javascript
Minifying reduces the size of a document by removing redundant or irrelevant content. Typically this includes whitespace and comments. Some minifiers also rename variables and functions and remove unneeded braces and so on.
TDI ships with the latest version of the rJSmin minifier, which only removes spaces and comments – but does that very fast.
There are two use cases here:
- Minify script blocks within HTML templates during the loading phase
- Minify some standalone javascript
The first case is handled by hooking the tdi.tools.javascript.MinifyFilter into the template loader. See the filters documentation for a description of how to do that.
For the second case there’s the tdi.tools.javascript.minify function:
from tdi.tools import javascript
print javascript.minify(u"""
if (n.tagName.toLowerCase() == 'label') {
n = n.parentNode;
if (t && n == t) continue;
t = n;
}
""".lstrip())
if(n.tagName.toLowerCase()=='label'){n=n.parentNode;if(t&&n==t)continue;t=n;}
Masking Script Blocks
HTML has a long history of adding new elements all the time. Conceptually this is possible, because browsers simply ignore the tags of unknown elements and apply no semantics. That’s forward compatibility. Now if you place something inside these new elements, which is not HTML content, you get a backwards compatibility problem.
cdata and cleanup
Script elements (and style elements for that matter) have suffered from this problem since they were invented. The first solution was to enclose them with comment markers, but add special rules for browsers to accept these markers as part of the content:
<script><!--
the script.
//--></script>
Then XHTML was invented. The XML parser cannot be tricked into such special rules. It would start throwing away the comment again, so people gave up backwards compliance and wrote:
<script><![CDATA[
the script.
]]></script>
and then, to be compatible with HTML again:
<script>//<![CDATA[
the script.
//]]></script>
Then someone finally figured out [1] a mix of CDATA and comments completely compatible with all HTML/TagSoup parsers, and it looks like this:
<script><!--//--><![CDATA[//><!--
the script.
//--><!]]></script>
[1] http://lists.w3.org/Archives/Public/www-html/2002Apr/0053.html
This is funny stuff, but you wouldn’t want to write it all the time. You should consider applying that, however, because browsers are typically not the only applications parsing your HTML.
The functions javascript.cdata and javascript.cleanup can do that for you. cdata() takes a script and encloses it in such an all-compatible CDATA/comment-mix-container. cleanup() does the reverse. It looks for common containers (like the ones described above) and strips them. In fact, the cdata() function calls cleanup() itself in order to avoid wrapping a script twice:
from tdi.tools import javascript
print javascript.cdata("""
//<![CDATA[
the script.
//]]>
""".strip())
<!--//--><![CDATA[//><!--
the script.
//--><!]]>
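The interplay of the two functions can be sketched as follows. The function names, the list of recognized containers, and the whitespace handling are assumptions of this sketch; the real cleanup() may recognize more variants.

```python
# Longer markers come first, because shorter ones are contained in them.
OPENERS = ('<!--//--><![CDATA[//><!--', '//<![CDATA[', '<![CDATA[', '<!--')
CLOSERS = ('//--><!]]>', '//]]>', ']]>', '//-->', '-->')


def sketch_cleanup(script):
    # Strip one known opening and one known closing marker, if present.
    script = script.strip()
    for opener in OPENERS:
        if script.startswith(opener):
            script = script[len(opener):]
            break
    for closer in CLOSERS:
        if script.endswith(closer):
            script = script[:-len(closer)]
            break
    return script.strip()


def sketch_cdata(script):
    # Clean up first, so an already wrapped script is not wrapped twice.
    return '<!--//--><![CDATA[//><!--\n%s\n//--><!]]>' % (
        sketch_cleanup(script),
    )


wrapped = sketch_cdata('//<![CDATA[\nthe script.\n//]]>')
print(wrapped)
```

Calling sketch_cdata on its own output returns the same text, which is the property the cleanup step is there to provide.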
The cdata() function can be applied automatically to all script blocks of a template by hooking the tdi.tools.javascript.CDATAFilter into the template loader. See the filters documentation for a description of how to do that.