Javascript Tools

The tools described here can be found in the → tdi.tools.javascript module. The module mostly deals with safe javascript manipulation and also provides a minifier.

Escaping Javascript Variables

Escaping is important. And it’s easy to get wrong. Whenever you modify script blocks or attributes, the placed variables need to be properly escaped. Otherwise the result is open to XSS attacks. A large part of the javascript tools deals with this problem. When escaping stuff for javascript, there are various levels of context which need to be taken care of:

  • Javascript quotes (", ' – double and single) and escapes (\), of course.
  • Unicode, and various issues with encoded characters
  • The surrounding HTML (for example, the sequence ]]> is harmless in javascript, but ends the CDATA block, which possibly contains the script itself)
  • The replacement implementation. The naïve way to replace multiple placeholders is to call str.replace for each of them. However, if a malicious or unaware user puts a placeholder into the content, you’re getting a mess.
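The last point is easy to underestimate. The following plain Python 3 sketch (it does not use TDI, and the variable names are made up for illustration) shows what happens when each placeholder is replaced by a separate str.replace call and one value happens to look like another placeholder. Escaping is left out on purpose to keep the focus on the replacement order:

```python
# Naive approach: one str.replace call per placeholder.
script = "var name = '__name__'; var city = '__city__';"
values = {
    "name": "__city__",  # user input that happens to look like a placeholder
    "city": "Berlin",
}

result = script
for key, value in values.items():
    # Each pass also sees the text inserted by the previous passes.
    result = result.replace("__%s__" % key, value)

print(result)
# var name = 'Berlin'; var city = 'Berlin';
```

The user's input was silently treated as a placeholder by the second pass. A single-pass replacement never looks at inserted values again and avoids the problem.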

TDI provides two high-level functions, → javascript.fill and → javascript.fill_attr, which handle all these issues. Both are based on the more generic → javascript.replace function, which itself uses more basic escape functions: → javascript.escape_string and → javascript.escape_inlined.

When placing complex structures (more complex than simple strings) into a script, JSON is the way to go. There are two classes available, which connect your structures or already available JSON input with the fill and replace functions: → SimpleJSON and → LiteralJSON.

replace

The → replace function basically takes a script (as a string) and a mapping of placeholders and returns a new string with placeholders replaced. The “fill” functions described below modify a node in-place. Use those for node manipulations.

The function scans for and replaces the placeholders in a single pass. If a placeholder found that way is not in the mapping, it’s simply left as is. The default placeholder pattern looks like __name__. You can pass a different pattern if you like. Note, however, that the default pattern makes placeholders look like javascript identifiers, which helps if you run a minifier on the original script. On the other hand, the pattern does not allow for names containing double underscores.
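The single-pass behaviour can be approximated with a regular expression and a callback. This is only a sketch in plain Python 3: the pattern and the name replace_sketch are assumptions made for illustration, not TDI's actual implementation, and the values are inserted without any escaping.

```python
import re

# Assumed approximation of the default pattern: __name__, where the name
# may contain single underscores but no double underscores.
PLACEHOLDER = re.compile(r"__(?P<name>[^_]+(?:_[^_]+)*)__")


def replace_sketch(script, holders):
    """Replace all known placeholders in one pass over the script."""
    def subst(match):
        name = match.group("name")
        if name not in holders:
            return match.group(0)  # unknown placeholder: leave as is
        return holders[name]       # NOTE: no escaping in this sketch
    return PLACEHOLDER.sub(subst, script)


script = "var name = '__name__'; var city = '__city__'; var x = __unknown__;"
print(replace_sketch(script, {"name": "__city__", "city": "Berlin"}))
# var name = '__city__'; var city = 'Berlin'; var x = __unknown__;
```

Because the inserted values are never scanned again, a value that looks like a placeholder stays exactly as it was passed in.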

The placeholder values are passed to escape_string before being used for replacement – except when those objects provide an as_json([inlined]) method (a feature that can be turned off using the as_json parameter of the replace and fill functions). If the as_json() method exists and is allowed to be called, it’s used instead of escape_string. Afterwards the result is possibly transcoded to the document’s character set. If the inlined parameter is true, the intermediate result is piped through escape_inlined before eventually being used as replacement value.

TDI ships with two container classes that already provide appropriate as_json() methods: LiteralJSON and SimpleJSON. The former is a simple (more or less) pass-through container for a JSON string you already have available. The latter takes a python object which is passed through the json module before being emitted by as_json().

Warning

When writing your own as_json() methods, make sure the result is at least syntactically valid javascript code.

Here’s a simple example:

from tdi.tools import javascript

script = u'''
var a = __var__;
var b = '__str__';
'''.strip()
print javascript.replace(script, dict(
    var=javascript.SimpleJSON([1, 2, 3, 4]),
    str=u'my "string"'
))

var a = [1,2,3,4];
var b = 'my \"string\"';

fill, fill_attr

In contrast to the replace function, → fill and → fill_attr modify a node in-place. Both actually call replace, but use different options. The function signatures are also simpler than replace’s, because some options are implicit or can be determined directly from the node. Here is how they are used:

from tdi import html
from tdi.tools import javascript

tpl = html.from_string('''
<button tdi="button" onclick="alert('__alert__')">
    Click me!
</button>
<script tdi="script">
var a = __var__;
var b = '__str__';
</script>
'''.lstrip())


class Model(object):
    def render_button(self, node):
        javascript.fill_attr(node, 'onclick', dict(
            alert=u'"Hey André! ---]]>"'
        ))

    def render_script(self, node):
        javascript.fill(node, dict(
            var=javascript.SimpleJSON([1, 2, 3, 4]),
            str=u'"Hey André! ---]]>"',
        ))

tpl.render(Model())

And here’s the result:

<button onclick="alert('\&quot;Hey Andr\xe9! ---]]&gt;\&quot;')">
    Click me!
</button>
<script>
var a = [1,2,3,4];
var b = '\"Hey Andr\xe9! -\-\-]\]>\"';
</script>

Note how the data is escaped for different levels of presentation and escaping varies depending on the context:

  • Quotes are escaped for javascript using a backslash
  • The same quotes are escaped for HTML using &quot;, but only in attribute context. The same goes for the > character.
  • Potentially dangerous sequences like multiple dashes or ]]> are hidden from the containing HTML by inserting harmless backslashes. This is applied to script blocks only; it is not needed for attributes.
  • The é of André is escaped to \xe9, which is an encoding-independent representation of the character (JSON content is encoded to the document encoding, though; only characters that do not fit into this encoding are transformed to \uxxxx escapes).
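The layering for the attribute case can be illustrated with the standard library alone. The sketch below (Python 3) starts from a value that is already escaped for javascript and applies HTML escaping on top. Using html.escape here is an assumption made for illustration; TDI's own attribute escaping may differ in detail.

```python
import html

# A value already escaped for javascript: quotes are backslashed and
# the é is written as \xe9.
js_escaped = r'\"Hey Andr\xe9! ---]]>\"'

# Attribute context: the javascript-escaped text is additionally escaped
# for HTML, so the double quotes cannot terminate the attribute value.
attr_value = html.escape(js_escaped, quote=True)

print('onclick="alert(\'%s\')"' % attr_value)
# onclick="alert('\&quot;Hey Andr\xe9! ---]]&gt;\&quot;')"
```

The browser first undoes the HTML layer when it parses the attribute, then the javascript parser undoes the backslash layer, which is why the order of the two escaping steps matters.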

escape_string, escape_inlined

→ javascript.escape_string and → javascript.escape_inlined are the building blocks of the fill and replace functions described above. For regular placeholder handling, use fill and replace rather than calling the escape functions directly.

escape_string takes a string (or calls str() on the passed object) and:

  • escapes \, " and ' (by prepending them with \)
  • encodes non-ASCII and non-printable characters to escape sequences, understandable by javascript.
  • passes the result (by default) through escape_inlined

Here’s a little example:

from tdi.tools import javascript

print javascript.escape_string(u"\n - é - € - \U0001d51e")

\n - \xe9 - \u20ac - \ud835\udd1e
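To make the character handling more tangible, here is a simplified Python 3 re-implementation of the first two steps. The function name escape_string_sketch is hypothetical and the code is not TDI's; in particular it skips the inline escaping step and treats the newline as the only named escape.

```python
def escape_string_sketch(text):
    """Escape text for use inside a javascript string literal (sketch)."""
    out = []
    for char in text:
        code = ord(char)
        if char in "\\\"'":
            out.append("\\" + char)             # backslash and both quotes
        elif char == "\n":
            out.append("\\n")
        elif 0x20 <= code < 0x7F:
            out.append(char)                    # printable ASCII passes through
        elif code <= 0xFF:
            out.append("\\x%02x" % code)        # e.g. é -> \xe9
        elif code <= 0xFFFF:
            out.append("\\u%04x" % code)        # e.g. € -> \u20ac
        else:
            # Characters outside the BMP become a UTF-16 surrogate pair.
            code -= 0x10000
            out.append("\\u%04x\\u%04x" % (
                0xD800 + (code >> 10), 0xDC00 + (code & 0x3FF)
            ))
    return "".join(out)


print(escape_string_sketch("\n - \xe9 - \u20ac - \U0001d51e"))
# \n - \xe9 - \u20ac - \ud835\udd1e
```

The result is pure ASCII, so it survives any document encoding unchanged.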

Now, escape_inlined prepares a string for inclusion in an HTML script block by mangling certain character sequences. These are:

  • multiple consecutive dashes (---) – two dashes mark the end of an HTML comment (in XHTML they are not allowed in comments at all, but XHTML has different problems, see the cdata function).
  • </ - this is the endtag opener (ETAGO), which by HTML’s rules may end the script block.
  • ]]> - this ends a CDATA markup, which possibly contains the script.

escape_inlined destroys these sequences by scattering harmless \ characters inside them. Javascript just strips those backslashes since the characters “escaped” that way simply represent themselves:

from tdi.tools import javascript

print javascript.escape_inlined("----]]> </script>")

-\-\-\-]\]> <\/script>
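The mangling itself is simple enough to sketch in a few lines of Python 3. The function below is a hypothetical approximation written for illustration (TDI's real implementation may handle more cases); it reproduces the output shown above for this input.

```python
import re


def escape_inlined_sketch(text):
    """Break up sequences that are dangerous inside an HTML script block."""
    # Put a backslash before every dash that directly follows another dash.
    text = re.sub(r"(?<=-)-", r"\\-", text)
    # Break the CDATA terminator.
    text = text.replace("]]>", r"]\]>")
    # Break the end tag opener (ETAGO).
    text = text.replace("</", r"<\/")
    return text


print(escape_inlined_sketch("----]]> </script>"))
# -\-\-\-]\]> <\/script>
```

Inside a javascript string literal, \-, \] and \/ evaluate to the plain characters, so the string value is unchanged while the HTML parser no longer sees the critical sequences.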

LiteralJSON and SimpleJSON

The classes → SimpleJSON and → LiteralJSON are designed as data connectors for the fill and replace functions. They are initialized with some data object and emit this data object via their as_json() method.

SimpleJSON pipes the input through the simplejson library (available as json in the standard library since python 2.6). If there’s neither json nor simplejson available, as_json() raises an ImportError. The JSON encoder is configured to emit the smallest output possible but not modified otherwise. If you need something more fancy, like custom data type support, you need to create your own class; or simply generate the JSON yourself and pass it to LiteralJSON.
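With the standard library json module, "smallest output possible" usually means passing compact separators. The following Python 3 sketch shows the effect; that SimpleJSON configures its encoder in exactly this way is an assumption, not something stated here.

```python
import json

data = {"ids": [1, 2, 3, 4], "name": "Andr\xe9"}

default_output = json.dumps(data)
compact_output = json.dumps(data, separators=(",", ":"))

print(default_output)
# {"ids": [1, 2, 3, 4], "name": "Andr\u00e9"}
print(compact_output)
# {"ids":[1,2,3,4],"name":"Andr\u00e9"}
```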

LiteralJSON takes some JSON string as input and passes it through to the as_json() method. That’s needed to avoid double-escaping when passing JSON strings to replace.

Note

When writing your own classes, note the following:

  • as_json() is expected to return unicode
  • as_json() is more or less inserted literally (modulo some encoding stuff and inline escaping)
  • the signature of as_json() expects an optional boolean inlined argument, indicating whether as_json() itself should do inline escaping. However, replace will always set that to False and do that work by itself.
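A custom container following these rules might look like the sketch below (Python 3). The class name DateAwareJSON and its date handling are made up for illustration; only the as_json([inlined]) method signature is taken from the description above.

```python
import datetime
import json


class DateAwareJSON(object):
    """Hypothetical container that can also serialize dates."""

    def __init__(self, content):
        self._content = content

    @staticmethod
    def _default(obj):
        # Called by json.dumps for objects it cannot serialize itself.
        if isinstance(obj, (datetime.date, datetime.datetime)):
            return obj.isoformat()
        raise TypeError("Not JSON serializable: %r" % (obj,))

    def as_json(self, inlined=False):
        # `inlined` is accepted for signature compatibility only; this
        # sketch leaves inline escaping to the caller.
        return json.dumps(
            self._content, separators=(",", ":"), default=self._default
        )


print(DateAwareJSON({"day": datetime.date(2020, 1, 2)}).as_json())
# {"day":"2020-01-02"}
```

An instance of such a class would be passed as a placeholder value in the same way as SimpleJSON in the examples above.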

Minifying Javascript

Minifying reduces the size of a document by removing redundant or irrelevant content. Typically this includes whitespace and comments. Some minifiers also rename variables and functions and remove unneeded braces and so on.

TDI ships with the latest version of the rJSmin minifier, which only removes spaces and comments – but does that very fast.

There are two use cases here:

  1. Minify script blocks within HTML templates during the loading phase
  2. Minify some standalone javascript

The first case is handled by hooking the → tdi.tools.javascript.MinifyFilter into the template loader. See the filters documentation for a description of how to do that.

For the second case there’s the → tdi.tools.javascript.minify function:

from tdi.tools import javascript

print javascript.minify(u"""
if (n.tagName.toLowerCase() == 'label') {
    n = n.parentNode;
    if (t && n == t) continue;
    t = n;
}
""".lstrip())

if(n.tagName.toLowerCase()=='label'){n=n.parentNode;if(t&&n==t)continue;t=n;}

Masking Script Blocks

HTML has a long history of adding new elements all the time. Conceptually this is possible, because browsers simply ignore the tags of unknown elements and apply no semantics. That’s forward compatibility. Now if you place something inside these new elements, which is not HTML content, you get a backwards compatibility problem.

cdata and cleanup

Script elements (and style elements for that matter) suffer from this problem since they were invented. The first solution was to enclose them with comment markers, but add special rules for browsers to accept these markers as part of the content:

<script><!--
    the script.
//--></script>

Then XHTML was invented. The XML parser cannot be tricked into such special rules. It would start throwing away the comment again, so people gave up backwards compatibility and wrote:

<script><![CDATA[
    the script.
]]></script>

and then, to be compatible with HTML again:

<script>//<![CDATA[
    the script.
//]]></script>

Then someone finally figured out [1] a mix of CDATA and comments completely compatible with all HTML/TagSoup parsers, and it looks like this:

<script><!--//--><![CDATA[//><!--
    the script.
//--><!]]></script>

[1] http://lists.w3.org/Archives/Public/www-html/2002Apr/0053.html

This is funny stuff, but you wouldn’t want to write it all the time. You should consider applying that, however, because browsers are typically not the only applications parsing your HTML.

The functions → javascript.cdata and → javascript.cleanup can do that for you. cdata() takes a script and encloses it in such an all-compatible CDATA/comment-mix container. cleanup() does the reverse: it looks for common containers (like the ones described above) and strips them. In fact, the cdata() function calls cleanup() itself, so a script that is already wrapped is not wrapped a second time:

from tdi.tools import javascript

print javascript.cdata("""
//<![CDATA[
    the script.
//]]>
""".strip())

<!--//--><![CDATA[//><!--
the script.
//--><!]]>
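The interplay of the two functions can be sketched in plain Python 3. The names cleanup_sketch and cdata_sketch and the list of recognized containers are assumptions for illustration; the real functions may recognize more variants.

```python
# Known (start, end) container pairs, most specific first.
_CONTAINERS = (
    ("<!--//--><![CDATA[//><!--", "//--><!]]>"),
    ("//<![CDATA[", "//]]>"),
    ("<![CDATA[", "]]>"),
    ("<!--", "//-->"),
)


def cleanup_sketch(script):
    """Strip one known comment/CDATA container from the script, if any."""
    script = script.strip()
    for start, end in _CONTAINERS:
        if script.startswith(start) and script.endswith(end):
            return script[len(start):len(script) - len(end)].strip()
    return script


def cdata_sketch(script):
    """Wrap the script in the all-compatible CDATA/comment mix."""
    return "<!--//--><![CDATA[//><!--\n%s\n//--><!]]>" % cleanup_sketch(script)


wrapped = cdata_sketch("//<![CDATA[\n    the script.\n//]]>")
print(wrapped)
# <!--//--><![CDATA[//><!--
# the script.
# //--><!]]>
```

Because the wrapper cleans up first, applying cdata_sketch to its own output returns the same string again.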

The cdata() function can be applied automatically to all script blocks of a template by hooking the → tdi.tools.javascript.CDATAFilter into the template loader. See the filters documentation for a description of how to do that.