Package tdi :: Package tools :: Module html

Module html

HTML Tools


Copyright: Copyright 2006 - 2015 André Malo or his licensors, as applicable

License:

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Author: André Malo

Classes
  CommentStripFilter
Strip comments from the event chain
  MinifyFilter
Strip unneeded whitespace and comments
Functions
  decode(value, encoding='latin-1', errors='strict', entities=None)
Decode HTML encoded text (returns unicode)
  class_add(node, *class_)
Add class(es) to a node's class attribute
  class_del(node, *class_)
Remove class(es) from a node's class attribute
  multiline(content, encoding='ascii', tabwidth=8, xhtml=True)
Encode multiline content to HTML, assignable to node.raw.content (returns str)
  minify(html, encoding='ascii', fail_silently=False, comment_filter=None, cdata_containers=False)
Minify HTML (returns basestring)
Variables
dict entities = {u'AElig': u'Æ', u'AMP': u'&', u'Aacute': u'Á', u'A...
HTML named character references, generated from the HTML5 spec.
Function Details

decode(value, encoding='latin-1', errors='strict', entities=None)

Decode HTML encoded text
Parameters:
  • value (basestring) - HTML content to decode
  • encoding (str) - Encoding used to decode value to unicode before it is processed further. If value is already a unicode instance, the encoding is ignored. If omitted, 'latin-1' is applied (because it can't fail and maps bytes 1:1 to unicode codepoints).
  • errors (str) - Error handling, passed to .decode() and also applied to entities. If an entity name or character codepoint cannot be found or parsed, the error handler has the following semantics:

    strict (or anything different from the other tokens below)

    A ValueError is raised.

    ignore

    The original entity is passed through unchanged.

    replace

    The entity is replaced by the replacement character (U+FFFD).

  • entities (dict) - Entity name mapping (unicode(name) -> unicode(value)). If omitted or None, the HTML5 entity list is applied.

Returns: unicode
The decoded content
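
The error semantics described above can be illustrated with a small stand-alone sketch. It is not TDI's implementation: decode_sketch, the regular expression and the use of the standard library's html.entities.html5 table (Python 3) are assumptions made for illustration only. It only mirrors the documented behaviour of the encoding, errors and entities parameters.

```python
import re
from html.entities import html5  # stdlib table; keys such as 'amp;' end in ';'

# Assumption: only references terminated by ';' are handled in this sketch.
_ENTITY = re.compile(r'&(#[xX]?[0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);')


def decode_sketch(value, encoding='latin-1', errors='strict', entities=None):
    """Hypothetical re-implementation of the documented semantics."""
    if isinstance(value, bytes):
        # The encoding is only used for byte input, as documented.
        value = value.decode(encoding, errors)
    if entities is None:
        entities = {name[:-1]: char for name, char in html5.items()
                    if name.endswith(';')}

    def resolve(match):
        ref = match.group(1)
        try:
            if ref.startswith('#'):
                digits = ref[1:]
                if digits[0] in 'xX':
                    return chr(int(digits[1:], 16))
                return chr(int(digits, 10))
            return entities[ref]
        except (KeyError, ValueError, OverflowError):
            if errors == 'ignore':
                return match.group(0)   # pass the original entity through
            if errors == 'replace':
                return '\ufffd'         # U+FFFD replacement character
            raise ValueError("Unresolved entity %r" % match.group(0))

    return _ENTITY.sub(resolve, value)


print(decode_sketch('a &amp; b'))                    # a & b
print(decode_sketch('&nosuch;', errors='ignore'))    # &nosuch;
```

With errors='strict' (the default), the unknown reference in the second call would raise a ValueError instead.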

class_add(node, *class_)

Add class(es) to a node's class attribute
Parameters:
  • node (TDI node) - The node to modify
  • class_ (tuple) - Class name(s) to add

class_del(node, *class_)

Remove class(es) from a node's class attribute
Parameters:
  • node (TDI node) - The node to modify
  • class_ (tuple) - Class name(s) to remove. It is not an error if a class is not present on the node.
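
The effect of class_add and class_del on the class attribute can be sketched without TDI installed. Everything below is hypothetical: FakeNode is a dict-based stand-in for a TDI node (assuming attributes are reachable through item access), and the two *_sketch functions only imitate the documented behaviour. Skipping names that are already present and dropping an emptied attribute are choices of this sketch, not documented TDI behaviour.

```python
class FakeNode(dict):
    """Hypothetical stand-in for a TDI node; attributes via item access."""


def class_add_sketch(node, *class_):
    """Append class names to the node's class attribute."""
    current = (node.get('class') or '').split()
    for name in class_:
        if name not in current:   # assumption: avoid duplicates
            current.append(name)
    node['class'] = ' '.join(current)


def class_del_sketch(node, *class_):
    """Remove class names; names that are not present are ignored."""
    remaining = [name for name in (node.get('class') or '').split()
                 if name not in class_]
    if remaining:
        node['class'] = ' '.join(remaining)
    else:
        node.pop('class', None)   # assumption: drop an emptied attribute


node = FakeNode()
class_add_sketch(node, 'teaser', 'active')
class_del_sketch(node, 'active', 'missing')   # 'missing' is not an error
print(node['class'])                          # teaser
```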

multiline(content, encoding='ascii', tabwidth=8, xhtml=True)

Encode multiline content to HTML, assignable to node.raw.content
Parameters:
  • content (unicode) - Content to encode
  • encoding (str) - Target encoding
  • tabwidth (int) - Tab width, used to expand tabs. If None, tabs are not expanded.
  • xhtml (bool) - Emit XHTML? Only used to determine whether <br> or <br /> is emitted.
Returns: str
The multilined content
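
How the parameters interact can be shown with a rough stand-alone sketch (Python 3). It is not TDI's code: multiline_sketch is a hypothetical name, and the details (escaping with html.escape, joining lines without a trailing newline, character references for unencodable characters) are assumptions. Only the roles of encoding, tabwidth and xhtml follow the description above.

```python
from html import escape


def multiline_sketch(content, encoding='ascii', tabwidth=8, xhtml=True):
    """Hypothetical illustration of the documented parameters."""
    br = '<br />' if xhtml else '<br>'
    lines = []
    for line in content.splitlines():
        if tabwidth is not None:
            line = line.expandtabs(tabwidth)   # None leaves tabs untouched
        lines.append(escape(line, quote=False))
    # Assumption: characters outside the target encoding become
    # numeric character references.
    return br.join(lines).encode(encoding, 'xmlcharrefreplace')


print(multiline_sketch('1 < 2\nnext line', xhtml=False))
# b'1 &lt; 2<br>next line'
```

In TDI itself the result is meant to be assigned to node.raw.content, as stated above; the sketch only returns the encoded bytes.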

minify(html, encoding='ascii', fail_silently=False, comment_filter=None, cdata_containers=False)

Minify HTML

Enclosed <script> and <style> blocks are minified as well.

Parameters:
  • html (basestring) - HTML to minify
  • encoding (str) - Initially assumed encoding. Only marginally interesting.
  • fail_silently (bool) - Fail silently if a parse error is encountered? If true, the parse error is swallowed and the input html is returned unchanged. Otherwise the parse error is raised.
  • comment_filter (callable) - HTML Comment filter. A function which takes the comment data and returns a filtered comment (which is passed through to the builder) or None (meaning the comment can be stripped completely). For example:

    def keep_ad_comments(data):
        if 'google_ad_section' in data:
            return data
        return None
    

    If omitted or None, all HTML comments are stripped.

  • cdata_containers (bool) - Add CDATA containers to enclosed <script> or <style> content? If true, these containers are added after the content has been minified. Default is false.
Returns: basestring
The minified HTML, of the same type as the input
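
The comment_filter contract (return the comment to keep it, return None to strip it) can be tried out in isolation with the following sketch. It does not minify anything and does not use TDI's parser: strip_comments_sketch is a hypothetical helper based on a regular expression, and passing the complete comment markup (including the delimiters) to the filter is an assumption of this sketch.

```python
import re

_COMMENT = re.compile(r'<!--.*?-->', re.S)


def strip_comments_sketch(html, comment_filter=None):
    """Hypothetical illustration of the comment_filter contract."""
    def handle(match):
        if comment_filter is None:
            return ''                        # default: strip every comment
        kept = comment_filter(match.group(0))
        return '' if kept is None else kept  # None means "strip"
    return _COMMENT.sub(handle, html)


def keep_ad_comments(data):
    """Same filter as in the parameter description above."""
    if 'google_ad_section' in data:
        return data
    return None


page = '<!-- google_ad_section_start --><p>a</p><!-- note -->'
print(strip_comments_sketch(page, keep_ad_comments))
# <!-- google_ad_section_start --><p>a</p>
```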

Variables Details

entities

HTML named character references, generated from the HTML5 spec.

Type:
dict
Value:
{u'AElig': u'Æ',
 u'AMP': u'&',
 u'Aacute': u'Á',
 u'Abreve': u'Ă',
 u'Acirc': u'Â',
 u'Acy': u'А',
 u'Afr': u'𝔄',
 u'Agrave': u'À',
...