
Module _htmldecode

HTML Decoder.

Copyright: Copyright 2006 - 2015 André Malo or his licensors, as applicable


Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Author: André Malo

Functions

decode(value, encoding='latin-1', errors='strict', entities=None)
Decode HTML encoded text

Variables

  __doc__ = __doc__.encode('ascii').decode('unicode_escape')
  __package__ = 'tdi'
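The `__doc__` assignment re-decodes the module docstring with the `unicode_escape` codec. A plausible reason, assumed here because the module source is not shown, is that non-ASCII characters such as the "é" in the author's name are stored as backslash escapes so the file stays pure ASCII. A minimal Python 3 illustration, using a hypothetical docstring:

```python
# Hypothetical docstring: the non-ASCII character is kept as the escape
# sequence \xe9 (a literal backslash followed by "xe9").
raw_doc = "Author: Andr\\xe9 Malo"

# encode('ascii') produces the bytes of the escaped text; the
# 'unicode_escape' codec then interprets the backslash escapes.
decoded_doc = raw_doc.encode('ascii').decode('unicode_escape')

print(decoded_doc)  # Author: André Malo
```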
Function Details

decode(value, encoding='latin-1', errors='strict', entities=None)

Decode HTML encoded text

Parameters:
  • value (basestring) - HTML content to decode
  • encoding (str) - Encoding used to decode value to unicode before it is processed further. If value is already a unicode instance, the encoding is ignored. If omitted, 'latin-1' is applied (because it cannot fail and maps bytes 1:1 to unicode codepoints).
  • errors (str) - Error handling, passed to .decode() and also applied to entities. If an entity name or character codepoint cannot be found or parsed, the error handler has the following semantics:

    strict (or any value other than the tokens below)
        A ValueError is raised.
    ignore
        The original entity is passed through unchanged.
    replace
        The entity is replaced by the replacement character (U+FFFD).
  • entities (dict) - Entity name mapping (unicode(name) -> unicode(value)). If omitted or None, the HTML5 entity list is applied.
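The `'latin-1'` default for `encoding` relies on a property of the codec itself: every byte value 0x00-0xFF decodes to the Unicode codepoint with the same number, so decoding never raises. A quick Python 3 check of that claim, independent of tdi:

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Latin-1 decoding cannot fail: each byte maps to exactly one codepoint.
text = all_bytes.decode('latin-1')

# The mapping is 1:1 and order-preserving: byte N becomes codepoint N.
assert len(text) == 256
assert all(ord(ch) == i for i, ch in enumerate(text))

# Round trip: encoding back with latin-1 restores the original bytes.
assert text.encode('latin-1') == all_bytes
```

By contrast, a codec such as UTF-8 would raise `UnicodeDecodeError` on many of these byte values under `errors='strict'`.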

Returns: unicode
The decoded content
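A minimal Python 3 sketch of the behaviour documented above; it is not tdi's implementation. The helper name `decode_html`, the entity regex, the use of Python's `html.entities.html5` table as a stand-in for "the HTML5 entity list", and the token names `'ignore'` and `'replace'` for the pass-through and replacement modes are all assumptions for illustration. Bytes are decoded with `encoding`/`errors` first; then named, decimal and hexadecimal character references are resolved, and unresolvable ones are handled according to `errors`.

```python
import re
from html.entities import html5

# Assumed stand-in for the HTML5 entity list: Python's own table, with the
# trailing ';' stripped from the keys (e.g. 'amp;' -> 'amp').
_DEFAULT_ENTITIES = {
    name[:-1]: value for name, value in html5.items() if name.endswith(';')
}

# Named (&amp;), decimal (&#233;) and hex (&#xE9;) references. This sketch
# requires the terminating ';'; references without it are left untouched.
_ENTITY_RE = re.compile(r'&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);')


def decode_html(value, encoding='latin-1', errors='strict', entities=None):
    """Hypothetical re-implementation of the documented decode() semantics."""
    if isinstance(value, bytes):
        # Bytes are decoded first; str input skips this step entirely.
        value = value.decode(encoding, errors)
    if entities is None:
        entities = _DEFAULT_ENTITIES

    def resolve(match):
        ref = match.group(1)
        if ref.startswith('#'):
            base, digits = (16, ref[2:]) if ref[1] in 'xX' else (10, ref[1:])
            try:
                return chr(int(digits, base))
            except (ValueError, OverflowError):
                pass  # codepoint out of range: fall through to error handling
        elif ref in entities:
            return entities[ref]

        # Unresolvable reference: apply the error semantics from the docs.
        if errors == 'ignore':
            return match.group(0)  # pass the original entity through
        if errors == 'replace':
            return '\ufffd'  # U+FFFD REPLACEMENT CHARACTER
        raise ValueError('unresolvable entity %r' % match.group(0))

    return _ENTITY_RE.sub(resolve, value)


print(decode_html(b'caf&eacute; &amp; cr&#232;me &#x263A;'))  # café & crème ☺
```

Decoding the raw bytes before resolving entities keeps the two steps independent: the codec only ever sees the original bytes, and entity resolution always works on unicode text.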

Variables Details