
Module parser

Markup Parser Logic

Soup Parser

This module provides a very lenient HTML/XML lexer. The SoupLexer class is initialized with a listener object, which receives all low-level events (such as starttag, endtag, and text). Listeners must implement the ListenerInterface.

On top of the lexer there's the SoupParser class, which implements the ListenerInterface itself (the parser listens to the lexer). The parser adds HTML semantics to the lexed data and passes the events on to a building listener (BuildingListenerInterface). In addition to the events sent by the lexer, the SoupParser class generates endtag events (with empty data arguments) for implicitly closed elements. Furthermore, it knows about CDATA elements like <script> or <style> and modifies the lexer state accordingly.

The actual semantics are provided by a DTD query class (implementing DTDInterface).
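
The lexer, parser and building listener chain described above can be illustrated with a small self-contained sketch. It does not use tdi itself: ToyLexer, ToyParser, RecordingBuilder, the handle_* method names and signatures, and the NO_SELF_NESTING set (a stand-in for a DTD query) are all hypothetical names invented for this example and are simplified compared to whatever the real interfaces require. It only shows the general idea of a parser that listens to a lexer and emits endtag events with empty data for implicitly closed elements.

```python
import re


class RecordingBuilder:
    """Hypothetical building listener: records every event it receives."""

    def __init__(self):
        self.events = []

    def handle_starttag(self, name, data):
        self.events.append(("starttag", name, data))

    def handle_endtag(self, name, data):
        self.events.append(("endtag", name, data))

    def handle_text(self, data):
        self.events.append(("text", data))


class ToyParser:
    """Listens to the lexer, adds nesting rules, forwards to a builder."""

    # Assumption: toy replacement for a DTD query. These elements are
    # closed implicitly when the same element is opened again.
    NO_SELF_NESTING = frozenset(["p", "li"])

    def __init__(self, listener):
        self.listener = listener
        self._stack = []  # names of currently open elements

    def handle_starttag(self, name, data):
        if (name in self.NO_SELF_NESTING and self._stack
                and self._stack[-1] == name):
            # Implicit close: the end tag never appeared in the input,
            # so the generated event carries empty data.
            self._stack.pop()
            self.listener.handle_endtag(name, "")
        self._stack.append(name)
        self.listener.handle_starttag(name, data)

    def handle_endtag(self, name, data):
        if name not in self._stack:
            return  # stray end tag: dropped in this toy version
        while self._stack:
            top = self._stack.pop()
            if top == name:
                self.listener.handle_endtag(name, data)
                break
            # Elements still open inside `name` are closed implicitly.
            self.listener.handle_endtag(top, "")

    def handle_text(self, data):
        self.listener.handle_text(data)

    def finalize(self):
        # Close everything that is still open at the end of the input.
        while self._stack:
            self.listener.handle_endtag(self._stack.pop(), "")


class ToyLexer:
    """Very small lexer: knows only start tags, end tags and text."""

    _TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

    def __init__(self, listener):
        self.listener = listener

    def feed(self, text):
        pos = 0
        for match in self._TAG.finditer(text):
            if match.start() > pos:
                self.listener.handle_text(text[pos:match.start()])
            name = match.group(2).lower()
            if match.group(1):
                self.listener.handle_endtag(name, match.group(0))
            else:
                self.listener.handle_starttag(name, match.group(0))
            pos = match.end()
        if pos < len(text):
            self.listener.handle_text(text[pos:])


def parse(text):
    """Wire lexer -> parser -> builder and return the recorded events."""
    builder = RecordingBuilder()
    parser = ToyParser(builder)
    ToyLexer(parser).feed(text)
    parser.finalize()
    return builder.events
```

In this sketch, parse("<p>one<p>two") records two endtag events for p, both with an empty data argument, because neither closing tag is present in the input. CDATA handling, attributes and the real DTD lookup are deliberately left out.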


Copyright: Copyright 2006 - 2015 André Malo or his licensors, as applicable

License:

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Author: André Malo

Classes
  SoupLexer
(X)HTML Tagsoup Lexer
  DEFAULT_LEXER
(X)HTML Tagsoup Lexer
  SoupParser
The parser is actually a tagsoup parser by design in order to process most of the "HTML" that can be found out there.
  DEFAULT_PARSER
The parser is actually a tagsoup parser by design in order to process most of the "HTML" that can be found out there.
Variables
  __doc__ = __doc__.encode('ascii').decode('unicode_escape')
  __package__ = 'tdi.markup.soup'