This module provides a very lenient HTML/XML lexer. The SoupLexer class is initialized with a listener object, which receives all low-level events (such as starttag, endtag, and text). Listeners must implement the ListenerInterface.
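The listener arrangement described above can be sketched with a small, self-contained toy. This is not the SoupLexer implementation: the class names (ToyLexer, RecordingListener), the handle_* method names and their signatures, and the regular expression are all assumptions made for illustration. It only shows how a lenient lexer can push starttag, endtag, and text events to whatever listener it was constructed with.

```python
import re


class RecordingListener(object):
    """Hypothetical listener: records every low-level event it receives."""

    def __init__(self):
        self.events = []

    def handle_starttag(self, name, data):
        self.events.append(('starttag', name, data))

    def handle_endtag(self, name, data):
        self.events.append(('endtag', name, data))

    def handle_text(self, data):
        self.events.append(('text', data))


class ToyLexer(object):
    """Illustrative lexer only; the names and signatures are assumptions."""

    # Very rough tag pattern: optional slash, a name, anything up to '>'.
    _TAG = re.compile(r'<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>')

    def __init__(self, listener):
        self._listener = listener

    def feed(self, source):
        pos = 0
        for match in self._TAG.finditer(source):
            if match.start() > pos:
                # Everything between two tags is reported as text.
                self._listener.handle_text(source[pos:match.start()])
            closing, name = match.group(1), match.group(2)
            if closing:
                self._listener.handle_endtag(name, match.group(0))
            else:
                self._listener.handle_starttag(name, match.group(0))
            pos = match.end()
        if pos < len(source):
            # Leniency: input that is not a recognizable tag is plain text.
            self._listener.handle_text(source[pos:])


listener = RecordingListener()
ToyLexer(listener).feed('<p>Hello <b>world</p> a < b')
```

After the call, listener.events holds the events in document order. The lexer itself makes no attempt to balance tags: the unclosed <b> and the stray "<" are reported as they appear, and any repair is left to a higher layer.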
On top of the lexer sits the SoupParser class, which implements the ListenerInterface itself (the parser listens to the lexer). The parser adds HTML semantics to the lexed data and passes the events on to a building listener (BuildingListenerInterface). In addition to the events sent by the lexer, the SoupParser class generates endtag events (with empty data arguments) for implicitly closed elements. It also knows about CDATA elements such as <script> and <style> and modifies the lexer state accordingly.
The actual semantics are provided by a DTD query class (implementing DTDInterface).
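The layering of parser, DTD query, and building listener can be sketched as follows. Everything here is hypothetical and simplified: ToyDTD, ToyParser, RecordingBuilder, the nestable() method, and the nesting rule itself are assumptions for illustration, not the SoupParser or DTDInterface API. The sketch feeds lexer-style events to the parser by hand and shows how implicitly closed elements could yield endtag events with an empty data argument. CDATA handling is left out.

```python
class ToyDTD(object):
    """Hypothetical DTD query object; the single rule below is an assumption."""

    def nestable(self, outer, inner):
        # Assumed rule: <p> may not contain <p>, <li> may not contain <li>.
        return not (outer == inner and outer in ('p', 'li'))


class RecordingBuilder(object):
    """Hypothetical building listener: records the events it is passed."""

    def __init__(self):
        self.events = []

    def handle_starttag(self, name, data):
        self.events.append(('starttag', name, data))

    def handle_endtag(self, name, data):
        self.events.append(('endtag', name, data))

    def handle_text(self, data):
        self.events.append(('text', data))


class ToyParser(object):
    """Illustrative parser: listens to lexer events, forwards to a builder."""

    def __init__(self, builder, dtd):
        self._builder = builder
        self._dtd = dtd
        self._stack = []  # names of currently open elements

    def handle_starttag(self, name, data):
        # Implicitly close open elements that may not contain the new one.
        while self._stack and not self._dtd.nestable(self._stack[-1], name):
            self._builder.handle_endtag(self._stack.pop(), '')
        self._stack.append(name)
        self._builder.handle_starttag(name, data)

    def handle_endtag(self, name, data):
        if name not in self._stack:
            return  # stray end tag: simply ignored in this sketch
        while self._stack[-1] != name:
            # Close anything still open inside the element being closed.
            self._builder.handle_endtag(self._stack.pop(), '')
        self._stack.pop()
        self._builder.handle_endtag(name, data)

    def handle_text(self, data):
        self._builder.handle_text(data)

    def finalize(self):
        # Close whatever is still open at the end of input.
        while self._stack:
            self._builder.handle_endtag(self._stack.pop(), '')


builder = RecordingBuilder()
parser = ToyParser(builder, ToyDTD())
# Events as a lexer might emit them for: <ul><li>one<li>two</ul>
parser.handle_starttag('ul', '<ul>')
parser.handle_starttag('li', '<li>')
parser.handle_text('one')
parser.handle_starttag('li', '<li>')
parser.handle_text('two')
parser.handle_endtag('ul', '</ul>')
parser.finalize()
```

The builder receives two generated ('endtag', 'li', '') events that the input never contained. The empty data argument lets a builder tell generated end tags apart from ones present in the source. Keeping the nesting rule in a separate query object means the same parser logic could be paired with a different rule set.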
Copyright: Copyright 2006 - 2015 André Malo or his licensors, as applicable
License:
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Author: André Malo
Classes:
SoupLexer: (X)HTML Tagsoup Lexer
DEFAULT_LEXER: (X)HTML Tagsoup Lexer
SoupParser: The parser is actually a tagsoup parser by design in order to process most of the "HTML" that can be found out there.
DEFAULT_PARSER: The parser is actually a tagsoup parser by design in order to process most of the "HTML" that can be found out there.
Variables:
__doc__ = __doc__.encode('ascii').decode('unicode_escape')
__package__ =