Package tdi :: Package markup :: Package soup :: Module parser :: Class SoupLexer

Class SoupLexer

object --+
         |
        SoupLexer

(X)HTML Tagsoup Lexer

The lexer works hard to preserve the original data. To achieve this, it does not validate the input and recognizes markup very leniently.
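
To illustrate what preserving the original data means, here is a minimal sketch of a non-validating splitter. It is not the SoupLexer implementation; the pattern and the name split_soup are assumptions made up for this example. The point is only that every input character ends up in exactly one chunk, so joining the chunks reproduces the input, however broken the markup is.

```python
import re

# Hypothetical pattern (not taken from tdi): something that looks like
# "<...>" is markup, a run without "<" is text, and a stray "<" is kept
# as its own chunk instead of being rejected.
_TOKEN = re.compile(r'<[^<>]*>|[^<]+|<')


def split_soup(data):
    """Split data into raw chunks without validating anything."""
    return _TOKEN.findall(data)


soup = '<p class=x>unclosed <b>bold</i> & stray < sign'
chunks = split_soup(soup)

# Nothing is dropped or normalized: the chunks add up to the input again.
assert ''.join(chunks) == soup
```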

Instance Methods
 
__init__(self, listener, conditional_ie_comments=True)
Initialization
 
feed(self, food)
Feed the lexer with new data
 
finalize(self)
Finalize the lexer
 
cdata(self, normalize, name)
Set CDATA state

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Variables
    Lexer states
int CDATA = 2
Lexer state CDATA (between (P)CDATA tags)
int COMMENT = 6
Lexer state COMMENT (<!--)
int DECL = 8
Lexer state DECL (<!)
int EMPTY = 10
Lexer state EMPTY (<>)
int ENDTAG = 5
Lexer state ENDTAG (</)
int FINAL = 0
Lexer state FINAL
int MARKUP = 3
Lexer state MARKUP (<)
int MSECTION = 7
Lexer state MSECTION (<![)
int PI = 9
Lexer state PI (<?)
int STARTTAG = 4
Lexer state STARTTAG (<[letter])
int TEXT = 1
Lexer state TEXT (between tags)
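
The parenthesized prefixes suggest how a state could be chosen from the first characters of the buffer. The sketch below copies the state values from the list above; the function state_for and its lookup order are assumptions for illustration, not the lexer's actual dispatch code.

```python
# State values copied from the class variables listed above.
FINAL, TEXT, CDATA, MARKUP, STARTTAG, ENDTAG = 0, 1, 2, 3, 4, 5
COMMENT, MSECTION, DECL, PI, EMPTY = 6, 7, 8, 9, 10

# Longer prefixes come first: '<!--' and '<![' must win over '<!'.
_PREFIXES = (
    ('<!--', COMMENT),
    ('<![', MSECTION),
    ('<!', DECL),
    ('<?', PI),
    ('</', ENDTAG),
    ('<>', EMPTY),
)


def state_for(buf):
    """Guess the state for a buffer (hypothetical helper)."""
    if not buf.startswith('<'):
        return TEXT
    for prefix, state in _PREFIXES:
        if buf.startswith(prefix):
            return state
    if buf[1:2].isalpha():
        return STARTTAG
    # Anything else stays in MARKUP in this sketch; how the real lexer
    # treats such input is not documented on this page.
    return MARKUP
```

CDATA and FINAL are left out of the lookup because, judging by their descriptions, they are not selected by a markup prefix.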
Properties

Inherited from object: __class__

Method Details

__init__(self, listener, conditional_ie_comments=True)
(Constructor)

Initialization

Parameters:
  • listener (ListenerInterface) - The event listener
  • conditional_ie_comments (bool) - Handle conditional IE comments as text?

    Conditional comments are described in full detail at MSDN.

Overrides: object.__init__
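
For experiments it can help to pass a listener that merely records what it is told. The following is a sketch under assumptions: it presumes that the ListenerInterface callbacks are methods whose names start with handle_, which this page does not confirm, and RecordingListener is a made-up name.

```python
class RecordingListener(object):
    """Record every handle_* call as (event name, arguments)."""

    def __init__(self):
        self.events = []

    def __getattr__(self, name):
        # Only invoked for attributes that do not exist, so any
        # handle_* callback is answered without naming it up front.
        if not name.startswith('handle_'):
            raise AttributeError(name)

        def record(*args):
            self.events.append((name[len('handle_'):], args))
        return record


listener = RecordingListener()
# With tdi installed, the listener would presumably be wired up as:
#     lexer = SoupLexer(listener, conditional_ie_comments=True)
# Here a callback is invoked by hand to show the recorded shape.
listener.handle_text('Hello')
```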

feed(self, food)

Feed the lexer with new data
Parameters:
  • food (str) - The data to process
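
The feed/finalize pair implies an incremental protocol: data may arrive in arbitrary chunks, and whatever cannot be decided yet is buffered until more data or finalize() arrives. The stand-in below shows that protocol only. ToyLexer is hypothetical and far simpler than SoupLexer; in particular it just flushes on finalize(), whereas the real method documents a Raises section.

```python
class ToyLexer(object):
    """Hypothetical stand-in with the same feed()/finalize() shape."""

    def __init__(self, emit):
        self._emit = emit    # callable receiving each raw chunk
        self._buffer = ''    # data that cannot be classified yet

    def feed(self, food):
        self._buffer += food
        while self._buffer:
            if self._buffer.startswith('<'):
                end = self._buffer.find('>')
                if end < 0:
                    return   # incomplete tag: wait for more data
                end += 1
            else:
                end = self._buffer.find('<')
                if end < 0:
                    return   # text may continue in the next chunk
            self._emit(self._buffer[:end])
            self._buffer = self._buffer[end:]

    def finalize(self):
        # Flush whatever is still buffered.
        if self._buffer:
            self._emit(self._buffer)
            self._buffer = ''


events = []
lexer = ToyLexer(events.append)
for chunk in ('<p>Hel', 'lo</', 'p>tail'):
    lexer.feed(chunk)
before_finalize = list(events)
lexer.finalize()
```

Chunk boundaries do not have to line up with tag boundaries here: the trailing text is only emitted once finalize() is called.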

finalize(self)

Finalize the lexer

This processes whatever data remains in the buffer (if any).

Raises:

cdata(self, normalize, name)

Set CDATA state

Class Variable Details

CDATA

Lexer state CDATA (between (P)CDATA tags)
Type:
int
Value:
2

COMMENT

Lexer state COMMENT (<!--)
Type:
int
Value:
6

DECL

Lexer state DECL (<!)
Type:
int
Value:
8

EMPTY

Lexer state EMPTY (<>)
Type:
int
Value:
10

ENDTAG

Lexer state ENDTAG (</)
Type:
int
Value:
5

FINAL

Lexer state FINAL
Type:
int
Value:
0

MARKUP

Lexer state MARKUP (<)
Type:
int
Value:
3

MSECTION

Lexer state MSECTION (<![)
Type:
int
Value:
7

PI

Lexer state PI (<?)
Type:
int
Value:
9

STARTTAG

Lexer state STARTTAG (<[letter])
Type:
int
Value:
4

TEXT

Lexer state TEXT (between tags)
Type:
int
Value:
1