Module tdi.markup.soup.parser

Class SoupLexer

object --+
         |
        SoupLexer

(X)HTML Tagsoup Lexer

The lexer works hard to preserve the original data. To that end, it does not validate its input and recognizes it leniently.
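The following is a minimal sketch of what lenient, lossless lexing means, for illustration only. The function name `split_soup` and its regular expression are hypothetical and not taken from TDI. Every character of the input ends up in exactly one chunk, and malformed markup is passed along instead of being rejected.

```python
import re

# Hypothetical illustration, not TDI's implementation: a chunk is either
# "<" up to and including the next ">" (the ">" may be missing), or a run
# of text without "<". Nothing is validated, so broken markup still
# yields chunks, and no input character is dropped.
_TOKEN = re.compile(r"<[^>]*>?|[^<]+")


def split_soup(data):
    """Return a list of (kind, chunk) pairs that together cover all of data."""
    tokens = []
    for match in _TOKEN.finditer(data):
        chunk = match.group(0)
        kind = "markup" if chunk.startswith("<") else "text"
        tokens.append((kind, chunk))
    return tokens
```

Joining the chunks reproduces the input exactly. For example, `split_soup("<p class=x>Hi <b>there</p")` ends with an unterminated `"</p"` chunk rather than an error.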

Instance Methods

__init__(self, listener, conditional_ie_comments=True)
Initialization

feed(self, food)
Feed the lexer with new data

finalize(self)
Finalize the lexer

cdata(self, normalize, name)
Set CDATA state

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Lexer states
Class Variables
`int`	CDATA = `2` Lexer state `CDATA` (between (P)CDATA tags)
`int`	COMMENT = `6` Lexer state `COMMENT` (`<!--`)
`int`	DECL = `8` Lexer state `DECL` (`<!`)
`int`	EMPTY = `10` Lexer state `EMPTY` (`<>`)
`int`	ENDTAG = `5` Lexer state `ENDTAG` (`</`)
`int`	FINAL = `0` Lexer state `FINAL`
`int`	MARKUP = `3` Lexer state `MARKUP` (`<`)
`int`	MSECTION = `7` Lexer state `MSECTION` (`<![`)
`int`	PI = `9` Lexer state `PI` (`<?`)
`int`	STARTTAG = `4` Lexer state `STARTTAG` (`<[letter]`)
`int`	TEXT = `1` Lexer state `TEXT` (between tags)
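In the MARKUP state the lexer has seen `<` and must decide which construct follows. The sketch below shows one way to make that decision from the prefixes listed in the table. The constant values are copied from the table; the helper `classify`, and its fallback to TEXT for a `<` not followed by a letter, are assumptions for illustration, not SoupLexer's actual code.

```python
# State values as listed in the table above.
FINAL = 0
TEXT = 1
CDATA = 2
MARKUP = 3
STARTTAG = 4
ENDTAG = 5
COMMENT = 6
MSECTION = 7
DECL = 8
PI = 9
EMPTY = 10


def classify(markup):
    """Hypothetical helper: pick the next state for a chunk starting with '<'.

    Longer prefixes are checked first, so "<!--" and "<![" are not
    mistaken for a plain "<!" declaration.
    """
    if markup.startswith("<!--"):
        return COMMENT
    if markup.startswith("<!["):
        return MSECTION
    if markup.startswith("<!"):
        return DECL
    if markup.startswith("<?"):
        return PI
    if markup.startswith("</"):
        return ENDTAG
    if markup.startswith("<>"):
        return EMPTY
    if len(markup) > 1 and markup[1].isalpha():
        return STARTTAG
    return TEXT  # assumption: a stray "<" is treated as plain text
```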

Properties
Inherited from `object`: `__class__`

Method Details

__init__(self, listener, conditional_ie_comments=True)
(Constructor)

Initialization

Parameters:

listener (ListenerInterface) - The event listener
conditional_ie_comments (bool) - Whether to handle conditional IE comments as text

Conditional comments are described in full detail at MSDN.

Overrides: object.__init__
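Conditional comments come in a "downlevel-hidden" form such as `<!--[if IE 6]> ... <![endif]-->`, which browsers other than IE treat as an ordinary comment. A hypothetical helper for spotting them might look like this; `is_conditional_comment` is not part of TDI, and it only covers the downlevel-hidden form.

```python
import re

# Hypothetical helper, not part of TDI. It recognizes the opening marker
# "<!--[if ...]>" and the closing marker "<![endif]-->" of a
# downlevel-hidden IE conditional comment.
_CC_START = re.compile(r"<!--\[if\s[^\]]*\]>", re.IGNORECASE)
_CC_END = re.compile(r"<!\[endif\]-->", re.IGNORECASE)


def is_conditional_comment(comment):
    """Return True if a complete <!-- ... --> chunk is a conditional comment."""
    return bool(_CC_START.match(comment)) and bool(_CC_END.search(comment))
```

A lexer constructed with conditional_ie_comments=True could use a test like this to decide whether to report such a comment as text; how SoupLexer actually makes that decision is not documented here.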

feed(self, food)

Feed the lexer with new data

Parameters:

food (str) - The data to process
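Because feed() can be called repeatedly, a construct may be split across two calls. The sketch below shows the general idea of a rest buffer: complete text and tags are emitted immediately, and an unfinished `<...` is kept until more data arrives. The class `ChunkBuffer` is hypothetical and far simpler than SoupLexer. It knows nothing about comments, CDATA, or a quoted `>` inside attribute values.

```python
class ChunkBuffer(object):
    """Hypothetical sketch of incremental feeding with a rest buffer."""

    def __init__(self):
        self.rest = ""     # unconsumed input carried over between feed() calls
        self.tokens = []   # text and markup chunks emitted so far

    def feed(self, food):
        data = self.rest + food
        pos = 0
        while True:
            lt = data.find("<", pos)
            if lt == -1:
                lt = len(data)
            if lt > pos:
                self.tokens.append(data[pos:lt])  # plain text up to the next "<"
                pos = lt
            if pos == len(data):
                break
            gt = data.find(">", pos)
            if gt == -1:
                break  # unfinished markup: keep it for the next feed()
            self.tokens.append(data[pos:gt + 1])  # one complete "<...>" chunk
            pos = gt + 1
        self.rest = data[pos:]
```

Feeding `"Hello <b"` first emits only `"Hello "` and keeps `"<b"` in `rest`; a following `feed(">bold</b>")` completes the tag and emits the remaining chunks.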

finalize(self)

Finalize the lexer

This processes the rest buffer (input left unconsumed by previous feed() calls), if any.

Raises:

LexerEOFError - The rest buffer could not be consumed
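At end of input, finalize() has to decide whether the leftover data can still be turned into events. The sketch below assumes, for illustration only, that leftover plain text can be flushed while an unterminated markup construct cannot. `finalize_rest` is hypothetical, and `LexerEOFError` is defined locally as a stand-in for the exception of the same name.

```python
class LexerEOFError(Exception):
    """Local stand-in for the exception named above."""


def finalize_rest(rest):
    """Hypothetical end-of-input step.

    Returns leftover plain text so it can still be reported, and raises
    LexerEOFError if the rest buffer holds unterminated markup.
    """
    if rest.startswith("<"):
        raise LexerEOFError("unterminated markup at end of input: %r" % (rest,))
    return rest
```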

cdata(self, normalize, name)

Set CDATA state
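In the CDATA state, content such as the body of a `script` element is passed through as text until the matching end tag. The parameters of cdata() are not documented here. The sketch below assumes that `name` is the element whose end tag terminates the CDATA section and that `normalize` requests case-insensitive name matching. `find_cdata_end` is a hypothetical helper, not part of TDI.

```python
import re


def find_cdata_end(data, name, normalize=True):
    """Hypothetical helper: index where CDATA content for element `name`
    ends (the start of its end tag), or -1 if the end tag is not seen yet.

    Assumption: normalize=True means the tag name is matched
    case-insensitively.
    """
    flags = re.IGNORECASE if normalize else 0
    # "</name" must be followed by whitespace, "/", ">" or the end of data,
    # so "</scripts>" does not end a "script" section.
    pattern = re.compile(r"</%s(?=[\s/>]|$)" % re.escape(name), flags)
    match = pattern.search(data)
    return match.start() if match else -1
```

Text such as `'<p>'` inside a script body is not treated as a tag, because only the matching end tag ends the section.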

Class Variable Details

CDATA

Lexer state CDATA (between (P)CDATA tags)

Type: int

Value: 2

COMMENT

Lexer state COMMENT (<!--)

Type: int

Value: 6

DECL

Lexer state DECL (<!)

Type: int

Value: 8

EMPTY

Lexer state EMPTY (<>)

Type: int

Value: 10

ENDTAG

Lexer state ENDTAG (</)

Type: int

Value: 5

FINAL

Lexer state FINAL

Type: int

Value: 0

MARKUP

Lexer state MARKUP (<)

Type: int

Value: 3

MSECTION

Lexer state MSECTION (<![)

Type: int

Value: 7

PI

Lexer state PI (<?)

Type: int

Value: 9

STARTTAG

Lexer state STARTTAG (<[letter])

Type: int

Value: 4

TEXT

Lexer state TEXT (between tags)

Type: int

Value: 1
