Python: module spug.web.xmlo

spug.web.xmlo

This module provides an object model for capturing XML documents that I think is superior to DOM for most purposes.

Modules

copy
spug.web.cxml
os
re
weakref
_xmlplus

Classes



web.xmlo.Element
web.xmlo.Translator
xml.sax.handler.ContentHandler
    web.xmlo.ContentHandler

class ContentHandler(xml.sax.handler.ContentHandler)

    Content handler to parse an Element from an XML document.

Methods defined here:

__init__(self, elementFactory)

characters(self, chars)

endElement(self, name)

endElementNS(self, name, namespace)

getDocNode(self)
Returns the root element of the document.  Returns *None* if the document has not been completely parsed.

gotDocNode(self, elem)
Called when the top-level document element is completely parsed. This may be overridden by derived classes; derived classes choosing to override it should call the base class version so that getDocNode() remains usable.
parms:
   elem::
      [@Element] the root element of the document
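The getDocNode()/gotDocNode() contract can be illustrated with a minimal sketch built directly on the standard xml.sax module. SketchHandler and CountingHandler are hypothetical stand-ins written for this example (they build plain dictionaries rather than @Element instances) and are not the module's implementation.

```python
import xml.sax
import xml.sax.handler


class SketchHandler(xml.sax.handler.ContentHandler):
    """Hypothetical stand-in: builds nested dicts instead of Elements."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self._stack = []      # elements that are still open
        self._docNode = None  # set only once the root element has closed

    def startElement(self, name, attrs):
        node = {"name": name, "attrs": dict(attrs.items()), "contents": []}
        if self._stack:
            self._stack[-1]["contents"].append(node)
        self._stack.append(node)

    def characters(self, chars):
        if self._stack:
            self._stack[-1]["contents"].append(chars)

    def endElement(self, name):
        node = self._stack.pop()
        if not self._stack:
            # The root element just closed: the document is complete.
            self.gotDocNode(node)

    def gotDocNode(self, elem):
        self._docNode = elem

    def getDocNode(self):
        return self._docNode


class CountingHandler(SketchHandler):
    """Overrides gotDocNode() but still calls the base class version."""

    def __init__(self):
        SketchHandler.__init__(self)
        self.rootsSeen = 0

    def gotDocNode(self, elem):
        self.rootsSeen += 1
        SketchHandler.gotDocNode(self, elem)


handler = CountingHandler()
before = handler.getDocNode()  # None: nothing has been parsed yet
xml.sax.parseString(b"<doc><item id='1'>text</item></doc>", handler)
root = handler.getDocNode()    # the root node, available after parsing
```

If CountingHandler.gotDocNode() skipped the base class call, getDocNode() would keep returning None even after a successful parse.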

startElement(self, name, attrs)

startElementNS(self, name, qname, attrs)

startPrefixMapping(self, prefix, uri)

Methods inherited from xml.sax.handler.ContentHandler:

endDocument(self)
Receive notification of the end of a document. The SAX parser will invoke this method only once, and it will be the last method invoked during the parse. The parser shall not invoke this method until it has either abandoned parsing (because of an unrecoverable error) or reached the end of input.

endPrefixMapping(self, prefix)
End the scope of a prefix-URI mapping. See startPrefixMapping for details. This event will always occur after the corresponding endElement event, but the order of endPrefixMapping events is not otherwise guaranteed.

ignorableWhitespace(self, whitespace)
Receive notification of ignorable whitespace in element content. Validating Parsers must use this method to report each chunk of ignorable whitespace (see the W3C XML 1.0 recommendation, section 2.10): non-validating parsers may also use this method if they are capable of parsing and using content models. SAX parsers may return all contiguous whitespace in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity, so that the Locator provides useful information. The application must not attempt to read from the array outside of the specified range.

processingInstruction(self, target, data)
Receive notification of a processing instruction. The Parser will invoke this method once for each processing instruction found: note that processing instructions may occur before or after the main document element. A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a text declaration (XML 1.0, section 4.3.1) using this method.

setDocumentLocator(self, locator)
Called by the parser to give the application a locator for locating the origin of document events. SAX parsers are strongly encouraged (though not absolutely required) to supply a locator: if it does so, it must supply the locator to the application by invoking this method before invoking any of the other methods in the DocumentHandler interface. The locator allows the application to determine the end position of any document-related event, even if the parser is not reporting an error. Typically, the application will use this information for reporting its own errors (such as character content that does not match an application's business rules). The information returned by the locator is probably not sufficient for use with a search engine. Note that the locator will return correct information only during the invocation of the events in this interface. The application should not attempt to use it at any other time.

skippedEntity(self, name)
Receive notification of a skipped entity. The Parser will invoke this method once for each entity skipped. Non-validating processors may skip entities if they have not seen the declarations (because, for example, the entity was declared in an external DTD subset). All processors may skip external entities, depending on the values of the http://xml.org/sax/features/external-general-entities and the http://xml.org/sax/features/external-parameter-entities properties.

startDocument(self)
Receive notification of the beginning of a document. The SAX parser will invoke this method only once, before any other methods in this interface or in DTDHandler (except for setDocumentLocator).

class Element

    Representation of an XML element.  Attributes can be accessed using the dictionary interface; child elements can be accessed using the sequence interface. When accessing attributes, either a string or a tuple of two strings may be used as an accessor: if a tuple of two strings is used, the first string is the namespace and the second string is the name.  A single string refers to an attribute without any namespace.

Methods defined here:

__delitem__(self, accessor)

__getitem__(self, accessor)
Gets the specified attribute or content item, depending on whether /accessor/ is a string (or tuple) or integer.
parms:
   accessor::
      [string or tuple<string, string> or int] If this is a string or tuple, returns the associated attribute.  If it is an integer, returns the associated content item.

      If this is a single string, returns the value of an attribute in the global namespace.  If it is a tuple of two strings, the first string is a namespace URI.

      If this is an integer, returns the value of a child element. Interspersed text is ignored.
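The accessor dispatch described for `__getitem__` might look roughly like the sketch below. AccessorSketch is a hypothetical class written for illustration; in particular, storing attributes under (namespace, name) keys with None for the global namespace is an assumption, not something the documentation states.

```python
class AccessorSketch:
    """Hypothetical stand-in for Element; not the module's implementation."""

    def __init__(self, name, contents=(), attrs=None):
        self.name = name
        self.contents = list(contents)  # strings and AccessorSketch instances
        # Assumption: attributes are keyed by (namespace URI or None, name).
        self.attrs = dict(attrs or {})

    def __getitem__(self, accessor):
        if isinstance(accessor, int):
            # Integer: index into the child elements only, skipping text.
            children = [c for c in self.contents
                        if isinstance(c, AccessorSketch)]
            return children[accessor]
        if isinstance(accessor, str):
            # Single string: an attribute in the global namespace.
            accessor = (None, accessor)
        return self.attrs[accessor]


XLINK = "http://www.w3.org/1999/xlink"
link = AccessorSketch(
    "a",
    ["see ", AccessorSketch("b", ["here"]), " now"],
    {(None, "id"): "x1", (XLINK, "href"): "target.xml"},
)

plainAttr = link["id"]             # attribute without a namespace
nsAttr = link[(XLINK, "href")]     # attribute qualified by a namespace URI
firstChild = link[0]               # first child element, text is skipped
```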

__init__(self, *parms, **kw)
parms:
   /parms/::
      [tuple<string or tuple<string, string>, string or Element...>] The tag name and its children.

      "parms" must contain at least one item, the element name.  This was originally a separate argument, but the choice of a name for it ("name") conflicted with the commonly used "name" attribute passed in through "kw".  Since any identifier could have such an issue, it was decided that the best way to avoid this was to hide it as the first element of "parms".

      The element name is either a simple name or a combination of namespace URI and simple name.  If it is a simple name, the namespace is assumed to be that of the enclosing element.

      The child elements (the remaining arguments) can either be @Element instances or strings.  If they are strings they are assumed to be CTEXT.
   kw::
      [dict<string, string>] attributes.
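The naming conflict that motivated hiding the element name inside "parms" can be reproduced in a few lines. Both naive() and CtorSketch are hypothetical names used only for this sketch; they mimic the signature shown above but are not the module's code.

```python
def naive(name, *children, **kw):
    """The original style: the element name is a named argument."""
    return name, children, kw


try:
    # An XML attribute called "name" collides with the parameter "name".
    naive("input", name="user")
    conflicted = False
except TypeError:
    conflicted = True


class CtorSketch:
    """Hypothetical stand-in using the (*parms, **kw) signature."""

    def __init__(self, *parms, **kw):
        if not parms:
            raise TypeError("the element name is required as the first item")
        self.name = parms[0]             # first positional item: element name
        self.contents = list(parms[1:])  # remaining items: children or text
        self.attrs = dict(kw)            # keyword arguments: attributes


field = CtorSketch("input", "some text", name="user", type="text")
```

With the name taken from the first positional item, any attribute name can be passed as a keyword without colliding with a parameter.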

__len__(self)
Returns the number of content nodes in the element.

__nonzero__(self)
Because we define `__len__`, an element with no contents would otherwise evaluate as false, so we have to define this to always return true so that all instances of @Element are considered "true".
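The interaction between length and truth value is standard Python behaviour and can be shown with two small classes. Both class names are hypothetical; the sketch targets Python 3, where the hook is called `__bool__`, and aliases it to the Python 2 name `__nonzero__` used by this module.

```python
class EmptyAndFalse:
    """Defines only __len__, so an empty instance is falsy."""

    def __len__(self):
        return 0


class EmptyButTrue:
    """Defines __len__ and forces the truth value to be true."""

    def __len__(self):
        return 0

    def __bool__(self):         # Python 3 name for the hook
        return True

    __nonzero__ = __bool__      # Python 2 name for the same hook


withoutHook = bool(EmptyAndFalse())
withHook = bool(EmptyButTrue())
```

This matters for code such as a test on the result of getChild(): a check like "if child:" would otherwise treat an existing but empty element the same way as None.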

__setitem__(self, accessor, value)

append(self, val)
Appends a new element or text string to the contents of the element.
parms:
   val::
      [string or @Element]

copy(self)
Returns a deep copy of the element.

deleteItem(self, index)
Deletes the item at the specified index from the contents. The more "correct" way to do this is through `__delitem__`, i.e. "#del node[index]#".  However, in Python 2.2.3, doing this to a proxy (as returned from the @getParent() method) causes a *SystemError* due to a bug in the weak reference code.

formatTo(self, output, indentContent=0, indent=2)
Writes the element to the given output stream, indenting nested elements.
parms:
   output::
      [file.write] output stream
   indentContent::
      [boolean] if true, multiline content is indented.
   indent::
      [int] number of characters to indent children

get(self, attr, default=None)

getActualNamespace(self)
Returns the actual namespace of the element name.  The "actual namespace" is the specified namespace if there is one; otherwise it is the default namespace.

getAllChildren(self)
Returns all child elements (list<@Element>).  This actually returns a reference to an internal list, so the caller should not modify the value returned.

getChild(self, namespace, name)
Returns the first child with the given name.  Returns None if no such child exists.
parms:
   namespace::
      [string] namespace URI
   name::
      [string]

getChildren(self, namespace, name)
Returns all children with the given name.
parms:
   namespace::
      [string] namespace URI
   name::
      [string]

getDefaultNamespace(self)
Returns the default namespace URI for the node (string or None, None means that the default namespace is the global namespace).

getFullName(self)
Returns the namespace and name of the element (tuple<string, string>).  Note that the namespace is the /specified namespace/, not the /actual namespace/.

getName(self)
Returns the simple unqualified name of the element.

getNamespace(self, prefix)
Returns the namespace URI for the given prefix.  Returns None if no such prefix is defined.
parms:
   prefix::
      [string] namespace prefix

getParent(self)
Returns the parent object (actually, a proxy to the parent object), or *None* if there is no parent.
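A parent link held through a proxy can be sketched with the standard weakref module, which this module imports. ProxyNode is a hypothetical class for illustration; that the module uses weakref.proxy specifically, and does so to avoid parent/child reference cycles, is an assumption.

```python
import weakref


class ProxyNode:
    """Hypothetical tree node whose parent link is a weak proxy."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self._parent = None

    def append(self, child):
        self.children.append(child)
        # The child refers back to its parent without owning it, so the
        # parent and child do not keep each other alive through a cycle.
        child._parent = weakref.proxy(self)

    def getParent(self):
        return self._parent


root = ProxyNode("root")
leaf = ProxyNode("leaf")
root.append(leaf)

parent = leaf.getParent()
parentName = parent.name           # attribute access is forwarded to root
isProxy = type(parent) is weakref.ProxyType
noParent = root.getParent()        # the root has no parent
```

The proxy behaves like the parent for attribute access and method calls, but it is a distinct object, so identity tests such as "parent is root" are false.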

getPrefix(self, uri)
Returns the namespace prefix for the given URI.  Returns None if no prefix is found for this URI.
parms:
   uri::
      [string] namespace URI to find the prefix for

getValue(self)
Returns the value of the element as one big string.  This will raise a ValueError if the element contains nested child elements.
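The behaviour described for getValue() can be approximated as follows. The function get_value and the representation of contents as a list of strings and other objects are assumptions made for this sketch.

```python
def get_value(contents):
    """Joins text contents; rejects anything that is not a string.

    Hypothetical helper: "contents" is a list in which strings are
    character data and any other object stands for a child element.
    """
    parts = []
    for item in contents:
        if not isinstance(item, str):
            raise ValueError("element contains nested child elements")
        parts.append(item)
    return "".join(parts)


textOnly = get_value(["Hello, ", "world"])

try:
    get_value(["text", object()])  # object() stands in for a child element
    rejected = False
except ValueError:
    rejected = True
```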

hasChildren(self)
Returns true if the element has child elements.

hasContents(self)
Returns true if the element has contents (children or character data).

has_key(self, attr)
Returns true if the element has the given attribute.
parms:
   attr::
      [string or tuple<string, string>] attribute to check for.

index(self, item)
Returns the index of the item within the contents list (an integer).
parms:
   item::
      [string or @Element]

insert(self, index, val)

iterate(self, explorer)
Selectively iterates over the tree using the given explorer in a depth-first traversal.
parms:
   explorer::
      [callable<any>] This object will be called for the node. The value it returns indicates how the iteration proceeds. The following return values are allowed:

         EXPAND::
            Iterate over the children of the node.
         CONTINUE::
            Do not iterate over the children of the node, continue iteration with the next node in the parent's child list.
         ABORT::
            Abort iteration.
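The three return values can be seen in action with a small stand-alone traversal. WalkNode and the module-level iterate() function are hypothetical; the constants use the values listed for Element (EXPAND = 1, CONTINUE = 2, ABORT = 3), and calling the explorer for elements only, not for text, is an assumption.

```python
EXPAND, CONTINUE, ABORT = 1, 2, 3


class WalkNode:
    """Hypothetical minimal tree node with a name and child nodes."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


def iterate(node, explorer):
    """Depth-first walk; returns False if the explorer aborted."""
    action = explorer(node)
    if action == ABORT:
        return False
    if action == EXPAND:
        for child in node.children:
            if not iterate(child, explorer):
                return False
    # CONTINUE falls through: the children are skipped and the caller
    # moves on to the next sibling.
    return True


tree = WalkNode(
    "doc",
    WalkNode("head", WalkNode("title")),
    WalkNode("body", WalkNode("p"), WalkNode("stop"), WalkNode("never")),
)

visited = []


def explorer(node):
    visited.append(node.name)
    if node.name == "head":
        return CONTINUE  # do not descend into <head>
    if node.name == "stop":
        return ABORT     # end the whole traversal here
    return EXPAND


completed = iterate(tree, explorer)
```

In this run "title" is never visited because its parent returned CONTINUE, and "never" is never visited because the traversal was aborted at "stop".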

setName = __setName(self, name)
Sets the full name of the element.
parms:
   name::
      [tuple<string, string> or string] Either an unqualified name or a tuple of namespace URI and local name.

setValue = __setValue(self, newValue)
Deletes all current contents of the element and replaces them with the new value.
parms:
   newValue::
      [string]

setdefault(self, attr, default)

strip(self, stripContent=0)
Strips all "unnecessary" whitespace from the element and all nested elements.  Unnecessary whitespace is whitespace between child elements; when using XML for structured data representation, this is usually just for formatting.
parms:
   stripContent::
      [boolean] if true, indicates that indentation of content elements (nodes with no nested children

stripEmptyContents(self)
If the element contents consist of nothing but whitespace, deletes all contents.  Otherwise makes no changes to the element. Returns true if contents were removed, false if not. This can be used to "clean out" whitespace content after removing a child element.
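The rule for stripEmptyContents() can be expressed compactly on a plain contents list. The function strip_empty_contents and the list representation (strings for character data, other objects for child elements) are assumptions of this sketch.

```python
def strip_empty_contents(contents):
    """Empties the list in place if it holds only whitespace strings.

    Returns True if anything was removed, False otherwise.
    """
    onlyWhitespace = all(
        isinstance(item, str) and not item.strip() for item in contents
    )
    if contents and onlyWhitespace:
        del contents[:]
        return True
    return False


# Typical leftovers after the only child element has been removed.
leftovers = ["\n    ", "\n"]
removed = strip_empty_contents(leftovers)

# Real text is present, so nothing is changed.
mixed = ["\n  ", "text"]
untouched = strip_empty_contents(mixed)
```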

writeTo(self, output)
Writes the element to the given output stream.
parms:
   output::
      [file.write]

Data and other attributes defined here:

ABORT = 3

CONTINUE = 2

EXPAND = 1

class Translator

Methods defined here:

__init__(self, rx, subst)

sub(self, match)

translate(self, data)

Functions

encodeAsAttrVal(data)

encodeChars(data)

loadXML(fileName, elementFactory=None)
Loads an XML or CXML file given a filename. Returns the document object (an @Element).
parms:
   fileName::
      [string] name of the file to load.
   elementFactory::
      [@spug.web.cxml.ElementFactory or None] object used to create elements. If *None*, the @spug.web.cxml.DefaultElementFactory is used.