A simple "pull API" for HTML parsing, after Perl's HTML::TokeParser.

Examples

This program extracts all links from a document. It prints one line
for each link, containing the URL and the textual description between
the <a>...</a> tags:

import pullparser, sys
f = file(sys.argv[1])
p = pullparser.PullParser(f)
for token in p.tags("a"):
    if token.type == "endtag": continue
    url = dict(token.attrs).get("href", "-")
    text = p.get_compressed_text(endat=("endtag", "a"))
    print "%s %s" % (url, text)

This program extracts the <title> from the document:

import pullparser, sys
f = file(sys.argv[1])
p = pullparser.PullParser(f)
if p.get_tag("title"):
    title = p.get_compressed_text()
    print "Title: %s" % title

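The two examples above rely on the "pull" style: the caller asks for the next tag or run of text instead of receiving callbacks. A minimal sketch of that idea follows, written for Python 3 on top of the standard library's html.parser. It is not this module's implementation. The class name PullSketch is hypothetical, it parses the whole document eagerly rather than lazily, and its get_tag returns None when the tokens run out, which is a simplification chosen for the sketch.

```python
from collections import deque
from html.parser import HTMLParser


class PullSketch(HTMLParser):
    """Hypothetical pull-style wrapper: queue parser events, hand them out on request."""

    def __init__(self, html):
        super().__init__(convert_charrefs=True)
        self._tokens = deque()
        # Assumption for this sketch: parse everything up front.
        self.feed(html)
        self.close()

    # HTMLParser pushes events into these callbacks; queue them as 3-tuples.
    def handle_starttag(self, tag, attrs):
        self._tokens.append(("starttag", tag, attrs))

    def handle_endtag(self, tag):
        self._tokens.append(("endtag", tag, None))

    def handle_data(self, data):
        self._tokens.append(("data", data, None))

    def get_tag(self, *names):
        """Discard tokens until a start or end tag named in `names`; None if exhausted."""
        while self._tokens:
            tok = self._tokens.popleft()
            if tok[0] in ("starttag", "endtag") and (not names or tok[1] in names):
                return tok
        return None

    def get_compressed_text(self, endat=None):
        """Collect text up to the next tag (or up to the `endat` tag), collapsing whitespace."""
        parts = []
        while self._tokens:
            tok = self._tokens[0]          # peek without consuming
            if tok[0] == "data":
                parts.append(tok[1])
            elif endat is None or tok[:2] == endat:
                break                      # leave the terminating tag in the queue
            self._tokens.popleft()
        return " ".join("".join(parts).split())


doc = ('<html><title> Example   page </title>'
       '<a href="http://www.python.org/">Python\n home</a></html>')
p = PullSketch(doc)
title = p.get_compressed_text() if p.get_tag("title") else None
link = p.get_tag("a")
url = dict(link[2]).get("href", "-")
text = p.get_compressed_text(endat=("endtag", "a"))
print("Title: %s" % title)
print("%s\t%s" % (url, text))
```

The queue is what turns the parser's push callbacks into something the caller can pull from. A lazy variant would feed the underlying parser in chunks only when the queue is empty.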
Copyright 2003-2006 John J. Lee
Copyright 1998-2001 Gisle Aas (original libwww-perl code)
This code is free software; you can redistribute it and/or modify it
under the terms of the BSD or ZPL 2.1 licenses.

Token

Represents an HTML tag, declaration, processing instruction etc.
Behaves as a tuple-like object (i.e. it is iterable) and also has the
attributes .type, .data and .attrs.
>>> t = Token("starttag", "a", [("href", "http://www.python.org/")])
>>> t == ("starttag", "a", [("href", "http://www.python.org/")])
True
>>> (t.type, t.data) == ("starttag", "a")
True
>>> t.attrs == [("href", "http://www.python.org/")]
True

Public attributes

type: one of "starttag", "endtag", "startendtag", "charref", "entityref",
    "data", "comment", "decl", "pi", after the corresponding methods of
    HTMLParser.HTMLParser
data: for a tag, the tag name; otherwise, the relevant data carried by the
    tag, as a string
attrs: list of (name, value) pairs representing HTML attributes
    (or None if the token does not represent an opening tag)
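
The tuple-like behaviour described above can be illustrated with a short sketch. It is written for Python 3 from the docstring alone and is not the module's source. The name TokenSketch is hypothetical, and the default of None for attrs is an assumption.

```python
class TokenSketch:
    """Hypothetical stand-in for Token, written from the docstring alone."""

    def __init__(self, type, data, attrs=None):  # attrs=None default is assumed
        self.type = type      # e.g. "starttag", "endtag", "data"
        self.data = data      # tag name, or the data carried by the token
        self.attrs = attrs    # list of (name, value) pairs, or None

    def __iter__(self):
        # Yield the three fields in order, so the object unpacks like a 3-tuple.
        return iter((self.type, self.data, self.attrs))

    def __eq__(self, other):
        # Compare field by field against any iterable of three items.
        try:
            return tuple(self) == tuple(other)
        except TypeError:
            return NotImplemented


t = TokenSketch("starttag", "a", [("href", "http://www.python.org/")])
kind, name, attrs = t                    # tuple-style unpacking
print(kind, name, dict(attrs)["href"])
print(t == ("starttag", "a", [("href", "http://www.python.org/")]))
```

Because iteration yields (type, data, attrs), the same object works with attribute access, unpacking, and comparison against a plain tuple.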