mò %U²Ic@sdZdkZdkZeidƒZeidƒZeidƒZeidƒZeidƒZeidƒZ eidƒZ eid ƒZ eid ƒZ eid ƒZ eid eiƒZeidƒZeid ƒZdefd„ƒYZdeifd„ƒYZdS(sA parser for HTML and XHTML.Ns[&<]s<(/|\Z)s &[a-zA-Z#]s%&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]s)&#(?:[0-9]+|[xX][0-9a-fA-F]+)[^0-9a-fA-F]s <[a-zA-Z]t>s--\s*>s[a-zA-Z][-.a-zA-Z0-9:_]*s_\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]*))?sê <[a-zA-Z][-.a-zA-Z0-9:_]* # tag name (?:\s+ # whitespace before attribute name (?:[a-zA-Z_][-.:a-zA-Z0-9_]* # attribute name (?:\s*=\s* # value indicator (?:'[^']*' # LITA-enclosed value |\"[^\"]*\" # LIT-enclosed value |[^'\">\s]+ # bare value ) )? ) )* \s* # trailing whitespace s#tHTMLParseErrorcBs)tZdZeefd„Zd„ZRS(s&Exception raised for all parse errors.cCs'||_|d|_|d|_dS(Nii(tmsgtselftpositiontlinenotoffset(RRR((t'/data/zmath/lib/python2.4/HTMLParser.pyt__init__4s  cCs[|i}|idj o|d|i}n|idj o|d|id}n|S(Ns , at line %ds , column %di(RRtresultRtNoneR(RR ((Rt__str__:s  (t__name__t __module__t__doc__R RR (((RR1s t HTMLParsercBsòtZdZdZd„Zd„Zd„Zd„Zd„Ze Z d„Z d „Z d „Z d „Zd „Zd „Zd„Zd„Zd„Zd„Zd„Zd„Zd„Zd„Zd„Zd„Zd„Zd„Zd„ZRS(sÇFind tags and other markup and call handler functions. Usage: p = HTMLParser() p.feed(data) ... p.close() Start tags are handled by calling self.handle_starttag() or self.handle_startendtag(); end tags by self.handle_endtag(). The data between tags is passed from the parser to the derived class by calling self.handle_data() with the data as argument (the data may be split up in arbitrary chunks). Entity references are passed by calling self.handle_entityref() with the entity reference as the argument. Numeric character references are passed to self.handle_charref() with the string containing the reference as the argument. tscripttstylecCs|iƒdS(s#Initialize and reset this instance.N(Rtreset(R((RRZscCs/d|_d|_t|_tii|ƒdS(s1Reset this instance. Loses all unprocessed data.ts???N(Rtrawdatatlasttagtinteresting_normalt interestingt markupbaset ParserBaseR(R((RR^s    cCs!|i||_|idƒdS(sFeed data to the parser. Call this as often as you want, with as little or as much text as you want (may include ' '). iN(RRtdatatgoahead(RR((RtfeedescCs|idƒdS(sHandle any buffered data.iN(RR(R((RtclosenscCst||iƒƒ‚dS(N(RtmessageRtgetpos(RR((RterrorrscCs|iS(s)Return full source of start tag: '<...>'.N(Rt_HTMLParser__starttag_text(R((Rtget_starttag_textwscCs t|_dS(N(tinteresting_cdataRR(R((Rtset_cdata_mode{scCs t|_dS(N(RRR(R((Rtclear_cdata_mode~sc CsÝ|i}d}t|ƒ}xp||job|ii||ƒ} | o| iƒ}n|}||jo|i |||!ƒn|i ||ƒ}||joPn|i }|d|ƒot i||ƒo|i|ƒ}nº|d|ƒo|i|ƒ}n—|d|ƒo|i|ƒ}nt|d|ƒo|i|ƒ}nQ|d|ƒo|i|ƒ}n.|d|jo|i dƒ|d}nP|djo|o|idƒnPn|i ||ƒ}q|d |ƒo…ti||ƒ} | og| iƒd d !}|i|ƒ| iƒ}|d |dƒp|d}n|i ||ƒ}qq‰Pq|d |ƒoti||ƒ} | oc| idƒ}|i|ƒ| iƒ}|d |dƒp|d}n|i ||ƒ}qnti||ƒ} | o4|o(| iƒ||jo|idƒnPq‰|d|jo'|i d ƒ|i ||dƒ}q‰PqqW|o7||jo*|i |||!ƒ|i ||ƒ}n|||_dS(Nits s junk characters in start tag: %ri(Rs/>($R RR!tcheck_for_whole_start_tagR)tendposRtattrsttagfindR-R:R5tlowerRttagtattrfindtmR<tattrnametrestt attrvaluetunescapetappendtstripRRRtcountR*trfindR tendswiththandle_startendtagthandle_starttagtCDATA_CONTENT_ELEMENTSR$(RR)R:RORLRMRRJRRRNR-R5RFRG((RR4àsL      L  ##cCs|i}ti||ƒ}|oí|iƒ}|||d!}|djo |dSn|djo_|i d|ƒo |dSn|i d|ƒodSn|i ||dƒ|i dƒn|djodSn|d jodSn|i ||ƒ|i d ƒnt d ƒ‚dS( NiRt/s/>iiÿÿÿÿsmalformed empty start tagRs6abcdefghijklmnopqrstuvwxyz=/ABCDEFGHIJKLMNOPQRSTUVWXYZsmalformed start tagswe should not get here!( RRtlocatestarttagendR-R)RLR:R/tnextR2R1R tAssertionError(RR)R/RLR[R((RREs*        cCs¢|i}ti||dƒ}|pdSn|iƒ}ti||ƒ}|p|i d|||!fƒn|i dƒ}|i |i ƒƒ|iƒ|S(Niiÿÿÿÿsbad end tag: %r(RRt endendtagR,R)R-R:R/t endtagfindR R<RJt handle_endtagRIR%(RR)R/RJRR-((RR61s   cCs!|i||ƒ|i|ƒdS(N(RRWRJRGR_(RRJRG((RRVAscCsdS(N((RRJRG((RRWFscCsdS(N((RRJ((RR_JscCsdS(N((RR=((RR>NscCsdS(N((RR=((RR@RscCsdS(N((RR((RR0VscCsdS(N((RR((Rthandle_commentZscCsdS(N((Rtdecl((Rt handle_decl^scCsdS(N((RR((RRCbscCs|id|fƒdS(Nsunknown declaration: %r(RR R(RR((Rt unknown_declescCssd|jo|Sn|iddƒ}|iddƒ}|iddƒ}|idd ƒ}|id dƒ}|S( NR(s<R&s>Rs't's"RDs&(tstreplace(RRe((RRPis (sscriptR(R R RRXRRRRR R R!R"R$R%RR8R4RER6RVRWR_R>R@R0R`RbRCRcRP(((RRCs6         P 3            (RRtretcompileRR#RAR?R;R3RBt commentcloseRHRKtVERBOSERZR]R^t ExceptionRRR(R3RRZRR;RR]R#RHRR?RKRgR^RBRiRA((Rt?s"