mò mã¸Ec@sìdZdkZdkZdkZdkZdklZdkZdkl Z l Z dZ dZ defd„ƒYZd„Zd „Zd fd „ƒYZedded „Zd fd„ƒYZd„Zdfd„ƒYZdfd„ƒYZdfd„ƒYZd„Zd„Zy dkZWnej on6XdkZei dƒe_!deifd„ƒYZ"dfd„ƒYZ#defd„ƒYZ$dfd „ƒYZ%d!fd"„ƒYZ&d#e&fd$„ƒYZ'd%e&fd&„ƒYZ(dS('sñHTML handling. Copyright 2003-2006 John J. Lee This code is free software; you can redistribute it and/or modify it under the terms of the BSD or ZPL 2.1 licenses (see the file COPYING.txt included with the distribution). N(surljoin(ssplit_header_wordssis_htmls!*'();:@&=+$,/?%#[]~slatin-1tCachingGeneratorFunctioncBs tZdZd„Zd„ZRS(s½Caching wrapper around a no-arguments iterable. >>> i = [1] >>> func = CachingGeneratorFunction(i) >>> list(func()) [1] >>> list(func()) [1] >>> i = [1, 2, 3] >>> func = CachingGeneratorFunction(i) >>> list(func()) [1, 2, 3] >>> i = func() >>> i.next() 1 >>> i.next() 2 >>> i.next() 3 >>> i = func() >>> j = func() >>> i.next() 1 >>> j.next() 1 >>> i.next() 2 >>> j.next() 2 >>> j.next() 3 >>> i.next() 3 >>> i.next() Traceback (most recent call last): ... StopIteration >>> j.next() Traceback (most recent call last): ... StopIteration cs%‡d†}g|_|ƒ|_dS(Nc#sxˆD] }|VqWdS(N(titerabletitem(R(R(t./data/zmath/zope/lib/python/mechanize/_html.pytmake_genTs(Rtselft_cachet _generator(RRR((RRt__init__Ss  ccsG|i}x|D] }|VqWx"|iD]}|i|ƒ|Vq(WdS(N(RRtcacheRRtappend(RRR ((Rt__call__[s   (t__name__t __module__t__doc__RR (((RR%s - cs‡d†}|S(Ncs]xV|iƒidƒD]?}x6t|gƒdD]!\}}|djo|Sq0q0WqWˆS(Ns content-typeitcharset(tresponsetinfot getheaderstcttsplit_header_wordstktvtdefault_encoding(RRRR(R(Rtencodinges  (R(RR((RRtencoding_finderds cs‡d†}|S(Ncs1|iƒidƒ}|iƒ}t||ˆƒS(Ns content-type(RRRtct_hdrstgeturlturlt_is_htmlt allow_xhtml(RRRR(R(Rtis_htmlqs (R(RR((RRt make_is_htmlps tArgscBstZd„Zd„ZRS(NcCst|ƒ|_dS(N(tdicttargs_mapRt dictionary(RR#((RRzscCs9y|i|SWn#tj ot|i|ƒSnXdS(N(RR$tkeytKeyErrortgetattrt __class__(RR%((Rt __getattr__|s(R R RR)(((RR!ys cCs ttƒƒS(N(R!tlocals(tselect_defaulttform_parser_classt request_classtbackwards_compat((Rtform_parser_args‚stLinkcBs#tZd„Zd„Zd„ZRS(NcCsfd|||gjpt‚||_t||ƒ|_||||f\|_|_ |_|_dS(N( tNoneRttagtattrstAssertionErrortbase_urlRturljoint absolute_urlttext(RR5RR8R2R3((RRŒs cCsZy<x5dD]-}t||ƒt||ƒjodSq q WWntj o dSnXdS(NRR8R2R3iÿÿÿÿi(surlstextstagsattrs(tnameR'RtothertAttributeError(RR:R9((Rt__cmp__‘s cCs&d|i|i|i|i|ifS(Ns4Link(base_url=%r, url=%r, text=%r, tag=%r, attrs=%r)(RR5RR8R2R3(R((Rt__repr__™s(R R RR<R=(((RR0‹s  cCsTt|ƒtdƒjo|i|dƒ}n|iƒ}ti|i|ƒtƒS(Nttreplace( ttypeRtdecodeRtstripturllibtquotetencodetURLQUOTE_SAFE_URL_CHARS(RR((Rt clean_urlžs t LinksFactorycBs,tZdedd„Zd„Zd„ZRS(NcCsdk}|djo |i}n||_||_|djo.hdd<dd<dd<dd<}n||_d|_d|_dS(Ntathreftareatframetsrctiframe( t _pullparsertlink_parser_classR1tTolerantPullParserRt link_classturltagst _responset _encoding(RRPRRRSRO((RR­s      .  cCs||_||_||_dS(N(RRRTRRUR5t _base_url(RRR5R((Rt set_responseÂs  c csW|i} |i}|i}|i| d|ƒ}x |i |i i ƒdgŒD]ÿ}|i djot|iƒidƒ}qPn|idjoqPnt|iƒ} |i }| idƒ}d}| i|i |ƒ}|pqPnt||ƒ}|djo-|idjo|id|fƒ}q6nt|||||iƒVqPWdS( s7Return an iterator that provides links of the document.RtbaseRJtendtagR9RIt startendtagN(RRTRRURRVR5RPtpttagsRStkeysttokentdataR"R3tgetR@R2R9R1R8RRGtget_compressed_textR0( RR9RRR8R5R[R^R2R3R((RtlinksÇs0      (R R R1R0RRWRb(((RRH«s t FormsFactorycBs5tZdZeeeed„Zd„Zd„ZRS(sŸMakes a sequence of objects satisfying ClientForm.HTMLForm interface. For constructor argument docs, see ClientForm.ParseResponse argument docs. cCswdk}||_|djo |i}n||_|djo ti}n||_||_ d|_ d|_ dS(N( t ClientFormR+RR,R1t FormParserR-t_requesttRequestR.RTR(RR+R,R-R.Rd((RRõs          cCs||_||_dS(N(RRRTR(RRR((RRWs c CsLdk}|i}|i|id|id|id|id|id|ƒS(NR+R,R-R.R( RdRRt ParseResponseRTR+R,R-R.(RRdR((Rtforms s       (R R RtFalseR1RRWRi(((RRcìs  t TitleFactorycBs#tZd„Zd„Zd„ZRS(NcCsd|_|_dS(N(R1RRTRU(R((RRscCs||_||_dS(N(RRRTRRU(RRR((RRWs cCs`dk}|i|id|iƒ}y|idƒWn|ij o dSn X|i ƒSdS(NRttitle( RORQRRTRUR[tget_tagtNoMoreTokensErrorR1tget_text(RROR[((RRls  (R R RRWRl(((RRks  csD|djp d|jo|Sn‡‡d†}tid||ƒS(Nt&cs¾|iƒ}|ddjot|dd!ˆƒSnˆi|dd!ƒ}|dj o\t |ƒ}t |ƒt dƒjo3y|i ˆƒ}Wq°t j o |}q°Xqºn|}|S(Nit#iiÿÿÿÿR>( tmatchtgrouptenttunescape_charrefRtentitiesR`treplR1tunichrR@REt UnicodeError(RrRwRt(RRv(Rtreplace_entities/s   s&#?[A-Za-z0-9]+?;(R_R1Rztretsub(R_RvRRz((RvRRtunescape+scCs—|d}}|idƒo|dd}}ntt||ƒƒ}|djo|Sn8y|i |ƒ}Wnt j od|}nX|SdS(Ni txiis&#%s;( R_R9RXt startswithRxtinttucRR1RERwRy(R_RR9RwRXR((RRuBs  s&#(x?[0-9a-fA-F]+)[^0-9a-fA-F]t MechanizeBscBsntZeiZeidƒd„feidƒd„fgZde e d„Z d„Z d„Z d„Z RS( Ns (<[^<>]*)/>cCs|}|idƒdS(Nis />(R~Rs(t.0R~((Rt]ss]*)>cCs|}d|idƒdS(Ns(R~Rs(RƒR~((RR„_scCs&||_tii||||ƒdS(N(RRRUt BeautifulSoupRR8tavoidParserProblemstinitialTextIsEverything(RRR8R‡Rˆ((RRbs cCs-td||i|iƒ}|i|ƒdS(Ns&#%s;(R}trefRt _entitydefsRUttt handle_data(RR‰R‹((Rthandle_charrefhscCs-td||i|iƒ}|i|ƒdS(Ns&%s;(R}R‰RRŠRUR‹RŒ(RR‰R‹((Rthandle_entityrefkscCsLg}x?|D]7\}}t||i|iƒ}|i||fƒq W|S(N( t escaped_attrsR3R%tvalR}RRŠRUR (RR3RRR%((Rtunescape_attrsns  (R R thtmlentitydefstname2codepointRŠR{tcompiletPARSER_MASSAGER1tTrueRRRŽR‘(((RR‚Ys  0  tRobustLinksFactorycBs;tZeidƒZdedd„Zd„Zd„Z RS(Ns\s+cCs•dk}|djo t}n||_||_|djo.hdd<dd<dd<dd<}n||_d|_d|_d|_ dS(NRIRJRKRLRMRN( R†RPR1R‚RRRRSt_bsRURV(RRPRRRSR†((RRys      .   cCs||_||_||_dS(N(tsoupRR˜R5RVRRU(RR™R5R((Rtset_soups  c cs‚dk}|i} |i}|i}| iƒ} xK| iƒD]=}t ||i ƒo$|i |iiƒdgjo|} | i| iƒ} t| ƒ}| i djo|idƒ}q=n|i| i }|i|ƒ}|pq=nt||ƒ}| id„ƒ}||ijo$| i djo d}q]d}n|iid|iƒƒ}t |||| i | ƒVq=q=WdS(NRXRJcCstS(N(R–(R‹((RR„¨sRIR>t (!R†RR˜tbsRVR5RURtrecursiveChildGeneratortgentcht isinstancetTagR9RSR]tlinkR‘R3R"t attrs_dictR`turl_attrRRGt firstTextR8tNullR1t compress_reR|RBR0( RR£RŸRRR8R¤R5R†R¢R3RœRž((RRb”s4      3   ( R R R{R”R§R1R0RRšRb(((RR—us tRobustFormsFactorycBstZd„Zd„ZRS(NcOsOdk}t||Ž}|idjo|i|_nti||i dS(N( RdR/targstkwdsR,R1tRobustFormParserRcRRR$(RR©RªRd((RRµs  cCs||_||_dS(N(RRRTR(RRR((RRW¼s (R R RRW(((RR¨´s tRobustTitleFactorycBs#tZd„Zd„Zd„ZRS(NcCsd|_|_dS(N(R1RR˜RU(R((RRÂscCs||_||_dS(N(R™RR˜RRU(RR™R((RRšÅs cCsGdk}tiidƒ}||ijodSn|id„ƒSdS(NRlcCstS(N(R–(R‹((RR„Ïs(R†RR˜tfirstRlR¦R1R¥(R™R†Rl((RRlÉs  (R R RRšRl(((RR¬Ás  tFactorycBsYtZdZeeƒedeƒd„Zd„Zd„Z d„Z d„Z d„Z RS(s/Factory for forms, links, etc. This interface may expand in future. Public methods: set_request_class(request_class) set_response(response) forms() links() Public attributes: encoding: string specifying the encoding of response if it contains a text document (this value is left unspecified for documents that do not have an encoding, e.g. an image file) is_html: true if response contains an HTML document (XHTML may be regarded as HTML too) title: page title, or None if no title or not HTML RcCs>||_||_||_||_||_ |i dƒdS(s– Pass keyword arguments only. default_encoding: character encoding to use if encoding cannot be determined (or guessed) from the response. You should turn on HTTP-EQUIV handling if you want the best chance of getting this right without resorting to this default. The default value of this parameter (currently latin-1) may change in future. N( t forms_factoryRt_forms_factoryt links_factoryt_links_factoryt title_factoryt_title_factoryt get_encodingt _get_encodingt is_html_pt _is_html_pRWR1(RR¯R±R³RµR·((RRés     cCs||i_dS(sSet urllib2.Request class. ClientForm.HTMLForm instances returned by .forms() will return instances of this class when .click()ed. N(R-RR°(RR-((Rtset_request_classscCsg||_d|_|_d|_x>dddgD]-}yt||ƒWq2t j oq2Xq2WdS(sSet response. The response must implement the same interface as objects returned by urllib2.urlopen(). RRRlN( RRRTR1t _forms_genft _links_genft _get_titleR9tdelattrR;(RRR9((RRW s  cCsç|dddgjot|i|ƒSnz¥|djo |i|iƒ|_|iSnu|djo&|i|i|iƒ|_|iSnB|djo4|io|i i ƒ|_ n d|_ |i SnWd|ii dƒXdS(NRRRli( R9R'RR(R¶RTRR¸RR´RlR1tseek(RR9((RR)s        cCs6|idjot|iiƒƒ|_n|iƒS(s6Return iterable over ClientForm.HTMLForm-like objects.N(RRºR1RR°Ri(R((RRi-scCs6|idjot|iiƒƒ|_n|iƒS(s1Return iterable over mechanize.Link-like objects.N(RR»R1RR²Rb(R((RRb4s( R R RRtDEFAULT_ENCODINGR RjRR¹RWR)RiRb(((RR®Òs    tDefaultFactorycBs#tZdZed„Zd„ZRS(sBased on sgmllib.c Cs;ti|dtƒdtƒdtƒdtd|ƒƒdS(NR¯R±R³R·R(R®RRRcRHRkR ti_want_broken_xhtml_support(RRÁ((RR=s     cCsŽti||ƒ|dj om|iiti|ƒ|iƒ|iiti|ƒ|i i ƒ|iƒ|i iti|ƒ|iƒndS(N( R®RWRRR1R°tcopyRR²RTRR´(RR((RRWFs  +(R R RRjRRW(((RRÀ;s  t RobustFactorycBs&tZdZeed„Zd„ZRS(saBased on BeautifulSoup, hopefully a bit more robust to bad HTML than is DefaultFactory. c Cs[ti|dtƒdtƒdtƒdtd|ƒƒ|djo t }n||_ dS(NR¯R±R³R·R( R®RRR¨R—R¬R RÁt soup_classR1R‚t _soup_class(RRÁRÄ((RRUs      cCsšdk}ti||ƒ|dj op|iƒ}|i|i |ƒ}|i i||i ƒ|i i ||iƒ|i ƒ|ii ||i ƒndS(N(R†R®RWRRR1treadR_RÅRR™R°R²RšRR´(RRR†R™R_((RRWbs   (R R RRjR1RRW(((RRÃPs  ()RR{RÂRCR’turlparseR6Rft _headersutilRRRRFR¿tobjectRRR R!RjR1R/R0RGRHRcRkR}RuR†t ImportErrortsgmllibR”tcharrefR‚R—R¨R¬R®RÀRÃ(RfRÃRFR®RRHRÀRËRR¬R!RR†RCR/R{R0RcR¨R—R6R’R‚R}RÂRRuR¿RGRkR ((Rt? s>$  ?    A+    ? i