m 0Ec@sdZyeWnej odZdZnXyeWnej odZnXy dkZWnej odZn)XeidZ ea dZdZ dk Z dk Z dkZdkZdkZdkZdkZdkZdkZdkZd klZd klZy dkZWnej od Zn Xd Zd ZdZdZeidZdZ edZ!edZ"dZ#dZ$dZ%dZ&dZ'dfdYZ(de)fdYZ*de*fdYZ+de*fd YZ,d!e*fd"YZ-d#e)fd$YZ.d%e/fd&YZ0d'fd(YZ1y dk2Z2Wn&ej od)fd*YZ3nXd)e1e2i2fd+YZ3dk4Z4eid,e4_5d-e1fd.YZ6d/e6e4i7fd0YZ8y1e i9d1 d1d1fjo endk:Z:Wnej onPXd2e6fd3YZ;d4e;e:i:fd5YZ<d6e;e:i=fd7YZ>eee8ei?e@eed8ZAeee8ei?e@eed9ZBd:fd;YZCd<ZDd=fd>YZEd?eEfd@YZFdAeFfdBYZGdCeFfdDYZHdEeFfdFYZIdGeFfdHYZJdIfdJYZKdKZLdLeEfdMYZMdNeMfdOYZNdPeMfdQYZOdReMfdSYZPdTeFfdUYZQdVeQfdWYZRdXeGfdYYZSdZeGfd[YZTd\eGfd]YZUd^eQfd_YZVd`ZWdafdbYZXdS(csHTML form handling for web clients. ClientForm is a Python module for handling HTML forms on the client side, useful for parsing HTML forms, filling them in and returning the completed forms to the server. It has developed from a port of Gisle Aas' Perl module HTML::Form, from the libwww-perl library, but the interface is not the same. The most useful docstring is the one for HTMLForm. RFC 1866: HTML 2.0 RFC 1867: Form-based File Upload in HTML RFC 2388: Returning Values from Forms: multipart/form-data HTML 3.2 Specification, W3C Recommendation 14 January 1997 (for ISINDEX) HTML 4.01 Specification, W3C Recommendation 24 December 1999 Copyright 2002-2006 John J. Lee Copyright 2005 Gary Poster Copyright 2005 Zope Corporation Copyright 1998-2000 Gisle Aas. This code is free software; you can redistribute it and/or modify it under the terms of the BSD License (see the file COPYING included with the distribution). 
iicCs|otSntSdS(N(texprtTruetFalse(R((t)/data/zmath/zope/lib/python/ClientForm.pytboolLsNcOsdS(N((tmsgtargstkwds((RtdebugSst ClientFormcOsutodSny tWn&tidiiiii}nXd|}|f|}t i|||}dS(Nis%%s %s(tOPTIMIZATION_HACKt Exceptiontsystexc_infottb_frametf_backtf_codetco_namet caller_nameRt extended_msgRt extended_argst_loggerRR(RRRRRRR((RRYs #  cCsItatitititi}|ititi |dS(N( RR RtsetLeveltloggingtDEBUGt StreamHandlerR tstdoutthandlert addHandler(R((Rt_show_debug_messagesfs (surljoin(sStringIOcCsdS(N((tmessage((Rt deprecationvscCsti|tdddS(Nt stackleveli(twarningstwarnRtDeprecationWarning(R((RRyss0.2.2islatin-1s\s+cCstid|iS(Nt (t _compress_retsubttexttstrip(R'((Rt compress_textsc CsKt|do|i}nyDt|}t|o't|dtijo tnWn7tj o+t i \}}}td|nXg}|pZx|D]K\}}tit|}tit|}|i|d|qWn:x6|D].\}}tit|}t|tijo(ti|}|i|d|q t|tijo4ti|idd}|i|d|q yt|}Wn=tj o1tit|}|i|d|q Xx2|D]*} |i|dtit| q Wq Wdi|S( svEncode a sequence of two-element tuples or dictionary into a URL query string. If any values in the query arg are sequences and doseq is true, each sequence element is converted to a separate parameter. If the query arg is a sequence of two-element tuples, the order of the parameters in the output will match the order of parameters in the input. 
titemsis1not a valid non-string sequence or mapping objectt=tASCIItreplacet&N(thasattrtqueryR*tlentxttypettypest TupleTypet TypeErrorR R ttytvattbtltdoseqtktvturllibt quote_pluststrtappendt StringTypet UnicodeTypetencodetelttjoin( R0R;R8R7R9R<R=R:R2RE((Rt urlencodesH  '  ,cCsD|djp d|jo|Sn||d}tid||S(NR.cCs|i}|ddjot|dd!|Sn|i|}|dj oPt |t djo3y|i |}Wqt j o |}qXqn|}|S(Nit#iit( tmatchtgrouptenttunescape_charreftencodingtentitiestgettrepltNoneR3RDt UnicodeError(RJRORNRQRL((Rtreplace_entitiess  s&#?[A-Za-z0-9]+?;(tdataRRRORNRTtreR&(RURORNRT((RtunescapescCs|d}}|ido|dd}}ntt||}|djo|Sn8y|i |}Wnt j od|}nX|SdS(Ni R2iis&#%s;( RUtnametbaset startswithtunichrtinttucRNRRRDRQRS(RURNRXRQRYR]((RRMs  cCsdk}dkl}h}y |iWntj oh}x|iiD]d\}}||d}|i do*|i dot |dd!d}n||d| w = MimeWriter(f) ...call w.addheader(key, value) 0 or more times... followed by either: f = w.startbody(content_type) ...call f.write(data) for body data... or: w.startmultipartbody(subtype) for each part: subwriter = w.nextpart() ...use the subwriter's methods to create the subpart... w.lastpart() The subwriter is another MimeWriter instance, and should be treated in the same way as the toplevel MimeWriter. This way, writing recursive body parts is easy. Warning: don't forget to call lastpart()! XXX There should be more state so calls made in the wrong order are detected. Some special cases: - startbody() just returns the file passed to the constructor; but don't use this knowledge, as it may be changed. - startmultipartbody() actually returns a file as well; this can be used to write the initial 'if you can read this your mailer is not MIME-aware' message. - If you call flushheaders(), the headers accumulated so far are written out (and forgotten); this is useful if you don't need a body part at all, e.g. for a subpart of type message/rfc822 that's (mis)used to store some header-like information. 
- Passing a keyword argument 'prefix=' to addheader(), start*body() affects where the header is inserted; 0 means append at the end, 1 means insert at the start; default is append for addheader(), but insert for start*body(), which use it to determine where the Content-type header goes. cCs1||_||_g|_g|_t|_dS(N( t http_hdrstselft _http_hdrstfpt_fpt_headerst _boundaryRt _first_part(RwRyRv((Rt__init__`s     icCs|id}x|o|d o |d=qWx|o|d o |d=q4W|o)di|}|ii||fnx5t dt |D]}d||i ||yt|d|}Wntj oqnX||n X||dS(Ntstart_tdo_(tgetattrRwttagRRdR(RwRRR((Rthandle_starttagscCs8yt|d|}Wntj on X|dS(Ntend_(RRwRRRd(RwRR((Rt handle_endtags cCs |i|S(N(RwRRX(RwRX((RRWscCs|S(N(RX(RwRX((Rtunescape_attr_if_required!scCs|S(N(R(RwR((Rtunescape_attrs_if_required#s( RRRRRRR~RRRRRWRR(((RRs      s&#(x?[0-9a-fA-F]+)[^0-9a-fA-F]t_AbstractSgmllibParsercBs#tZdZdZdZRS(NcCsti||dS(N(RRRwR(RwR((Rt do_option*scCs |i|S(N(RwRRX(RwRX((RR-scCs |i|S(N(RwRR(RwR((RR/s(RRRRR(((RR)s  t FormParsercBstZdZeedZRS(s4Good for tolerance of incorrect HTML, bad for XHTML.cCs'tii|ti|||dS(N(tsgmllibt SGMLParserR~RwRRbRN(RwRbRN((RR~4s(RRRRRRR~(((RR2s it_AbstractBSFormParsercBs&tZdZdedZdZRS(NcCs'ti||||ii|dS(N(RR~RwRbRNt bs_base_class(RwRbRN((RR~AscCs'ti|||ii||dS(N(RRRwRUR(RwRU((RRDs(RRRRRRR~R(((RR?stRobustFormParsercBstZdZeiZRS(s.Tries to be highly tolerant of incorrect HTML.(RRRt BeautifulSoupR(((RRHs tNestingRobustFormParsercBstZdZeiZRS(sTries to be highly tolerant of incorrect HTML. Different from RobustFormParser in that it more often guesses nesting above missing end tags (see BeautifulSoup docs). (RRRRtICantBelieveItsBeautifulSoupR(((RRKs c Cs(t||i|t||||| S(sq Parse HTTP response and return a list of HTMLForm instances. The return value of urllib2.urlopen can be conveniently passed to this function as the response parameter. ClientForm.ParseError is raised on parse errors. 
response: file-like object (supporting read() method) with a method
    geturl(), returning the URI of the HTTP response
select_default: for multiple-selection SELECT controls and RADIO controls,
    pick the first item as the default if none are selected in the HTML
form_parser_class: class to instantiate and use to parse
request_class: class to return from .click() method (default is
    urllib2.Request)
entitydefs: mapping like {"&amp;": "&", ...} containing HTML entity
    definitions (a sensible default is used)
encoding: character encoding used for encoding numeric character references
    when matching link text. ClientForm does not attempt to find the encoding
    in a META HTTP-EQUIV attribute in the document itself (mechanize, for
    example, does do that and will pass the correct value to ClientForm using
    this parameter).
backwards_compat: boolean that determines whether the returned HTMLForm
    objects are backwards-compatible with old code. If backwards_compat is
    true:

    - ClientForm 0.1 code will continue to work as before.

    - Label searches that do not specify a nr (number or count) will always
      get the first match, even if other controls match. If backwards_compat
      is False, label searches that have ambiguous results will raise an
      AmbiguityError.

    - Item label matching is done by strict string comparison rather than
      substring matching.

    - De-selecting individual list items is allowed even if the Item is
      disabled.

    The backwards_compat argument will be deprecated in a future release.

Pass a true value for select_default if you want the behaviour specified by
RFC 1866 (the HTML 2.0 standard), which is to select the first item in a
RADIO or multiple-selection SELECT control if none were selected in the HTML.
Most browsers (including Microsoft Internet Explorer (IE) and Netscape
Navigator) instead leave all items unselected in these cases.
The W3C HTML 4.0 standard leaves this behaviour undefined in the case of multiple-selection SELECT controls, but insists that at least one RADIO button should be checked at all times, in contradiction to browser behaviour. There is a choice of parsers. ClientForm.XHTMLCompatibleFormParser (uses HTMLParser.HTMLParser) works best for XHTML, ClientForm.FormParser (uses sgmllib.SGMLParser) (the default) works better for ordinary grubby HTML. Note that HTMLParser is only available in Python 2.2 and later. You can pass your own class in here as a hack to work around bad HTML, but at your own risk: there is no well-defined interface. N( t ParseFiletresponsetgeturltselect_defaultRtform_parser_classt request_classRbtbackwards_compatRN(RRt ignore_errorsRRRbRRN((Rt ParseResponseXsBc  Csh|otdn|||}xa|it}y|i |Wn!t j o}||_ nXt|tjoPq'q'W|idj o |i}ng} h}xn|iD]c}t|}| i||d} |i| } | djo|g|| (RwRR'(Rw((Rt__str__s(RRR~RRR(((RRs   cCs2|id}|dj ot|SndSdS(NR(RRPR'RRR(RR'((Rt _get_labels tControlcBs}tZdZedZdZdZdZdZdZ dZ dZ d Z d Z d Zd ZRS( s An HTML form control. An HTMLForm contains a sequence of Controls. The Controls in an HTMLForm are accessed using the HTMLForm.find_control method or the HTMLForm.controls attribute. Control instances are usually constructed using the ParseFile / ParseResponse functions. If you use those functions, you can ignore the rest of this paragraph. A Control is only properly initialised after the fixup method has been called. In fact, this is only strictly necessary for ListControl instances. This is necessary because ListControls are built up from ListControls each containing only a single item, and their initial value(s) can only be known after the sequence is complete. The types and values that are acceptable for assignment to the value attribute are defined by subclasses. If the disabled attribute is true, this represents the state typically represented by browsers by 'greying out' a control. 
If the disabled attribute is true, the Control will raise AttributeError if an attempt is made to change its value. In addition, the control will not be considered 'successful' as defined by the W3C HTML 4 standard -- ie. it will contribute no data to the return value of the HTMLForm.click* methods. To enable a control, set the disabled attribute to a false value. If the readonly attribute is true, the Control will raise AttributeError if an attempt is made to change its value. To make a control writable, set the readonly attribute to a false value. All controls have the disabled and readonly attributes, not only those that may have the HTML attributes of the same names. On assignment to the value attribute, the following exceptions are raised: TypeError, AttributeError (if the value attribute should not be assigned to, because the control is disabled, for example) and ValueError. If the name or value attributes are None, or the value is an empty list, or if the control is disabled, the control is not successful. Public attributes: type: string describing type of control (see the keys of the HTMLForm.type2class dictionary for the allowable values) (readonly) name: name of control (readonly) value: current value of control (subclasses may allow a single value, a sequence of values, or either) disabled: disabled state readonly: readonly state id: value of id HTML attribute cCs tdS(s type: string describing type of control (see the keys of the HTMLForm.type2class dictionary for the allowable values) name: control name attrs: HTML attributes of control's HTML element N(tNotImplementedError(RwR3RXRR((RR~AscCs||_|ii|dS(N(RRwt_formRRA(RwR((Rt add_to_formKs cCsdS(N((Rw((RROscCs tdS(N(R!(Rwtkind((Rt is_of_kindRscCs tdS(N(R!(Rw((RtclearUscCs tdS(N(R!(RwRX((RRXscCs tdS(N(R!(RwRXR((RRYscCs4g}|iD]\}}}|||fq~S(sMReturn list of (key, value) pairs suitable for passing to urlencode. 
N(RnRwt_totally_ordered_pairsRoR<R=(RwRnRoR<R=((Rtpairs[scCs tdS(sReturn list of (key, value, index) tuples. Like pairs, but allows preserving correct ordering even where several controls are involved. N(R!(Rw((RR'`scCsF|i}|idd|d|idd}|i|dS(s9Write data for a subitem of this control to a MimeWriter.sContent-dispositionsform-data; name="%s"iRiN( tmwRtmw2RRXRtfRR(RwR)RXRR*R+((Rt_write_mime_datais   cCs tdS(N(R!(Rw((RRrscCsXg}|io|i|in|io&|i|iii|ifn|S(sReturn all labels (Label instances) for this control. If the control was surrounded by a