m (Ec@sdZdkZdkZdkZdkZdkZdkZdkZdklZdk l Z dZ dZ ei iei ie dklZdklZdklZd klZd klZd klZdkZd klZd kl Z dk!l"Z"l#Z#dk!l$Z$l%Z%dk&l'Z'dk(l)Z)dZ*dZ+dZ,dfdYZ-defdYZ.dZ/e0djoei1e,ndS(sMH mail indexer. To index messages from a single folder (messages defaults to 'all'): mhindex.py [options] -u +folder [messages ...] To bulk index all messages from several folders: mhindex.py [options] -b folder ...; the folder name ALL means all folders. To execute a single query: mhindex.py [options] query To enter interactive query mode: mhindex.py [options] Common options: -d FILE -- specify the Data.fs to use (default ~/.Data.fs) -w -- dump the word list in alphabetical order and exit -W -- dump the word list ordered by word id and exit Indexing options: -O -- do a prescan on the data to compute optimal word id assignments; this is only useful the first time the Data.fs is used -t N -- commit a transaction after every N messages (default 20000) -p N -- pack after every N commits (by default no packing is done) Querying options: -m N -- show at most N matching lines from the message (default 3) -n N -- show the N best matching messages (default 3) N(sStringIO(sST_MTIMEs ~/.Data.fss~/projects/Zope/lib/python(sDB(s FileStorage(s Persistent(sIOBTree(sOIBTree(sIIBTree(sNBest(s OkapiIndex(sLexiconsSplitter(sCaseNormalizersStopWordRemover(s QueryParser(s get_stopdicticCsgy#titidd\}}Wn&tij o}|GHdGHdSnXd}d}d}t } t }tiit}d}d}d}} } xC|D];\}} |djo d}n|djo | }n|d jo d} n|d jo tGHdSn|d jot| }n|d jot| } n|d jo d}n|djot| }n|djot| }n|djo d}n|djo d}n|djo d} qqWt|d|p|d|d|}| o|in|o|in| o|in|p| p| odSn|o)|o|i |n|i|n|o|i|n|oxrtt |D]^} || } d| joA| ddjod| dd|| itFromtTotCctBcctSubjecttDates%-8s %st:R(RRRRRR(0R2R8thas_keytstopt_[1]tretfindallRitlowertwRaR'tpatterntreplacetcompilet IGNORECASEtprogtlotrankR,t query_weighttqwRkthiRJR{R-RRR;tgetpathR@tfptIOErrortOSErrorR R9tMessagetheadert getheaderthtgetmessagetextRtnlefttpartt splitlinesRmtsearch(R2RiRkRRRRRRRJR{RRRR RaRRmRRRRRR((R)Rx;sT D           c Cssd} g} xS|D]K}|ido(| djo|d} q^dGHdSq| i|qW| p|ii } n| p dg} ny|ii | }Wn!t i j o}|GHdSnXh}xl| D]d}y|i|} Wn-t i j o}|p d| GHdSnXx| D]}|||i(smultipart/alternativesmultipart/mixed(RtgettypetctypetleveltstrRTR\t getbodytextt getbodypartsRR2RtStringIOR[R9RR(R2RRTRR[RR((R)Rs     cCscg}x5dD]-}|i|}|o|i|q q W|o|idi|ndS(Ntfromttotcctbcctsubjects (RRsccRR(tHtkeyRRYtvalueR\RTR'(R2RRTRRR((R)RscCs|ii|}|dj o|i||i|<|Sn|id}||_||i |<|i||i|<||i|<|S(Ni( R2RHRYRRJRcRR.RLR-(R2RRJ((R)Rs     cCs^tii|ii|}yti|}Wntij o }dSnXt |t S(Ni( RRR'R2R;RtstattstR R RtST_MTIME(R2RRR ((R)Rs  cCsE|id7_|i|ijo djno|indS(Nii(R2R5R3R(R2((R)R$s$cCsq|idjo]dGHtid|_|id7_|i|ijo djno|iqmndS(Nis committing...i(R2R5t transactionRR6R4R(R2((R)R)s  $cCs3|idjodGH|iid|_ndS(Nis packing...(R2R6R?R(R2((R)R2s (#t__name__t __module__RcR=R?RARBRPRRRRdRRR(RqR&RrRtmaxintRxR RRR"RRRRRRRRRR(((R)R s2'  ,  ) (        RDcBs8tZdZdZdZddZdZRS(NcCs4tttt|_t|i|_dS(N(tLexiconRRRR2RNRRR,(R2((R)RP:scCs |ii||d|_dS(Ni(R2R,t index_docRJRit _p_changed(R2RJRi((R)R>scCs|ii|d|_dS(Ni(R2R,t unindex_docRJR(R2RJ((R)RBsi cCst|i}|i|}|i|i}|djogdfSnt |}|i|i|it|fS(Ni(t QueryParserR2RNtparsert parseQueryR&ttreet executeQueryR,RkRctNBestRtchoosertaddmanytitemstgetbestR$(R2R&RRRRRk((R)R&Fs  cCs:t|i}|i|}|i}|ii |S(N( RR2RNRRR&RttermsR,R(R2R&RRR((R)RQs (RRRPRRR&R(((R)RD8s     cCstidS(N(t tracebackt print_exc(((R)RuWst__main__(2RRRRRR9RRRRRRtZOPECODERR\RtZODBR>tZODB.FileStorageR<t Persistencet PersistenttBTrees.IOBTreeREtBTrees.OIBTreeRGtBTrees.IIBTreeRFRtProducts.ZCTextIndex.NBestRtProducts.ZCTextIndex.OkapiIndexRRtProducts.ZCTextIndex.LexiconRRRRt Products.ZCTextIndex.QueryParserRtProducts.ZCTextIndex.StopDictR7RRR*R RDRuRtexit( RR>R RRRDR<RRRRR*RRR7RER RRRRRRuRRRFRGRR9RRRR((R)t?sD                     G