ZCatalog Tutorial

  This document provides a tutorial for 'ZCatalog', the new search engine machinery in Zope. The audience for the document is content managers.

  Contents

    o What is it? What's it for? Why's it so cool?

    o Installing ZCatalog

    o ZCatalog Objects

    o Example using ZCatalog

    o Creating Search Forms And Result Reports

    o Using ZCatalog In A Zope Site

    o ZCatalog vs. Catalog

  What is it? What's it for? Why's it so cool?

    The 'ZCatalog' provides powerful indexing and searching on a Zope database using a Zope management interface. A 'ZCatalog' is a Zope object that can be added to a Folder, managed through the web, and extended in many ways.

    The 'ZCatalog' is a very significant project, providing a number of compelling features:

    o **Searches are fast**. The data structures used by the index provide extremely quick searches without consuming much memory.

    o **Searches are robust**. The 'ZCatalog' supports boolean search terms, proximity searches, synonyms and stopwords.

    o **Indexing is wildly flexible**. A 'ZCatalog' can catalog custom properties and track unique values. Since 'ZCatalog' catalogs objects instead of file handles, you can index any content that can have a Python object wrapped around it. This also lets objects participate in how they are cataloged, e.g. by de-HTML-ifying their contents or extracting PDF properties.

    o **Usable outside of Zope**. The software is broken into a Python 'Catalog' which is wrapped by a 'ZCatalog'. The Python 'Catalog' can be used in any Python program; all it requires is the Z object database and the indexing machinery from Zope.

    o **Transactional**. An indexing operation is part of a Zope transaction. If something goes wrong after content is indexed, the index is restored to its previous condition. This also means that Undo will restore an index to its previous condition. Finally, a 'ZCatalog' can be altered privately in a Version, meaning no one else can see the changes to the index.

    o **Cache-friendly**.
      The index is internally broken into different "buckets", with each bucket being a separate Zope database object. Thus, only the part of the index that is needed is loaded into memory. Conversely, a part of the index that is no longer needed can be removed from memory.

    o **Results are lazy**. A search that matches a tremendous number of objects won't build a large result set. Only the part of the results that is needed, such as the second batch of twenty, is returned.

    The 'ZCatalog' is a free, Open Source part of the Zope software repository and thus is covered under the same license as Zope. It is being developed in conjunction with the Zope Portal Toolkit effort. However, the 'ZCatalog' product is managed as its own module in CVS.

  Installing ZCatalog

    'ZCatalog' can be downloaded from the Zope download area and is also a module in the public CVS for Zope. Untar it while in the root directory of your Zope installation::

      $ cd Zope-2.0.0a3-src/
      $ tar xzf ../ZCatalog-x.x.tgz

    Windows users can use WinZip or a similar utility to accomplish the same thing.

    Note that Zope 2.0.0a3 does not include the latest versions of UnIndex and UnTextIndex, which fix a couple of bugs in the alpha 3 versions. The latest CVS of the SearchIndex packages *must* be used.

    Remember, you have to restart your Zope server before you will see 'ZCatalog'.

  ZCatalog Objects

    A 'ZCatalog' performs two activities: indexing information and performing searches. Most of the work is done in the first step, which is getting objects into the index. This is done in two ways.

    First, if your objects are ZCatalog-aware, they automatically update the index when they are added, edited or directly deleted. A ZCatalog-aware object is one that is an instance of a 'Z Class' that informs the 'ZCatalog' of changes. *Directly deleted* means the object itself was deleted from a Folder, as opposed to disappearing because a containing Folder was deleted.

    The second way that site contents get updated is by "finding" information "into" the 'ZCatalog'.
    An operation based on Zope's Find view traverses Folders looking for objects matching the search criteria. The objects are then registered with the Catalog. Objects in the index but no longer in the site are removed from the Catalog.

    Either way, whether by automatic updates or by walking the Folders, 'ZCatalog' indexes the objects it finds. The 'ZCatalog' is set up to look for properties, each of which is added to the index. There are two kinds of indexes, called FieldIndex and TextIndex. A FieldIndex treats data atomically: the entire contents of a FieldIndex-indexed property are treated as a unit. With a TextIndex, the contents of the property are broken into words, which are indexed individually. A TextIndex is also known as a *full-text index*.

    Note that the 'ZCatalog' doesn't track ZCatalog-unaware objects after it has indexed them. This means that the 'ZCatalog' must reindex such objects occasionally, after they have been changed. Out-of-date indexes can be prevented by inheriting from a ZCatalog-aware class, which can tell the 'ZCatalog' to reindex the object whenever a change is made. Just such a class will be included with the Portal Toolkit.

    ZCatalogs are "searchable objects", meaning they cooperate with Z Search Interfaces documented in Z SQL Methods. Creating a search form for a 'ZCatalog' is a simple matter of adding a Z Search Interface from the management screen and filling in a form. ZCatalogs can also be queried directly from DTML, as shown in the example below.

  Example using ZCatalog

    The first example shows how to give your Zope site a long-desired feature: full-text searches of your content. The example assumes you already have a number of DTML Methods/Documents to catalog.

    o Install 'ZCatalog' as instructed above.

    o In the root folder of your Zope server, add a 'ZCatalog'.

    o Type in the id 'catalog' and hit 'Add'.

    You now have a brand new 'ZCatalog' named 'catalog' in your root folder.

    o Click on it. Now you are looking at the 'ZCatalog' 'Contents' view.
      It says the catalog is empty. We'll catalog some objects in a moment, but first we have to tell it which portions of the objects we are interested in.

    o Click on 'Indexes'. This management view is where the attributes to be indexed are defined.

    o In the 'Add index' field, type 'raw'.

    o Click 'Add'.

    Now that the indexes are defined, a set of objects can be selected for cataloging.

    o Click on 'Find items to ZCatalog'. For this example, we are only interested in DTML Documents and Methods.

    o Deselect 'All types'.

    o Select 'DTML Method' and 'DTML Document'.

    o Click 'Find'. ZCatalog will report how many items it found, and then present an interface for excluding specific objects.

    o Click 'Catalog Items'.

    Great, now that the catalog is stocked, we can create a user interface to it.

    o Return to the root folder's management view.

    o Add a 'Z Search Interface'. 'ZCatalog' participates in the Zope Search architecture. You simply have to fill in this form, and a basic user interface will be created.

    o Select 'catalog' in the list beside 'Select one or more searchable objects'.

    o Beside 'Report Id', type 'report'.

    o Beside 'Search Input Id', type 'search'. 'report' and 'search' are the Ids of two DTML Methods which will be created in your root folder.

    o Click 'Add'.

    Congratulations! If all has gone well, you can now find references to any word in your DTML pages. Try it by viewing 'search'. Type a common word in the 'Raw' field, and you should be presented with a list of hits.

    However, none of the results returned can be clicked on. To fix this, go to the management view of 'report'. 'report' is called by 'search' to display the results from 'catalog'. 'report' is just a simple '' loop with a few refinements. 'catalog' knows which results to return by looking at the REQUEST variable, which contains the input from the 'search' form.

    o In the source of 'report', find the following line::