1 <chapter id="tutorial">
2 <!-- $Id: tutorial.xml,v 1.4 2008-02-07 12:36:35 marc Exp $ -->
3 <title>Tutorial</title>
6 <sect1 id="tutorial-oai">
7 <title>A first &acro.oai; indexing example</title>
10 In this section, we will test the system by indexing a small set of
11 sample &acro.oai; records that are included with the &zebra; distribution,
12 running a &zebra; server against the newly created database, and
13 searching the indexes with a client that connects to that server.
16 Go to the <literal>examples/oai-pmh</literal> subdirectory of the
17 distribution archive, or make a deep copy of the Debian installation directory
19 <literal>/usr/share/idzebra-2.0-examples/oai-pmh</literal>.
20 An XML file containing multiple &acro.oai;
21 records is located in the subdirectory
22 <literal>examples/oai-pmh/data</literal>.
25 Additional OAI test records can be downloaded by running a shell
26 script (you may want to abort the script once you have waited
27 longer than it takes your coffee to brew).
35 To index these &acro.oai; records, type:
37 zebraidx-2.0 -c conf/zebra.cfg init
38 zebraidx-2.0 -c conf/zebra.cfg update data
39 zebraidx-2.0 -c conf/zebra.cfg commit
41 In case you have not installed &zebra; yet but have compiled the
42 binaries from this tarball, use the following command form:
44 ../../index/zebraidx -c conf/zebra.cfg this and that
46 On some systems the &zebra; binaries are installed under
47 generic names; in that case, use the following command form:
49 zebraidx -c conf/zebra.cfg this and that
54 In this command, the word <literal>update</literal> is followed
55 by the name of a directory: <literal>zebraidx</literal> updates all
56 files in the hierarchy rooted at <literal>data</literal>.
58 <literal>-c conf/zebra.cfg</literal> points to the proper
63 You might ask yourself how &acro.xml; content is indexed using &acro.xslt;
64 stylesheets: to satisfy your curiosity, you might want to run the
65 indexing transformation on an example debugging &acro.oai; record.
67 xsltproc conf/oai2index.xsl data/debug-record.xml
69 Here you see the &acro.oai; record transformed into the indexing
70 &acro.xml; format. &zebra; creates several inverted indexes,
71 and their names and types are clearly visible in the indexing
76 If your indexing command was successful, you are now ready to
77 fire up a server. To start a server on port 9999, type:
79 zebrasrv-2.0 -c conf/zebra.cfg @:9999
84 The &zebra; index that you have just created has a single database
85 named <literal>Default</literal>.
86 The database contains several &acro.oai; records, and the server will
87 return records in the &acro.xml; format only. The indexing machinery
88 split the input file into individual records behind the scenes.
94 <sect1 id="tutorial-oai-sru-pqf">
95 <title>Searching the &acro.oai; database by web service</title>
98 &zebra; has a built-in web service, which is close to the
99 &acro.sru; standard web service. We use it to access our new
100 database using any &acro.xml;-enabled web browser.
101 This service uses the &acro.pqf; query language.
103 section we show how to run a fully compliant &acro.sru; server,
104 including support for the query language &acro.cql;
108 Searching and retrieving &acro.xml; records is easy. For example,
109 to search for the term <literal>the</literal>, point your
110 browser at this link:
113 url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the">
114 http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the</ulink>
119 These URLs won't work unless you have indexed the example data
120 and started a &zebra; server as outlined in the previous section.
125 In case we actually want to retrieve one record, we need to alter
126 our URL to the following:
127 <ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=dc">
128 http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=dc
133 This way we can page through our result set in chunks of records.
134 For example, we access the 6th to the 10th records using the URL
135 <ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=6&maximumRecords=5&recordSchema=dc">
136 http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=6&maximumRecords=5&recordSchema=dc
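The paging arithmetic behind these URLs can be sketched in a few lines of Python. This is only an illustration and not part of the &zebra; distribution: the helper name <literal>sru_page_url</literal> and the convention of counting pages from 1 are assumptions made for this example; the URL parameters are the ones used in the requests above.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:9999/"  # the server started earlier in this tutorial


def sru_page_url(pquery, page, page_size, schema="dc"):
    """Build the search URL for one page of results (hypothetical helper).

    Pages are counted from 1, so page 2 with page_size 5 covers
    records 6 to 10.
    """
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "x-pquery": pquery,
        "startRecord": (page - 1) * page_size + 1,
        "maximumRecords": page_size,
        "recordSchema": schema,
    }
    # urlencode percent-encodes each value and joins the pairs with '&'
    return BASE_URL + "?" + urlencode(params)


print(sru_page_url("the", page=2, page_size=5))
```

With <literal>page=2</literal> and <literal>page_size=5</literal> the helper reproduces the URL for the 6th to the 10th record.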
145 http://localhost:9999/?version=1.1&operation=searchRetrieve
146 &x-pquery=title%3Cthe
150 <sect1 id="tutorial-oai-sru-present">
151 <title>Presenting search results in different formats</title>
154 &zebra; uses &acro.xslt; stylesheets for both &acro.xml; record indexing and
156 display retrieval. In this example installation, there are two
157 retrieval schemas defined in
158 <literal>conf/dom-conf.xml</literal>:
159 the <literal>dc</literal> schema implemented in
160 <literal>conf/oai2dc.xsl</literal>, and
161 the <literal>zebra</literal> schema implemented in
162 <literal>conf/oai2zebra.xsl</literal>.
163 The URLs for accessing both are the same, except for the different
164 value of the <literal>recordSchema</literal> parameter:
165 <ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=dc">
166 http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=dc
169 <ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra">
170 http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra
172 For the curious, one can see that the &acro.xslt; transformations
175 xsltproc conf/oai2dc.xsl data/debug-record.xml
176 xsltproc conf/oai2zebra.xsl data/debug-record.xml
178 Notice also that the &zebra;-specific parameters are injected by
179 the engine when retrieving data; therefore, some of the attributes
180 in the <literal>zebra</literal> retrieval schema are not filled
181 when running the transformation from the command line.
186 In addition to the user-defined retrieval schemas, one can always
187 choose from many built-in schemas. In case one is only
188 interested in the &zebra; internal metadata about a certain
189 record, one uses the <literal>zebra::meta</literal> schema.
190 <ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::meta">
191 http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::meta
196 The <literal>zebra::data</literal> schema is used to retrieve the
197 original stored &acro.oai; &acro.xml; record.
198 <ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::data">
199 http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::data
205 <sect1 id="tutorial-oai-sru-searches">
206 <title>More interesting searches</title>
209 The &acro.oai; indexing example defines many different index
210 names. A study of the <literal>conf/oai2index.xsl</literal>
211 stylesheet reveals the following word type indexes (i.e. those
212 with suffix <literal>:w</literal>):
224 By default, searches access the <literal>any:w</literal> index,
225 but we can direct searches to any access point by constructing the
226 correct &acro.pqf; query. For example, to search in titles only,
229 url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@attr
230 1=dc_title the&startRecord=1&maximumRecords=1&recordSchema=dc">
231 http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@attr
232 1=dc_title the&startRecord=1&maximumRecords=1&recordSchema=dc
237 Similarly, we can direct searches to the other indexes defined, or we
238 can create boolean combinations of searches on different
239 indexes. In this case we search for <literal>the</literal> in
240 <literal>dc_title</literal> and for <literal>fish</literal> in
241 <literal>dc_description</literal> using the query
242 <literal>@and @attr 1=dc_title the @attr 1=dc_description fish</literal>.
244 url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@and
246 @attr 1=dc_description
247 fish&startRecord=1&maximumRecords=1&recordSchema=dc">
248 http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@and
250 @attr 1=dc_description fish&startRecord=1&maximumRecords=1&recordSchema=dc
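Because &acro.pqf; queries contain spaces and characters such as <literal>@</literal> and <literal>=</literal>, it can help to let a library do the URL encoding. The following Python sketch composes the boolean query from above and encodes it; the helper names <literal>pqf_term</literal> and <literal>pqf_and</literal> are made up for this example and are not part of &zebra; or &yaz;.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def pqf_term(index, term):
    """PQF for one term searched in the named index (use attribute 1)."""
    return f"@attr 1={index} {term}"


def pqf_and(left, right):
    """PQF is a prefix notation: the operator precedes both operands."""
    return f"@and {left} {right}"


query = pqf_and(pqf_term("dc_title", "the"), pqf_term("dc_description", "fish"))

params = {
    "version": "1.1",
    "operation": "searchRetrieve",
    "x-pquery": query,
    "startRecord": 1,
    "maximumRecords": 1,
    "recordSchema": "dc",
}
# quote_via=quote encodes spaces as %20 instead of '+'
url = "http://localhost:9999/?" + urlencode(params, quote_via=quote)

# Decoding the URL again recovers the query unchanged.
decoded = parse_qs(urlsplit(url).query)["x-pquery"][0]

print(query)
print(url)
```

The round trip through <literal>parse_qs</literal> is only a sanity check that nothing was lost in the encoding.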
257 <sect1 id="tutorial-oai-sru-zebra-indexess">
258 <title>Investigating the content of the indexes</title>
261 How does the magic work? What is inside the indexes? Why is a certain
262 record found by a search, and another not? The answer is in the
263 inverted indexes. You can easily investigate them using the
264 special &zebra; schema
265 <literal>zebra::index::fieldname</literal>. In this example you
266 can see that the <literal>dc_title</literal> index has both word
267 (type <literal>:w</literal>) and phrase (type
268 <literal>:p</literal>)
270 <ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::index::dc_title">
271 http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::index::dc_title
276 But where in the indexes did the term match for the query occur?
277 Easily answered with the special &zebra; schema
278 <literal>zebra::snippet</literal>. The matching terms are
279 encapsulated by <literal>&lt;s&gt;</literal> tags.
280 <ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::snippet">
281 http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::snippet
286 How can I refine my search? Which interesting search terms are
287 found inside my hit set? Try the special &zebra; schema
288 <literal>zebra::facet::fieldname:type</literal>. In this case, we
289 investigate additional search terms for the
290 <literal>dc_title:w</literal> index.
291 <ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::facet::dc_title:w">
292 http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::facet::dc_title:w
297 One can ask for multiple facets. Here, we want them from phrase
299 <literal>:p</literal>.
300 <ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::facet::dc_publisher:p,dc_title:p">
301 http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::facet::dc_publisher:p,dc_title:p
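The facet schema names used above follow a simple pattern: the prefix <literal>zebra::facet::</literal> followed by a comma-separated list of <literal>fieldname:type</literal> pairs. A small Python sketch of that pattern, with <literal>facet_schema</literal> being a name invented for this example:

```python
def facet_schema(*fields):
    """Join (index, type) pairs into a zebra::facet record schema name."""
    return "zebra::facet::" + ",".join(f"{index}:{kind}" for index, kind in fields)


print(facet_schema(("dc_title", "w")))
print(facet_schema(("dc_publisher", "p"), ("dc_title", "p")))
```

The resulting string is the value to pass in the <literal>recordSchema</literal> parameter.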
308 <sect1 id="tutorial-oai-sru-yazfrontend">
309 <title>Setting up a correct &acro.sru; web service</title>
312 The &acro.sru; specification mandates that the &acro.cql; query
313 language is supported and properly configured. Also, the server
314 needs to be able to emit a proper &acro.explain; &acro.xml;
315 record, which is used to determine the capabilities of the
316 specific server instance.
320 In this example configuration we exploit the similarities between
321 the &acro.explain; record and the &acro.cql; query language
322 configuration: we generate the latter from the former using an
323 &acro.xslt; transformation.
325 xsltproc conf/explain2cqlpqftxt.xsl conf/explain.xml > conf/cql2pqf.txt
330 Then we are all set to start the &acro.sru;/&acro.z3950; server including
331 &acro.pqf; and &acro.cql; query configuration. It uses the &yaz; frontend
332 server configuration; just type:
334 zebrasrv -f conf/yazserver.xml
339 First, we'd like to be sure that we can see the &acro.explain;
340 &acro.xml; response correctly. You might use either of these equivalent
343 url="http://localhost:9999">http://localhost:9999
346 url="http://localhost:9999/?version=1.1&operation=explain">
347 http://localhost:9999/?version=1.1&operation=explain
353 Now we can issue true &acro.sru; requests. For example,
354 <literal>dc.title=the
355 and dc.description=fish</literal> results in the following page
357 url="http://localhost:9999/?version=1.1&operation=searchRetrieve&query=dc.title=the
358 and dc.description=fish
359 &startRecord=1&maximumRecords=1&recordSchema=dc">
360 http://localhost:9999/?version=1.1&operation=searchRetrieve&query=dc.title=the
361 and dc.description=fish &startRecord=1&maximumRecords=1&recordSchema=dc
366 Scanning indexes is part of the &acro.sru; server's business. For example,
367 scanning the <literal>dc.title</literal> index gives us an idea
368 of what search terms are found there:
370 url="http://localhost:9999/?version=1.1&operation=scan&scanClause=dc.title=fish">
371 http://localhost:9999/?version=1.1&operation=scan&scanClause=dc.title=fish
375 url="http://localhost:9999/?version=1.1&operation=scan&scanClause=dc.identifier=fish">
376 http://localhost:9999/?version=1.1&operation=scan&scanClause=dc.identifier=fish
378 accesses the indexed identifiers.
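The &acro.cql; search and scan requests above contain spaces and <literal>=</literal> signs inside parameter values, which should be percent-encoded when the URL is built by a program rather than typed into a browser. A minimal Python sketch, assuming the server from this section is listening on port 9999; the helper name <literal>sru_request</literal> is hypothetical:

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:9999/"


def sru_request(operation, **params):
    """Build an SRU 1.1 request URL for the given operation (hypothetical helper)."""
    fields = {"version": "1.1", "operation": operation, **params}
    # quote_via=quote encodes spaces as %20 instead of '+'
    return BASE_URL + "?" + urlencode(fields, quote_via=quote)


search_url = sru_request(
    "searchRetrieve",
    query="dc.title=the and dc.description=fish",
    startRecord=1,
    maximumRecords=1,
    recordSchema="dc",
)
scan_url = sru_request("scan", scanClause="dc.title=fish")

print(search_url)
print(scan_url)
```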
382 In addition, all &zebra; internal special element sets or record
384 <literal>zebra::</literal> just work right out of the box
386 url="http://localhost:9999/?version=1.1&operation=searchRetrieve&query=dc.title=the
387 and dc.description=fish
388 &startRecord=1&maximumRecords=1&recordSchema=zebra::snippet">
389 http://localhost:9999/?version=1.1&operation=searchRetrieve&query=dc.title=the
390 and dc.description=fish &startRecord=1&maximumRecords=1&recordSchema=zebra::snippet
399 <sect1 id="tutorial-oai-z3950">
400 <title>Searching the &acro.oai; database by &acro.z3950; protocol</title>
403 In this section we repeat the searches and presents we have done so
404 far, this time using the binary &acro.z3950; protocol. You can use any
406 For instance, you can use the demo command-line client that comes
410 Connecting to the server is done by the command
412 yaz-client localhost:9999
417 When the client has connected, you can type:
428 &acro.z3950; presents using presentation stylesheets:
439 &acro.z3950; built-in &zebra; presents (in this configuration only if
440 started without yaz-frontendserver):
443 Z> elements zebra::meta
446 Z> elements zebra::meta::sysno
453 Z> elements zebra::index
456 Z> elements zebra::snippet
459 Z> elements zebra::facet::any:w
462 Z> elements zebra::facet::dc_publisher:p,dc_title:p
468 &acro.z3950; searches targeted at specific indexes and boolean
469 combinations of these can be issued as well.
473 Z> find @attr 1=oai_identifier @attr 4=3 oai:caltechcstr.library.caltech.edu:4
476 Z> find @attr 1=oai_datestamp @attr 4=3 2001-04-20
479 Z> find @attr 1=oai_setspec @attr 4=3 7374617475733D756E707562
482 Z> find @attr 1=dc_title communication
485 Z> find @attr 1=dc_identifier @attr 4=3
486 http://resolver.caltech.edu/CaltechCSTR:1986.5228-tr-86
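The search term used with <literal>oai_setspec</literal> above looks like a string of hexadecimal digits. Reading it as hex-encoded ASCII is an assumption about how this sample data encodes its set names, not something the example configuration states, but under that assumption it decodes to a readable value:

```python
# Assumption: the setSpec value is hex-encoded ASCII text.
set_spec = "7374617475733D756E707562"

decoded = bytes.fromhex(set_spec).decode("ascii")
print(decoded)  # status=unpub
```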
495 yaz-client localhost:9999
498 Z> scan @attr 1=oai_identifier @attr 4=3 oai
499 Z> scan @attr 1=oai_datestamp @attr 4=3 1
500 Z> scan @attr 1=oai_setspec @attr 4=3 2000
502 Z> scan @attr 1=dc_title communication
503 Z> scan @attr 1=dc_identifier @attr 4=3 a
508 &acro.z3950; search using server-side CQL conversion:
516 Z> find dc.creator = the
518 Z> find dc.title = the
520 Z> find dc.description < the
521 Z> find dc.title > some
523 Z> find dc.identifier="http://resolver.caltech.edu/CaltechCSTR:1978.2276-tr-78"
524 Z> find dc.relation = something
529 etc, etc. Notice that all indexes defined by 'type="0"' in the
530 indexing style sheet must be searched using the 'eq'
540 &acro.z3950; scan using server-side CQL conversion:
541 unfortunately, this will _never_ work, as it is not supported by the
542 &acro.z3950; standard.
543 If you want to scan using server-side CQL conversion, you need to
544 make an SRW connection using yaz-client, or an
545 SRU connection using REST Web Services; any browser will do.
551 All indexes defined by 'type="0"' in the
552 indexing style sheet must be searched using the '@attr 4=3'
553 structure attribute instruction.
558 Notice that searching and scanning on indexes
559 <literal>dc_contributor</literal>, <literal>dc_language</literal>,
560 <literal>dc_rights</literal>, and <literal>dc_source</literal>
561 might fail, simply because none of the records in the small example set
562 have these fields set, and consequently, these indexes might not
576 <!-- Keep this comment at the end of the file
581 sgml-minimize-attributes:nil
582 sgml-always-quote-attributes:t
585 sgml-parent-document: "zebra.xml"
586 sgml-local-catalogs: nil
587 sgml-namecase-general:t