<chapter id="tutorial">
<!-- $Id: tutorial.xml,v 1.2 2008-02-05 10:15:58 marc Exp $ -->
<title>Tutorial</title>
<sect1 id="tutorial-oai">
<title>A first &acro.oai; indexing example</title>
In this section, we will test the system by indexing a small set of
sample &acro.oai; records that are included with the &zebra; distribution,
running a &zebra; server against the newly created database, and
searching the indexes with a client that connects to that server.
Go to the <literal>examples/oai-pmh</literal> subdirectory of the
distribution archive, or make a deep copy of the Debian installation
directory
<literal>/usr/share/idzebra-2.0-examples/oai-pmh</literal>.
An XML file containing multiple &acro.oai;
records is located in the subdirectory
<literal>examples/oai-pmh/data</literal>.
Additional &acro.oai; test records can be downloaded by running a shell
script (you may want to abort the script once you have waited
longer than it takes your coffee to brew).
To index these &acro.oai; records, type:
zebraidx-2.0 -c conf/zebra.cfg init
zebraidx-2.0 -c conf/zebra.cfg update data/oai-caltech.xml
zebraidx-2.0 -c conf/zebra.cfg commit
In case you have not installed &zebra; yet but have compiled the
binaries from this tarball, use the following command form:
../../index/zebraidx -c conf/zebra.cfg this and that
On some systems the &zebra; binaries are installed under their
generic names; in that case, use the following command form:
zebraidx -c conf/zebra.cfg this and that
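The three indexing steps and the alternative binary names can also be scripted. The following is only a minimal sketch and not part of the &zebra; distribution: the helper names <literal>pick_binary</literal>, <literal>indexing_commands</literal> and <literal>run_indexing</literal> are made up for illustration, and the sketch assumes nothing beyond the command lines shown above.

```python
import shutil
import subprocess

# Binary names mentioned in this section, in order of preference.
CANDIDATES = ["zebraidx-2.0", "zebraidx", "../../index/zebraidx"]


def pick_binary(candidates=CANDIDATES):
    """Return the first candidate that exists on this system, or None."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None


def indexing_commands(binary, config="conf/zebra.cfg",
                      source="data/oai-caltech.xml"):
    """Build the init, update and commit command lines as argument lists."""
    base = [binary, "-c", config]
    return [base + ["init"], base + ["update", source], base + ["commit"]]


def run_indexing(binary):
    """Run the three steps in order, stopping at the first failure."""
    for command in indexing_commands(binary):
        subprocess.run(command, check=True)
```

Calling <literal>run_indexing(pick_binary())</literal> from the <literal>examples/oai-pmh</literal> directory would then issue the same three commands as typed by hand, provided one of the candidate binaries is found.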
In this command, the word <literal>update</literal> is followed
by the name of a file or directory: given a directory,
<literal>zebraidx</literal> updates all
files in the hierarchy rooted at that directory. The command option
<literal>-c conf/zebra.cfg</literal> points to the proper
You might ask yourself how &acro.xml; content is indexed using &acro.xslt;
stylesheets. To satisfy your curiosity, run the
indexing transformation on an example debugging &acro.oai; record:
xsltproc conf/oai2index.xsl data/debug-record.xml
Here you see the &acro.oai; record transformed into the indexing
&acro.xml; format. &zebra; creates several inverted indexes,
and their names and types are clearly visible in the indexing
If your indexing command was successful, you are now ready to
fire up a server. To start a server on port 9999, type:
zebrasrv-2.0 -c conf/zebra.cfg @:9999
The &zebra; index that you have just created has a single database
named <literal>Default</literal>.
The database contains several &acro.oai; records, and the server will
return records in the &acro.xml; format only. The indexing machinery
split the input file into individual records behind the scenes.
<sect1 id="tutorial-oai-sru-pqf">
<title>Searching the &acro.oai; database by web service</title>
&zebra; has a built-in web service, which is close to the
&acro.sru; standard web service. We use it to access our new
database using any &acro.xml;-enabled web browser.
This service uses the &acro.pqf; query language.
section we show how to run a fully compliant &acro.sru; server,
including support for the query language &acro.cql;
Searching and retrieving &acro.xml; records is easy. For example,
to search for the term <literal>the</literal>, point your
browser at this link:
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;query=the">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;query=the</ulink>
These URLs won't work unless you have indexed the example data
and started a &zebra; server as outlined in the previous section.
If we actually want to retrieve one record, we need to alter
our URL to the following:
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;query=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;query=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc</ulink>
This way we can page through our result set in chunks of records.
For example, we access the 6th to the 10th record using the URL
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;query=the&amp;startRecord=6&amp;maximumRecords=5&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;query=the&amp;startRecord=6&amp;maximumRecords=5&amp;recordSchema=dc</ulink>
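The paging scheme can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of &zebra;: the function names <literal>sru_search_url</literal> and <literal>page_urls</literal> are hypothetical, and the server address and parameter names are taken from the URLs shown above.

```python
from urllib.parse import urlencode


def sru_search_url(query, start=1, maximum=1, schema="dc",
                   base="http://localhost:9999/"):
    """Build a searchRetrieve URL for one chunk of the result set."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": query,
        "startRecord": start,        # 1-based position of the first record
        "maximumRecords": maximum,   # number of records in this chunk
        "recordSchema": schema,
    }
    return base + "?" + urlencode(params)


def page_urls(query, total, page_size):
    """Yield one URL per chunk of page_size records, covering total records."""
    for start in range(1, total + 1, page_size):
        remaining = total - start + 1
        yield sru_search_url(query, start, min(page_size, remaining))
```

For instance, <literal>sru_search_url("the", 6, 5)</literal> reproduces the URL for the 6th to the 10th record, and <literal>page_urls("the", 12, 5)</literal> yields three URLs starting at records 1, 6 and 11.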
http://localhost:9999/?version=1.1&operation=searchRetrieve
&query=title%3Cthe
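The <literal>%3C</literal> in the URL above is the percent-encoded form of the <literal>&lt;</literal> character, which cannot appear literally in a query string. As a small sketch (the helper name <literal>encode_query</literal> is made up for this example), Python's standard library performs the conversion in both directions:

```python
from urllib.parse import quote, unquote


def encode_query(query):
    """Percent-encode a query so it can be used as a URL parameter value."""
    # safe="" means every reserved character is encoded, including "/" and "=".
    return quote(query, safe="")


print(encode_query("title<the"))   # prints title%3Cthe
print(unquote("title%3Cthe"))      # prints title<the
```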
<sect1 id="tutorial-oai-sru-present">
<title>Presenting search results in different formats</title>
yaz-client localhost:9999
Z39.50 presents using presentation stylesheets:
Z39.50 built-in Zebra presents (in this configuration only if
started without yaz-frontendserver):
Z> elements zebra::meta
Z> elements zebra::meta::sysno
Z> elements zebra::index
Z> elements zebra::snippet
Z> elements zebra::facet::any:w
Z> elements zebra::facet::any:w,dc_title:w
Z39.50 searches targeted at specific indexes
Z> find @attr 1=oai_identifier @attr 4=3 oai:caltechcstr.library.caltech.edu:4
Z> find @attr 1=oai_datestamp @attr 4=3 2001-04-20
Z> find @attr 1=oai_setspec @attr 4=3 7374617475733D756E707562
Z> find @attr 1=dc_title communication
Z> find @attr 1=dc_identifier @attr 4=3
http://resolver.caltech.edu/CaltechCSTR:1986.5228-tr-86
Notice that all indexes defined by 'type="0"' in the
indexing style sheet must be searched using the '@attr 4=3'
structure attribute instruction.
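The <literal>oai_setspec</literal> term used in the search above looks opaque, but it decodes as hex-encoded ASCII text. The sketch below shows the conversion; the helper names are hypothetical, and it is only an assumption that every setSpec value in the sample data is encoded this way.

```python
def decode_setspec(value):
    """Decode a hex-encoded setSpec value into readable text."""
    return bytes.fromhex(value).decode("ascii")


def encode_setspec(text):
    """Encode text as upper-case hex, the form used in the search above."""
    return text.encode("ascii").hex().upper()


print(decode_setspec("7374617475733D756E707562"))   # prints status=unpub
print(encode_setspec("status=unpub"))               # prints 7374617475733D756E707562
```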
Notice also that search and scan on the indexes
'dc_contributor', 'dc_language', 'dc_rights', and 'dc_source'
fail, simply because none of the records in this example set
have these fields set, and consequently, these indexes are
<sect1 id="tutorial-oai-z3950">
<title>Searching the &acro.oai; database by &acro.z3950; protocol</title>
In this section we repeat the searches and presents we have done so
far using the binary &acro.z3950; protocol. You can use any
For instance, you can use the demo command-line client that comes
Connecting to the server is done by the command
yaz-client localhost:9999
When the client has connected, you can type:
Z39.50 presents using presentation stylesheets:
Z39.50 built-in Zebra presents (in this configuration only if
started without yaz-frontendserver):
Z> elements zebra::meta
Z> elements zebra::meta::sysno
Z> elements zebra::index
Z> elements zebra::snippet
Z> elements zebra::facet::any:w
Z> elements zebra::facet::any:w,dc_title:w
Z39.50 searches targeted at specific indexes and boolean
combinations of these can be issued as well.
Z> find @attr 1=oai_identifier @attr 4=3 oai:caltechcstr.library.caltech.edu:4
Z> find @attr 1=oai_datestamp @attr 4=3 2001-04-20
Z> find @attr 1=oai_setspec @attr 4=3 7374617475733D756E707562
Z> find @attr 1=dc_title communication
Z> find @attr 1=dc_identifier @attr 4=3
http://resolver.caltech.edu/CaltechCSTR:1986.5228-tr-86
Notice that all indexes defined by 'type="0"' in the
indexing style sheet must be searched using the '@attr 4=3'
structure attribute instruction.
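The pattern behind these queries can be made explicit with a small query builder. This is a sketch only: the function names <literal>pqf_find</literal> and <literal>pqf_and</literal> are hypothetical, the index names are those used above, and the caller is assumed to know which indexes are defined with 'type="0"' and therefore need '@attr 4=3'.

```python
def pqf_find(index, term, exact=False):
    """Build a PQF query against one index.

    Set exact=True for indexes that must be searched with the
    '@attr 4=3' structure attribute.
    """
    if '"' in term:
        raise ValueError("terms containing double quotes are not handled here")
    if any(ch.isspace() for ch in term):
        term = '"' + term + '"'    # terms with spaces are quoted
    parts = ["@attr 1=" + index]
    if exact:
        parts.append("@attr 4=3")
    parts.append(term)
    return " ".join(parts)


def pqf_and(left, right):
    """Combine two PQF queries; PQF uses prefix notation for operators."""
    return "@and " + left + " " + right
```

For example, <literal>pqf_find("oai_datestamp", "2001-04-20", exact=True)</literal> gives the datestamp query shown above, and <literal>pqf_and</literal> joins two such queries into one boolean combination that can be passed to <literal>find</literal>.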
Notice also that search and scan on the indexes
'dc_contributor', 'dc_language', 'dc_rights', and 'dc_source'
might fail, simply because none of the records in the small example set
have these fields set, and consequently, these indexes might not
<sect1 id="tutorial-oai-sru-yazfrontend">
<title>Setting up a correct &acro.sru; web service</title>
Alternatively, start the SRU/SRW/Z39.50 server with
PQF and CQL query configuration:
zebrasrv -f yazserver.xml
Z39.50 presents using presentation stylesheets:
Z39.50 built-in Zebra presents (in this configuration only if
started without yaz-frontendserver):
Z> elements zebra::meta
Z> elements zebra::meta::sysno
Z> elements zebra::index
Z> elements zebra::snippet
Z> elements zebra::facet::any:w
Z> elements zebra::facet::any:w,dc_title:w
Z39.50 searches targeted at specific indexes
Z> find @attr 1=oai_identifier @attr 4=3 oai:caltechcstr.library.caltech.edu:4
Z> find @attr 1=oai_datestamp @attr 4=3 2001-04-20
Z> find @attr 1=oai_setspec @attr 4=3 7374617475733D756E707562
Z> find @attr 1=dc_title communication
Z> find @attr 1=dc_identifier @attr 4=3
http://resolver.caltech.edu/CaltechCSTR:1986.5228-tr-86
Notice that all indexes defined by 'type="0"' in the
indexing style sheet must be searched using the '@attr 4=3'
structure attribute instruction.
Notice also that search and scan on the indexes
'dc_contributor', 'dc_language', 'dc_rights', and 'dc_source'
fail, simply because none of the records in this example set
have these fields set, and consequently, these indexes are
yaz-client localhost:9999
Z> scan @attr 1=oai_identifier @attr 4=3 oai
Z> scan @attr 1=oai_datestamp @attr 4=3 1
Z> scan @attr 1=oai_setspec @attr 4=3 2000
Z> scan @attr 1=dc_title communication
Z> scan @attr 1=dc_identifier @attr 4=3 a
Z39.50 search using server-side CQL conversion:
Z> find creator = the
Z> find dc.creator = the
Z> find description < the
Z> find title le some
Z> find title ge some
Z> find identifier eq
"http://resolver.caltech.edu/CaltechCSTR:1978.2276-tr-78"
Z> find relation eq something
etc, etc. Notice that all indexes defined by 'type="0"' in the
indexing style sheet must be searched using the 'eq'
Z39.50 scan using server-side CQL conversion:
Unfortunately, this will _never_ work as it is not supported by the
If you want to use scan using server-side CQL conversion, you need to
make an SRW connection using yaz-client, or an
SRU connection using REST Web Services - any browser will do.
SRU Explain ZeeRex response:
http://localhost:9999/
http://localhost:9999/?version=1.1&operation=explain
SRU Search Retrieve records:
http://localhost:9999/?version=1.1&operation=searchRetrieve
http://localhost:9999/?version=1.1&operation=searchRetrieve
&query=date=1978-01-01
&startRecord=1&maximumRecords=1&recordSchema=dc
http://localhost:9999/?version=1.1&operation=searchRetrieve
http://localhost:9999/?version=1.1&operation=searchRetrieve
&query=description=the
http://localhost:9999/?version=1.1&operation=searchRetrieve
http://localhost:9999/?version=1.1&operation=scan&scanClause=title=a
http://localhost:9999/?version=1.1&operation=scan
&scanClause=identifier%20eq%20a
Notice: you need to use the 'eq' relation for all @attr 4=3 indexes
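The two scan URLs above differ only in the relation used in the scan clause. A minimal sketch of how such URLs could be assembled follows; the function name <literal>sru_scan_url</literal> is hypothetical, and leaving the equals sign unencoded is a choice made here only so that the output matches the URLs as written above.

```python
from urllib.parse import quote, urlencode


def sru_scan_url(index, term, relation="=", base="http://localhost:9999/"):
    """Build an SRU scan URL; pass relation="eq" for '@attr 4=3' indexes."""
    if relation == "=":
        clause = index + "=" + term
    else:
        clause = " ".join([index, relation, term])
    params = {"version": "1.1", "operation": "scan", "scanClause": clause}
    # quote_via=quote encodes spaces as %20; safe="=" keeps "=" readable.
    return base + "?" + urlencode(params, safe="=", quote_via=quote)


print(sru_scan_url("title", "a"))
print(sru_scan_url("identifier", "a", relation="eq"))
```

The first call prints the <literal>scanClause=title=a</literal> URL and the second the <literal>scanClause=identifier%20eq%20a</literal> URL.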
SRW explain with CQL index points:
Z> open http://localhost:9999
Notice: when opening a connection using the 'http://' prefix, yaz-client
uses SRW SOAP connections, and 'form xml' and 'querytype cql' are
SRW search using implicit server-side CQL:
Z> open http://localhost:9999
Z> find identifier eq
"http://resolver.caltech.edu/CaltechCSTR:1978.2276-tr-78"
Z> find description < the
In SRW connection mode, the following fails due to a problem in yaz-client:
SRW scan using implicit server-side CQL:
yaz-client http://localhost:9999
Z> scan title = communication
Z> scan identifier eq a
Notice: you need to use the 'eq' relation for all @attr 4=3 indexes
<!-- Keep this comment at the end of the file
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-parent-document: "zebra.xml"
sgml-local-catalogs: nil
sgml-namecase-general:t