1 <chapter id="tutorial">
2 <title>Tutorial</title>
5 <sect1 id="tutorial-oai">
6 <title>A first &acro.oai; indexing example</title>
9 In this section, we will test the system by indexing a small set of
10 sample &acro.oai; records that are included with the &zebra; distribution,
11 running a &zebra; server against the newly created database, and
12 searching the indexes with a client that connects to that server.
15 Go to the <literal>examples/oai-pmh</literal> subdirectory of the
distribution archive, or make a deep copy of the Debian installation
directory <literal>/usr/share/idzebra-2.0-examples/oai-pmh</literal>.
An &acro.xml; file containing multiple &acro.oai; records is located in
the subdirectory <literal>examples/oai-pmh/data</literal>.
Additional &acro.oai; test records can be downloaded by running a shell
script (you may want to abort the script if it runs longer than it
takes your coffee to brew).
34 To index these &acro.oai; records, type:
36 zebraidx-2.0 -c conf/zebra.cfg init
37 zebraidx-2.0 -c conf/zebra.cfg update data
38 zebraidx-2.0 -c conf/zebra.cfg commit
If you have not installed &zebra; yet but have compiled the
binaries from this tarball, use the following command form:
43 ../../index/zebraidx -c conf/zebra.cfg this and that
On some systems the &zebra; binaries are installed under
generic names; in that case, use the following command form:
48 zebraidx -c conf/zebra.cfg this and that
In this command, the word <literal>update</literal> is followed
by the name of a directory: <literal>zebraidx</literal> updates all
files in the hierarchy rooted at <literal>data</literal>.
The option <literal>-c conf/zebra.cfg</literal> points to the proper
configuration file.
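The three indexing steps can also be scripted. The following Python sketch only assembles the command lines and does not run them. The helper name <literal>zebraidx_commands</literal> and its default arguments are illustrative assumptions, not part of &zebra;. Pass a different <literal>binary</literal> (for example <literal>zebraidx</literal> or <literal>../../index/zebraidx</literal>) to match the command forms described above.

```python
import shlex


def zebraidx_commands(binary="zebraidx-2.0", config="conf/zebra.cfg", data_dir="data"):
    """Return the init/update/commit command lines as argument lists."""
    base = [binary, "-c", config]
    return [
        base + ["init"],              # the init step from the tutorial
        base + ["update", data_dir],  # updates all files in the hierarchy rooted at data_dir
        base + ["commit"],            # the commit step from the tutorial
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them by hand.
    for argv in zebraidx_commands():
        print(shlex.join(argv))
```

Keeping the commands as argument lists means they could later be handed to <literal>subprocess.run</literal> without any shell quoting concerns.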
62 You might ask yourself how &acro.xml; content is indexed using &acro.xslt;
63 stylesheets: to satisfy your curiosity, you might want to run the
64 indexing transformation on an example debugging &acro.oai; record.
66 xsltproc conf/oai2index.xsl data/debug-record.xml
Here you see the &acro.oai; record transformed into the indexing
&acro.xml; format. &zebra; creates several inverted indexes,
and their names and types are clearly visible in the indexing
75 If your indexing command was successful, you are now ready to
76 fire up a server. To start a server on port 9999, type:
78 zebrasrv-2.0 -c conf/zebra.cfg @:9999
83 The &zebra; index that you have just created has a single database
84 named <literal>Default</literal>.
The database contains several &acro.oai; records, and the server will
return records in the &acro.xml; format only. The indexing machinery
split the input file into individual records behind the scenes.
93 <sect1 id="tutorial-oai-sru-pqf">
94 <title>Searching the &acro.oai; database by web service</title>
&zebra; has a built-in web service, which is close to the
&acro.sru; standard web service. We use it to access our new
database with any &acro.xml;-enabled web browser.
This service uses the &acro.pqf; query language.
In a later section we show how to run a fully compliant &acro.sru; server,
including support for the &acro.cql; query language.
Searching and retrieving &acro.xml; records is easy. For example,
to search for the term <literal>the</literal>, point your
browser at this link:
<ulink
url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the</ulink>
These URLs will not work unless you have indexed the example data
and started a &zebra; server as outlined in the previous section.
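The search URL can also be assembled programmatically, which avoids encoding mistakes once queries contain spaces or other special characters. This is a minimal sketch using only the Python standard library. The function name <literal>sru_pqf_url</literal> is a hypothetical helper, and the base URL assumes the server started on port 9999 as above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sru_pqf_url(pquery, base="http://localhost:9999/", **extra):
    """Build a searchRetrieve URL that carries a PQF query in x-pquery."""
    params = {"version": "1.1", "operation": "searchRetrieve", "x-pquery": pquery}
    params.update(extra)  # e.g. startRecord, maximumRecords, recordSchema
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    url = sru_pqf_url("the")
    print(url)
    # Decode the query string again to check what the server would receive.
    print(parse_qs(urlsplit(url).query))
```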
If we actually want to retrieve one record, we need to alter
our URL to the following:
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc
</ulink>
This way we can page through our result set in chunks of records.
For example, we access the 6th to the 10th record using the URL
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=6&amp;maximumRecords=5&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=6&amp;maximumRecords=5&amp;recordSchema=dc
</ulink>
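The paging arithmetic is simple but easy to get off by one, because <literal>startRecord</literal> counts from 1. The sketch below maps a page number to the two paging parameters and builds the matching URL. The helper names <literal>page_window</literal> and <literal>paged_url</literal> are assumptions made for this illustration.

```python
from urllib.parse import urlencode


def page_window(page, page_size):
    """Map a 1-based page number to SRU startRecord/maximumRecords values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"startRecord": (page - 1) * page_size + 1, "maximumRecords": page_size}


def paged_url(page, page_size, pquery="the", schema="dc", base="http://localhost:9999/"):
    """Build the searchRetrieve URL for one page of the result set."""
    params = {"version": "1.1", "operation": "searchRetrieve", "x-pquery": pquery}
    params.update(page_window(page, page_size))
    params["recordSchema"] = schema
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # Page 2 with five records per page covers the 6th to the 10th record.
    print(paged_url(2, 5))
```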
144 http://localhost:9999/?version=1.1&operation=searchRetrieve
145 &x-pquery=title%3Cthe
149 <sect1 id="tutorial-oai-sru-present">
150 <title>Presenting search results in different formats</title>
&zebra; uses &acro.xslt; stylesheets for both &acro.xml; record
indexing and display retrieval. In this example installation, there are two
retrieval schemas defined in
<literal>conf/dom-conf.xml</literal>:
158 the <literal>dc</literal> schema implemented in
159 <literal>conf/oai2dc.xsl</literal>, and
160 the <literal>zebra</literal> schema implemented in
161 <literal>conf/oai2zebra.xsl</literal>.
The URLs for accessing both are the same, except for the different
value of the <literal>recordSchema</literal> parameter:
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc
</ulink>
and
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra
</ulink>.
171 For the curious, one can see that the &acro.xslt; transformations
174 xsltproc conf/oai2dc.xsl data/debug-record.xml
175 xsltproc conf/oai2zebra.xsl data/debug-record.xml
Notice also that the &zebra;-specific parameters are injected by
the engine when retrieving data; therefore, some of the attributes
in the <literal>zebra</literal> retrieval schema are not filled
when running the transformation from the command line.
In addition to the user-defined retrieval schemas, one can always
choose from many built-in schemas. If one is only
interested in the &zebra; internal metadata about a certain
record, one uses the <literal>zebra::meta</literal> schema.
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::meta">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::meta
</ulink>
The <literal>zebra::data</literal> schema is used to retrieve the
original stored &acro.oai; &acro.xml; record.
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::data">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::data
</ulink>
204 <sect1 id="tutorial-oai-sru-searches">
205 <title>More interesting searches</title>
The &acro.oai; indexing example defines many different index
names. A study of the <literal>conf/oai2index.xsl</literal>
stylesheet reveals the following word-type indexes (i.e. those
with suffix <literal>:w</literal>):
By default, searches access the <literal>any:w</literal> index,
but we can direct searches to any access point by constructing the
correct &acro.pqf; query. For example, to search in titles only,
use the query <literal>@attr 1=title the</literal>:
<ulink
url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=@attr 1=title the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=@attr 1=title the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc
</ulink>
Similarly, we can direct searches to the other indexes defined, or
create boolean combinations of searches on different
indexes. In this case we search for <literal>the</literal> in
<literal>title</literal> and for <literal>fish</literal> in
<literal>description</literal> using the query
<literal>@and @attr 1=title the @attr 1=description fish</literal>.
<ulink
url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=@and @attr 1=title the @attr 1=description fish&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=@and @attr 1=title the @attr 1=description fish&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc
</ulink>
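&acro.pqf; uses prefix notation: the boolean operator comes first and is followed by its two operands. The sketch below composes the query used above from small pieces. The helper names <literal>attr_term</literal> and <literal>pqf_and</literal> are hypothetical. The sketch only handles single-word terms, because multi-word terms would need additional quoting.

```python
from urllib.parse import quote


def attr_term(index, term):
    """PQF for one single-word term restricted to one access point (use attribute 1)."""
    return f"@attr 1={index} {term}"


def pqf_and(left, right):
    """Combine two PQF sub-queries; the operator precedes both operands."""
    return f"@and {left} {right}"


if __name__ == "__main__":
    query = pqf_and(attr_term("title", "the"), attr_term("description", "fish"))
    print(query)
    # Percent-encode the query before placing it in the x-pquery URL parameter.
    print(quote(query))
```

Because <literal>pqf_and</literal> accepts any sub-query as an operand, nested combinations can be built by passing the result of one call into another.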
256 <sect1 id="tutorial-oai-sru-zebra-indexes">
257 <title>Investigating the content of the indexes</title>
How does the magic work? What is inside the indexes? Why is a certain
record found by a search, and another not? The answer is in the
inverted indexes. You can easily investigate them using the
special &zebra; schema
<literal>zebra::index::fieldname</literal>. In this example you
can see that the <literal>title</literal> index has both word
(type <literal>:w</literal>) and phrase (type
<literal>:p</literal>) indexes:
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::index::title">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::index::title
</ulink>
But where in the indexes did the term match for the query occur?
This is easily answered with the special &zebra; schema
<literal>zebra::snippet</literal>. The matching terms are
enclosed in <literal>&lt;s&gt;</literal> tags.
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::snippet">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::snippet
</ulink>
How can I refine my search? Which interesting search terms are
found inside my hit set? Try the special &zebra; schema
<literal>zebra::facet::fieldname:type</literal>. In this case, we
investigate additional search terms for the
<literal>title:w</literal> index.
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::facet::title:w">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::facet::title:w
</ulink>
One can ask for multiple facets. Here, we want them from phrase
indexes of type <literal>:p</literal>.
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::facet::publisher:p,title:p">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::facet::publisher:p,title:p
</ulink>
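The facet schema name follows a regular pattern: the prefix <literal>zebra::facet::</literal> followed by a comma-separated list of <literal>fieldname:type</literal> pairs. A small helper can assemble it from a list of fields. The function name <literal>facet_schema</literal> is an assumption made for this sketch.

```python
def facet_schema(*fields):
    """Join (name, type) pairs into a zebra::facet:: record schema name."""
    if not fields:
        raise ValueError("at least one field is required")
    return "zebra::facet::" + ",".join(f"{name}:{kind}" for name, kind in fields)


if __name__ == "__main__":
    # Two phrase-type facets, as in the URL above.
    print(facet_schema(("publisher", "p"), ("title", "p")))
```

The resulting string is what goes into the <literal>recordSchema</literal> parameter of the request.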
307 <sect1 id="tutorial-oai-sru-yazfrontend">
308 <title>Setting up a correct &acro.sru; web service</title>
The &acro.sru; specification mandates that the &acro.cql; query
language is supported and properly configured. Also, the server
needs to be able to emit a proper &acro.explain; &acro.xml;
record, which is used to determine the capabilities of the
specific server instance.
In this example configuration we exploit the similarities between
the &acro.explain; record and the &acro.cql; query language
configuration: we generate the latter from the former using an
&acro.xslt; transformation.
324 xsltproc conf/explain2cqlpqftxt.xsl conf/explain.xml > conf/cql2pqf.txt
We are all set to start the &acro.sru;/&acro.z3950; server, including
&acro.pqf; and &acro.cql; query configuration. It uses the &yaz; frontend
server configuration; just type
333 zebrasrv -f conf/yazserver.xml
First, we would like to be sure that we can see the &acro.explain;
&acro.xml; response correctly. You might use either of these equivalent
requests:
<ulink
url="http://localhost:9999">http://localhost:9999</ulink>
<ulink
url="http://localhost:9999/?version=1.1&amp;operation=explain">
http://localhost:9999/?version=1.1&amp;operation=explain
</ulink>
Now we can issue true &acro.sru; requests. For example, the query
<literal>dc.title=the and dc.description=fish</literal>
results in the following page:
<ulink
url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;query=dc.title=the and dc.description=fish&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;query=dc.title=the and dc.description=fish&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc
</ulink>
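A &acro.cql; query contains spaces and equals signs, so it is safest to let a library encode it when building the request URL. The sketch below does this with the Python standard library. The function name <literal>sru_cql_url</literal> and its keyword defaults are assumptions, and the base URL assumes the server on port 9999 used throughout this tutorial.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sru_cql_url(cql, base="http://localhost:9999/", schema="dc", start=1, maximum=1):
    """Build an SRU searchRetrieve URL carrying a CQL query in the query parameter."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql,
        "startRecord": start,
        "maximumRecords": maximum,
        "recordSchema": schema,
    }
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    url = sru_cql_url("dc.title=the and dc.description=fish")
    print(url)
    # Round-trip check: decoding the URL yields the original CQL text.
    print(parse_qs(urlsplit(url).query)["query"][0])
```

Changing <literal>schema</literal> to one of the <literal>zebra::</literal> names selects the special record schemas in the same way as with the &acro.pqf; URLs.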
Scanning indexes is part of the &acro.sru; server's business. For example,
scanning the <literal>dc.title</literal> index gives us an idea
of which search terms are found there:
<ulink
url="http://localhost:9999/?version=1.1&amp;operation=scan&amp;scanClause=dc.title=fish">
http://localhost:9999/?version=1.1&amp;operation=scan&amp;scanClause=dc.title=fish
</ulink>,
whereas
<ulink
url="http://localhost:9999/?version=1.1&amp;operation=scan&amp;scanClause=dc.identifier=fish">
http://localhost:9999/?version=1.1&amp;operation=scan&amp;scanClause=dc.identifier=fish
</ulink>
accesses the indexed identifiers.
In addition, all &zebra; internal special element sets or record
schemas of the form
<literal>zebra::</literal> work right out of the box:
<ulink
url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;query=dc.title=the and dc.description=fish&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::snippet">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;query=dc.title=the and dc.description=fish&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::snippet
</ulink>
398 <sect1 id="tutorial-oai-z3950">
399 <title>Searching the &acro.oai; database by &acro.z3950; protocol</title>
In this section we repeat the searches and presents we have done so
far, this time using the binary &acro.z3950; protocol. You can use any
&acro.z3950; client.
For instance, you can use the demo command-line client that comes
with &yaz;.
409 Connecting to the server is done by the command
411 yaz-client localhost:9999
416 When the client has connected, you can type:
427 &acro.z3950; presents using presentation stylesheets:
&acro.z3950; built-in &zebra; presents (in this configuration, only if
the server is started without the &yaz; frontend server):
442 Z> elements zebra::meta
445 Z> elements zebra::meta::sysno
452 Z> elements zebra::index
455 Z> elements zebra::snippet
458 Z> elements zebra::facet::any:w
461 Z> elements zebra::facet::publisher:p,title:p
467 &acro.z3950; searches targeted at specific indexes and boolean
468 combinations of these can be issued as well.
472 Z> find @attr 1=oai_identifier @attr 4=3 oai:caltechcstr.library.caltech.edu:4
475 Z> find @attr 1=oai_datestamp @attr 4=3 2001-04-20
478 Z> find @attr 1=oai_setspec @attr 4=3 7374617475733D756E707562
481 Z> find @attr 1=title communication
484 Z> find @attr 1=identifier @attr 4=3
485 http://resolver.caltech.edu/CaltechCSTR:1986.5228-tr-86
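The <literal>oai_setspec</literal> search term above looks like hex-encoded ASCII text. This is an observation about the sample value, not documented &zebra; behaviour. Under that assumption, the short Python sketch below decodes the value and encodes a readable string back into the same form. The helper names are hypothetical.

```python
def decode_hex_term(term):
    """Decode a hex-encoded ASCII string such as the oai_setspec term above."""
    return bytes.fromhex(term).decode("ascii")


def encode_hex_term(text):
    """Encode readable ASCII text as upper-case hex digits."""
    return text.encode("ascii").hex().upper()


if __name__ == "__main__":
    print(decode_hex_term("7374617475733D756E707562"))  # prints: status=unpub
```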
494 yaz-client localhost:9999
497 Z> scan @attr 1=oai_identifier @attr 4=3 oai
498 Z> scan @attr 1=oai_datestamp @attr 4=3 1
499 Z> scan @attr 1=oai_setspec @attr 4=3 2000
501 Z> scan @attr 1=title communication
502 Z> scan @attr 1=identifier @attr 4=3 a
507 &acro.z3950; search using server-side CQL conversion:
515 Z> find dc.creator = the
516 Z> find dc.creator = the
517 Z> find dc.title = the
Z> find dc.description &lt; the
520 Z> find dc.title > some
522 Z> find dc.identifier="http://resolver.caltech.edu/CaltechCSTR:1978.2276-tr-78"
523 Z> find dc.relation = something
And so on. Notice that all indexes defined by <literal>type="0"</literal> in the
indexing stylesheet must be searched using the <literal>eq</literal>
relation.
&acro.z3950; scan using server-side &acro.cql; conversion:
unfortunately, this will never work, as it is not supported by the
&acro.z3950; standard.
If you want to scan using server-side &acro.cql; conversion, you need to
make an SRW connection using yaz-client, or an
&acro.sru; connection using REST web services; any browser will do.
All indexes defined by <literal>type="0"</literal> in the
indexing stylesheet must be searched using the <literal>@attr 4=3</literal>
structure attribute instruction.
Notice that searching and scanning on the indexes
<literal>contributor</literal>, <literal>language</literal>,
<literal>rights</literal>, and <literal>source</literal>
might fail, simply because none of the records in the small example set
have these fields set and, consequently, these indexes might not
have been created.
570 <!-- Keep this comment at the end of the file
575 sgml-minimize-attributes:nil
576 sgml-always-quote-attributes:t
579 sgml-parent-document: "zebra.xml"
580 sgml-local-catalogs: nil
581 sgml-namecase-general:t