2 <!-- $Id: server.xml,v 1.19 2006-04-25 12:26:26 marc Exp $ -->
3 <title>The Z39.50 Server</title>
6 <title>Running the Z39.50 Server (zebrasrv)</title>
FIXME - We need to be consistent here, zebraidx had the options at the
end, and lots of explaining text before them. Same for zebrasrv! -H
FIXME - At least we need a small intro, what is zebrasrv, and how it
can be run (inetd, nt service, stand-alone program, daemon...) -H
15 <!-- re-write by MC, using the newly created input files for the
19 <sect2><title>Description</title>
20 <para>Zebra is a high-performance, general-purpose structured text indexing
21 and retrieval engine. It reads structured records in a variety of input
formats (e.g. email, XML, MARC) and allows access to them through exact
23 boolean search expressions and relevance-ranked free-text queries.
26 <command>zebrasrv</command> is the Z39.50 and <ulink url="http://www.loc.gov/standards/sru/srw/">SRW</ulink>/U frontend
27 server for the <command>Zebra</command> indexer.
On Unix you can run the <command>zebrasrv</command>
server from the command line and, if desired, put it
in the background. It may also operate under the inet daemon.
33 On WIN32 you can run the server as a console application or
39 <title>Synopsis</title>
44 <title>Options</title>
47 The options for <command>zebrasrv</command> are the same
48 as those for YAZ' <command>yaz-ztest</command>.
Option <literal>-c</literal> specifies a Zebra configuration
file; if it is omitted, <filename>zebra.cfg</filename> is read.
56 <sect2><title>Files</title>
58 <filename>zebra.cfg</filename>
61 <sect2><title>See Also</title>
64 <refentrytitle>zebraidx</refentrytitle>
65 <manvolnum>1</manvolnum>
68 <refentrytitle>yaz-ztest</refentrytitle>
69 <manvolnum>8</manvolnum>
73 The Zebra software is Copyright <command>Index Data</command>
74 <filename>http://www.indexdata.dk</filename>
75 and distributed under the
82 <emphasis remap="bf">Syntax</emphasis>
85 zebrasrv [options] [listener-address ...]
91 <emphasis remap="bf">Options</emphasis>
95 <term>-a <replaceable>APDU file</replaceable></term>
98 Specify a file for dumping PDUs (for diagnostic purposes).
99 The special name "-" sends output to <literal>stderr</literal>.
104 <term>-c <replaceable>config-file</replaceable></term>
107 Read configuration information from
108 <replaceable>config-file</replaceable>.
109 The default configuration is <literal>./zebra.cfg</literal>.
117 Don't fork on connection requests. This can be useful for
118 symbolic-level debugging. The server can only accept a single
119 connection in this mode.
127 Use the Z39.50 protocol. Currently the only protocol supported.
128 The option is retained for historical reasons, and for future
134 <term>-l <replaceable>logfile</replaceable></term>
137 Specify an output file for the diagnostic messages.
138 The default is to write this information to <literal>stderr</literal>.
143 <term>-v <replaceable>log-level</replaceable></term>
146 The log level. Use a comma-separated list of members of the set
147 {fatal,debug,warn,log,all,none}.
152 <term>-u <replaceable>username</replaceable></term>
155 Set user ID. Sets the real UID of the server process to that of the
156 given <replaceable>username</replaceable>.
157 It's useful if you aren't comfortable with having the
158 server run as root, but you need to start it as such to bind a
164 <term>-w <replaceable>working-directory</replaceable></term>
167 Change working directory.
175 Run under the Internet superserver, <literal>inetd</literal>.
176 Make sure you use the logfile option <literal>-l</literal> in
177 conjunction with this mode and specify the <literal>-l</literal>
178 option before any other options.
183 <term>-t <replaceable>timeout</replaceable></term>
186 Set the idle session timeout (default 60 minutes).
191 <term>-k <replaceable>kilobytes</replaceable></term>
194 Set the (approximate) maximum size of
195 present response messages. Default is 1024 KB (1 MB).
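As a rough illustration of how the options above combine, the following Python sketch assembles a <command>zebrasrv</command> command line. The helper name <literal>zebrasrv_argv</literal> and the file names are hypothetical; the listener address <literal>tcp:@:9999</literal> is assumed to follow the usual YAZ listener-address convention. Only flags described in this section are used, and <literal>-l</literal> is emitted first because inetd mode expects it before any other option.

```python
import shlex


def zebrasrv_argv(config=None, logfile=None, log_levels=(), user=None,
                  workdir=None, timeout_min=None, max_kb=None, listeners=()):
    """Build an argument list for zebrasrv (hypothetical helper)."""
    argv = ["zebrasrv"]
    if logfile is not None:
        argv += ["-l", logfile]                  # log file, placed first
    if config is not None:
        argv += ["-c", config]                   # Zebra configuration file
    if log_levels:
        argv += ["-v", ",".join(log_levels)]     # comma-separated log levels
    if user is not None:
        argv += ["-u", user]                     # switch to this user ID
    if workdir is not None:
        argv += ["-w", workdir]                  # change working directory
    if timeout_min is not None:
        argv += ["-t", str(timeout_min)]         # idle session timeout
    if max_kb is not None:
        argv += ["-k", str(max_kb)]              # maximum present response size
    argv += list(listeners)                      # listener addresses come last
    return argv


cmd = zebrasrv_argv(config="zebra.cfg", logfile="zebrasrv.log",
                    log_levels=["warn", "log"], timeout_min=30,
                    listeners=["tcp:@:9999"])
print(shlex.join(cmd))
# prints: zebrasrv -l zebrasrv.log -c zebra.cfg -v warn,log -t 30 tcp:@:9999
```

The resulting list can be passed to <literal>subprocess.run</literal> unchanged, which avoids shell quoting problems.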
205 <sect1 id="protocol-support">
206 <title>Z39.50 Protocol Support and Behavior</title>
209 <title>Initialization</title>
212 During initialization, the server will negotiate to version 3 of the
213 Z39.50 protocol, and the option bits for Search, Present, Scan,
214 NamedResultSets, and concurrentOperations will be set, if requested by
215 the client. The maximum PDU size is negotiated down to a maximum of
222 <title>Search</title>
225 FIXME - Need to explain the string tag stuff before people get bogged
226 down with all these attribute numbers. Perhaps in its own
The supported query types are 1 and 101. All operators are currently
232 supported with the restriction that only proximity units of type "word"
233 are supported for the proximity operator.
234 Queries can be arbitrarily complex.
235 Named result sets are supported, and result sets can be used as operands
237 Searches may span multiple databases.
241 The server has full support for piggy-backed retrieval (see
242 also the following section).
246 <emphasis>Use</emphasis> attributes are interpreted according to the
247 attribute sets which have been loaded in the
248 <literal>zebra.cfg</literal> file, and are matched against specific
249 fields as specified in the <literal>.abs</literal> file which
250 describes the profile of the records which have been loaded.
251 If no Use attribute is provided, a default of Bib-1 Any is assumed.
255 If a <emphasis>Structure</emphasis> attribute of
256 <emphasis>Phrase</emphasis> is used in conjunction with a
257 <emphasis>Completeness</emphasis> attribute of
258 <emphasis>Complete (Sub)field</emphasis>, the term is matched
259 against the contents of the phrase (long word) register, if one
260 exists for the given <emphasis>Use</emphasis> attribute.
A phrase register is created for those fields in the
<literal>.abs</literal> file that contain a
<literal>p</literal>-specifier.
264 <!-- ### whatever the hell _that_ is -->
If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
used in conjunction with <emphasis>Incomplete Field</emphasis> (the
default value for <emphasis>Completeness</emphasis>), the
271 search is directed against the normal word registers, but if the term
272 contains multiple words, the term will only match if all of the words
273 are found immediately adjacent, and in the given order.
274 The word search is performed on those fields that are indexed as
275 type <literal>w</literal> in the <literal>.abs</literal> file.
279 If the <emphasis>Structure</emphasis> attribute is
280 <emphasis>Word List</emphasis>,
281 <emphasis>Free-form Text</emphasis>, or
282 <emphasis>Document Text</emphasis>, the term is treated as a
283 natural-language, relevance-ranked query.
284 This search type uses the word register, i.e. those fields
285 that are indexed as type <literal>w</literal> in the
286 <literal>.abs</literal> file.
290 If the <emphasis>Structure</emphasis> attribute is
291 <emphasis>Numeric String</emphasis> the term is treated as an integer.
292 The search is performed on those fields that are indexed
293 as type <literal>n</literal> in the <literal>.abs</literal> file.
297 If the <emphasis>Structure</emphasis> attribute is
298 <emphasis>URx</emphasis> the term is treated as a URX (URL) entity.
299 The search is performed on those fields that are indexed as type
300 <literal>u</literal> in the <literal>.abs</literal> file.
If the <emphasis>Structure</emphasis> attribute is
<emphasis>Local Number</emphasis>, the term is treated as a
native Zebra record identifier.
310 If the <emphasis>Relation</emphasis> attribute is
311 <emphasis>Equals</emphasis> (default), the term is matched
312 in a normal fashion (modulo truncation and processing of
313 individual words, if required).
314 If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
315 <emphasis>Less Than or Equal</emphasis>,
316 <emphasis>Greater than</emphasis>, or <emphasis>Greater than or
317 Equal</emphasis>, the term is assumed to be numerical, and a
318 standard regular expression is constructed to match the given
320 If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
321 the standard natural-language query processor is invoked.
325 For the <emphasis>Truncation</emphasis> attribute,
326 <emphasis>No Truncation</emphasis> is the default.
327 <emphasis>Left Truncation</emphasis> is not supported.
328 <emphasis>Process # in search term</emphasis> is supported, as is
329 <emphasis>Regxp-1</emphasis>.
330 <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
331 search. As a default, a single error (deletion, insertion,
332 replacement) is accepted when terms are matched against the register
337 <title>Regular expressions</title>
340 Each term in a query is interpreted as a regular expression if
341 the truncation value is either <emphasis>Regxp-1</emphasis> (102)
342 or <emphasis>Regxp-2</emphasis> (103).
343 Both query types follow the same syntax with the operands:
350 Matches the character <emphasis>x</emphasis>.
358 Matches any character.
363 <term><literal>[</literal>..<literal>]</literal></term>
366 Matches the set of characters specified;
367 such as <literal>[abc]</literal> or <literal>[a-c]</literal>.
379 Matches <emphasis>x</emphasis> zero or more times. Priority: high.
387 Matches <emphasis>x</emphasis> one or more times. Priority: high.
Matches <emphasis>x</emphasis> zero times or once. Priority: high.
403 Matches <emphasis>x</emphasis>, then <emphasis>y</emphasis>.
412 Matches either <emphasis>x</emphasis> or <emphasis>y</emphasis>.
418 The order of evaluation may be changed by using parentheses.
422 If the first character of the <emphasis>Regxp-2</emphasis> query
423 is a plus character (<literal>+</literal>) it marks the
424 beginning of a section with non-standard specifiers.
425 The next plus character marks the end of the section.
Currently Zebra only supports one specifier, the error tolerance,
which consists of one digit.
Since the plus operator is normally a suffix operator, this addition to
the query syntax doesn't violate the syntax for standard regular
expressions.
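To make the error-tolerance prefix concrete, here is a small Python sketch that builds a <emphasis>Regxp-2</emphasis> term of the form <literal>+N+term</literal> and wraps it in a PQF query using truncation attribute 103. The function names are hypothetical, and the default Use attribute 4 (Bib-1 title) is only an assumption borrowed from the query examples in this chapter.

```python
def fuzzy_term(term, errors=1):
    """Prefix a term with a '+N+' error-tolerance section (single digit)."""
    if not 0 <= errors <= 9:
        raise ValueError("error tolerance must be a single digit")
    return f"+{errors}+{term}"


def regxp2_query(term, use=4, errors=None):
    """Build a PQF query with truncation 103 (Regxp-2).

    When 'errors' is None the term is passed through unchanged, so the
    server's default tolerance applies.
    """
    body = term if errors is None else fuzzy_term(term, errors)
    return f'@attr 1={use} @attr 5=103 "{body}"'


print(regxp2_query("informat", errors=2))
# prints: @attr 1=4 @attr 5=103 "+2+informat"
```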
439 <title>Query examples</title>
442 Phrase search for <emphasis>information retrieval</emphasis> in
445 @attr 1=4 "information retrieval"
450 Ranked search for the same thing:
452 @attr 1=4 @attr 2=102 "Information retrieval"
457 Phrase search with a regular expression:
459 @attr 1=4 @attr 5=102 "informat.* retrieval"
464 Ranked search with a regular expression:
466 @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
471 In the GILS schema (<literal>gils.abs</literal>), the
472 west-bounding-coordinate is indexed as type <literal>n</literal>,
473 and is therefore searched by specifying
474 <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
475 To match all those records with west-bounding-coordinate greater
476 than -114 we use the following query:
478 @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
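The queries above all follow the same pattern: zero or more <literal>@attr</literal> specifications followed by a term. The Python sketch below composes such strings; <literal>pqf_term</literal> is a hypothetical helper rather than part of Zebra or YAZ, and it deliberately refuses terms containing double quotes instead of guessing at escaping rules.

```python
def pqf_term(term, *attrs):
    """Compose a PQF operand from attribute specs such as '1=4' or 'gils 1=2038'."""
    if '"' in term:
        raise ValueError("embedded double quotes are not handled in this sketch")
    # Quote only when the term contains whitespace, as in the examples above.
    text = f'"{term}"' if any(c.isspace() for c in term) else term
    return " ".join([f"@attr {a}" for a in attrs] + [text])


# Phrase search in the title register
print(pqf_term("information retrieval", "1=4"))
# Regular-expression search with relevance ranking
print(pqf_term("informat.* retrieval", "1=4", "5=102", "2=102"))
# Numeric comparison against a GILS field
print(pqf_term("-114", "4=109", "2=5", "gils 1=2038"))
```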
485 <title>Present</title>
487 The present facility is supported in a standard fashion. The requested
488 record syntax is matched against the ones supported by the profile of
489 each record retrieved. If no record syntax is given, SUTRS is the
490 default. The requested element set name, again, is matched against any
491 provided by the relevant record profiles.
497 The attribute combinations provided with the termListAndStartPoint are
498 processed in the same way as operands in a query (see above).
499 Currently, only the term and the globalOccurrences are returned with
500 the termInfo structure.
Z39.50 specifies three different types of sort criteria.
Of these, Zebra supports the attribute specification type, in which
the Use attribute specifies the "Sort register".
510 Sort registers are created for those fields that are of type "sort" in
511 the default.idx file.
512 The corresponding character mapping file in default.idx specifies the
513 ordinal of each character used in the actual sort.
517 Z39.50 allows the client to specify sorting on one or more input
518 result sets and one output result set.
Zebra supports sorting on one result set only, which may or may not
be the same as the output result set.
526 If a Close PDU is received, the server will respond with a Close PDU
527 with reason=FINISHED, no matter which protocol version was negotiated
528 during initialization. If the protocol version is 3 or more, the
529 server will generate a Close PDU under certain circumstances,
530 including a session timeout (60 minutes by default), and certain kinds of
531 protocol errors. Once a Close PDU has been sent, the protocol
532 association is considered broken, and the transport connection will be
533 closed immediately upon receipt of further data, or following a short
541 <chapter id="server-sru">
542 <title>The SRU/SRW Server</title>
544 In addition to Z39.50, Zebra supports the more recent and
545 web-friendly IR protocol SRU, described at
546 <ulink url="http://www.loc.gov/sru"/>.
547 SRU is ``Search/Retrieve via URL'', a simple, REST-like protocol
548 that uses HTTP GET to request search responses. The request
549 itself is made of parameters such as
550 <literal>query</literal>,
551 <literal>startRecord</literal>,
552 <literal>maximumRecords</literal>
554 <literal>recordSchema</literal>;
555 the response is an XML document containing hit-count, result-set
556 records, diagnostics, etc. SRU can be thought of as a re-casting
557 of Z39.50 semantics in web-friendly terms; or as a standardisation
558 of the ad-hoc query parameters used by search engines such as Google
559 and AltaVista; or as a superset of A9's OpenSearch (which it
563 Zebra further supports SRW, described at
564 <ulink url="http://www.loc.gov/srw"/>.
565 SRW is the ``Search/Retrieve Web Service'', a SOAP-based alternative
566 implementation of the abstract protocol that SRU implements as HTTP
567 GET requests. In SRW, requests are encoded as XML documents which
568 are posted to the server. The responses are identical to those
returned by SRU servers, except that they are wrapped in several
570 layers of SOAP envelope.
573 Zebra supports all three protocols - Z39.50, SRU and SRW - on the
same port, recognising which protocol is used by each incoming
request and handling it accordingly. This is achieved through
576 the use of Deep Magic; civilians are warned not to stand too close.
579 From here on, ``SRU'' is used to indicate both the SRU and SRW
580 protocols, as they are identical except for the transport used for
581 the protocol packets and Zebra's support for them is equivalent.
584 <sect1 id="server-sru-run">
585 <title>Running the SRU Server (zebrasrv)</title>
587 Because Zebra supports all three protocols on one port, it would
588 seem to follow that the SRU server is run in the same way as
589 the Z39.50 server, as described above. This is true, but only in
590 an uninterestingly vacuous way: a Zebra server run in this manner
591 will indeed recognise and accept SRU requests; but since it
592 doesn't know how to handle the CQL queries that these protocols
593 use, all it can do is send failure responses.
597 It is possible to cheat, by having SRU search Zebra with
598 a PQF query instead of CQL, using the
599 <literal>x-pquery</literal>
601 <literal>query</literal>.
603 <emphasis role="strong">non-standard extension</emphasis>
605 <emphasis role="strong">very naughty</emphasis>
606 thing to do, but it does give you a way to see Zebra serving SRU
607 ``right out of the box''. If you start your favourite Zebra
608 server in the usual way, on port 9999, then you can send your web
612 http://localhost:9999/Default?version=1.1
613 &operation=searchRetrieve
614 &x-pquery=mineral
616 &maximumRecords=1
619 This will display the XML-formatted SRU response that includes the
620 first record in the result-set found by the query
621 <literal>mineral</literal>. (For clarity, the SRU URL is shown
here broken across lines, but the lines should be joined together
to make a single-line URL for the browser to submit.)
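Joining the URL by hand is error-prone, so a script may be easier. The following Python sketch assembles the same kind of request from its parameters; the function name <literal>sru_pquery_url</literal> is hypothetical, and the host, port and database name simply repeat the example above. Only the parameters visible in that example are included.

```python
from urllib.parse import quote, urlencode


def sru_pquery_url(pqf, base="http://localhost:9999/Default", maximum_records=1):
    """Build a searchRetrieve URL carrying a PQF query in x-pquery."""
    params = [
        ("version", "1.1"),
        ("operation", "searchRetrieve"),
        ("x-pquery", pqf),                       # non-standard extension parameter
        ("maximumRecords", str(maximum_records)),
    ]
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base + "?" + urlencode(params, quote_via=quote)


print(sru_pquery_url("mineral"))
# prints: http://localhost:9999/Default?version=1.1&operation=searchRetrieve&x-pquery=mineral&maximumRecords=1
```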
627 In order to turn on Zebra's support for CQL queries, it's necessary
628 to have the YAZ generic front-end (which Zebra uses) translate them
629 into the Z39.50 Type-1 query format that is used internally. And
630 to do this, the generic front-end's own configuration file must be
631 used. This file is described
632 <link linkend="gfs-config">elsewhere</link>;
633 the salient point for SRU support is that
634 <command>zebrasrv</command>
635 must be started with the
636 <literal>-f frontendConfigFile</literal>
637 option rather than the
638 <literal>-c zebraConfigFile</literal>
640 and that the front-end configuration file must include both a
641 reference to the Zebra configuration file and the CQL-to-PQF
642 translator configuration file.
645 A minimal front-end configuration file that does this would read as
651 <config>zebra.cfg</config>
652 <cql2rpn>../../tab/pqf.properties</cql2rpn>
658 <literal><config></literal>
659 element contains the name of the Zebra configuration file that was
660 previously specified by the
661 <literal>-c</literal>
662 command-line argument, and the
663 <literal><cql2rpn></literal>
element contains the name of the CQL properties file specifying how
various CQL indexes, relations, etc. are translated into Type-1
queries.
A Zebra server running with such a configuration can then be
670 queried using proper, conformant SRU URLs with CQL queries:
673 http://localhost:9999/Default?version=1.1
674 &operation=searchRetrieve
675 &query=title=utah and description=epicent*
677 &maximumRecords=1
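The CQL query in this URL contains spaces, an equals sign and an asterisk, which a browser will usually percent-encode on your behalf. A program sending the request should do the encoding itself. The Python sketch below shows one way, under the assumption that the server listens on <literal>localhost:9999</literal> with a database called <literal>Default</literal> as in the example; <literal>sru_cql_url</literal> is a hypothetical name.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def sru_cql_url(cql, base="http://localhost:9999/Default", **extra):
    """Build a conformant SRU searchRetrieve URL for a CQL query."""
    params = {"version": "1.1", "operation": "searchRetrieve", "query": cql}
    params.update(extra)    # e.g. startRecord, maximumRecords, recordSchema
    return base + "?" + urlencode(params, quote_via=quote)


url = sru_cql_url("title=utah and description=epicent*", maximumRecords=1)
print(url)

# Decoding the query string recovers the original CQL text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded)
```

Keeping the CQL text unencoded in the program and encoding it only at the last step avoids double-encoding mistakes.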
681 <sect1 id="server-sru-support">
682 <title>SRU and SRW Protocol Support and Behavior</title>
684 Zebra running as an SRU server supports SRU version 1.1, including
685 CQL version 1.1. In particular, it provides support for the
686 following elements of the protocol.
690 <title>Search and Retrieval</title>
692 Zebra fully supports SRU's core
693 <literal>searchRetrieve</literal>
694 operation, as described at
695 <ulink url="http://www.loc.gov/standards/sru/sru-spec.html"/>
698 One of the great strengths of SRU is that it mandates a standard
699 query language, CQL, and that all conforming implementations can
700 therefore be trusted to correctly interpret the same queries. It
701 is with some shame, then, that we admit that Zebra also supports
702 an additional query language, our own Prefix Query Format (PQF,
703 <ulink url="http://indexdata.com/yaz/doc/tools.tkl#PQF"/>).
704 A PQF query is submitted by using the extension parameter
705 <literal>x-pquery</literal>,
707 <literal>query</literal>
708 parameter must be omitted, which makes the request not valid SRU.
709 Please don't do this.
716 Zebra does <emphasis>not</emphasis> support SRU's
717 <literal>scan</literal>
718 operation, as described at
719 <ulink url="http://www.loc.gov/standards/sru/scan/"/>
722 This is a rather embarrassing surprise as the pieces are all
723 there: Z39.50 scan is supported, and SRU scan requests are
724 recognised and diagnosed. To add further to the embarrassment, a
725 mutant form of SRU scan <emphasis>is</emphasis> supported, using
726 the non-standard <literal>x-pScanClause</literal> parameter in
727 place of the standard <literal>scanClause</literal> to scan on a
733 <title>Explain</title>
735 Zebra fully supports SRU's core
736 <literal>explain</literal>
737 operation, as described at
738 <ulink url="http://www.loc.gov/standards/sru/explain/index.html"/>
741 The ZeeRex record explaining a database may be requested either
742 with a fully fledged SRU request (with
743 <literal>operation</literal>=<literal>explain</literal>
744 and version-number specified)
745 or with a simple HTTP GET at the server's basename.
746 The ZeeRex record returned in response is the one embedded
747 in the YAZ Frontend Server configuration file that is described in the
748 <link linkend="gfs-config">Virtual Hosts</link> documentation.
Unfortunately, the data found in the
CQL-to-PQF text file must be added by hand to the explain
section of the YAZ Frontend Server configuration file to be able
to provide a suitable explain record.
This is unfortunate, but the feature is still at a very early
alpha stage, and a lot of work has yet to be done.
There is no linkage whatsoever between the Z39.50 explain model
and the SRU/SRW explain response (at least, none is implemented
in Zebra). Zebra does not provide a means of obtaining the
ZeeRex record using Z39.50.
767 <title>Some SRU Examples</title>
769 Surf into <literal>http://localhost:9999</literal>
770 to get an explain response, or use
772 http://localhost:9999/?version=1.1&operation=explain
776 See number of hits for a query
778 http://localhost:9999/?version=1.1&operation=searchRetrieve
779 &query=text=(plant%20and%20soil)
Fetch records 5 and 6 in Dublin Core format
785 http://localhost:9999/?version=1.1&operation=searchRetrieve
786 &query=text=(plant%20and%20soil)
787 &startRecord=5&maximumRecords=2&recordSchema=dc
You can even search with PQF queries using the <emphasis>naughty
extension parameter</emphasis> <literal>x-pquery</literal>
794 http://localhost:9999/?version=1.1&operation=searchRetrieve
795 &x-pquery=@attr%201=text%20@and%20plant%20soil
Or scan indexes using the <emphasis>extremely naughty extension
parameter</emphasis> <literal>x-pScanClause</literal>
802 http://localhost:9999/?version=1.1&operation=scan
803 &x-pScanClause=@attr%201=text%20something
805 <emphasis>Don't do this in production code!</emphasis>
806 But it's a great fast debugging aid.
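Once a response comes back, the hit count can be read from the XML. The Python sketch below does this for a hand-written sample response; the sample document is an assumption made for illustration (a real Zebra response carries more elements), while the namespace and the <literal>numberOfRecords</literal> element name are those of SRU 1.1. Searching with <literal>iter</literal> rather than looking only at direct children means the same function also copes with an SRW response wrapped in a SOAP envelope.

```python
import xml.etree.ElementTree as ET

SRW_NS = "http://www.loc.gov/zing/srw/"

# Hand-written sample, not actual Zebra output.
SAMPLE = """<zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>42</zs:numberOfRecords>
</zs:searchRetrieveResponse>"""


def hit_count(xml_text):
    """Return the number of hits reported in an SRU/SRW response."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, including the root element itself.
    node = next(root.iter(f"{{{SRW_NS}}}numberOfRecords"), None)
    if node is None or node.text is None:
        raise ValueError("response contains no numberOfRecords element")
    return int(node.text)


print(hit_count(SAMPLE))
# prints: 42
```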
811 <title>Initialization, Present, Sort, Close</title>
813 In the Z39.50 protocol, Initialization, Present, Sort and Close
814 are separate operations. In SRU, however, these operations do not
820 SRU has no explicit initialization handshake phase, but
821 commences immediately with searching, scanning and explain
827 Neither does SRU have a close operation, since the protocol is
828 stateless and each request is self-contained. (It is true that
829 multiple SRU request/response pairs may be implemented as
830 multiple HTTP request/response pairs over a single persistent
831 TCP/IP connection; but the closure of that connection is not a
832 protocol-level operation.)
837 Retrieval in SRU is part of the
838 <literal>searchRetrieve</literal> operation, in which a search
839 is submitted and the response includes a subset of the records
840 in the result set. There is no direct analogue of Z39.50's
841 Present operation which requests records from an established
842 result set. In SRU, this is achieved by sending a subsequent
843 <literal>searchRetrieve</literal> request with the query
844 <literal>cql.resultSetId=</literal><emphasis>id</emphasis> where
845 <emphasis>id</emphasis> is the identifier of the previously
846 generated result-set.
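A follow-up request of this kind might be built as in the Python sketch below. The function name <literal>present_url</literal> and the result-set identifier <literal>rs1</literal> are hypothetical; in practice the identifier would be taken from an earlier response, and whether the server still holds that result set is not something this sketch can check.

```python
from urllib.parse import quote, urlencode


def present_url(result_set_id, start, count, base="http://localhost:9999/"):
    """Build a searchRetrieve URL that fetches records from an existing result set."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": f"cql.resultSetId={result_set_id}",
        "startRecord": start,          # 1-based position of the first record
        "maximumRecords": count,       # number of records to return
    }
    return base + "?" + urlencode(params, quote_via=quote)


print(present_url("rs1", start=5, count=3))
```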
Sorting in SRU is done within the
<literal>searchRetrieve</literal> operation: in v1.1, by an
explicit <literal>sortKeys</literal> parameter, but the forthcoming
v1.2 or v2.0 will most likely use an extension of the query
language, CQL, for sorting: see
856 <ulink url="http://zing.z3950.org/cql/sorting.html"/>
861 It can be seen, then, that while Zebra operating as an SRU server
862 does not provide the same set of operations as when operating as a
863 Z39.50 server, it does provide equivalent functionality.
869 <!-- Keep this comment at the end of the file
874 sgml-minimize-attributes:nil
875 sgml-always-quote-attributes:t
878 sgml-parent-document: "zebra.xml"
879 sgml-local-catalogs: nil
880 sgml-namecase-general:t