1 <?xml version="1.0" standalone="no"?>
2 <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd"
5 <!ENTITY % local SYSTEM "local.ent">
7 <!ENTITY % entities SYSTEM "entities.ent">
9 <!ENTITY % idcommon SYSTEM "common/common.ent">
12 <refentry id="pazpar2_conf">
14 <productname>Pazpar2</productname>
15 <productnumber>&version;</productnumber>
18 <refentrytitle>Pazpar2 conf</refentrytitle>
19 <manvolnum>5</manvolnum>
23 <refname>pazpar2_conf</refname>
24 <refpurpose>Pazpar2 Configuration</refpurpose>
29 <command>pazpar2.conf</command>
33 <refsect1><title>DESCRIPTION</title>
35 The Pazpar2 configuration file, together with any referenced XSLT files,
governs Pazpar2's behavior as a client, and controls the normalization and
37 extraction of data elements from incoming result records, for the
38 purposes of merging, sorting, facet analysis, and display.
42 The file is specified using the option -f on the Pazpar2 command line.
43 There is not presently a way to reload the configuration file without
44 restarting Pazpar2, although this will most likely be added some time
49 <refsect1><title>FORMAT</title>
51 The configuration file is XML-structured. It must be valid XML. All
52 elements specific to Pazpar2 should belong to the namespace
53 <literal>http://www.indexdata.com/pazpar2/1.0</literal>
54 (this is assumed in the
55 following examples). The root element is named <literal>pazpar2</literal>.
56 Under the root element are a number of elements which group categories of
57 information. The categories are described below.
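As a minimal sketch of this layout (the comment only hints at the
child elements, and their exact nesting is an assumption based on the
sections described in this document):
<programlisting><![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
  <!-- grouping elements, such as the server section, go here -->
</pazpar2>
]]></programlisting>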
60 <refsect2 id="config-server"><title>server</title>
62 This section governs overall behavior of the server. The data
63 elements are described below. From Pazpar2 version 1.2 this is
66 <variablelist> <!-- level 1 -->
71 Configures the webservice -- this controls how you can connect
72 to Pazpar2 from your browser or server-side code. The
73 attributes 'host' and 'port' control the binding of the
74 server. The 'host' attribute can be used to bind the server to
75 a secondary IP address of your system, enabling you to run
76 Pazpar2 on port 80 alongside a conventional web server. You
77 can override this setting on the command line using the option -h.
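For illustration, a listen element that binds the webservice to one
specific address might look as follows. The address and port are
placeholder values, not defaults:
<programlisting><![CDATA[
<listen host="192.0.2.10" port="9004"/>
]]></programlisting>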
86 If this item is given, Pazpar2 will forward all incoming HTTP
87 requests that do not contain the filename 'search.pz2' to the
88 host and port specified using the 'host' and 'port'
89 attributes. The 'myurl' attribute is required, and should provide
the base URL of the server; generally, this is the HTTP URL for the host
91 specified in the 'listen' parameter. This functionality is
92 crucial if you wish to use
93 Pazpar2 in conjunction with browser-based code (JS, Flash,
94 applets, etc.) which operates in a security sandbox. Such code
95 can only connect to the same server from which the enclosing
HTML page originated. Pazpar2's proxy functionality enables you
to host all of the main pages (plus images, CSS, etc.) of your
98 application on a conventional webserver, while efficiently
99 processing webservice requests for metasearch status, results,
106 <term>relevance / sort / mergekey</term>
Specifies character set normalization for relevancy, sorting,
and the mergekey for the server. These definitions serve as
defaults for services that do not provide their own. For the meaning
of these settings refer to the "relevance" element inside service.
118 <term>settings</term>
Specifies target settings for the server. These settings serve
as defaults for all services which do not provide their own.
The settings element requires one attribute, 'src', which specifies
a settings file or a directory. If a directory is given, all
files with suffix <filename>.xml</filename> are read from this
127 <xref linkend="target_settings"/> for more information.
136 This nested element controls the behavior of Pazpar2 with
137 respect to your data model. In Pazpar2, incoming records are
138 normalized, using XSLT, into an internal representation.
139 The 'service' section controls the further processing and
140 extraction of data from the internal representation, primarily
141 through the 'metadata' sub-element.
144 Pazpar2 version 1.2 and later allows multiple service elements.
When multiple services are defined, each must be given a unique ID
by specifying the attribute <literal>id</literal>.
147 A single service may be unnamed (service ID omitted). The
148 service ID is referred to in the
149 <link linkend="command-init"><literal>init</literal></link> webservice
150 command's <literal>service</literal> parameter.
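A sketch of two services distinguished by ID follows. The IDs
"books" and "articles" are hypothetical, and placing the service
elements inside a server element is an assumption based on the
structure of this section:
<programlisting><![CDATA[
<server>
  <listen port="9004"/>
  <service id="books">
    <metadata name="title" brief="yes" merge="longest" rank="6"/>
  </service>
  <service id="articles">
    <metadata name="title" brief="yes" merge="longest" rank="6"/>
  </service>
</server>
]]></programlisting>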
153 <variablelist> <!-- Level 2 -->
154 <varlistentry><term>metadata</term>
157 One of these elements is required for every data element in
the internal representation of the record (see
<xref linkend="data_model"/>). It governs
160 subsequent processing as pertains to sorting, relevance
161 ranking, merging, and display of data elements. It supports
162 the following attributes:
165 <variablelist> <!-- level 3 -->
166 <varlistentry><term>name</term>
169 This is the name of the data element. It is matched
170 against the 'type' attribute of the
172 in the normalized record. A warning is produced if
173 metadata elements with an unknown name are
175 normalized record. This name is also used to
177 data elements in the records returned by the
178 webservice API, and to name sort lists and browse
184 <varlistentry><term>type</term>
187 The type of data element. This value governs any
188 normalization or special processing that might take
place on an element. Possible values are 'generic'
(basic string) and 'year' (a range is computed if
multiple years are found in the record). Note: This
list is likely to grow in the future.
197 <varlistentry><term>brief</term>
200 If this is set to 'yes', then the data element is
included in brief records in the webservice API. Note
202 that this only makes sense for metadata elements that
203 are merged (see below). The default value is 'no'.
208 <varlistentry><term>sortkey</term>
211 Specifies that this data element is to be used for
212 sorting. The possible values are 'numeric' (numeric
213 value), 'skiparticle' (string; skip common, leading
214 articles), and 'no' (no sorting). The default value is
220 <varlistentry><term>rank</term>
223 Specifies that this element is to be used to
225 records against the user's query (when ranking is
226 requested). The value is an integer, used as a
227 multiplier against the basic TF*IDF score. A value of
228 1 is the base, higher values give additional
230 elements of this type. The default is '0', which
231 excludes this element from the rank calculation.
236 <varlistentry><term>termlist</term>
239 Specifies that this element is to be used as a
240 termlist, or browse facet. Values are tabulated from
241 incoming records, and a highscore of values (with
242 their associated frequency) is made available to the
243 client through the webservice API.
245 are 'yes' and 'no' (default).
250 <varlistentry><term>merge</term>
This governs whether, and how, elements are extracted
from individual records and merged into cluster
records. The possible values are: 'unique' (include
all unique elements), 'longest' (include only the
longest element (strlen)), 'range' (calculate a range
258 of values across all matching records), 'all' (include
259 all elements), or 'no' (don't merge; this is the
265 <varlistentry><term>mergekey</term>
268 If set to <literal>yes</literal>, the value of this
269 metadata element is appended to the resulting mergekey.
270 By default metadata is not part of a mergekey.
275 <varlistentry><term>setting</term>
278 This attribute allows you to make use of static database
279 settings in the processing of records. Three possible values
280 are allowed. 'no' is the default and doesn't do anything.
281 'postproc' copies the value of a setting with the same name
282 into the output of the normalization stylesheet(s). 'parameter'
283 makes the value of a setting with the same name available
284 as a parameter to the normalization stylesheet, so you
285 can further process the value inside of the stylesheet, or use
286 the value to decide how to deal with other data values.
The purpose of using settings in this way can either be to
control the behavior of the normalization stylesheet in a
database-dependent way, or to easily make database-dependent values
292 available to display-logic in your user interface, without having
293 to implement complicated interactions between the user interface
294 and your configuration system.
299 </variablelist> <!-- attributes to metadata -->
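<para>
As an illustration of how these attributes combine, the following
sketch declares three data elements. The element name "medium" is
hypothetical; the attribute names and values are taken from the
descriptions above:
<programlisting><![CDATA[
<!-- included in brief records, sortable, ranked, part of the mergekey -->
<metadata name="title" brief="yes" sortkey="skiparticle"
          merge="longest" rank="6" mergekey="yes"/>
<!-- years are merged into a range and offered as a facet -->
<metadata name="date" type="year" sortkey="numeric"
          merge="range" termlist="yes"/>
<!-- value copied from a static database setting of the same name -->
<metadata name="medium" setting="postproc" brief="yes" merge="longest"/>
]]></programlisting>
</para>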
305 <term>relevance</term>
308 Specifies ICU tokenization and transformation rules
309 for tokens that are used in Pazpar2's relevance ranking.
310 The 'id' attribute is currently not used, and the 'locale'
311 attribute must be set to one of the locale strings
312 defined in ICU. The child elements listed below can be
313 in any order, except the 'index' element which logically
314 belongs to the end of the list. The stated tokenization,
315 transformation and charmapping instructions are performed
316 in order from top to bottom.
318 <variablelist> <!-- Level 2 -->
319 <varlistentry><term>casemap</term>
The attribute 'rule' defines the direction of the
per-character casemapping; allowed values are "l"
(lower), "u" (upper), and "t" (title).
328 <varlistentry><term>transform</term>
331 Normalization and transformation of tokens follows
332 the rules defined in the 'rule' attribute. For
333 possible values we refer to the extensive ICU
334 documentation found at the
335 <ulink url="&url.icu.transform;">ICU
336 transformation</ulink> home page. Set filtering
337 principles are explained at the
338 <ulink url="&url.icu.unicode.set;">ICU set and
339 filtering</ulink> page.
343 <varlistentry><term>tokenize</term>
346 Tokenization is the only rule in the ICU chain
347 which splits one token into multiple tokens. The
'rule' attribute may have the following values:
"s" (sentence), "l" (line-break), "w" (word), and
"c" (character), the latter probably not being
351 very useful in a pruning Pazpar2 installation.
357 From Pazpar2 version 1.1 the ICU wrapper from YAZ is used.
358 Refer to the <ulink url="&url.yaz.yaz-icu;">yaz-icu</ulink>
359 utility for more information.
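The following sketch shows one possible chain, using the icu_chain
wrapper that appears in the EXAMPLE section. The locale "en" and the
choice and order of rules are assumptions made for illustration:
<programlisting><![CDATA[
<icu_chain id="relevance" locale="en">
  <!-- strip control characters before tokenizing -->
  <transform rule="[:Control:] Any-Remove"/>
  <!-- split the input into word tokens -->
  <tokenize rule="w"/>
  <!-- drop whitespace and punctuation from each token -->
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <!-- lowercase the remaining characters -->
  <casemap rule="l"/>
  <!-- index is placed last in the chain -->
  <index/>
</icu_chain>
]]></programlisting>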
368 Specifies ICU tokenization and transformation rules
for tokens that are used in Pazpar2's sorting. The content
is similar to that of <literal>relevance</literal>.
376 <term>mergekey</term>
379 Specifies ICU tokenization and transformation rules
for tokens that are used in Pazpar2's mergekey. The content
is similar to that of <literal>relevance</literal>.
387 <term>settings</term>
390 Specifies target settings for this service. Refer to
391 <xref linkend="target_settings"/>.
Specifies timeout parameters for this service.
The <literal>timeout</literal>
element supports the following attributes:
<literal>session</literal>, <literal>z3950_operation</literal>, and
<literal>z3950_session</literal>, which specify the
session timeout, the Z39.50 operation timeout, and the
Z39.50 session timeout respectively. The Z39.50 operation
timeout is the time Pazpar2 will wait for an active Z39.50/SRU
operation before it gives up (times out). The Z39.50 session
timeout is the time Pazpar2 will keep an idle session
(no operation) alive.
The following is recommended but not required:
z3950_operation (30) &lt; session (60) &lt; z3950_session (180).
The default values are given in parentheses.
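Written out explicitly, the default values correspond to the
following element (a sketch; the values are assumed to be in seconds):
<programlisting><![CDATA[
<timeout session="60" z3950_operation="30" z3950_session="180"/>
]]></programlisting>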
420 </variablelist> <!-- Data elements in service directive -->
424 </variablelist> <!-- Data elements in server directive -->
429 <refsect1><title>EXAMPLE</title>
430 <para>Below is a working example configuration:
432 <?xml version="1.0" encoding="UTF-8"?>
433 <pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
436 <listen port="9004"/>
438 <metadata name="title" brief="yes" sortkey="skiparticle"
439 merge="longest" rank="6"/>
440 <metadata name="isbn" merge="unique"/>
441 <metadata name="date" brief="yes" sortkey="numeric"
442 type="year" merge="range" termlist="yes"/>
443 <metadata name="author" brief="yes" termlist="yes"
444 merge="longest" rank="2"/>
445 <metadata name="subject" merge="unique" termlist="yes" rank="3"/>
446 <metadata name="url" merge="unique"/>
448 <icu_chain id="relevance" locale="el">
449 <transform rule="[:Control:] Any-Remove"/>
451 <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
455 <settings src="mysettings"/>
456 <timeout session="60"/>
464 <refsect1 id="config-include"><title>INCLUDE FACILITY</title>
The XML configuration may be partitioned into multiple files by using
the <literal>include</literal> element which takes a single attribute,
<literal>src</literal>. The value of the <literal>src</literal> attribute is
a regular shell-like glob pattern. For example,
471 <include src="/etc/pazpar2/conf.d/*.xml"/>
475 The include facility requires Pazpar2 version 1.2.
479 <refsect1 id="target_settings"><title>TARGET SETTINGS</title>
Pazpar2 features a cunning scheme by which you can associate various
kinds of attributes, or settings, with search targets. This can be done
through XML files which are read at startup; each file can associate
one or more settings with one or more targets. The file format is generic
in nature, designed to support a wide range of application requirements. The
settings can be purely technical things, such as how to perform a title
search against a given target, or they can associate arbitrary name=value
pairs with groups of targets -- for instance, if you would like to
489 place all commercial full-text bases in one group for selection
490 purposes, or you would like to control what targets are accessible
491 to users by default. Per-database settings values can even be used
492 to drive sorting, facet/termlist generation, or end-user interface display
497 During startup, Pazpar2 will recursively read a specified directory
498 (can be identified in the pazpar2.cfg file or on the command line), and
499 process any settings files found therein.
503 Clients of the Pazpar2 webservice interface can selectively override
504 settings for individual targets within the scope of one session. This
505 can be used in conjunction with an external authentication system to
506 determine which resources are to be accessible to which users. Pazpar2
507 itself has no notion of end-users, and so can be used in conjunction
with any type of authentication system. Similarly, the authentication
tokens submitted to access-controlled search targets can be
overridden, to allow use of Pazpar2 in a consortial or multi-library
511 environment, where different end-users may need to be represented to
512 some search targets in different ways. This, again, can be managed
513 using an external database or other lookup mechanism. Setting overrides
514 can be performed either using the
515 <link linkend="command-init">init</link> or the
516 <link linkend="command-settings">settings</link> webservice
521 In fact, every setting that applies to a database (except pz:id, which
522 can only be used for filtering targets to use for a search) can be overridden
523 on a per-session basis. This allows the client to override specific CCL fields
524 for searching, etc., to meet the needs of a session or user.
528 Finally, as an extreme case of this, the webservice client can
529 introduce entirely new targets, on the fly, as part of the
530 <link linkend="command-init">init</link> or
531 <link linkend="command-settings">settings</link> command.
532 This is useful if you desire to manage information
533 about your search targets in a separate application such as a database.
534 You do not need any static settings file whatsoever to run Pazpar2 -- as
535 long as the webservice client is prepared to supply the necessary
536 information at the beginning of every session.
The following discussion of practical issues related to session and settings
management is cast in terms of a user interface based on Ajax/Javascript
technology. It would apply equally well to many other kinds of browser-based logic.
548 Typically, a Javascript client is not allowed to directly alter the parameters
549 of a session. There are two reasons for this. One has to do with access
550 to information; typically, information about a user will be stored in a
551 system on the server side, or it will be accessible in some way from the server.
The other has to do with trust: since the Javascript client cannot be entirely trusted (some hostile
553 agent might in fact 'pretend' to be a regular ws client), it is more robust
554 to control session settings from scripting that you run as part of your
555 webserver. Typically, this can be handled during the session initialization,
560 Step 1: The Javascript client loads, and asks the webserver for a new Pazpar2
561 session ID. This can be done using a Javascript call, for instance. Note that
it is possible to submit Ajax XMLHttpRequest calls either to Pazpar2 or to the
563 webserver that Pazpar2 is proxying for. See (XXX Insert link to Pazpar2 protocol).
567 Step 2: Code on the webserver authenticates the user, by database lookup,
LDAP access, NCIP, etc. It determines which resources the user has access to,
569 and any user-specific parameters that are to be applied during this session.
Step 3: The webserver initializes a new Pazpar2 session, and sets user-specific
574 parameters as necessary, using the init webservice command. A new session ID is
579 Step 4: The webserver returns this session ID to the Javascript client, which then
580 uses the session ID to submit searches, show results, etc.
584 Step 5: When the Javascript client ceases to use the session, Pazpar2 destroys
585 any session-specific information.
588 <refsect2><title>SETTINGS FILE FORMAT</title>
Each file contains a root element named &lt;settings&gt;. It may
contain one or more &lt;set&gt; elements. The settings and set
elements may contain the following attributes. Attributes in the set node
override those in the settings root element. Each set node must
594 specify (directly, or inherited from the parent node) at least a
595 target, name, and value.
603 This specifies the search target to which this setting should be
applied. Targets are identified by their Z39.50 URL, generally
including the host, port, and database name (e.g.
606 <literal>bagel.indexdata.com:210/marc</literal>).
607 Two wildcard forms are accepted:
608 * (asterisk) matches all known targets;
609 <literal>bagel.indexdata.com:210/*</literal> matches all
610 known databases on the given host.
613 A precedence system determines what happens if there are
614 overlapping values for the same setting name for the same
615 target. A setting for a specific target name overrides a
setting which specifies the target using a wildcard. This makes it
617 easy to set defaults for all targets, and then override them
618 for specific targets or hosts. If there are
619 multiple overlapping settings with the same name and target
620 value, the 'precedence' attribute determines what happens.
628 The name of the setting. This can be anything you like.
629 However, Pazpar2 reserves a number of setting names for
630 specific purposes, all starting with 'pz:', and it is a good
631 idea to avoid that prefix if you make up your own setting
632 names. See below for a list of reserved variables.
640 The value of the setting. Generally, this can be anything you
641 want -- however, some of the reserved settings may expect
642 specific kinds of values.
647 <term>precedence</term>
650 This should be an integer. If not provided, the default value
651 is 0. If two (or more) settings have the same content for
652 target and name, the precedence value determines the outcome.
653 If both settings have the same precedence value, they are both
654 applied to the target(s). If one has a higher value, then the
655 value of that setting is applied, and the other one is ignored.
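As a sketch of the precedence rule, the two set elements below name
the same target and setting. The target is hypothetical and the
value 50 is chosen arbitrarily; by the rule above, the second element
wins because its precedence is higher:
<programlisting><![CDATA[
<settings target="funkytarget.com:210/db1">
  <!-- precedence defaults to 0 -->
  <set name="pz:maxrecs" value="100"/>
  <!-- higher precedence: this value is applied, the other is ignored -->
  <set name="pz:maxrecs" value="50" precedence="1"/>
</settings>
]]></programlisting>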
662 By setting defaults for target, name, or value in the root
663 settings node, you can use the settings files in many different
664 ways. For instance, you can use a single file to set defaults for
665 many different settings, like search fields, retrieval syntaxes,
666 etc. You can have one file per server, which groups settings for
667 that server or target. You could also have one file which associates
668 a number of targets with a given setting, for instance, to associate
669 many databases with a given category or class that makes sense
670 within your application.
674 The following examples illustrate uses of the settings system to
675 associate settings with targets to meet different requirements.
679 The example below associates a set of default values that can be
680 used across many targets. Note the wildcard for targets.
681 This associates the given settings with all targets for which no
682 other information is provided.
684 <settings target="*">
686 <!-- This file introduces default settings for pazpar2 -->
688 <!-- mapping for unqualified search -->
689 <set name="pz:cclmap:term" value="u=1016 t=l,r s=al"/>
691 <!-- field-specific mappings -->
692 <set name="pz:cclmap:ti" value="u=4 s=al"/>
693 <set name="pz:cclmap:su" value="u=21 s=al"/>
694 <set name="pz:cclmap:isbn" value="u=7"/>
695 <set name="pz:cclmap:issn" value="u=8"/>
696 <set name="pz:cclmap:date" value="u=30 r=r"/>
698 <!-- Retrieval settings -->
700 <set name="pz:requestsyntax" value="marc21"/>
701 <set name="pz:elements" value="F"/>
703 <!-- Query encoding -->
704 <set name="pz:queryencoding" value="iso-8859-1"/>
706 <!-- Result normalization settings -->
708 <set name="pz:nativesyntax" value="iso2709"/>
709 <set name="pz:xslt" value="../etc/marc21.xsl"/>
717 The next example shows certain settings overridden for one target,
718 one which returns XML records containing DublinCore elements, and
719 which furthermore requires a username/password.
721 <settings target="funkytarget.com:210/db1">
722 <set name="pz:requestsyntax" value="xml"/>
723 <set name="pz:nativesyntax" value="xml"/>
724 <set name="pz:xslt" value="../etc/dublincore.xsl"/>
726 <set name="pz:authentication" value="myuser/password"/>
732 The following example associates a specific name/value combination
733 with a number of targets. The targets below are access-restricted,
734 and can only be used by users with special credentials.
736 <settings name="pz:allow" value="0">
737 <set target="funkytarget.com:210/*"/>
738 <set target="commercial.com:2100/expensiveDb"/>
745 <refsect2><title>RESERVED SETTING NAMES</title>
747 The following setting names are reserved by Pazpar2 to control the
748 behavior of the client function.
753 <term>pz:cclmap:xxx</term>
756 This establishes a CCL field definition or other setting, for
757 the purpose of mapping end-user queries. XXX is the field or
758 setting name, and the value of the setting provides parameters
759 (e.g. parameters to send to the server, etc.). Please consult
760 the YAZ manual for a full overview of the many capabilities of
761 the powerful and flexible CCL parser.
764 Note that it is easy to establish a set of default parameters,
765 and then override them individually for a given target.
770 <term>pz:requestsyntax</term>
773 This specifies the record syntax to use when requesting
774 records from a given server. The value can be a symbolic name like
775 marc21 or xml, or it can be a Z39.50-style dot-separated OID.
780 <term>pz:elements</term>
783 The element set name to be used when retrieving records from a
789 <term>pz:piggyback</term>
Piggybacking enables Pazpar2 to retrieve records from the
793 server as part of the search response in Z39.50. Almost all
794 servers support this (or fail it gracefully), but a few
795 servers will produce undesirable results.
796 Set to '1' to enable piggybacking, '0' to disable it. Default
797 is 1 (piggybacking enabled).
802 <term>pz:nativesyntax</term>
805 The representation (syntax) of the retrieval records. Currently
806 recognized values are iso2709 and xml.
For iso2709, you can also specify a native character set, e.g. "iso2709;latin-1".
810 If no character set is provided, MARC-8 is assumed.
813 If pz:nativesyntax is not specified, pazpar2 will attempt to determine
814 the value based on the response from the server.
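For example, a target known to return MARC records in Latin-1 could
be described as follows (the target name is hypothetical):
<programlisting><![CDATA[
<settings target="funkytarget.com:210/db1">
  <set name="pz:nativesyntax" value="iso2709;latin-1"/>
</settings>
]]></programlisting>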
820 <term>pz:queryencoding</term>
823 The encoding of the search terms that a target accepts. Most
targets do not honor UTF-8, in which case this needs to be specified.
825 Each term in a query will be converted if this setting is given.
834 Provides the path of an XSLT stylesheet which will be used to
835 map incoming records to the internal representation.
838 When mapping MARC XML records, XSLT can be bypassed for increased
839 performance with the alternate "MARC map" format. Provide the
path of a file with extension ".mmap" containing on each line:
&lt;field&gt; &lt;subfield&gt; &lt;metadata element&gt;</programlisting>
847 773 * citation</programlisting>
848 To map the field value specify a subfield of '$'. To store a
849 concatenation of all subfields, specify a subfield of '*'.
854 <term>pz:authentication</term>
857 Sets an authentication string for a given server. See the section on
858 authorization and authentication for discussion.
863 <term>pz:allow</term>
866 Allows or denies access to the resources it is applied to. Possible
867 values are '0' and '1'. The default is '1' (allow access to this resource).
868 See the manual section on authorization and authentication for discussion
869 about how to use this setting.
874 <term>pz:maxrecs</term>
877 Controls the maximum number of records to be retrieved from a
878 server. The default is 100.
886 This setting can't be 'set' -- it contains the ID (normally
887 ZURL) for a given target, and is useful for filtering --
888 specifically when you want to select one or more specific
889 targets in the search command.
894 <term>pz:zproxy</term>
897 The 'pz:zproxy' setting has the value syntax
'host.internet.address:port'; it is used to tunnel Z39.50
899 requests through the named Z39.50 proxy.
905 <term>pz:apdulog</term>
If the 'pz:apdulog' setting is defined and has a value other than 0,
909 then Z39.50 APDUs are written to the log.
This setting enables SRU/SRW support. It has three possible values:
'get' enables SRU access through GET requests; 'post' enables SRU/POST
support, which is less commonly supported but useful if very large requests
are to be submitted; 'srw' enables the SRW variation of the protocol.
927 <term>pz:sru_version</term>
This allows the SRU version to be specified. If unset, Pazpar2
will use the default of YAZ (currently 1.2). Should be set
938 <term>pz:pqf_prefix</term>
Allows you to specify an arbitrary PQF query language substring. The provided
string is prefixed to the user's query after it has been normalized to PQF
internally in pazpar2. This allows you to attach complex 'filters' to
queries for a given target, sometimes necessary to select sub-catalogs
945 in union catalog systems, etc.
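A sketch of such a filter follows. The target, the attribute 1=1011
and the term "subcat" are hypothetical; the point is only that the
prefix ends where the user's query is expected to continue:
<programlisting><![CDATA[
<settings target="funkytarget.com:210/db1">
  <!-- the normalized user query follows as the second @and operand -->
  <set name="pz:pqf_prefix" value="@and @attr 1=1011 subcat"/>
</settings>
]]></programlisting>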
954 Specifies sort criteria to be applied to the result set. Only works for targets
955 which support the sort service.
963 <refsect1><title>SEE ALSO</title>
966 <refentrytitle>pazpar2</refentrytitle>
967 <manvolnum>8</manvolnum>
970 <refentrytitle>yaz-icu</refentrytitle>
971 <manvolnum>1</manvolnum>
974 <refentrytitle>pazpar2_protocol</refentrytitle>
975 <manvolnum>7</manvolnum>
980 <!-- Keep this comment at the end of the file
985 sgml-minimize-attributes:nil
986 sgml-always-quote-attributes:t
989 sgml-parent-document:nil
990 sgml-local-catalogs: nil
991 sgml-namecase-general:t