1 <?xml version="1.0" standalone="no"?>
2 <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.4//EN"
3 "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"
5 <!ENTITY % local SYSTEM "local.ent">
7 <!ENTITY % entities SYSTEM "entities.ent">
9 <!ENTITY % idcommon SYSTEM "common/common.ent">
12 <refentry id="pazpar2_conf">
14 <productname>Pazpar2</productname>
15 <productnumber>&version;</productnumber>
16 <info><orgname>Index Data</orgname></info>
20 <refentrytitle>Pazpar2 conf</refentrytitle>
21 <manvolnum>5</manvolnum>
22 <refmiscinfo class="manual">File formats and conventions</refmiscinfo>
26 <refname>pazpar2_conf</refname>
27 <refpurpose>Pazpar2 Configuration</refpurpose>
32 <command>pazpar2.conf</command>
37 <title>DESCRIPTION</title>
The Pazpar2 configuration file, together with any referenced XSLT files,
governs Pazpar2's behavior as a client, and controls the normalization and
extraction of data elements from incoming result records, for the
purposes of merging, sorting, facet analysis, and display.
46 The file is specified using the option -f on the Pazpar2 command line.
47 There is not presently a way to reload the configuration file without
48 restarting Pazpar2, although this will most likely be added some time
56 The configuration file is XML-structured. It must be well-formed XML. All
57 elements specific to Pazpar2 should belong to the namespace
58 <literal>http://www.indexdata.com/pazpar2/1.0</literal>
59 (this is assumed in the
60 following examples). The root element is named "<literal>pazpar2</literal>".
61 Under the root element are a number of elements which group categories of
62 information. The categories are described below.
65 <refsect2 id="config-threads">
66 <title>threads</title>
This section is optional and is supported in Pazpar2 version 1.3.1 and
later. It is identified by the element "<literal>threads</literal>", which
may include one attribute, "<literal>number</literal>", which specifies
the number of worker threads that the Pazpar2 instance is to use.
A value of 0 (zero) disables worker threads (all work is carried out
76 <refsect2 id="config-file">
This element takes one attribute, <literal>path</literal>, which
specifies a path to search for local files, such as XSLTs and settings.
The path is a colon-separated list of directories. Its default value
is "<literal>.</literal>", which is equivalent to the location of the
main configuration file (where the file element itself is given).
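<para>
As a hedged illustration, the element below searches the directory of
the main configuration file first and then a second directory. The path
<filename>/usr/local/share/pazpar2/xsl</filename> is an assumption made
for this sketch, not a location that Pazpar2 defines.
</para>
<screen><![CDATA[
<!-- Search "." first, then a hypothetical shared stylesheet directory -->
<file path=".:/usr/local/share/pazpar2/xsl"/>
]]></screen>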
86 <refsect2 id="config-server">
89 This section governs overall behavior of a server endpoint. It is identified
90 by the element "server" which takes an optional attribute, "id", which
91 identifies this particular Pazpar2 server. Any string value for "id"
96 elements are described below. From Pazpar2 version 1.2 this is
99 <variablelist> <!-- level 1 -->
104 Configures the webservice -- this controls how you can connect
105 to Pazpar2 from your browser or server-side code. The
106 attributes 'host' and 'port' control the binding of the
107 server. The 'host' attribute can be used to bind the server to
108 a secondary IP address of your system, enabling you to run
109 Pazpar2 on port 80 alongside a conventional web server. You
110 can override this setting on the command line using the option -h.
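<para>
A minimal sketch of binding the webservice to one address only. The
address 192.0.2.10 is a documentation placeholder, not a recommended
value.
</para>
<screen><![CDATA[
<!-- Bind to a secondary address (placeholder) on port 80 -->
<listen host="192.0.2.10" port="80"/>
]]></screen>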
If this item is given, Pazpar2 will forward all incoming HTTP
requests that do not contain the filename 'search.pz2' to the
host and port specified using the 'host' and 'port'
attributes. The 'myurl' attribute is required, and should provide
the base URL of the server. Generally, this is the HTTP URL for the
host specified in the 'listen' parameter. This functionality is
crucial if you wish to use
Pazpar2 in conjunction with browser-based code (JS, Flash,
applets, etc.) which operates in a security sandbox. Such code
can only connect to the same server from which the enclosing
HTML page originated. Pazpar2's proxy functionality enables you
to host all of the main pages (plus images, CSS, etc.) of your
application on a conventional webserver, while efficiently
processing webservice requests for metasearch status, results,
139 <term>icu_chain</term>
Specifies character set normalization for relevance ranking, sorting,
mergekeys and facets for the server. These definitions serve as
defaults for services that do not specify their own. For the meaning
of these settings, refer to the
<xref linkend="icuchain"/> element inside service.
152 <term>relevance / sort / mergekey / facet</term>
155 Obsolete. Use element icu_chain instead.
161 <term>settings</term>
Specifies target settings for the server. These settings serve
as defaults for all services that do not specify their own.
The settings element requires one attribute, 'src', which specifies
a settings file or a directory. If a directory is given, all
files with suffix <filename>.xml</filename> are read from this
170 <xref linkend="target_settings"/> for more information.
176 <term id="service_conf">service</term>
179 This nested element controls the behavior of Pazpar2 with
180 respect to your data model. In Pazpar2, incoming records are
181 normalized, using XSLT, into an internal representation.
182 The 'service' section controls the further processing and
183 extraction of data from the internal representation, primarily
184 through the 'metadata' sub-element.
Pazpar2 version 1.2 and later allows multiple service elements.
When multiple services are defined, each must be given a unique ID
by specifying the attribute <literal>id</literal>.
A single service may be unnamed (service ID omitted). The
service ID is referred to in the
<link linkend="command-init"><literal>init</literal></link> webservice
command's <literal>service</literal> parameter.
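<para>
The fragment below sketches a server with one unnamed service and one
named service. The ID <literal>books</literal> and the metadata
definitions are illustrative assumptions. A client would select the
named service by passing <literal>service=books</literal> to the
<literal>init</literal> command.
</para>
<screen><![CDATA[
<server>
  <listen port="9004"/>
  <!-- Unnamed service (service ID omitted) -->
  <service>
    <metadata name="title" brief="yes" merge="longest" rank="6"/>
  </service>
  <!-- Named service; "books" is a hypothetical ID -->
  <service id="books">
    <metadata name="title" brief="yes" merge="longest" rank="6"/>
    <metadata name="isbn" merge="unique"/>
  </service>
</server>
]]></screen>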
196 <variablelist> <!-- Level 2 -->
198 <term>metadata</term>
One of these elements is required for every data element in
the internal representation of the record (see
<xref linkend="data_model"/>). It governs
subsequent processing as it pertains to sorting, relevance
ranking, merging, and display of data elements. It supports
the following attributes:
209 <variablelist> <!-- level 3 -->
214 This is the name of the data element. It is matched
215 against the 'type' attribute of the
217 in the normalized record. A warning is produced if
218 metadata elements with an unknown name are
220 normalized record. This name is also used to
222 data elements in the records returned by the
223 webservice API, and to name sort lists and browse
The type of data element. This value governs any
normalization or special processing that might take
place on an element. Possible values are 'generic'
(basic string) and 'year' (a range is computed if
multiple years are found in the record). Note: This
list is likely to grow in the future.
If this is set to 'yes', then the data element is
included in brief records in the webservice API. Note
that this only makes sense for metadata elements that
are merged (see below). The default value is 'no'.
259 Specifies that this data element is to be used for
260 sorting. The possible values are 'numeric' (numeric
261 value), 'skiparticle' (string; skip common, leading
262 articles), and 'no' (no sorting). The default value is
266 When 'skiparticle' is used, some common articles from the
267 English and German languages are ignored. At present the
268 list is: 'the', 'den', 'der', 'die', 'des', 'an', 'a'.
274 <term id="metadata-rank">rank</term>
277 Specifies that this element is to be used to
279 records against the user's query (when ranking is
The value is of the form
285 where M is an integer, used as a
286 weight against the basic TF*IDF score. A value of
287 1 is the base, higher values give additional weight to
288 elements of this type. The default is '0', which
289 excludes this element from the rank calculation.
F is a CCL field and N is the multiplier for terms
that match that CCL field in the search.
The F+N combination allows the system to use a different
multiplier for a certain field. For example, a rank value of
"<literal>1 au 3</literal>" gives a multiplier of 3 for
all terms that are part of the au(thor) field and 1 for everything else.
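<para>
As a sketch of the per-field form, the definition below uses a base
weight of 1 and a multiplier of 3 for terms that belong to the CCL
field <literal>au</literal>. The metadata name and the other attributes
are assumptions chosen for illustration.
</para>
<screen><![CDATA[
<!-- M=1 (base weight), F=au (CCL field), N=3 (multiplier for au terms) -->
<metadata name="author" brief="yes" termlist="yes"
          merge="longest" rank="1 au 3"/>
]]></screen>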
For Pazpar2 1.6.13 and later, the rank may also be defined
"per-document", by the normalization stylesheet.
The per-field rank was introduced in Pazpar2 1.6.15. Earlier
releases only allowed a rank value M (simple integer).
307 See <xref linkend="relevance_ranking"/> for more
313 <term>termlist</term>
Specifies that this element is to be used as a
termlist, or browse facet. Values are tabulated from
incoming records, and a ranked list of the most frequent values (with
their associated frequencies) is made available to the
client through the webservice API.
322 are 'yes' and 'no' (default).
This governs whether, and how, elements are extracted
from individual records and merged into cluster
records. The possible values are: 'unique' (include
all unique elements), 'longest' (include only the
longest element (strlen)), 'range' (calculate a range
of values across all matching records), 'all' (include
all elements), or 'no' (don't merge; this is the
Pazpar2 1.6.24 and later also offers the value 'first' for merge, which
is like 'all' but takes values only from the first database that returns
the particular metadata field.
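<para>
The merge values can be compared side by side in a sketch such as the
following. The metadata names are illustrative, and
<literal>first</literal> requires Pazpar2 1.6.24 or later.
</para>
<screen><![CDATA[
<!-- keep every distinct value -->
<metadata name="isbn" merge="unique"/>
<!-- keep only the longest value -->
<metadata name="title" merge="longest"/>
<!-- compute a range across all matching records -->
<metadata name="date" type="year" merge="range"/>
<!-- keep all values -->
<metadata name="subject" merge="all"/>
<!-- keep the values from the first database that returns the field -->
<metadata name="description" merge="first"/>
]]></screen>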
349 <term>mergekey</term>
If set to '<literal>required</literal>', the value of this
metadata element is appended to the resulting mergekey if
the metadata is present in a record instance.
If the metadata element is not present, a unique mergekey
will be generated instead.
If set to '<literal>optional</literal>', the value of this
metadata element is appended to the resulting mergekey if
the metadata is present in a record instance. If the metadata
is not present, it will be empty.
365 If set to '<literal>no</literal>' or the mergekey attribute is
366 omitted, the metadata will not be used in the creation of a
373 <term id="facetrule">facetrule</term>
376 Specifies the ICU rule set to be used for normalizing
377 facets. If facetrule is omitted from metadata, the
378 rule set 'facet' is used.
384 <term id="limitcluster">limitcluster</term>
Allows a limit on merged metadata. The value of this attribute
is the name of the actual metadata content to be used for matching
(most often the same as the metadata name).
393 Requires Pazpar2 1.6.23 or later.
400 <term id="metadata_limitmap">limitmap</term>
Specifies a default limitmap for this field. This avoids having to
configure every target individually. However, it is important to review
the limitmap on a per-target basis, since it is usually target-specific.
See limitmap for the format.
412 <term id="metadata_facetmap">facetmap</term>
Specifies a default facetmap for this field. This avoids having to
configure every target individually. However, it is important to review
the facetmap on a per-target basis, since it is usually target-specific.
See facetmap for the format.
427 This attribute allows you to make use of static database
428 settings in the processing of records. Three possible values
429 are allowed. 'no' is the default and doesn't do anything.
430 'postproc' copies the value of a setting with the same name
431 into the output of the normalization stylesheet(s). 'parameter'
432 makes the value of a setting with the same name available
433 as a parameter to the normalization stylesheet, so you
434 can further process the value inside of the stylesheet, or use
435 the value to decide how to deal with other data values.
The purpose of using settings in this way can either be to
control the behavior of the normalization stylesheet in a
database-dependent way, or to easily make database-dependent values
available to display logic in your user interface, without having
to implement complicated interactions between the user interface
and your configuration system.
448 </variablelist> <!-- attributes to metadata -->
454 <term id="servicexslt" xreflabel="xslt">xslt</term>
Defines an XSLT stylesheet. The <literal>xslt</literal>
element takes exactly one attribute, <literal>id</literal>,
which names the stylesheet. This name can be referred to in the target
setting <xref linkend="pzxslt"/>.
463 The content of the xslt element is the embedded stylesheet XML
468 <term id="icuchain" xreflabel="icu_chain">icu_chain</term>
471 Specifies a named ICU rule set. The icu_chain element must include
472 attribute 'id' which specifies the identifier (name) for the ICU
Pazpar2 uses particular rule sets for particular purposes.
Rule set 'relevance' is used to normalize
terms for relevance ranking. Rule set 'sort' is used to
normalize terms for sorting. Rule set 'mergekey' is used to
normalize terms for making a mergekey. Finally, rule set 'facet'
is normally used to normalize facet terms, unless
<xref linkend="facetrule">facetrule</xref> is given for a
484 The icu_chain element must also include a 'locale'
485 attribute which must be set to one of the locale strings
486 defined in ICU. The child elements listed below can be
487 in any order, except the 'index' element which logically
488 belongs to the end of the list. The stated tokenization,
489 transformation and charmapping instructions are performed
490 in order from top to bottom.
492 <variablelist> <!-- Level 2 -->
The attribute 'rule' defines the direction of the
per-character casemapping. Allowed values are "l"
(lower), "u" (upper), and "t" (title).
504 <term>transform</term>
507 Normalization and transformation of tokens follows
508 the rules defined in the 'rule' attribute. For
509 possible values we refer to the extensive ICU
510 documentation found at the
511 <ulink url="&url.icu.transform;">ICU
512 transformation</ulink> home page. Set filtering
513 principles are explained at the
514 <ulink url="&url.icu.unicode.set;">ICU set and
515 filtering</ulink> page.
520 <term>tokenize</term>
Tokenization is the only rule in the ICU chain
which splits one token into multiple tokens. The
'rule' attribute may have the following values:
"s" (sentence), "l" (line-break), "w" (word), and
"c" (character), the latter probably not being
very useful in a pruning Pazpar2 installation.
534 From Pazpar2 version 1.1 the ICU wrapper from YAZ is used.
535 Refer to the <ulink url="&url.yaz.yaz-icu;">yaz-icu</ulink>
536 utility for more information.
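<para>
Putting the child elements together, a rule set might look like the
sketch below. It assumes the casemapping element is named
<literal>casemap</literal>, and the locale and rules are examples only.
The instructions run from top to bottom: control characters are
removed, the input is split at line-break boundaries, whitespace and
punctuation are removed, and the result is lower-cased.
</para>
<screen><![CDATA[
<icu_chain id="relevance" locale="en">
  <!-- 1. strip control characters -->
  <transform rule="[:Control:] Any-Remove"/>
  <!-- 2. split into tokens at line-break boundaries -->
  <tokenize rule="l"/>
  <!-- 3. strip whitespace and punctuation -->
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <!-- 4. fold to lower case -->
  <casemap rule="l"/>
</icu_chain>
]]></screen>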
542 <term>relevance</term>
545 Specifies the ICU rule set used for relevance ranking.
546 The child element of 'relevance' must be 'icu_chain' and the
547 'id' attribute of the icu_chain is ignored. This
548 definition is obsolete and should be replaced by the equivalent
<icu_chain id="relevance" locale="en">..</icu_chain>
561 Specifies the ICU rule set used for sorting.
562 The child element of 'sort' must be 'icu_chain' and the
563 'id' attribute of the icu_chain is ignored. This
564 definition is obsolete and should be replaced by the equivalent
<icu_chain id="sort" locale="en">..</icu_chain>
574 <term>mergekey</term>
577 Specifies ICU tokenization and transformation rules
578 for tokens that are used in Pazpar2's mergekey.
579 The child element of 'mergekey' must be 'icu_chain' and the
580 'id' attribute of the icu_chain is ignored. This
581 definition is obsolete and should be replaced by the equivalent
<icu_chain id="mergekey" locale="en">..</icu_chain>
594 Specifies ICU tokenization and transformation rules
595 for tokens that are used in Pazpar2's facets.
596 The child element of 'facet' must be 'icu_chain' and the
597 'id' attribute of the icu_chain is ignored. This
598 definition is obsolete and should be replaced by the equivalent
<icu_chain id="facet" locale="en">..</icu_chain>
608 <term>ccldirective</term>
611 Customizes the CCL parsing (interpretation of query parameter
The name and value of the CCL directive are given by the attributes
'name' and 'value' respectively. Refer to the list of possible names
617 url="http://www.indexdata.com/yaz/doc/tools.html#ccl.directives.table">
624 <varlistentry id="service-rank">
628 Customizes the ranking (relevance) algorithm. Also known as
629 rank tweaks. The rank element
630 accepts the following attributes - all being optional:
Attribute 'cluster' is a boolean
that controls whether Pazpar2 should boost ranking for merged
records. The default is 'yes'. A value of 'no' will make
Pazpar2 average the ranking of the records in a cluster.
648 Attribute 'debug' is a boolean
649 that controls whether Pazpar2 should include details
650 about ranking for each document in the show command's
651 response. Enable by using value "yes", disable by using
652 value "no" (default).
Attribute 'follow' is a floating-point number greater than
or equal to 0. A positive number will boost the weight of terms
that occur close to each other (proximity, distance).
A value of 1 will double the weight if two terms are within
a proximity distance of 1 (next to each other). The default
value of 'follow' is 0 (order will not affect weight).
Attribute 'lead' is a floating-point number.
It controls whether term weight should be reduced by position
from the start of a metadata field. A positive value of 'lead'
will reduce the weight of a term as it appears further away from the lead
of the field. The default value is 0 (no reduction of weight by
Attribute 'length' determines whether, and how, term weight should be
divided by the length of the metadata field. A value of "linear" will
divide by length. A value of "log" will divide by log2(length).
A value of "none" will leave term weight as is (no division).
The default value is "linear".
Refer to <xref linkend="relevance_ranking"/> to see how
these tweaks are used in the computation of the score.
Customization of the ranking algorithm was introduced with
Pazpar2 1.6.18. The semantics of some of the fields changed
in versions up to 1.6.22.
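<para>
A hedged example of the rank element with every attribute spelled out.
The values chosen for <literal>follow</literal>,
<literal>lead</literal> and <literal>length</literal> are arbitrary
choices for illustration, not recommendations.
</para>
<screen><![CDATA[
<!-- cluster: boost merged records; debug: no ranking details in show;
     follow: reward adjacent terms; lead: reduce weight further into a field;
     length: divide term weight by log2(length) -->
<rank cluster="yes" debug="no" follow="1" lead="0.5" length="log"/>
]]></screen>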
707 <varlistentry id="sort-default">
708 <term>sort-default</term>
Specifies the default sort criteria (default 'relevance'),
which was previously hard-coded as the default criteria in search.
This is a workaround to avoid re-searching when using
target-based sorting. In order for this to work efficiently,
the search must also include the sort criteria parameter; otherwise
Pazpar2 will re-search if the sort criteria
change between the search and show commands.
This configuration was added in Pazpar2 1.6.20.
730 Specifies a variable that will be inherited by all targets defined in settings
<set name="test" value="en"/>
739 <term>settings</term>
742 Specifies target settings for this service. Refer to
743 <xref linkend="target_settings"/>.
748 <varlistentry id="service-timeout">
Specifies timeout parameters for this service.
The <literal>timeout</literal>
element supports the following attributes:
<literal>session</literal>, <literal>z3950_operation</literal>, and
<literal>z3950_session</literal>, which specify the
session timeout, the Z39.50 operation timeout, and the
Z39.50 session timeout respectively. The Z39.50 operation
timeout is the time Pazpar2 will wait for an active Z39.50/SRU
operation before it gives up (times out). The Z39.50 session
timeout is the time Pazpar2 will keep the session alive for
an idle session (no operation).
The following is recommended but not required:
z3950_operation (30) &lt; session (60) &lt; z3950_session (180).
The default values are given in parentheses.
770 The Z39.50 operation timeout may be set per database. Refer to
771 <xref linkend="pztimeout"/>.
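<para>
For illustration, the element below spells out all three attributes
using the default values quoted above, which satisfy the recommended
ordering.
</para>
<screen><![CDATA[
<!-- z3950_operation (30) < session (60) < z3950_session (180) -->
<timeout session="60" z3950_operation="30" z3950_session="180"/>
]]></screen>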
775 </variablelist> <!-- Data elements in service directive -->
778 </variablelist> <!-- Data elements in server directive -->
783 <title>EXAMPLE</title>
785 Below is a working example configuration:
789 <?xml version="1.0" encoding="UTF-8"?>
790 <pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
792 <threads number="10"/>
794 <listen port="9004"/>
797 <metadata name="title" brief="yes" sortkey="skiparticle"
798 merge="longest" rank="6"/>
799 <metadata name="isbn" merge="unique"/>
800 <metadata name="date" brief="yes" sortkey="numeric"
801 type="year" merge="range" termlist="yes"/>
802 <metadata name="author" brief="yes" termlist="yes"
803 merge="longest" rank="2"/>
804 <metadata name="subject" merge="unique" termlist="yes" rank="3" limitmap="local:"/>
805 <metadata name="url" merge="unique"/>
806 <icu_chain id="relevance" locale="el">
807 <transform rule="[:Control:] Any-Remove"/>
809 <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
812 <settings src="mysettings"/>
813 <timeout session="60"/>
821 <refsect1 id="config-include">
822 <title>INCLUDE FACILITY</title>
The XML configuration may be partitioned into multiple files by using
the <literal>include</literal> element, which takes a single attribute,
<literal>src</literal>. The <literal>src</literal> attribute is a
regular shell-like glob pattern. For example,
829 <include src="/etc/pazpar2/conf.d/*.xml"/>
833 The include facility requires Pazpar2 version 1.2.
837 <refsect1 id="target_settings">
838 <title>TARGET SETTINGS</title>
Pazpar2 features a cunning scheme by which you can associate various
kinds of attributes, or settings, with search targets. This can be done
through XML files which are read at startup; each file can associate
one or more settings with one or more targets. The file format is generic
in nature, designed to support a wide range of application requirements.
The settings can be purely technical things, such as how to perform a title
search against a given target, or they can associate arbitrary name=value
pairs with groups of targets -- for instance, if you would like to
place all commercial full-text bases in one group for selection
purposes, or you would like to control what targets are accessible
to users by default. Per-database settings values can even be used
to drive sorting, facet/termlist generation, or end-user interface display
856 During startup, Pazpar2 will recursively read a specified directory
857 (can be identified in the pazpar2.cfg file or on the command line), and
858 process any settings files found therein.
Clients of the Pazpar2 webservice interface can selectively override
settings for individual targets within the scope of one session. This
can be used in conjunction with an external authentication system to
determine which resources are to be accessible to which users. Pazpar2
itself has no notion of end-users, and so can be used in conjunction
with any type of authentication system. Similarly, the authentication
tokens submitted to access-controlled search targets can be
overridden, to allow use of Pazpar2 in a consortial or multi-library
environment, where different end-users may need to be represented to
some search targets in different ways. This, again, can be managed
using an external database or other lookup mechanism. Setting overrides
can be performed using either the
<link linkend="command-init">init</link> or the
<link linkend="command-settings">settings</link> webservice
880 In fact, every setting that applies to a database (except pz:id, which
881 can only be used for filtering targets to use for a search) can be overridden
882 on a per-session basis.
883 This allows the client to override specific CCL fields for
884 searching, etc., to meet the needs of a session or user.
888 Finally, as an extreme case of this, the webservice client can
889 introduce entirely new targets, on the fly, as part of the
890 <link linkend="command-init">init</link> or
891 <link linkend="command-settings">settings</link> command.
892 This is useful if you desire to manage information
893 about your search targets in a separate application such as a database.
894 You do not need any static settings file whatsoever to run Pazpar2 -- as
895 long as the webservice client is prepared to supply the necessary
896 information at the beginning of every session.
The following discussion of practical issues related to session
and settings management is cast in terms of a user interface based on
Ajax/Javascript technology. It would apply equally well to many other
kinds of browser-based logic.
909 Typically, a Javascript client is not allowed to directly alter the
910 parameters of a session. There are two reasons for this. One has to do
911 with access to information; typically, information about a user will
912 be stored in a system on the server side, or it will be accessible in
913 some way from the server. However, since the Javascript client cannot
914 be entirely trusted (some hostile agent might in fact 'pretend' to be
915 a regular ws client), it is more robust to control session settings
916 from scripting that you run as part of your webserver. Typically, this
917 can be handled during the session initialization, as follows:
Step 1: The Javascript client loads, and asks the webserver for a
new Pazpar2 session ID. This can be done using a Javascript call, for
instance. Note that it is possible to submit Ajax XMLHttpRequest calls
either to Pazpar2 or to the webserver that Pazpar2 is proxying
for. See (XXX Insert link to Pazpar2 protocol).
Step 2: Code on the webserver authenticates the user, by database lookup,
LDAP access, NCIP, etc. It determines which resources the user has access to,
and any user-specific parameters that are to be applied during this session.
Step 3: The webserver initializes a new Pazpar2 session, and sets
user-specific parameters as necessary, using the init webservice
command. A new session ID is returned.
941 Step 4: The webserver returns this session ID to the Javascript
942 client, which then uses the session ID to submit searches, show
947 Step 5: When the Javascript client ceases to use the session,
948 Pazpar2 destroys any session-specific information.
952 <title>SETTINGS FILE FORMAT</title>
Each file contains a root element named &lt;settings&gt;. It may
contain one or more &lt;set&gt; elements. The settings and set
elements may contain the following attributes. Attributes in the set
node override those in the settings root element. Each set node must
specify (directly, or inherited from the parent node) at least a
target, name, and value.
This specifies the search target to which this setting should be
applied. Targets are identified by their Z39.50 URL, generally
including the host, port, and database name (e.g.
<literal>bagel.indexdata.com:210/marc</literal>).
Two wildcard forms are accepted:
* (asterisk) matches all known targets;
<literal>bagel.indexdata.com:210/*</literal> matches all
known databases on the given host.
A precedence system determines what happens if there are
overlapping values for the same setting name for the same
target. A setting for a specific target name overrides a
setting which specifies the target using a wildcard. This makes it
easy to set defaults for all targets, and then override them
for specific targets or hosts. If there are
multiple overlapping settings with the same name and target
value, the 'precedence' attribute determines what happens.
For Pazpar2 1.6.4 or later, the target ID may be user-defined, in
which case the actual host, port, etc. are given by the setting
<xref linkend="pzurl"/>.
997 The name of the setting. This can be anything you like.
998 However, Pazpar2 reserves a number of setting names for
999 specific purposes, all starting with 'pz:', and it is a good
1000 idea to avoid that prefix if you make up your own setting
1001 names. See below for a list of reserved variables.
1009 The value of the setting. Generally, this can be anything you
1010 want -- however, some of the reserved settings may expect
1011 specific kinds of values.
1016 <term>precedence</term>
1019 This should be an integer. If not provided, the default value
1020 is 0. If two (or more) settings have the same content for
1021 target and name, the precedence value determines the outcome.
1022 If both settings have the same precedence value, they are both
1023 applied to the target(s). If one has a higher value, then the
1024 value of that setting is applied, and the other one is ignored.
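<para>
To illustrate precedence, the sketch below defines the same setting
twice for the same target. The element set name <literal>B</literal>
is an assumed value. The second entry has the higher precedence, so
<literal>F</literal> is applied and the first entry is ignored.
</para>
<screen><![CDATA[
<settings target="bagel.indexdata.com:210/marc" name="pz:elements">
  <!-- precedence not given: defaults to 0 -->
  <set value="B"/>
  <!-- higher precedence: this value is the one applied -->
  <set value="F" precedence="1"/>
</settings>
]]></screen>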
1031 By setting defaults for target, name, or value in the root
1032 settings node, you can use the settings files in many different
1033 ways. For instance, you can use a single file to set defaults for
1034 many different settings, like search fields, retrieval syntaxes,
1035 etc. You can have one file per server, which groups settings for
1036 that server or target. You could also have one file which associates
1037 a number of targets with a given setting, for instance, to associate
1038 many databases with a given category or class that makes sense
1039 within your application.
1043 The following examples illustrate uses of the settings system to
1044 associate settings with targets to meet different requirements.
1048 The example below associates a set of default values that can be
1049 used across many targets. Note the wildcard for targets.
1050 This associates the given settings with all targets for which no
1051 other information is provided.
1053 <settings target="*">
1055 <!-- This file introduces default settings for pazpar2 -->
1057 <!-- mapping for unqualified search -->
1058 <set name="pz:cclmap:term" value="u=1016 t=l,r s=al"/>
1060 <!-- field-specific mappings -->
1061 <set name="pz:cclmap:ti" value="u=4 s=al"/>
1062 <set name="pz:cclmap:su" value="u=21 s=al"/>
1063 <set name="pz:cclmap:isbn" value="u=7"/>
1064 <set name="pz:cclmap:issn" value="u=8"/>
1065 <set name="pz:cclmap:date" value="u=30 r=r"/>
1067 <set name="pz:limitmap:title" value="rpn:@attr 1=4 @attr 6=3"/>
1068 <set name="pz:limitmap:date" value="ccl:date"/>
1070 <!-- Retrieval settings -->
1072 <set name="pz:requestsyntax" value="marc21"/>
1073 <set name="pz:elements" value="F"/>
1075 <!-- Query encoding -->
1076 <set name="pz:queryencoding" value="iso-8859-1"/>
1078 <!-- Result normalization settings -->
1080 <set name="pz:nativesyntax" value="iso2709"/>
1081 <set name="pz:xslt" value="../etc/marc21.xsl"/>
1089 The next example shows certain settings overridden for one target,
1090 one which returns XML records containing DublinCore elements, and
1091 which furthermore requires a username/password.
1093 <settings target="funkytarget.com:210/db1">
1094 <set name="pz:requestsyntax" value="xml"/>
1095 <set name="pz:nativesyntax" value="xml"/>
1096 <set name="pz:xslt" value="../etc/dublincore.xsl"/>
1098 <set name="pz:authentication" value="myuser/password"/>
1104 The following example associates a specific name/value combination
1105 with a number of targets. The targets below are access-restricted,
1106 and can only be used by users with special credentials.
1108 <settings name="pz:allow" value="0">
1109 <set target="funkytarget.com:210/*"/>
1110 <set target="commercial.com:2100/expensiveDb"/>
1118 <title>RESERVED SETTING NAMES</title>
1120 The following setting names are reserved by Pazpar2 to control the
1121 behavior of the client function.
1127 <term>pz:allow</term>
1130 Allows or denies access to the resources it is applied to. Possible
1131 values are '0' and '1'.
1132 The default is '1' (allow access to this resource).
1138 <term>pz:apdulog</term>
If the 'pz:apdulog' setting is defined and has a value other than 0,
then Z39.50 APDUs are written to the log.
1148 <term>pz:authentication</term>
1151 Sets an authentication string for a given database. For Z39.50,
1152 this is carried as part of the Initialize Request. In order to carry
1153 the information in the "open" elements, separate
1154 username and password with a slash (In Z39.50 it is a VisibleString).
1155 In order to carry the information in the idPass elements, separate
1156 username term, password term and, optionally, a group term with a
1158 If three terms are given, the order is
1159 <emphasis>user, group, password</emphasis>.
1160 If only two terms are given, the order is
1161 <emphasis>user, password</emphasis>.
1164 For HTTP-based protocols, such as SRU and Solr, the authentication
1165 string includes a username term and, optionally, a password term.
1166 The terms are separated by a single blank. The
1167 authentication information is passed either by HTTP basic
1168 authentication or via URL parameters. The mode of operation is
1169 determined by the <literal>pz:authentication_mode</literal> setting.
1175 <term>pz:authentication_mode</term>
1178 Determines how authentication is carried in HTTP based protocols.
1179 Value may be "<literal>basic</literal>" or "<literal>url</literal>".
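As a minimal sketch of the two settings working together, the fragment
below supplies a username and password (separated by a single blank) to an
HTTP-based target and asks for HTTP basic authentication. The target ID
<literal>sru.example.org/db</literal> and the credentials are hypothetical
placeholders, not values taken from a real installation.
<screen><![CDATA[
<settings target="sru.example.org/db">
  <!-- Hypothetical credentials: username term, blank, password term -->
  <set name="pz:authentication" value="myuser mypassword"/>
  <!-- Carry the credentials by HTTP basic authentication -->
  <set name="pz:authentication_mode" value="basic"/>
</settings>
]]></screen>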
1185 <term>pz:block_timeout</term>
1188 (Not yet implemented).
1189 Specifies the time after which a block should be released anyway.
1195 <term>pz:cclmap:xxx</term>
1198 This establishes a CCL field definition or other setting, for
1199 the purpose of mapping end-user queries. XXX is the field or
1200 setting name, and the value of the setting provides parameters
1201 (e.g. parameters to send to the server, etc.). Please consult
1202 the YAZ manual for a full overview of the many capabilities of
1203 the powerful and flexible CCL parser.
1206 Note that it is easy to establish a set of default parameters,
1207 and then override them individually for a given target.
1213 <term>pz:elements</term>
1216 The element set name to be used when retrieving records from a
1223 <term>pz:extendrecs</term>
1226 If a show command reaches the boundary of a result set for a
1227 database (which depends on sorting) and pz:extendrecs is set to a positive
1228 value, then Pazpar2 waits for show to fetch pz:extendrecs more
1229 records. This setting is best used if a database does native
1230 sorting, because the result set otherwise may be completely
1231 re-sorted during extended fetch.
1232 The default value of pz:extendrecs is 0 (no extended fetch).
1236 The pz:extendrecs setting appeared in Pazpar2 version 1.6.26,
1237 but the behavior changed with the release of Pazpar2 1.6.29.
1244 <term>pz:facetmap:<replaceable>name</replaceable></term>
1247 Specifies that for field <replaceable>name</replaceable>, the target
1248 supports (native) facets. The value is the name of the
1249 field on the target.
1253 At this point only Solr targets have been tested with this
1264 This setting can't be 'set' -- it contains the ID (normally
1265 ZURL) for a given target, and is useful for filtering --
1266 specifically when you want to select one or more specific
1267 targets in the search command.
1272 <varlistentry id="limitmap">
1273 <term>pz:limitmap:<replaceable>name</replaceable></term>
1276 Specifies attributes for limiting a search to a field - using
1277 the limit parameter for search. It can be used to filter locally
1278 or remotely (search in a target). In some cases the mapping of
1279 a field to a value is identical to an existing cclmap field; in
1280 other cases the field must be specified in a different way - for
1281 example to match a complete field (rather than parts of a subfield).
1284 The value of limitmap may have one of three forms: a reference to
1285 an existing CCL field, a raw PQF string, or a local limit. The leading string
1286 determines the type: <literal>ccl:</literal> for a CCL field,
1287 <literal>rpn:</literal> for PQF/RPN, or <literal>local:</literal>
1288 for filtering in Pazpar2. The local filter prefix may be followed
1289 by a metadata field name (default is to use the name of the
1293 For Pazpar2 version 1.6.23 and later the limitmap may include multiple
1294 specifications, separated by <literal>,</literal> (comma).
1296 <literal>ccl:title,local:ltitle,rpn:@attr 1=4</literal>.
1300 The limitmap facility is supported in Pazpar2 version 1.6.0 and later.
1301 Local filtering is supported in Pazpar2 1.6.6 and later.
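The three forms described above could be written as follows. This is an
illustrative sketch only: the first two values are the ones used in the
earlier example, while the <literal>author</literal> limit name and the
metadata field it filters on are assumptions.
<screen><![CDATA[
<settings target="*">
  <!-- Refer to an existing CCL field definition -->
  <set name="pz:limitmap:date" value="ccl:date"/>
  <!-- Send a raw PQF/RPN attribute specification to the target -->
  <set name="pz:limitmap:title" value="rpn:@attr 1=4 @attr 6=3"/>
  <!-- Filter locally in Pazpar2 on the (assumed) metadata field "author" -->
  <set name="pz:limitmap:author" value="local:author"/>
</settings>
]]></screen>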
1308 <term>pz:maxrecs</term>
1311 Controls the maximum number of records to be retrieved from a
1312 server. The default is 100.
1318 <term>pz:memcached</term>
1321 If set and non-empty,
1322 <ulink url="&url.libmemcached;">libMemcached</ulink> will
1323 be configured and enabled for the target.
1324 The value of this setting is the same as the ZOOM option
1325 <literal>memcached</literal>, which in turn is the configuration
1326 string passed to the <function>memcached</function> function
1327 of <ulink url="&url.libmemcached;">libMemcached</ulink>.
1330 This setting is honored in Pazpar2 1.6.39 or later. Pazpar2 must
1331 be using YAZ version 5.0.13 or later.
1337 <term>pz:redis</term>
1340 If set and non-empty,
1341 <ulink url="&url.redis;">redis</ulink> will
1342 be configured and enabled for the target.
1343 The value of this setting is exactly the same as the redis option for
1347 This setting is honored in Pazpar2 1.6.43 or later. Pazpar2 must
1348 be using YAZ version 5.2.0 or later.
1354 <term>pz:nativesyntax</term>
1357 Specifies how Pazpar2 should map retrieved records to XML. Currently
1358 supported values are <literal>xml</literal>,
1359 <literal>iso2709</literal> and <literal>txml</literal>.
1362 The value <literal>iso2709</literal> makes Pazpar2 convert retrieved
1363 MARC records to MARCXML. In order to convert to XML, the exact
1364 character set of the MARC records must be known (if not, the resulting
1365 XML is probably not well-formed). The character set may be
1366 specified by adding:
1367 <literal>;</literal><replaceable>charset</replaceable> to
1368 <literal>iso2709</literal>. If omitted, a charset of
1369 MARC-8 is assumed. This is correct for most MARC21/USMARC records.
1372 The value <literal>txml</literal> is like <literal>iso2709</literal>
1373 except that records are converted to TurboMARC instead of MARCXML.
1376 The value <literal>xml</literal> is used if Pazpar2 retrieves
1377 records that are already XML (no conversion takes place).
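For illustration, a target that returns MARC records in a character set
other than MARC-8 might be configured as below. The target ID is
hypothetical, and the charset name <literal>iso-8859-1</literal> is an
assumption about what the target actually delivers.
<screen><![CDATA[
<settings target="marc.example.org:210/catalog">
  <set name="pz:requestsyntax" value="marc21"/>
  <!-- iso2709 followed by ";" and the (assumed) character set of the records -->
  <set name="pz:nativesyntax" value="iso2709;iso-8859-1"/>
</settings>
]]></screen>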
1383 <term>pz:negotiation_charset</term>
1386 Sets the character set for Z39.50 negotiation. Most targets do not support
1387 this, and some will even close the connection if it is set (crash on server
1388 side or similar). If set, you probably want to set it to
1389 <literal>UTF-8</literal>.
1395 <term>pz:piggyback</term>
1398 Piggybacking enables Pazpar2 to retrieve records from the
1399 server as part of the search response in Z39.50. Almost all
1400 servers support this (or fail it gracefully), but a few
1401 servers will produce undesirable results.
1402 Set to '1' to enable piggybacking, '0' to disable it. Default
1403 is 1 (piggybacking enabled).
1408 <term>pz:pqf_prefix</term>
1411 Allows you to specify an arbitrary PQF query language substring.
1412 The provided string is prefixed to the user's query after it has been
1413 normalized to PQF internally in Pazpar2.
1414 This allows you to attach complex 'filters' to queries for a given
1415 target, sometimes necessary to select sub-catalogs
1416 in union catalog systems, etc.
1422 <term>pz:pqf_strftime</term>
1425 Allows you to extend a query with dates and operators.
1426 The provided string allows certain substitutions and serves as a
1428 The special two-character sequence '%%' is replaced by the
1429 original query. Other sequences beginning with a percent sign are
1430 conversions supported by strftime.
1431 All other characters are copied verbatim. For example, the string
1432 <literal>@and @attr 1=30 @attr 2=3 %Y %%</literal>
1433 would search for the current year combined with the original PQF (%%).
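Written as a setting, the example string above might look like the
following sketch. The target ID is a hypothetical placeholder.
<screen><![CDATA[
<settings target="dated.example.org:210/db">
  <!-- %Y is expanded by strftime; %% is replaced by the original query -->
  <set name="pz:pqf_strftime" value="@and @attr 1=30 @attr 2=3 %Y %%"/>
</settings>
]]></screen>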
1436 This setting can also be used as a more general alternative to
1437 pz:pqf_prefix -- a way of embedding the submitted query
1438 anywhere in the string rather than appending it to a prefix. For
1439 example, if it is desired to omit all records satisfying the
1440 query <literal>@attr 1=pica.bib 0007</literal> then this
1441 subquery can be combined with the submitted query as the second
1442 argument of <literal>@not</literal> by using the
1443 pz:pqf_strftime value <literal>@not %% @attr 1=pica.bib
1450 <term>pz:preferred</term>
1453 Specifies that a target is preferred, e.g. a local, possibly faster
1454 target. Using block=preferred on the <link linkend="command-show">
1455 show command</link> will wait for all these
1456 targets to return records before releasing the block.
1457 If no target is preferred, block=preferred is identical to
1458 block=1, which releases the block when one target has returned records.
1464 <term>pz:present_chunk</term>
1467 Controls the chunk size in present requests. Pazpar2 will
1468 make (maxrecs / chunk) request(s). The default is 20.
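As a worked example of the formula above, with the assumed values below
Pazpar2 would be expected to issue 200 / 50 = 4 present requests for the
target. The numbers are chosen purely for illustration.
<screen><![CDATA[
<settings target="*">
  <!-- Retrieve at most 200 records from each target -->
  <set name="pz:maxrecs" value="200"/>
  <!-- Fetch them 50 at a time: 200 / 50 = 4 present requests -->
  <set name="pz:present_chunk" value="50"/>
</settings>
]]></screen>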
1474 <term>pz:queryencoding</term>
1477 The encoding of the search terms that a target accepts. Most
1478 targets do not honor UTF-8, in which case this needs to be specified.
1479 Each term in a query will be converted if this setting is given.
1485 <term>pz:recordfilter</term>
1488 Specifies a filter which allows Pazpar2 to only include
1489 records that meet certain criteria in a result.
1490 Unmatched records will be ignored.
1491 The filter takes the form name, name~value, or name=value, which
1492 will include only records with metadata element (name) that has the
1493 substring (~value) given, or matches exactly (=value).
1494 If value is omitted all records with the named metadata element
1495 present will be included.
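A sketch of the substring form follows. It assumes that the service
defines a metadata element named <literal>title</literal>; the target ID
and the filter value are hypothetical.
<screen><![CDATA[
<settings target="filtered.example.org:210/db">
  <!-- Keep only records whose "title" element contains the substring "water" -->
  <set name="pz:recordfilter" value="title~water"/>
</settings>
]]></screen>
Replacing <literal>~</literal> with <literal>=</literal> would require an
exact match, and giving only <literal>title</literal> would keep every
record in which that element is present.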
1500 <varlistentry id="requestsyntax">
1501 <term>pz:requestsyntax</term>
1504 This specifies the record syntax to use when requesting
1505 records from a given server. The value can be a symbolic name like
1506 marc21 or xml, or it can be a Z39.50-style dot-separated OID.
1512 <term>pz:sort</term>
1515 Specifies sort criteria to be applied to the result set.
1516 Only works for targets which support the sort service.
1521 <varlistentry id="pzsortmap">
1522 <term>pz:sortmap:<replaceable>field</replaceable></term>
1525 Specifies native sorting for a target where
1526 <replaceable>field</replaceable> is a sort criterion (see command
1527 show). The value has two components separated by a colon: strategy and
1528 native-field. Strategy is one of <literal>z3950</literal>,
1529 <literal>type7</literal>, <literal>cql</literal>,
1530 <literal>sru11</literal>, or <literal>embed</literal>.
1531 The second component, native-field, is the field that is recognized
1536 Only supported for Pazpar2 1.6.4 and later.
1546 This setting enables
1547 <ulink url="&url.sru;">SRU</ulink>/<ulink url="&url.solr;">Solr</ulink>
1549 It has four possible settings.
1550 'get', enables SRU access through GET requests. 'post' enables SRU/POST
1551 support, less commonly supported, but useful if very large requests are
1552 to be submitted. 'soap' enables the SRW (SRU over SOAP) variation of
1556 A value of 'solr' enables Solr client support. This is supported
1557 for Pazpar2 version 1.5.0 and later.
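For illustration, an SRU target accessed through GET requests might be
declared as in the sketch below. The target ID is a hypothetical
placeholder, and the explicit <literal>pz:sru_version</literal> is
optional.
<screen><![CDATA[
<settings target="sru.example.org/db">
  <!-- Use SRU over HTTP GET for this target -->
  <set name="pz:sru" value="get"/>
  <!-- Optionally pin the SRU version instead of relying on the YAZ default -->
  <set name="pz:sru_version" value="1.2"/>
</settings>
]]></screen>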
1563 <term>pz:sru_version</term>
1566 This allows the SRU version to be specified. If unset, Pazpar2
1567 will use the default of YAZ (currently 1.2). It should be set
1568 to 1.1 or 1.2. For Solr, the current supported/tested version
1575 <term>pz:termlist_term_count</term>
1578 Specifies number of facet terms to be requested from the target.
1579 The default is unspecified, i.e. server-decided. Also see pz:facetmap.
1585 <term>pz:termlist_term_factor</term>
1588 Specifies whether to use a factor for Pazpar2-generated facets (1)
1590 When mixing locally generated facets (based on the downloaded (pz:maxrecs) samples)
1591 with native (target-generated) facets, the latter will
1592 dominate the facet list since they are generated
1593 based on the complete result set.
1594 By scaling up the facet count using the ratio between total hit
1595 count and the sample size,
1596 the total facet count can be approximated and thus better compared
1597 with native facets. This is not enabled by default.
1603 <varlistentry id="pztimeout">
1604 <term>pz:timeout</term>
1607 Specifies the timeout for operations (e.g. search and fetch) for
1608 a database. This overrides the z3950_operation timeout
1609 that is given for a service. See <xref linkend="service-timeout"/>.
1613 The timeout facility is supported for Pazpar2 version 1.8.4 and later.
1619 <varlistentry id="pzurl">
1623 Specifies URL for the target and overrides the target ID.
1627 <literal>pz:url</literal> is only recognized for
1628 Pazpar2 1.6.4 and later.
1635 <term id="pzxslt" xreflabel="pz:xslt">pz:xslt</term>
1638 A comma-separated list of stylesheet names that specifies
1639 how to convert incoming records to the internal representation.
1642 For each name, the embedded stylesheets (XSL) that come with the
1643 service definition are consulted first and take precedence over
1644 external files (see <xref linkend="servicexslt"/>
1645 of the service definition).
1646 If the name does not match an embedded stylesheet it is
1647 considered a filename.
1650 The suffix of each file specifies the kind of transformation.
1651 Suffix "<literal>.xsl</literal>" makes an XSL transform. Suffix
1652 "<literal>.mmap</literal>" will use the MMAP transform (described below).
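The following sketch shows how the list and suffix rules might be
combined. The target IDs and the file names
<literal>cleanup.xsl</literal> and <literal>marc21.mmap</literal> are
hypothetical and used only for illustration.
<screen><![CDATA[
<!-- Two names separated by a comma; the .xsl suffix selects an XSL transform -->
<settings target="first.example.org:210/db">
  <set name="pz:xslt" value="../etc/marc21.xsl,../etc/cleanup.xsl"/>
</settings>

<!-- A single name with the .mmap suffix selects the MMAP transform -->
<settings target="second.example.org:210/db">
  <set name="pz:xslt" value="../etc/marc21.mmap"/>
</settings>
]]></screen>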
1655 The special value "<literal>auto</literal>" will use a file
1656 which is the <link linkend="requestsyntax">pz:requestsyntax's</link>
1658 <literal>'.xsl'</literal>.
1661 When mapping MARC records, XSLT can be bypassed for increased
1662 performance with the alternate "MARC map" format. Provide the
1663 path of a file with extension ".mmap" containing on each line:
1665 <field> <subfield> <metadata element></programlisting>
1672 To map the field value specify a subfield of '$'. To store a
1673 concatenation of all subfields, specify a subfield of '*'.
1679 <term>pz:zproxy</term>
1682 The 'pz:zproxy' setting has the value syntax
1683 'host.internet.address:port'; it is used to tunnel Z39.50
1684 requests through the named Z39.50 proxy.
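A minimal sketch, assuming a Z39.50 proxy reachable at the hypothetical
address <literal>proxy.example.org:9000</literal>:
<screen><![CDATA[
<settings target="*">
  <!-- Tunnel Z39.50 requests for all targets through the named proxy -->
  <set name="pz:zproxy" value="proxy.example.org:9000"/>
</settings>
]]></screen>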
1694 <title>SEE ALSO</title>
1697 <refentrytitle>pazpar2</refentrytitle>
1698 <manvolnum>8</manvolnum>
1701 <refentrytitle>yaz-icu</refentrytitle>
1702 <manvolnum>1</manvolnum>
1705 <refentrytitle>pazpar2_protocol</refentrytitle>
1706 <manvolnum>7</manvolnum>
1711 <!-- Keep this comment at the end of the file
1714 nxml-child-indent: 1