1 <?xml version="1.0" standalone="no"?>
2 <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.4//EN"
3 "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"
5 <!ENTITY % local SYSTEM "local.ent">
7 <!ENTITY % entities SYSTEM "entities.ent">
9 <!ENTITY % idcommon SYSTEM "common/common.ent">
12 <refentry id="pazpar2_conf">
14 <productname>Pazpar2</productname>
15 <productnumber>&version;</productnumber>
16 <info><orgname>Index Data</orgname></info>
20 <refentrytitle>Pazpar2 conf</refentrytitle>
21 <manvolnum>5</manvolnum>
22 <refmiscinfo class="manual">File formats and conventions</refmiscinfo>
26 <refname>pazpar2_conf</refname>
27 <refpurpose>Pazpar2 Configuration</refpurpose>
32 <command>pazpar2.conf</command>
37 <title>DESCRIPTION</title>
39 The Pazpar2 configuration file, together with any referenced XSLT files,
governs Pazpar2's behavior as a client, and controls the normalization and
41 extraction of data elements from incoming result records, for the
42 purposes of merging, sorting, facet analysis, and display.
46 The file is specified using the option -f on the Pazpar2 command line.
47 There is not presently a way to reload the configuration file without
48 restarting Pazpar2, although this will most likely be added some time
56 The configuration file is XML-structured. It must be well-formed XML. All
57 elements specific to Pazpar2 should belong to the namespace
58 <literal>http://www.indexdata.com/pazpar2/1.0</literal>
59 (this is assumed in the
60 following examples). The root element is named "<literal>pazpar2</literal>".
61 Under the root element are a number of elements which group categories of
62 information. The categories are described below.
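<para>
As a minimal sketch of the structure described above, the outline below
shows only the root element and its namespace declaration. The comment marks
where the category elements would go; it is an illustration, not a complete
configuration.
</para>
<screen><![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
  <!-- category elements, such as threads and server, go here -->
</pazpar2>
]]></screen>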
65 <refsect2 id="config-threads">
66 <title>threads</title>
This section is optional and is supported in Pazpar2 version 1.3.1 and
later. It is identified by the element "<literal>threads</literal>", which
may include one attribute, "<literal>number</literal>", which specifies
the number of worker threads that the Pazpar2 instance is to use.
72 A value of 0 (zero) disables worker-threads (all work is carried out
76 <refsect2 id="config-server">
79 This section governs overall behavior of a server endpoint. It is identified
80 by the element "server" which takes an optional attribute, "id", which
81 identifies this particular Pazpar2 server. Any string value for "id"
86 elements are described below. From Pazpar2 version 1.2 this is
89 <variablelist> <!-- level 1 -->
94 Configures the webservice -- this controls how you can connect
95 to Pazpar2 from your browser or server-side code. The
96 attributes 'host' and 'port' control the binding of the
97 server. The 'host' attribute can be used to bind the server to
98 a secondary IP address of your system, enabling you to run
99 Pazpar2 on port 80 alongside a conventional web server. You
100 can override this setting on the command line using the option -h.
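<para>
For illustration, a listen element that binds the webservice to a secondary
address might look like the sketch below. The IP address 192.0.2.10 is a
placeholder chosen for this example and is not taken from any real
installation.
</para>
<screen><![CDATA[
<!-- bind the webservice to one specific address, on port 80 -->
<listen host="192.0.2.10" port="80"/>
]]></screen>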
109 If this item is given, Pazpar2 will forward all incoming HTTP
110 requests that do not contain the filename 'search.pz2' to the
111 host and port specified using the 'host' and 'port'
attributes. The 'myurl' attribute is required, and should provide
the base URL of the server. Generally, this is the HTTP URL for the host
specified in the 'listen' parameter. This functionality is
115 crucial if you wish to use
116 Pazpar2 in conjunction with browser-based code (JS, Flash,
117 applets, etc.) which operates in a security sandbox. Such code
118 can only connect to the same server from which the enclosing
HTML page originated. Pazpar2's proxy functionality enables you
to host all of the main pages (plus images, CSS, etc.) of your
121 application on a conventional webserver, while efficiently
122 processing webservice requests for metasearch status, results,
129 <term>icu_chain</term>
Specifies character set normalization for relevancy, sorting,
mergekey and facets for the server. These definitions serve as
defaults for services that do not define their own. For the meaning
135 of these settings refer to the
136 <xref linkend="icuchain"/> element inside service.
142 <term>relevance / sort / mergekey / facet</term>
145 Obsolete. Use element icu_chain instead.
151 <term>settings</term>
Specifies target settings for the server. These settings serve
as defaults for all services that do not define their own.
The settings element requires one attribute, 'src', which specifies
a settings file or a directory. If a directory is given, all
files with suffix <filename>.xml</filename> are read from this
160 <xref linkend="target_settings"/> for more information.
166 <term id="service_conf">service</term>
169 This nested element controls the behavior of Pazpar2 with
170 respect to your data model. In Pazpar2, incoming records are
171 normalized, using XSLT, into an internal representation.
172 The 'service' section controls the further processing and
173 extraction of data from the internal representation, primarily
174 through the 'metadata' sub-element.
177 Pazpar2 version 1.2 and later allows multiple service elements.
When multiple services are defined, each must be given a unique ID by
specifying the attribute <literal>id</literal>.
180 A single service may be unnamed (service ID omitted). The
181 service ID is referred to in the
182 <link linkend="command-init"><literal>init</literal></link> webservice
183 command's <literal>service</literal> parameter.
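<para>
A sketch of a server with two services follows. The IDs
<literal>books</literal> and <literal>articles</literal> are hypothetical
names chosen for this example; a client would select one of them by passing
its ID as the <literal>service</literal> parameter of the init command.
</para>
<screen><![CDATA[
<server>
  <listen port="9004"/>
  <service id="books">
    <!-- metadata and other definitions for the first service -->
  </service>
  <service id="articles">
    <!-- metadata and other definitions for the second service -->
  </service>
</server>
]]></screen>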
186 <variablelist> <!-- Level 2 -->
188 <term>metadata</term>
191 One of these elements is required for every data element in
192 the internal representation of the record (see
<xref linkend="data_model"/>). It governs
194 subsequent processing as pertains to sorting, relevance
195 ranking, merging, and display of data elements. It supports
196 the following attributes:
199 <variablelist> <!-- level 3 -->
204 This is the name of the data element. It is matched
205 against the 'type' attribute of the
207 in the normalized record. A warning is produced if
208 metadata elements with an unknown name are
210 normalized record. This name is also used to
212 data elements in the records returned by the
213 webservice API, and to name sort lists and browse
223 The type of data element. This value governs any
224 normalization or special processing that might take
place on an element. Possible values are 'generic'
(basic string) and 'year' (a range is computed if
multiple years are found in the record). Note: this
list is likely to grow in the future.
237 If this is set to 'yes', then the data element is
included in brief records in the webservice API. Note
239 that this only makes sense for metadata elements that
240 are merged (see below). The default value is 'no'.
249 Specifies that this data element is to be used for
250 sorting. The possible values are 'numeric' (numeric
251 value), 'skiparticle' (string; skip common, leading
252 articles), and 'no' (no sorting). The default value is
259 <term id="metadata-rank">rank</term>
262 Specifies that this element is to be used to
264 records against the user's query (when ranking is
The value is of the form
270 where M is an integer, used as a
271 weight against the basic TF*IDF score. A value of
272 1 is the base, higher values give additional weight to
273 elements of this type. The default is '0', which
274 excludes this element from the rank calculation.
F is a CCL field and N is the multiplier for terms
that match that part of the CCL field in the search.
The F+N combination allows the system to use a different
multiplier for a certain field. For example, a rank value of
"<literal>1 au 3</literal>" gives a multiplier of 3 for
all terms that are part of the au(thor) field and 1 for everything else.
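<para>
The two forms of the rank value might be written as in the sketch below.
The metadata names are taken from the example configuration in this page;
the choice of weights is illustrative only.
</para>
<screen><![CDATA[
<!-- simple integer weight M: title terms count six times the base -->
<metadata name="title" rank="6"/>

<!-- M F N form: multiplier 3 for terms searched in the CCL field au,
     multiplier 1 for everything else -->
<metadata name="author" rank="1 au 3"/>
]]></screen>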
For Pazpar2 1.6.13 and later, the rank may also be defined
286 "per-document", by the normalization stylesheet.
289 The per field rank was introduced in Pazpar2 1.6.15. Earlier
290 releases only allowed a rank value M (simple integer).
292 See <xref linkend="relevance_ranking"/> for more
298 <term>termlist</term>
301 Specifies that this element is to be used as a
302 termlist, or browse facet. Values are tabulated from
303 incoming records, and a highscore of values (with
304 their associated frequency) is made available to the
305 client through the webservice API.
307 are 'yes' and 'no' (default).
This governs whether, and how, elements are extracted
from individual records and merged into cluster
records. The possible values are: 'unique' (include
all unique elements), 'longest' (include only the
longest element (strlen)), 'range' (calculate a range
321 of values across all matching records), 'all' (include
322 all elements), or 'no' (don't merge; this is the
Pazpar2 1.6.24 also offers a new value for merge, 'first', which
is like 'all' but only takes elements from the first database that returns
the particular metadata field.
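<para>
As a rough illustration of the merge values, the fragment below assigns a
different strategy to each field. The first three lines follow the example
configuration in this page; the name <literal>description</literal> is a
hypothetical field added only to show the 'first' value.
</para>
<screen><![CDATA[
<!-- keep only the longest title found in the cluster -->
<metadata name="title" merge="longest"/>
<!-- keep every distinct ISBN -->
<metadata name="isbn" merge="unique"/>
<!-- compute a range of years across all matching records -->
<metadata name="date" type="year" merge="range"/>
<!-- hypothetical field: take values only from the first database
     that returns it (Pazpar2 1.6.24 or later) -->
<metadata name="description" merge="first"/>
]]></screen>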
334 <term>mergekey</term>
337 If set to '<literal>required</literal>', the value of this
338 metadata element is appended to the resulting mergekey if
339 the metadata is present in a record instance.
If the metadata element is not present, a unique mergekey
341 will be generated instead.
344 If set to '<literal>optional</literal>', the value of this
metadata element is appended to the resulting mergekey if the
metadata is present in a record instance. If the metadata
347 is not present, it will be empty.
350 If set to '<literal>no</literal>' or the mergekey attribute is
351 omitted, the metadata will not be used in the creation of a
358 <term id="facetrule">facetrule</term>
361 Specifies the ICU rule set to be used for normalizing
362 facets. If facetrule is omitted from metadata, the
363 rule set 'facet' is used.
369 <term id="limitcluster">limitcluster</term>
372 Allow a limit on merged metadata. The value of this attribute
373 is the name of actual metadata content to be used for matching
374 (most often same name as metadata name).
378 Requires Pazpar2 1.6.23 or later.
385 <term id="metadata_limitmap">limitmap</term>
Specifies a default limitmap for this field. This avoids mass
configuring of targets. However, it is important to review or set
this on a per-target basis, since it is usually target-specific.
391 See limitmap for format.
397 <term id="metadata_facetmap">facetmap</term>
Specifies a default facetmap for this field. This avoids mass
configuring of targets. However, it is important to review or set
this on a per-target basis, since it is usually target-specific.
403 See facetmap for format.
412 This attribute allows you to make use of static database
413 settings in the processing of records. Three possible values
414 are allowed. 'no' is the default and doesn't do anything.
415 'postproc' copies the value of a setting with the same name
416 into the output of the normalization stylesheet(s). 'parameter'
417 makes the value of a setting with the same name available
418 as a parameter to the normalization stylesheet, so you
419 can further process the value inside of the stylesheet, or use
420 the value to decide how to deal with other data values.
423 The purpose of using settings in this way can either be to
424 control the behavior of normalization stylesheet in a database-
425 dependent way, or to easily make database-dependent values
426 available to display-logic in your user interface, without having
427 to implement complicated interactions between the user interface
428 and your configuration system.
433 </variablelist> <!-- attributes to metadata -->
439 <term id="servicexslt" xreflabel="xslt">xslt</term>
Defines an XSLT stylesheet. The <literal>xslt</literal>
443 element takes exactly one attribute <literal>id</literal>
444 which names the stylesheet. This can be referred to in target
445 settings <xref linkend="pzxslt"/>.
448 The content of the xslt element is the embedded stylesheet XML
453 <term id="icuchain" xreflabel="icu_chain">icu_chain</term>
456 Specifies a named ICU rule set. The icu_chain element must include
457 attribute 'id' which specifies the identifier (name) for the ICU
459 Pazpar2 uses the particular rule sets for particular purposes.
460 Rule set 'relevance' is used to normalize
461 terms for relevance ranking. Rule set 'sort' is used to
462 normalize terms for sorting. Rule set 'mergekey' is used to
normalize terms for making a mergekey. Finally, rule set 'facet'
is normally used to normalize facet terms, unless
<link linkend="facetrule">facetrule</link> is given for a
469 The icu_chain element must also include a 'locale'
470 attribute which must be set to one of the locale strings
471 defined in ICU. The child elements listed below can be
472 in any order, except the 'index' element which logically
473 belongs to the end of the list. The stated tokenization,
474 transformation and charmapping instructions are performed
475 in order from top to bottom.
477 <variablelist> <!-- Level 2 -->
482 The attribute 'rule' defines the direction of the
483 per-character casemapping, allowed values are "l"
484 (lower), "u" (upper), "t" (title).
489 <term>transform</term>
492 Normalization and transformation of tokens follows
493 the rules defined in the 'rule' attribute. For
494 possible values we refer to the extensive ICU
495 documentation found at the
496 <ulink url="&url.icu.transform;">ICU
497 transformation</ulink> home page. Set filtering
498 principles are explained at the
499 <ulink url="&url.icu.unicode.set;">ICU set and
500 filtering</ulink> page.
505 <term>tokenize</term>
508 Tokenization is the only rule in the ICU chain
509 which splits one token into multiple tokens. The
510 'rule' attribute may have the following values:
511 "s" (sentence), "l" (line-break), "w" (word), and
"c" (character), the latter probably not being
513 very useful in a pruning Pazpar2 installation.
519 From Pazpar2 version 1.1 the ICU wrapper from YAZ is used.
520 Refer to the <ulink url="&url.yaz.yaz-icu;">yaz-icu</ulink>
521 utility for more information.
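<para>
Putting the pieces together, a complete chain might look like the sketch
below. The transform rules are the ones used in the example configuration in
this page. The element name <literal>casemap</literal> for the casemapping
step is assumed here, and the choice of locale and rule values is
illustrative.
</para>
<screen><![CDATA[
<icu_chain id="relevance" locale="en">
  <!-- steps are applied in order, from top to bottom -->
  <transform rule="[:Control:] Any-Remove"/>
  <!-- split the input into tokens at line-break positions -->
  <tokenize rule="l"/>
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <!-- lower-case each character (element name assumed) -->
  <casemap rule="l"/>
</icu_chain>
]]></screen>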
527 <term>relevance</term>
530 Specifies the ICU rule set used for relevance ranking.
531 The child element of 'relevance' must be 'icu_chain' and the
532 'id' attribute of the icu_chain is ignored. This
533 definition is obsolete and should be replaced by the equivalent
<icu_chain id="relevance" locale="en">..</icu_chain>
546 Specifies the ICU rule set used for sorting.
547 The child element of 'sort' must be 'icu_chain' and the
548 'id' attribute of the icu_chain is ignored. This
549 definition is obsolete and should be replaced by the equivalent
<icu_chain id="sort" locale="en">..</icu_chain>
559 <term>mergekey</term>
562 Specifies ICU tokenization and transformation rules
563 for tokens that are used in Pazpar2's mergekey.
564 The child element of 'mergekey' must be 'icu_chain' and the
565 'id' attribute of the icu_chain is ignored. This
566 definition is obsolete and should be replaced by the equivalent
<icu_chain id="mergekey" locale="en">..</icu_chain>
579 Specifies ICU tokenization and transformation rules
580 for tokens that are used in Pazpar2's facets.
581 The child element of 'facet' must be 'icu_chain' and the
582 'id' attribute of the icu_chain is ignored. This
583 definition is obsolete and should be replaced by the equivalent
<icu_chain id="facet" locale="en">..</icu_chain>
593 <term>ccldirective</term>
596 Customizes the CCL parsing (interpretation of query parameter
The name and value of the CCL directive are given by the attributes
'name' and 'value' respectively. Refer to the list of possible names
602 url="http://www.indexdata.com/yaz/doc/tools.html#ccl.directives.table">
613 Customizes the ranking (relevance) algorithm. Also known as
614 rank tweaks. The rank element
615 accepts the following attributes - all being optional:
622 Attribute 'cluster' is a boolean
623 that controls whether Pazpar2 should boost ranking for merged
records. The default is 'yes'. A value of 'no' will make
Pazpar2 average the ranking of each record in a cluster.
633 Attribute 'debug' is a boolean
634 that controls whether Pazpar2 should include details
635 about ranking for each document in the show command's
636 response. Enable by using value "yes", disable by using
637 value "no" (default).
Attribute 'follow' is a floating point number greater than
or equal to 0. A positive number will boost weight for terms
that occur close to each other (proximity, distance).
A value of 1 will double the weight if two terms are in
649 proximity distance of 1 (next to each other). The default
650 value of 'follow' is 0 (order will not affect weight).
658 Attribute 'lead' is a floating point number.
It controls whether term weight should be reduced by position
from the start of a metadata field. A positive value of 'lead'
will reduce the weight of a term the further it appears from the lead
of the field. Default value is 0 (no reduction of weight by
671 Attribute 'length' determines how/if term weight should be
divided by the length of the metadata field. A value of "linear"
divides by length. A value of "log" divides by log2(length).
674 A value of "none" will leave term weight as is (no division).
675 Default value is "linear".
681 Refer to <xref linkend="relevance_ranking"/> to see how
682 these tweaks are used in computation of score.
685 Customization of ranking algorithm was introduced with
686 Pazpar2 1.6.18. The semantics of some of the fields changed
687 in versions up to 1.6.22.
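<para>
As an illustration of the attributes described above, a rank element with
non-default tweaks could be written as follows. The particular values are
chosen only for this example and are not a recommendation.
</para>
<screen><![CDATA[
<!-- average scores within a cluster, include ranking details in the
     show response, boost adjacent terms, and divide term weight by
     log2 of the field length -->
<rank cluster="no" debug="yes" follow="1" lead="0" length="log"/>
]]></screen>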
692 <varlistentry id="sort-default">
693 <term>sort-default</term>
696 Specifies the default sort criteria (default 'relevance'),
which previously was hard-coded as the default criteria in search.
This is a fix/work-around to avoid re-searching when using
target-based sorting. In order for this to work efficiently,
the search must also have the sort criteria parameter; otherwise
Pazpar2 will re-search if the sort criteria
change between the search and show commands.
This configuration was added in Pazpar2 1.6.20.
715 Specifies a variable that will be inherited by all targets defined in settings
<set name="test" value="en"/>
724 <term>settings</term>
727 Specifies target settings for this service. Refer to
728 <xref linkend="target_settings"/>.
737 Specifies timeout parameters for this service.
738 The <literal>timeout</literal>
739 element supports the following attributes:
740 <literal>session</literal>, <literal>z3950_operation</literal>,
<literal>z3950_session</literal>, which specify the
'session timeout', 'Z39.50 operation timeout', and
'Z39.50 session timeout' respectively. The Z39.50 operation
timeout is the time Pazpar2 will wait for an active Z39.50/SRU
operation before it gives up (times out). The Z39.50 session
timeout is the time Pazpar2 will keep the session alive for
an idle session (no operation).
750 The following is recommended but not required:
z3950_operation (30) < session (60) < z3950_session (180).
The default values are given in parentheses.
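<para>
Written out in full, a timeout element that simply restates the default
values given above would look like this sketch:
</para>
<screen><![CDATA[
<!-- defaults: z3950_operation (30) < session (60) < z3950_session (180) -->
<timeout session="60" z3950_operation="30" z3950_session="180"/>
]]></screen>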
756 </variablelist> <!-- Data elements in service directive -->
759 </variablelist> <!-- Data elements in server directive -->
764 <title>EXAMPLE</title>
766 Below is a working example configuration:
770 <?xml version="1.0" encoding="UTF-8"?>
771 <pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
773 <threads number="10"/>
775 <listen port="9004"/>
777 <metadata name="title" brief="yes" sortkey="skiparticle"
778 merge="longest" rank="6"/>
779 <metadata name="isbn" merge="unique"/>
780 <metadata name="date" brief="yes" sortkey="numeric"
781 type="year" merge="range" termlist="yes"/>
782 <metadata name="author" brief="yes" termlist="yes"
783 merge="longest" rank="2"/>
784 <metadata name="subject" merge="unique" termlist="yes" rank="3" limitmap="local:"/>
785 <metadata name="url" merge="unique"/>
786 <icu_chain id="relevance" locale="el">
787 <transform rule="[:Control:] Any-Remove"/>
789 <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
792 <settings src="mysettings"/>
793 <timeout session="60"/>
801 <refsect1 id="config-include">
802 <title>INCLUDE FACILITY</title>
804 The XML configuration may be partitioned into multiple files by using
805 the <literal>include</literal> element which takes a single attribute,
<literal>src</literal>. The <literal>src</literal> attribute is a
regular shell-like glob pattern. For example,
809 <include src="/etc/pazpar2/conf.d/*.xml"/>
813 The include facility requires Pazpar2 version 1.2.
817 <refsect1 id="target_settings">
818 <title>TARGET SETTINGS</title>
820 Pazpar2 features a cunning scheme by which you can associate various
821 kinds of attributes, or settings with search targets. This can be done
822 through XML files which are read at startup; each file can associate
823 one or more settings with one or more targets. The file format is generic
824 in nature, designed to support a wide range of application requirements.
The settings can be purely technical things, like how to perform a title
search against a given target, or they can associate arbitrary name=value
827 pairs with groups of targets -- for instance, if you would like to
828 place all commercial full-text bases in one group for selection
829 purposes, or you would like to control what targets are accessible
830 to users by default. Per-database settings values can even be used
831 to drive sorting, facet/termlist generation, or end-user interface display
836 During startup, Pazpar2 will recursively read a specified directory
837 (can be identified in the pazpar2.cfg file or on the command line), and
838 process any settings files found therein.
842 Clients of the Pazpar2 webservice interface can selectively override
843 settings for individual targets within the scope of one session. This
844 can be used in conjunction with an external authentication system to
845 determine which resources are to be accessible to which users. Pazpar2
846 itself has no notion of end-users, and so can be used in conjunction
with any type of authentication system. Likewise, the authentication
tokens submitted to access-controlled search targets can be
849 overridden, to allow use of Pazpar2 in a consortial or multi-library
850 environment, where different end-users may need to be represented to
851 some search targets in different ways. This, again, can be managed
852 using an external database or other lookup mechanism. Setting overrides
853 can be performed either using the
854 <link linkend="command-init">init</link> or the
855 <link linkend="command-settings">settings</link> webservice
860 In fact, every setting that applies to a database (except pz:id, which
861 can only be used for filtering targets to use for a search) can be overridden
862 on a per-session basis.
863 This allows the client to override specific CCL fields for
864 searching, etc., to meet the needs of a session or user.
868 Finally, as an extreme case of this, the webservice client can
869 introduce entirely new targets, on the fly, as part of the
870 <link linkend="command-init">init</link> or
871 <link linkend="command-settings">settings</link> command.
872 This is useful if you desire to manage information
873 about your search targets in a separate application such as a database.
874 You do not need any static settings file whatsoever to run Pazpar2 -- as
875 long as the webservice client is prepared to supply the necessary
876 information at the beginning of every session.
The following discussion of practical issues related to session
and settings management is cast in terms of a user interface based on
883 Ajax/Javascript technology. It would apply equally well to many other
884 kinds of browser-based logic.
889 Typically, a Javascript client is not allowed to directly alter the
890 parameters of a session. There are two reasons for this. One has to do
891 with access to information; typically, information about a user will
892 be stored in a system on the server side, or it will be accessible in
893 some way from the server. However, since the Javascript client cannot
894 be entirely trusted (some hostile agent might in fact 'pretend' to be
895 a regular ws client), it is more robust to control session settings
896 from scripting that you run as part of your webserver. Typically, this
897 can be handled during the session initialization, as follows:
901 Step 1: The Javascript client loads, and asks the webserver for a
902 new Pazpar2 session ID. This can be done using a Javascript call, for
instance. Note that it is possible to submit Ajax XMLHttpRequest calls
904 either to Pazpar2 or to the webserver that Pazpar2 is proxying
905 for. See (XXX Insert link to Pazpar2 protocol).
909 Step 2: Code on the webserver authenticates the user, by database lookup,
910 LDAP access, NCIP, etc. Determines which resources the user has access to,
911 and any user-specific parameters that are to be applied during this session.
Step 3: The webserver initializes a new Pazpar2 session, and sets
916 user-specific parameters as necessary, using the init webservice
917 command. A new session ID is returned.
921 Step 4: The webserver returns this session ID to the Javascript
922 client, which then uses the session ID to submit searches, show
927 Step 5: When the Javascript client ceases to use the session,
928 Pazpar2 destroys any session-specific information.
932 <title>SETTINGS FILE FORMAT</title>
934 Each file contains a root element named <settings>. It may
935 contain one or more <set> elements. The settings and set
elements may contain the following attributes. Attributes in the set
node override those in the settings root element. Each set node must
938 specify (directly, or inherited from the parent node) at least a
939 target, name, and value.
947 This specifies the search target to which this setting should be
948 applied. Targets are identified by their Z39.50 URL, generally
including the host, port, and database name (e.g.
950 <literal>bagel.indexdata.com:210/marc</literal>).
951 Two wildcard forms are accepted:
952 * (asterisk) matches all known targets;
953 <literal>bagel.indexdata.com:210/*</literal> matches all
954 known databases on the given host.
957 A precedence system determines what happens if there are
958 overlapping values for the same setting name for the same
959 target. A setting for a specific target name overrides a
960 setting which specifies target using a wildcard. This makes it
961 easy to set defaults for all targets, and then override them
962 for specific targets or hosts. If there are
963 multiple overlapping settings with the same name and target
964 value, the 'precedence' attribute determines what happens.
967 For Pazpar2 1.6.4 or later, the target ID may be user-defined, in
968 which case, the actual host, port, etc is given by setting
969 <xref linkend="pzurl"/>.
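<para>
The wildcard rule described above can be sketched as follows. The setting
name is inherited from the root element, the wildcard entry supplies a
default, and the entry naming one specific target overrides it. The value
<literal>B</literal> is an illustrative choice for this example.
</para>
<screen><![CDATA[
<settings name="pz:elements">
  <!-- default for every known target -->
  <set target="*" value="F"/>
  <!-- a specific target name overrides the wildcard default -->
  <set target="bagel.indexdata.com:210/marc" value="B"/>
</settings>
]]></screen>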
977 The name of the setting. This can be anything you like.
978 However, Pazpar2 reserves a number of setting names for
979 specific purposes, all starting with 'pz:', and it is a good
980 idea to avoid that prefix if you make up your own setting
981 names. See below for a list of reserved variables.
989 The value of the setting. Generally, this can be anything you
990 want -- however, some of the reserved settings may expect
991 specific kinds of values.
996 <term>precedence</term>
999 This should be an integer. If not provided, the default value
1000 is 0. If two (or more) settings have the same content for
1001 target and name, the precedence value determines the outcome.
1002 If both settings have the same precedence value, they are both
1003 applied to the target(s). If one has a higher value, then the
1004 value of that setting is applied, and the other one is ignored.
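<para>
A small sketch of the precedence rule: both entries below have the same
target and name, so, going by the description above, the entry with the
higher precedence value should be applied and the other ignored. The values
are illustrative.
</para>
<screen><![CDATA[
<settings target="*" name="pz:requestsyntax">
  <!-- precedence defaults to 0 -->
  <set value="marc21"/>
  <!-- same target and name, higher precedence: this value is applied -->
  <set value="xml" precedence="1"/>
</settings>
]]></screen>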
1011 By setting defaults for target, name, or value in the root
1012 settings node, you can use the settings files in many different
1013 ways. For instance, you can use a single file to set defaults for
1014 many different settings, like search fields, retrieval syntaxes,
1015 etc. You can have one file per server, which groups settings for
1016 that server or target. You could also have one file which associates
1017 a number of targets with a given setting, for instance, to associate
1018 many databases with a given category or class that makes sense
1019 within your application.
1023 The following examples illustrate uses of the settings system to
1024 associate settings with targets to meet different requirements.
1028 The example below associates a set of default values that can be
1029 used across many targets. Note the wildcard for targets.
1030 This associates the given settings with all targets for which no
1031 other information is provided.
1033 <settings target="*">
1035 <!-- This file introduces default settings for pazpar2 -->
1037 <!-- mapping for unqualified search -->
1038 <set name="pz:cclmap:term" value="u=1016 t=l,r s=al"/>
1040 <!-- field-specific mappings -->
1041 <set name="pz:cclmap:ti" value="u=4 s=al"/>
1042 <set name="pz:cclmap:su" value="u=21 s=al"/>
1043 <set name="pz:cclmap:isbn" value="u=7"/>
1044 <set name="pz:cclmap:issn" value="u=8"/>
1045 <set name="pz:cclmap:date" value="u=30 r=r"/>
1047 <set name="pz:limitmap:title" value="rpn:@attr 1=4 @attr 6=3"/>
1048 <set name="pz:limitmap:date" value="ccl:date"/>
1050 <!-- Retrieval settings -->
1052 <set name="pz:requestsyntax" value="marc21"/>
1053 <set name="pz:elements" value="F"/>
1055 <!-- Query encoding -->
1056 <set name="pz:queryencoding" value="iso-8859-1"/>
1058 <!-- Result normalization settings -->
1060 <set name="pz:nativesyntax" value="iso2709"/>
1061 <set name="pz:xslt" value="../etc/marc21.xsl"/>
1069 The next example shows certain settings overridden for one target,
1070 one which returns XML records containing DublinCore elements, and
1071 which furthermore requires a username/password.
1073 <settings target="funkytarget.com:210/db1">
1074 <set name="pz:requestsyntax" value="xml"/>
1075 <set name="pz:nativesyntax" value="xml"/>
1076 <set name="pz:xslt" value="../etc/dublincore.xsl"/>
1078 <set name="pz:authentication" value="myuser/password"/>
1084 The following example associates a specific name/value combination
1085 with a number of targets. The targets below are access-restricted,
1086 and can only be used by users with special credentials.
1088 <settings name="pz:allow" value="0">
1089 <set target="funkytarget.com:210/*"/>
1090 <set target="commercial.com:2100/expensiveDb"/>
1098 <title>RESERVED SETTING NAMES</title>
1100 The following setting names are reserved by Pazpar2 to control the
1101 behavior of the client function.
1106 <term>pz:cclmap:xxx</term>
1109 This establishes a CCL field definition or other setting, for
1110 the purpose of mapping end-user queries. XXX is the field or
1111 setting name, and the value of the setting provides parameters
1112 (e.g. parameters to send to the server, etc.). Please consult
1113 the YAZ manual for a full overview of the many capabilities of
1114 the powerful and flexible CCL parser.
1117 Note that it is easy to establish a set of default parameters,
1118 and then override them individually for a given target.
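<para>
For example, a default title mapping can be set for all targets and then
replaced for a single target, as in the sketch below. The target name is
the one used in the settings examples in this page; the overriding value
<literal>u=4 s=pw</literal> is an assumption made for illustration.
</para>
<screen><![CDATA[
<settings name="pz:cclmap:ti">
  <!-- default title mapping for all targets -->
  <set target="*" value="u=4 s=al"/>
  <!-- override for one target only -->
  <set target="funkytarget.com:210/db1" value="u=4 s=pw"/>
</settings>
]]></screen>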
1122 <varlistentry id="requestsyntax">
1123 <term>pz:requestsyntax</term>
1126 This specifies the record syntax to use when requesting
1127 records from a given server. The value can be a symbolic name like
1128 marc21 or xml, or it can be a Z39.50-style dot-separated OID.
1133 <term>pz:elements</term>
1136 The element set name to be used when retrieving records from a
1142 <term>pz:piggyback</term>
Piggybacking enables Pazpar2 to retrieve records from the
server as part of the search response in Z39.50. Almost all
1147 servers support this (or fail it gracefully), but a few
1148 servers will produce undesirable results.
1149 Set to '1' to enable piggybacking, '0' to disable it. Default
1150 is 1 (piggybacking enabled).
1155 <term>pz:nativesyntax</term>
1158 Specifies how Pazpar2 should map retrieved records to XML. Currently
1159 supported values are <literal>xml</literal>,
1160 <literal>iso2709</literal> and <literal>txml</literal>.
1163 The value <literal>iso2709</literal> makes Pazpar2 convert retrieved
1164 MARC records to MARCXML. In order to convert to XML, the exact
1165 character set of the MARC records must be known (if not, the resulting
1166 XML is probably not well-formed). The character set may be
1167 specified by adding:
1168 <literal>;</literal><replaceable>charset</replaceable> to
1169 <literal>iso2709</literal>. If omitted, a charset of
1170 MARC-8 is assumed. This is correct for most MARC21/USMARC records.
1173 The value <literal>txml</literal> is like <literal>iso2709</literal>
1174 except that records are converted to TurboMARC instead of MARCXML.
1177 The value <literal>xml</literal> is used if Pazpar2 retrieves
1178 records that are already XML (no conversion takes place).
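A minimal sketch of the charset suffix described above, assuming a hypothetical target that returns MARC records encoded in ISO-8859-1 rather than MARC-8. The target name is made up, and the charset name is assumed to be one the underlying character-set conversion recognizes.

```xml
<settings target="z3950.example.org:210/marcdb">
  <set name="pz:requestsyntax" value="marc21"/>
  <!-- convert ISO 2709 MARC to MARCXML, decoding from ISO-8859-1
       instead of the MARC-8 default -->
  <set name="pz:nativesyntax" value="iso2709;ISO-8859-1"/>
</settings>
```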
1184 <term>pz:queryencoding</term>
1187 The encoding of the search terms that a target accepts. Most
1188 targets do not honor UTF-8, in which case this needs to be specified.
1189 Each term in a query will be converted if this setting is given.
1195 <term>pz:negotiation_charset</term>
1198 Sets the character set for Z39.50 negotiation. Most targets do not support
1199 this, and some will even close the connection if it is set (a crash on the
1200 server side or similar). If set, you probably want to set it to
1201 <literal>UTF-8</literal>.
1207 <term id="pzxslt" xreflabel="pz:xslt">pz:xslt</term>
1210 A comma-separated list of stylesheet names that specifies
1211 how to convert incoming records to the internal representation.
1214 For each name, the embedded stylesheets (XSL) that come with the
1215 service definition are consulted first and take precedence over
1216 external files (see <xref linkend="servicexslt"/>
1217 of service definition).
1218 If the name does not match an embedded stylesheet it is
1219 considered a filename.
1222 The suffix of each file specifies the kind of transformation.
1223 Suffix "<literal>.xsl</literal>" makes an XSL transform. Suffix
1224 "<literal>.mmap</literal>" will use the MMAP transform (described below).
1227 The special value "<literal>auto</literal>" will use a file
1228 which is the <link linkend="requestsyntax">pz:requestsyntax's</link>
1230 <literal>'.xsl'</literal>.
1233 When mapping MARC records, XSLT can be bypassed for increased
1234 performance with the alternate "MARC map" format. Provide the
1235 path of a file with extension ".mmap" containing on each line:
1237 &lt;field&gt; &lt;subfield&gt; &lt;metadata element&gt;</programlisting>
1244 To map the field value specify a subfield of '$'. To store a
1245 concatenation of all subfields, specify a subfield of '*'.
1250 <term>pz:authentication</term>
1253 Sets an authentication string for a given database. For Z39.50,
1254 this is carried as part of the Initialize Request. In order to carry
1255 the information in the "open" elements, separate
1256 username and password with a slash (In Z39.50 it is a VisibleString).
1257 In order to carry the information in the idPass elements, separate
1258 username term, password term and, optionally, a group term with a
1260 If three terms are given, the order is
1261 <emphasis>user, group, password</emphasis>.
1262 If only two terms are given, the order is
1263 <emphasis>user, password</emphasis>.
1266 For HTTP based protocols, such as SRU and Solr, the authentication
1267 string includes a username term and, optionally, a password term.
1268 Each term is separated by a single blank. The
1269 authentication information is passed either by HTTP basic
1270 authentication or via URL parameters. The mode of operation is
1271 determined by the <literal>pz:authentication_mode</literal> setting.
1277 <term>pz:authentication_mode</term>
1280 Determines how authentication is carried in HTTP based protocols.
1281 Value may be "<literal>basic</literal>" or "<literal>url</literal>".
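The following sketch shows the two authentication forms side by side: the slash-separated form for a Z39.50 target, and the blank-separated form for an HTTP-based target. Both target names and the credentials are placeholders, not real endpoints.

```xml
<!-- Z39.50 target: "open" authentication, user and password
     separated by a slash -->
<settings target="z3950.example.org:210/db">
  <set name="pz:authentication" value="myuser/password"/>
</settings>

<!-- SRU target: user and password separated by a single blank,
     sent via HTTP basic authentication -->
<settings target="sru.example.org/db">
  <set name="pz:sru" value="get"/>
  <set name="pz:authentication" value="myuser password"/>
  <set name="pz:authentication_mode" value="basic"/>
</settings>
```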
1286 <term>pz:allow</term>
1289 Allows or denies access to the resources it is applied to. Possible
1290 values are '0' and '1'.
1291 The default is '1' (allow access to this resource).
1296 <term>pz:maxrecs</term>
1299 Controls the maximum number of records to be retrieved from a
1300 server. The default is 100.
1305 <term>pz:extendrecs</term>
1308 If a show command reaches the boundary of a result set for a
1309 database (which depends on sorting) and pz:extendrecs is set to a positive
1310 value, then Pazpar2 waits for show to fetch pz:extendrecs more
1311 records. This setting is best used if a database does native
1312 sorting, because the result set otherwise may be completely
1313 re-sorted during extended fetch.
1314 The default value of pz:extendrecs is 0 (no extended fetch).
1318 The pz:extendrecs setting appeared in Pazpar2 version 1.6.26,
1319 but the behavior changed with the release of Pazpar2 1.6.29.
1325 <term>pz:presentchunk</term>
1328 Controls the chunk size in present requests. Pazpar2 will
1329 make (maxrecs / chunk) request(s). The default is 20.
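As a worked illustration of how the two settings interact, the values below are arbitrary: with a limit of 200 records and a chunk size of 50, Pazpar2 would issue 200 / 50 = 4 present requests per target.

```xml
<settings target="*">
  <!-- fetch at most 200 records per target ... -->
  <set name="pz:maxrecs" value="200"/>
  <!-- ... in present requests of 50 records each (200 / 50 = 4 requests) -->
  <set name="pz:presentchunk" value="50"/>
</settings>
```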
1337 This setting can't be 'set' -- it contains the ID (normally
1338 ZURL) for a given target, and is useful for filtering --
1339 specifically when you want to select one or more specific
1340 targets in the search command.
1345 <term>pz:zproxy</term>
1348 The 'pz:zproxy' setting has the value syntax
1349 'host.internet.address:port'; it is used to tunnel Z39.50
1350 requests through the named Z39.50 proxy.
1356 <term>pz:apdulog</term>
1359 If the 'pz:apdulog' setting is defined and has a value other than 0,
1360 then Z39.50 APDUs are written to the log.
1369 This setting enables
1370 <ulink url="&url.sru;">SRU</ulink>/<ulink url="&url.solr;">Solr</ulink>
1372 It has four possible settings.
1373 'get' enables SRU access through GET requests. 'post' enables SRU/POST
1374 support, which is less commonly supported but useful if very large requests are
1375 to be submitted. 'soap' enables the SRW (SRU over SOAP) variation of
1379 A value of 'solr' enables Solr client support. This is supported
1380 for Pazpar2 version 1.5.0 and later.
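A brief sketch of how this setting might be applied, assuming one SRU target reached via GET and one Solr target. Both target identifiers are hypothetical.

```xml
<!-- SRU over HTTP GET -->
<settings target="sru.example.org/db">
  <set name="pz:sru" value="get"/>
</settings>

<!-- Solr client support (Pazpar2 1.5.0 and later) -->
<settings target="solr.example.org:8983/solr/select">
  <set name="pz:sru" value="solr"/>
</settings>
```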
1386 <term>pz:sru_version</term>
1389 This allows the SRU version to be specified. If unset, Pazpar2
1390 will use the default of YAZ (currently 1.2). Should be set
1391 to 1.1 or 1.2. For Solr, the current supported/tested version
1398 <term>pz:pqf_prefix</term>
1401 Allows you to specify an arbitrary PQF query language substring.
1402 The provided string is prefixed to the user's query after it has been
1403 normalized to PQF internally in Pazpar2.
1404 This allows you to attach complex 'filters' to queries for a given
1405 target, sometimes necessary to select sub-catalogs
1406 in union catalog systems, etc.
1412 <term>pz:pqf_strftime</term>
1415 Allows you to extend a query with dates and operators.
1416 The provided string allows certain substitutions and serves as a
1418 The special two character sequence '%%' gets converted to the
1419 original query. Other sequences beginning with a percent sign are
1420 conversions supported by strftime.
1421 All other characters are copied verbatim. For example, the string
1422 <literal>@and @attr 1=30 @attr 2=3 %Y %%</literal>
1423 would search for current year combined with the original PQF (%%).
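Expressed as a setting, the example string from the text could be configured as sketched below. The target name is a placeholder.

```xml
<settings target="z3950.example.org:210/db">
  <!-- %Y expands to the current year, %% to the original PQF query -->
  <set name="pz:pqf_strftime" value="@and @attr 1=30 @attr 2=3 %Y %%"/>
</settings>
```

Under these assumptions, a query normalized to <literal>@attr 1=4 water</literal> and submitted during 2024 would be expected to reach the target as <literal>@and @attr 1=30 @attr 2=3 2024 @attr 1=4 water</literal>.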
1426 This setting can also be used as a more general alternative to
1427 pz:pqf_prefix -- a way of embedding the submitted query
1428 anywhere in the string rather than appending it to a prefix. For
1429 example, if it is desired to omit all records satisfying the
1430 query <literal>@attr 1=pica.bib 0007</literal> then this
1431 subquery can be combined with the submitted query as the second
1432 argument of <literal>@not</literal> by using the
1433 pz:pqf_strftime value <literal>@not %% @attr 1=pica.bib
1440 <term>pz:sort</term>
1443 Specifies sort criteria to be applied to the result set.
1444 Only works for targets which support the sort service.
1450 <term>pz:recordfilter</term>
1453 Specifies a filter which allows Pazpar2 to only include
1454 records that meet certain criteria in a result.
1455 Unmatched records will be ignored.
1456 The filter takes the form name, name~value, or name=value, which
1457 will include only records with metadata element (name) that has the
1458 substring (~value) given, or matches exactly (=value).
1459 If value is omitted all records with the named metadata element
1460 present will be included.
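For instance, a substring filter might be configured as in the sketch below. The target name and the metadata element <literal>title</literal> are assumptions for illustration; the element must be one defined in the service's metadata.

```xml
<settings target="z3950.example.org:210/db">
  <!-- keep only records whose "title" metadata element contains "history" -->
  <set name="pz:recordfilter" value="title~history"/>
</settings>
```

Using <literal>title=history</literal> instead would require an exact match, and <literal>title</literal> alone would keep every record that has a title element.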
1466 <term>pz:preferred</term>
1469 Specifies that a target is preferred, e.g. a possibly local, faster
1470 target. Using block=pref on the show command will wait for all these
1471 targets to return records before releasing the block.
1472 If no target is preferred, block=pref is identical to block=1,
1473 which releases the block when one target has returned records.
1478 <term>pz:block_timeout</term>
1481 (Not yet implemented).
1482 Specifies the time after which a block should be released anyway.
1487 <term>pz:termlist_term_count</term>
1490 Specifies number of facet terms to be requested from the target.
1491 The default is unspecified, i.e. server-decided. Also see pz:facetmap.
1496 <term>pz:termlist_term_factor</term>
1499 Specifies whether to use a factor for pazpar2 generated facets (1)
1501 When mixing locally generated facets (computed from the downloaded sample of
1502 up to pz:maxrecs records) with native (target-generated) facets, the latter will
1503 dominate the facet list since they are generated
1504 based on the complete result set.
1505 By scaling up the facet count using the ratio between total hit
1506 count and the sample size,
1507 the total facet count can be approximated and thus better compared
1508 with native facets. This is not enabled by default.
1514 <term>pz:facetmap:<replaceable>name</replaceable></term>
1517 Specifies that for field <replaceable>name</replaceable>, the target
1518 supports (native) facets. The value is the name of the
1519 field on the target.
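A hedged sketch of a facet mapping for a Solr target follows. The target identifier and the native field name <literal>author_exact</literal> are assumptions; substitute whatever field the target actually exposes for faceting.

```xml
<settings target="solr.example.org:8983/solr/select">
  <set name="pz:sru" value="solr"/>
  <!-- Pazpar2 field "author" maps to the target's (hypothetical)
       field "author_exact" -->
  <set name="pz:facetmap:author" value="author_exact"/>
  <!-- ask the target for up to 10 terms per facet -->
  <set name="pz:termlist_term_count" value="10"/>
</settings>
```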
1523 At this point only Solr targets have been tested with this
1530 <varlistentry id="limitmap">
1531 <term>pz:limitmap:<replaceable>name</replaceable></term>
1534 Specifies attributes for limiting a search to a field - using
1535 the limit parameter for search. It can be used to filter locally
1536 or remotely (search in a target). In some cases the mapping of
1537 a field to a value is identical to an existing cclmap field; in
1538 other cases the field must be specified in a different way - for
1539 example to match a complete field (rather than parts of a subfield).
1542 The value of limitmap may have one of three forms: a reference to
1543 an existing CCL field, a raw PQF string, or a local limit. The leading string
1544 determines the type: either <literal>ccl:</literal> for a CCL field,
1545 <literal>rpn:</literal> for PQF/RPN, or <literal>local:</literal>
1546 for filtering in Pazpar2. The local filter may be followed
1547 by a metadata field (default is to use the name of the
1551 For Pazpar2 version 1.6.23 and later the limitmap may include multiple
1552 specifications, separated by <literal>,</literal> (comma).
1554 <literal>ccl:title,local:ltitle,rpn:@attr 1=4</literal>.
1558 The limitmap facility is supported for Pazpar2 version 1.6.0 and later.
1559 Local filtering is supported in Pazpar2 1.6.6 and later.
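The three forms can be sketched as follows. The target name, the limit names, and the CCL field <literal>au</literal> (assumed to be defined through a corresponding <literal>pz:cclmap:au</literal> setting) are illustrative only.

```xml
<settings target="z3950.example.org:210/db">
  <!-- reuse an existing CCL field (assumes pz:cclmap:au is defined) -->
  <set name="pz:limitmap:author" value="ccl:au"/>
  <!-- raw PQF: Bib-1 subject heading, matched as a complete field -->
  <set name="pz:limitmap:subject" value="rpn:@attr 1=21 @attr 6=3"/>
  <!-- filter locally in Pazpar2 on the metadata element "date" -->
  <set name="pz:limitmap:date" value="local:date"/>
</settings>
```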
1565 <varlistentry id="pzurl">
1569 Specifies URL for the target and overrides the target ID.
1573 <literal>pz:url</literal> is only recognized for
1574 Pazpar2 1.6.4 and later.
1580 <varlistentry id="pzsortmap">
1581 <term>pz:sortmap:<replaceable>field</replaceable></term>
1584 Specifies native sorting for a target where
1585 <replaceable>field</replaceable> is a sort criterion (see command
1586 show). The value has two components separated by a colon: strategy and
1587 native-field. Strategy is one of <literal>z3950</literal>,
1588 <literal>type7</literal>, <literal>cql</literal>,
1589 <literal>sru11</literal>, or <literal>embed</literal>.
1590 The second component, native-field, is the field that is recognized
1595 Only supported for Pazpar2 1.6.4 and later.
1607 <title>SEE ALSO</title>
1610 <refentrytitle>pazpar2</refentrytitle>
1611 <manvolnum>8</manvolnum>
1614 <refentrytitle>yaz-icu</refentrytitle>
1615 <manvolnum>1</manvolnum>
1618 <refentrytitle>pazpar2_protocol</refentrytitle>
1619 <manvolnum>7</manvolnum>
1624 <!-- Keep this comment at the end of the file
1627 nxml-child-indent: 1