<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN"
"http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd"
<!ENTITY % local SYSTEM "local.ent">
<!ENTITY % entities SYSTEM "entities.ent">
<!ENTITY % common SYSTEM "common/common.ent">
<!-- $Id: pazpar2_conf.xml,v 1.11 2007-03-30 01:07:37 quinn Exp $ -->
<refentry id="pazpar2_conf">
<productname>Pazpar2</productname>
<productnumber>&version;</productnumber>
<refentrytitle>Pazpar2 conf</refentrytitle>
<manvolnum>5</manvolnum>
<refname>pazpar2_conf</refname>
<refpurpose>Pazpar2 Configuration</refpurpose>
<command>pazpar2.conf</command>
<refsect1><title>DESCRIPTION</title>
The pazpar2 configuration file, together with any referenced XSLT files,
governs pazpar2's behavior as a client, and controls the normalization and
extraction of data elements from incoming result records, for the
purposes of merging, sorting, facet analysis, and display.

The file is specified using the option -f on the pazpar2 command line.
There is currently no way to reload the configuration file without
restarting pazpar2, although this will most likely be added in the
future.
<refsect1><title>FORMAT</title>
The configuration file is XML-structured. It must be valid XML. All
elements specific to pazpar2 should belong to the namespace
"http://www.indexdata.com/pazpar2/1.0" (this is assumed in the
following examples). The root element is named 'pazpar2'. Under the
root element are a number of elements which group categories of
information. The categories are described below.
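<para>
As a rough sketch of the layout just described, a skeleton
configuration might look as follows. The grouping elements are the
categories documented in the following sections; the comments are
placeholders only, and the exact nesting is an assumption based on
those sections.
</para>
<screen><![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
  <server>
    <!-- listen, proxy, zproxy, and service go here -->
  </server>
  <queryprofile/>
  <retrievalprofile>
    <!-- requestsyntax, nativesyntax, ... -->
  </retrievalprofile>
</pazpar2>
]]></screen>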
<refsect2 id="config-server"><title>server</title>
This section governs overall behavior of the client. The data
elements are described below.
<variablelist> <!-- level 1 -->
<term>listen</term>
Configures the webservice; this controls how you can connect
to pazpar2 from your browser or server-side code. The
attributes 'host' and 'port' control the binding of the
server. The 'host' attribute can be used to bind the server to
a secondary IP address of your system, enabling you to run
pazpar2 on port 80 alongside a conventional web server. You
can override this setting on the command line using the option -h.
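<para>
For illustration, a 'listen' element that binds pazpar2 to port 80 on
a secondary address could look like the sketch below. The address
192.0.2.10 is a placeholder from the documentation address range, not
a value taken from any real installation.
</para>
<screen><![CDATA[
<!-- bind the webservice to one specific address, leaving port 80 on
     the primary address free for a conventional web server -->
<listen host="192.0.2.10" port="80"/>
]]></screen>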
<term>proxy</term>
If this item is given, pazpar2 will forward all incoming HTTP
requests that do not contain the filename 'search.pz2' to the
host and port specified using the 'host' and 'port'
attributes. The 'myurl' attribute is required, and should provide
the base URL of the server; generally, this is the HTTP URL for the
host specified in the 'listen' parameter. This functionality is
crucial if you wish to use
pazpar2 in conjunction with browser-based code (JS, Flash,
applets, etc.) which operates in a security sandbox. Such code
can only connect to the same server from which the enclosing
HTML page originated. Pazpar2's proxy functionality enables you
to host all of the main pages (plus images, CSS, etc.) of your
application on a conventional webserver, while efficiently
processing webservice requests for metasearch status, results,
etc.
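<para>
A minimal sketch of such a setup is shown below: pazpar2 answers
requests for 'search.pz2' itself and forwards everything else to a
conventional web server. Both host names and the port numbers are
hypothetical, and 'myurl' is given as a bare host name only because
the example configuration in this document does the same.
</para>
<screen><![CDATA[
<!-- pazpar2 itself listens on port 9004 -->
<listen port="9004"/>
<!-- requests that do not contain 'search.pz2' are forwarded to the
     web server that hosts the HTML pages, images, and CSS -->
<proxy host="www.example.org" port="8080" myurl="search.example.org"/>
]]></screen>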
<term>zproxy</term>
If this item is given, pazpar2 will send all Z39.50
packages through this Z39.50 proxy server.
At least one of the 'host' and 'port' attributes is required.
The 'host' attribute may contain both host name and port
number, separated by a colon ':', or only the host name.
An empty 'host' attribute sets the Z39.50 host address
<term>service</term>
This nested element controls the behavior of pazpar2 with
respect to your data model. In pazpar2, incoming records are
normalized, using XSLT, into an internal representation (see the
<link linkend="config_retrievalprofile">retrievalprofile</link> section).
The 'service' section controls the further processing and
extraction of data from the internal representation, primarily
through the 'metadata' sub-element.
<variablelist> <!-- Level 2 -->
<varlistentry><term>metadata</term>
One of these elements is required for every data element in
the internal representation of the record (see
<xref linkend="data_model"/>). It governs
subsequent processing as it pertains to sorting, relevance
ranking, merging, and display of data elements. It supports
the following attributes:
<variablelist> <!-- level 3 -->
<varlistentry><term>name</term>
This is the name of the data element. It is matched
against the 'type' attribute of the 'metadata' element
in the normalized record. A warning is produced if
metadata elements with an unknown name are found in the
normalized record. This name is also used to represent
data elements in the records returned by the
webservice API, and to name sort lists and browse
facets.
<varlistentry><term>type</term>
The type of data element. This value governs any
normalization or special processing that might take
place on an element. Possible values are 'generic'
(basic string) and 'year' (a range is computed if
multiple years are found in the record). Note: This
list is likely to grow in the future.
<varlistentry><term>brief</term>
If this is set to 'yes', then the data element is
included in brief records in the webservice API. Note
that this only makes sense for metadata elements that
are merged (see below). The default value is 'no'.
<varlistentry><term>sortkey</term>
Specifies that this data element is to be used for
sorting. The possible values are 'numeric' (numeric
value), 'skiparticle' (string; skip common, leading
articles), and 'no' (no sorting). The default value is
<varlistentry><term>rank</term>
Specifies that this element is to be used to help rank
records against the user's query (when ranking is
requested). The value is an integer, used as a
multiplier against the basic TF*IDF score. A value of
1 is the base; higher values give additional weight to
elements of this type. The default is '0', which
excludes this element from the rank calculation.
<varlistentry><term>termlist</term>
Specifies that this element is to be used as a
termlist, or browse facet. Values are tabulated from
incoming records, and a list of the highest-scoring values (with
their associated frequencies) is made available to the
client through the webservice API. The possible values
are 'yes' and 'no' (default).
<varlistentry><term>merge</term>
This governs whether, and how, elements are extracted
from individual records and merged into cluster
records. The possible values are: 'unique' (include
all unique elements), 'longest' (include only the
longest element, by string length), 'range' (calculate a range
of values across all matching records), 'all' (include
all elements), or 'no' (don't merge; this is the
default).
</variablelist> <!-- attributes to metadata -->
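<para>
To make the attribute descriptions above more concrete, the sketch
below pairs two 'metadata' declarations with a normalized record that
they would match. The element names 'publisher' and 'pubyear' are
invented for this example, and the 'pz' prefix and 'record' wrapper
are assumptions about the internal representation; only the 'metadata'
element and its 'type' attribute are taken from the description above.
</para>
<screen><![CDATA[
<!-- In the service section of the configuration file -->
<metadata name="publisher" type="generic" brief="yes"
          termlist="yes" merge="unique"/>
<metadata name="pubyear" type="year" sortkey="numeric" merge="range"/>

<!-- A normalized record whose 'type' values match the names above -->
<pz:record xmlns:pz="http://www.indexdata.com/pazpar2/1.0">
  <pz:metadata type="publisher">Example Press</pz:metadata>
  <pz:metadata type="pubyear">1999</pz:metadata>
</pz:record>
]]></screen>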
</variablelist> <!-- Data elements in service directive -->
</variablelist> <!-- Data elements in server directive -->
<refsect2 id="config-queryprofile"><title>queryprofile</title>
At the moment, this directive is ignored; there is one global
CCL-mapping file which governs the mapping of queries to Z39.50
type-1. This file is located in etc/default.bib. This will change.
<refsect2 id="config_retrievalprofile"><title>retrievalprofile</title>
Note: In the present version, there is a single retrieval
profile. However, in a future release, it will be possible to
associate unique retrieval profiles with different targets, or to
generate retrieval profiles using XSLT from the ZeeRex description of
a target.

The following data elements are recognized for the retrievalprofile
directive:
<varlistentry><term>requestsyntax</term>
This element specifies the request syntax to be used in queries. It only
makes sense for Z39.50-type targets.
<varlistentry><term>nativesyntax</term>
This element specifies the native syntax and encoding of the
result records. The default is XML. The following attributes
are defined:
<varlistentry><term>name</term>
The name of the syntax. Currently recognized values are
'iso2709' (MARC) and 'xml'.
<varlistentry><term>format</term>
The format, or schema, to be expected. Default is
<varlistentry><term>encoding</term>
The encoding of the response record. Typical values for
MARC records are 'marc8' (general MARC-8), 'marc8s'
(MARC-8, but maps to precomposed UTF-8 characters, more
suitable for use in web browsers), and 'latin1'.
<varlistentry><term>mapto</term>
Specifies the flavor of MARCXML to map results to.
Default is 'marcxml'. 'marcxchange' is also possible, and
useful for Danish DANMARC records.
</variablelist> <!-- parameters to nativesyntax directive -->
</variablelist> <!-- sub-elements in retrievalprofile -->
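<para>
As a hedged sketch, a retrieval profile for a target that returns MARC
records encoded in Latin-1 might combine the elements above as
follows. The choice of 'marc21' for the request syntax and format
simply mirrors the example configuration in this document; whether a
particular target accepts it is an assumption.
</para>
<screen><![CDATA[
<retrievalprofile>
  <!-- ask the target for MARC21 records -->
  <requestsyntax>marc21</requestsyntax>
  <!-- records arrive as ISO 2709 in Latin-1 and are mapped to MARCXML -->
  <nativesyntax name="iso2709" format="marc21"
                encoding="latin1" mapto="marcxml"/>
</retrievalprofile>
]]></screen>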
<refsect1><title>EXAMPLE</title>
<para>Below is a working example configuration:</para>
<screen><![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">

  <server>
    <listen port="9004"/>
    <proxy host="us1.indexdata.com" myurl="us1.indexdata.com"/>

    <!-- <zproxy host="localhost" port="9000"/> -->
    <!-- <zproxy host="localhost:9000"/> -->
    <!-- <zproxy port="9000"/> -->

    <service>
      <metadata name="title" brief="yes" sortkey="skiparticle" merge="longest" rank="6"/>
      <metadata name="isbn" merge="unique"/>
      <metadata name="date" brief="yes" sortkey="numeric" type="year" merge="range"/>
      <metadata name="author" brief="yes" termlist="yes" merge="longest" rank="2"/>
      <metadata name="subject" merge="unique" termlist="yes" rank="3"/>
      <metadata name="url" merge="unique"/>
    </service>
  </server>

  <queryprofile/> <!-- Like a CCL profile++ . Can optionally refer to XSLT to
                       convert ZeeRex into queryprofile. Multiple profiles can exist. -->

  <retrievalprofile>
    <requestsyntax>marc21</requestsyntax>
    <nativesyntax name="iso2709" format="marc21" encoding="marc8s" mapto="marcxml"/>
    <map type="xslt" stylesheet="marc21.xsl"/>
  </retrievalprofile>

</pazpar2>
]]></screen>
<refsect1><title>TARGET SETTINGS</title>
Pazpar2 features a cunning scheme by which you can associate various
kinds of attributes, or settings, with search targets. This is done
through XML files; each file can associate one or more settings
with one or more targets. The file format is generic in nature,
designed to support a wide range of application requirements. The
settings can be purely technical things, like how to perform a title
search against a given target, or they can associate arbitrary name=value
pairs with groups of targets; for instance, if you would like to
place all commercial full-text bases in one group for selection
purposes, or you would like to control what targets are accessible to a
given user.
During startup, pazpar2 will recursively read a specified directory
(which can be identified in the pazpar2.cfg file or on the command line),
and process any settings files found therein.
<refsect2><title>SETTINGS FILE FORMAT</title>
Each file contains a root element named &lt;settings&gt;. It may
contain one or more &lt;set&gt; elements. The settings and set
elements may contain the following attributes. Attributes in a set
element override those in the settings root element. Each set node must
specify (directly, or inherited from the parent node) at least a
target, name, and value.
<term>target</term>
This specifies the search target to which this setting should be
applied. Targets are identified by their Z39.50 URL, generally
including the host, port, and database name (e.g.
bagel.indexdata.com:210/marc). Two wildcard forms are accepted:
* (asterisk) matches all known targets;
bagel.indexdata.com:210/* matches all known databases on the given
host.

A precedence system determines what happens if there are
overlapping values for the same setting name for the same
target. A setting for a specific target name overrides a
setting which specifies the target using a wildcard. This makes it
easy to set defaults for all targets, and then override them
for specific targets or hosts. If there are
multiple overlapping settings with the same name and target
value, the 'precedence' attribute determines what happens.
<term>user</term>
This specifies the user ID to which this setting applies. A
given setting may have values for any number of users, or it
may have a 'default' value which is applied when no user is
specified, or when no user-specific value is available.
<term>name</term>
The name of the setting. This can be anything you like.
However, pazpar2 reserves a number of setting names for
specific purposes, all starting with 'pz:', and it is a good
idea to avoid that prefix if you make up your own setting
names. See below for a list of reserved names.
<term>value</term>
The value of the setting. Generally, this can be anything you
want; however, some of the reserved settings may expect
specific kinds of values.
<term>precedence</term>
This should be an integer. If not provided, the default value
is 0. If two (or more) settings have the same content for
target and name, the precedence value determines the outcome.
If both settings have the same precedence value, they are both
applied to the target(s). If one has a higher value, then the
value of that setting is applied, and the other one is ignored.
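<para>
The following sketch shows how the attributes above might interact in
a single settings file. The setting values are illustrative only, and
the absence of a namespace declaration is an assumption, since the
description above does not mention one.
</para>
<screen><![CDATA[
<settings target="*">
  <!-- default for all known targets, inherited from the root element -->
  <set name="pz:syntax" value="marc21"/>

  <!-- a specific target overrides the wildcard default -->
  <set target="bagel.indexdata.com:210/marc" name="pz:syntax" value="xml"/>

  <!-- same target and name: the setting with the higher precedence is
       applied, and the other one is ignored -->
  <set target="bagel.indexdata.com:210/*" name="pz:elements" value="B"/>
  <set target="bagel.indexdata.com:210/*" name="pz:elements" value="F"
       precedence="1"/>
</settings>
]]></screen>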
By setting defaults for user, target, name, or value in the root
settings node, you can use the settings files in many different
ways. For instance, you can use a single file to set defaults for
many different settings, like search fields, retrieval syntaxes,
etc. You can have one file per server, which groups settings for
that server or target. You could also have one file which associates
a number of targets with a given setting, for instance, to associate
many databases with a given category or class that makes sense
within your application.
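<para>
As an example of the last pattern, a file that places several targets
in one application-defined group might look like the sketch below. The
setting name 'category', its value, and the target addresses are all
hypothetical; the name deliberately avoids the reserved 'pz:' prefix.
</para>
<screen><![CDATA[
<!-- name and value are set once in the root element; each set
     element only needs to supply the target -->
<settings name="category" value="fulltext">
  <set target="db1.example.com:210/articles"/>
  <set target="db2.example.com:210/journals"/>
</settings>
]]></screen>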
<refsect2><title>RESERVED SETTING NAMES</title>
The following setting names are reserved by pazpar2 to control the
behavior of the client function.
<term>pz:cclmap:xxx</term>
This establishes a CCL field definition or other setting, for
the purpose of mapping end-user queries. 'xxx' is the field or
setting name, and the value of the setting provides parameters
(e.g. parameters to send to the server, etc.). Please consult
the YAZ manual for a full overview of the many capabilities of
the powerful and flexible CCL parser.

Note that it is easy to establish a set of default parameters,
and then override them individually for a given target.
<term>pz:syntax</term>
This specifies the record syntax to use when requesting
records from a given server.
<term>pz:elements</term>
The element set name to be used when retrieving records from a
given server.
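<para>
A hedged sketch of how the reserved names might be used together is
shown below. The CCL field names 'ti' and 'au' and their values are
assumptions that follow the qualifier syntax described in the YAZ
manual (Bib-1 use attributes 4 and 1003 for title and author); the
element set name 'F' is likewise only an example.
</para>
<screen><![CDATA[
<settings target="*">
  <!-- default CCL field definitions for all targets -->
  <set name="pz:cclmap:ti" value="u=4 s=al"/>
  <set name="pz:cclmap:au" value="u=1003 s=al"/>

  <!-- default record syntax and element set name -->
  <set name="pz:syntax" value="marc21"/>
  <set name="pz:elements" value="F"/>

  <!-- override the title mapping for one specific target -->
  <set target="bagel.indexdata.com:210/marc"
       name="pz:cclmap:ti" value="u=4 s=pw"/>
</settings>
]]></screen>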