X-Git-Url: http://lists.indexdata.com/cgi-bin?a=blobdiff_plain;ds=sidebyside;f=doc%2Fpazpar2_conf.xml;h=1eb6fde446bdeefe65cc77f497f4c4d8c11f1ab9;hb=9bf6cd2cfa9d558b77fef6cc320964d7e7f5fc6e;hp=f81464a8c66ef9e509c5ea8751293833398ab247;hpb=f8892da7570d4e365d36d2de128cc581f5240980;p=pazpar2-moved-to-github.git
diff --git a/doc/pazpar2_conf.xml b/doc/pazpar2_conf.xml
index f81464a..1eb6fde 100644
--- a/doc/pazpar2_conf.xml
+++ b/doc/pazpar2_conf.xml
@@ -1,6 +1,6 @@
-
%local;
@@ -13,10 +13,13 @@
Pazpar2&version;
+ Index Data
+
Pazpar2 conf5
+ File formats and conventions
@@ -48,19 +51,35 @@
FORMAT
- The configuration file is XML-structured. It must be valid XML. All
+ The configuration file is XML-structured. It must be well-formed XML. All
elements specific to Pazpar2 should belong to the namespace
http://www.indexdata.com/pazpar2/1.0
(this is assumed in the
- following examples). The root element is named pazpar2.
+ following examples). The root element is named "pazpar2".
Under the root element are a number of elements which group categories of
information. The categories are described below.
+ threads
+
+ This section is optional and is supported in Pazpar2 version 1.3.1 and
+ later. It is identified by the element "threads", which
+ may include one attribute, "number", which specifies
+ the number of worker threads that the Pazpar2 instance is to use.
+ A value of 0 (zero) disables worker threads (all work is carried out
+ in the main thread).
+
+ server
- This section governs overall behavior of the client. The data
- elements are described below.
+ This section governs overall behavior of a server endpoint. It is identified
+ by the element "server" which takes an optional attribute, "id", which
+ identifies this particular Pazpar2 server. Any string value for "id"
+ may be given.
+
+ The data
+ elements are described below. From Pazpar2 version 1.2 this is
+ a repeatable element.
@@ -100,91 +119,30 @@
-
-
- relevance
-
-
- Specifies ICU tokenization and normalization rules
- for tokens that are used in Pazpar2's relevance ranking. The 'id'
- attribute is currently not used, and the 'locale'
- attribute must be set to one of the locale strings
- defined in ICU. The child elements listed below can be
- in any order, except the 'index' element which logically
- belongs to the end of the list. The stated tokenization,
- normalization and charmapping instructions are performed
- in order from top to bottom.
-
-
- casemap
-
-
- The attribute 'rule' defines the direction of the
- per-character casemapping, allowed values are "l"
- (lower), "u" (upper), "t" (title).
-
-
-
- normalize
-
-
- Normalization and transformation of tokens follows
- the rules defined in the 'rule' attribute. For
- possible values we refer to the extensive ICU
- documentation found at the
- ICU
- transformation home page. Set filtering
- principles are explained at the
- ICU set and
- filtering page.
-
-
-
- tokenize
-
-
- Tokenization is the only rule in the ICU chain
- which splits one token into multiple tokens. The
- 'rule' attribute may have the following values:
- "s" (sentence), "l" (line-break), "w" (word), and
- "c" (character), the later probably not being
- very useful in a pruning Pazpar2 installation.
-
-
-
- index
-
-
- Finally the 'index' element instruction - without
- any 'rule' attribute - is used to store the tokens
- after chain processing in the relevance ranking
- unit of Pazpar2. It will always be the last
- instruction in the chain.
-
-
-
-
-
-
- sort
+ relevance / sort / mergekey
- Specifies ICU tokenization and normalization rules
- for tokens that are used in Pazpar2's sorting. The contents
- is similar to that of relevance.
+ Specifies character set normalization for relevance ranking, sorting
+ and the mergekey for the server. These definitions serve as
+ defaults for services that do not specify their own. For the meaning
+ of these settings, refer to the "relevance" element inside service.
- mergekey
+ settings
- Specifies ICU tokenization and normalization rules
- for tokens that are used in Pazpar2's mergekey. The contents
- is similar to that of relevance.
+ Specifies target settings for the server. These settings serve
+ as defaults for all services that do not specify their own.
+ The settings element requires one attribute, 'src', which specifies
+ a settings file or a directory. If a directory is given, all
+ files with suffix .xml are read from this
+ directory. Refer to TARGET SETTINGS
+ for more information.
@@ -200,7 +158,16 @@
extraction of data from the internal representation, primarily
through the 'metadata' sub-element.
-
+
+ Pazpar2 version 1.2 and later allows multiple service elements.
+ When multiple services are defined, each must be given a unique ID
+ by specifying the attribute id.
+ A single service may be unnamed (service ID omitted). The
+ service ID is referred to in the
+ init webservice
+ command's service parameter.
+
+
metadata
@@ -208,9 +175,9 @@
One of these elements is required for every data element in
the internal representation of the record (see
. It governs
- subsequent processing as pertains to sorting, relevance
- ranking, merging, and display of data elements. It supports
- the following attributes:
+ subsequent processing as pertains to sorting, relevance
+ ranking, merging, and display of data elements. It supports
+ the following attributes:
@@ -308,7 +275,30 @@
longest element (strlen), 'range' (calculate a range
of values across all matching records), 'all' (include
all elements), or 'no' (don't merge; this is the
- default);
+ default);
+
+
+
+
+ mergekey
+
+
+ If set to 'required', the value of this
+ metadata element is appended to the resulting mergekey if
+ the metadata is present in a record instance.
+ If the metadata element is not present, a unique mergekey
+ will be generated instead.
+
+
+ If set to 'optional', the value of this
+ metadata element is appended to the resulting mergekey if the
+ metadata is present in a record instance. If the metadata
+ is not present, it will be empty.
+
+
+ If set to 'no' or the mergekey attribute is
+ omitted, the metadata will not be used in the creation of a
+ mergekey.
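The three mergekey values described above can be sketched in Python. This is only an illustration of the stated rules, not Pazpar2's implementation: the function name, the record layout (a plain dict), the space separator and the "unique:" prefix are all assumptions made for the example.

```python
import uuid


def build_mergekey(record, rules):
    """Build a merge key from (metadata name, mergekey mode) pairs.

    record: dict mapping metadata names to string values (assumed layout).
    rules:  list of (name, mode) with mode in {"required", "optional", "no"}.
    """
    parts = []
    for name, mode in rules:
        if mode == "no":
            continue  # never contributes to the merge key
        value = record.get(name)
        if value is None:
            if mode == "required":
                # Missing required metadata: fall back to a unique key so the
                # record is not merged with any other (format is hypothetical).
                return "unique:" + uuid.uuid4().hex
            continue  # missing optional metadata contributes nothing
        parts.append(value)
    return " ".join(parts)
```

With rules `[("title", "required"), ("author", "optional")]`, a record lacking an author still gets a key based on its title, while a record lacking a title gets a key of its own.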
@@ -316,75 +306,208 @@
setting
- This attribute allows you to make use of static database
- settings in the processing of records. Three possible values
- are allowed. 'no' is the default and doesn't do anything.
- 'postproc' copies the value of a setting with the same name
- into the output of the normalization stylesheet(s). 'parameter'
- makes the value of a setting with the same name available
- as a parameter to the normalization stylesheet, so you
- can further process the value inside of the stylesheet, or use
- the value to decide how to deal with other data values.
+ This attribute allows you to make use of static database
+ settings in the processing of records. Three possible values
+ are allowed. 'no' is the default and doesn't do anything.
+ 'postproc' copies the value of a setting with the same name
+ into the output of the normalization stylesheet(s). 'parameter'
+ makes the value of a setting with the same name available
+ as a parameter to the normalization stylesheet, so you
+ can further process the value inside of the stylesheet, or use
+ the value to decide how to deal with other data values.
+ The purpose of using settings in this way can either be to
+ control the behavior of normalization stylesheet in a database-
+ dependent way, or to easily make database-dependent values
+ available to display-logic in your user interface, without having
+ to implement complicated interactions between the user interface
+ and your configuration system.
- The purpose of using settings in this way can either be to
- control the behavior of normalization stylesheet in a database-
- dependent way, or to easily make database-dependent values
- available to display-logic in your user interface, without having
- to implement complicated interactions between the user interface
- and your configuration system.
+
+
+
+ relevance
+
+
+ Specifies ICU tokenization and transformation rules
+ for tokens that are used in Pazpar2's relevance ranking.
+ The 'id' attribute is currently not used, and the 'locale'
+ attribute must be set to one of the locale strings
+ defined in ICU. The child elements listed below can be
+ in any order, except the 'index' element which logically
+ belongs to the end of the list. The stated tokenization,
+ transformation and charmapping instructions are performed
+ in order from top to bottom.
+
+
+ casemap
+
+
+ The attribute 'rule' defines the direction of the
+ per-character casemapping, allowed values are "l"
+ (lower), "u" (upper), "t" (title).
+
+
+
+ transform
+
+
+ Normalization and transformation of tokens follows
+ the rules defined in the 'rule' attribute. For
+ possible values we refer to the extensive ICU
+ documentation found at the
+ ICU
+ transformation home page. Set filtering
+ principles are explained at the
+ ICU set and
+ filtering page.
+
+
+
+ tokenize
+
+
+ Tokenization is the only rule in the ICU chain
+ which splits one token into multiple tokens. The
+ 'rule' attribute may have the following values:
+ "s" (sentence), "l" (line-break), "w" (word), and
+ "c" (character), the latter probably not being
+ very useful in a pruning Pazpar2 installation.
+
+
+
+
+
+ From Pazpar2 version 1.1 the ICU wrapper from YAZ is used.
+ Refer to the yaz-icu
+ utility for more information.
+
+
+
+
+
+ sort
+
+
+ Specifies ICU tokenization and transformation rules
+ for tokens that are used in Pazpar2's sorting. The content
+ is similar to that of relevance.
+
+
+
+
+
+ mergekey
+
+
+ Specifies ICU tokenization and transformation rules
+ for tokens that are used in Pazpar2's mergekey. The content
+ is similar to that of relevance.
+
+
+
+
+
+ settings
+
+
+ Specifies target settings for this service. Refer to
+ TARGET SETTINGS.
+
+
+
+
+
+ timeout
+
+
+ Specifies timeout parameters for this service.
+ The timeout
+ element supports the following attributes:
+ session, z3950_operation,
+ z3950_session, which specify the
+ 'session timeout', 'Z39.50 operation timeout' and
+ 'Z39.50 session timeout' respectively. The Z39.50 operation
+ timeout is the time Pazpar2 will wait for an active Z39.50/SRU
+ operation before it gives up (times out). The Z39.50 session
+ timeout is the time Pazpar2 will keep the session alive for
+ an idle session (no operation).
+
+
+ The following is recommended but not required:
+ z3950_operation (30) < session (60) < z3950_session (180).
+ The default values are given in parentheses.
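As a rough illustration of the recommended ordering, the following Python sketch reads a timeout element and checks it. The attribute names and default values come from the text above; the helper names and the XML fragment are hypothetical, and namespace handling is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Defaults as stated in the documentation text.
DEFAULTS = {"z3950_operation": 30, "session": 60, "z3950_session": 180}


def read_timeouts(xml_fragment):
    """Return the three timeout values, falling back to the defaults."""
    element = ET.fromstring(xml_fragment)
    return {name: int(element.get(name, default))
            for name, default in DEFAULTS.items()}


def follows_recommendation(timeouts):
    """Check z3950_operation < session < z3950_session."""
    return (timeouts["z3950_operation"]
            < timeouts["session"]
            < timeouts["z3950_session"])
```

For example, `read_timeouts('<timeout session="90"/>')` keeps the defaults for the two Z39.50 values and still satisfies the recommendation, whereas setting session to 20 would not.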
+
+
+
+
+
-
+
 EXAMPLE
 Below is a working example configuration:
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ ]]>
-
+
+ INCLUDE FACILITY
+
+ The XML configuration may be partitioned into multiple files by using
+ the include element which takes a single attribute,
+ src. The value of the src attribute is a
+ regular shell-like glob pattern. For example,
+
+ ]]>
+
+
+ The include facility requires Pazpar2 version 1.2.
+
+
+
TARGET SETTINGS
Pazpar2 features a cunning scheme by which you can associate various
@@ -420,7 +543,9 @@
environment, where different end-users may need to be represented to
some search targets in different ways. This, again, can be managed
using an external database or other lookup mechanism. Setting overrides
- can be performed either using the 'init' or the 'settings' webservice
+ can be performed either using the
+ init or the
+ settings webservice
command.
@@ -433,8 +558,10 @@
Finally, as an extreme case of this, the webservice client can
- introduce entirely new targets, on the fly, as part of the init or
- settings command. This is useful if you desire to manage information
+ introduce entirely new targets, on the fly, as part of the
+ init or
+ settings command.
+ This is useful if you desire to manage information
about your search targets in a separate application such as a database.
You do not need any static settings file whatsoever to run Pazpar2 -- as
long as the webservice client is prepared to supply the necessary
@@ -671,7 +798,7 @@
-
+ pz:requestsyntax
@@ -707,12 +834,27 @@
pz:nativesyntax
- The representation (syntax) of the retrieval records. Currently
- recognized values are iso2709 and xml.
+ Specifies how Pazpar2 should map retrieved records to XML. Currently
+ supported values are xml,
+ iso2709 and txml.
- For iso2709, can also specify a native character set, e.g. "iso2709;latin-1".
- If no character set is provided, MARC-8 is assumed.
+ The value iso2709 makes Pazpar2 convert retrieved
+ MARC records to MARCXML. In order to convert to XML, the exact
+ character set of the MARC records must be known (if not, the resulting
+ XML is probably not well-formed). The character set may be
+ specified by adding
+ ;charset=charset to
+ iso2709. If omitted, a charset of
+ MARC-8 is assumed. This is correct for most MARC21/USMARC records.
+
+
+ The value txml is like iso2709
+ except that records are converted to TurboMARC instead of MARCXML.
+
+
+ The value xml is used if Pazpar2 retrieves
+ records that are already XML (no conversion takes place).
@@ -729,11 +871,48 @@
+ pz:negotiation_charset
+
+
+ Sets the character set for Z39.50 negotiation. Most targets do not support
+ this, and some will even close the connection if it is set (crash on the
+ server side or similar). If set, you probably want to set it to
+ UTF-8.
+
+
+
+
+ pz:xslt
- Provides the path of an XSLT stylesheet which will be used to
- map incoming records to the internal representation.
+ A comma-separated list of files that specify
+ how to convert incoming records to the internal representation.
+
+
+ The suffix of each file specifies the kind of transformation.
+ Suffix ".xsl" makes an XSL transform. Suffix
+ ".mmap" will use the MMAP transform (described below).
+
+
+ The special value "auto" will use a file
+ which is the pz:requestsyntax's
+ value followed by
+ '.xsl'.
+
+
+ When mapping MARC records, XSLT can be bypassed for increased
+ performance with the alternate "MARC map" format. Provide the
+ path of a file with extension ".mmap" containing on each line:
+
+ <field> <subfield> <metadata element>
+ For example:
+
+ 245 a title
+ 500 $ description
+ 773 * citation
+ To map the field value specify a subfield of '$'. To store a
+ concatenation of all subfields, specify a subfield of '*'.
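A minimal Python sketch of reading the three-column MARC map format described above. It is not Pazpar2's parser: the function name is hypothetical, and skipping lines that do not have exactly three columns is an assumption made here.

```python
def parse_marc_map(text):
    """Parse "<field> <subfield> <metadata element>" lines into tuples.

    A subfield of '$' means "use the field value"; '*' means "concatenate
    all subfields". Lines without exactly three whitespace-separated
    columns are skipped (an assumption of this sketch).
    """
    rules = []
    for line in text.splitlines():
        columns = line.split()
        if len(columns) != 3:
            continue
        field, subfield, metadata = columns
        rules.append((field, subfield, metadata))
    return rules
```

Applied to the three example lines above, this yields `("245", "a", "title")`, `("500", "$", "description")` and `("773", "*", "citation")`.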
@@ -787,7 +966,7 @@
-
+
pz:apdulog
@@ -797,56 +976,149 @@
+
+
+ pz:sru
+
+
+ This setting enables
+ SRU/SOLR
+ support.
+ It has four possible settings.
+ 'get', enables SRU access through GET requests. 'post' enables SRU/POST
+ support, less commonly supported, but useful if very large requests are
+ to be submitted. 'srw' enables the SRW (SRU over SOAP) variation of
+ the protocol.
+
+
+ A value of 'solr' enables SOLR client support. This is supported
+ for Pazpar2 version 1.5.0 and later.
+
+
+
+
+
+ pz:sru_version
+
+
+ This allows the SRU version to be specified. If unset, Pazpar2
+ will use the default of YAZ (currently 1.2). Should be set
+ to 1.1 or 1.2. For SOLR, the currently supported/tested version is 1.4.
+
+
+
+
+
+ pz:pqf_prefix
+
+
+ Allows you to specify an arbitrary PQF query language substring.
+ The provided string is prefixed to the user's query after it has been
+ normalized to PQF internally in pazpar2.
+ This allows you to attach complex 'filters' to queries for a given
+ target, sometimes necessary to select sub-catalogs
+ in union catalog systems, etc.
+
+
+
+
+
+ pz:pqf_strftime
+
+
+ Allows you to extend a query with dates and operators.
+ The provided string allows certain substitutions and serves as a
+ format string.
+ The special two-character sequence '%%' gets converted to the
+ original query. Other sequences that begin with a percent sign are
+ conversions supported by strftime.
+ All other characters are copied verbatim. For example, the string
+ @and @attr 1=30 @attr 2=3 %Y %%
+ would search for current year combined with the original PQF (%%).
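The substitution rules above can be approximated in Python as follows. This is a sketch of the described behavior, not the code Pazpar2 uses; the function name and the sample query are hypothetical. The template is split on '%%' first because strftime itself would otherwise turn '%%' into a literal percent sign.

```python
from datetime import datetime


def expand_pqf_strftime(template, query, now):
    """Expand a pz:pqf_strftime-style template.

    '%%' is replaced by the original PQF query; every other percent
    sequence is handed to strftime; other text is copied verbatim.
    """
    pieces = template.split("%%")
    # Run strftime on the text between the '%%' markers only.
    expanded = [now.strftime(piece) if piece else "" for piece in pieces]
    return query.join(expanded)
```

With the template from the example above, a date in 2011 and the hypothetical query `@attr 1=4 water`, the result is `@and @attr 1=30 @attr 2=3 2011 @attr 1=4 water`.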
+
+
+
+
+
+ pz:sort
+
+
+ Specifies sort criteria to be applied to the result set.
+ Only works for targets which support the sort service.
+
+
+
+
+
+ pz:recordfilter
+
+
+ Specifies a filter which allows Pazpar2 to include only
+ records that meet certain criteria in a result. Unmatched records
+ will be ignored. The filter takes the form name, name~value, or name=value, which
+ will include only records with a metadata element (name) that contains the
+ substring given (~value) or matches the value exactly (=value). If value is omitted, all records
+ with the named
+ metadata element present will be included.
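The three filter forms can be illustrated with a short Python sketch. The record layout (a dict mapping metadata names to lists of strings), the function names and the case-sensitive comparison are assumptions made for the example; the text above does not specify them.

```python
def parse_record_filter(spec):
    """Split 'name', 'name~value' or 'name=value' into (name, op, value)."""
    positions = [(spec.find(op), op) for op in "~=" if op in spec]
    if not positions:
        return spec, None, None
    index, op = min(positions)  # the first operator in the string wins
    return spec[:index], op, spec[index + 1:]


def record_matches(record, spec):
    """Return True if the record passes the filter."""
    name, op, value = parse_record_filter(spec)
    values = record.get(name, [])
    if op is None:
        return bool(values)  # element merely has to be present
    if op == "~":
        return any(value in v for v in values)  # substring match
    return any(value == v for v in values)  # exact match
```

For a record whose title is "Water resources", the filters `title` and `title~Water` match, while `title=Water` does not.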
+
+
+
+
+
+ pz:termlist_term_count
+
+
+ Specifies that the target should return up to n terms for each facet (where termlist="yes"). This implies
+ that the target can return facets as part of the search command. Requesting facets from targets that cannot
+ return them will produce unpredictable results or errors.
+
+
+
- pz:sru
-
-
- This setting enables SRU/SRW support. It has three possible settings.
- 'get', enables SRU access through GET requests. 'post' enables SRU/POST
- support, less commonly supported, but useful if very large requests are
- to be submitted. 'srw' enables the SRW variation of the protocol.
-
-
+ pz:termlist_term_sort
+
+
+ Specifies how the terms should be sorted. (Not yet implemented)
+
+
- pz:sru_version
-
-
- This allows SRU version to be specified. If unset Pazpar2
- will the default of YAZ (currently 1.2). Should be set
- to 1.1 or 1.2.
-
-
+ pz:preferred
+
+
+ Specifies that a target is preferred, e.g. a possibly local, faster target. Using block=pref on the show command
+ will wait for all these targets to return records before releasing the block. If no target is preferred,
+ block=pref is identical to block=1, which releases the block when one target has returned records.
+
+
- pz:pqf_prefix
-
-
- Allows you to specify an arbitrary PQF query language substring. The provided
- string is prefixed the user's query after it has been normalized to PQF
- internally in pazpar2. This allows you to attach complex 'filters' to
- queries for a gien target, sometimes necessary to select sub-catalogs
- in union catalog systems, etc.
-
-
+ pz:block_timeout
+
+
+ (Not yet implemented). Specifies the time after which a block should be released anyway.
+
+
+
-
+
+
SEE ALSO
- Pazpar2:
pazpar28
-
-
- Pazpar2 protocol:
+
+ yaz-icu
+ 1
+ pazpar2_protocol7