- As you can see, there isn't much to it. There are really only a few
- important elements to this file.
- </para>
-
- <para>
- Elements should belong to the namespace
- http://www.indexdata.com/pazpar2/1.0. If the root node contains the
- attribute 'mergekey', then every record that generates the same
- merge key (normalized for case differences, white space, and
- truncation) will be joined into a cluster. In other words, you
- decide how records are merged. If you don't include a merge key,
- records are never merged. The 'metadata' elements provide the meat
- of the elements -- the content. the 'type' attribute is used to
- match each element against processing rules that determine what
- happens to the data element next.
- </para>
-
- <para>
- The next processing step is the extraction of metadata from the
- intermediate representation of the record. This is governed by the
- 'metadata' elements in the 'service' section of the configuration
- file. See <xref linkend="config-server"/> for details. The metadata
- in the retrieval record ultimately drives merging, sorting, ranking,
- the extraction of browse facets, and display, all configurable.
- </para>
- </section>
-
- <section id="client">
- <title>Client development</title>
- <para>
- You can use pazpar2 from any environment that allows you to use
- webservices. The initial goal of the software was to support
- Ajax-based applications, but there literally are no limits to what
- you can do. You can use pazpar2 from Javascript, Flash, Java, etc.,
- on the browser side, and from any development environment on the
- server side, and you can pass session tokens and record IDs freely
- around between these environments to build sophisticated applications.
- Use your imagination.
- </para>
-
- <para>
- The webservice API of pazpar2 is described in detail in <xref
- linkend="pazpar2_protocol"/>.
- </para>
-
- <para>
- In brief, you use the 'init' command to create a session, a
- temporary workspace which carries information about the current
- search. You start a new search using the 'search' command. Once the
- search has been started, you can follow its progress using the
- 'stat', 'bytarget', 'termlist', or 'show' commands. Detailed records
- can be fetched using the 'record' command.
- </para>
- </section>
- </chapter> <!-- Using pazpar2 -->
-
- <reference id="reference">
- <title>Reference</title>
- <partintro>
+ As you can see, there isn't much to it. There are really only a few
+ important elements to this file.
+ </para>
+
+ <para>
+ Elements should belong to the namespace
+ <literal>http://www.indexdata.com/pazpar2/1.0</literal>.
+ If the root node contains the
+ attribute 'mergekey', then every record that generates the same
+ merge key (normalized for case differences, white space, and
+ truncation) will be joined into a cluster. In other words, you
+ decide how records are merged. If you don't include a merge key,
+ records are never merged. The 'metadata' elements provide the meat
+ of the elements -- the content. The 'type' attribute is used to
+ match each element against processing rules that determine what
+ happens to the data element next. The 'rank' attribute specifies
+ a multiplier for ranking for this element.
+ </para>
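+ <para>
+ The effect of a merge key can be sketched as follows. This is only an
+ illustration of the idea, not Pazpar2's actual normalization code: the
+ function names and the truncation length of 64 characters are
+ assumptions made for this example.
+ </para>
+ <programlisting><![CDATA[
```python
from collections import defaultdict

MAX_KEY_LENGTH = 64  # hypothetical truncation length, not taken from Pazpar2


def normalize_merge_key(raw, max_length=MAX_KEY_LENGTH):
    """Fold case, collapse white space, and truncate (illustrative only)."""
    collapsed = " ".join(raw.split())
    return collapsed.casefold()[:max_length]


def cluster_by_merge_key(records):
    """Group (merge_key, record) pairs; records without a key stay alone."""
    clusters = defaultdict(list)
    singletons = []
    for merge_key, record in records:
        if merge_key:
            clusters[normalize_merge_key(merge_key)].append(record)
        else:
            # No merge key: the record is never merged with another one.
            singletons.append([record])
    return list(clusters.values()) + singletons


records = [
    ("The  Shining King", "ebook"),
    ("the shining king", "audio"),
    (None, "unkeyed"),
]
result = cluster_by_merge_key(records)
```
+ ]]></programlisting>
+ <para>
+ In this sketch the first two records end up in one cluster because
+ their keys differ only in case and white space, while the record
+ without a key remains on its own.
+ </para>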
+
+ <para>
+ The next processing step is the extraction of metadata from the
+ intermediate representation of the record. This is governed by the
+ 'metadata' elements in the 'service' section of the configuration
+ file. See <xref linkend="config-server"/> for details. The metadata
+ in the retrieval record ultimately drives merging, sorting, ranking,
+ the extraction of browse facets, and display, all configurable.
+ </para>
+
+ <para>
+ Pazpar2 1.6.37 and later also allows already clustered records to
+ be ingested. Suppose a database already clusters for us and we would like
+ to keep that cluster for Pazpar2. In that case we can generate a
+ <literal>cluster</literal> wrapper element that holds individual
+ <literal>record</literal> elements.
+ </para>
+ <para>
+ Cluster record example:
+ <screen><![CDATA[
+ <cluster xmlns="http://www.indexdata.com/pazpar2/1.0">
+ <record>
+ <metadata type="title" rank="2">The Shining</metadata>
+ <metadata type="author">King, Stephen</metadata>
+ <metadata type="kind">ebook</metadata>
+ </record>
+ <record>
+ <metadata type="title" rank="2">The Shining</metadata>
+ <metadata type="author">King, Stephen</metadata>
+ <metadata type="kind">audio</metadata>
+ </record>
+ </cluster>
+ ]]></screen>
+ </para>
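+ <para>
+ To check that a generated cluster has the intended shape, it can be
+ read back with any namespace-aware XML parser. The following is a
+ minimal sketch using Python's standard library; the helper name
+ <literal>read_cluster</literal> and the default rank of 1 for elements
+ without a 'rank' attribute are assumptions made for this example.
+ </para>
+ <programlisting><![CDATA[
```python
import xml.etree.ElementTree as ET

PZ2_NS = "http://www.indexdata.com/pazpar2/1.0"

CLUSTER_XML = """\
<cluster xmlns="http://www.indexdata.com/pazpar2/1.0">
  <record>
    <metadata type="title" rank="2">The Shining</metadata>
    <metadata type="author">King, Stephen</metadata>
    <metadata type="kind">ebook</metadata>
  </record>
  <record>
    <metadata type="title" rank="2">The Shining</metadata>
    <metadata type="author">King, Stephen</metadata>
    <metadata type="kind">audio</metadata>
  </record>
</cluster>
"""


def read_cluster(xml_text):
    """Return one dict per record: metadata type -> (text, rank)."""
    ns = {"pz": PZ2_NS}
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall("pz:record", ns):
        fields = {}
        for meta in record.findall("pz:metadata", ns):
            # Assumption: a missing 'rank' attribute is treated as 1 here.
            rank = int(meta.get("rank", "1"))
            fields[meta.get("type")] = (meta.text, rank)
        records.append(fields)
    return records


cluster_records = read_cluster(CLUSTER_XML)
```
+ ]]></programlisting>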
+ </section>
+
+ <section id="client">
+ <title>Client development overview</title>
+ <para>
+ You can use Pazpar2 from any environment that allows you to use
+ webservices. The initial goal of the software was to support
+ Ajax-based applications, but there literally are no limits to what
+ you can do. You can use Pazpar2 from JavaScript, Flash, Java, etc.,
+ on the browser side, and from any development environment on the
+ server side, and you can pass session tokens and record IDs freely
+ around between these environments to build sophisticated applications.
+ Use your imagination.
+ </para>
+
+ <para>
+ The webservice API of Pazpar2 is described in detail in <xref
+ linkend="pazpar2_protocol"/>.
+ </para>
+
+ <para>
+ In brief, you use the 'init' command to create a session, a
+ temporary workspace which carries information about the current
+ search. You start a new search using the 'search' command. Once the
+ search has been started, you can follow its progress using the
+ 'stat', 'bytarget', 'termlist', or 'show' commands. Detailed records
+ can be fetched using the 'record' command.
+ </para>
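+ <para>
+ The command sequence above can be sketched as a set of request URLs.
+ This example only builds the URLs and sends nothing. The host, port,
+ and path, the session value, and the parameter names 'query', 'start',
+ and 'num' are assumptions made for illustration; see the protocol
+ reference for the authoritative parameter list.
+ </para>
+ <programlisting><![CDATA[
```python
from urllib.parse import urlencode

# Assumption: Pazpar2 is reachable at this hypothetical address and path.
BASE_URL = "http://localhost:9004/search.pz2"


def command_url(command, session=None, **params):
    """Build the URL for one webservice command (no request is sent)."""
    query = {"command": command}
    if session is not None:
        query["session"] = session
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


# 1. Create a session; the response would carry the session token.
init_url = command_url("init")

# 2. Start a search in that session ("12345" is a made-up token).
search_url = command_url("search", session="12345", query="stephen king")

# 3. Poll for merged results while the search is running.
show_url = command_url("show", session="12345", start=0, num=20)
```
+ ]]></programlisting>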
+ </section>
+
+ &sect-ajaxdev;
+
+ <section id="unicode">
+ <title>Unicode Compliance</title>
+ <para>
+ Pazpar2 is Unicode compliant and language and locale aware, but it relies
+ on the character encoding of each target being specified correctly if
+ the targets themselves are not UTF-8 based (most aren't).
+ Just a few badly behaving targets can spoil the search experience
+ considerably if, for example, Greek, Russian, or other non-7-bit-ASCII
+ search terms are entered. In these cases some targets return
+ records irrelevant to the query, and the result screens will be
+ cluttered with noise.
+ </para>
+ <para>
+ While noise from misbehaving targets cannot be removed, it can
+ be reduced using truly Unicode based ranking. This
+ option is available to the system administrator if ICU
+ support is compiled into YAZ; see
+ <xref linkend="installation"/> for details.
+ </para>
+ <para>
+ In addition, the ICU tokenization and normalization rules must
+ be defined in the master configuration file described in
+ <xref linkend="config-server"/>.
+ </para>
+ </section>
+
+ <section id="load_balancing">
+ <title>Load balancing</title>
+ <para>
+ Just like any web server, Pazpar2 can be load balanced by a standard
+ hardware or software load balancer as long as session stickiness
+ is ensured. If you are already running the Apache2 web server in front
+ of Pazpar2 and use the Apache mod_proxy module to 'relay' client
+ requests to Pazpar2, this setup can easily be extended to include
+ load balancing capabilities.
+ To do so you need to enable the
+ <ulink url="http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html">
+ mod_proxy_balancer
+ </ulink>
+ module in your Apache2 installation.
+ </para>
+
+ <para>
+ On a Debian based Apache 2 system, the relevant modules can
+ be enabled with:
+ <screen>
+ sudo a2enmod proxy_http
+ sudo a2enmod proxy_balancer
+ </screen>
+ </para>
+
+ <para>
+ The mod_proxy_balancer can pass all 'sessionsticky' requests to the
+ same backend worker as long as the requests are marked with the
+ originating worker's ID (called 'route'). If the Pazpar2 serverID is
+ configured (by setting an 'id' attribute on the 'server' element in
+ the Pazpar2 configuration file), Pazpar2 will append it to the
+ 'session' element returned during the 'init' in a mod_proxy_balancer
+ compatible manner.
+ Since the 'session' is then re-sent by the client (for all Pazpar2
+ requests besides 'init'), the balancer can use the marker to pass
+ the request to the right route. To do so, the balancer needs to be
+ configured to inspect the 'session' parameter.
+ </para>
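+ <para>
+ The routing decision can be pictured with a small sketch. It assumes
+ that the server ID is appended to the session value after a dot
+ (for example <literal>712947349.pz1</literal>), which is the form
+ mod_proxy_balancer expects for a route marker; the worker names and
+ addresses are made up for illustration, and this is not the balancer's
+ actual code.
+ </para>
+ <programlisting><![CDATA[
```python
# Hypothetical backend workers, keyed by route (the Pazpar2 server 'id').
WORKERS = {
    "pz1": "http://127.0.0.1:8004",
    "pz2": "http://127.0.0.1:8005",
}


def split_route(session_value):
    """Split 'SESSIONID.ROUTE' into (session_id, route); route may be None."""
    session_id, dot, route = session_value.partition(".")
    return session_id, (route if dot and route else None)


def pick_worker(session_value, workers=WORKERS):
    """Return the backend for a sticky session, or None if no route matches."""
    _, route = split_route(session_value)
    return workers.get(route)
```
+ ]]></programlisting>
+ <para>
+ A session value without a route marker yields no worker in this
+ sketch; a real balancer would then fall back to its normal
+ scheduling method.
+ </para>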
+
+ <example id="load_balancing.example">
+ <title>Apache 2 load balancing configuration</title>