From: Sebastian Hammer <quinn@indexdata.com>
Date: Fri, 19 Jan 2007 18:28:08 +0000 (+0000)
Subject: Updated documentation. This update may be unstable, as I can't presently test on... 
X-Git-Tag: stable.27032007~52
X-Git-Url: http://lists.indexdata.com/cgi-bin?a=commitdiff_plain;h=8f48376798d4b43d962726ef68f547cbd471d670;p=pazpar2-moved-to-github.git

Updated documentation. This update may be unstable, as I can't presently test on my laptop.
---

diff --git a/doc/book.xml b/doc/book.xml
index 7d28253..4ec781e 100644
--- a/doc/book.xml
+++ b/doc/book.xml
@@ -9,165 +9,369 @@
      <!ENTITY % common SYSTEM "common/common.ent">
      %common;
 ]>
-<!-- $Id: book.xml,v 1.4 2007-01-13 05:48:41 quinn Exp $ -->
+<!-- $Id: book.xml,v 1.5 2007-01-19 18:28:08 quinn Exp $ -->
 <book id="book">
- <bookinfo>
-  <title>Pazpar2 - User's Guide and Reference</title>
-  <author>
-   <firstname>Sebastian</firstname><surname>Hammer</surname>
-  </author>
-  <copyright>
-   <year>&copyright-year;</year>
-   <holder>Index Data</holder>
-  </copyright>
-  <abstract>
-   <simpara>
-    Pazpar2 - High-performance, user-interface independent, metasearching
-	  middleware featuring record merging, relevance ranking, and faceted search
-	  results.
-   </simpara>
-   <simpara>
-    This document is a guide and reference to Pazpar version &version;.
-   </simpara>
-   <simpara>
-    <inlinemediaobject>
-     <imageobject>
-      <imagedata fileref="common/id.png" format="PNG"/>
-     </imageobject>
-     <imageobject>
-      <imagedata fileref="common/id.eps" format="EPS"/>
-     </imageobject>
-    </inlinemediaobject>
-   </simpara>
-  </abstract>
- </bookinfo>
-
- <chapter id="introduction">
-  <title>Introduction</title>
-  <para>
-    Pazpar2 is a stand-alone package which implements
-    the best we know to do in terms of the core metasearching
-    functionality; that is, searching a number of databases in parallel,
-    merging, and analyzing the results. Additional functionality such as
-    user management, attractive displays are expected to be implemented by
-    applications that use pazpar2. Pazpar2 is user interface independent.
-    Its functionality is exposed through a simple REST-style webservice API,
-    designed to be simple to use from an Ajax-anbled browser, from a
-    higher-level server-side language like PHP or Java, or even from a Flash
-    application.
-  </para>
-  <para>
-    Once you launch a search in pazpar2, the operation continues behind the
-    scenes. Pazpar2 connects to servers, carries out searches, and
-    retrieves, deduplicates, and stores results internally. Your application
-    code may periodically inquire about the status of an ongoing operation,
-    and ask to see records or other result set facets.
-  </para>
-  <para>
-    Pazpar2 is designed to be highly configurable. Incoming records are
-    normalized to XML/UTF-8, and then further normalized using XSLT to a
-    simple internal representation that is suitable for analysis. By
-    providing XSLT stylesheets for different kinds of result records, you
-    can tune pazpar2 to work against different kinds of information
-    retrieval servers. Finally, metadata is extracted, in a configurable
-    way, from this internal record, to support display, merging, ranking,
-    result set facets, and sorting. Pazpar2 is not bound to a specific model
-    of metadata, such as DublinCore or MARC -- by providing the right
-    configuration, it can work with a number of different kinds of data in
-    support of many different applications.
-  </para>
-  <para>
-    Pazpar2 is designed to be efficient and scalable. You can set it up to
-    search several hundred targets in parallel, or you can use it to support
-    hundreds of concurrent users. It is implemented with the same attention
-    to performance and economy that we use in our indexing engines, so that
-    you can focus on building your application. You can devote all of your
-    attention to usability and let pazpar2 do what it does best -- search.
-   </para>
- </chapter>
-
- <chapter id="license">
-  <title>Pazpar2 License</title>
-  <para>To be decided and written.</para>
- </chapter>
- 
- <chapter id="installation">
-  <title>Installation</title>
-  <para>
-   Pazpar2 depends on the following tools/libraries:
-   <variablelist>
-    <varlistentry><term><ulink url="&url.yaz;">YAZ</ulink></term>
-     <listitem>
-      <para>
-       The popular Z39.50 toolkit for the C language. YAZ must be
-       compiled with Libxml2/Libxslt support.
-      </para>
-     </listitem>
-    </varlistentry>
-   </variablelist>
-  </para>
-  <para>
-   In order to compile Pazpar2 an ANSI C compiler is
-   required. The requirements should be the same as for YAZ.
-  </para>
-
-  <section id="installation.unix">
-   <title>Installation on Unix (from Source)</title>
+  <bookinfo>
+   <title>Pazpar2 - User's Guide and Reference</title>
+   <author>
+    <firstname>Sebastian</firstname><surname>Hammer</surname>
+   </author>
+   <copyright>
+    <year>&copyright-year;</year>
+    <holder>Index Data</holder>
+   </copyright>
+   <abstract>
+    <simpara>
+       Pazpar2 is a high-performance, user interface-independent, data
+       model-independent metasearching
+       middleware featuring merging, relevance ranking, record sorting, 
+       and faceted results.
+    </simpara>
+    <simpara>
+     This document is a guide and reference to Pazpar2 version &version;.
+    </simpara>
+    <simpara>
+     <inlinemediaobject>
+      <imageobject>
+       <imagedata fileref="common/id.png" format="PNG"/>
+      </imageobject>
+      <imageobject>
+       <imagedata fileref="common/id.eps" format="EPS"/>
+      </imageobject>
+     </inlinemediaobject>
+    </simpara>
+   </abstract>
+  </bookinfo>
+
+  <chapter id="introduction">
+   <title>Introduction</title>
    <para>
-    Here is a quick step-by-step guide on how to compile the
-    tools that Pazpar2 uses. Only few systems have none of the required
-    tools binary packages. If, for example, Libxml2/libxslt are already
-    installed as development packages use these.
+     Pazpar2 is a stand-alone metasearch client with a webservice API, designed
+     to be used either from a browser-based client (JavaScript, Flash, Java,
+     etc.), from server-side code, or any combination of the two.
+     Pazpar2 is a highly optimized client designed to
+     search many resources in parallel. It implements record merging,
+     relevance-ranking and sorting by arbitrary data content, and facet
+     analysis for browsing purposes. It is designed to be data model
+     independent, and is capable of working with MARC, DublinCore, or any
+     other XML-structured response format -- XSLT is used to normalize and extract
+     data from retrieval records for display and analysis. It can be used
+     against any server which supports the Z39.50 protocol. Proprietary
+     backend modules can be used to support a large number of other protocols
+     (please contact Index Data for further information about this).
    </para>
-   
    <para>
-    Ensure that the development libraries + header files are
-    available on your system before compiling Pazpar2. For installation
-    of YAZ, refer to the YAZ installation chapter.
+     Additional functionality, such as user management and attractive
+     displays, is expected to be implemented by
+     applications that use pazpar2. Pazpar2 is user interface independent.
+     Its functionality is exposed through a simple REST-style webservice API,
+     designed to be simple to use from an Ajax-enabled browser, Flash
+     animation, Java applet, etc., or from a higher-level server-side language
+     like PHP or Java. Because session information can be shared between
+     browser-based logic and your server-side scripting, there is tremendous
+     flexibility in how you implement your business logic on top of pazpar2.
    </para>
-   <screen>
-    gunzip -c pazpar2-version.tar.gz|tar xf -
-    cd pazpar2-version
-    ./configure
-    make
-    su
-    make install
-   </screen>
-  </section>
-
-  <section id="installation.debian">
-   <title>Installation on Debian GNU/Linux</title>
    <para>
-    All dependencies for Pazpar2 are available as 
-    <ulink url="&url.debian;">Debian</ulink>
-    packages for the sarge (stable in 2005) and etch (testing in 2005)
-    distributions.
+     Once you launch a search in pazpar2, the operation continues behind the
+     scenes. Pazpar2 connects to servers, carries out searches, and
+     retrieves, deduplicates, and stores results internally. Your application
+     code may periodically inquire about the status of an ongoing operation,
+     and ask to see records or other result set facets. Results become
+     available immediately, and it is easy to build end-user interfaces which
+     feel extremely responsive, even when searching more than 100 servers
+     concurrently.
    </para>
    <para>
-    The procedures for Debian based systems, such as
-    <ulink url="&url.ubuntu;">Ubuntu</ulink> is probably similar
+     Pazpar2 is designed to be highly configurable. Incoming records are
+     normalized to XML/UTF-8, and then further normalized using XSLT to a
+     simple internal representation that is suitable for analysis. By
+     providing XSLT stylesheets for different kinds of result records, you
+     can tune pazpar2 to work against different kinds of information
+     retrieval servers. Finally, metadata is extracted, in a configurable
+     way, from this internal record, to support display, merging, ranking,
+     result set facets, and sorting. Pazpar2 is not bound to a specific model
+     of metadata, such as DublinCore or MARC -- by providing the right
+     configuration, it can work with a number of different kinds of data in
+     support of many different applications.
    </para>
-   <screen>
-    apt-get install libyaz-dev
-   </screen>
    <para>
-    With these packages installed, the usual configure + make
-    procedure can be used for Pazpar2 as outlined in
-    <xref linkend="installation.unix"/>.
+     Pazpar2 is designed to be efficient and scalable. You can set it up to
+     search several hundred targets in parallel, or you can use it to support
+     hundreds of concurrent users. It is implemented with the same attention
+     to performance and economy that we use in our indexing engines, so that
+     you can focus on building your application, without worrying about the
+     details of metasearch logic. You can devote all of your attention to
+     usability and let pazpar2 do what it does best -- metasearch.
+    </para>
+    <para>
+      If you wish to connect to commercial or other databases which do not
+      support open standards, please contact Index Data. We have a licensing
+      agreement with a third party vendor which will enable pazpar2 to access
+      thousands of online databases, in addition to the vast number of catalogs
+      and online services that support the Z39.50 protocol.
+    </para>
+    <para>
+      Pazpar2 is our attempt to re-think the traditional paradigms for
+      implementing and deploying metasearch logic, with an uncompromising
+      approach to performance, and attempting to make maximum use of the
+      capabilities of modern browsers. The demo user interface that
+      accompanies the distribution is but one example. If you think of new
+      ways of using pazpar2, we hope you'll share them with us, and if we
+      can provide assistance with regard to training, design, programming,
+      integration with different backends, hosting, or support, please don't
+      hesitate to contact us. If you'd like to see functionality in pazpar2
+      that is not there today, let us know. It may
+      already be in our development pipeline, or there might be a
+      possibility for you to help out by sponsoring development time or
+      code. Either way, get in touch and we will give you straight answers.
+    </para>
+    <para>
+      Enjoy!
+    </para>
+  </chapter>
+
+
+  <chapter id="license">
+   <title>Pazpar2 License</title>
+   <para>To be decided and written.</para>
+  </chapter>
+  
+  <chapter id="installation">
+   <title>Installation</title>
+   <para>
+    Pazpar2 depends on the following tools/libraries:
+    <variablelist>
+     <varlistentry><term><ulink url="&url.yaz;">YAZ</ulink></term>
+      <listitem>
+       <para>
+	The popular Z39.50 toolkit for the C language. YAZ must be
+	compiled with Libxml2/Libxslt support.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
    </para>
-  </section>
- </chapter>
- 
- <reference id="reference">
-  <title>Reference</title>
-  <partintro>
    <para>
-    The material in this chapter is drawn directly from the individual
-    manual entries.
+    In order to compile Pazpar2, an ANSI C compiler is
+    required. The requirements should be the same as for YAZ.
    </para>
-  </partintro>
-  &manref;
- </reference>
+
+   <section id="installation.unix">
+    <title>Installation on Unix (from Source)</title>
+    <para>
+     Here is a quick step-by-step guide on how to compile the
+     tools that Pazpar2 uses. Only a few systems have no binary packages
+     at all for the required tools. If, for example, Libxml2/libxslt are
+     already installed as development packages, use these.
+    </para>
+    
+    <para>
+     Ensure that the development libraries + header files are
+     available on your system before compiling Pazpar2. For installation
+     of YAZ, refer to the YAZ installation chapter.
+    </para>
+    <screen>
+     gunzip -c pazpar2-version.tar.gz|tar xf -
+     cd pazpar2-version
+     ./configure
+     make
+     su
+     make install
+    </screen>
+   </section>
+
+   <section id="installation.debian">
+    <title>Installation on Debian GNU/Linux</title>
+    <para>
+     All dependencies for Pazpar2 are available as 
+     <ulink url="&url.debian;">Debian</ulink>
+     packages for the sarge (stable in 2005) and etch (testing in 2005)
+     distributions.
+    </para>
+    <para>
+     The procedure for Debian-based systems, such as
+     <ulink url="&url.ubuntu;">Ubuntu</ulink>, is probably similar.
+    </para>
+    <screen>
+     apt-get install libyaz-dev
+    </screen>
+    <para>
+     With these packages installed, the usual configure + make
+     procedure can be used for Pazpar2 as outlined in
+     <xref linkend="installation.unix"/>.
+    </para>
+   </section>
+  </chapter>
+
+  <chapter id="using">
+    <title>Using pazpar2</title>
+    <para>
+      This chapter provides a general introduction to the use and deployment of pazpar2.
+    </para>
+
+    <section id="architecture">
+      <title>Pazpar2 and your systems architecture</title>
+      <para>
+	Pazpar2 is designed to provide asynchronous, behind-the-scenes
+	metasearching functionality to your application, exposing this
+	functionality using a simple webservice API that can be accessed
+	from any number of development environments. In particular, it is
+	possible to combine pazpar2 either with your server-side dynamic
+	website scripting, with scripting or code running in the browser, or
+	with any combination of the two. Pazpar2 is an excellent tool for
+	building advanced, Ajax-based user interfaces for metasearch
+	functionality, but it isn't a requirement -- you can choose to use
+	pazpar2 entirely as a backend to your regular server-side scripting.
+	When you do use pazpar2 in conjunction
+	with browser scripting (JavaScript/Ajax, Flash, applets, etc.), there are
+	special considerations.
+      </para>
+
+      <para>
+        Pazpar2 implements a simple but efficient HTTP server, and it is
+        designed to interact directly with scripting running in the browser
+        for the best possible performance, and to limit overhead when
+        several browser clients generate numerous webservice requests.
+        However, it is still desirable to use a conventional webserver,
+        such as Apache, to serve up graphics, HTML documents, and
+        server-side scripting. Because the security sandbox of most
+        browser-side programming environments only allows communication
+        with the server from which the enclosing HTML page or object
+        originated, pazpar2 is designed so that it can act as a transparent
+        proxy in front of an existing webserver (see <xref
+        linkend="pazpar2_conf"/> for details). In this mode, all regular
+        HTTP requests are transparently passed through to your webserver,
+        while pazpar2 only intercepts search-related webservice requests.
+      </para>
+
+      <para>
+        If you want to expose your combined service on port 80, you can
+        run your regular webserver on a different port, on a different
+        server, or on a different IP address associated with the same server.
+      </para>
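+
+      <para>
+        As a sketch of the last option, the 'listen' and 'proxy' settings
+        described in <xref linkend="pazpar2_conf"/> could bind pazpar2 to one
+        address on port 80 and forward all regular requests to a webserver
+        listening on a second address of the same machine. The addresses
+        below are placeholders from the documentation range, not defaults,
+        and the fragment shows only the relevant part of the server section.
+        <screen><![CDATA[
+<server>
+  <!-- pazpar2 answers on the public address (placeholder) -->
+  <listen host="192.0.2.10" port="80"/>
+  <!-- the conventional webserver listens on a second address (placeholder) -->
+  <proxy host="192.0.2.11" port="80"/>
+</server>
+]]></screen>
+      </para>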
+
+      <para>
+        Sometimes, it may be necessary to implement functionality on your
+        regular webserver that makes use of search results, for example
+        data import, emailing of results, history lists, personal citation
+        lists, interlibrary loan functionality, etc. Fortunately, it is
+        simple to exchange information between pazpar2, your browser
+        scripting, and backend server-side scripting.
+        You can send a session ID and possibly a record ID from your browser
+        code to your server code, and from there use pazpar2's webservice API
+        to access result sets or individual records. You could even 'hide'
+        all of pazpar2's functionality behind your own API implemented on
+        the server side, and access that from the browser or elsewhere. The
+        possibilities are just about endless.
+      </para>
+    </section>
+
+    <section id="data_model">
+      <title>Your data model</title>
+      <para>
+        Pazpar2 does not have a preconceived notion of what makes up a data
+        model. There are no assumptions that records have specific fields or
+	that they are organized in any particular way. The only assumption
+	is that data comes packaged in a form that the software can work
+	with (presently, that means XML or MARC), and that you can provide
+	the necessary information to massage it into pazpar2's internal
+	record abstraction.
+      </para>
+
+      <para>
+        Handling retrieval records in pazpar2 is a two-step process. First,
+	you decide which data elements of the source record you are
+	interested in, and you specify any desired massaging or combining of
+	elements using an XSLT stylesheet (MARC records are automatically
+	normalized to MARCXML before this step). If desired, you can run
+	multiple XSLT stylesheets in series to accomplish this, but the
+	output of the last one should be a representation of the record in a
+	schema that pazpar2 understands.
+      </para>
+
+      <para>
+        The intermediate, internal representation of the record looks like
+	this:
+	<screen><![CDATA[
+<record   xmlns="http://www.indexdata.com/pazpar2/1.0"
+	  mergekey="title The Shining author King, Stephen">
+
+    <metadata type="title">The Shining</metadata>
+
+    <metadata type="author">King, Stephen</metadata>
+
+    <metadata type="kind">ebook</metadata>
+
+    <!-- ... and so on -->
+</record>
+]]></screen>
+
+        As you can see, there isn't much to it. There are really only a few
+	important elements to this file.
+      </para>
+
+      <para>
+        Elements should belong to the namespace
+        http://www.indexdata.com/pazpar2/1.0. If the root node contains the
+        attribute 'mergekey', then every record that generates the same
+        merge key (normalized for case differences, white space, and
+        truncation) will be joined into a cluster. In other words, you
+        decide how records are merged. If you don't include a merge key,
+        records are never merged. The 'metadata' elements provide the meat
+        of the record -- the content. The 'type' attribute is used to
+        match each element against processing rules that determine what
+        happens to the data element next.
+      </para>
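+
+      <para>
+        As an illustration only, a stylesheet for the first step might map a
+        few MARCXML fields onto the internal representation shown above. This
+        is not a stylesheet shipped with pazpar2: it assumes that the MARCXML
+        record element is the document root, and the choice of fields (245$a
+        for the title, 100$a for the author) is a hypothetical example. The
+        merge key is built the same way as in the example record above, so
+        records with the same title and author would be clustered.
+        <screen><![CDATA[
+<xsl:stylesheet version="1.0"
+    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
+    xmlns:marc="http://www.loc.gov/MARC21/slim"
+    xmlns:pz="http://www.indexdata.com/pazpar2/1.0"
+    exclude-result-prefixes="marc">
+
+  <xsl:output method="xml" indent="yes"/>
+
+  <!-- Assumes a single MARCXML record as the document element -->
+  <xsl:template match="/marc:record">
+    <!-- Hypothetical field choice: 245$a as title, 100$a as author -->
+    <xsl:variable name="title"
+        select="marc:datafield[@tag='245']/marc:subfield[@code='a']"/>
+    <xsl:variable name="author"
+        select="marc:datafield[@tag='100']/marc:subfield[@code='a']"/>
+    <pz:record>
+      <!-- Records that produce the same merge key end up in one cluster -->
+      <xsl:attribute name="mergekey">
+        <xsl:value-of select="concat('title ', $title, ' author ', $author)"/>
+      </xsl:attribute>
+      <xsl:for-each select="$title">
+        <pz:metadata type="title"><xsl:value-of select="."/></pz:metadata>
+      </xsl:for-each>
+      <xsl:for-each select="$author">
+        <pz:metadata type="author"><xsl:value-of select="."/></pz:metadata>
+      </xsl:for-each>
+    </pz:record>
+  </xsl:template>
+
+</xsl:stylesheet>
+]]></screen>
+        Each 'type' value emitted here must then have a matching 'metadata'
+        definition in the configuration file, as described next.
+      </para>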
+
+      <para>
+        The next processing step is the extraction of metadata from the
+	intermediate representation of the record. This is governed by the
+	'metadata' elements in the 'service' section of the configuration
+	file. See <xref linkend="config-server"/> for details. The metadata
+	in the retrieval record ultimately drives merging, sorting, ranking,
+	the extraction of browse facets, and display, all configurable.
+      </para>
+    </section>
+
+    <section id="client">
+      <title>Client development</title>
+      <para>
+        You can use pazpar2 from any environment that allows you to use
+	webservices. The initial goal of the software was to support
+	Ajax-based applications, but there literally are no limits to what
+	you can do. You can use pazpar2 from Javascript, Flash, Java, etc.,
+	on the browser side, and from any development environment on the
+	server side, and you can pass session tokens and record IDs freely
+	around between these environments to build sophisticated applications.
+	Use your imagination.
+      </para>
+
+      <para>
+        The webservice API of pazpar2 is described in detail in <xref
+	linkend="pazpar2_protocol"/>.
+      </para>
+
+      <para>
+        In brief, you use the 'init' command to create a session, a
+	temporary workspace which carries information about the current
+	search. You start a new search using the 'search' command. Once the
+	search has been started, you can follow its progress using the
+	'stat', 'bytarget', 'termlist', or 'show' commands. Detailed records
+	can be fetched using the 'record' command.
+      </para>
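+
+      <para>
+        A typical session might therefore consist of a sequence of requests
+        like the one sketched below, each sent to search.pz2. Only the
+        'command' argument and the command names are taken from this guide;
+        the 'session', 'query', and 'id' argument names, the placeholder
+        values, and the shape of the init response are assumptions here and
+        should be checked against <xref linkend="pazpar2_protocol"/>.
+        <screen><![CDATA[
+<!-- 1. Create a session: GET /search.pz2?command=init -->
+<!--    Assumed response shape; SESSION-ID is a placeholder -->
+<init>
+  <status>OK</status>
+  <session>SESSION-ID</session>
+</init>
+
+<!-- 2. Start a search in that session (argument names assumed):
+     GET /search.pz2?command=search&session=SESSION-ID&query=shining -->
+
+<!-- 3. Poll while the search continues in the background:
+     GET /search.pz2?command=stat&session=SESSION-ID
+     GET /search.pz2?command=show&session=SESSION-ID
+     GET /search.pz2?command=termlist&session=SESSION-ID -->
+
+<!-- 4. Fetch one merged record in detail (RECORD-ID is a placeholder):
+     GET /search.pz2?command=record&session=SESSION-ID&id=RECORD-ID -->
+]]></screen>
+        Because the session is identified in each request, the same session
+        can be used from browser code and from server-side code.
+      </para>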
+    </section>
+  </chapter> <!-- Using pazpar2 -->
+
+  <reference id="reference">
+   <title>Reference</title>
+   <partintro>
+    <para>
+     The material in this chapter is drawn directly from the individual
+     manual entries.
+    </para>
+   </partintro>
+   &manref;
+  </reference>
 </book>
 
  <!-- Keep this comment at the end of the file
diff --git a/doc/pazpar2_conf.xml b/doc/pazpar2_conf.xml
index 6deafa2..b8e86ea 100644
--- a/doc/pazpar2_conf.xml
+++ b/doc/pazpar2_conf.xml
@@ -8,7 +8,7 @@
      <!ENTITY % common SYSTEM "common/common.ent">
      %common;
 ]>
-<!-- $Id: pazpar2_conf.xml,v 1.2 2007-01-12 15:31:30 adam Exp $ -->
+<!-- $Id: pazpar2_conf.xml,v 1.3 2007-01-19 18:28:08 quinn Exp $ -->
 <refentry id="pazpar2_conf">
  <refentryinfo>
   <productname>Pazpar2</productname>
@@ -31,8 +31,284 @@
  </refsynopsisdiv>
  
  <refsect1><title>DESCRIPTION</title>
-  <para></para>
+   <para>
+     The pazpar2 configuration file, together with any referenced XSLT files,
+     governs pazpar2's behavior as a client, and controls the normalization and
+     extraction of data elements from incoming result records, for the
+     purposes of merging, sorting, facet analysis, and display.
+    </para>
+
+    <para>
+      The file is specified using the option -f on the pazpar2 command line.
+      There is not presently a way to reload the configuration file without
+      restarting pazpar2, although this will most likely be added some time
+      in the future.
+    </para>
  </refsect1>
+
+ <refsect1><title>FORMAT</title>
+   <para>
+     The configuration file is XML-structured. It must be valid XML. All
+     elements specific to pazpar2 should belong to the namespace
+     "http://www.indexdata.com/pazpar2/1.0" (this is assumed in the
+     following examples). The root element is named 'pazpar2'. Under the
+     root element are a number of elements which group categories of
+     information. The categories are described below.
+    </para>
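+
+    <para>
+      The overall layout of a configuration file might look like the
+      following skeleton. It uses only the elements described below; the
+      port numbers and host name are placeholders, and the skeleton is a
+      sketch of the structure rather than a tested working configuration.
+    </para>
+    <screen><![CDATA[
+<pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
+  <server>
+    <listen port="9004"/>                 <!-- placeholder port -->
+    <proxy host="localhost" port="8080"/> <!-- placeholder webserver -->
+    <service>
+      <!-- one metadata element per data element in the internal record -->
+    </service>
+  </server>
+  <queryprofile/>                         <!-- currently ignored -->
+  <retrievalprofile>
+    <!-- requestsyntax, nativesyntax -->
+  </retrievalprofile>
+</pazpar2>
+]]></screen>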
+
+    <refsect2 id="config-server"><title>server</title>
+      <para>
+        This section governs overall behavior of the client. The data
+	elements are described below.
+      </para>
+      <variablelist> <!-- level 1 -->
+        <varlistentry>
+	  <term>listen</term>
+	  <listitem>
+	    <para>
+	      Configures the webservice -- this controls how you can connect
+	      to pazpar2 from your browser or server-side code. The
+	      attributes 'host' and 'port' control the binding of the
+	      server. The 'host' attribute can be used to bind the server to
+	      a secondary IP address of your system, enabling you to run
+	      pazpar2 on port 80 alongside a conventional web server. You
+	      can override this setting on the command line using the option -h.
+	    </para>
+	  </listitem>
+	</varlistentry>
+
+	<varlistentry>
+	  <term>proxy</term>
+	  <listitem>
+	    <para>
+	      If this item is given, pazpar2 will forward all incoming HTTP
+	      requests that do not contain the filename 'search.pz2' to the
+	      host and port specified using the 'host' and 'port'
+	      attributes. This functionality is crucial if you wish to use
+	      pazpar2 in conjunction with browser-based code (JS, Flash,
+	      applets, etc.) which operates in a security sandbox. Such code
+	      can only connect to the same server from which the enclosing
+	      HTML page originated. Pazpar2's proxy functionality enables you
+	      to host all of the main pages (plus images, CSS, etc) of your
+	      application on a conventional webserver, while efficiently
+	      processing webservice requests for metasearch status, results,
+	      etc.
+	    </para>
+	  </listitem>
+	</varlistentry>
+
+	<varlistentry>
+	  <term>service</term>
+	  <listitem>
+	    <para>
+	      This nested element controls the behavior of pazpar2 with
+	      respect to your data model. In pazpar2, incoming records are
+	      normalized, using XSLT, into an internal representation (see
+	      the <link
+	      linkend="config-retrievalprofile">retrievalprofile</link> section).
+	      The 'service' section controls the further processing and
+	      extraction of data from the internal representation, primarily
+	      through the 'metadata' sub-element.
+	    </para>
+
+	    <variablelist> <!-- Level 2 -->
+              <varlistentry><term>metadata</term>
+                <listitem>
+                  <para>
+                    One of these elements is required for every data element in
+                    the internal representation of the record (see
+                    <xref linkend="data_model"/>). It governs
+                    subsequent processing as pertains to sorting, relevance
+                    ranking, merging, and display of data elements. It supports
+                    the following attributes:
+                  </para>
+
+                  <variablelist> <!-- level 3 -->
+                    <varlistentry><term>name</term>
+                      <listitem>
+                        <para>
+                          This is the name of the data element. It is matched
+                          against the 'type' attribute of the 'metadata' element
+                          in the normalized record. A warning is produced if
+                          metadata elements with an unknown name are found in the
+                          normalized record. This name is also used to represent
+                          data elements in the records returned by the
+                          webservice API, and to name sort lists and browse
+                          facets.
+                        </para>
+                      </listitem>
+                    </varlistentry>
+
+                    <varlistentry><term>type</term>
+                      <listitem>
+                        <para>
+                          The type of data element. This value governs any
+                          normalization or special processing that might take
+                          place on an element. Possible values are 'generic'
+                          (basic string) and 'year' (a range is computed if
+                          multiple years are found in the record). Note: This
+                          list is likely to grow in the future.
+                        </para>
+                      </listitem>
+                    </varlistentry>
+
+                    <varlistentry><term>brief</term>
+                      <listitem>
+                        <para>
+                          If this is set to 'yes', then the data element is
+                          included in brief records in the webservice API. Note
+                          that this only makes sense for metadata elements that
+                          are merged (see below). The default value is 'no'.
+                        </para>
+                      </listitem>
+                    </varlistentry>
+
+                    <varlistentry><term>sortkey</term>
+                      <listitem>
+                        <para>
+                          Specifies that this data element is to be used for
+                          sorting. The possible values are 'numeric' (numeric
+                          value), 'skiparticle' (string; skip common, leading
+                          articles), and 'no' (no sorting). The default value is
+                          'no'.
+                        </para>
+                      </listitem>
+                    </varlistentry>
+
+                    <varlistentry><term>rank</term>
+                      <listitem>
+                        <para>
+                          Specifies that this element is to be used to help rank
+                          records against the user's query (when ranking is
+                          requested). The value is an integer, used as a
+                          multiplier against the basic TF*IDF score. A value of
+                          1 is the base; higher values give additional weight to
+                          elements of this type. The default is '0', which
+                          excludes this element from the rank calculation.
+                        </para>
+                      </listitem>
+                    </varlistentry>
+
+                    <varlistentry><term>termlist</term>
+                      <listitem>
+                        <para>
+                          Specifies that this element is to be used as a
+                          termlist, or browse facet. Values are tabulated from
+                          incoming records, and a list of the most frequent
+                          values (with their associated frequencies) is made
+                          available to the client through the webservice API.
+                          The possible values are 'yes' and 'no' (default).
+                        </para>
+                      </listitem>
+                    </varlistentry>
+
+                    <varlistentry><term>merge</term>
+                      <listitem>
+                        <para>
+                          This governs whether and how elements are extracted
+                          from individual records and merged into cluster
+                          records. The possible values are: 'unique' (include
+                          all unique elements), 'longest' (include only the
+                          longest element, by string length), 'range' (calculate
+                          a range of values across all matching records), 'all'
+                          (include all elements), or 'no' (don't merge; this is
+                          the default).
+                        </para>
+                      </listitem>
+                    </varlistentry>
+                  </variablelist> <!-- attributes to metadata -->
+                </listitem>
+              </varlistentry>
+	    </variablelist>     <!-- Data elements in service directive -->
+	  </listitem>
+	</varlistentry>
+      </variablelist>           <!-- Data elements in server directive -->
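+
+      <para>
+        For example, a server section might combine these elements as
+        follows. The port numbers and the particular metadata definitions
+        are illustrative assumptions: 'title' and 'author' correspond to the
+        internal record example in <xref linkend="data_model"/>, while
+        'date' is a hypothetical element added to show the 'year' type.
+      </para>
+      <screen><![CDATA[
+<server>
+  <!-- webservice: browser or server-side code talks to pazpar2 here -->
+  <listen port="9004"/>
+  <!-- every request not for search.pz2 is forwarded to this webserver -->
+  <proxy host="localhost" port="8080"/>
+  <service>
+    <!-- merged, shown in brief records, sortable, weighted for ranking -->
+    <metadata name="title" type="generic" brief="yes"
+              sortkey="skiparticle" merge="longest" rank="2"/>
+    <!-- all distinct authors are kept; also offered as a browse facet -->
+    <metadata name="author" type="generic" brief="yes"
+              merge="unique" termlist="yes" rank="1"/>
+    <!-- hypothetical element: a range of years across the cluster -->
+    <metadata name="date" type="year" merge="range"/>
+  </service>
+</server>
+]]></screen>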
+    </refsect2>
+
+    <refsect2 id="config-queryprofile"><title>queryprofile</title>
+      <para>
+        At the moment, this directive is ignored; there is one global
+	CCL-mapping file which governs the mapping of queries to Z39.50
+	type-1. This file is located in etc/default.bib. This will change
+	shortly.
+      </para>
+    </refsect2>
+
+    <refsect2 id="config-retrievalprofile"><title>retrievalprofile</title>
+      <para>
+	Note: In the present version, there is a single retrieval
+	profile. However, in a future release, it will be possible to
+	associate unique retrieval profiles with different targets, or to
+	generate retrieval profiles using XSLT from the ZeeRex description of
+	a target.
+      </para>
+      
+      <para>
+        The following data elements are recognized for the retrievalprofile
+	directive:
+      </para>
+      
+      <variablelist>
+        <varlistentry><term>requestsyntax</term>
+	  <listitem>
+	    <para>
+	      This element specifies the record syntax to request from the
+	      target when retrieving records. It only makes sense for
+	      Z39.50-type targets.
+	    </para>
+	  </listitem>
+	</varlistentry>
+
+	<varlistentry><term>nativesyntax</term>
+	  <listitem>
+	    <para>
+	      This element specifies the native syntax and encoding of the
+	      result records. The default is XML. The following attributes
+	      are defined:
+	    </para>
+	    <variablelist>
+	      <varlistentry><term>name</term>
+	        <listitem>
+		  <para>
+		    The name of the syntax. Currently recognized values are
+		    'iso2709' (MARC), and 'xml'.
+		  </para>
+		</listitem>
+	      </varlistentry>
+
+	      <varlistentry><term>format</term>
+	        <listitem>
+		  <para>
+		    The format, or schema, to be expected. Default is
+		    'marc21'.
+		  </para>
+		</listitem>
+	      </varlistentry>
+
+	      <varlistentry><term>encoding</term>
+	        <listitem>
+		  <para>
+		    The encoding of the response record. Typical values for
+		    MARC records are 'marc8' (general MARC-8), 'marc8s'
+		    (MARC-8, but maps to precomposed UTF-8 characters, more
+		    suitable for use in web browsers), and 'latin1'.
+		  </para>
+		</listitem>
+	      </varlistentry>
+
+	      <varlistentry><term>mapto</term>
+	        <listitem>
+		  <para>
+		    Specifies the XML flavor that MARC records are mapped to.
+		    The default is 'marcxml'; 'marcxchange' is also supported
+		    and is useful for Danish DANMARC records.
+		  </para>
+		</listitem>
+	      </varlistentry>
+	    </variablelist> <!-- parameters to nativesyntax directive -->
+	  </listitem>
+	</varlistentry>
+      </variablelist> <!-- sub-elements in retrievalprofile -->
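+      <para>
+        As an illustration only, a retrievalprofile for a Z39.50 target
+        that returns MARC21 records encoded in MARC-8 might look like the
+        sketch below. It is assembled from the elements and attributes
+        described above; the requestsyntax value 'marc21' is an assumption
+        rather than a documented value, and the position of the directive
+        within the configuration file is not shown.
+      </para>
+      <screen><![CDATA[
+<retrievalprofile>
+  <!-- Record syntax to request from the target (assumed value) -->
+  <requestsyntax>marc21</requestsyntax>
+  <!-- ISO2709 MARC21 records in MARC-8, converted to precomposed
+       UTF-8 and mapped to MARCXML -->
+  <nativesyntax name="iso2709" format="marc21" encoding="marc8s"
+                mapto="marcxml"/>
+</retrievalprofile>
+]]></screen>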
+    </refsect2>
+
+  </refsect1>
  
  <refsect1><title>OPTIONS</title>
   <para></para>
diff --git a/doc/pazpar2_protocol.xml b/doc/pazpar2_protocol.xml
index 537d98d..404f6c3 100644
--- a/doc/pazpar2_protocol.xml
+++ b/doc/pazpar2_protocol.xml
@@ -8,7 +8,7 @@
      <!ENTITY % common SYSTEM "common/common.ent">
      %common;
 ]>
-<!-- $Id: pazpar2_protocol.xml,v 1.2 2007-01-12 15:21:04 adam Exp $ -->
+<!-- $Id: pazpar2_protocol.xml,v 1.3 2007-01-19 18:28:08 quinn Exp $ -->
 <refentry id="pazpar2_protocol">
  <refentryinfo>
   <productname>Pazpar2</productname>
@@ -27,12 +27,13 @@
  <refsect1><title>DESCRIPTION</title>
   <para>
    Webservice requests are any that refer to filename "search.pz2". Arguments
-   are GET-style parameters. Argument 'command' is required and specifies
-   command. Any request not recognized as a webservice request as described,
-   are forwarded to the HTTP server specified in configuration.
-   This way, the webserver can host the user interface (itself dynamic
-   or static HTML), and AJAX-style calls can be used from JS to interact
-   with the search logic. 
+   are GET-style parameters. The 'command' argument is always required and
+   specifies the operation to perform. Any request not recognized as a
+   webservice request is forwarded to the HTTP server specified by the
+   proxy setting in the configuration.
+   This way, a regular web server can host the user interface (itself dynamic
+   or static HTML), while AJAX-style calls from JavaScript (or any other
+   client-side scripting environment) interact with the search logic in pazpar2.
   </para>
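+  <para>
+   For example, the first request below refers to search.pz2 and is
+   handled by pazpar2 itself, while the second (index.html, a
+   hypothetical user-interface page used only for illustration) is
+   forwarded to the web server named by the proxy setting:
+   <screen><![CDATA[
+search.pz2?session=2044502273&command=stat   -- handled by pazpar2
+index.html                                   -- hypothetical page, forwarded
+]]></screen>
+  </para>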
   <para>
    Each command is described in sub sections to follow.
@@ -108,7 +109,7 @@
    <para>
     Example:
     <screen><![CDATA[
-search.pz2?session=2044502273&command=search&query=computer
+search.pz2?session=2044502273&command=search&query=computer+science
 ]]>
      </screen>
     Response:
@@ -123,7 +124,7 @@ search.pz2?session=2044502273&command=search&query=computer
   <refsect2 id="command-stat">
    <title>stat</title>
    <para>
-    Provides status of ongoing search. Parameters:
+    Provides status information about an ongoing search. Parameters:
 
     <variablelist>
      <varlistentry>
@@ -147,7 +148,7 @@ search.pz2?session=2044502273&command=stat
 <stat>
   <activeclients>3</activeclients>
   <hits>7</hits>                   -- Total hitcount
-  <records>7</records>             -- Total number of records fetched
+  <records>7</records>             -- Total number of records fetched in last query
   <clients>1</clients>             -- Total number of associated clients
   <unconnected>0</unconnected>     -- Number of disconnected clients
   <connecting>0</connecting>       -- Number of clients in connecting state
@@ -180,7 +181,7 @@ search.pz2?session=2044502273&command=stat
       <term>start</term>
       <listitem>
        <para>First record to show - 0-indexed.</para>
-      </listitem>
+      </listitem>
      </varlistentry>
      
      <varlistentry>
@@ -196,33 +197,47 @@ search.pz2?session=2044502273&command=stat
       <term>block</term>
       <listitem>
        <para>
-	If block is set, the command will hang until there are records ready
+	If block is set to 1, the command will wait until there are records ready
 	to display. Use this to show first records rapidly without
 	requiring rapid polling.
        </para>
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+       <term>sort</term>
+       <listitem>
+         <para>
+	   Specifies sort criteria. The argument is a comma-separated list
+	   (no whitespace allowed) of sort fields, with the highest-priority
+	   field first. A sort field may be followed by a colon and the
+	   number '0' (decreasing order) or '1' (increasing order) for that
+	   field. If no order is given, decreasing order is the default.
+	 </para>
+	</listitem>
+      </varlistentry>
+
     </variablelist>
    </para>
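+   <para>
+    For instance, assuming that 'date' is defined as a sort field
+    alongside 'title' (the 'date' field is hypothetical here), the
+    following request waits until records are available and sorts them
+    by date in decreasing order, breaking ties by title in increasing
+    order:
+    <screen><![CDATA[
+search.pz2?session=2044502273&command=show&start=0&num=10&block=1&sort=date:0,title:1
+]]></screen>
+   </para>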
    <para>
     Example:
     <screen><![CDATA[
-search.pz2?session=2044502273&command=show&start=0&num=2
+search.pz2?session=2044502273&command=show&start=0&num=2&sort=title:1
 ]]></screen>
     Output:
     <screen><![CDATA[
 <show>
   <status>OK</status>
-  <activeclients>3</activeclients>
-  <merged>6</merged>
-  <total>7</total>
-  <start>0</start>
-  <num>2</num>
+  <activeclients>3</activeclients>     -- How many clients are still working
+  <merged>6</merged>                   -- Number of merged records
+  <total>7</total>                     -- Sum of the hit counts of all targets
+  <start>0</start>                     -- The start number you requested
+  <num>2</num>                         -- Number of records retrieved
   <hit>
     <md-title>How to program a computer, by Jack Collins</md-title>
-    <count>2</count> <!-- Number of merged records -->
-    <recid>6</recid>
+    <count>2</count>                   -- Number of records merged into this hit
+    <recid>6</recid>                   -- ID of this record (for the record command)
   </hit>
   <hit>
     <md-title>
@@ -243,6 +258,15 @@ search.pz2?session=2044502273&command=show&start=0&num=2
 
     <variablelist>
      <varlistentry>
+      <term>session</term>
+      <listitem>
+       <para>
+	Session ID
+	</para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term>id</term>
       <listitem>
        <para>
@@ -326,14 +350,61 @@ Output:
     <screen><![CDATA[
 <term>
   <name>library2.mcmaster.ca</name>
-  <frequency>11734</frequency>
-  <state>Client_Idle</state>
-  <diagnostic>0</diagnostic>
+  <frequency>11734</frequency>         -- Number of hits
+  <state>Client_Idle</state>           -- See the description of 'bytarget' below
+  <diagnostic>0</diagnostic>           -- Z39.50 diagnostic code
 </term>
 ]]></screen>
     </para>
   </refsect2>
 
+
+  <refsect2 id="command-bytarget">
+   <title>bytarget</title>
+   <para>
+    Returns information about the status of each active client. Parameters:
+
+    <variablelist>
+     <varlistentry>
+      <term>session</term>
+      <listitem>
+       <para>
+          Session ID.
+	</para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+   <para> 
+    Example:
+    <screen><![CDATA[
+search.pz2?session=605047297&command=bytarget
+]]></screen>
+
+    Example output:
+    
+    <screen><![CDATA[
+<bytarget>
+  <status>OK</status>
+  <target>
+    <id>z3950.loc.gov/voyager/</id>
+    <hits>10000</hits>
+    <diagnostic>0</diagnostic>
+    <records>65</records>
+    <state>Client_Presenting</state>
+  </target>
+  <!-- ... more target nodes below as necessary -->
+</bytarget>
+]]></screen>
+
+   The following client states are defined: Client_Connecting,
+   Client_Connected, Client_Idle, Client_Initializing, Client_Searching,
+   Client_Presenting, Client_Error, Client_Failed, Client_Disconnected,
+   Client_Stopped.
+   </para>
+  </refsect2>
+
  </refsect1>
 </refentry>