doc/book.xml

   1 <?xml version="1.0" standalone="no"?>
   2 <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1//EN"
   3     "http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd"
   4 [
   5      <!ENTITY % local SYSTEM "local.ent">
   6      %local;
   7      <!ENTITY % entities SYSTEM "entities.ent">
   8      %entities;
   9      <!ENTITY % common SYSTEM "common/common.ent">
  10      %common;
  11 ]>
  12 <!-- $Id: book.xml,v 1.6 2007-01-19 21:50:02 adam Exp $ -->
  13 <book id="book">
  14  <bookinfo>
  15   <title>Pazpar2 - User's Guide and Reference</title>
  16   <author>
  17    <firstname>Sebastian</firstname><surname>Hammer</surname>
  18   </author>
  19   <releaseinfo>&version;</releaseinfo>
  20   <copyright>
  21    <year>&copyright-year;</year>
  22    <holder>Index Data</holder>
  23   </copyright>
  24   <abstract>
  25    <simpara>
  26     Pazpar2 is a high-performance, user interface-independent, data
  27     model-independent metasearching
  28     middleware featuring merging, relevance ranking, record sorting,
  29     and faceted results.
  30    </simpara>
  31    <simpara>
  32     This document is a guide and reference to Pazpar2 version &version;.
  33    </simpara>
  34    <simpara>
  35     <inlinemediaobject>
  36      <imageobject>
  37       <imagedata fileref="common/id.png" format="PNG"/>
  38    </imageobject>
  39     <imageobject>
  40      <imagedata fileref="common/id.eps" format="EPS"/>
  41    </imageobject>
  42    </inlinemediaobject>
  43    </simpara>
  44   </abstract>
  45   </bookinfo>
  46
  47   <chapter id="introduction">
  48    <title>Introduction</title>
  49    <para>
  50      Pazpar2 is a stand-alone metasearch client with a webservice API, designed
  51      to be used either from a browser-based client (JavaScript, Flash, Java,
  52      etc.), from server-side code, or any combination of the two.
  53      Pazpar2 is a highly optimized client designed to
  54      search many resources in parallel. It implements record merging,
  55      relevance-ranking and sorting by arbitrary data content, and facet
  56      analysis for browsing purposes. It is designed to be data model
  57      independent, and is capable of working with MARC, DublinCore, or any
  58      other XML-structured response format -- XSLT is used to normalize and extract
  59      data from retrieval records for display and analysis. It can be used
  60      against any server which supports the Z39.50 protocol. Proprietary
  61      backend modules can be used to support a large number of other protocols
  62      (please contact Index Data for further information about this).
  63    </para>
  64    <para>
  65      Additional functionality, such as user management and attractive
  66      displays, is expected to be implemented by applications that use
  67      pazpar2. Pazpar2 is user interface independent.
  68      Its functionality is exposed through a simple REST-style webservice API,
  69      designed to be simple to use from an Ajax-enabled browser, Flash
  70      animation, Java applet, etc., or from a higher-level server-side language
  71      like PHP or Java. Because session information can be shared between
  72      browser-based logic and your server-side scripting, there is tremendous
  73      flexibility in how you implement your business logic on top of pazpar2.
  74    </para>
  75    <para>
  76      Once you launch a search in pazpar2, the operation continues behind the
  77      scenes. Pazpar2 connects to servers, carries out searches, and
  78      retrieves, deduplicates, and stores results internally. Your application
  79      code may periodically inquire about the status of an ongoing operation,
  80      and ask to see records or other result set facets. Results become
  81      available immediately, and it is easy to build end-user interfaces which
  82      feel extremely responsive, even when searching more than 100 servers
  83      concurrently.
  84    </para>
  85    <para>
  86      Pazpar2 is designed to be highly configurable. Incoming records are
  87      normalized to XML/UTF-8, and then further normalized using XSLT to a
  88      simple internal representation that is suitable for analysis. By
  89      providing XSLT stylesheets for different kinds of result records, you
  90      can tune pazpar2 to work against different kinds of information
  91      retrieval servers. Finally, metadata is extracted, in a configurable
  92      way, from this internal record, to support display, merging, ranking,
  93      result set facets, and sorting. Pazpar2 is not bound to a specific model
  94      of metadata, such as DublinCore or MARC -- by providing the right
  95      configuration, it can work with a number of different kinds of data in
  96      support of many different applications.
  97    </para>
  98    <para>
  99      Pazpar2 is designed to be efficient and scalable. You can set it up to
 100      search several hundred targets in parallel, or you can use it to support
 101      hundreds of concurrent users. It is implemented with the same attention
 102      to performance and economy that we use in our indexing engines, so that
 103      you can focus on building your application, without worrying about the
 104      details of metasearch logic. You can devote all of your attention to
 105      usability and let pazpar2 do what it does best -- metasearch.
 106     </para>
 107     <para>
 108       If you wish to connect to commercial or other databases which do not
 109       support open standards, please contact Index Data. We have a licensing
 110       agreement with a third party vendor which will enable pazpar2 to access
 111       thousands of online databases, in addition to the vast number of catalogs
 112       and online services that support the Z39.50 protocol.
 113     </para>
 114     <para>
 115       Pazpar2 is our attempt to re-think the traditional paradigms for
 116       implementing and deploying metasearch logic, with an uncompromising
 117       approach to performance, and attempting to make maximum use of the
 118       capabilities of modern browsers. The demo user interface that
 119       accompanies the distribution is but one example. If you think of new
 120       ways of using pazpar2, we hope you'll share them with us, and if we
 121       can provide assistance with regard to training, design, programming,
 122       integration with different backends, hosting, or support, please don't
 123       hesitate to contact us. If you'd like to see functionality in pazpar2
 124       that is not there today, please don't hesitate to contact us. It may
 125       already be in our development pipeline, or there might be a
 126       possibility for you to help out by sponsoring development time or
 127       code. Either way, get in touch and we will give you straight answers.
 128     </para>
 129     <para>
 130       Enjoy!
 131     </para>
 132   </chapter>
 133
 134
 135   <chapter id="license">
 136    <title>Pazpar2 License</title>
 137    <para>To be decided and written.</para>
 138   </chapter>
 139
 140   <chapter id="installation">
 141    <title>Installation</title>
 142    <para>
 143     Pazpar2 depends on the following tools/libraries:
 144     <variablelist>
 145      <varlistentry><term><ulink url="&url.yaz;">YAZ</ulink></term>
 146       <listitem>
 147        <para>
 148         The popular Z39.50 toolkit for the C language. YAZ must be
 149         compiled with Libxml2/Libxslt support.
 150        </para>
 151       </listitem>
 152      </varlistentry>
 153     </variablelist>
 154    </para>
 155    <para>
 156     In order to compile Pazpar2 an ANSI C compiler is
 157     required. The requirements should be the same as for YAZ.
 158    </para>
 159
 160    <section id="installation.unix">
 161     <title>Installation on Unix (from Source)</title>
 162     <para>
 163      Here is a quick step-by-step guide on how to compile the
 164      tools that Pazpar2 uses. Most systems provide binary packages for at
 165      least some of the required tools. If, for example, Libxml2/Libxslt are
 166      already installed as development packages, use these.
 167     </para>
 168
 169     <para>
 170      Ensure that the development libraries and header files are
 171      available on your system before compiling Pazpar2. For installation
 172      of YAZ, refer to the YAZ installation chapter.
 173     </para>
 174     <screen>
 175      gunzip -c pazpar2-version.tar.gz|tar xf -
 176      cd pazpar2-version
 177      ./configure
 178      make
 179      su
 180      make install
 181     </screen>
 182    </section>
 183
 184    <section id="installation.debian">
 185     <title>Installation on Debian GNU/Linux</title>
 186     <para>
 187      All dependencies for Pazpar2 are available as
 188      <ulink url="&url.debian;">Debian</ulink>
 189      packages for the sarge (stable in 2005) and etch (testing in 2005)
 190      distributions.
 191     </para>
 192     <para>
 193      The procedure for Debian-based systems, such as
 194      <ulink url="&url.ubuntu;">Ubuntu</ulink>, is probably similar.
 195     </para>
 196     <screen>
 197      apt-get install libyaz-dev
 198     </screen>
 199     <para>
 200      With these packages installed, the usual configure + make
 201      procedure can be used for Pazpar2 as outlined in
 202      <xref linkend="installation.unix"/>.
 203     </para>
 204    </section>
 205   </chapter>
 206
 207   <chapter id="using">
 208     <title>Using pazpar2</title>
 209     <para>
 210       This chapter provides a general introduction to the use and deployment of pazpar2.
 211     </para>
 212
 213     <section id="architecture">
 214       <title>Pazpar2 and your systems architecture</title>
 215       <para>
 216         Pazpar2 is designed to provide asynchronous, behind-the-scenes
 217         metasearching functionality to your application, exposing this
 218         functionality using a simple webservice API that can be accessed
 219         from any number of development environments. In particular, it is
 220         possible to combine pazpar2 either with your server-side dynamic
 221         website scripting, with scripting or code running in the browser, or
 222         with any combination of the two. Pazpar2 is an excellent tool for
 223         building advanced, Ajax-based user interfaces for metasearch
 224         functionality, but it isn't a requirement -- you can choose to use
 225         pazpar2 entirely as a backend to your regular server-side scripting.
 226         When you do use pazpar2 in conjunction
 227         with browser scripting (JavaScript/Ajax, Flash, applets, etc.), there are
 228         special considerations.
 229       </para>
 230
 231       <para>
 232         Pazpar2 implements a simple but efficient HTTP server, and it is
 233         designed to interact directly with scripting running in the browser
 234         for the best possible performance, and to limit overhead when
 235         several browser clients generate numerous webservice requests.
 236         However, it is still desirable to use a conventional webserver,
 237         such as Apache, to serve up graphics, HTML documents, and
 238         server-side scripting. Because the security sandbox environment of
 239         most browser-side programming environments only allows communication
 240         with the server from which the enclosing HTML page or object
 241         originated, pazpar2 is designed so that it can act as a transparent
 242         proxy in front of an existing webserver (see <xref
 243         linkend="pazpar2_conf"/> for details). In this mode, all regular
 244         HTTP requests are transparently passed through to your webserver,
 245         while pazpar2 only intercepts search-related webservice requests.
 246       </para>
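      <para>
        As a rough illustration of the proxy arrangement described above,
        the following configuration fragment is a minimal sketch, not a
        complete or authoritative configuration. The element and attribute
        names, the host name, and the port numbers are assumptions made for
        the purpose of the example; check them against
        <xref linkend="pazpar2_conf"/> before relying on them.
        <screen><![CDATA[
<!-- Sketch only: element names and values are assumptions -->
<pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
  <server>
    <!-- Port on which pazpar2 itself accepts HTTP requests -->
    <listen port="9004"/>
    <!-- Requests that are not search-related webservice calls are passed
         through to the regular webserver, assumed here to run on the
         same machine on port 8080 -->
    <proxy host="localhost" port="8080"/>
    <!-- service configuration omitted -->
  </server>
</pazpar2>
]]></screen>
        With a setup along these lines, the browser only ever talks to one
        host and port, which keeps browser-side scripting within its
        security sandbox.
      </para>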
 247
 248       <para>
 249         If you want to expose your combined service on port 80, you can
 250         run your regular webserver on a different port, on a different
 251         server, or on a different IP address associated with the same server.
 252       </para>
 253
 254       <para>
 255         Sometimes, it may be necessary to implement functionality on your
 256         regular webserver that makes use of search results, for example to
 257         implement data import functionality, emailing results, history
 258         lists, personal citation lists, interlibrary loan functionality,
 259         etc. Fortunately, it is simple to exchange information between
 260         pazpar2, your browser scripting, and backend server-side scripting.
 261         You can send a session ID and possibly a record ID from your browser
 262         code to your server code, and from there use pazpar2's webservice API
 263         to access result sets or individual records. You could even 'hide'
 264         all of pazpar2's functionality behind your own API implemented on
 265         the server-side, and access that from the browser or elsewhere. The
 266         possibilities are just about endless.
 267       </para>
 268     </section>
 269
 270     <section id="data_model">
 271       <title>Your data model</title>
 272       <para>
 273         Pazpar2 does not have a preconceived model of what makes up a data
 274         model. There is no assumption that records have specific fields or
 275         that they are organized in any particular way. The only assumption
 276         is that data comes packaged in a form that the software can work
 277         with (presently, that means XML or MARC), and that you can provide
 278         the necessary information to massage it into pazpar2's internal
 279         record abstraction.
 280       </para>
 281
 282       <para>
 283         Handling retrieval records in pazpar2 is a two-step process. First,
 284         you decide which data elements of the source record you are
 285         interested in, and you specify any desired massaging or combining of
 286         elements using an XSLT stylesheet (MARC records are automatically
 287         normalized to MARCXML before this step). If desired, you can run
 288         multiple XSLT stylesheets in series to accomplish this, but the
 289         output of the last one should be a representation of the record in a
 290         schema that pazpar2 understands.
 291       </para>
 292
 293       <para>
 294         The intermediate, internal representation of the record looks like
 295         this:
 296         <screen><![CDATA[
 297 <record   xmlns="http://www.indexdata.com/pazpar2/1.0"
 298           mergekey="title The Shining author King, Stephen">
 299
 300     <metadata type="title">The Shining</metadata>
 301
 302     <metadata type="author">King, Stephen</metadata>
 303
 304     <metadata type="kind">ebook</metadata>
 305
 306     <!-- ... and so on -->
 307 </record>
 308 ]]></screen>
 309
 310         As you can see, there isn't much to it. There are really only a few
 311         important elements to this file.
 312       </para>
 313
 314       <para>
 315         Elements should belong to the namespace
 316         http://www.indexdata.com/pazpar2/1.0. If the root node contains the
 317         attribute 'mergekey', then every record that generates the same
 318         merge key (normalized for case differences, white space, and
 319         truncation) will be joined into a cluster. In other words, you
 320         decide how records are merged. If you don't include a merge key,
 321         records are never merged. The 'metadata' elements provide the meat
 322         of the record -- the content. The 'type' attribute is used to
 323         match each element against processing rules that determine what
 324         happens to the data element next.
 325       </para>
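      <para>
        To make the first step more concrete, here is a minimal sketch of
        an XSLT stylesheet that turns a MARCXML record into the internal
        representation. It is an illustration under stated assumptions, not
        a stylesheet shipped with pazpar2: it assumes the source records
        use the MARCXML namespace http://www.loc.gov/MARC21/slim, it takes
        the title from field 245 subfield a and the author from field 100
        subfield a, and it builds the merge key from those two values in
        the same way as the sample record above.
        <screen><![CDATA[
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:pz="http://www.indexdata.com/pazpar2/1.0"
    xmlns:marc="http://www.loc.gov/MARC21/slim"
    exclude-result-prefixes="marc">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:template match="marc:record">
    <!-- Pick the data elements of interest from the source record -->
    <xsl:variable name="title"
        select="marc:datafield[@tag='245']/marc:subfield[@code='a']"/>
    <xsl:variable name="author"
        select="marc:datafield[@tag='100']/marc:subfield[@code='a']"/>

    <!-- The pz prefix places the output in the pazpar2 namespace -->
    <pz:record>
      <!-- Records that yield the same merge key end up in one cluster -->
      <xsl:attribute name="mergekey">
        <xsl:text>title </xsl:text>
        <xsl:value-of select="$title"/>
        <xsl:text> author </xsl:text>
        <xsl:value-of select="$author"/>
      </xsl:attribute>

      <!-- Emit a metadata element only when the source field exists -->
      <xsl:if test="$title">
        <pz:metadata type="title">
          <xsl:value-of select="$title"/>
        </pz:metadata>
      </xsl:if>
      <xsl:if test="$author">
        <pz:metadata type="author">
          <xsl:value-of select="$author"/>
        </pz:metadata>
      </xsl:if>
    </pz:record>
  </xsl:template>

  <!-- Suppress any text that is not explicitly selected above -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
]]></screen>
        Leaving out the 'mergekey' attribute in such a stylesheet would
        mean that records from that source are never merged.
      </para>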
 326
 327       <para>
 328         The next processing step is the extraction of metadata from the
 329         intermediate representation of the record. This is governed by the
 330         'metadata' elements in the 'service' section of the configuration
 331         file. See <xref linkend="config-server"/> for details. The metadata
 332         in the retrieval record ultimately drives merging, sorting, ranking,
 333         the extraction of browse facets, and display, all configurable.
 334       </para>
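      <para>
        As a hedged sketch of what such processing rules might look like,
        the fragment below declares one rule per 'type' value used in the
        sample record. The attribute names and values shown here (brief,
        sortkey, merge, rank, termlist) are assumptions for illustration
        only; see <xref linkend="config-server"/> for the settings that are
        actually supported.
        <screen><![CDATA[
<!-- Sketch only: attribute names and values are assumptions -->
<service>
  <!-- Title: shown in brief records, usable for sorting and ranking -->
  <metadata name="title" brief="yes" sortkey="skiparticle"
            merge="longest" rank="6"/>
  <!-- Author: shown in brief records and collected as a browse facet -->
  <metadata name="author" brief="yes" termlist="yes"
            merge="longest" rank="2"/>
  <!-- Kind of material: used as a browse facet only -->
  <metadata name="kind" termlist="yes"/>
</service>
]]></screen>
      </para>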
 335     </section>
 336
 337     <section id="client">
 338       <title>Client development</title>
 339       <para>
 340         You can use pazpar2 from any environment that allows you to use
 341         webservices. The initial goal of the software was to support
 342         Ajax-based applications, but there literally are no limits to what
 343         you can do. You can use pazpar2 from JavaScript, Flash, Java, etc.,
 344         on the browser side, and from any development environment on the
 345         server side, and you can pass session tokens and record IDs freely
 346         around between these environments to build sophisticated applications.
 347         Use your imagination.
 348       </para>
 349
 350       <para>
 351         The webservice API of pazpar2 is described in detail in <xref
 352         linkend="pazpar2_protocol"/>.
 353       </para>
 354
 355       <para>
 356         In brief, you use the 'init' command to create a session, a
 357         temporary workspace which carries information about the current
 358         search. You start a new search using the 'search' command. Once the
 359         search has been started, you can follow its progress using the
 360         'stat', 'bytarget', 'termlist', or 'show' commands. Detailed records
 361         can be fetched using the 'record' command.
 362       </para>
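      <para>
        For orientation only, the exchange that creates a session might
        look roughly like the sketch below. The request path, the
        parameter names, and the layout of the response are assumptions,
        and the session value is invented for the example;
        <xref linkend="pazpar2_protocol"/> is the authoritative description.
        <screen><![CDATA[
<!-- Request (assumed form): GET /search.pz2?command=init -->
<init>
  <status>OK</status>
  <session>2044502273</session>
</init>
<!-- Later requests would carry the session value, for example
     /search.pz2?session=2044502273&command=stat (assumed form) -->
]]></screen>
        The session value is the token that browser code and server-side
        code can pass between them in order to work on the same search.
      </para>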
 363     </section>
 364   </chapter> <!-- Using pazpar2 -->
 365
 366  <reference id="reference">
 367   <title>Reference</title>
 368   <partintro>
 369    <para>
 370     The material in this chapter is drawn directly from the individual
 371     manual entries.
 372    </para>
 373   </partintro>
 374   &manref;
 375  </reference>
 376 </book>
 377
 378  <!-- Keep this comment at the end of the file
 379  Local variables:
 380  mode: sgml
 381  sgml-omittag:t
 382  sgml-shorttag:t
 383  sgml-minimize-attributes:nil
 384  sgml-always-quote-attributes:t
 385  sgml-indent-step:1
 386  sgml-indent-data:t
 387  sgml-parent-document: nil
 388  sgml-local-catalogs: nil
 389  sgml-namecase-general:t
 390  End:
 391  -->