1 <?xml version="1.0" standalone="no"?>
2 <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd"
5 <!ENTITY % local SYSTEM "local.ent">
7 <!ENTITY % entities SYSTEM "entities.ent">
9 <!ENTITY % common SYSTEM "common/common.ent">
12 <!-- $Id: book.xml,v 1.6 2007-01-19 21:50:02 adam Exp $ -->
15 <title>Pazpar2 - User's Guide and Reference</title>
17 <firstname>Sebastian</firstname><surname>Hammer</surname>
19 <releaseinfo>&version;</releaseinfo>
21 <year>&copyright-year;</year>
22 <holder>Index Data</holder>
26 Pazpar2 is a high-performance, user interface-independent, data
27 model-independent metasearching
28 middleware featuring merging, relevance ranking, record sorting,
32 This document is a guide and reference to Pazpar2 version &version;.
37 <imagedata fileref="common/id.png" format="PNG"/>
40 <imagedata fileref="common/id.eps" format="EPS"/>
47 <chapter id="introduction">
48 <title>Introduction</title>
50 Pazpar2 is a stand-alone metasearch client with a webservice API, designed
51 to be used either from a browser-based client (JavaScript, Flash, Java,
52 etc.), from server-side code, or any combination of the two.
53 Pazpar2 is a highly optimized client designed to
54 search many resources in parallel. It implements record merging,
55 relevance-ranking and sorting by arbitrary data content, and facet
56 analysis for browsing purposes. It is designed to be data model
57 independent, and is capable of working with MARC, DublinCore, or any
58 other XML-structured response format -- XSLT is used to normalize and extract
59 data from retrieval records for display and analysis. It can be used
60 against any server which supports the Z39.50 protocol. Proprietary
61 backend modules can be used to support a large number of other protocols
62 (please contact Index Data for further information about this).
65 Additional functionality, such as
66 user management and attractive displays, is expected to be implemented by
67 applications that use pazpar2. Pazpar2 is user interface independent.
68 Its functionality is exposed through a simple REST-style webservice API,
69 designed to be simple to use from an Ajax-enabled browser, Flash
70 animation, Java applet, etc., or from a higher-level server-side language
71 like PHP or Java. Because session information can be shared between
72 browser-based logic and your server-side scripting, there is tremendous
73 flexibility in how you implement your business logic on top of pazpar2.
76 Once you launch a search in pazpar2, the operation continues behind the
77 scenes. Pazpar2 connects to servers, carries out searches, and
78 retrieves, deduplicates, and stores results internally. Your application
79 code may periodically inquire about the status of an ongoing operation,
80 and ask to see records or other result set facets. Results become
81 available immediately, and it is easy to build end-user interfaces which
82 feel extremely responsive, even when searching more than 100 servers
86 Pazpar2 is designed to be highly configurable. Incoming records are
87 normalized to XML/UTF-8, and then further normalized using XSLT to a
88 simple internal representation that is suitable for analysis. By
89 providing XSLT stylesheets for different kinds of result records, you
90 can tune pazpar2 to work against different kinds of information
91 retrieval servers. Finally, metadata is extracted, in a configurable
92 way, from this internal record, to support display, merging, ranking,
93 result set facets, and sorting. Pazpar2 is not bound to a specific model
94 of metadata, such as DublinCore or MARC -- by providing the right
95 configuration, it can work with a number of different kinds of data in
96 support of many different applications.
99 Pazpar2 is designed to be efficient and scalable. You can set it up to
100 search several hundred targets in parallel, or you can use it to support
101 hundreds of concurrent users. It is implemented with the same attention
102 to performance and economy that we use in our indexing engines, so that
103 you can focus on building your application, without worrying about the
104 details of metasearch logic. You can devote all of your attention to
105 usability and let pazpar2 do what it does best -- metasearch.
108 If you wish to connect to commercial or other databases which do not
109 support open standards, please contact Index Data. We have a licensing
110 agreement with a third party vendor which will enable pazpar2 to access
111 thousands of online databases, in addition to the vast number of catalogs
112 and online services that support the Z39.50 protocol.
115 Pazpar2 is our attempt to re-think the traditional paradigms for
116 implementing and deploying metasearch logic, with an uncompromising
117 approach to performance, and attempting to make maximum use of the
118 capabilities of modern browsers. The demo user interface that
119 accompanies the distribution is but one example. If you think of new
120 ways of using pazpar2, we hope you'll share them with us, and if we
121 can provide assistance with regards to training, design, programming,
122 integration with different backends, hosting, or support, please don't
123 hesitate to contact us. If you'd like to see functionality in pazpar2
124 that is not there today, please don't hesitate to contact us. It may
125 already be in our development pipeline, or there might be a
126 possibility for you to help out by sponsoring development time or
127 code. Either way, get in touch and we will give you straight answers.
135 <chapter id="license">
136 <title>Pazpar2 License</title>
137 <para>To be decided and written.</para>
140 <chapter id="installation">
141 <title>Installation</title>
143 Pazpar2 depends on the following tools/libraries:
145 <varlistentry><term><ulink url="&url.yaz;">YAZ</ulink></term>
148 The popular Z39.50 toolkit for the C language. YAZ must be
149 compiled with Libxml2/Libxslt support.
156 In order to compile Pazpar2, an ANSI C compiler is
157 required. The requirements should be the same as for YAZ.
160 <section id="installation.unix">
161 <title>Installation on Unix (from Source)</title>
163 Here is a quick step-by-step guide on how to compile the
164 tools that Pazpar2 uses. Only a few systems lack binary packages for
165 the required tools. If, for example, Libxml2/Libxslt are already
166 installed as development packages, use these.
170 Ensure that the development libraries and header files are
171 available on your system before compiling Pazpar2. For installation
172 of YAZ, refer to the YAZ installation chapter.
175 gunzip -c pazpar2-version.tar.gz|tar xf -
184 <section id="installation.debian">
185 <title>Installation on Debian GNU/Linux</title>
187 All dependencies for Pazpar2 are available as
188 <ulink url="&url.debian;">Debian</ulink>
189 packages for the sarge (stable in 2005) and etch (testing in 2005)
193 The procedure for Debian-based systems, such as
194 <ulink url="&url.ubuntu;">Ubuntu</ulink>, is probably similar
197 apt-get install libyaz-dev
200 With these packages installed, the usual configure + make
201 procedure can be used for Pazpar2 as outlined in
202 <xref linkend="installation.unix"/>.
208 <title>Using pazpar2</title>
210 This chapter provides a general introduction to the use and deployment of pazpar2.
213 <section id="architecture">
214 <title>Pazpar2 and your systems architecture</title>
216 Pazpar2 is designed to provide asynchronous, behind-the-scenes
217 metasearching functionality to your application, exposing this
218 functionality using a simple webservice API that can be accessed
219 from any number of development environments. In particular, it is
220 possible to combine pazpar2 either with your server-side dynamic
221 website scripting, with scripting or code running in the browser, or
222 with any combination of the two. Pazpar2 is an excellent tool for
223 building advanced, Ajax-based user interfaces for metasearch
224 functionality, but it isn't a requirement -- you can choose to use
225 pazpar2 entirely as a backend to your regular server-side scripting.
226 When you do use pazpar2 in conjunction
227 with browser scripting (JavaScript/Ajax, Flash, applets, etc.), there are
228 special considerations.
232 Pazpar2 implements a simple but efficient HTTP server, and it is
233 designed to interact directly with scripting running in the browser
234 for the best possible performance, and to limit overhead when
235 several browser clients generate numerous webservice requests.
236 However, it is still desirable to use a conventional webserver,
237 such as Apache, to serve up graphics, HTML documents, and
238 server-side scripting. Because the security sandbox environment of
239 most browser-side programming environments only allows communication
240 with the server from which the enclosing HTML page or object
241 originated, pazpar2 is designed so that it can act as a transparent
242 proxy in front of an existing webserver (see <xref
243 linkend="pazpar2_conf"/> for details). In this mode, all regular
244 HTTP requests are transparently passed through to your webserver,
245 while pazpar2 only intercepts search-related webservice requests.
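<para>
The split between intercepted and passed-through requests can be
pictured with a minimal Python sketch. It is an illustration only, not
pazpar2's actual implementation: the path suffix
<literal>/search.pz2</literal> and the function name
<literal>route</literal> are assumptions made for this example.
</para>
<programlisting><![CDATA[
```python
from urllib.parse import urlsplit

# Assumption: webservice requests are recognised by this path suffix.
WEBSERVICE_SUFFIX = "/search.pz2"


def route(request_target):
    """Return 'pazpar2' for webservice requests, 'webserver' otherwise."""
    path = urlsplit(request_target).path
    if path.endswith(WEBSERVICE_SUFFIX):
        return "pazpar2"      # handled by the metasearch engine itself
    return "webserver"        # passed through to the proxied webserver
```
]]></programlisting>
<para>
With this rule, a request for a search command is handled by pazpar2,
while a request for an image or an HTML page goes to the regular
webserver.
</para>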
249 If you want to expose your combined service on port 80, you can
250 either run your regular webserver on a different port, a different
251 server, or a different IP address associated with the same server.
255 Sometimes, it may be necessary to implement functionality on your
256 regular webserver that makes use of search results, for example to
257 implement data import functionality, emailing results, history
258 lists, personal citation lists, interlibrary loan functionality,
259 etc. Fortunately, it is simple to exchange information between
260 pazpar2, your browser scripting, and backend server-side scripting.
261 You can send a session ID and possibly a record ID from your browser
262 code to your server code, and from there use pazpar2's webservice API
263 to access result sets or individual records. You could even 'hide'
264 all of pazpar2's functionality behind your own API implemented on
265 the server-side, and access that from the browser or elsewhere. The
266 possibilities are just about endless.
270 <section id="data_model">
271 <title>Your data model</title>
273 Pazpar2 does not have a preconceived model of what makes up a data
274 model. There is no assumption that records have specific fields or
275 that they are organized in any particular way. The only assumption
276 is that data comes packaged in a form that the software can work
277 with (presently, that means XML or MARC), and that you can provide
278 the necessary information to massage it into pazpar2's internal
283 Handling retrieval records in pazpar2 is a two-step process. First,
284 you decide which data elements of the source record you are
285 interested in, and you specify any desired massaging or combining of
286 elements using an XSLT stylesheet (MARC records are automatically
287 normalized to MARCXML before this step). If desired, you can run
288 multiple XSLT stylesheets in series to accomplish this, but the
289 output of the last one should be a representation of the record in a
290 schema that pazpar2 understands.
294 The intermediate, internal representation of the record looks like
297 <record xmlns="http://www.indexdata.com/pazpar2/1.0"
298 mergekey="title The Shining author King, Stephen">
300 <metadata type="title">The Shining</metadata>
302 <metadata type="author">King, Stephen</metadata>
304 <metadata type="kind">ebook</metadata>
306 <!-- ... and so on -->
310 As you can see, there isn't much to it. There are really only a few
311 important elements to this file.
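<para>
To make the structure concrete, the following Python sketch reads a
record of this shape with the standard library and lists its merge key
and metadata fields. The helper name <literal>read_record</literal> is
hypothetical, and the sample record simply repeats the example above.
</para>
<programlisting><![CDATA[
```python
import xml.etree.ElementTree as ET

PZ_NS = "http://www.indexdata.com/pazpar2/1.0"

RECORD = """\
<record xmlns="http://www.indexdata.com/pazpar2/1.0"
        mergekey="title The Shining author King, Stephen">
  <metadata type="title">The Shining</metadata>
  <metadata type="author">King, Stephen</metadata>
  <metadata type="kind">ebook</metadata>
</record>
"""


def read_record(xml_text):
    """Return (mergekey, [(type, value), ...]) for one internal record."""
    root = ET.fromstring(xml_text)
    mergekey = root.get("mergekey")  # None when the attribute is absent
    fields = [
        (element.get("type"), (element.text or "").strip())
        for element in root.findall(f"{{{PZ_NS}}}metadata")
    ]
    return mergekey, fields


mergekey, fields = read_record(RECORD)
```
]]></programlisting>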
315 Elements should belong to the namespace
316 http://www.indexdata.com/pazpar2/1.0. If the root node contains the
317 attribute 'mergekey', then every record that generates the same
318 merge key (normalized for case differences, white space, and
319 truncation) will be joined into a cluster. In other words, you
320 decide how records are merged. If you don't include a merge key,
321 records are never merged. The 'metadata' elements provide the meat
322 of the elements -- the content. The 'type' attribute is used to
323 match each element against processing rules that determine what
324 happens to the data element next.
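<para>
The effect of the merge key can be sketched in a few lines of Python.
This is an assumption-laden illustration rather than pazpar2's own
algorithm: it normalizes only case and white space (truncation is not
modeled), and the names <literal>normalize_key</literal> and
<literal>cluster</literal> are hypothetical.
</para>
<programlisting><![CDATA[
```python
from collections import defaultdict


def normalize_key(mergekey):
    """Lower-case and collapse white space; truncation is not modeled."""
    return " ".join(mergekey.lower().split())


def cluster(records):
    """Group (mergekey, payload) pairs; records without a key stay alone."""
    clusters = defaultdict(list)
    singles = []
    for mergekey, payload in records:
        if mergekey:
            clusters[normalize_key(mergekey)].append(payload)
        else:
            singles.append([payload])  # no merge key: never merged
    return list(clusters.values()) + singles


records = [
    ("title The Shining author King, Stephen", "target A"),
    ("Title  the shining   Author KING, stephen", "target B"),
    (None, "target C"),
]
result = cluster(records)
```
]]></programlisting>
<para>
Here the first two records differ only in case and spacing, so they end
up in the same cluster, while the record without a merge key forms a
cluster of its own.
</para>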
328 The next processing step is the extraction of metadata from the
329 intermediate representation of the record. This is governed by the
330 'metadata' elements in the 'service' section of the configuration
331 file. See <xref linkend="config-server"/> for details. The metadata
332 in the retrieval record ultimately drives merging, sorting, ranking,
333 the extraction of browse facets, and display, all configurable.
337 <section id="client">
338 <title>Client development</title>
340 You can use pazpar2 from any environment that allows you to use
341 webservices. The initial goal of the software was to support
342 Ajax-based applications, but there literally are no limits to what
343 you can do. You can use pazpar2 from JavaScript, Flash, Java, etc.,
344 on the browser side, and from any development environment on the
345 server side, and you can pass session tokens and record IDs freely
346 around between these environments to build sophisticated applications.
347 Use your imagination.
351 The webservice API of pazpar2 is described in detail in <xref
352 linkend="pazpar2_protocol"/>.
356 In brief, you use the 'init' command to create a session, a
357 temporary workspace which carries information about the current
358 search. You start a new search using the 'search' command. Once the
359 search has been started, you can follow its progress using the
360 'stat', 'bytarget', 'termlist', or 'show' commands. Detailed records
361 can be fetched using the 'record' command.
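<para>
The command sequence can be sketched in Python as a pair of helpers
that build request URLs and read the session ID from the reply to
'init'. Several details are assumptions made for illustration and
should be checked against the protocol reference: the host, port and
path in <literal>BASE_URL</literal>, the parameter names
<literal>command</literal>, <literal>session</literal> and
<literal>query</literal>, and the shape of the 'init' response. The
sketch does not open any network connection.
</para>
<programlisting><![CDATA[
```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: hypothetical host, port and webservice path.
BASE_URL = "http://localhost:9004/search.pz2"


def command_url(command, session=None, **params):
    """Build the URL for one webservice command (assumed parameter names)."""
    query = {"command": command}
    if session is not None:
        query["session"] = session
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


def session_from_init(response_xml):
    """Extract the session ID, assuming a reply of the form
    <init><status>OK</status><session>ID</session></init>."""
    root = ET.fromstring(response_xml)
    return root.findtext("session")


# Typical order of calls: init, then search, then polling with stat/show.
init_url = command_url("init")
session = session_from_init(
    "<init><status>OK</status><session>12345</session></init>"
)
search_url = command_url("search", session=session, query="stephen king")
stat_url = command_url("stat", session=session)
```
]]></programlisting>
<para>
Any HTTP client can then fetch these URLs; the application would
typically repeat the 'stat' or 'show' request until enough results have
arrived.
</para>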
364 </chapter> <!-- Using pazpar2 -->
366 <reference id="reference">
367 <title>Reference</title>
370 The material in this chapter is drawn directly from the individual
378 <!-- Keep this comment at the end of the file
383 sgml-minimize-attributes:nil
384 sgml-always-quote-attributes:t
387 sgml-parent-document: nil
388 sgml-local-catalogs: nil
389 sgml-namecase-general:t