X-Git-Url: http://lists.indexdata.com/cgi-bin?a=blobdiff_plain;f=doc%2Fbook.xml;h=bec3735d086f4364653eb59f10e42c199ca3da92;hb=16346103d5c9e44c7e62f2989af9486e217042a5;hp=10f54bdbb6b6a2bf71ddac204194b53ad0d3cbd5;hpb=a7c6b853ed128f05ef6d30a8687c5029e75f69e9;p=metaproxy-moved-to-github.git diff --git a/doc/book.xml b/doc/book.xml index 10f54bd..bec3735 100644 --- a/doc/book.xml +++ b/doc/book.xml @@ -2,7 +2,8 @@ + + %local; @@ -17,34 +18,43 @@ --> ]> - + Metaproxy - User's Guide and Reference - - AdamDickmeiss - - - MarcCromme - - - MikeTaylor - + + + AdamDickmeiss + + + MarcCromme + + + MikeTaylor + + + &version; - 2006 + 2005-2007 Index Data ApS + This manual is part of Metaproxy version &version;. + + Metaproxy is a universal router, proxy and encapsulated metasearcher for information retrieval protocols. It accepts, processes, interprets and redirects requests from IR clients using - standard protocols such as + standard protocols such as the binary ANSI/NISO Z39.50 - (and in the future SRU - and SRW), as + and the information search and retrieval + web services SRU + and SRW, as well as functioning as a limited HTTP server. + + Metaproxy is configured by an XML file which specifies how the software should function in terms of routes that the request packets can take through the proxy, each step on a @@ -155,7 +165,7 @@ You may modify your copy of the software (fix bugs, add features) if you need to. We encourage you to send your changes back to us for integration into the master copy, but you are not obliged to do so. You - may NOT pass your changes on to any other party. + may NOT pass your changes on to any other party. @@ -333,6 +343,39 @@ +
+ Installation on RPM based Linux Systems + + All external dependencies for Metaproxy are available as + RPM packages, either from your distribution site, or from the + RPMfind site. + + + For example, an installation of the required Boost C++ development + libraries on RedHat Fedora C4 and C5 can be done like this: + + wget ftp://fr.rpmfind.net/wlinux/fedora/core/updates/testing/4/SRPMS/boost-1.33.0-3.fc4.src.rpm + sudo rpmbuild --buildroot src/ --rebuild -p fc4/boost-1.33.0-3.fc4.src.rpm + sudo rpm -U /usr/src/redhat/RPMS/i386/boost-*rpm + + + + The YAZ library is needed to + compile &metaproxy;, see there + for more information on available RPM packages. + + + There is currently no official RPM package for YAZ++. + See the YAZ++ pages + for more information on a Unix tarball install. + + + With these packages installed, the usual configure + make + procedure can be used for Metaproxy as outlined in + . + +
+
Installation on Windows @@ -492,6 +535,152 @@
+ + YAZ Proxy Comparison + + The table below lists facilities supported by either + YAZ Proxy or Metaproxy. + + + Metaproxy / YAZ Proxy comparison + + + + Facility + Metaproxy + YAZ Proxy + + + + + Z39.50 server + Using filter frontend_net + Supported + + + SRU server + Supported with filter sru_z3950 + Supported + + + Z39.50 client + Supported with filter z3950_client + Supported + + + SRU client + Unsupported + Unsupported + + + Connection reuse + Supported with filter session_shared + Supported + + + Connection share + Supported with filter session_shared + Unsupported + + + Result set reuse + Supported with filter session_shared + Within one Z39.50 session / HTTP keep-alive + + + Record cache + Unsupported + Supported for last result set within one Z39.50 / HTTP keep-alive session + + + Z39.50 Virtual database, i.e. select any Z39.50 target for database + Supported with filter virt_db + Unsupported + + + SRU Virtual database, i.e. select any Z39.50 target for path + Supported with filter virt_db, + sru_z3950 + Supported + + + Multi target search + Supported with filter multi (round-robin) + Unsupported + + + Retrieval and search limits + Unsupported + Supported + + + Bandwidth limits + Unsupported + Supported + + + Connect limits + Unsupported + Supported + + + Retrieval sanity check and conversions + Supported using filter record_transform + Supported + + + Query check + + Supported in a limited way using query_rewrite + + Supported + + + Query rewrite + Supported with query_rewrite + Unsupported + + + Session invalidate for -1 hits + Unsupported + Supported + + + Architecture + Multi-threaded + select (for networked modules such as + frontend_net) + Single-threaded using select + + + + Extensibility + Most functionality implemented as loadable modules + Unsupported and experimental + + + + USEMARCON + Unsupported + Supported + + + + Portability + + Requires YAZ, YAZ++ and a modern C++ compiler supporting + Boost. + + + Requires YAZ and YAZ++. 
+ STL is not required so pretty much any C++ compiler out there should work. + + + + + +
+
+ The Metaproxy Architecture @@ -572,7 +761,7 @@ plugins that provide new filters. The filter API is small and conceptually simple, but there are many details to master. See the section below on - extensions. + Filters.
@@ -627,10 +816,10 @@ packages (frontend_net); others are sinks: they consume packages and return a result - (z3950_client, - backend_test, + (backend_test, bounce, - http_file); + http_file, + z3950_client); the others are true filters, that read, process and pass on the packages they are fed (auth_simple, @@ -653,8 +842,7 @@ the core Metaproxy binary. This overview is intended to give a flavor of the available functionality; more detailed information about each type of filter is included below in - the reference guide to Metaproxy filters. + . The filters are here named by the string that is used as the @@ -732,8 +920,8 @@ Figure out what additional information we need in: sets Z39.50 packages to Z_Close, and HTTP_Request packages to HTTP_Response err code 400 packages, and adds a suitable bounce message. - The bounce filter is usually added at end of each filter chain - config.xml to prevent infinite hanging of for example HTTP + The bounce filter is usually added at end of each filter chain route + to prevent infinite hanging of for example HTTP requests packages when only the Z39.50 client partial sink filter is found in the route. @@ -741,6 +929,19 @@ Figure out what additional information we need in:
+ <literal>cql_rpn</literal> + (mp::filter::CQLtoRPN) + + A query language transforming filter which catches Z39.50 + searchRequest + packages containing CQL queries, transforms + those to RPN queries, + and sends the searchRequests on to the next + filters. Among other things, it is useful in an SRU context. + +
+ +
<literal>frontend_net</literal> (mp::filter::FrontendNet) @@ -755,7 +956,8 @@ Figure out what additional information we need in: <literal>http_file</literal> (mp::filter::HttpFile) - A partial sink which swallows only HTTP_Request packages, and + A partial sink which swallows only + HTTP_Request packages, and returns the contents of files from the local filesystem in response to HTTP requests. It lets Z39.50 packages and all other forthcoming package types @@ -768,6 +970,26 @@ Figure out what additional information we need in:
+ <literal>load_balance</literal> + (mp::filter::LoadBalance) + + Performs load balancing for incoming Z39.50 init requests. + It is used together with the virt_db filter, + but unlike the multi filter it sends an + entire session to only one of the virtual backends. The + load_balance filter assumes that + all backend targets have equal content, and chooses the backend + with the least load cost for a new session. + + + This filter is experimental and not yet mature enough for heavy load + production sites. + + + +
+ +
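As a rough illustration of the description above, a route that combines virt_db and load_balance might look like the following sketch. The host names, the port numbers and the database name are placeholders invented for this example, and it assumes that both backends hold identical content. The filter reference remains the authoritative source for the configuration syntax.

```xml
<!-- Sketch only: hosts, ports and database name are hypothetical. -->
<route id="start">
  <!-- Accept incoming Z39.50 connections. -->
  <filter type="frontend_net">
    <port>@:9000</port>
  </filter>
  <!-- Map the virtual database "marc" to two equivalent backends. -->
  <filter type="virt_db">
    <virtual>
      <database>marc</database>
      <target>backend1.example.org:210/marc</target>
      <target>backend2.example.org:210/marc</target>
    </virtual>
  </filter>
  <!-- Send each new session to one backend only. -->
  <filter type="load_balance"/>
  <filter type="z3950_client"/>
  <filter type="bounce"/>
</route>
```

Because load_balance picks a single backend per session, no multi filter appears in this route.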
<literal>log</literal> (mp::filter::Log) @@ -776,7 +998,7 @@ Figure out what additional information we need in: as multiple different logging formats.
- +
<literal>multi</literal> (mp::filter::Multi) @@ -792,7 +1014,9 @@ Figure out what additional information we need in: <literal>query_rewrite</literal> (mp::filter::QueryRewrite) - Rewrites Z39.50 Type-1 and Type-101 (``RPN'') queries by a + Rewrites Z39.50 Type-1 + and Type-101 (``RPN'') + queries by a three-step process: the query is transliterated from Z39.50 packet structures into an XML representation; that XML representation is transformed by an XSLT stylesheet; and the @@ -820,18 +1044,11 @@ Figure out what additional information we need in: <literal>session_shared</literal> (mp::filter::SessionShared) - When this is finished, it will implement global sharing of + This filter implements global sharing of result sets (i.e. between threads and therefore between - clients), yielding performance improvements especially when - incoming requests are from a stateless environment such as a - web-server, in which the client process representing a session - might be any one of many. However: + clients), yielding performance improvements by clever resource + pooling. - - - This filter is not yet completed. - -
@@ -839,8 +1056,20 @@ Figure out what additional information we need in: (mp::filter::SRUtoZ3950) This filter transforms valid - SRU/GET or SRU/SOAP requests to Z3950 requests, and wraps the - received hit counts and XML records into suitable SRU response messages. + SRU GET/POST/SOAP searchRetrieve requests to Z39.50 init, search, + and present requests, and wraps the + received hit counts and XML records into suitable SRU response + messages. + The sru_z3950 filter also processes SRU + GET/POST/SOAP explain requests, returning + either the absolute minimum required by the standard, or a full + pre-defined ZeeReX explain record. + See the + ZeeReX Explain + standard pages and the + SRU Explain pages + for more information on the correct explain syntax. + SRU scan requests are not supported yet. 
@@ -888,6 +1117,29 @@ Figure out what additional information we need in: are passed untouched.
+ + +
+ <literal>zeerex_explain</literal> + (mp::filter::ZeerexExplain) + + This filter acts as a sink for + Z39.50 explain requests, returning a static ZeeReX + Explain XML record from the config section. All other packages + are passed through. + See the + ZeeReX Explain + standard pages + for more information on the correct explain syntax. + + + + This filter is not yet completed. + + +
+ + @@ -940,16 +1192,11 @@ Figure out what additional information we need in: If Metaproxy is an interpreter providing operations on packages, then its configuration file can be thought of as a program for that - interpreter. Configuration is by means of a single file, the name + interpreter. Configuration is by means of a single XML file, the name of which is supplied as the sole command-line argument to the metaproxy program. (See - the reference guide - below for more information on invoking Metaproxy.) - - - The configuration files are written in XML. (But that's just an - implementation detail - they could just as well have been written - in YAML or Lisp-like S-expressions, or in a custom syntax.) + below for more information on invoking + Metaproxy.) @@ -957,15 +1204,15 @@ Figure out what additional information we need in: Overview of the config file XML structure All elements and attributes are in the namespace - . + . This is most easily achieved by setting the default namespace on the top-level element, as here: - <yp2 xmlns="http://indexdata.dk/yp2/config/1"> + <metaproxy xmlns="http://indexdata.com/metaproxy" version="1.0"> - The top-level element is <yp2>. This contains a + The top-level element is <metaproxy>. This contains a <start> element, a <filters> element and a <routes> element, in that order. <filters> is optional; the other two are mandatory. All three are @@ -985,7 +1232,7 @@ Figure out what additional information we need in: and contain various elements that provide suitable configuration for a filter of its type. The filter-specific elements are described in - the reference guide below. + . Filters defined in this part of the file must carry an id attribute so that they can be referenced from elsewhere. @@ -1021,7 +1268,7 @@ Figure out what additional information we need in: client-server dialogues. 
- + @@ -1038,7 +1285,7 @@ Figure out what additional information we need in: - + ]]> It works by defining a single route, called @@ -1069,7 +1316,26 @@ Figure out what additional information we need in: which returns the response to the client. -
+ +
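The overall structure described above can be sketched as a minimal configuration file. This is an illustration only: the ids "frontend", "backend" and "start" and the port number are arbitrary choices for this example, not values taken from the distribution.

```xml
<?xml version="1.0"?>
<!-- Sketch only: ids and the port number are illustrative. -->
<metaproxy xmlns="http://indexdata.com/metaproxy" version="1.0">
  <!-- Names the route that incoming packages enter first. -->
  <start route="start"/>
  <!-- Optional section: filters defined here need an id. -->
  <filters>
    <filter id="frontend" type="frontend_net">
      <port>@:9000</port>
    </filter>
    <filter id="backend" type="z3950_client"/>
  </filters>
  <routes>
    <route id="start">
      <!-- Reference a filter defined above ... -->
      <filter refid="frontend"/>
      <!-- ... or define one inline by type. -->
      <filter type="log"/>
      <filter refid="backend"/>
      <filter type="bounce"/>
    </route>
  </routes>
</metaproxy>
```

Filters are applied in the order in which they appear in the route, so the sink filters come last.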
+ Config file modularity + + Metaproxy XML configuration snippets can be reused by other + filters using the XInclude standard, as seen in + the /etc/config-sru-to-z3950.xml example SRU + configuration. + + + + + +]]> + +
+ +
 Config file syntax checking The distribution contains RelaxNG Compact and XML syntax checking @@ -1162,7 +1428,7 @@ Figure out what additional information we need in: marc - indexdata.dk/marc + indexdata.com/marc ]]> @@ -1187,7 +1453,7 @@ Figure out what additional information we need in: Index Data's tiny testing database of MARC records: - + @@ -1202,12 +1468,12 @@ Figure out what additional information we need in: marc - indexdata.dk/marc + indexdata.com/marc all z3950.loc.gov:7090/voyager - indexdata.dk/marc + indexdata.com/marc @@ -1217,7 +1483,7 @@ Figure out what additional information we need in: -]]> +]]> (Using a virt_db @@ -1471,12 +1737,123 @@ Z> + + Combined SRU webservice and Z39.50 server configuration + + Metaproxy can act as an + SRU and + SRW + web service server, which translates web service requests to + ANSI/NISO Z39.50 packages and + sends them off to commonly available targets. + + + A typical setup for this operation needs a filter route including the + following modules: + + + + SRU/Z39.50 Server Filter Route Configuration + + + + Filter + Importance + Purpose + + + + + + frontend_net + required + Accepting HTTP connections and passing them to following + filters. Since this filter also accepts Z39.50 connections, the + server works as SRU and Z39.50 server on the same port. + + + sru_z3950 + required + Accepting SRU GET/POST/SOAP explain and + searchRetrieve requests for the configured databases. + Explain requests are directly served from the static XML configuration. + SearchRetrieve requests are + transformed to Z39.50 search and present packages. + All other HTTP and Z39.50 packages are passed unaltered. + + + http_file + optional + Serving HTTP requests from the filesystem. This is only + needed if the server should serve XSLT stylesheets, static HTML + files or JavaScript for thin browser based clients. + Z39.50 packages are passed unaltered. 
+ + + cql_rpn + required + Usually, Z39.50 servers do not talk CQL, hence the + translation of the CQL query language to RPN is mandatory in + most cases. Affects only Z39.50 search packages. + + + record_transform + optional + Some Z39.50 backend targets cannot present XML record + syntaxes in commonly wanted element sets. Using this filter, one + can transform binary MARC records to MARCXML records, and + further transform those to any needed XML schema/format by XSLT + transformations. Changes only Z39.50 present packages. + + + session_shared + optional + The stateless nature of web services requires frequent + re-searching of the same targets for display of paged result set + records. This might be an unacceptable burden for the accessed + backend Z39.50 targets, and this module can be added for + efficient backend target resource pooling. + + + z3950_client + required + Finally, a Z39.50 package sink is needed in the filter + chain to provide the response packages. The Z39.50 client module + is used to access external targets over the network, but any + future local Z39.50 package sink could be used instead. + + + bounce + required + Any Metaproxy package arriving here did not do so on + purpose, and is bounced back with connection closure. This + prevents packages from hanging indefinitely inside the SRU server. + + + +
+ + A typical minimal example SRU and + SRW server configuration file is found + in the tarball distribution at + etc/config-sru-to-z3950.xml. + + + Of course, any other Metaproxy modules can be integrated into an + SRU server solution, including, but not limited to, load balancing, + multiple target querying + (see ), and complex RPN query rewrites. + + +
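The module order listed in the table above can be sketched as a single route. The refid values below are hypothetical names for filters assumed to be defined in the filters section of the same file, and their filter-specific configuration is deliberately omitted. The distributed etc/config-sru-to-z3950.xml remains the authoritative example.

```xml
<!-- Sketch only: every refid is a hypothetical filter id. -->
<route id="start">
  <filter refid="frontend"/>   <!-- frontend_net: HTTP and Z39.50 -->
  <filter refid="sru"/>        <!-- sru_z3950: SRU to Z39.50 -->
  <filter refid="files"/>      <!-- http_file: optional -->
  <filter refid="cql"/>        <!-- cql_rpn: CQL to RPN -->
  <filter refid="transform"/>  <!-- record_transform: optional -->
  <filter refid="shared"/>     <!-- session_shared: optional -->
  <filter refid="backend"/>    <!-- z3950_client: package sink -->
  <filter type="bounce"/>      <!-- catches anything left over -->
</route>
```

The optional filters can be dropped from the route without changing the order of the remaining ones.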
+ + @@ -1489,7 +1866,7 @@ Z> Stop! Do not read this! You won't enjoy it at all. You should just skip ahead to - the reference guide, + , which tells @@ -1738,9 +2115,9 @@ Z> - - - Reference guide + + Reference + The material in this chapter is drawn directly from the individual manual entries. In particular, the Metaproxy invocation section is @@ -1748,7 +2125,8 @@ Z> on each individual filter is available using the name of the filter as the argument to the man command. - &manref; + + &manref;