X-Git-Url: http://lists.indexdata.com/cgi-bin?a=blobdiff_plain;f=doc%2Fbook.xml;h=2c96f95adde4494940bfd45db370b4f60c4d871e;hb=3953421488cf82acb6553f51a049e82892c0277a;hp=3faf161d53b0bf54aedc2b20782a4c020a3b88bc;hpb=5b93313da166559fa944178bfd1775eb08c7995c;p=metaproxy-moved-to-github.git diff --git a/doc/book.xml b/doc/book.xml index 3faf161..2c96f95 100644 --- a/doc/book.xml +++ b/doc/book.xml @@ -1,11 +1,34 @@ - + + + + + + %common; + + + + +]> + + Metaproxy - User's Guide and Reference - MikeTaylor + AdamDickmeiss - AdamDickmeiss + MarcCromme + + + MikeTaylor 2006 @@ -32,10 +55,10 @@ using the filter API. - The terms under which Metaproxy will be distributed have yet to be - established, but it will not necessarily be open source; so users - should not at this stage redistribute the code without explicit - written permission from the copyright holders, Index Data ApS. + Metaproxy is not open-source software, but + may be freely downloaded, unpacked, inspected, built and run for + evaluation purposes. Deployment requires a separate, commercial, + license. @@ -56,7 +79,7 @@ Metaproxy - is a standalone program that acts as a universal router, proxy and + is a stand alone program that acts as a universal router, proxy and encapsulated metasearcher for information retrieval protocols such as Z39.50, and in the future SRU and SRW. @@ -84,13 +107,13 @@ being more powerful, flexible, configurable and extensible. Among its many advantages over the older, more pedestrian work are support for multiplexing (encapsulated metasearching), routing by - database name, authentication and authorisation and serving local + database name, authentication and authorization and serving local files via HTTP. Equally significant, its modular architecture facilitites the creation of pluggable modules implementing further functionality. - This manual will briefly describe Metaproxy's licensing situation + This manual will describe how to install Metaproxy before giving an overview of its architecture, then discussing the key concept of a filter in some depth and giving an overview of the various filter types, then discussing the configuration file @@ -105,26 +128,60 @@ - - - - The Metaproxy Licence - - - No decision has yet been made on the terms under which - Metaproxy will be distributed. - - It is possible that, unlike - other Index Data products, metaproxy may not be released under a - free-software licence such as the GNU GPL. Until a decision is - made and a public statement made, then, and unless it has been - delivered to you other specific terms, please treat Metaproxy as - though it were proprietary software. - The code should not be redistributed without explicit - written permission from the copyright holders, Index Data ApS. - + + The Metaproxy License + + + + You are allowed to download this software for evaluation purposes. + You can unpack it, build it, run it, see how it works and how it fits + your needs, all at zero cost. + + + + + You may NOT deploy the software. For the purposes of this license, + deployment means running it for any purpose other than evaluation, + whether or not you or anyone else makes a profit from doing so. If + you wish to deploy the software, you must first contact Index Data and + arrange to purchase a DEPLOYMENT LICENCE. If you are unsure + whether or not your proposed use of the software constitutes + deployment, email us at info@indexdata.com + for clarification. + + + + + You may modify your copy of the software (fix bugs, add features) + if you need to. We encourage you to send your changes back to us for + integration into the master copy, but you are not obliged to do so. You + may NOT pass your changes on to any other party. + + + + + There is NO WARRANTY for this software, to the extent permitted by + applicable law. We provide the software ``as is'' without warranty of + any kind, either expressed or implied, including, but not limited to, the + implied warranties of MERCHANTABILITY and FITNESS FOR A + PARTICULAR PURPOSE. The entire risk as to the quality and + performance of the software is with you. Should the software prove + defective, you assume the cost of all necessary servicing, repair or + correction. In no event unless required by applicable law will we be + liable to you for damages, arising out of the use of the software, + including but not limited to loss of data or data being rendered + inaccurate. + + + + + All rights to the software are reserved by Index Data except where + this license explicitly says otherwise. + + + - + Installation @@ -164,7 +221,7 @@ for more information. - We have succesfully built Metaproxy using the compilers + We have successfully built Metaproxy using the compilers GCC version 4.0 and Microsoft Visual Studio 2003/2005. @@ -321,7 +378,7 @@ \boost\lib, \boost\include. - For more informatation about installing Boost refer to the + For more information about installing Boost refer to the getting started pages. @@ -335,7 +392,7 @@ here. - Libxslt has other dependencies, but thes can all be downloaded + Libxslt has other dependencies, but these can all be downloaded from the same site. Get the following: iconv, zlib, libxml2, libxslt. @@ -367,7 +424,7 @@
Metaproxy - Metaproxy is shipped with NMAKE makfiles as well - similar + Metaproxy is shipped with NMAKE makefiles as well - similar to those found in the YAZ++/YAZ packages. Adjust this Makefile to point to the proper locations of Boost, Libxslt, Libxml2, zlib, iconv, yaz and yazpp. @@ -425,7 +482,7 @@ - After succesful compilation you'll find + After successful compilation you'll find metaproxy.exe in the bin directory. @@ -462,7 +519,7 @@ In general, packages are doctored as they pass through Metaproxy. For example, when the proxy performs authentication - and authorisation on a Z39.50 Init request, it removes the + and authorization on a Z39.50 Init request, it removes the authentication credentials from the package so that they are not passed onto the back-end server; and when search-response packages are obtained from multiple servers, they are merged @@ -501,7 +558,7 @@ The word ``filter'' is sometimes used rather loosely, in two different ways: it may be used to mean a particular type of filter, as when we speak of ``the - auth_simplefilter'' or ``the multi filter''; or it may be used + auth_simple filter'' or ``the multi filter''; or it may be used to be a specific instance of a filter within a Metaproxy configuration. For example, a single configuration will often contain multiple instances of the @@ -558,7 +615,7 @@ as part of Metaproxy, and others may be provided by third parties and dynamically loaded. They all conform to the same simple API of essentially two methods: configure() is - called at startup time, and is passed a DOM tree representing that + called at startup time, and is passed an XML DOM tree representing that part of the configuration file that pertains to this filter instance: it is expected to walk that tree extracting relevant information; and process() is called every @@ -572,6 +629,7 @@ others are sinks: they consume packages and return a result (z3950_client, backend_test, + bounce, http_file); the others are true filters, that read, process and pass on the packages they are fed @@ -579,7 +637,9 @@ log, multi, query_rewrite, + record_transform, session_shared, + sru_z3950, template, virt_db). @@ -591,7 +651,7 @@ We now briefly consider each of the types of filter supported by the core Metaproxy binary. This overview is intended to give a - flavour of the available functionality; more detailed information + flavor of the available functionality; more detailed information about each type of filter is included below in the reference guide to Metaproxy filters. @@ -609,11 +669,34 @@ The filters are here listed in alphabetical order: + +
<literal>auth_simple</literal> (mp::filter::AuthSimple) - Simple authentication and authorisation. The configuration + Simple authentication and authorization. The configuration specifies the name of a file that is the user register, which lists username:password pairs, one per line, colon separated. When a session begins, it @@ -632,7 +715,7 @@ <literal>backend_test</literal> (mp::filter::Backend_test) - A sink that provides dummy responses in the manner of the + A partial sink that provides dummy responses in the manner of the yaz-ztest Z39.50 server. This is useful only for testing. Seriously, you don't need this. Pretend you didn't even read this section. @@ -640,13 +723,31 @@
+ <literal>bounce</literal> + (mp::filter::Bounce) + + A sink that swallows all packages, + and returns them almost unprocessed. + It never sends any package of any type further down the row, but + sets Z39.50 packages to Z_Close, and HTTP_Request packages to + HTTP_Response err code 400 packages, and adds a suitable bounce + message. + The bounce filter is usually added at end of each filter chain + config.xml to prevent infinite hanging of for example HTTP + requests packages when only the Z39.50 client partial sink + filter is found in the + route. + +
+ +
<literal>frontend_net</literal> (mp::filter::FrontendNet) A source that accepts Z39.50 connections from a port specified in the configuration, reads protocol units, and feeds them into the next filter in the route. When the result is - revceived, it is returned to the original origin. + received, it is returned to the original origin.
@@ -654,8 +755,12 @@ <literal>http_file</literal> (mp::filter::HttpFile) - A sink that returns the contents of files from the local - filesystem in response to HTTP requests. (Yes, Virginia, this + A partial sink which swallows only HTTP_Request packages, and + returns the contents of files from the local + filesystem in response to HTTP requests. + It lets Z39.50 packages and all other forthcoming package types + pass untouched. + (Yes, Virginia, this does mean that Metaproxy is also a Web-server in its spare time. So far it does not contain either an email-reader or a Lisp interpreter, but that day is surely coming.) @@ -667,7 +772,8 @@ (mp::filter::Log) Writes logging information to standard output, and passes on - the package unchanged. + the package unchanged. A log file name can be specified, as well + as multiple different logging formats.
@@ -695,6 +801,21 @@ + +
+ <literal>record_transform</literal> + (mp::filter::RecordTransform) + + This filter acts only on Z3950 present requests, and let all + other types of packages and requests pass untouched. It's use is + twofold: blocking Z3950 present requests, which the backend + server does not understand and can not honor, and transforming + the present syntax and elementset name according to the rules + specified, to fetch only existing record formats, and transform + them on the fly to requested record syntaxes. + +
+
<literal>session_shared</literal> (mp::filter::SessionShared) @@ -712,6 +833,16 @@
+ +
+ <literal>sru_z3950</literal> + (mp::filter::SRUtoZ3950) + + This filter transforms valid + SRU/GET or SRU/SOAP requests to Z3950 requests, and wraps the + received hit counts and XML records into suitable SRU response messages. + +
<literal>template</literal> @@ -728,7 +859,7 @@ <section> <title><literal>virt_db</literal> - (mp::filter::Virt_db) + (mp::filter::VirtualDB) Performs virtual database selection: based on the name of the database in the search request, a server is selected, and its @@ -745,13 +876,16 @@ <literal>z3950_client</literal> (mp::filter::Z3950Client) - Performs Z39.50 searching and retrieval by proxying the + A partial sink which swallows only Z39.50 packages. + It performs Z39.50 searching and retrieval by proxying the packages that are passed to it. Init requests are sent to the address specified in the VAL_PROXY otherInfo attached to the request: this may have been specified by client, or generated by a virt_db filter earlier in the route. Subsequent requests are sent to the same address, which is remembered at Init time in a Session object. + HTTP_Request packages and all other forthcoming package types + are passed untouched.
@@ -841,26 +975,10 @@ implementation detail - they could just as well have been written in YAML or Lisp-like S-expressions, or in a custom syntax.) - - Since XML has been chosen, an XML schema, - config.xsd, is provided for validating - configuration files. This file is supplied in the - etc directory of the Metaproxy distribution. It - can be used by (among other tools) the xmllint - program supplied as part of the libxml2 - distribution: - - - xmllint --noout --schema etc/config.xsd my-config-file.xml - - - (A recent version of libxml2 is required, as - support for XML Schemas is a relatively recent addition.) -
- Overview of XML structure + Overview of the config file XML structure All elements and attributes are in the namespace . @@ -920,7 +1038,7 @@ The following is a small, but complete, Metaproxy configuration file (included in the distribution as - metaproxy/etc/config0.xml). + metaproxy/etc/config1.xml). This file defines a very simple configuration that simply proxies to whatever back-end server the client requests, but logs each request and response. This can be useful for debugging complex @@ -941,13 +1059,14 @@ + ]]> It works by defining a single route, called - start, which consists of a sequence of three + start, which consists of a sequence of four filters. The first and last of these are included by reference: their <filter> elements have refid attributes that refer to filters defined @@ -955,18 +1074,51 @@ middle filter is included inline in the route. - The three filters in the route are as follows: first, a + The four filters in the route are as follows: first, a frontend_net filter accepts Z39.50 requests from any host on port 9000; then these requests are passed through a log filter that emits a message for each request; they are then fed into a z3950_client - filter, which forwards the requests to the client-specified - back-end Z39.509 server. When the response arrives, it is handed + filter, which forwards all Z39.50 requests to the client-specified + back-end Z39.509 server. Those Z39.50 packages are returned by the + z3950_client filter, with the response data + filled by the external Z39.50 server targeted. + All non-Z39.50 packages are passed through to the + bounce filter, which definitely bounces + everything, including fish, bananas, cold pyjamas, + mutton, beef and trout packages. + When the response arrives, it is handed back to the log filter, which emits another - message; and then to the front-end filter, which returns the - response to the client. + message; and then to the frontend_net filter, + which returns the response to the client.
+
+ Config file syntax checking + + The distribution contains RelaxNG Compact and XML syntax checking + files, as well as XML Schema files. These are found in the + distribution paths + + xml/schema/metaproxy.rnc + xml/schema/metaproxy.rng + xml/schema/metaproxy.xsd + + and can be used to verify or debug the XML structure of + configuration files. For example, using the utility + xmllint, syntax checking is done like this: + + xmllint --noout --schema xml/schema/metaproxy.xsd etc/config-local.xml + xmllint --noout --relaxng xml/schema/metaproxy.rng etc/config-local.xml + + (A recent version of libxml2 is required, as + support for XML Schemas is a relatively recent addition.) + + + You can of course use any other RelaxNG or XML Schema compliant tool + you wish. + +
@@ -989,14 +1141,14 @@ The interaction between these two filters is necessarily complex: it reflects the real, irreducible complexity of multi-database searching in a protocol such - as Z39.50 that separates initialisation from searching, and in - which the database to be searched is not known at initialisation + as Z39.50 that separates initialization from searching, and in + which the database to be searched is not known at initialization time. It's possible to use these filters without understanding the details of their functioning and the interaction between them; the - next two sections of this chapter are ``HOWTO'' guides for doing + next two sections of this chapter are ``HOW-TO'' guides for doing just that. However, debugging complex configurations will require a deeper understanding, which the last two sections of this chapters attempt to provide. @@ -1086,6 +1238,7 @@ 30 + ]]> @@ -1184,6 +1337,27 @@ Z> be metasearched in this way: issues of resource usage and administrative complexity dictate the practical limits. + + What happens when one of the databases doesn't respond? By default, + the entire multi-database search fails, and the appropriate + diagnostic is returned to the client. This is usually appropriate + during development, when technicians need maximum information, but + can be inconvenient in deployment, when users typically don't want + to be bothered with problems of this kind and prefer just to get + the records from the databases that are available. To obtain this + latter behavior add an empty + <hideunavailable> + element inside the + multi filter: + + + + ]]> + + Under this regime, an error is reported to the client only if + all the databases in a multi-database search + are unavailable. + @@ -1221,15 +1395,18 @@ Z> >the HTTP 1.1 specification. - The role of the virt_db filter is to rewrite - this otherInfo packet dependent on the virtual database that the - client wants to search. + Within Metaproxy, Search requests that are part of the same + session as an Init request that carries a + VAL_PROXY otherInfo are also annotated with the + same information. The role of the virt_db + filter is to rewrite this otherInfo packet dependent on the + virtual database that the client wants to search. When Metaproxy receives a Z39.50 Init request from a client, it doesn't immediately forward that request to the back-end server. Why not? Because it doesn't know which - back-end server to forward it to until the client sends a search + back-end server to forward it to until the client sends a Search request that specifies the database that it wants to search in. Instead, it just treasures the Init request up in its heart; and, later, the first time the client does a search on one of the @@ -1259,12 +1436,10 @@ Z> <target> elements. What does this mean? Only that the filter will add multiple VAL_PROXY otherInfo packets to the - search requests that pass through it. That's because the virtual + Search requests that pass through it. That's because the virtual DB filter is dumb, and does exactly what it's told - no more, no less. - - - If a search request with multiple VAL_PROXY + If a Search request with multiple VAL_PROXY otherInfo packets reaches a z3950_client filter, this is an error. That filter doesn't know how to deal with multiple targets, so it will either just pick one and search @@ -1274,16 +1449,16 @@ Z> The multi filter comes to the rescue! This is the only filter that knows how to deal with multiple VAL_PROXY otherInfo packets, and it does so by - making multiple copies of the entire search-request: one for each + making multiple copies of the entire Search request: one for each VAL_PROXY. Each of these new copies is then - passed down through the remaining filters in the route, instead of - the original one. (The copies are handled in parallel though the - spawning of new threads.) Since the copies each have ony one + passed down through the remaining filters in the route. (The + copies are handled in parallel though the + spawning of new threads.) Since the copies each have only one VAL_PROXY otherInfo, they can be handled by the z3950_client filter, which happily deals with each one individually. When the results of the individual searches come back up to the multi filter, it - merges them into a single search-response, which is what + merges them into a single Search response, which is what eventually makes it back to the client. @@ -1305,9 +1480,8 @@ Z> [Here there should be a diagram showing the progress of packages through the filters during a simple virtual-database search and a multi-database search, but is seems that your - toolchain has not been able to include the diagram in this - document. This is because of LaTeX suckage. Time to move to - OpenOffice. Yes, really.] + tool chain has not been able to include the diagram in this + document.]