X-Git-Url: http://lists.indexdata.com/cgi-bin?a=blobdiff_plain;f=doc%2Fbook.xml;h=7a67a114ecc0c3f9073c34230db4e2a8f5ca9246;hb=4b4784cf0c2958bc4a4172d2ff8935b6b3c6e5d3;hp=d6ec440121ccb1deb7a7f33fe83754fa08117925;hpb=cb2467a71f98decd7adbaf768d3a1c0a1df65bdf;p=metaproxy-moved-to-github.git diff --git a/doc/book.xml b/doc/book.xml index d6ec440..7a67a11 100644 --- a/doc/book.xml +++ b/doc/book.xml @@ -1,787 +1,1325 @@ - + - Metaproxy - User's Guide and Reference - - MikeTaylor - - - 2006 - Index Data - - - - ### - Metaproxy is ... in need of description :-) - - + Metaproxy - User's Guide and Reference + + MikeTaylor + + + AdamDickmeiss + + + 2006 + Index Data ApS + + + + Metaproxy is a universal router, proxy and encapsulated + metasearcher for information retrieval protocols. It accepts, + processes, interprets and redirects requests from IR clients using + standard protocols such as + ANSI/NISO Z39.50 + (and in the future SRU + and SRW), as + well as functioning as a limited + HTTP server. + Metaproxy is configured by an XML file which + specifies how the software should function in terms of routes that + the request packets can take through the proxy, each step on a + route being an instantiation of a filter. Filters come in many + types, one for each operation: accepting Z39.50 packets, logging, + query transformation, multiplexing, etc. Further filter-types can + be added as loadable modules to extend Metaproxy functionality, + using the filter API. + + + The terms under which Metaproxy will be distributed have yet to be + established, but it will not necessarily be open source; so users + should not at this stage redistribute the code without explicit + written permission from the copyright holders, Index Data ApS. + + + + + + + + + + + + - - - Introduction - - -
- Overview + Introduction + + - Metaproxy - is .. + Metaproxy + is a standalone program that acts as a universal router, proxy and + encapsulated metasearcher for information retrieval protocols such + as Z39.50, and in the future + SRU and SRW. + To clients, it acts as a server of these protocols: it can be searched, + records can be retrieved from it, etc. + To servers, it acts as a client: it searches in them, + retrieves records from them, etc. It satisfies its clients' + requests by transforming them, multiplexing them, forwarding them + on to zero or more servers, merging the results, transforming + them, and delivering them back to the client. In addition, it + acts as a simple HTTP server; support + for further protocols can be added in a modular fashion, through the + creation of new filters. + + Anything goes in! + Anything goes out! + Fish, bananas, cold pyjamas, + Mutton, beef and trout! + - attributed to Cole Porter. + - ### We should probably consider saying a little more by way of - introduction. + Metaproxy is a more capable alternative to + YAZ Proxy, + being more powerful, flexible, configurable and extensible. Among + its many advantages over the older, more pedestrian work are + support for multiplexing (encapsulated metasearching), routing by + database name, authentication and authorisation and serving local + files via HTTP. Equally significant, its modular architecture + facilitates the creation of pluggable modules implementing further + functionality. + + + This manual will briefly describe Metaproxy's licensing situation + before giving an overview of its architecture, then discussing the + key concept of a filter in some depth and giving an overview of + the various filter types, then discussing the configuration file + format. 
After this come several optional chapters which may be + freely skipped: a detailed discussion of virtual databases and + multi-database searching, some notes on writing extensions + (additional filter types) and a high-level description of the + source code. Finally comes the reference guide, which contains + instructions for invoking the metaproxy + program, and detailed information on each type of filter, + including examples. -
- - Filters - - -
- Introductory notes + + The Metaproxy Licence - It's useful to think of Metaproxy as an interpreter providing a small - number of primitives and operations, but operating on a very - complex data type, namely the ``package''. + + No decision has yet been made on the terms under which + Metaproxy will be distributed. + + It is possible that, unlike + other Index Data products, Metaproxy may not be released under a + free-software licence such as the GNU GPL. Until a decision is + made and publicly announced, and unless the software has been + delivered to you under other specific terms, please treat Metaproxy as + though it were proprietary software. + The code should not be redistributed without explicit + written permission from the copyright holders, Index Data ApS. + + + + Installation - A package represents a Z39.50 or SRW/U request (whether for Init, - Search, Scan, etc.) together with information about where it came - from. Packages are created by front-end filters such as - frontend_net (see below), which reads them from - the network; other front-end filters are possible. They then pass - along a route consisting of a sequence of filters, each of which - transforms the package and may also have side-effects such as - generating logging. Eventually, the route will yield a response, - which is sent back to the origin. + Metaproxy depends on the following tools/libraries: + + YAZ++ + + + This is a C++ library based on YAZ. + + + + Libxslt + + This is an XSLT processor - based on + Libxml2. Both Libxml2 and + Libxslt must be installed with the development components + (header files, etc.) as well as the run-time libraries. + + + + Boost + + + The popular C++ library. Initial versions of Metaproxy + were built with Boost 1.33.0. Version 1.33.1 works too. + + + + - There are many kinds of filter: some that are defined statically - as part of Metaproxy, and other that may be provided by third parties - and dynamically loaded. 
They all conform to the same simple API - of essentially two methods: configure() is - called at startup time, and is passed a DOM tree representing that - part of the configuration file that pertains to this filter - instance: it is expected to walk that tree extracting relevant - information; and process() is called every - time the filter has to processes a package. + In order to compile Metaproxy, a modern C++ compiler is + required. Boost, in particular, requires the C++ compiler + to support recent language features. Refer to Boost + Compiler Status + for more information. - While all filters provide the same API, there are different modes - of functionality. Some filters are sources: they create - packages - (frontend_net); - others are sinks: they consume packages and return a result - (z3950_client, - backend_test, - http_file); - the others are true filters, that read, process and pass on the - packages they are fed - (auth_simple, - log, - multi, - session_shared, - template, - virt_db). + We have successfully used Metaproxy with Boost using the compilers + GCC version 4.0 and + Microsoft Visual Studio 2003/2005. -
- - -
- Individual filters - - The filters are here named by the string that is used as the - type attribute of a - <filter> element in the configuration - file to request them, with the name of the class that implements - them in parentheses. - - -
- <literal>auth_simple</literal> - (mp::filter::AuthSimple) - - Simple authentication and authorisation. The configuration - specifies the name of a file that is the user register, which - lists username:password - pairs, one per line, colon separated. When a session begins, it - is rejected unless username and passsword are supplied, and match - a pair in the register. - - - ### discuss authorisation phase - -
- -
- <literal>backend_test</literal> - (mp::filter::Backend_test) - - A sink that provides dummy responses in the manner of the - yaz-ztest Z39.50 server. This is useful only - for testing. - -
-
- <literal>frontend_net</literal> - (mp::filter::FrontendNet) +
+ Installation on Unix (from Source) - A source that accepts Z39.50 and SRW connections from a port - specified in the configuration, reads protocol units, and - feeds them into the next filter, eventually returning the - result to the origin. + Here is a quick step-by-step guide on how to compile all the + tools that Metaproxy uses. Most systems provide binary packages + for at least some of the required tools. If, for example, Libxml2/libxslt are already + installed as development packages, use those (and omit compilation). -
- -
- <literal>http_file</literal> - (mp::filter::HttpFile) + - A sink that returns the contents of files from the local - filesystem in response to HTTP requests. (Yes, Virginia, this - does mean that Metaproxy is also a Web-server in its spare time. So - far it does not contain either an email-reader or a Lisp - interpreter, but that day is surely coming.) + Libxml2/libxslt: -
- -
- <literal>log</literal>
- (mp::filter::Log)
+
+ gunzip -c libxml2-version.tar.gz|tar xf -
+ cd libxml2-version
+ ./configure
+ make
+ su
+ make install
+
+
+ gunzip -c libxslt-version.tar.gz|tar xf -
+ cd libxslt-version
+ ./configure
+ make
+ su
+ make install
+
- Writes logging information to standard output, and passes on
- the package unchanged.
+ YAZ/YAZ++:
-
- -
- <literal>multi</literal>
- (mp::filter::Multi)
+
+ gunzip -c yaz-version.tar.gz|tar xf -
+ cd yaz-version
+ ./configure
+ make
+ su
+ make install
+
+
+ gunzip -c yazpp-version.tar.gz|tar xf -
+ cd yazpp-version
+ ./configure
+ make
+ su
+ make install
+
- Performs multicast searching. See the extended discussion of
- multi-database searching below.
+ Boost:
-
- -
- <literal>session_shared</literal>
- (mp::filter::SessionShared)
+
+ gunzip -c boost-version.tar.gz|tar xf -
+ cd boost-version
+ ./configure
+ make
+ su
+ make install
+
- When this is finished, it will implement global sharing of
- result sets (i.e. between threads and therefore between
- clients), but it's not yet done.
+ Metaproxy:
+
+ gunzip -c metaproxy-version.tar.gz|tar xf -
+ cd metaproxy-version
+ ./configure
+ make
+ su
+ make install
+
-
- <literal>template</literal> - (mp::filter::Template) +
+ Installation on Debian - Does nothing at all, merely passing the packet on. (Maybe it - should be called nop or - passthrough?) This exists not to be used, but - to be copied - to become the skeleton of new filters as they are - written. + ### To be written -
+
-
- <literal>virt_db</literal> - (mp::filter::Virt_db) +
+ Installation on Windows - Performs virtual database selection. See the extended discussion - of virtual databases below. + Compilation of Metaproxy can be done using + Microsoft Visual Studio. + We know Version 2003 works. We expect Version 2005 to + work as well. -
+
+ Boost + + Get Boost from its home page. + You also need Boost Jam (an alternative to make). + That's also available from this + home page. The downloaded files are called something like: + boost_1_33-1.exe + and + boost-jam-3.1.12-1-ntx86.zip. + Unpack Boost Jam first. Put bjam.exe + in your system path. Open a command prompt and ensure + it can be found automatically. If not, check the PATH. + The Boost .exe is a self-extracting archive containing the + complete source for Boost. Compile that source with + Boost Jam. + The compilation takes a while. + By default, the Boost build process puts the resulting + libraries and header files in + \boost\lib, \boost\include. + + + For more information about installing Boost refer to the + getting started + pages. + 
+ +
+ Libxslt + + Libxslt can be downloaded + for Windows from + here. + + + Libxslt has other dependencies, but these can all be downloaded + from the same site. Get the following: + iconv, zlib, libxml2, libxslt. + 
+ +
+ YAZ + + YAZ can be downloaded + for Windows from + here. + +
+ +
+ YAZ++ + + Get YAZ++ as well. + Version 1.0 or later is required. For now get it from + Index Data's + Snapshot area. + + + YAZ++ includes NMAKE makefiles, similar to those found in the + YAZ package. + +
+ +
+ Metaproxy + + Metaproxy is shipped with NMAKE makefiles as well - similar + to those found in the YAZ++/YAZ packages. Adjust this Makefile + to point to the proper locations of Boost, Libxslt, Libxml2, + zlib, iconv, yaz and yazpp. + + + After successful compilation you'll find + metaproxy.exe in the + bin directory. + 
-
- <literal>z3950_client</literal> - (mp::filter::Z3950Client) - - Performs Z39.50 searching and retrieval by proxying the - packages that are passed to it. Init requests are sent to the - address specified in the VAL_PROXY otherInfo - attached to the request: this may have been specified by client, - or generated by a virt_db filter earlier in - the route. Subsequent requests are sent to the same address, - which is remembered at Init time in a Session object. -
-
- - -
- Future directions + + + + The Metaproxy Architecture - Some other filters that do not yet exist, but which would be - useful, are briefly described. These may be added in future - releases. + The Metaproxy architecture is based on three concepts: + the package, + the route + and the filter. - - frontend_cli (source) + Packages - Command-line interface for generating requests. + A package is a request or response, encoded in some protocol, + issued by a client, making its way through Metaproxy, sent to or + received from a server, or sent back to the client. - - - - srw2z3950 (filter) - - Translate SRW requests into Z39.50 requests. + The core of a package is the protocol unit - for example, a + Z39.50 Init Request or Search Response, or an SRU searchRetrieve + URL or Explain Response. In addition to this core, a package + also carries some extra information added and used by Metaproxy + itself. - - - - srw_client (sink) - - SRW searching and retrieval. + In general, packages are doctored as they pass through + Metaproxy. For example, when the proxy performs authentication + and authorisation on a Z39.50 Init request, it removes the + authentication credentials from the package so that they are not + passed on to the back-end server; and when search-response + packages are obtained from multiple servers, they are merged + into a single unified package that makes its way back to the + client. - sru_client (sink) + Routes - SRU searching and retrieval. + Packages make their way through routes, which can be thought of + as programs that operate on the package data-type. Each + incoming package initially makes its way through a default + route, but may be switched to a different route based on various + considerations. Routes are made up of sequences of filters (see + below). - opensearch_client (sink) + Filters - A9 OpenSearch searching and retrieval. + Filters provide the individual instructions within a route, and + effect the necessary transformations on packages. 
A particular + configuration of Metaproxy is essentially a set of filters, + described by configuration details and arranged in order in one + or more routes. There are many kinds of filter - about a dozen + at the time of writing, with more appearing all the time - each + performing a specific function and configured by different + information. + + + The word ``filter'' is sometimes used rather loosely, in two + different ways: it may be used to mean a particular + type of filter, as when we speak of ``the + auth_simple filter'' or ``the multi filter''; or it may be used + to mean a specific instance of a filter + within a Metaproxy configuration. For example, a single + configuration will often contain multiple instances of the + z3950_client filter. In + operational terms, each of these is a separate filter. In practice, + context always makes it clear which sense of the word ``filter'' + is being used. + + + Extensibility of Metaproxy is primarily through the creation of + plugins that provide new filters. The filter API is small and + conceptually simple, but there are many details to master. See + the section below on + extensions. -
- - - - - - Configuration: the Metaproxy configuration file format - - -
- Introductory notes - - If Metaproxy is an interpreter providing operations on packages, then - its configuration file can be thought of as a program for that - interpreter. Configuration is by means of a single file, the name - of which is supplied as the sole command-line argument to the - yp2 program. - - - The configuration files are written in XML. (But that's just an - implementation detail - they could just as well have been written - in YAML or Lisp-like S-expressions, or in a custom syntax.) - - - Since XML has been chosen, an XML schema, - config.xsd, is provided for validating - configuration files. This file is supplied in the - etc directory of the Metaproxy distribution. It - can be used by (among other tools) the xmllint - program supplied as part of the libxml2 - distribution: - - - xmllint --noout --schema etc/config.xsd my-config-file.xml - - - (A recent version of libxml2 is required, as - support for XML Schemas is a relatively recent addition.) - -
- -
- Overview of XML structure - - All elements and attributes are in the namespace - . - This is most easily achieved by setting the default namespace on - the top-level element, as here: - - - <yp2 xmlns="http://indexdata.dk/yp2/config/1"> - - - The top-level element is <yp2>. This contains a - <start> element, a <filters> element and a - <routes> element, in that order. <filters> is - optional; the other two are mandatory. All three are - non-repeatable. - - - The <start> element is empty, but carries a - route attribute, whose value is the name of - route at which to start running - analogouse to the name of the - start production in a formal grammar. - - If present, <filters> contains zero or more <filter> - elements; filters carry a type attribute and - contain various elements that provide suitable configuration for - filters of that type. The filter-specific elements are described - below. Filters defined in this part of the file must carry an - id attribute so that they can be referenced - from elsewhere. + Since packages are created and handled by the system itself, and + routes are conceptually simple, most of the remainder of this + document concentrates on filters. A brief overview of the + filter types follows, along with some thoughts on possible future + directions. - - <routes> contains one or more <route> elements, each - of which must carry an id element. One of the - routes must have the ID value that was specified as the start - route in the <start> element's route - attribute. Each route contains zero or more <filter> - elements. These are of two types. They may be empty, but carry a - refid attribute whose value is the same as the - id of a filter previously defined in the - <filters> section. Alternatively, a route within a filter - may omit the refid attribute, but contain - configuration elements similar to those used for filters defined - in the <filters> section. -
- - -
- Filter configuration - - All <filter> elements have in common that they must carry a - type attribute whose value is one of the - supported ones, listed in the schema file and discussed below. In - additional, <filters>s occurring the <filters> section - must have an id attribute, and those occurring - within a route must have either a refid - attribute referencing a previously defined filter or contain its - own configuration information. - - - In general, each filter recognises different configuration - elements within its element, as each filter has different - functionality. These are as follows: - - -
- <literal>auth_simple</literal> - - <filter type="auth_simple"> - <userRegister>../etc/example.simple-auth</userRegister> - </filter> - -
+ -
- <literal>backend_test</literal> - - <filter type="backend_test"/> - -
-
- <literal>frontend_net</literal> - - <filter type="frontend_net"> - <threads>10</threads> - <port>@:9000</port> - </filter> - -
+ + Filters + +
- <literal>http_file</literal> - <filter type="http_file"> - <mimetypes>/etc/mime.types</mimetypes> - <area> - <documentroot>.</documentroot> - <prefix>/etc</prefix> - </area> - </filter> - + Introductory notes + + It's useful to think of Metaproxy as an interpreter providing a small + number of primitives and operations, but operating on a very + complex data type, namely the ``package''. + + + A package represents a Z39.50 or SRU/W request (whether for Init, + Search, Scan, etc.) together with information about where it came + from. Packages are created by front-end filters such as + frontend_net (see below), which reads them from + the network; other front-end filters are possible. They then pass + along a route consisting of a sequence of filters, each of which + transforms the package and may also have side-effects such as + generating logging. Eventually, the route will yield a response, + which is sent back to the origin. + + + There are many kinds of filter: some that are defined statically + as part of Metaproxy, and others that may be provided by third parties + and dynamically loaded. They all conform to the same simple API + of essentially two methods: configure() is + called at startup time, and is passed a DOM tree representing that + part of the configuration file that pertains to this filter + instance: it is expected to walk that tree extracting relevant + information; and process() is called every + time the filter has to process a package. + + + While all filters provide the same API, there are different modes + of functionality. Some filters are sources: they create + packages + (frontend_net); + others are sinks: they consume packages and return a result + (z3950_client, + backend_test, + http_file); + the others are true filters, that read, process and pass on the + packages they are fed + (auth_simple, + log, + multi, + query_rewrite, + session_shared, + template, + virt_db). + +
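
The source/filter/sink distinction described above can be illustrated with a toy model. This is only a sketch in Python, not Metaproxy's C++ filter API: the class names, the `trace` list and the `move` callback are all assumptions made for illustration, and the caller of `run_route` stands in for a source filter.

```python
# Toy model only: every name here is hypothetical, not Metaproxy's real API.

class Package:
    """A request plus its origin; the response is filled in by a sink."""
    def __init__(self, origin, request):
        self.origin = origin
        self.request = request
        self.response = None
        self.trace = []          # stands in for log output


class LogFilter:
    """A true filter: notes the request and response, changes nothing."""
    def configure(self, config):
        self.message = config.get("message", "")

    def process(self, package, move):
        package.trace.append(f"{self.message} request: {package.request}")
        move(package)            # pass the package on along the route
        package.trace.append(f"{self.message} response: {package.response}")


class DummySink:
    """A sink: consumes the package and supplies a result."""
    def configure(self, config):
        pass

    def process(self, package, move):
        package.response = f"dummy response to {package.request}"


def run_route(filters, package):
    """Feed a package through the filters of a route, in order."""
    def move_from(index):
        def move(pkg):
            if index < len(filters):
                filters[index].process(pkg, move_from(index + 1))
        return move
    move_from(0)(package)
    return package
```

Each filter decides whether to call `move`, which is why a sink ends the route and a true filter sees the package twice: once on the way in and once when the response comes back.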
+ + +
+ Overview of filter types + + We now briefly consider each of the types of filter supported by + the core Metaproxy binary. This overview is intended to give a + flavour of the available functionality; more detailed information + about each type of filter is included below in + the reference guide to Metaproxy filters. + + + The filters are here named by the string that is used as the + type attribute of a + <filter> element in the configuration + file to request them, with the name of the class that implements + them in parentheses. (The classname is not needed for normal + configuration and use of Metaproxy; it is useful only to + developers.) + + + The filters are here listed in alphabetical order: + + +
+ <literal>auth_simple</literal> + (mp::filter::AuthSimple) + + Simple authentication and authorisation. The configuration + specifies the name of a file that is the user register, which + lists username:password + pairs, one per line, colon separated. When a session begins, it + is rejected unless username and password are supplied, and match + a pair in the register. The configuration file may also specify + the name of another file that is the target register: this + lists username:dbname,dbname... + sets, one per line, with multiple database names separated by + commas. When a search is processed, it is rejected unless the + database to be searched is one of those listed as available to + the user. + 
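
The two register formats just described are simple enough to sketch. The following Python is a minimal illustration under stated assumptions, not the filter's actual implementation: the function names are hypothetical, and details such as blank-line handling are guesses.

```python
# Illustrative sketch; function names and blank-line handling are assumptions.

def parse_user_register(text):
    """Parse 'username:password' lines into a dict."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        username, password = line.split(":", 1)
        users[username] = password
    return users


def parse_target_register(text):
    """Parse 'username:dbname,dbname...' lines into a dict of sets."""
    targets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        username, names = line.split(":", 1)
        targets[username] = {n.strip() for n in names.split(",") if n.strip()}
    return targets


def may_open_session(users, username, password):
    """Authentication: both values must be supplied and match the register."""
    if username is None or password is None:
        return False
    return users.get(username) == password


def may_search(targets, username, database):
    """Authorisation: the database must be listed for this user."""
    return database in targets.get(username, set())
```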
+ +
+ <literal>backend_test</literal> + (mp::filter::Backend_test) + + A sink that provides dummy responses in the manner of the + yaz-ztest Z39.50 server. This is useful only + for testing. Seriously, you don't need this. Pretend you didn't + even read this section. + +
+ +
+ <literal>frontend_net</literal> + (mp::filter::FrontendNet) + + A source that accepts Z39.50 connections from a port + specified in the configuration, reads protocol units, and + feeds them into the next filter in the route. When the result is + received, it is returned to the origin. + 
+ +
+ <literal>http_file</literal> + (mp::filter::HttpFile) + + A sink that returns the contents of files from the local + filesystem in response to HTTP requests. (Yes, Virginia, this + does mean that Metaproxy is also a Web-server in its spare time. So + far it does not contain either an email-reader or a Lisp + interpreter, but that day is surely coming.) + +
+ +
+ <literal>log</literal> + (mp::filter::Log) + + Writes logging information to standard output, and passes on + the package unchanged. + +
+ +
+ <literal>multi</literal> + (mp::filter::Multi) + + Performs multicast searching. + See + the extended discussion + of virtual databases and multi-database searching below. + +
+ +
+ <literal>query_rewrite</literal> + (mp::filter::QueryRewrite) + + Rewrites Z39.50 Type-1 and Type-101 (``RPN'') queries by a + three-step process: the query is transliterated from Z39.50 + packet structures into an XML representation; that XML + representation is transformed by an XSLT stylesheet; and the + resulting XML is transliterated back into the Z39.50 packet + structure. + +
+ +
+ <literal>session_shared</literal> + (mp::filter::SessionShared) + + When this is finished, it will implement global sharing of + result sets (i.e. between threads and therefore between + clients), yielding performance improvements especially when + incoming requests are from a stateless environment such as a + web-server, in which the client process representing a session + might be any one of many. However: + + + + This filter is not yet completed. + + +
+ +
+ <literal>template</literal> + (mp::filter::Template) + + Does nothing at all, merely passing the packet on. (Maybe it + should be called nop or + passthrough?) This exists not to be used, but + to be copied - to become the skeleton of new filters as they are + written. As with backend_test, this is not + intended for civilians. + +
+ +
+ <literal>virt_db</literal> + (mp::filter::Virt_db) + + Performs virtual database selection: based on the name of the + database in the search request, a server is selected, and its + address added to the request in a VAL_PROXY + otherInfo packet. It will subsequently be used by a + z3950_client filter. + See + the extended discussion + of virtual databases and multi-database searching below. + +
+ +
+ <literal>z3950_client</literal> + (mp::filter::Z3950Client) + + Performs Z39.50 searching and retrieval by proxying the + packages that are passed to it. Init requests are sent to the + address specified in the VAL_PROXY otherInfo + attached to the request: this may have been specified by client, + or generated by a virt_db filter earlier in + the route. Subsequent requests are sent to the same address, + which is remembered at Init time in a Session object. +
- -
- <literal>log</literal> - - <filter type="log"> - <message>B</message> - </filter> -
+ + +
+ Future directions + + Some other filters that do not yet exist, but which would be + useful, are briefly described. These may be added in future + releases (or may be created by third parties, as loadable + modules). + -
- <literal>multi</literal> - - <filter type="multi"/> - + + + frontend_cli (source) + + + Command-line interface for generating requests. + + + + + frontend_sru (source) + + + Receive SRU (and perhaps SRW) requests. + + + + + sru2z3950 (filter) + + + Translate SRU requests into Z39.50 requests. + + + + + sru_client (sink) + + + SRU searching and retrieval. + + + + + srw_client (sink) + + + SRW searching and retrieval. + + + + + opensearch_client (sink) + + + A9 OpenSearch searching and retrieval. + + + +
- + + + + + + Configuration: the Metaproxy configuration file format + +
- <literal>session_shared</literal> + Introductory notes + + If Metaproxy is an interpreter providing operations on packages, then + its configuration file can be thought of as a program for that + interpreter. Configuration is by means of a single file, the name + of which is supplied as the sole command-line argument to the + metaproxy program. (See + the reference guide + below for more information on invoking Metaproxy.) + + + The configuration files are written in XML. (But that's just an + implementation detail - they could just as well have been written + in YAML or Lisp-like S-expressions, or in a custom syntax.) + + + Since XML has been chosen, an XML schema, + config.xsd, is provided for validating + configuration files. This file is supplied in the + etc directory of the Metaproxy distribution. It + can be used by (among other tools) the xmllint + program supplied as part of the libxml2 + distribution: + - <filter type="session_shared"> - ### Not yet defined - </filter> + xmllint --noout --schema etc/config.xsd my-config-file.xml + + (A recent version of libxml2 is required, as + support for XML Schemas is a relatively recent addition.) +
- -
- <literal>template</literal> + +
+ Overview of XML structure + + All elements and attributes are in the namespace + http://indexdata.dk/yp2/config/1. + This is most easily achieved by setting the default namespace on + the top-level element, as here: + - <filter type="template"/> + <yp2 xmlns="http://indexdata.dk/yp2/config/1"> + + The top-level element is <yp2>. This contains a + <start> element, a <filters> element and a + <routes> element, in that order. <filters> is + optional; the other two are mandatory. All three are + non-repeatable. + + + The <start> element is empty, but carries a + route attribute, whose value is the name of + the route at which to start running - analogous to the name of the + start production in a formal grammar. + + + If present, <filters> contains zero or more <filter> + elements. Each filter carries a type attribute + which specifies what kind of filter is being defined + (frontend_net, log, etc.) + and contains various elements that provide suitable configuration + for a filter of its type. The filter-specific elements are + described in + the reference guide below. + Filters defined in this part of the file must carry an + id attribute so that they can be referenced + from elsewhere. + + + <routes> contains one or more <route> elements, each + of which must carry an id attribute. One of the + routes must have the ID value that was specified as the start + route in the <start> element's route + attribute. Each route contains zero or more <filter> + elements. These are of two types. They may be empty, but carry a + refid attribute whose value is the same as the + id of a filter previously defined in the + <filters> section. Alternatively, a filter within a route + may omit the refid attribute, but contain + configuration elements similar to those used for filters defined + in the <filters> section. (In other words, each filter in a + route may be included either by reference or by physical + inclusion.) +
-
- <literal>virt_db</literal> - - <filter type="virt_db"> - <virtual> - <database>loc</database> - <target>z3950.loc.gov:7090/voyager</target> - </virtual> - <virtual> - <database>idgils</database> - <target>indexdata.dk/gils</target> - </virtual> - </filter> - -
-
- <literal>z3950_client</literal> - - <filter type="z3950_client"> - <timeout>30</timeout> - </filter> - +
+ An example configuration + + The following is a small, but complete, Metaproxy configuration + file (included in the distribution as + metaproxy/etc/config0.xml). + This file defines a very simple configuration that proxies + to whatever backend server the client requests, but logs each + request and response. This can be useful for debugging complex + client-server dialogues. + + + + + + + @:9000 + + + + + + + + + + + + +]]> + + It works by defining a single route, called + start, which consists of a sequence of three + filters. The first and last of these are included by reference: + their <filter> elements have + refid attributes that refer to filters defined + within the prior <filters> section. The + middle filter is included inline in the route. + + + The three filters in the route are as follows: first, a + frontend_net filter accepts Z39.50 requests + from any host on port 9000; then these requests are passed through + a log filter that emits a message for each + request; they are then fed into a z3950_client + filter, which forwards the requests to the client-specified + backend Z39.50 server. When the response arrives, it is handed + back to the log filter, which emits another + message; and then to the front-end filter, which returns the + response to the client. +
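
The configuration described above can be sketched and checked with a short Python script. The XML embedded here is a reconstruction from the prose only: the route name `start`, the port `@:9000` and the namespace come from the text, while the filter ids `frontend` and `backend` are assumptions and may differ from the distributed `config0.xml`. The helper function is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://indexdata.dk/yp2/config/1"

# Reconstruction for illustration; the ids "frontend" and "backend" are assumed.
CONFIG = """<?xml version="1.0"?>
<yp2 xmlns="http://indexdata.dk/yp2/config/1">
  <start route="start"/>
  <filters>
    <filter id="frontend" type="frontend_net">
      <port>@:9000</port>
    </filter>
    <filter id="backend" type="z3950_client"/>
  </filters>
  <routes>
    <route id="start">
      <filter refid="frontend"/>
      <filter type="log"/>
      <filter refid="backend"/>
    </route>
  </routes>
</yp2>
"""


def start_route_filter_types(config_text):
    """Return the filter types of the start route, resolving refid references."""
    def q(name):
        return f"{{{NS}}}{name}"    # namespace-qualified element name

    root = ET.fromstring(config_text)
    # Filters defined up front, keyed by id so routes can refer to them.
    defined = {f.get("id"): f.get("type")
               for f in root.findall(f"{q('filters')}/{q('filter')}")}
    start = root.find(q("start")).get("route")
    for route in root.findall(f"{q('routes')}/{q('route')}"):
        if route.get("id") == start:
            return [defined[f.get("refid")] if f.get("refid") else f.get("type")
                    for f in route.findall(q("filter"))]
    raise ValueError(f"start route {start!r} is not defined")
```

Resolving the route this way makes the two inclusion styles visible: the first and last entries are looked up by `refid`, while the middle one carries its own `type`.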
-
- Virtual database as multi-database searching + Virtual databases and multi-database searching -
- Introductory notes - - Two of Metaproxy's filters are concerned with multiple-database - operations. Of these, virt_db can work alone - to control the routing of searches to one of a number of servers, - while multi can work with the output of - virt_db to perform multicast searching, merging - the results into a unified result-set. The interaction between - these two filters is necessarily complex, reflecting the real - complexity of multicast searching in a protocol such as Z39.50 - that separates initialisation from searching, with the database to - search known only during the latter operation. - - - ### Much, much more to say! - -
-
+
+ Introductory notes + + Lark's vomit + + This chapter goes into a level of technical detail that is + probably not necessary in order to configure and use Metaproxy. + It is provided only for those who like to know how things work. + You should feel free to skip on to the next section if this one + doesn't seem like fun. + + + + Two of Metaproxy's filters are concerned with multiple-database + operations. Of these, virt_db can work alone + to control the routing of searches to one of a number of servers, + while multi can work with the output of + virt_db to perform multicast searching, merging + the results into a unified result-set. The interaction between + these two filters is necessarily complex: it reflects the real, + irreducible complexity of multicast searching in a protocol such + as Z39.50 that separates initialisation from searching, and in + which the database to be searched is not known at initialisation + time. + + + Hold on tight - this may get a little hairy. + +
- - Module Reference - - The material in this chapter includes the man pages material - - &manref; + +
+ Virtual databases with the <literal>virt_db</literal> filter + + In the general course of things, a Z39.50 Init request may carry + with it an otherInfo packet of type VAL_PROXY, + whose value indicates the address of a Z39.50 server to which the + ultimate connection is to be made. (This otherInfo packet is + supported by YAZ-based Z39.50 clients and servers, but has not yet + been ratified by the Maintenance Agency and so is not widely used + in non-Index Data software. We're working on it.) + The VAL_PROXY packet functions + analogously to the absoluteURI-style Request-URI used with the GET + method when a web browser asks a proxy to forward its request: see + the + Request-URI + section of + the HTTP 1.1 specification. + + + The role of the virt_db filter is to rewrite + this otherInfo packet dependent on the virtual database that the + client wants to search. For example, a virt_db + filter could be set up so that searches in the virtual database + ``lc'' are forwarded to the Library of Congress server, and + searches in the virtual database ``id'' are forwarded to the toy + GILS database that Index Data hosts for testing purposes. A + virt_db configuration to make this switch would + look like this: + + + + lc + z3950.loc.gov:7090/Voyager + + + id + indexdata.dk/gils + + ]]> + + When Metaproxy receives a Z39.50 Init request from a client, it + doesn't immediately forward that request to the back-end server. + Why not? Because it doesn't know which + back-end server to forward it to until the client sends a search + request that specifies the database that it wants to search in. + Instead, it just treasures the Init request up in its heart; and, + later, the first time the client does a search on one of the + specified virtual databases, a connection is forged to the + appropriate server and the Init request is forwarded to it. 
If, + later in the session, the same client searches in a different + virtual database, then a connection is forged to the server that + hosts it, and the same cached Init request is forwarded there, + too. + + + All of this clever Init-delaying is done by the + frontend_net filter. The + virt_db filter knows nothing about it; in + fact, because the Init request that is received from the client + doesn't get forwarded until a Search request is received, the + virt_db filter (and the + z3950_client filter behind it) doesn't even get + invoked at Init time. The only thing that a + virt_db filter ever does is rewrite the + VAL_PROXY otherInfo in the requests that pass + through it. + +
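The lc/id switch described in this section can be sketched as the filter definition below. The element names <virtual>, <database> and <target> are taken from the earlier virt_db example that this revision removes, and are assumed to be unchanged.

```xml
<!-- Sketch: element names assumed to match the earlier virt_db example -->
<filter type="virt_db">
  <virtual>
    <!-- searches in virtual database "lc" go to the Library of Congress -->
    <database>lc</database>
    <target>z3950.loc.gov:7090/Voyager</target>
  </virtual>
  <virtual>
    <!-- searches in virtual database "id" go to Index Data's test GILS database -->
    <database>id</database>
    <target>indexdata.dk/gils</target>
  </virtual>
</filter>
```

Placed in a route ahead of a z3950_client filter, a definition like this would cause a search in lc to have its VAL_PROXY otherInfo rewritten to z3950.loc.gov:7090/Voyager before the request reaches the client filter.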
+ +
+ A picture is worth a thousand words (but only five hundred on 64-bit architectures) + + + + + + + + + + + + Diagram showing the progress of packages through the filters + during a simple virtual-database search and a multi-database + search. + + + + + +
- - Classes in the Metaproxy source code -
- Introductory notes - - Stop! Do not read this! - You won't enjoy it at all. - - - This chapter contains documentation of the Metaproxy source code, and is - of interest only to maintainers and developers. If you need to - change Metaproxy's behaviour or write a new filter, then you will most - likely find this chapter helpful. Otherwise it's a waste of your - good time. Seriously: go and watch a film or something. - This is Spinal Tap is particularly good. - - - Still here? OK, let's continue. - - - In general, classes seem to be named big-endianly, so that - FactoryFilter is not a filter that filters - factories, but a factory that produces filters; and - FactoryStatic is a factory for the statically - registered filters (as opposed to those that are dynamically - loaded). - -
+ + Writing extensions for Metaproxy + ### To be written + -
- Individual classes - - The classes making up the Metaproxy application are here listed by - class-name, with the names of the source files that define them in - parentheses. - -
- <literal>mp::FactoryFilter</literal> - (<filename>factory_filter.cpp</filename>) - - A factory class that exists primarily to provide the - create() method, which takes the name of a - filter class as its argument and returns a new filter of that - type. To enable this, the factory must first be populated by - calling add_creator() for static filters (this - is done by the FactoryStatic class, see below) - and add_creator_dyn() for filters loaded - dynamically. - -
-
- <literal>mp::FactoryStatic</literal> - (<filename>factory_static.cpp</filename>) - - A subclass of FactoryFilter which is - responsible for registering all the statically defined filter - types. It does this by knowing about all those filters' - structures, which are listed in its constructor. Merely - instantiating this class registers all the static classes. It is - for the benefit of this class that struct - yp2_filter_struct exists, and that all the filter - classes provide a static object of that type. - -
-
- <literal>mp::filter::Base</literal> - (<filename>filter.cpp</filename>) - - The virtual base class of all filters. The filter API is, on the - surface at least, extremely simple: two methods. - configure() is passed a DOM tree representing - that part of the configuration file that pertains to this filter - instance, and is expected to walk that tree extracting relevant - information. And process() processes a - package (see below). That surface simplicitly is a bit - misleading, as process() needs to know a lot - about the Package class in order to do - anything useful. - -
+ + Classes in the Metaproxy source code -
- <literal>mp::filter::AuthSimple</literal>, - <literal>Backend_test</literal>, etc. - (<filename>filter_auth_simple.cpp</filename>, - <filename>filter_backend_test.cpp</filename>, etc.) - - Individual filters. Each of these is implemented by a header and - a source file, named filter_*.hpp and - filter_*.cpp respectively. All the header - files should be pretty much identical, in that they declare the - class, including a private Rep class and a - member pointer to it, and the two public methods. The only extra - information in any filter header is additional private types and - members (which should really all be in the Rep - anyway) and private methods (which should also remain known only - to the source file, but C++'s brain-damaged design requires this - dirty laundry to be exhibited in public. Thanks, Bjarne!) - - - The source file for each filter needs to supply: - - - - - A definition of the private Rep class. - - - - - Some boilerplate constructors and destructors. - - - - - A configure() method that uses the - appropriate XML fragment. - - - - - Most important, the process() method that - does all the actual work. - - - -
- <literal>mp::Package</literal> - (<filename>package.cpp</filename>) + Introductory notes - Represents a package on its way through the series of filters - that make up a route. This is essentially a Z39.50 or SRU APDU - together with information about where it came from, which is - modified as it passes through the various filters. + Stop! Do not read this! + You won't enjoy it at all. You should just skip ahead to + the reference guide, + which tells + + you things you really need to know, like the fact that the + fabulously beautiful planet Bethselamin is now so worried about + the cumulative erosion by ten billion visiting tourists a year + that any net imbalance between the amount you eat and the amount + you excrete whilst on the planet is surgically removed from your + bodyweight when you leave: so every time you go to the lavatory it + is vitally important to get a receipt. -
- -
- <literal>mp::Pipe</literal> - (<filename>pipe.cpp</filename>) - This class provides a compatibility layer so that we have an IPC - mechanism that works the same under Unix and Windows. It's not - particularly exciting. + This chapter contains documentation of the Metaproxy source code, and is + of interest only to maintainers and developers. If you need to + change Metaproxy's behaviour or write a new filter, then you will most + likely find this chapter helpful. Otherwise it's a waste of your + good time. Seriously: go and watch a film or something. + This is Spinal Tap is particularly good. -
- -
- <literal>mp::RouterChain</literal> - (<filename>router_chain.cpp</filename>) - ### + Still here? OK, let's continue. -
- -
- <literal>mp::RouterFleXML</literal> - (<filename>router_flexml.cpp</filename>) - ### + In general, classes seem to be named big-endianly, so that + FactoryFilter is not a filter that filters + factories, but a factory that produces filters; and + FactoryStatic is a factory for the statically + registered filters (as opposed to those that are dynamically + loaded).
-
- <literal>mp::Session</literal> - (<filename>session.cpp</filename>) +
+ Individual classes - ### + The classes making up the Metaproxy application are here listed by + class-name, with the names of the source files that define them in + parentheses. -
-
- <literal>mp::ThreadPoolSocketObserver</literal> - (<filename>thread_pool_observer.cpp</filename>) - - ### - +
+ <literal>mp::FactoryFilter</literal> + (<filename>factory_filter.cpp</filename>) + + A factory class that exists primarily to provide the + create() method, which takes the name of a + filter class as its argument and returns a new filter of that + type. To enable this, the factory must first be populated by + calling add_creator() for static filters (this + is done by the FactoryStatic class, see below) + and add_creator_dyn() for filters loaded + dynamically. + +
+ +
+ <literal>mp::FactoryStatic</literal> + (<filename>factory_static.cpp</filename>) + + A subclass of FactoryFilter which is + responsible for registering all the statically defined filter + types. It does this by knowing about all those filters' + structures, which are listed in its constructor. Merely + instantiating this class registers all the static classes. It is + for the benefit of this class that struct + metaproxy_1_filter_struct exists, and that all the filter + classes provide a static object of that type. + +
+ +
+ <literal>mp::filter::Base</literal> + (<filename>filter.cpp</filename>) + + The virtual base class of all filters. The filter API is, on the + surface at least, extremely simple: two methods. + configure() is passed a DOM tree representing + that part of the configuration file that pertains to this filter + instance, and is expected to walk that tree extracting relevant + information. And process() processes a + package (see below). That surface simplicity is a bit + misleading, as process() needs to know a lot + about the Package class in order to do + anything useful. + +
+ +
+ <literal>mp::filter::AuthSimple</literal>, + <literal>Backend_test</literal>, etc. + (<filename>filter_auth_simple.cpp</filename>, + <filename>filter_backend_test.cpp</filename>, etc.) + + Individual filters. Each of these is implemented by a header and + a source file, named filter_*.hpp and + filter_*.cpp respectively. All the header + files should be pretty much identical, in that they declare the + class, including a private Rep class and a + member pointer to it, and the two public methods. The only extra + information in any filter header is additional private types and + members (which should really all be in the Rep + anyway) and private methods (which should also remain known only + to the source file, but C++'s brain-damaged design requires this + dirty laundry to be exhibited in public. Thanks, Bjarne!) + + + The source file for each filter needs to supply: + + + + + A definition of the private Rep class. + + + + + Some boilerplate constructors and destructors. + + + + + A configure() method that uses the + appropriate XML fragment. + + + + + Most important, the process() method that + does all the actual work. + + + +
+ +
+ <literal>mp::Package</literal> + (<filename>package.cpp</filename>) + + Represents a package on its way through the series of filters + that make up a route. This is essentially a Z39.50 or SRU APDU + together with information about where it came from, which is + modified as it passes through the various filters. + +
+ +
+ <literal>mp::Pipe</literal> + (<filename>pipe.cpp</filename>) + + This class provides a compatibility layer so that we have an IPC + mechanism that works the same under Unix and Windows. It's not + particularly exciting. + +
+ +
+ <literal>mp::RouterChain</literal> + (<filename>router_chain.cpp</filename>) + + ### to be written + +
+ +
+ <literal>mp::RouterFleXML</literal> + (<filename>router_flexml.cpp</filename>) + + ### to be written + +
+ +
+ <literal>mp::Session</literal> + (<filename>session.cpp</filename>) + + ### to be written + +
+ +
+ <literal>mp::ThreadPoolSocketObserver</literal> + (<filename>thread_pool_observer.cpp</filename>) + + ### to be written + +
+ +
+ <literal>mp::util</literal> + (<filename>util.cpp</filename>) + + A namespace of various small utility functions and classes, + collected together for convenience. Most importantly, includes + the mp::util::odr class, a wrapper for YAZ's + ODR facilities. + +
+ +
+ <literal>mp::xml</literal> + (<filename>xmlutil.cpp</filename>) + + A namespace of various XML utility functions and classes, + collected together for convenience. + +
-
- <literal>mp::util</literal> - (<filename>util.cpp</filename>) + +
+ Other Source Files - A namespace of various small utility functions and classes, - collected together for convenience. Most importantly, includes - the mp::util::odr class, a wrapper for YAZ's - ODR facilities. + In addition to the Metaproxy source files that define the classes + described above, there are a few additional files which are + briefly described here: -
- -
- <literal>mp::xml</literal> - (<filename>xmlutil.cpp</filename>) + + + metaproxy_prog.cpp + + + The main function of the metaproxy program. + + + + + ex_router_flexml.cpp + + + Identical to metaproxy_prog.cpp: it's not clear why. + + + + + test_*.cpp + + + Unit-tests for various modules. + + + + - A namespace of various XML utility functions and classes, - collected together for convenience. + ### Still to be described: + ex_filter_frontend_net.cpp, + filter_dl.cpp, + plainfile.cpp, + tstdl.cpp.
-
+ -
- Other Source Files - - In addition to the Metaproxy source files that define the classes - described above, there are a few additional files which are - briefly described here: - - - - metaproxy_prog.cpp - - - The main function of the yp2 program. - - - - - ex_router_flexml.cpp - - - Identical to metaproxy_prog.cpp: it's not clear why. - - - - - test_*.cpp - - - Unit-tests for various modules. - - - - - - ### Still to be described: - ex_filter_frontend_net.cpp, - filter_dl.cpp, - plainfile.cpp, - tstdl.cpp. - - - - + + + Reference guide - -- + The material in this chapter is drawn directly from the individual + manual entries. In particular, the Metaproxy invocation section is + available using man metaproxy, and the section + on each individual filter is available using the name of the filter + as the argument to the man command. - - - - - - - -
+ + +
+ Metaproxy invocation + &progref; +
+ + +
+ Reference guide to Metaproxy filters + &manref; +
- + + +