X-Git-Url: http://lists.indexdata.com/cgi-bin?a=blobdiff_plain;f=doc%2Fbook.xml;h=1db13c54aec60a18fad6ec00f5298eefa23e403e;hb=f1ef386cc963c62b97f20332c77a474e895daf26;hp=74bd70d8ebf80c30dbe1fdc49c6e8fe982160d08;hpb=9e8552ead7fb4fc4d884222c5aac08dd94e3f450;p=metaproxy-moved-to-github.git diff --git a/doc/book.xml b/doc/book.xml index 74bd70d..1db13c5 100644 --- a/doc/book.xml +++ b/doc/book.xml @@ -1,4 +1,4 @@ - + Metaproxy - User's Guide and Reference @@ -9,11 +9,29 @@ 2006 - Index Data + Index Data ApS - Metaproxy - universal Z39.50/SRU router, proxy and encapsulated metasearcher + Metaproxy is a universal router, proxy and encapsulated + metasearcher for information retrieval protocols. It accepts, + processes, interprets and redirects requests from IR clients using + standard protocols such as ANSI/NISO Z39.50 (and in the future SRU + and SRW), as well as functioning as a limited + HTTP server. Metaproxy is configured by an XML file which + specifies how the software should function in terms of routes that + the request packets can take through the proxy, each step on a + route being an instantiation of a filter. Filters come in many + types, one for each operation: accepting Z39.50 packets, logging, + query transformation, multiplexing, etc. Further filter-types can + be added as loadable modules to extend Metaproxy functionality, + using the filter API. + + + The terms under which Metaproxy will be distributed have yet to be + established, but it will not necessarily be open source; so users + should not at this stage redistribute the code without explicit + written permission from the copyright holders, Index Data ApS. @@ -22,23 +40,31 @@ Introduction -
- Overview - Metaproxy + Metaproxy is a standalone program that acts as a universal router, proxy and encapsulated metasearcher for information retrieval protocols such - as Z39.50 and SRU/SRW. To clients, it acts as a server of these + as Z39.50, and in the future SRU and SRW. To clients, it acts as a + server of these protocols: it can be searched, records can be retrieved from it, etc. To servers, it acts as a client: it searches in them, retrieves records from them, etc. It satisfies its clients' requests by transforming them, multiplexing them, forwarding them on to zero or more servers, merging the results, transforming - them, and delivering them back to the client. + them, and delivering them back to the client. In addition, it + acts as a simple HTTP server; support for further protocols can be + added in a modular fashion, through the creation of new filters. + + Anything goes in! + Anything goes out! + Cold bananas, fish, pyjamas, + Mutton, beef and trout! + - attributed to Cole Porter. + Metaproxy is a more capable alternative to - YAZ Proxy, + YAZ Proxy, being more powerful, flexible, configurable and extensible. Among its many advantages over the older, more pedestrian work are support for multiplexing (encapsulated metasearching), routing by @@ -47,7 +73,20 @@ facilitates the creation of pluggable modules implementing further functionality. -
+ + This manual will briefly describe Metaproxy's licensing situation + before giving an overview of its architecture, then discussing the + key concept of a filter in some depth and giving an overview of + the various filter types, then discussing the configuration file + format. After this come several optional chapters which may be + freely skipped: a detailed discussion of virtual databases and + multi-database searching, some notes on writing extensions + (additional filter types) and a high-level description of the + source code. Finally comes the reference guide, which contains + instructions for invoking the metaproxy + program, and detailed information on each type of filter, + including examples. + @@ -65,6 +104,8 @@ made and a public statement made, then, and unless it has been delivered to you other specific terms, please treat Metaproxy as though it were proprietary software. + The code should not be redistributed without explicit + written permission from the copyright holders, Index Data ApS. @@ -137,9 +178,10 @@ different ways: it may be used to mean a particular type of filter, as when we speak of ``the auth_simple filter'' or ``the multi filter''; or it may be used - to be a specific instance of a filter within a Metaproxy - configuration. For example, a single configuration will often - contain multiple instances of the z3950_client filter. In + to mean a specific instance of a filter + within a Metaproxy configuration. For example, a single + configuration will often contain multiple instances of the + z3950_client filter. In operational terms, each of these is a separate filter. In practice, context always makes it clear which sense of the word ``filter'' is being used. @@ -177,7 +219,7 @@ complex data type, namely the ``package''. - A package represents a Z39.50 or SRW/U request (whether for Init, + A package represents a Z39.50 or SRU/W request (whether for Init, Search, Scan, etc.) together with information about where it came from. 
Packages are created by front-end filters such as frontend_net (see below), which reads them from @@ -189,7 +231,7 @@ There are many kinds of filter: some that are defined statically - as part of Metaproxy, and other that may be provided by third parties + as part of Metaproxy, and others that may be provided by third parties and dynamically loaded. They all conform to the same simple API of essentially two methods: configure() is called at startup time, and is passed a DOM tree representing that @@ -220,14 +262,27 @@ -
- Individual filters +
+ Overview of filter types + + We now briefly consider each of the types of filter supported by + the core Metaproxy binary. This overview is intended to give a + flavour of the available functionality; more detailed information + about each type of filter is included below in + the reference guide to Metaproxy filters. + The filters are here named by the string that is used as the type attribute of a <filter> element in the configuration file to request them, with the name of the class that implements - them in parentheses. + them in parentheses. (The classname is not needed for normal + configuration and use of Metaproxy; it is useful only to + developers.) + + + The filters are here listed in alphabetical order:
@@ -239,10 +294,13 @@ lists username:password pairs, one per line, colon separated. When a session begins, it is rejected unless username and password are supplied, and match - a pair in the register. - - - ### discuss authorisation phase + a pair in the register. The configuration file may also specify + the name of another file that is the target register: this lists + username:dbname,dbname... + sets, one per line, with multiple database names separated by + commas. When a search is processed, it is rejected unless the + database to be searched is one of those listed as available to + the user.
@@ -252,7 +310,8 @@ A sink that provides dummy responses in the manner of the yaz-ztest Z39.50 server. This is useful only - for testing. + for testing. Seriously, you don't need this. Pretend you didn't + even read this section.
@@ -260,10 +319,10 @@ <literal>frontend_net</literal> (mp::filter::FrontendNet) - A source that accepts Z39.50 and SRW connections from a port + A source that accepts Z39.50 connections from a port specified in the configuration, reads protocol units, and - feeds them into the next filter, eventually returning the - result to the origin. + feeds them into the next filter in the route. When the result is + received, it is returned to the origin.
@@ -292,8 +351,23 @@ <literal>multi</literal> (mp::filter::Multi) - Performs multicast searching. See the extended discussion of - multi-database searching below. + Performs multicast searching. + See + the extended discussion + of virtual databases and multi-database searching below. + + + +
+ <literal>query_rewrite</literal> + (mp::filter::QueryRewrite) + + Rewrites Z39.50 Type-1 and Type-101 (``RPN'') queries by a + three-step process: the query is transliterated from Z39.50 + packet structures into an XML representation; that XML + representation is transformed by an XSLT stylesheet; and the + resulting XML is transliterated back into the Z39.50 packet + structure.
@@ -303,8 +377,16 @@ When this is finished, it will implement global sharing of result sets (i.e. between threads and therefore between - clients), but it's not yet done. + clients), yielding performance improvements especially when + incoming requests are from a stateless environment such as a + web-server, in which the client process representing a session + might be any one of many. However: + + + This filter is not yet completed. + +
@@ -315,7 +397,8 @@ should be called nop or passthrough?) This exists not to be used, but to be copied - to become the skeleton of new filters as they are - written. + written. As with backend_test, this is not + intended for civilians.
@@ -323,8 +406,14 @@ <literal>virt_db</literal> (mp::filter::Virt_db) - Performs virtual database selection. See the extended discussion - of virtual databases below. + Performs virtual database selection: based on the name of the + database in the search request, a server is selected, and its + address added to the request in a VAL_PROXY + otherInfo packet. It will subsequently be used by a + z3950_client filter. + See + the extended discussion + of virtual databases and multi-database searching below. @@ -344,12 +433,13 @@ -
+
Future directions Some other filters that do not yet exist, but which would be useful, are briefly described. These may be added in future - releases. + releases (or may be created by third parties, as loadable + modules). @@ -362,19 +452,19 @@ - srw2z3950 (filter) + frontend_sru (source) - Translate SRW requests into Z39.50 requests. + Receive SRU (and perhaps SRW) requests. - srw_client (sink) + sru2z3950 (filter) - SRW searching and retrieval. - + Translate SRU requests into Z39.50 requests. + @@ -386,6 +476,14 @@ + srw_client (sink) + + + SRW searching and retrieval. + + + + opensearch_client (sink) @@ -410,7 +508,9 @@ its configuration file can be thought of as a program for that interpreter. Configuration is by means of a single file, the name of which is supplied as the sole command-line argument to the - yp2 program. + metaproxy program. (See + the reference guide + below for more information on invoking Metaproxy.) The configuration files are written in XML. (But that's just an @@ -435,7 +535,7 @@
-
+
Overview of XML structure All elements and attributes are in the namespace @@ -456,15 +556,19 @@ The <start> element is empty, but carries a route attribute, whose value is the name of - route at which to start running - analogouse to the name of the + the route at which to start running - analogous to the name of the start production in a formal grammar. If present, <filters> contains zero or more <filter> - elements; filters carry a type attribute and - contain various elements that provide suitable configuration for - filters of that type. The filter-specific elements are described - below. Filters defined in this part of the file must carry an + elements. Each filter carries a type attribute + which specifies what kind of filter is being defined + (frontend_net, log, etc.) + and contains various elements that provide suitable configuration + for a filter of its type. The filter-specific elements are + described in + the reference guide below. + Filters defined in this part of the file must carry an id attribute so that they can be referenced from elsewhere. @@ -480,135 +584,86 @@ <filters> section. Alternatively, a filter within a route may omit the refid attribute, but contain configuration elements similar to those used for filters defined - in the <filters> section. + in the <filters> section. (In other words, each filter in a + route may be included either by reference or by physical + inclusion.)
-
- Filter configuration +
+ An example configuration - All <filter> elements have in common that they must carry a - type attribute whose value is one of the - supported ones, listed in the schema file and discussed below. In - additional, <filters>s occurring the <filters> section - must have an id attribute, and those occurring - within a route must have either a refid - attribute referencing a previously defined filter or contain its - own configuration information. + The following is a small, but complete, Metaproxy configuration + file (included in the distribution as + metaproxy/etc/config0.xml). + This file defines a very simple configuration that simply proxies + to whatever backend server the client requests, but logs each + request and response. This can be useful for debugging complex + client-server dialogues. + + + + + + @:9000 + + + + + + + + + + + + +]]> - In general, each filter recognises different configuration - elements within its element, as each filter has different - functionality. These are as follows: + It works by defining a single route, called + start, which consists of a sequence of three + filters. The first and last of these are included by reference: + their <filter> elements have + refid attributes that refer to filters defined + within the prior <filters> section. The + middle filter is included inline in the route. + + + The three filters in the route are as follows: first, a + frontend_net filter accepts Z39.50 requests + from any host on port 9000; then these requests are passed through + a log filter that emits a message for each + request; they are then fed into a z3950_client + filter, which forwards the requests to the client-specified + backend Z39.50 server. When the response arrives, it is handed + back to the log filter, which emits another + message; and then to the front-end filter, which returns the + response to the client. -
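The configuration described above can be sketched as follows. This is a reconstruction from the prose, not the distributed config0.xml: the root element name, the <routes> wrapper, the id attribute on <route>, and the id values frontend and backend are assumptions, and the namespace declaration is omitted. The <port> element and the type and refid attributes are those described in this text.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch: root element name, <routes> wrapper and ids are assumptions -->
<metaproxy>
  <!-- Name of the route at which processing starts -->
  <start route="start"/>
  <filters>
    <!-- Filters defined here carry an id so that routes can reference them -->
    <filter id="frontend" type="frontend_net">
      <port>@:9000</port>
    </filter>
    <filter id="backend" type="z3950_client"/>
  </filters>
  <routes>
    <route id="start">
      <!-- First and last filters are included by reference -->
      <filter refid="frontend"/>
      <!-- The middle filter is included inline -->
      <filter type="log"/>
      <filter refid="backend"/>
    </route>
  </routes>
</metaproxy>
```

Including the log filter inline rather than by reference keeps a one-off filter next to the route that uses it; a filter shared between several routes would more naturally be defined once and referenced.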
- <literal>auth_simple</literal> - - <filter type="auth_simple"> - <userRegister>../etc/example.simple-auth</userRegister> - </filter> - -
- -
- <literal>backend_test</literal> - - <filter type="backend_test"/> - -
- -
- <literal>frontend_net</literal> - - <filter type="frontend_net"> - <threads>10</threads> - <port>@:9000</port> - </filter> - -
- -
- <literal>http_file</literal> - - <filter type="http_file"> - <mimetypes>/etc/mime.types</mimetypes> - <area> - <documentroot>.</documentroot> - <prefix>/etc</prefix> - </area> - </filter> - -
- -
- <literal>log</literal> - - <filter type="log"> - <message>B</message> - </filter> - -
- -
- <literal>multi</literal> - - <filter type="multi"/> - -
- -
- <literal>session_shared</literal> - - <filter type="session_shared"> - ### Not yet defined - </filter> - -
- -
- <literal>template</literal> - - <filter type="template"/> - -
- -
- <literal>virt_db</literal> - - <filter type="virt_db"> - <virtual> - <database>loc</database> - <target>z3950.loc.gov:7090/voyager</target> - </virtual> - <virtual> - <database>idgils</database> - <target>indexdata.dk/gils</target> - </virtual> - </filter> - -
- -
- <literal>z3950_client</literal> - - <filter type="z3950_client"> - <timeout>30</timeout> - </filter> - -
- Virtual database as multi-database searching + Virtual databases and multi-database searching
Introductory notes + + Lark's vomit + + This chapter goes into a level of technical detail that is + probably not necessary in order to configure and use Metaproxy. + It is provided only for those who like to know how things work. + You should feel free to skip on to the next section if this one + doesn't seem like fun. + + Two of Metaproxy's filters are concerned with multiple-database operations. Of these, virt_db can work alone @@ -616,30 +671,96 @@ while multi can work with the output of virt_db to perform multicast searching, merging the results into a unified result-set. The interaction between - these two filters is necessarily complex, reflecting the real - complexity of multicast searching in a protocol such as Z39.50 - that separates initialisation from searching, with the database to - search known only during the latter operation. + these two filters is necessarily complex: it reflects the real, + irreducible complexity of multicast searching in a protocol such + as Z39.50 that separates initialisation from searching, and in + which the database to be searched is not known at initialisation + time. + + + Hold on tight - this may get a little hairy. + + + In the general course of things, a Z39.50 Init request may carry + with it an otherInfo packet of type VAL_PROXY, + whose value indicates the address of a Z39.50 server to which the + ultimate connection is to be made. (This otherInfo packet is + supported by YAZ-based Z39.50 clients and servers, but has not yet + been ratified by the Maintenance Agency and so is not widely used + in non-Index Data software. We're working on it.) + The VAL_PROXY packet functions + analogously to the absoluteURI-style Request-URI used with the GET + method when a web browser asks a proxy to forward its request: see + the + Request-URI + section of + the HTTP 1.1 specification. - ### Much, much more to say! 
+ The role of the virt_db filter is to rewrite + this otherInfo packet depending on the virtual database that the + client wants to search. For example, a virt_db + filter could be set up so that searches in the virtual database + ``lc'' are forwarded to the Library of Congress server, and + searches in the virtual database ``id'' are forwarded to the toy + GILS database that Index Data hosts for testing purposes. A + virt_db configuration to make this switch would + look like this: + + + + lc + z3950.loc.gov:7090/Voyager + + + id + indexdata.dk/gils + + ]]> + + When Metaproxy receives a Z39.50 Init request from a client, it + doesn't immediately forward that request to the back-end server. + Why not? Because it doesn't know which + back-end server to forward it to until the client sends a search + request that specifies the database that it wants to search in. + Instead, it just treasures the Init request up in its heart; and, + later, the first time the client does a search on one of the + specified virtual databases, a connection is forged to the + appropriate server and the Init request is forwarded to it. If, + later in the session, the same client searches in a different + virtual database, then a connection is forged to the server that + hosts it, and the same cached Init request is forwarded there, + too. + + + All of this clever Init-delaying is done by the + frontend_net filter. The + virt_db filter knows nothing about it; in + fact, because the Init request that is received from the client + doesn't get forwarded until a Search request is received, the + virt_db filter (and the + z3950_client filter behind it) doesn't even get + invoked at Init time. The only thing that a + virt_db filter ever does is rewrite the + VAL_PROXY otherInfo in the requests that pass + through it.
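The virt_db configuration discussed above can be sketched as follows. The <virtual>, <database> and <target> element names are taken from the older virt_db example that this same diff removes; that the revised text uses exactly the same markup is an assumption.

```xml
<!-- Sketch only: element names follow the older virt_db example in this diff -->
<filter type="virt_db">
  <!-- Searches in virtual database "lc" go to the Library of Congress server -->
  <virtual>
    <database>lc</database>
    <target>z3950.loc.gov:7090/Voyager</target>
  </virtual>
  <!-- Searches in virtual database "id" go to Index Data's test GILS database -->
  <virtual>
    <database>id</database>
    <target>indexdata.dk/gils</target>
  </virtual>
</filter>
```

Each <virtual> entry maps one database name, as seen by the client, to one target address; per the text, that address is what the filter writes into the VAL_PROXY otherInfo for a later z3950_client filter to use.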
- - Module Reference - - The material in this chapter includes the man pages material - - &manref; - + Writing extensions for Metaproxy - ### + ### To be written + + + Classes in the Metaproxy source code @@ -648,7 +769,18 @@ Introductory notes Stop! Do not read this! - You won't enjoy it at all. + You won't enjoy it at all. You should just skip ahead to + the reference guide, + which tells + + you things you really need to know, like the fact that the + fabulously beautiful planet Bethselamin is now so worried about + the cumulative erosion by ten billion visiting tourists a year + that any net imbalance between the amount you eat and the amount + you excrete whilst on the planet is surgically removed from your + bodyweight when you leave: so every time you go to the lavatory it + is vitally important to get a receipt. This chapter contains documentation of the Metaproxy source code, and is @@ -671,7 +803,7 @@
-
+
Individual classes The classes making up the Metaproxy application are here listed by @@ -704,7 +836,7 @@ structures, which are listed in its constructor. Merely instantiating this class registers all the static classes. It is for the benefit of this class that struct - yp2_filter_struct exists, and that all the filter + metaproxy_1_filter_struct exists, and that all the filter classes provide a static object of that type.
@@ -798,7 +930,7 @@ <literal>mp::RouterChain</literal> (<filename>router_chain.cpp</filename>) - ### + ### to be written
@@ -806,7 +938,7 @@ <literal>mp::RouterFleXML</literal> (<filename>router_flexml.cpp</filename>) - ### + ### to be written
@@ -814,7 +946,7 @@ <literal>mp::Session</literal> (<filename>session.cpp</filename>) - ### + ### to be written
@@ -822,7 +954,7 @@ <literal>mp::ThreadPoolSocketObserver</literal> (<filename>thread_pool_observer.cpp</filename>) - ### + ### to be written @@ -848,7 +980,7 @@ -
+
Other Source Files In addition to the Metaproxy source files that define the classes @@ -860,7 +992,7 @@ metaproxy_prog.cpp - The main function of the yp2 program. + The main function of the metaproxy program. @@ -888,23 +1020,36 @@ plainfile.cpp, tstdl.cpp. - - - - - -- - - - - - - - -
+ + + + Reference guide + + The material in this chapter is drawn directly from the individual + manual entries. In particular, the Metaproxy invocation section is + available using man metaproxy, and the section + on each individual filter is available using the name of the filter + as the argument to the man command. + + + +
+ Metaproxy invocation + &progref; +
+ + +
+ Reference guide to Metaproxy filters + &manref; +
+
+ + +