1 <!-- $Id: book.xml,v 1.2 2006-03-16 13:20:05 adam Exp $ -->
3 <title>Metaproxy - User's Guide and Reference</title>
5 <firstname>Mike</firstname><surname>Taylor</surname>
9 <holder>Index Data</holder>
14 Metaproxy is ... in need of description :-)
21 <chapter id="introduction">
22 <title>Introduction</title>
26 <title>Overview</title>
28 <ulink url="http://indexdata.dk/metaproxy/">Metaproxy</ulink>
32 ### We should probably consider saying a little more by way of
40 <chapter id="filters">
41 <title>Filters</title>
45 <title>Introductory notes</title>
47 It's useful to think of Metaproxy as an interpreter providing a small
48 number of primitives and operations, but operating on a very
49 complex data type, namely the ``package''.
52 A package represents a Z39.50 or SRW/U request (whether for Init,
53 Search, Scan, etc.) together with information about where it came
54 from. Packages are created by front-end filters such as
55 <literal>frontend_net</literal> (see below), which reads them from
56 the network; other front-end filters are possible. They then pass
57 along a route consisting of a sequence of filters, each of which
58 transforms the package and may also have side-effects such as
59 generating logging. Eventually, the route will yield a response,
60 which is sent back to the origin.
63 There are many kinds of filter: some that are defined statically
64 as part of Metaproxy, and others that may be provided by third parties
65 and dynamically loaded. They all conform to the same simple API
66 of essentially two methods: <function>configure()</function> is
67 called at startup time, and is passed a DOM tree representing that
68 part of the configuration file that pertains to this filter
69 instance: it is expected to walk that tree extracting relevant
70 information; and <function>process()</function> is called every
71 time the filter has to process a package.
74 While all filters provide the same API, there are different modes
75 of functionality. Some filters are sources: they create
77 (<literal>frontend_net</literal>);
78 others are sinks: they consume packages and return a result
79 (<literal>z3950_client</literal>,
80 <literal>backend_test</literal>,
81 <literal>http_file</literal>);
82 the others are true filters, that read, process and pass on the
84 (<literal>auth_simple</literal>,
85 <literal>log</literal>,
86 <literal>multi</literal>,
87 <literal>session_shared</literal>,
88 <literal>template</literal>,
89 <literal>virt_db</literal>).
95 <title>Individual filters</title>
97 The filters are here named by the string that is used as the
98 <literal>type</literal> attribute of a
99 <literal><filter></literal> element in the configuration
100 file to request them, with the name of the class that implements
105 <title><literal>auth_simple</literal>
106 (mp::filter::AuthSimple)</title>
108 Simple authentication and authorisation. The configuration
109 specifies the name of a file that is the user register, which
110 lists <varname>username</varname>:<varname>password</varname>
111 pairs, one per line, colon separated. When a session begins, it
112 is rejected unless username and password are supplied, and match
113 a pair in the register.
116 ### discuss authorisation phase
121 <title><literal>backend_test</literal>
122 (mp::filter::Backend_test)</title>
124 A sink that provides dummy responses in the manner of the
125 <literal>yaz-ztest</literal> Z39.50 server. This is useful only
131 <title><literal>frontend_net</literal>
132 (mp::filter::FrontendNet)</title>
134 A source that accepts Z39.50 and SRW connections from a port
135 specified in the configuration, reads protocol units, and
136 feeds them into the next filter, eventually returning the
137 result to the origin.
142 <title><literal>http_file</literal>
143 (mp::filter::HttpFile)</title>
145 A sink that returns the contents of files from the local
146 filesystem in response to HTTP requests. (Yes, Virginia, this
147 does mean that Metaproxy is also a Web-server in its spare time. So
148 far it does not contain either an email-reader or a Lisp
149 interpreter, but that day is surely coming.)
154 <title><literal>log</literal>
155 (mp::filter::Log)</title>
157 Writes logging information to standard output, and passes on
158 the package unchanged.
163 <title><literal>multi</literal>
164 (mp::filter::Multi)</title>
166 Performs multicast searching. See the extended discussion of
167 multi-database searching below.
172 <title><literal>session_shared</literal>
173 (mp::filter::SessionShared)</title>
175 When this is finished, it will implement global sharing of
176 result sets (i.e. between threads and therefore between
177 clients), but it's not yet done.
182 <title><literal>template</literal>
183 (mp::filter::Template)</title>
185 Does nothing at all, merely passing the package on. (Maybe it
186 should be called <literal>nop</literal> or
187 <literal>passthrough</literal>?) This exists not to be used, but
188 to be copied - to become the skeleton of new filters as they are
194 <title><literal>virt_db</literal>
195 (mp::filter::Virt_db)</title>
197 Performs virtual database selection. See the extended discussion
198 of virtual databases below.
203 <title><literal>z3950_client</literal>
204 (mp::filter::Z3950Client)</title>
206 Performs Z39.50 searching and retrieval by proxying the
207 packages that are passed to it. Init requests are sent to the
208 address specified in the <literal>VAL_PROXY</literal> otherInfo
209 attached to the request: this may have been specified by the client,
210 or generated by a <literal>virt_db</literal> filter earlier in
211 the route. Subsequent requests are sent to the same address,
212 which is remembered at Init time in a Session object.
219 <title>Future directions</title>
221 Some other filters that do not yet exist, but which would be
222 useful, are briefly described. These may be added in future
228 <term><literal>frontend_cli</literal> (source)</term>
231 Command-line interface for generating requests.
236 <term><literal>srw2z3950</literal> (filter)</term>
239 Translate SRW requests into Z39.50 requests.
244 <term><literal>srw_client</literal> (sink)</term>
247 SRW searching and retrieval.
252 <term><literal>sru_client</literal> (sink)</term>
255 SRU searching and retrieval.
260 <term><literal>opensearch_client</literal> (sink)</term>
263 A9 OpenSearch searching and retrieval.
273 <chapter id="configuration">
274 <title>Configuration: the Metaproxy configuration file format</title>
278 <title>Introductory notes</title>
280 If Metaproxy is an interpreter providing operations on packages, then
281 its configuration file can be thought of as a program for that
282 interpreter. Configuration is by means of a single file, the name
283 of which is supplied as the sole command-line argument to the
284 <command>yp2</command> program.
287 The configuration files are written in XML. (But that's just an
288 implementation detail - they could just as well have been written
289 in YAML or Lisp-like S-expressions, or in a custom syntax.)
292 Since XML has been chosen, an XML schema,
293 <filename>config.xsd</filename>, is provided for validating
294 configuration files. This file is supplied in the
295 <filename>etc</filename> directory of the Metaproxy distribution. It
296 can be used by (among other tools) the <command>xmllint</command>
297 program supplied as part of the <literal>libxml2</literal>
301 xmllint --noout --schema etc/config.xsd my-config-file.xml
304 (A recent version of <literal>libxml2</literal> is required, as
305 support for XML Schemas is a relatively recent addition.)
310 <title>Overview of XML structure</title>
312 All elements and attributes are in the namespace
313 <ulink url="http://indexdata.dk/yp2/config/1"/>.
314 This is most easily achieved by setting the default namespace on
315 the top-level element, as here:
318 <yp2 xmlns="http://indexdata.dk/yp2/config/1">
321 The top-level element is <yp2>. This contains a
322 <start> element, a <filters> element and a
323 <routes> element, in that order. <filters> is
324 optional; the other two are mandatory. All three are
328 The <start> element is empty, but carries a
329 <literal>route</literal> attribute, whose value is the name of
330 the route at which to start running - analogous to the name of the
331 start production in a formal grammar.
334 If present, <filters> contains zero or more <filter>
335 elements; filters carry a <literal>type</literal> attribute and
336 contain various elements that provide suitable configuration for
337 filters of that type. The filter-specific elements are described
338 below. Filters defined in this part of the file must carry an
339 <literal>id</literal> attribute so that they can be referenced
343 <routes> contains one or more <route> elements, each
344 of which must carry an <literal>id</literal> attribute. One of the
345 routes must have the ID value that was specified as the start
346 route in the <start> element's <literal>route</literal>
347 attribute. Each route contains zero or more <filter>
348 elements. These are of two types. They may be empty, but carry a
349 <literal>refid</literal> attribute whose value is the same as the
350 <literal>id</literal> of a filter previously defined in the
351 <filters> section. Alternatively, a filter within a route
352 may omit the <literal>refid</literal> attribute, but contain
353 configuration elements similar to those used for filters defined
354 in the <filters> section.
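<para>
 As a rough illustration of how these pieces fit together, the
 following sketch assembles a complete configuration from the
 fragments shown elsewhere in this guide. It is an untested
 example rather than a definitive one: the
 <literal>id</literal> values (<literal>frontend</literal> and
 <literal>start</literal>) are made up for illustration, and the
 file has not been validated against
 <filename>config.xsd</filename>.
</para>
<screen><![CDATA[
<?xml version="1.0"?>
<yp2 xmlns="http://indexdata.dk/yp2/config/1">
  <start route="start"/>
  <filters>
    <!-- Defined once here, then referenced by refid in the route -->
    <filter id="frontend" type="frontend_net">
      <threads>10</threads>
      <port>@:9000</port>
    </filter>
  </filters>
  <routes>
    <route id="start">
      <filter refid="frontend"/>
      <!-- Configured inline, so no refid is needed -->
      <filter type="log">
        <message>B</message>
      </filter>
      <filter type="backend_test"/>
    </route>
  </routes>
</yp2>
]]></screen>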
360 <title>Filter configuration</title>
362 All <filter> elements have in common that they must carry a
363 <literal>type</literal> attribute whose value is one of the
364 supported ones, listed in the schema file and discussed below. In
365 addition, <filter> elements occurring in the <filters> section
366 must have an <literal>id</literal> attribute, and those occurring
367 within a route must have either a <literal>refid</literal>
368 attribute referencing a previously defined filter or contain its
369 own configuration information.
372 In general, each filter recognises different configuration
373 elements within its element, as each filter has different
374 functionality. These are as follows:
378 <title><literal>auth_simple</literal></title>
380 <filter type="auth_simple">
381 <userRegister>../etc/example.simple-auth</userRegister>
387 <title><literal>backend_test</literal></title>
389 <filter type="backend_test"/>
394 <title><literal>frontend_net</literal></title>
396 <filter type="frontend_net">
397 <threads>10</threads>
398 <port>@:9000</port>
404 <title><literal>http_file</literal></title>
406 <filter type="http_file">
407 <mimetypes>/etc/mime.types</mimetypes>
409 <documentroot>.</documentroot>
410 <prefix>/etc</prefix>
417 <title><literal>log</literal></title>
419 <filter type="log">
420 <message>B</message>
426 <title><literal>multi</literal></title>
428 <filter type="multi"/>
433 <title><literal>session_shared</literal></title>
435 <filter type="session_shared">
442 <title><literal>template</literal></title>
444 <filter type="template"/>
449 <title><literal>virt_db</literal></title>
451 <filter type="virt_db">
453 <database>loc</database>
454 <target>z3950.loc.gov:7090/voyager</target>
457 <database>idgils</database>
458 <target>indexdata.dk/gils</target>
465 <title><literal>z3950_client</literal></title>
467 <filter type="z3950_client">
468 <timeout>30</timeout>
477 <chapter id="multidb">
478 <title>Virtual databases and multi-database searching</title>
482 <title>Introductory notes</title>
484 Two of Metaproxy's filters are concerned with multiple-database
485 operations. Of these, <literal>virt_db</literal> can work alone
486 to control the routing of searches to one of a number of servers,
487 while <literal>multi</literal> can work with the output of
488 <literal>virt_db</literal> to perform multicast searching, merging
489 the results into a unified result-set. The interaction between
490 these two filters is necessarily complex, reflecting the real
491 complexity of multicast searching in a protocol such as Z39.50
492 that separates initialisation from searching, with the database to
493 search known only during the latter operation.
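<para>
 As a hedged sketch of the simpler, single-target case, a route
 might place <literal>virt_db</literal> ahead of
 <literal>z3950_client</literal>, so that the database named in a
 search selects the server to which the session is proxied. The
 <literal>&lt;virtual&gt;</literal> wrapper around each
 database/target pair and the route name <literal>start</literal>
 are assumptions made for this example; check them against
 <filename>config.xsd</filename> before relying on them.
</para>
<screen><![CDATA[
<route id="start">
  <filter type="frontend_net">
    <port>@:9000</port>
  </filter>
  <filter type="virt_db">
    <!-- Assumed wrapper element: one per virtual database -->
    <virtual>
      <database>loc</database>
      <target>z3950.loc.gov:7090/voyager</target>
    </virtual>
    <virtual>
      <database>idgils</database>
      <target>indexdata.dk/gils</target>
    </virtual>
  </filter>
  <filter type="z3950_client">
    <timeout>30</timeout>
  </filter>
</route>
]]></screen>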
496 ### Much, much more to say!
501 <chapter id="moduleref">
502 <title>Module Reference</title>
504 The material in this chapter includes the man pages material
509 <chapter id="classes">
510 <title>Classes in the Metaproxy source code</title>
514 <title>Introductory notes</title>
516 <emphasis>Stop! Do not read this!</emphasis>
517 You won't enjoy it at all.
520 This chapter contains documentation of the Metaproxy source code, and is
521 of interest only to maintainers and developers. If you need to
522 change Metaproxy's behaviour or write a new filter, then you will most
523 likely find this chapter helpful. Otherwise it's a waste of your
524 good time. Seriously: go and watch a film or something.
525 <citetitle>This is Spinal Tap</citetitle> is particularly good.
528 Still here? OK, let's continue.
531 In general, classes seem to be named big-endianly, so that
532 <literal>FactoryFilter</literal> is not a filter that filters
533 factories, but a factory that produces filters; and
534 <literal>FactoryStatic</literal> is a factory for the statically
535 registered filters (as opposed to those that are dynamically
541 <title>Individual classes</title>
543 The classes making up the Metaproxy application are here listed by
544 class-name, with the names of the source files that define them in
549 <title><literal>mp::FactoryFilter</literal>
550 (<filename>factory_filter.cpp</filename>)</title>
552 A factory class that exists primarily to provide the
553 <literal>create()</literal> method, which takes the name of a
554 filter class as its argument and returns a new filter of that
555 type. To enable this, the factory must first be populated by
556 calling <literal>add_creator()</literal> for static filters (this
557 is done by the <literal>FactoryStatic</literal> class, see below)
558 and <literal>add_creator_dyn()</literal> for filters loaded
564 <title><literal>mp::FactoryStatic</literal>
565 (<filename>factory_static.cpp</filename>)</title>
567 A subclass of <literal>FactoryFilter</literal> which is
568 responsible for registering all the statically defined filter
569 types. It does this by knowing about all those filters'
570 structures, which are listed in its constructor. Merely
571 instantiating this class registers all the static classes. It is
572 for the benefit of this class that <literal>struct
573 yp2_filter_struct</literal> exists, and that all the filter
574 classes provide a static object of that type.
579 <title><literal>mp::filter::Base</literal>
580 (<filename>filter.cpp</filename>)</title>
582 The virtual base class of all filters. The filter API is, on the
583 surface at least, extremely simple: two methods.
584 <literal>configure()</literal> is passed a DOM tree representing
585 that part of the configuration file that pertains to this filter
586 instance, and is expected to walk that tree extracting relevant
587 information. And <literal>process()</literal> processes a
588 package (see below). That surface simplicity is a bit
589 misleading, as <literal>process()</literal> needs to know a lot
590 about the <literal>Package</literal> class in order to do
596 <title><literal>mp::filter::AuthSimple</literal>,
597 <literal>Backend_test</literal>, etc.
598 (<filename>filter_auth_simple.cpp</filename>,
599 <filename>filter_backend_test.cpp</filename>, etc.)</title>
601 Individual filters. Each of these is implemented by a header and
602 a source file, named <filename>filter_*.hpp</filename> and
603 <filename>filter_*.cpp</filename> respectively. All the header
604 files should be pretty much identical, in that they declare the
605 class, including a private <literal>Rep</literal> class and a
606 member pointer to it, and the two public methods. The only extra
607 information in any filter header is additional private types and
608 members (which should really all be in the <literal>Rep</literal>
609 anyway) and private methods (which should also remain known only
610 to the source file, but C++'s brain-damaged design requires this
611 dirty laundry to be exhibited in public. Thanks, Bjarne!)
614 The source file for each filter needs to supply:
619 A definition of the private <literal>Rep</literal> class.
624 Some boilerplate constructors and destructors.
629 A <literal>configure()</literal> method that uses the
630 appropriate XML fragment.
635 Most important, the <literal>process()</literal> method that
636 does all the actual work.
643 <title><literal>mp::Package</literal>
644 (<filename>package.cpp</filename>)</title>
646 Represents a package on its way through the series of filters
647 that make up a route. This is essentially a Z39.50 or SRU APDU
648 together with information about where it came from, which is
649 modified as it passes through the various filters.
654 <title><literal>mp::Pipe</literal>
655 (<filename>pipe.cpp</filename>)</title>
657 This class provides a compatibility layer so that we have an IPC
658 mechanism that works the same under Unix and Windows. It's not
659 particularly exciting.
664 <title><literal>mp::RouterChain</literal>
665 (<filename>router_chain.cpp</filename>)</title>
672 <title><literal>mp::RouterFleXML</literal>
673 (<filename>router_flexml.cpp</filename>)</title>
680 <title><literal>mp::Session</literal>
681 (<filename>session.cpp</filename>)</title>
688 <title><literal>mp::ThreadPoolSocketObserver</literal>
689 (<filename>thread_pool_observer.cpp</filename>)</title>
696 <title><literal>mp::util</literal>
697 (<filename>util.cpp</filename>)</title>
699 A namespace of various small utility functions and classes,
700 collected together for convenience. Most importantly, includes
701 the <literal>mp::util::odr</literal> class, a wrapper for YAZ's
707 <title><literal>mp::xml</literal>
708 (<filename>xmlutil.cpp</filename>)</title>
710 A namespace of various XML utility functions and classes,
711 collected together for convenience.
718 <title>Other Source Files</title>
720 In addition to the Metaproxy source files that define the classes
721 described above, there are a few additional files which are
722 briefly described here:
726 <term><literal>metaproxy_prog.cpp</literal></term>
729 The main function of the <command>yp2</command> program.
734 <term><literal>ex_router_flexml.cpp</literal></term>
737 Identical to <literal>metaproxy_prog.cpp</literal>: it's not clear why.
742 <term><literal>test_*.cpp</literal></term>
745 Unit-tests for various modules.
751 ### Still to be described:
752 <literal>ex_filter_frontend_net.cpp</literal>,
753 <literal>filter_dl.cpp</literal>,
754 <literal>plainfile.cpp</literal>,
755 <literal>tstdl.cpp</literal>.
764 <!-- This is just a lame way to get some vertical whitespace at
765 the end of the document -->
774 <!-- Keep this comment at the end of the file
779 sgml-minimize-attributes:nil
780 sgml-always-quote-attributes:t
783 sgml-parent-document:yp2.xml
784 sgml-local-catalogs: nil
785 sgml-namecase-general:t