2 <title>The YAZ Proxy</title>
4 The YAZ proxy is a transparent Z39.50-to-Z39.50 gateway. That is,
5 it is a Z39.50 server which has as its back-end a Z39.50 client
6 that forwards requests on to another server (known as the
7 <firstterm>backend target</firstterm>).
10 The YAZ Proxy is useful for debugging Z39.50 software, logging
11 APDUs, redirecting Z39.50 packets through firewalls, etc.
12 Furthermore, it offers facilities that often
13 boost performance for connectionless Z39.50 clients such
17 Unlike most other server software, the proxy runs as a single thread
18 in a single process. Every I/O operation
19 is non-blocking, which keeps it lightweight and fast.
20 It does not store any state information on the hard drive,
21 except any log files you ask for.
24 <section id="proxy-example">
25 <title>Example: Using the Proxy to Log APDUs</title>
27 Suppose you use a commercial Z39.50 client for which you do not
28 have source code, and it's not behaving how you think it should
29 when running against some specific server that you have no control
30 over. One way to diagnose the problem is to find out what packets
31 (APDUs) are being sent and received, but not all client
32 applications have facilities to do APDU logging.
35 No problem. Run the proxy on a friendly machine, get it to log
36 APDUs, and point the errant client at the proxy instead of
37 directly at the server that's causing it problems.
40 Suppose the server is running on <literal>foo.bar.com</literal>,
41 port 18398. Run the proxy on the machine of your choice, say
42 <literal>your.company.com</literal>, like this:
45 yaz-proxy -a - -t tcp:foo.bar.com:18398 tcp:@:9000
48 (The <literal>-a -</literal> option requests APDU logging on
49 standard output, <literal>-t tcp:foo.bar.com:18398</literal>
50 specifies where the backend target is, and
51 <literal>tcp:@:9000</literal> tells the proxy to listen on port
52 9000 and accept connections from any machine.)
55 Now change your client application's configuration so that instead
56 of connecting to <literal>foo.bar.com</literal> port 18398, it
57 connects to <literal>your.company.com</literal> port 9000, and
58 start it up. It will work exactly as usual, but all the packets
59 will be sent via the proxy, which will generate a log like this:
64 referenceId OCTETSTRING(len=4) 69 6E 69 74
65 protocolVersion BITSTRING(len=1)
66 options BITSTRING(len=2)
67 preferredMessageSize 1048576
68 maximumRecordSize 1048576
69 implementationId 'Mike Taylor (id=169)'
70 implementationName 'Net::Z3950.pm (Perl)'
71 implementationVersion '0.31'
75 referenceId OCTETSTRING(len=4) 69 6E 69 74
76 protocolVersion BITSTRING(len=1)
77 options BITSTRING(len=2)
78 preferredMessageSize 1048576
79 maximumRecordSize 1048576
82 implementationName 'GFS/YAZ / Zebra Information Server'
83 implementationVersion 'YAZ 1.9.1 / Zebra 1.3.3'
87 referenceId OCTETSTRING(len=1) 30
90 mediumSetPresentNumber 0
92 resultSetName 'default'
97 smallSetElementSetNames choice
101 mediumSetElementSetNames choice
104 preferredRecordSyntax OID: 1 2 840 10003 5 10
108 attributeSetId OID: 1 2 840 10003 3 1
116 general OCTETSTRING(len=7) 6D 69 6E 65 72 61 6C
126 <section id="proxy-target">
127 <title>Specifying the Backend Target</title>
129 When the proxy accepts a Z39.50 client session, it
130 determines the backend target by the following rules:
133 <para> If the <literal>InitializeRequest</literal> PDU from the
135 <link linkend="otherinfo-encoding"><literal>otherInfo</literal></link>
137 <literal>1.2.840.10003.10.1000.81.1</literal>, then the
138 contents of that element specify the target to be used, in the
139 usual YAZ address format (typically
140 <literal>tcp:<parameter>hostname</parameter>:<parameter>port</parameter></literal>)
142 <ulink url="http://www.indexdata.dk/yaz/doc/comstack.addresses.php"
143 >the Addresses section of the YAZ manual</ulink>.
147 <para> Otherwise, the Proxy uses the default target, if one was
148 specified on the command-line with the <literal>-t</literal>
149 option. A default target can also be specified in the
154 <para> Otherwise, the proxy closes the connection with
161 <section id="proxy-keepalive">
162 <title>Keep-alive Facility</title>
164 With the keep-alive facility, the proxy keeps the connection to the
165 backend open even if the client closes its connection to the proxy.
168 If the same client or another one later connects to the proxy and requests the
169 same backend, it is assigned to this existing backend session. In this case, the
170 proxy sends an initialize response directly to the client and the
171 initialize handshake with the backend is omitted.
174 When a client reconnects, query and record caching work better if the
175 proxy assigns it to the same backend as before, and the result set
176 (if any) is re-used. To achieve this, Index Data defined a session
177 cookie which identifies the backend session.
180 The cookie is defined by the client and is sent as part of the
181 Initialize Request and passed in an
182 <link linkend="otherinfo-encoding"><literal>otherInfo</literal></link>
183 element with OID <literal>1.2.840.10003.10.1000.81.2</literal>.
186 Clients that do not send a cookie as part of the initialize request
187 may still get better performance, since the init handshake is saved.
191 <section id="query-cache">
192 <title>Query Caching</title>
194 Simple stateless clients often send identical Z39.50 searches
195 in a relatively short period of time (e.g. in order to produce a
196 results-list page, the next page,
197 a single full record, etc.). For many targets, it's
198 much more expensive to produce a new result set than to
199 reuse an existing one.
202 The proxy tries to solve that by remembering the last query for each
203 backend target, so that if an identical query is received next, it
204 is turned into Present Requests rather than new Search Requests.
208 In a future release we will probably allow for
209 an arbitrary-sized cache for targets supporting named result sets.
213 You can enable/disable query caching using option <literal>-o</literal>.
217 <section id="record-cache">
218 <title>Record Caching</title>
220 As an option, the proxy may also cache result set records for the
222 The proxy takes into account the Record Syntax and CompSpec.
223 The CompSpec includes simple element set names as well.
224 By default the cache is 200000 bytes per session.
228 <section id="query-validation">
229 <title>Query Validation</title>
231 The Proxy may also be configured to trap particular attributes in
232 Type-1 queries and send Bib-1 diagnostics back to the client without
233 even consulting the backend target. This facility may be useful if
234 a target does not properly issue diagnostics when unsupported attributes
239 <section id="record-validation">
240 <title>Record Syntax Validation</title>
242 The proxy may be configured to accept, reject or convert records.
243 When a record syntax is accepted, the proxy passes search/present requests to the
244 backend target under the assumption that the target can honor the
245 request (in fact, it may not be able to). When a record is rejected because
246 the record syntax is "unsupported", the proxy returns a diagnostic to the
247 client. Finally, the proxy may convert records.
250 In the current version the only supported conversion is
251 MARC21/USMARC in MARC-8 charset to MARCXML in UTF-8. Future versions of
252 the proxy may do other record/charset conversions.
256 <section id="other-optimizations">
257 <title>Other Optimizations</title>
259 We've had some plans to support global caching of result set records,
260 but this has not yet been implemented.
264 <section id="proxy-config-file">
265 <title>Proxy Configuration File</title>
267 As an option, the proxy may read a configuration file using option
268 <literal>-c</literal> followed by the filename of a config file.
271 The config file is in XML format. The YAZ proxy must be compiled
272 with <ulink url="http://www.xmlsoft.org/">libxml2</ulink> support in
273 order for the config file facility to be enabled.
276 <para>To check that a config file is well-formed, yaz-proxy may
277 be invoked without specifying a listening port, i.e.
279 yaz-proxy -c myconfig.xml
281 If this does not produce errors, the file is well-formed.
284 <section id="proxy-config-header">
285 <title>Proxy Configuration Header</title>
287 The proxy config file must have a root element called
288 <literal>proxy</literal>. All information except an optional XML
289 header must be stored within the <literal>proxy</literal> element.
292 <?xml version="1.0"?>
294 <!-- content here .. -->
298 <section id="proxy-config-target">
299 <title>Configuration: target</title>
301 The element <literal>target</literal>, which may be repeated zero
302 or more times with parent element <literal>proxy</literal>, contains
303 information about each backend target.
304 The <literal>target</literal> element has two attributes:
305 <literal>name</literal>, which holds the logical name of the backend
306 target (required), and <literal>default</literal> (optional), which
307 (when given) specifies that the backend target is the default target,
308 equivalent to command line option <literal>-t</literal>.
312 <?xml version="1.0"?>
314 <target name="server1" default="1">
315 <!-- description of server1 .. -->
317 <target name="server2">
318 <!-- description of server2 .. -->
324 <section id="proxy-config-url">
325 <title>Configuration: url</title>
327 The <literal>url</literal> element, which may be repeated one or more times,
328 should be a child of the <literal>target</literal> element.
329 The CDATA of <literal>url</literal> is the Z-URL of the backend.
332 Multiple <literal>url</literal> elements may be used. In that case, when
333 a client initiates a session, the proxy chooses the URL with the lowest
334 number of active sessions, thereby distributing the load. It is
335 assumed that each URL represents the same database (data).
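<para>
 As a minimal sketch of this load distribution, the fragment below
 lists two URLs for one target. The target name and both host names
 are hypothetical, the two servers are assumed to serve identical
 data, and the Z-URLs are assumed to take the same
 <literal>tcp:<parameter>hostname</parameter>:<parameter>port</parameter></literal>
 form as the <literal>-t</literal> option.
</para>
<screen><![CDATA[
<?xml version="1.0"?>
<proxy>
  <target name="server1" default="1">
    <!-- hypothetical hosts; both must expose the same database -->
    <url>tcp:z1.example.org:210</url>
    <url>tcp:z2.example.org:210</url>
  </target>
</proxy>
]]></screen>
<para>
 With a configuration along these lines, each new client session
 should be directed to whichever of the two URLs currently has
 fewer active sessions.
</para>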
338 <section id="proxy-config-keepalive">
339 <title>Configuration: keepalive</title>
340 <para>The <literal>keepalive</literal> element holds information about
341 the keepalive Z39.50 sessions. Keepalive sessions are proxy-to-backend
342 sessions that are no longer associated with a client session.
344 <para>The <literal>keepalive</literal> element, which is a child of
345 the <literal>target</literal> element, holds two elements:
346 <literal>bandwidth</literal> and <literal>pdu</literal>.
347 The <literal>bandwidth</literal> is the maximum total bytes
348 transferred to/from the target. If a target session exceeds this
349 limit, it is shut down (and no longer kept alive).
350 The <literal>pdu</literal> is the maximum number of requests sent
351 to the target. If a target session exceeds this limit, it is
352 shut down. The idea of these two limits is to avoid very long
353 sessions that use resources in a backend (that leaks!).
356 The following sets the maximum number of bytes transferred in a
357 target session to 1 MB and the maximum number of requests to 400.
360 <bandwidth>1048576</bandwidth>
361 <pdu>400</pdu>
366 <section id="proxy-config-limit">
367 <title>Configuration: limit</title>
369 The <literal>limit</literal> section specifies bandwidth and PDU request
370 limits for an active session.
371 The proxy records bandwidth and PDU requests during the last 60 seconds
372 (1 minute). The <literal>limit</literal> may include the
373 elements <literal>bandwidth</literal>, <literal>pdu</literal>,
374 and <literal>retrieve</literal>. The <literal>bandwidth</literal>
375 measures the number of bytes transferred within the last minute.
376 The <literal>pdu</literal> is the number of requests in the last
377 minute. The <literal>retrieve</literal> holds the maximum number of records to
378 be retrieved in one Present Request.
381 If a bandwidth/pdu limit is reached, the proxy will postpone the
382 requests to the target and wait one or more seconds. The idea of the
383 limit is to ensure that clients that download hundreds or thousands of
384 records do not hurt other users.
387 The following sets the maximum number of bytes transferred per minute to
388 512 Kbytes (524288 bytes) and the maximum number of requests to 40.
391 <bandwidth>524288</bandwidth>
392 <pdu>40</pdu>
398 Typically the limits for keepalive are much higher than
399 the per-minute limits for an active session.
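<para>
 As a sketch of how the three elements described for
 <literal>limit</literal> might be combined, the fragment below also
 caps the number of records per Present Request. All three values
 are illustrative assumptions, not recommended settings.
</para>
<screen><![CDATA[
<limit>
  <!-- at most 524288 bytes transferred within the last minute -->
  <bandwidth>524288</bandwidth>
  <!-- at most 40 requests within the last minute -->
  <pdu>40</pdu>
  <!-- at most 50 records retrieved in one Present Request -->
  <retrieve>50</retrieve>
</limit>
]]></screen>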
404 <section id="proxy-config-attribute">
405 <title>Configuration: attribute</title>
407 The <literal>attribute</literal> element specifies whether to accept or reject
408 a particular attribute type, value pair.
409 Well-behaving targets will reject unsupported attributes on their
410 own. This feature is useful for targets that do not gracefully
411 handle unsupported attributes.
414 Attribute elements may be repeated. The proxy inspects the attribute
415 specifications in the order given in the configuration file.
416 When a given attribute specification matches a given attribute list
417 in a query, the proxy takes appropriate action (reject, accept).
420 If no attribute specification matches the attribute list in a query,
424 The <literal>attribute</literal> element has two required attributes:
425 <literal>type</literal> which is the Attribute Type-1 type, and
426 <literal>value</literal> which is the Attribute Type-1 value.
427 The special value/type <literal>*</literal> matches any attribute
428 type/value. A value may also be specified as a list with each
429 value separated by a comma, or as a
430 range: low value, dash, high value.
433 If attribute <literal>error</literal> is given, that holds a
434 Bib-1 diagnostic which is sent to the client if the particular
435 type, value is part of a query.
438 If attribute <literal>error</literal> is not given, the attribute
439 type, value is accepted and passed to the backend target.
442 A target that supports use attributes 1, 4, and 1000 through 1003, and
443 no other use attributes, could use the following rules:
445 <attribute type="1" value="1,4,1000-1003"/>
446 <attribute type="1" value="*" error="114"/>
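<para>
 The same pattern should carry over to other attribute types. As a
 sketch, assume a target that supports only relation attribute 3
 (equal), and assume that Bib-1 diagnostic 117 (unsupported relation
 attribute) is the appropriate response for anything else. Because
 specifications are inspected in order, the accepting rule comes
 before the catch-all rule.
</para>
<screen><![CDATA[
<!-- accept relation (type 2) value 3 and pass it to the backend -->
<attribute type="2" value="3"/>
<!-- reject every other relation value with diagnostic 117 (assumed) -->
<attribute type="2" value="*" error="117"/>
]]></screen>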
451 <section id="proxy-config-syntax">
452 <title>Configuration: syntax</title>
454 The <literal>syntax</literal> element specifies whether to accept or reject
455 a particular record syntax request from the client.
458 The <literal>syntax</literal> element has one required attribute:
459 <literal>type</literal> which is the Preferred Record Syntax.
462 If attribute <literal>error</literal> is given, that holds a
463 Bib-1 diagnostic which is sent to the client if the particular
464 record syntax is part of a Present or Search Request.
467 If attribute <literal>error</literal> is not given, the record syntax
468 is accepted and passed to the backend target.
471 If attribute <literal>marcxml</literal> is given, the proxy will
472 perform MARC21 to MARCXML conversion. In this case the
473 <literal>type</literal> should be XML. The proxy will use
474 preferred record syntax USMARC/MARC21 against the backend target.
476 <para>To accept USMARC and offer MARCXML records, but reject
477 all other requests, the following configuration could be used:
480 <target name="mytarget">
481 <syntax type="usmarc"/>
482 <syntax type="xml" marcxml="1"/>
483 <syntax type="*" error="238"/>
490 <section id="proxy-config-target-timeout">
491 <title>Configuration: target-timeout</title>
493 The element <literal>target-timeout</literal> is the child of element
494 <literal>target</literal> and specifies the amount in seconds before
495 a target session is shut down.
498 This can also be specified on the command line by using option
499 <literal>-T</literal>. Refer to <xref linkend="proxy-usage"/>.
503 <section id="proxy-config-client-timeout">
504 <title>Configuration: client-timeout</title>
506 The element <literal>client-timeout</literal> is the child of element
507 <literal>target</literal> and specifies the amount in seconds before
508 a client session is shut down.
511 This can also be specified on the command line by using option
512 <literal>-i</literal>. Refer to <xref linkend="proxy-usage"/>.
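<para>
 As a sketch, both timeouts could be set for one target as shown
 below. The target name and the values of 30 and 60 seconds are
 arbitrary assumptions chosen for illustration.
</para>
<screen><![CDATA[
<target name="mytarget">
  <!-- shut down a target session after 30 seconds -->
  <target-timeout>30</target-timeout>
  <!-- shut down a client session after 60 seconds -->
  <client-timeout>60</client-timeout>
</target>
]]></screen>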
516 <section id="proxy-config-preinit">
517 <title>Configuration: preinit</title>
519 The element <literal>preinit</literal> is the child of element
520 <literal>target</literal> and specifies the number of spare
521 connections to a target. By default no spare connections are
522 created by the proxy. If the proxy uses a target exclusively or
523 a lot, the preinit sessions will ensure that target sessions
524 have been made before the client makes a connection and will therefore
525 reduce the cost of the connect-init handshake dramatically. Never set this to
530 <section id="proxy-config-max-clients">
531 <title>Configuration: max-clients</title>
533 The element <literal>max-clients</literal> is the child of element
534 <literal>proxy</literal> and specifies the total number of
535 allowed connections to targets (all targets). If this limit
536 is reached the proxy will close the least recently used connection.
539 Note that many Unix systems impose a limit on the number of
540 open files allowed in a single process, typically in the
541 range 256 (Solaris) to 1024 (Linux).
542 The proxy uses 2 sockets per session + a few files
543 for logging. As a rule of thumb, ensure that 2*max-clients + 5
544 can be opened by the proxy process.
548 Using the <ulink url="http://www.gnu.org/software/bash/bash.html">
549 bash</ulink> shell, you can set the limit with
550 <literal>ulimit -n</literal> <replaceable>no</replaceable>.
551 Use <literal>ulimit -a</literal> to display limits.
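<para>
 As a rough worked example of the rule of thumb above, a limit of 50
 connections calls for about 2*50 + 5 = 105 open files, which fits
 within a limit of 256. The value 50 is only an illustrative
 assumption.
</para>
<screen><![CDATA[
<?xml version="1.0"?>
<proxy>
  <!-- illustrative value: needs about 2*50 + 5 = 105 open files -->
  <max-clients>50</max-clients>
</proxy>
]]></screen>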
556 <section id="proxy-config-log">
557 <title>Configuration: log</title>
559 The element <literal>log</literal> is the child of element
560 <literal>proxy</literal> and specifies what is to be logged by the
564 Specify the log file with command-line option <literal>-l</literal>.
567 The text of the <literal>log</literal> element is a sequence of
568 options separated by white space. See the table below:
569 <table frame="top"><title>Logging options</title>
571 <colspec colwidth="1*" colname="option"/>
572 <colspec colwidth="2*" colname="description"/>
575 <entry>Option</entry>
576 <entry>Description</entry>
581 <entry><literal>client-apdu</literal></entry>
583 Log APDUs as reported by YAZ for the
584 communication between the client and the proxy.
585 This facility is equivalent to the APDU logging that
586 happens when using option <literal>-a</literal>, however
587 this tells the proxy to log in the same file as given
588 by <literal>-l</literal>.
592 <entry><literal>server-apdu</literal></entry>
594 Log APDUs as reported by YAZ for the
595 communication between the proxy and the server (backend).
599 <entry><literal>client-requests</literal></entry>
601 Log a brief description about requests transferred between
602 the client and the proxy. The name of the request and the size
603 of the APDU are logged.
607 <entry><literal>server-requests</literal></entry>
609 Log a brief description about requests transferred between
610 the proxy and the server (backend). The name of the request
611 and the size of the APDU are logged.
619 To log communication in detail between the proxy and the backend, the
620 following configuration could be used:
622 <target name="mytarget">
623 <log>server-apdu server-requests</log>
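<para>
 The text above describes <literal>log</literal> as a child of
 <literal>proxy</literal>. Taking that description at face value, a
 sketch that logs full APDUs on the client side and brief request
 summaries on the backend side might look as follows. The placement
 directly under <literal>proxy</literal> is an assumption drawn from
 that description, and the option names come from the table above.
</para>
<screen><![CDATA[
<?xml version="1.0"?>
<proxy>
  <!-- options are separated by white space -->
  <log>client-apdu server-requests</log>
</proxy>
]]></screen>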
631 <section id="proxy-usage">
632 <title>Proxy Usage</title>
635 <refentry id="yaz-proxy">
639 <section id="otherinfo-encoding"><title>OtherInformation Encoding</title>
641 The proxy uses the OtherInformation definition to carry
642 information about the target address and cookie.
645 OtherInformation ::= [201] IMPLICIT SEQUENCE OF SEQUENCE{
646 category [1] IMPLICIT InfoCategory OPTIONAL,
648 characterInfo [2] IMPLICIT InternationalString,
649 binaryInfo [3] IMPLICIT OCTET STRING,
650 externallyDefinedInfo [4] IMPLICIT EXTERNAL,
651 oid [5] IMPLICIT OBJECT IDENTIFIER}}
653 InfoCategory ::= SEQUENCE{
654 categoryTypeId [1] IMPLICIT OBJECT IDENTIFIER OPTIONAL,
655 categoryValue [2] IMPLICIT INTEGER}
658 The <literal>categoryTypeId</literal> is either
659 OID 1.2.840.10003.10.1000.81.1 or 1.2.840.10003.10.1000.81.2,
660 for proxy target and proxy cookie respectively. The
661 integer element <literal>categoryValue</literal> is set to 0.
662 The value of the proxy target or cookie is stored in element
663 <literal>characterInfo</literal> of the <literal>information</literal>
668 <!-- Keep this comment at the end of the file
673 sgml-minimize-attributes:nil
674 sgml-always-quote-attributes:t
677 sgml-parent-document: "yaz++.xml"
678 sgml-local-catalogs: nil
679 sgml-namecase-general:t