1 <chapter id="proxy-reference">
2 <title>Proxy Reference</title>
3 <section id="proxy-operation">
4 <title>Operating Environment</title>
6 The YAZ proxy is a console program. After startup it spawns
7 a child process (except on Windows or if option -X is given).
8 The child process is the core of the proxy and it handles all
9 communication with clients and servers. The parent process
10 will restart the child process if it dies unexpectedly and report
11 the reason. For the command-line options of the YAZ proxy,
12 see <xref linkend="proxy-usage"/>.
15 As an option, the proxy may change user identity to a less privileged
19 <section id="proxy-target">
20 <title>Choosing the Backend Server</title>
22 When the proxy receives a Z39.50 Initialize Request from a Z39.50
23 client, it determines the backend server by the following rules:
26 <para>If the <literal>InitializeRequest</literal> PDU from the
28 <link linkend="otherinfo-encoding"><literal>otherInfo</literal></link>
30 <literal>1.2.840.10003.10.1000.81.1</literal>, then the
31 contents of that element specify the server to be used, in the
32 usual YAZ address format (typically
33 <literal>tcp:<parameter>hostname</parameter>:<parameter>port</parameter></literal>)
35 <ulink url="http://www.indexdata.dk/yaz/doc/comstack.addresses.tkl"
36 >the Addresses section of the YAZ manual</ulink>.
41 <para>Otherwise, the Proxy uses the default server, if one was
42 specified in the proxy configuration file. See
43 <xref linkend="proxy-config-target"/>.
48 <para>Otherwise, the Proxy uses the default server, if one was
49 specified on the command-line with the <literal>-t</literal>
54 <para>Otherwise, the proxy closes the connection with
61 If the proxy receives an SRW/SRU request, the following rules are used.
64 <para>If the default target has Explain information with a
65 <literal>database</literal> that matches the path of the
66 SRW/SRU HTTP request, that backend server is used for
72 Otherwise the service will return HTTP 404 (Not found).
79 Checking for Explain information only in the default target is a known limitation.
80 It means that it is only possible to offer one SRW/SRU server.
81 We expect to improve that in the next version of the YAZ proxy.
85 <section id="proxy-keepalive">
86 <title>Keep-alive Facility</title>
88 Keep-alive is a facility where the proxy keeps the connection to the
89 backend server open, even if the client closes the connection to the proxy.
92 If the same or another client later connects to the proxy and requests the
93 same backend, it is reassigned to this backend. In this case, the
94 proxy sends an initialize response directly to the client and the
95 initialize handshake with the backend is omitted.
98 When a client reconnects, query and record caching work better if the
99 proxy assigns it to the same backend as before, and the result set
100 (if any) is re-used. To achieve this, Index Data defined a session
101 cookie which identifies the backend session.
104 The cookie is defined by the client and is sent as part of the
105 Initialize Request and passed in an
106 <link linkend="otherinfo-encoding"><literal>otherInfo</literal></link>
107 element with OID <literal>1.2.840.10003.10.1000.81.2</literal>.
110 Clients that do not send a cookie as part of the initialize request
111 may still see better performance, since the init handshake is saved.
114 Refer to <xref linkend="proxy-config-keepalive"/> for how to set up
115 the configuration parameters for keepalive.
120 <section id="query-cache">
121 <title>Query Caching</title>
123 Simple stateless clients often send identical Z39.50 searches
124 in a relatively short period of time (e.g. in order to produce a
125 results-list page, the next page,
126 a single full record, etc.). For many targets, it is
127 much more expensive to produce a new result set than to
128 reuse an existing one.
131 The proxy tries to solve that by remembering the last query for each
132 backend target, so that if an identical query is received next, it
133 is turned into Present Requests rather than new Search Requests.
137 A future release will probably allow for
138 an arbitrary-sized cache for targets supporting named result sets.
142 You can enable or disable query caching using option <literal>-o</literal>.
146 <section id="record-cache">
147 <title>Record Caching</title>
149 As an option, the proxy may also cache result set records for the
151 The proxy takes into account the Record Syntax and CompSpec.
152 The CompSpec includes simple element set names as well.
153 By default the cache is 200000 bytes per session.
157 <section id="query-validation">
158 <title>Query Validation</title>
160 The Proxy may also be configured to trap particular attributes in
161 Type-1 queries and send Bib-1 diagnostics back to the client without
162 even consulting the backend target. This facility may be useful if
163 a target does not properly issue diagnostics when unsupported attributes
168 <section id="record-validation">
169 <title>Record Syntax Validation</title>
171 The proxy may be configured to accept, reject or convert records.
172 When a record syntax is accepted, the proxy passes search/present requests to the
173 backend target under the assumption that the target can honor the
174 request (in fact, it may not). When a record is rejected because
175 the record syntax is "unsupported", the proxy returns a diagnostic to the
176 client. Finally, the proxy may convert records.
179 The proxy can convert from MARC to MARCXML and thereby offer an
180 XML version of any MARC record as long as it is ISO2709 encoded.
181 If the proxy is compiled with libXSLT support it can also
186 <section id="other-optimizations">
187 <title>Other Optimizations</title>
189 We've had some plans to support global caching of result set records,
190 but this has not yet been implemented.
194 <section id="proxy-config-file">
195 <title>Proxy Configuration File</title>
197 The Proxy may read a configuration file using option
198 <literal>-c</literal> followed by the filename of a config file.
201 The config file is XML based. The YAZ proxy must be compiled
202 with <ulink url="http://www.xmlsoft.org/">libxml2</ulink> and
203 <ulink url="http://xmlsoft.org/XSLT/">libXSLT</ulink> support in
204 order for the config file facility to be enabled.
207 See <xref linkend="yazproxy-schema"/> for an XML schema
208 for the configuration.
211 <para>To check that a config file is well-formed, yazproxy may
212 be invoked without specifying a listening port, for example:
214 yazproxy -c myconfig.xml
216 If this does not produce errors, the file is well-formed.
219 <section id="proxy-config-header">
220 <title>Proxy Configuration Header</title>
222 The proxy config file must have a root element called
223 <literal>proxy</literal>, scoped within the namespace
224 <literal>xmlns="http://indexdata.dk/yazproxy/schema/0.8/"</literal>.
225 All information except an optional XML header must be stored
226 within the <literal>proxy</literal> element.
229 <?xml version="1.0"?>
230 <proxy xmlns="http://indexdata.dk/yazproxy/schema/0.8/">
231 <!-- content here .. -->
235 <section id="proxy-config-target">
236 <title>target</title>
238 The element <literal>target</literal>, which may be repeated zero
239 or more times with parent element <literal>proxy</literal>, contains
240 information about each backend target.
241 The <literal>target</literal> element has two attributes:
242 <literal>name</literal>, which holds the logical name of the backend
243 target (required), and <literal>default</literal> (optional), which
244 (when given) specifies that the backend target is the default target,
245 equivalent to command line option <literal>-t</literal>.
249 <?xml version="1.0"?>
250 <proxy xmlns="http://indexdata.dk/yazproxy/schema/0.8/">
251 <target name="server1" default="1">
252 <!-- description of server1 .. -->
254 <target name="server2">
255 <!-- description of server2 .. -->
261 <section id="proxy-config-url">
264 The <literal>url</literal> element, which may be repeated one or more times,
265 should be the child of the <literal>target</literal> element.
266 The CDATA of <literal>url</literal> is the Z-URL of the backend.
269 Multiple <literal>url</literal> elements may be used. In that case, when
270 a client initiates a session, the proxy chooses the URL with the lowest
271 number of active sessions, thereby distributing the load. It is
272 assumed that each URL represents the same database (data).
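<para>
As a minimal sketch of the load distribution described above, a target
with two backends could be configured as follows. The target name and
both Z-URLs are hypothetical placeholders, and both backends are assumed
to serve the same database.
</para>
<screen><![CDATA[
<target name="server1">
  <!-- hypothetical Z-URLs; each is assumed to serve the same data -->
  <url>backend1.example.org:210</url>
  <url>backend2.example.org:210</url>
</target>
]]></screen>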
276 <section id="proxy-config-target-timeout">
277 <title>target-timeout</title>
279 The element <literal>target-timeout</literal> is the child of element
280 <literal>target</literal> and specifies the time in seconds before
281 a target session is shut down.
284 This can also be specified on the command line by using option
285 <literal>-T</literal>. Refer to OPTIONS in <xref linkend="proxy-usage"/>.
289 <section id="proxy-config-client-timeout">
290 <title>client-timeout</title>
292 The element <literal>client-timeout</literal> is the child of element
293 <literal>target</literal> and specifies the time in seconds before
294 a client session is shut down.
297 This can also be specified on the command line by using option
298 <literal>-i</literal>. Refer to OPTIONS in <xref linkend="proxy-usage"/>.
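<para>
For illustration, both timeouts could be set on one target as sketched
below. The target name, the URL and the values (in seconds) are
assumptions chosen for the example, not recommended settings.
</para>
<screen><![CDATA[
<target name="server1">
  <url>backend1.example.org:210</url>
  <!-- hypothetical values, in seconds -->
  <target-timeout>300</target-timeout>
  <client-timeout>180</client-timeout>
</target>
]]></screen>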
302 <section id="proxy-config-keepalive">
303 <title>keepalive</title>
304 <para>The <literal>keepalive</literal> element holds information about
305 keepalive Z39.50 sessions. Keepalive sessions are proxy-to-backend
306 sessions that are no longer associated with a client session.
308 <para>The <literal>keepalive</literal> element, which is the child of
309 the <literal>target</literal> element, holds two elements:
310 <literal>bandwidth</literal> and <literal>pdu</literal>.
311 The <literal>bandwidth</literal> is the maximum total bytes
312 transferred to/from the target. If a target session exceeds this
313 limit, it is shut down (and no longer kept alive).
314 The <literal>pdu</literal> is the maximum number of requests sent
315 to the target. If a target session exceeds this limit, it is
316 shut down. The idea of these two limits is to avoid very long
317 sessions that use resources in a backend (one that leaks!).
320 The following sets the maximum number of bytes transferred in a
321 target session to 1 MB and the maximum number of requests to 400.
324 <bandwidth>1048576</bandwidth>
330 <section id="proxy-config-limit">
333 The <literal>limit</literal> section specifies bandwidth and pdu (request)
334 limits for an active session.
335 The proxy records bandwidth and pdu requests during the last 60 seconds
336 (1 minute). The <literal>limit</literal> may include the
337 elements <literal>bandwidth</literal>, <literal>pdu</literal>,
338 and <literal>retrieve</literal>. The <literal>bandwidth</literal>
339 measures the number of bytes transferred within the last minute.
340 The <literal>pdu</literal> is the number of requests in the last
341 minute. The <literal>retrieve</literal> holds the maximum number of records to
342 be retrieved in one Present Request.
345 If a bandwidth/pdu limit is reached, the proxy will postpone the
346 requests to the target and wait one or more seconds. The idea of the
347 limit is to ensure that clients that download hundreds or thousands of
348 records do not hurt other users.
351 The following sets the maximum number of bytes transferred per minute to
352 512 KB and the maximum number of records retrieved in one Present Request to 40.
355 <bandwidth>524288</bandwidth>
356 <retrieve>40</retrieve>
362 Typically the limits for keepalive are much higher than
363 those for the per-minute session average.
368 <section id="proxy-config-attribute">
369 <title>attribute</title>
371 The <literal>attribute</literal> element specifies whether to accept or reject
372 a particular attribute type/value pair.
373 Well-behaved targets will reject unsupported attributes on their
374 own. This feature is useful for targets that do not gracefully
375 handle unsupported attributes.
378 Attribute elements may be repeated. The proxy inspects the attribute
379 specifications in the order specified in the configuration file.
380 When a given attribute specification matches a given attribute list
381 in a query, the proxy takes appropriate action (reject, accept).
384 If no attribute specification matches the attribute list in a query,
388 The <literal>attribute</literal> element has two required attributes:
389 <literal>type</literal> which is the Attribute Type-1 type, and
390 <literal>value</literal> which is the Attribute Type-1 value.
391 The special value/type <literal>*</literal> matches any attribute
392 type/value. A value may also be specified as a list with each
393 value separated by a comma, or as a
394 range: low value, dash, high value.
397 If attribute <literal>error</literal> is given, that holds a
398 Bib-1 diagnostic which is sent to the client if the particular
399 type, value is part of a query.
402 If attribute <literal>error</literal> is not given, the attribute
403 type, value is accepted and passed to the backend target.
406 A target that supports use attributes 1, 4, and 1000 through 1003, and
407 no other use attributes, could use the following rules:
409 <attribute type="1" value="1,4,1000-1003"/>
410 <attribute type="1" value="*" error="114"/>
414 <section id="proxy-config-syntax">
415 <title>syntax</title>
417 The <literal>syntax</literal> element specifies whether to accept or reject
418 a particular record syntax request from the client.
421 The <literal>syntax</literal> element has one required attribute:
422 <literal>type</literal> which is the Preferred Record Syntax.
425 If attribute <literal>error</literal> is given, that holds a
426 Bib-1 diagnostic which is sent to the client if the particular
427 record syntax is part of a Present or Search Request.
430 If attribute <literal>error</literal> is not given, the record syntax
431 is accepted and passed to the backend target.
434 If attribute <literal>marcxml</literal> is given, the proxy will
435 perform MARC21 to MARCXML conversion. In this case the
436 <literal>type</literal> should be XML. The proxy will use
437 preferred record syntax USMARC/MARC21 or <literal>backendtype</literal>
438 (if given) against the backend target.
441 If attribute <literal>backendtype</literal> is given, that holds the
442 record syntax to be transmitted to backend.
445 If attribute <literal>stylesheet</literal> is given, the proxy
446 will convert the XML record from the server via XSLT. It is important
447 that the content from the server is XML. If used in conjunction with
448 attribute <literal>marcxml</literal>, the MARC to MARCXML conversion
449 takes place before the XSLT conversion.
452 If attribute <literal>identifier</literal> is given, that is the
453 SRW/SRU record schema identifier for the resulting output record (after
454 MARCXML and/or XSLT conversion).
457 If sub element <literal>title</literal> is given (as child element
458 of <literal>syntax</literal>), then that is the official SRW/SRU
459 name of the resulting record schema.
462 If sub element <literal>name</literal> is given, that is an alias
463 for the record schema identifier. Multiple <literal>name</literal>s
467 <title>MARCXML conversion</title>
468 <para>To accept USMARC and offer MARCXML plus Dublin Core (via
469 XSLT conversion), the following configuration could be used:
472 <target name="mytarget">
474 <syntax type="usmarc"/>
475 <syntax type="xml" marcxml="1"
476 identifier="info:srw/schema/1/marcxml-v1.1">
477 <title>MARCXML</title>
478 <name>marcxml</name>
480 <syntax type="xml" marcxml="1" stylesheet="MARC21slim2SRWDC.xsl"
481 identifier="info:srw/schema/1/dc-v1.1">
482 <title>Dublin Core</title>
485 <syntax type="*" error="238"/>
495 <section id="proxy-config-explain">
496 <title>explain</title>
498 The <literal>explain</literal> element includes Explain information
499 for SRW/SRU about the server in the target section. This
500 information must have a <literal>serverInfo</literal> element
501 with a database that names the URL path under which this target is available.
504 <explain xmlns="http://explain.z3950.org/dtd/2.0/">
506 <host>myhost.org</host>
508 <database>mydatabase</database>
510 <!-- remaining Explain stuff -->
514 In the above case, the SRW/SRU service is available as
515 <literal>http://myhost.org:8000/mydatabase</literal>.
520 <section id="proxy-config-cql2rpn">
521 <title>cql2rpn</title>
523 The content of the <literal>cql2rpn</literal> element specifies
524 the path from the working directory to a CQL-to-RPN conversion
525 file for the server in the target section. This element
526 is required for SRW/SRU searches to operate against Z39.50
527 servers that don't support CQL. Most Z39.50 servers only support
528 Type-1/RPN so this is usually required.
531 See YAZ documentation for more information about the
532 <ulink url="http://indexdata.dk/yaz/doc/tools.tkl#tools.cql.pqf">CQL
533 to PQF</ulink> conversion. See also the
534 <filename>pqf.properties</filename> in the <filename>etc</filename>
535 (or <replaceable>prefix/share/yazproxy</replaceable>)
536 directory of the YAZ proxy distribution.
540 <section id="proxy-config-preinit">
541 <title>preinit</title>
543 The element <literal>preinit</literal> is the child of element
544 <literal>target</literal> and specifies the number of spare
545 connections to a target. By default no spare connections are
546 created by the proxy. If the proxy uses a target exclusively or
547 heavily, the preinit sessions will ensure that target sessions
548 have been made before the client makes a connection and will therefore
549 reduce the connect-init handshake time dramatically. Never set this to
554 <section id="proxy-config-max-clients">
555 <title>max-clients</title>
557 The element <literal>max-clients</literal> is the child of element
558 <literal>proxy</literal> and specifies the total number of
559 allowed connections to targets (all targets). If this limit
560 is reached the proxy will close the least recently used connection.
563 Note that many Unix systems impose a limit on the number of
564 open files allowed in a single process, typically in the
565 range 256 (Solaris) to 1024 (Linux).
566 The proxy uses 2 sockets per session + a few files
567 for logging. As a rule of thumb, ensure that 2*max-clients + 5
568 can be opened by the proxy process.
572 Using the <ulink url="http://www.gnu.org/software/bash/bash.html">
573 bash</ulink> shell, you can set the limit with
574 <literal>ulimit -n</literal><replaceable>no</replaceable>.
575 Use <literal>ulimit -a</literal> to display limits.
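<para>
A sketch of how the limit might be placed in the configuration; the
value 100 is an arbitrary example. By the rule of thumb above, the proxy
process would then need to be able to open about 2*100 + 5 = 205 files.
</para>
<screen><![CDATA[
<?xml version="1.0"?>
<proxy xmlns="http://indexdata.dk/yazproxy/schema/0.8/">
  <!-- target elements here .. -->
  <!-- hypothetical value; check it against ulimit -n -->
  <max-clients>100</max-clients>
</proxy>
]]></screen>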
580 <section id="proxy-config-log">
583 The element <literal>log</literal> is the child of element
584 <literal>proxy</literal> and specifies what is to be logged by the
588 Specify the log file with command-line option <literal>-l</literal>.
591 The text of the <literal>log</literal> element is a sequence of
592 options separated by white space. See the table below:
593 <table frame="top"><title>Logging options</title>
595 <colspec colwidth="1*"/>
596 <colspec colwidth="2*"/><thead>
598 <entry>Option</entry>
599 <entry>Description</entry>
604 <entry><literal>client-apdu</literal></entry>
606 Log APDUs as reported by YAZ for the
607 communication between the client and the proxy.
608 This facility is equivalent to the APDU logging that
609 happens when using option <literal>-a</literal>, however
610 this tells the proxy to log in the same file as given
611 by <literal>-l</literal>.
615 <entry><literal>server-apdu</literal></entry>
617 Log APDUs as reported by YAZ for the
618 communication between the proxy and the server (backend).
622 <entry><literal>client-requests</literal></entry>
624 Log a brief description about requests transferred between
625 the client and the proxy. The name of the request and the size
626 of the APDU is logged.
630 <entry><literal>server-requests</literal></entry>
632 Log a brief description about requests transferred between
633 the proxy and the server (backend). The name of the request
634 and the size of the APDU is logged.
642 To log communication in detail between the proxy and the backend, the
643 following configuration could be used:
645 <target name="mytarget">
646 <log>server-apdu server-requests</log>
653 <section id="proxy-usage">
654 <title>Proxy Manual Pages</title>
655 <refentry id="yazproxy-man">
660 <section id="otherinfo-encoding">
661 <title>OtherInformation Encoding</title>
663 The proxy uses the OtherInformation definition to carry
664 information about the target address and cookie.
667 OtherInformation ::= [201] IMPLICIT SEQUENCE OF SEQUENCE{
668 category [1] IMPLICIT InfoCategory OPTIONAL,
670 characterInfo [2] IMPLICIT InternationalString,
671 binaryInfo [3] IMPLICIT OCTET STRING,
672 externallyDefinedInfo [4] IMPLICIT EXTERNAL,
673 oid [5] IMPLICIT OBJECT IDENTIFIER}}
675 InfoCategory ::= SEQUENCE{
676 categoryTypeId [1] IMPLICIT OBJECT IDENTIFIER OPTIONAL,
677 categoryValue [2] IMPLICIT INTEGER}
680 The <literal>categoryTypeId</literal> is either
681 OID 1.2.840.10003.10.1000.81.1 or 1.2.840.10003.10.1000.81.2,
682 for proxy target and proxy cookie respectively. The
683 integer element <literal>categoryValue</literal> is set to 0.
684 The target address or cookie value is stored in element
685 <literal>characterInfo</literal> of the <literal>information</literal>
689 <section id="yazproxy-schema">
690 <title>YAZ Proxy Configuration Schema</title>
692 Here is an XML Schema for the YAZ proxy configuration file.
693 The schema, <filename>yazproxy.xsd</filename>, is located in the
694 <filename>etc</filename> subdirectory of the distribution.
697 <?xml version="1.0"?>
698 <!-- XML Schema for YAZ proxy config file.
699 $Id: reference.xml,v 1.13 2005-09-07 09:28:46 adam Exp $
702 xmlns:xs="http://www.w3.org/2001/XMLSchema"
703 xmlns:exp="http://explain.z3950.org/dtd/2.0/"
704 xmlns="http://indexdata.dk/yazproxy/schema/0.9/"
705 targetNamespace="http://indexdata.dk/yazproxy/schema/0.9/"
707 <xs:import namespace="http://explain.z3950.org/dtd/2.0/"
708 schemaLocation="zeerex-2.0.xsd"/>
709 <xs:element name="proxy">
712 <xs:element ref="target" minOccurs="0" maxOccurs="unbounded"/>
713 <xs:element ref="max-clients" minOccurs="0"/>
714 <xs:element ref="log" minOccurs="0"/>
715 <xs:element ref="module" minOccurs="0"/>
720 <xs:element name="target">
723 <xs:element ref="url" minOccurs="0" maxOccurs="unbounded"/>
724 <xs:element ref="target-timeout" minOccurs="0"/>
725 <xs:element ref="client-timeout" minOccurs="0"/>
726 <xs:element ref="keepalive" minOccurs="0"/>
727 <xs:element ref="limit" minOccurs="0"/>
728 <xs:element ref="attribute" minOccurs="0" maxOccurs="unbounded"/>
729 <xs:element ref="syntax" minOccurs="0" maxOccurs="unbounded"/>
730 <xs:element ref="preinit" minOccurs="0"/>
731 <xs:element ref="exp:explain" minOccurs="0"/>
732 <xs:element ref="cql2rpn" minOccurs="0"/>
733 <xs:element ref="target-authentication" minOccurs="0"/>
734 <xs:element ref="client-authentication" minOccurs="0"/>
735 <xs:element ref="negotiation-charset" minOccurs="0"/>
736 <xs:element ref="negotiation-lang" minOccurs="0"/>
738 <xs:attribute name="default" type="xs:string" use="optional"/>
739 <xs:attribute name="name" type="xs:string"/>
740 <xs:attribute name="database" type="xs:string"/>
744 <xs:element name="url" type="xs:string"/>
745 <xs:element name="target-timeout" type="xs:integer"/>
746 <xs:element name="client-timeout" type="xs:integer"/>
747 <xs:element name="bandwidth" type="xs:integer"/>
748 <xs:element name="pdu" type="xs:integer"/>
749 <xs:element name="retrieve" type="xs:integer"/>
750 <xs:element name="preinit" type="xs:integer"/>
751 <xs:element name="cql2rpn" type="xs:string"/>
752 <xs:element name="target-authentication">
755 <xs:extension base="xs:string">
756 <xs:attribute name="type" type="xs:string"/>
762 <xs:element name="client-authentication">
765 <xs:extension base="xs:string">
766 <xs:attribute name="module" type="xs:string"/>
767 <xs:attribute name="args" type="xs:string"/>
773 <xs:element name="negotiation-charset" type="xs:string"/>
774 <xs:element name="negotiation-lang" type="xs:string"/>
776 <xs:element name="keepalive">
779 <xs:element ref="bandwidth" minOccurs="0"/>
780 <xs:element ref="pdu" minOccurs="0"/>
784 <xs:element name="limit">
787 <xs:element ref="bandwidth" minOccurs="0"/>
788 <xs:element ref="pdu" minOccurs="0"/>
789 <xs:element ref="retrieve" minOccurs="0"/>
794 <xs:element name="attribute">
796 <xs:attribute name="type" type="xs:string"/>
797 <xs:attribute name="value" type="xs:string"/>
798 <xs:attribute name="error" type="xs:integer"/>
802 <xs:element name="syntax">
805 <xs:element ref="title" minOccurs="0"/>
806 <xs:element ref="name" minOccurs="0" maxOccurs="unbounded"/>
808 <xs:attribute name="error" type="xs:string" />
809 <xs:attribute name="type" type="xs:string" />
810 <xs:attribute name="marcxml" type="xs:string" />
811 <xs:attribute name="identifier" type="xs:string" />
812 <xs:attribute name="stylesheet" type="xs:string" />
813 <xs:attribute name="backendtype" type="xs:string" />
814 <xs:attribute name="backendcharset" type="xs:string" />
815 <xs:attribute name="usemarconstage1" type="xs:string" />
816 <xs:attribute name="usemarconstage2" type="xs:string" />
820 <xs:element name="title" type="xs:string"/>
821 <xs:element name="name" type="xs:string"/>
823 <xs:element name="max-clients" type="xs:integer"/>
824 <xs:element name="log" type="xs:string"/>
825 <xs:element name="module" type="xs:string"/>
833 <!-- Keep this comment at the end of the file
838 sgml-minimize-attributes:nil
839 sgml-always-quote-attributes:t
842 sgml-parent-document: "yazproxy.xml"
843 sgml-local-catalogs: nil
844 sgml-namecase-general:t