1 <chapter id="tools"><title>Supporting Tools</title>
In support of the service API (primarily the ASN module, which
provides the programmatic interface to the Z39.50 APDUs), &yaz; contains
a collection of tools that support the development of applications.
9 <sect1 id="tools.query"><title>Query Syntax Parsers</title>
12 Since the type-1 (RPN) query structure has no direct, useful string
13 representation, every origin application needs to provide some form of
14 mapping from a local query notation or representation to a
15 <token>Z_RPNQuery</token> structure. Some programmers will prefer to
16 construct the query manually, perhaps using
17 <function>odr_malloc()</function> to simplify memory management.
18 The &yaz; distribution includes three separate, query-generating tools
19 that may be of use to you.
22 <sect2 id="PQF"><title>Prefix Query Format</title>
25 Since RPN or reverse polish notation is really just a fancy way of
26 describing a suffix notation format (operator follows operands), it
27 would seem that the confusion is total when we now introduce a prefix
28 notation for RPN. The reason is one of simple laziness - it's somewhat
29 simpler to interpret a prefix format, and this utility was designed
30 for maximum simplicity, to provide a baseline representation for use
31 in simple test applications and scripting environments (like Tcl). The
32 demonstration client included with YAZ uses the PQF.
PQF has been adopted by other parties developing Z39.50
software. It is often referred to as Prefix Query Notation.
PQF is defined by the pquery module in the YAZ library.
There are two sets of functions with similar behavior. The first
set operates on a PQF parser handle; the second does not. The first
set is more flexible than the second. The second set
is obsolete and is provided only for backwards compatibility.
The functions in the first set all operate on a PQF parser handle:
53 #include <yaz/pquery.h>
55 YAZ_PQF_Parser yaz_pqf_create (void);
57 void yaz_pqf_destroy (YAZ_PQF_Parser p);
59 Z_RPNQuery *yaz_pqf_parse (YAZ_PQF_Parser p, ODR o, const char *qbuf);
61 Z_AttributesPlusTerm *yaz_pqf_scan (YAZ_PQF_Parser p, ODR o,
62 Odr_oid **attributeSetId, const char *qbuf);
65 int yaz_pqf_error (YAZ_PQF_Parser p, const char **msg, size_t *off);
A PQF parser is created and destroyed by the functions
<function>yaz_pqf_create</function> and
<function>yaz_pqf_destroy</function>, respectively.
The function <function>yaz_pqf_parse</function> parses the query given
by the string <literal>qbuf</literal>. If parsing succeeds,
a Z39.50 RPN query is returned, allocated on the ODR stream
<literal>o</literal>. If parsing fails, a NULL pointer is returned.
The function <function>yaz_pqf_scan</function> takes a scan query in
<literal>qbuf</literal>. If parsing succeeds, the function
returns a pointer to the attributes plus term and sets
<literal>*attributeSetId</literal> to the attribute set for the
scan request; both are allocated on the ODR stream <literal>o</literal>.
If parsing fails, <function>yaz_pqf_scan</function> returns a NULL pointer.
Error information for bad queries can be obtained by calling
<function>yaz_pqf_error</function>, which returns an error code,
sets <literal>*msg</literal> to point to an error description,
and sets <literal>*off</literal> to the offset within the last
query where parsing failed.
The second set of functions is declared as follows:
92 #include <yaz/pquery.h>
94 Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);
96 Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
97 Odr_oid **attributeSetP, const char *qbuf);
99 int p_query_attset (const char *arg);
The function <function>p_query_rpn()</function> takes as arguments an
&odr; stream (see section <link linkend="odr">The ODR Module</link>)
to provide a memory source (the structure created is released on
the next call to <function>odr_reset()</function> on the stream), a
protocol identifier (one of the constants <token>PROTO_Z3950</token> and
<token>PROTO_SR</token>), and
a null-terminated string holding the query.
111 If the parse went well, <function>p_query_rpn()</function> returns a
112 pointer to a <literal>Z_RPNQuery</literal> structure which can be
113 placed directly into a <literal>Z_SearchRequest</literal>.
114 If parsing failed, due to syntax error, a NULL pointer is returned.
117 The <literal>p_query_attset</literal> specifies which attribute set
118 to use if the query doesn't specify one by the
119 <literal>@attrset</literal> operator.
120 The <literal>p_query_attset</literal> returns 0 if the argument is a
121 valid attribute set specifier; otherwise the function returns -1.
125 The grammar of the PQF is as follows:
129 query ::= top-set query-struct.
131 top-set ::= [ '@attrset' string ]
133 query-struct ::= attr-spec | simple | complex | '@term' term-type query
135 attr-spec ::= '@attr' [ string ] string query-struct
137 complex ::= operator query-struct query-struct.
139 operator ::= '@and' | '@or' | '@not' | '@prox' proximity.
141 simple ::= result-set | term.
143 result-set ::= '@set' string.
147 proximity ::= exclusion distance ordered relation which-code unit-code.
149 exclusion ::= '1' | '0' | 'void'.
151 distance ::= integer.
153 ordered ::= '1' | '0'.
155 relation ::= integer.
157 which-code ::= 'known' | 'private' | integer.
159 unit-code ::= integer.
161 term-type ::= 'general' | 'numeric' | 'string' | 'oid' | 'datetime' | 'null'.
165 You will note that the syntax above is a fairly faithful
166 representation of RPN, except for the Attribute, which has been
167 moved a step away from the term, allowing you to associate one or more
168 attributes with an entire query structure. The parser will
169 automatically apply the given attributes to each term as required.
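To make the two ideas above concrete (operator-first parsing, and
attributes that are pushed down to every term of a sub-query), here is a
minimal, self-contained sketch in C. It is not the &yaz; parser: it
understands only <literal>@and</literal>, <literal>@or</literal>,
<literal>@not</literal> and <literal>@attr</literal> over unquoted words,
and the names <literal>pqf_demo_to_infix</literal> and
<literal>demo_ctx</literal>, as well as the bracketed output notation, are
hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical demo state: input cursor, bounded output, error flag. */
struct demo_ctx {
    const char *in;
    char *out;
    size_t max, len;
    int error;
};

/* Append s to the output; flag an error if it does not fit. */
static void demo_emit(struct demo_ctx *c, const char *s)
{
    size_t n = strlen(s);
    if (c->len + n + 1 > c->max) {
        c->error = 1;
        return;
    }
    memcpy(c->out + c->len, s, n + 1);
    c->len += n;
}

/* Copy the next blank-separated token into tok; return 0 at end of input. */
static int demo_token(struct demo_ctx *c, char *tok, size_t max)
{
    size_t n = 0;
    while (*c->in == ' ')
        c->in++;
    if (*c->in == '\0')
        return 0;
    for (; *c->in != '\0' && *c->in != ' '; c->in++)
        if (n + 1 < max)
            tok[n++] = *c->in;
    tok[n] = '\0';
    return 1;
}

/* Parse one query structure: the operator comes first, then its operands.
   attrs lists the attributes inherited from enclosing @attr operators. */
static void demo_struct(struct demo_ctx *c, const char *attrs)
{
    char tok[64], spec[64], inner[256];

    if (!demo_token(c, tok, sizeof tok)) {
        c->error = 1;                      /* an operand is missing */
    } else if (!strcmp(tok, "@attr")) {
        if (!demo_token(c, spec, sizeof spec)) {
            c->error = 1;
            return;
        }
        /* Extend the inherited list and apply it to the sub-query. */
        snprintf(inner, sizeof inner, "%s%s%s", attrs, *attrs ? "," : "", spec);
        demo_struct(c, inner);
    } else if (!strcmp(tok, "@and") || !strcmp(tok, "@or") ||
               !strcmp(tok, "@not")) {
        demo_emit(c, "(");
        demo_struct(c, attrs);             /* left operand */
        demo_emit(c, " ");
        demo_emit(c, tok + 1);             /* operator name without '@' */
        demo_emit(c, " ");
        demo_struct(c, attrs);             /* right operand */
        demo_emit(c, ")");
    } else {
        demo_emit(c, tok);                 /* a term: attach the attributes */
        if (*attrs) {
            demo_emit(c, "[");
            demo_emit(c, attrs);
            demo_emit(c, "]");
        }
    }
}

/* Convert the PQF subset to infix text; return 0 on success, -1 on error. */
int pqf_demo_to_infix(const char *pqf, char *out, size_t max)
{
    struct demo_ctx c;
    char rest[2];

    assert(pqf != NULL && out != NULL && max > 0);
    c.in = pqf;
    c.out = out;
    c.max = max;
    c.len = 0;
    c.error = 0;
    out[0] = '\0';
    demo_struct(&c, "");
    if (demo_token(&c, rest, sizeof rest))
        c.error = 1;                       /* input left after the query */
    return c.error ? -1 : 0;
}
```

With this sketch, <literal>@and @or dylan zimmerman when</literal> becomes
<literal>((dylan or zimmerman) and when)</literal>, and an
<literal>@attr</literal> placed in front of an operator shows up on both of
its terms. Because every operator has a fixed number of operands, no
parentheses or precedence rules are needed in the input, which is the
simplicity the prefix form was chosen for.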
173 The @attr operator is followed by an attribute specification
174 (<literal>attr-spec</literal> above). The specification consists
175 of an optional attribute set, an attribute type-value pair and
176 a sub-query. The attribute type-value pair is packed in one string:
177 an attribute type, an equals sign, and an attribute value, like this:
178 <literal>@attr 1=1003</literal>.
179 The type is always an integer but the value may be either an
180 integer or a string (if it doesn't start with a digit character).
181 A string attribute-value is encoded as a Type-1 ``complex''
182 attribute with the list of values containing the single string
183 specified, and including no semantic indicators.
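The packing rule for the type-value pair can be sketched in a few lines of
C. This is an illustration of the rule as stated above, not code taken from
&yaz;; the structure <literal>demo_attr</literal> and the function
<literal>demo_parse_attr</literal> are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for one parsed "type=value" pair. */
struct demo_attr {
    int type;               /* attribute type, always an integer */
    int is_numeric;         /* 1: use num_value, 0: use str_value */
    int num_value;
    const char *str_value;  /* points into the caller's string */
};

/* Split "type=value"; return 0 on success, -1 if the text is malformed. */
int demo_parse_attr(const char *spec, struct demo_attr *a)
{
    char *end;
    long type;

    assert(spec != NULL && a != NULL);
    if (!isdigit((unsigned char) spec[0]))
        return -1;                      /* the type must be an integer */
    type = strtol(spec, &end, 10);
    if (*end != '=' || end[1] == '\0')
        return -1;                      /* missing '=' or empty value */
    a->type = (int) type;
    a->str_value = end + 1;
    /* Rule from the text: a value that starts with a digit is an integer. */
    a->is_numeric = isdigit((unsigned char) end[1]) != 0;
    a->num_value = a->is_numeric ? atoi(end + 1) : 0;
    return 0;
}
```

For <literal>1=1003</literal> the sketch yields type 1 with the integer
value 1003; for <literal>1=/book/title</literal> it yields type 1 with the
string value <literal>/book/title</literal>.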
Version 3 of the Z39.50 specification defines various encodings of terms.
188 Use <literal>@term </literal> <replaceable>type</replaceable>
189 <replaceable>string</replaceable>,
190 where type is one of: <literal>general</literal>,
191 <literal>numeric</literal> or <literal>string</literal>
192 (for InternationalString).
193 If no term type has been given, the <literal>general</literal> form
194 is used. This is the only encoding allowed in both versions 2 and 3
195 of the Z39.50 standard.
198 <sect3 id="PQF-prox">
199 <title>Using Proximity Operators with PQF</title>
202 This is an advanced topic, describing how to construct
203 queries that make very specific requirements on the
204 relative location of their operands.
205 You may wish to skip this section and go straight to
206 <link linkend="pqf-examples">the example PQF queries</link>.
211 Most Z39.50 servers do not support proximity searching, or
212 support only a small subset of the full functionality that
213 can be expressed using the PQF proximity operator. Be
214 aware that the ability to <emphasis>express</emphasis> a
215 query in PQF is no guarantee that any given server will
216 be able to <emphasis>execute</emphasis> it.
222 The proximity operator <literal>@prox</literal> is a special
223 and more restrictive version of the conjunction operator
224 <literal>@and</literal>. Its semantics are described in
section 3.7.2 (Proximity) of the Z39.50 standard itself, which
can be read on-line at
<ulink url="&url.z39.50.proximity;"/>.
In PQF, the proximity operation is represented by a sequence of the form
233 @prox <replaceable>exclusion</replaceable> <replaceable>distance</replaceable> <replaceable>ordered</replaceable> <replaceable>relation</replaceable> <replaceable>which-code</replaceable> <replaceable>unit-code</replaceable>
in which the meanings of the parameters are as described in
the standard, and they can take the following values:
238 <listitem><formalpara><title>exclusion</title><para>
0 = false (i.e. the proximity condition specified by the
remaining parameters must be satisfied) or
1 = true (the proximity condition specified by the
remaining parameters must <emphasis>not</emphasis> be
satisfied).
244 </para></formalpara></listitem>
245 <listitem><formalpara><title>distance</title><para>
246 An integer specifying the difference between the locations
247 of the operands: e.g. two adjacent words would have
248 distance=1 since their locations differ by one unit.
249 </para></formalpara></listitem>
250 <listitem><formalpara><title>ordered</title><para>
251 1 = ordered (the operands must occur in the order the
252 query specifies them) or
253 0 = unordered (they may appear in either order).
254 </para></formalpara></listitem>
255 <listitem><formalpara><title>relation</title><para>
Recognised values are
1 (lessThan), 2 (lessThanOrEqual), 3 (equal),
4 (greaterThanOrEqual), 5 (greaterThan) and 6 (notEqual).
263 </para></formalpara></listitem>
264 <listitem><formalpara><title>which-code</title><para>
<literal>known</literal> or <literal>k</literal>
(the unit-code parameter is taken from the well-known list
of alternatives described below) or
<literal>private</literal> or <literal>p</literal>
(the unit-code parameter has semantics specific to an
out-of-band agreement such as a profile).
275 </para></formalpara></listitem>
276 <listitem><formalpara><title>unit-code</title><para>
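If the which-code parameter is <literal>known</literal>
then the recognised values are
1 (character), 2 (word), 3 (sentence), 4 (paragraph), 5 (section),
6 (chapter), 7 (document), 8 (element), 9 (subelement),
10 (elementType) and 11 (byte).
If which-code is <literal>private</literal> then the
acceptable values are determined by the profile.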
292 </para></formalpara></listitem>
294 (The numeric values of the relation and well-known unit-code
295 parameters are taken straight from
296 <ulink url="&url.z39.50.proximity.asn1;"
297 >the ASN.1</ulink> of the proximity structure in the standard.)
301 <sect3 id="pqf-examples"><title>PQF queries</title>
303 <example id="example.pqf.simple.terms">
304 <title>PQF queries using simple terms</title>
313 <example id="pqf.example.pqf.boolean.operators">
314 <title>PQF boolean operators</title>
317 @or "dylan" "zimmerman"
319 @and @or dylan zimmerman when
321 @and when @or dylan zimmerman
325 <example id="example.pqf.result.sets">
326 <title>PQF references to result sets</title>
331 @and @set seta @set setb
335 <example id="example.pqf.attributes">
336 <title>Attributes for terms</title>
341 @attr 1=4 @attr 4=1 "self portrait"
343 @attrset exp1 @attr 1=1 CategoryList
345 @attr gils 1=2008 Copenhagen
347 @attr 1=/book/title computer
351 <example id="example.pqf.proximity">
352 <title>PQF Proximity queries</title>
355 @prox 0 3 1 2 k 2 dylan zimmerman
Here the parameters 0, 3, 1, 2, k and 2 represent exclusion,
distance, ordered, relation, which-code and unit-code, in that
order:
363 exclusion = 0: the proximity condition must hold
366 distance = 3: the terms must be three units apart
369 ordered = 1: they must occur in the order they are specified
372 relation = 2: lessThanOrEqual (to the distance of 3 units)
375 which-code is ``known'', so the standard unit-codes are used
381 So the whole proximity query means that the words
382 <literal>dylan</literal> and <literal>zimmerman</literal> must
383 both occur in the record, in that order, differing in position
384 by three or fewer words (i.e. with two or fewer words between
385 them.) The query would find ``Bob Dylan, aka. Robert
386 Zimmerman'', but not ``Bob Dylan, born as Robert Zimmerman''
387 since the distance in this case is four.
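The reasoning in this example can be checked mechanically. The following
self-contained C sketch evaluates the proximity condition for two operand
positions counted in units (words here). It illustrates the semantics
described above and is not how any particular server implements proximity;
the function name <literal>demo_prox_match</literal> is hypothetical, and
the relation codes 1 to 6 are assumed to run in the order lessThan,
lessThanOrEqual, equal, greaterThanOrEqual, greaterThan, notEqual.

```c
#include <assert.h>
#include <stdlib.h>

/* Relation codes 1..6, in the order assumed in the lead-in. */
enum demo_relation {
    DEMO_LT = 1, DEMO_LE, DEMO_EQ, DEMO_GE, DEMO_GT, DEMO_NE
};

/* Return 1 if operands at unit positions pos_a and pos_b (pos_a is the
   first operand of the query) satisfy the proximity condition. */
int demo_prox_match(int exclusion, int distance, int ordered,
                    int relation, long pos_a, long pos_b)
{
    long diff = pos_b - pos_a;
    int ok;

    assert(distance >= 0);
    if (relation < DEMO_LT || relation > DEMO_NE)
        return 0;                         /* unknown relation code */
    if (ordered && diff < 0) {
        ok = 0;                           /* operands in the wrong order */
    } else {
        diff = labs(diff);
        switch (relation) {
        case DEMO_LT: ok = diff <  distance; break;
        case DEMO_LE: ok = diff <= distance; break;
        case DEMO_EQ: ok = diff == distance; break;
        case DEMO_GE: ok = diff >= distance; break;
        case DEMO_GT: ok = diff >  distance; break;
        default:      ok = diff != distance; break;
        }
    }
    /* exclusion = 1 asks for the condition NOT to hold. */
    return exclusion ? !ok : ok;
}
```

Counting words from 1, <literal>dylan</literal> and
<literal>zimmerman</literal> sit at positions 2 and 5 in the first phrase
(difference 3, so the condition holds) and at positions 2 and 6 in the
second (difference 4, so it does not).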
391 <example id="example.pqf.search.term.type">
392 <title>PQF specification of search term type</title>
395 @term string "a UTF-8 string, maybe?"
399 <example id="example.pqf.mixed.queries">
400 <title>PQF mixed queries</title>
403 @or @and bob dylan @set Result-1
405 @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
407 @and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109
411 The last of these examples is a spatial search: in
412 <ulink url="http://www.gils.net/prof_v2.html#sec_7_4"
413 >the GILS attribute set</ulink>,
415 2038 indicates West Bounding Coordinate and
2039 indicates East Bounding Coordinate,
417 so the query is for areas extending from -114 degrees
418 to no more than -109 degrees.
425 <sect2 id="CCL"><title>CCL</title>
428 Not all users enjoy typing in prefix query structures and numerical
429 attribute values, even in a minimalistic test client. In the library
world, the more intuitive Common Command Language (CCL, ISO 8777)
has enjoyed some popularity, especially before the widespread
availability of graphical interfaces. It is still useful in
applications that need to provide a
symbolic language for expressing boolean query structures.
437 <sect3 id="ccl.syntax">
438 <title>CCL Syntax</title>
441 The CCL parser obeys the following grammar for the FIND argument.
The syntax is annotated by the lines prefixed by
<literal>--</literal>.
447 CCL-Find ::= CCL-Find Op Elements
450 Op ::= "and" | "or" | "not"
451 -- The above means that Elements are separated by boolean operators.
453 Elements ::= '(' CCL-Find ')'
456 | Qualifiers Relation Terms
457 | Qualifiers Relation '(' CCL-Find ')'
458 | Qualifiers '=' string '-' string
459 -- Elements is either a recursive definition, a result set reference, a
460 -- list of terms, qualifiers followed by terms, qualifiers followed
461 -- by a recursive definition or qualifiers in a range (lower - upper).
463 Set ::= 'set' = string
464 -- Reference to a result set
466 Terms ::= Terms Prox Term
468 -- Proximity of terms.
472 -- This basically means that a term may include a blank
474 Qualifiers ::= Qualifiers ',' string
476 -- Qualifiers is a list of strings separated by comma
478 Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<'
479 -- Relational operators. This really doesn't follow the ISO8777
483 -- Proximity operator
487 <example id="example.ccl.queries">
488 <title>CCL queries</title>
490 The following queries are all valid:
502 (dylan and bob) or set=1
506 Assuming that the qualifiers <literal>ti</literal>,
507 <literal>au</literal>
508 and <literal>date</literal> are defined we may use:
514 au=(bob dylan and slow train coming)
516 date>1980 and (ti=((self portrait)))
522 <sect3 id="ccl.qualifiers">
523 <title>CCL Qualifiers</title>
526 Qualifiers are used to direct the search to a particular searchable
527 index, such as title (ti) and author indexes (au). The CCL standard
528 itself doesn't specify a particular set of qualifiers, but it does
529 suggest a few short-hand notations. You can customize the CCL parser
530 to support a particular set of qualifiers to reflect the current target
531 profile. Traditionally, a qualifier would map to a particular
use-attribute within the BIB-1 attribute set. It is also
possible to set other attributes, such as the structure
attribute.
538 A CCL profile is a set of predefined CCL qualifiers that may be
539 read from a file or set in the CCL API.
540 The YAZ client reads its CCL qualifiers from a file named
541 <filename>default.bib</filename>. There are four types of
542 lines in a CCL profile: qualifier specification,
543 qualifier alias, comments and directives.
545 <sect4 id="ccl.qualifier.specification">
546 <title>Qualifier specification</title>
548 A qualifier specification is of the form:
552 <replaceable>qualifier-name</replaceable>
553 [<replaceable>attributeset</replaceable><literal>,</literal>]<replaceable>type</replaceable><literal>=</literal><replaceable>val</replaceable>
554 [<replaceable>attributeset</replaceable><literal>,</literal>]<replaceable>type</replaceable><literal>=</literal><replaceable>val</replaceable> ...
where <replaceable>qualifier-name</replaceable> is the name of the
qualifier to be used (e.g. <literal>ti</literal>),
<replaceable>type</replaceable> is the attribute type in the attribute
set (Bib-1 is used if no attribute set is given) and
<replaceable>val</replaceable> is the attribute value.
The <replaceable>type</replaceable> can be specified either as an
integer or as a single letter:
<literal>u</literal> for use,
<literal>r</literal> for relation, <literal>p</literal> for position,
<literal>s</literal> for structure, <literal>t</literal> for truncation
or <literal>c</literal> for completeness.
The attributes for the special qualifier name <literal>term</literal>
are used when no CCL qualifier is given in a query.
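As an illustration of this line format, the following self-contained C
sketch splits one qualifier specification into its name and attribute
pairs, mapping the single-letter types to the numeric Bib-1 attribute types
1 to 6. It is not the profile reader used by &yaz;; the names
<literal>demo_qual</literal>, <literal>demo_qual_attr</literal> and
<literal>demo_parse_qualifier</literal>, the fixed buffer sizes, and the
string <literal>bib1</literal> used to mark the default set are assumptions
made for the example. Values are kept as strings because non-numeric values
are also possible.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_ATTRS 8

struct demo_qual_attr {
    char set[16];                /* attribute set name */
    int type;                    /* numeric attribute type */
    char value[16];              /* value, kept as text */
};

struct demo_qual {
    char name[16];
    int count;
    struct demo_qual_attr attrs[DEMO_MAX_ATTRS];
};

/* Map "u", "r", "p", "s", "t", "c" to 1..6, or parse an integer type. */
static int demo_attr_type(const char *s)
{
    static const char letters[] = "urpstc";
    const char *p;

    if (isdigit((unsigned char) s[0])) {
        char *end;
        long v = strtol(s, &end, 10);
        return *end == '\0' ? (int) v : -1;
    }
    if (s[0] != '\0' && s[1] == '\0' && (p = strchr(letters, s[0])) != NULL)
        return (int) (p - letters) + 1;
    return -1;
}

/* Parse "name [set,]type=val ..."; return 0 on success, -1 on error. */
int demo_parse_qualifier(const char *line, struct demo_qual *q)
{
    char buf[128], *tok, *eq, *comma;

    assert(line != NULL && q != NULL);
    if (strlen(line) >= sizeof buf)
        return -1;
    strcpy(buf, line);
    tok = strtok(buf, " \t");
    if (tok == NULL || strlen(tok) >= sizeof q->name)
        return -1;
    strcpy(q->name, tok);
    q->count = 0;
    while ((tok = strtok(NULL, " \t")) != NULL) {
        struct demo_qual_attr *a;

        if (q->count == DEMO_MAX_ATTRS)
            return -1;
        a = &q->attrs[q->count];
        eq = strchr(tok, '=');
        if (eq == NULL || eq[1] == '\0' || strlen(eq + 1) >= sizeof a->value)
            return -1;
        *eq = '\0';                      /* tok now holds "[set,]type" */
        strcpy(a->value, eq + 1);
        comma = strchr(tok, ',');
        if (comma != NULL) {             /* optional attribute set prefix */
            *comma = '\0';
            if (strlen(tok) >= sizeof a->set)
                return -1;
            strcpy(a->set, tok);
            tok = comma + 1;
        } else {
            strcpy(a->set, "bib1");      /* default set named in the text */
        }
        a->type = demo_attr_type(tok);
        if (a->type < 0)
            return -1;
        q->count++;
    }
    return q->count > 0 ? 0 : -1;
}
```

Given the line <literal>ti u=4 s=1</literal>, the sketch reports the
qualifier <literal>ti</literal> with two attributes: type 1 (use) with
value 4 and type 4 (structure) with value 1.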
571 <table id="ccl.common.bib1.attributes">
572 <title>Common Bib-1 attributes</title>
574 <colspec colwidth="2*" colname="type"></colspec>
575 <colspec colwidth="9*" colname="description"></colspec>
579 <entry>Description</entry>
584 <entry><literal>u=</literal><replaceable>value</replaceable></entry>
Use attribute (1). Common use attributes are
1 Personal-name, 4 Title, 7 ISBN, 8 ISSN, 30 Date,
62 Subject, 1003 Author, 1016 Any. Specify value
as an integer.
594 <entry><literal>r=</literal><replaceable>value</replaceable></entry>
596 Relation attribute (2). Common values are
597 1 <, 2 <=, 3 =, 4 >=, 5 >, 6 <>,
598 100 phonetic, 101 stem, 102 relevance, 103 always matches.
603 <entry><literal>p=</literal><replaceable>value</replaceable></entry>
605 Position attribute (3). Values: 1 first in field, 2
606 first in any subfield, 3 any position in field.
611 <entry><literal>s=</literal><replaceable>value</replaceable></entry>
613 Structure attribute (4). Values: 1 phrase, 2 word,
614 3 key, 4 year, 5 date, 6 word list, 100 date (un),
615 101 name (norm), 102 name (un), 103 structure, 104 urx,
616 105 free-form-text, 106 document-text, 107 local-number,
617 108 string, 109 numeric string.
622 <entry><literal>t=</literal><replaceable>value</replaceable></entry>
624 Truncation attribute (5). Values: 1 right, 2 left,
3 left and right, 100 none, 101 process #, 102 regular-1,
626 103 regular-2, 104 CCL.
631 <entry><literal>c=</literal><replaceable>value</replaceable></entry>
633 Completeness attribute (6). Values: 1 incomplete subfield,
634 2 complete subfield, 3 complete field.
643 Refer to <xref linkend="bib1"/> or the complete
644 <ulink url="&url.z39.50.attset.bib1;">list of Bib-1 attributes</ulink>
647 It is also possible to specify non-numeric attribute values,
648 which are used in combination with certain types.
649 The special combinations are:
651 <table id="ccl.special.attribute.combos">
652 <title>Special attribute combos</title>
654 <colspec colwidth="2*" colname="name"></colspec>
655 <colspec colwidth="9*" colname="description"></colspec>
659 <entry>Description</entry>
664 <entry><literal>s=pw</literal></entry><entry>
665 The structure is set to either word or phrase depending
666 on the number of tokens in a term (phrase-word).
670 <entry><literal>s=al</literal></entry><entry>
671 Each token in the term is ANDed. (and-list).
672 This does not set the structure at all.
676 <row><entry><literal>s=ol</literal></entry><entry>
677 Each token in the term is ORed. (or-list).
678 This does not set the structure at all.
682 <row><entry><literal>r=o</literal></entry><entry>
Allows ranges and the operators greater-than, less-than, ...
This sets the Bib-1 relation attribute accordingly (relation
ordered). A query construct is treated as a range only if
a dash is used and it is surrounded by white-space. So
<literal>-1980</literal> is treated as the term
<literal>"-1980"</literal>, not <literal><= 1980</literal>.
If <literal>- 1980</literal> is used, however, that is
treated as a range.
695 <row><entry><literal>r=r</literal></entry><entry>
696 Similar to <literal>r=o</literal> but assumes that terms
697 are non-negative (not prefixed with <literal>-</literal>).
698 Thus, a dash will always be treated as a range.
699 The construct <literal>1980-1990</literal> is
700 treated as a range with <literal>r=r</literal> but as a
701 single term <literal>"1980-1990"</literal> with
702 <literal>r=o</literal>. The special attribute
703 <literal>r=r</literal> is available in YAZ 2.0.24 or later.
707 <row><entry><literal>t=l</literal></entry><entry>
708 Allows term to be left-truncated.
709 If term is of the form <literal>?x</literal>, the resulting
710 Type-1 term is <literal>x</literal> and truncation is left.
714 <row><entry><literal>t=r</literal></entry><entry>
715 Allows term to be right-truncated.
716 If term is of the form <literal>x?</literal>, the resulting
717 Type-1 term is <literal>x</literal> and truncation is right.
721 <row><entry><literal>t=n</literal></entry><entry>
If the term does not include <literal>?</literal>, the
truncation attribute is set to none (100).
727 <row><entry><literal>t=b</literal></entry><entry>
Allows the term to be both left and right truncated.
If the term is of the form <literal>?x?</literal>, the
resulting term is <literal>x</literal> and truncation is
set to both left and right.
735 <row><entry><literal>t=x</literal></entry><entry>
Allows masking anywhere in a term, thus fully supporting
# (mask one character) and ? (zero or more of any).
If masking is used, truncation is set to 102 (regexp-1 in term)
and the term is converted accordingly to a regular expression.
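The truncation combinations <literal>t=l</literal>,
<literal>t=r</literal> and <literal>t=b</literal> described above can be
summarised in a short, self-contained C sketch. It follows the table rather
than the &yaz; source: the function <literal>demo_truncation</literal> and
its flag names are hypothetical, and returning 0 when no marker is found is
a convention of the sketch (a profile with <literal>t=n</literal> would
emit 100 in that case).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flags: which kinds of truncation the qualifier allows.
   t=l is DEMO_ALLOW_LEFT, t=r is DEMO_ALLOW_RIGHT, t=b is both. */
#define DEMO_ALLOW_LEFT  1
#define DEMO_ALLOW_RIGHT 2

/* Copy term to out without the permitted '?' markers. Returns the Bib-1
   truncation value (1 right, 2 left, 3 left and right), 0 if no marker
   was recognised, or -1 if out is too small. */
int demo_truncation(const char *term, int allowed, char *out, size_t max)
{
    size_t len, start = 0;
    int left = 0, right = 0;

    assert(term != NULL && out != NULL);
    len = strlen(term);
    if ((allowed & DEMO_ALLOW_LEFT) && len > 1 && term[0] == '?') {
        left = 1;                        /* "?x": drop the leading marker */
        start = 1;
    }
    if ((allowed & DEMO_ALLOW_RIGHT) && len - start > 1 &&
        term[len - 1] == '?') {
        right = 1;                       /* "x?": drop the trailing marker */
        len--;
    }
    if (len - start + 1 > max)
        return -1;
    memcpy(out, term + start, len - start);
    out[len - start] = '\0';
    if (left && right)
        return 3;
    if (left)
        return 2;
    return right ? 1 : 0;
}
```

A marker that the qualifier does not allow is left in the term untouched,
so with only <literal>t=l</literal> in effect the term
<literal>x?</literal> stays <literal>x?</literal>.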
747 <example id="example.ccl.profile"><title>CCL profile</title>
749 Consider the following definition:
760 <literal>ti</literal> and <literal>au</literal> both set
761 structure attribute to phrase (s=1).
762 <literal>ti</literal>
763 sets the use-attribute to 4. <literal>au</literal> sets the
765 When no qualifiers are used in the query the structure-attribute is
766 set to free-form-text (105) (rule for <literal>term</literal>).
767 The <literal>date</literal> sets the relation attribute to
768 the relation used in the CCL query and sets the use attribute
772 You can combine attributes. To Search for "ranked title" you
775 ti,ranked=knuth computer
777 which will set relation=ranked, use=title, structure=phrase.
784 is a valid query. But
792 <sect4 id="ccl.qualifier.alias">
793 <title>Qualifier alias</title>
795 A qualifier alias is of the form:
798 <replaceable>q</replaceable>
799 <replaceable>q1</replaceable> <replaceable>q2</replaceable> ..
802 which declares <replaceable>q</replaceable> to
803 be an alias for <replaceable>q1</replaceable>,
804 <replaceable>q2</replaceable>... such that the CCL
805 query <replaceable>q=x</replaceable> is equivalent to
806 <replaceable>q1=x or q2=x or ...</replaceable>.
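The alias rule amounts to a simple textual expansion, sketched below in
self-contained C. The function <literal>demo_expand_alias</literal> is a
hypothetical helper written for this illustration; the real parser works on
its internal qualifier tables, not on strings. The sketch assumes a
single-word term.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expand "alias=term" into "q1=term or q2=term ...". targets is the
   blank-separated list the alias stands for, e.g. "ti au".
   Returns 0 on success, -1 if out is too small or targets is empty. */
int demo_expand_alias(const char *targets, const char *term,
                      char *out, size_t max)
{
    size_t len = 0;
    int first = 1;

    assert(targets != NULL && term != NULL && out != NULL && max > 0);
    out[0] = '\0';
    while (*targets != '\0') {
        size_t n = strcspn(targets, " ");   /* length of next qualifier */
        if (n > 0) {
            int w = snprintf(out + len, max - len, "%s%.*s=%s",
                             first ? "" : " or ", (int) n, targets, term);
            if (w < 0 || (size_t) w >= max - len)
                return -1;                  /* output would be truncated */
            len += (size_t) w;
            first = 0;
        }
        targets += n;
        while (*targets == ' ')
            targets++;
    }
    return first ? -1 : 0;
}
```

For an alias declared as standing for <literal>ti au</literal>, the query
term <literal>x</literal> expands to <literal>ti=x or au=x</literal>.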
810 <sect4 id="ccl.comments">
811 <title>Comments</title>
Lines consisting only of white space and lines that begin with
the character <literal>#</literal> are treated as comments.
818 <sect4 id="ccl.directives">
819 <title>Directives</title>
Directive specifications take the form
823 <para><literal>@</literal><replaceable>directive</replaceable> <replaceable>value</replaceable>
825 <table id="ccl.directives.table">
826 <title>CCL directives</title>
828 <colspec colwidth="2*" colname="name"></colspec>
829 <colspec colwidth="8*" colname="description"></colspec>
830 <colspec colwidth="1*" colname="default"></colspec>
834 <entry>Description</entry>
835 <entry>Default</entry>
840 <entry>truncation</entry>
841 <entry>Truncation character</entry>
842 <entry><literal>?</literal></entry>
846 <entry>Specifies how multiple fields are to be
847 combined. There are two modes: <literal>or</literal>:
848 multiple qualifier fields are ORed,
849 <literal>merge</literal>: attributes for the qualifier
850 fields are merged and assigned to one term.
852 <entry><literal>merge</literal></entry>
<entry>Specifies whether CCL operators and qualifiers should be
compared with case sensitivity or not. Specify 0 for
case sensitive; 1 for case insensitive.</entry>
859 <entry><literal>0</literal></entry>
864 <entry>Specifies token for CCL operator AND.</entry>
865 <entry><literal>and</literal></entry>
870 <entry>Specifies token for CCL operator OR.</entry>
871 <entry><literal>or</literal></entry>
876 <entry>Specifies token for CCL operator NOT.</entry>
877 <entry><literal>not</literal></entry>
882 <entry>Specifies token for CCL operator SET.</entry>
883 <entry><literal>set</literal></entry>
891 <title>CCL API</title>
All public definitions can be found in the header file
<filename>ccl.h</filename>. A profile is identified by a handle of type
<literal>CCL_bibset</literal>, which must be created by calling
the function <function>ccl_qual_mk</function>.
901 To read a file containing qualifier definitions the function
902 <function>ccl_qual_file</function> may be convenient. This function
903 takes an already opened <literal>FILE</literal> handle pointer as
904 argument along with a <literal>CCL_bibset</literal> handle.
908 To parse a simple string with a FIND query use the function
911 struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str,
912 int *error, int *pos);
which takes the CCL profile (<literal>bibset</literal>) and query
(<literal>str</literal>) as input. Upon successful completion the RPN
tree is returned. If an error occurs, such as a syntax error, the integer
pointed to by <literal>error</literal> holds the error code and
the integer pointed to by <literal>pos</literal> holds the offset inside
the query string at which the parsing failed.
924 An English representation of the error may be obtained by calling
925 the <literal>ccl_err_msg</literal> function. The error codes are
926 listed in <filename>ccl.h</filename>.
930 To convert the CCL RPN tree (type
931 <literal>struct ccl_rpn_node *</literal>)
932 to the Z_RPNQuery of YAZ the function <function>ccl_rpn_query</function>
933 must be used. This function which is part of YAZ is implemented in
934 <filename>yaz-ccl.c</filename>.
935 After calling this function the CCL RPN tree is probably no longer
936 needed. The <literal>ccl_rpn_delete</literal> destroys the CCL RPN tree.
940 A CCL profile may be destroyed by calling the
941 <function>ccl_qual_rm</function> function.
945 The token names for the CCL operators may be changed by setting the
946 globals (all type <literal>char *</literal>)
947 <literal>ccl_token_and</literal>, <literal>ccl_token_or</literal>,
948 <literal>ccl_token_not</literal> and <literal>ccl_token_set</literal>.
949 An operator may have aliases, i.e. there may be more than one name for
950 the operator. To do this, separate each alias with a space character.
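How a blank-separated alias list might be matched against a word from the
query is sketched below in self-contained C. This is an illustration of the
convention only; <literal>demo_is_operator</literal> is a hypothetical
function, the alias <literal>og</literal> is an invented example, and the
<literal>ignore_case</literal> flag merely stands in for whatever case
handling the profile selects.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Compare n characters, optionally ignoring ASCII case. */
static int demo_same(const char *a, const char *b, size_t n, int ignore_case)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int ca = (unsigned char) a[i], cb = (unsigned char) b[i];
        if (ignore_case) {
            ca = tolower(ca);
            cb = tolower(cb);
        }
        if (ca != cb)
            return 0;
    }
    return 1;
}

/* Return 1 if word equals one of the blank-separated aliases in names,
   for example names = "and og" to accept both "and" and "og". */
int demo_is_operator(const char *names, const char *word, int ignore_case)
{
    size_t wlen;

    assert(names != NULL && word != NULL);
    wlen = strlen(word);
    while (*names != '\0') {
        size_t n = strcspn(names, " ");     /* length of the next alias */
        if (n > 0 && n == wlen && demo_same(names, word, n, ignore_case))
            return 1;
        names += n;
        while (*names == ' ')
            names++;
    }
    return 0;
}
```

With the alias list <literal>and og</literal>, both
<literal>and</literal> and <literal>og</literal> are accepted as the AND
operator, while <literal>AND</literal> is accepted only when case is
ignored.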
954 <sect2 id="cql"><title>CQL</title>
956 <ulink url="&url.cql;">CQL</ulink>
957 - Common Query Language - was defined for the
958 <ulink url="&url.sru;">SRU</ulink> protocol.
959 In many ways CQL has a similar syntax to CCL.
960 The objective of CQL is different. Where CCL aims to be
961 an end-user language, CQL is <emphasis>the</emphasis> protocol
962 query language for SRU.
966 If you are new to CQL, read the
967 <ulink url="&url.cql.intro;">Gentle Introduction</ulink>.
971 The CQL parser in &yaz; provides the following:
975 It parses and validates a CQL query.
980 It generates a C structure that allows you to convert
981 a CQL query to some other query language, such as SQL.
The parser converts a valid CQL query to PQF, thus providing a
way to use CQL for both SRU servers and Z39.50 targets at the
same time.
993 The parser converts CQL to
994 <ulink url="&url.xcql;">XCQL</ulink>.
995 XCQL is an XML representation of CQL.
996 XCQL is part of the SRU specification. However, since SRU
997 supports CQL only, we don't expect XCQL to be widely used.
998 Furthermore, CQL has the advantage over XCQL that it is
1004 <sect3 id="cql.parsing"><title>CQL parsing</title>
1006 A CQL parser is represented by the <literal>CQL_parser</literal>
1007 handle. Its contents should be considered &yaz; internal (private).
1009 #include <yaz/cql.h>
1011 typedef struct cql_parser *CQL_parser;
1013 CQL_parser cql_parser_create(void);
1014 void cql_parser_destroy(CQL_parser cp);
1016 A parser is created by <function>cql_parser_create</function> and
1017 is destroyed by <function>cql_parser_destroy</function>.
To parse a CQL query string, the following function
is provided:
1023 int cql_parser_string(CQL_parser cp, const char *str);
1025 A CQL query is parsed by the <function>cql_parser_string</function>
1026 which takes a query <parameter>str</parameter>.
1027 If the query was valid (no syntax errors), then zero is returned;
1028 otherwise -1 is returned to indicate a syntax error.
int cql_parser_stream(CQL_parser cp,
                      int (*getbyte)(void *client_data),
                      void (*ungetbyte)(int b, void *client_data),
                      void *client_data);
1037 int cql_parser_stdio(CQL_parser cp, FILE *f);
The functions <function>cql_parser_stream</function> and
<function>cql_parser_stdio</function> parse a CQL query
just like <function>cql_parser_string</function>.
The only difference is how the CQL query is
fed to the parser.
<function>cql_parser_stream</function> reads from a generic
byte stream. <function>cql_parser_stdio</function>
reads from a <literal>FILE</literal> handle that is open for reading.
1050 <sect3 id="cql.tree"><title>CQL tree</title>
If the query string is valid, the CQL parser
generates a tree representing the structure of the
CQL query.
1058 struct cql_node *cql_parser_result(CQL_parser cp);
<function>cql_parser_result</function> returns
a pointer to the root node of the resulting tree.
1064 Each node in a CQL tree is represented by a
1065 <literal>struct cql_node</literal>.
1066 It is defined as follows:
1068 #define CQL_NODE_ST 1
1069 #define CQL_NODE_BOOL 2
1079 struct cql_node *modifiers;
1083 struct cql_node *left;
1084 struct cql_node *right;
1085 struct cql_node *modifiers;
1090 There are two node types: search term (ST) and boolean (BOOL).
1091 A modifier is treated as a search term too.
The search term node has six members:
1098 <literal>index</literal>: index for search term.
1099 If an index is unspecified for a search term,
1100 <literal>index</literal> will be NULL.
<literal>index_uri</literal>: index URI for search term
1106 or NULL if none could be resolved for the index.
1111 <literal>term</literal>: the search term itself.
1116 <literal>relation</literal>: relation for search term.
1121 <literal>relation_uri</literal>: relation URI for search term.
<literal>modifiers</literal>: relation modifiers for the search
term. The <literal>modifiers</literal> list is itself a list of cql_nodes,
each of type <literal>ST</literal>.
The boolean node represents <literal>and</literal>,
<literal>or</literal> and <literal>not</literal>, as well as
proximity.
<literal>left</literal> and <literal>right</literal>: left
and right operand, respectively.
1147 <literal>modifiers</literal>: proximity arguments.
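Walking such a tree is a matter of switching on the node type and
recursing into the operands. The self-contained C sketch below shows the
pattern on a deliberately trimmed-down node: <literal>demo_node</literal>
is a hypothetical stand-in that mirrors only part of
<literal>struct cql_node</literal> (no URIs, no modifiers), and
<literal>demo_render</literal> is not a &yaz; function.

```c
#include <assert.h>
#include <string.h>

#define DEMO_NODE_ST   1
#define DEMO_NODE_BOOL 2

/* Hypothetical, trimmed-down node: URIs and modifiers are left out. */
struct demo_node {
    int which;
    union {
        struct {
            const char *index;      /* NULL if no index was given */
            const char *relation;
            const char *term;
        } st;
        struct {
            const char *value;      /* operator name, e.g. "and" */
            struct demo_node *left;
            struct demo_node *right;
        } boolean;
    } u;
};

/* Append s to the string in out; return -1 if it does not fit. */
static int demo_append(char *out, size_t max, const char *s)
{
    size_t len = strlen(out), n = strlen(s);

    if (len + n + 1 > max)
        return -1;
    memcpy(out + len, s, n + 1);
    return 0;
}

/* Append a parenthesised rendering of the tree to out, which must already
   hold a (possibly empty) string. Returns 0 on success, -1 on error. */
int demo_render(const struct demo_node *n, char *out, size_t max)
{
    assert(n != NULL && out != NULL);
    if (n->which == DEMO_NODE_ST) {
        if (n->u.st.index != NULL) {
            const char *rel = n->u.st.relation ? n->u.st.relation : "=";
            if (demo_append(out, max, n->u.st.index) ||
                demo_append(out, max, " ") ||
                demo_append(out, max, rel) ||
                demo_append(out, max, " "))
                return -1;
        }
        return demo_append(out, max, n->u.st.term);
    }
    if (n->which != DEMO_NODE_BOOL)
        return -1;                  /* unknown node type */
    if (demo_append(out, max, "(") ||
        demo_render(n->u.boolean.left, out, max) ||
        demo_append(out, max, " ") ||
        demo_append(out, max, n->u.boolean.value) ||
        demo_append(out, max, " ") ||
        demo_render(n->u.boolean.right, out, max) ||
        demo_append(out, max, ")"))
        return -1;
    return 0;
}
```

The same recursive shape applies when converting a tree to another query
language: only the text emitted for each node type changes.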
1154 <sect3 id="cql.to.pqf"><title>CQL to PQF conversion</title>
Conversion to PQF (and Z39.50 RPN) is complicated by the fact
that the resulting RPN depends on the Z39.50 target
capabilities (combinations of supported attributes).
In addition, CQL and SRU operate on index prefixes
(URIs or strings), whereas RPN uses Object Identifiers
for attribute sets.
1164 The CQL library of &yaz; defines a <literal>cql_transform_t</literal>
1165 type. It represents a particular mapping between CQL and RPN.
1166 This handle is created and destroyed by the functions:
1168 cql_transform_t cql_transform_open_FILE (FILE *f);
1169 cql_transform_t cql_transform_open_fname(const char *fname);
1170 void cql_transform_close(cql_transform_t ct);
1172 The first two functions create a transformation handle from
1173 either an already open FILE or from a filename respectively.
1176 The handle is destroyed by <function>cql_transform_close</function>,
1177 after which the handle must not be referenced again.
1180 When a <literal>cql_transform_t</literal> handle has been created
1181 you can convert to RPN.
1183 int cql_transform_buf(cql_transform_t ct,
1184 struct cql_node *cn, char *out, int max);
1186 This function converts the CQL tree <literal>cn</literal>
1187 using handle <literal>ct</literal>.
1188 For the resulting PQF, you supply a buffer <literal>out</literal>
1189 which must be able to hold at least <literal>max</literal>
1193 If conversion failed, <function>cql_transform_buf</function>
1194 returns a non-zero SRU error code; otherwise zero is returned
1195 (conversion successful). The meanings of the numeric error
1196 codes are listed in the SRU specifications at
1197 <ulink url="&url.sru.diagnostics.list;"/>
1200 If conversion fails, more information can be obtained by calling
1202 int cql_transform_error(cql_transform_t ct, char **addinfop);
1204 This function returns the most recently returned numeric
1205 error-code and sets the string-pointer at
1206 <literal>*addinfop</literal> to point to a string containing
1207 additional information about the error that occurred: for
1208 example, if the error code is 15 (``Illegal or unsupported context
1209 set''), the additional information is the name of the requested
1210 context set that was not recognised.
1213 The SRU error-codes may be translated into brief human-readable
1214 error messages using
1216 const char *cql_strerror(int code);
1220 If you wish to be able to produce a PQF result in a different
1221 way, there are two alternatives.
1223 void cql_transform_pr(cql_transform_t ct,
1224 struct cql_node *cn,
1225 void (*pr)(const char *buf, void *client_data),
1228 int cql_transform_FILE(cql_transform_t ct,
1229 struct cql_node *cn, FILE *f);
1231 The former function produces output to a user-defined
1232 output stream. The latter writes the result to an already
1233 open <literal>FILE</literal>.
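As an illustration of the user-defined output stream, here is a minimal sketch of a callback with the pr signature shown above that collects the generated text in a fixed-size buffer. The type text_sink and the function text_sink_pr are hypothetical names, not part of YAZ, and the 256-byte capacity is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded text sink; not part of YAZ. */
struct text_sink {
    char data[256];   /* collected output, always NUL-terminated */
    size_t len;       /* bytes stored, excluding the terminator */
    int truncated;    /* set to 1 if some output did not fit */
};

/* Has the callback type: void (*pr)(const char *buf, void *client_data) */
void text_sink_pr(const char *buf, void *client_data)
{
    struct text_sink *s = (struct text_sink *) client_data;
    size_t n, room;

    assert(buf != NULL && s != NULL);
    n = strlen(buf);
    room = sizeof(s->data) - 1 - s->len;   /* keep one byte for the NUL */
    if (n > room) {
        n = room;
        s->truncated = 1;
    }
    memcpy(s->data + s->len, buf, n);
    s->len += n;
    s->data[s->len] = '\0';
}
```

Assuming the final argument of cql_transform_pr is the client_data pointer that is handed back to the callback, a zero-initialized struct text_sink s could be filled with a call such as cql_transform_pr(ct, cn, text_sink_pr, &s), after which s.data holds the PQF string and s.truncated tells whether it was cut short.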
1236 <sect3 id="cql.to.rpn">
1237 <title>Specification of CQL to RPN mappings</title>
1239 The file supplied to functions
1240 <function>cql_transform_open_FILE</function> and
1241 <function>cql_transform_open_fname</function> follows
1242 a structure found in many Unix utilities.
1243 It consists of mapping specifications - one per line.
1244 Lines starting with <literal>#</literal> are ignored (comments).
1247 Each line is of the form
1249 <replaceable>CQL pattern</replaceable><literal> = </literal> <replaceable> RPN equivalent</replaceable>
1253 An RPN pattern is a simple attribute list. Each attribute pair
1256 [<replaceable>set</replaceable>] <replaceable>type</replaceable><literal>=</literal><replaceable>value</replaceable>
1258 The attribute <replaceable>set</replaceable> is optional.
1259 The <replaceable>type</replaceable> is the attribute type,
1260 <replaceable>value</replaceable> the attribute value.
1263 The character <literal>*</literal> (asterisk) has special meaning
1264 when used in the RPN pattern.
1265 Each occurrence of <literal>*</literal> is substituted with the
1266 CQL matching name (index, relation, qualifier, etc.).
1267 This facility can be used to copy a CQL name verbatim to the RPN result.
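The substitution rule can be modelled in a few lines. The sketch below is not the YAZ implementation; demo_expand_star is a hypothetical helper that copies an RPN pattern and replaces every asterisk with the matching CQL name, which is enough to see how a pattern such as 1=* turns an index name into a string attribute value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the '*' rule; not YAZ's implementation.
   Copies pattern to out, replacing each '*' with name.
   Returns 0 on success, -1 if the result does not fit in max bytes. */
int demo_expand_star(const char *pattern, const char *name,
                     char *out, size_t max)
{
    size_t used = 0;
    size_t name_len;
    const char *p;

    assert(pattern != NULL && name != NULL && out != NULL);
    name_len = strlen(name);
    for (p = pattern; *p; p++) {
        if (*p == '*') {
            if (used + name_len >= max)   /* keep room for the NUL */
                return -1;
            memcpy(out + used, name, name_len);
            used += name_len;
        } else {
            if (used + 1 >= max)
                return -1;
            out[used++] = *p;
        }
    }
    if (used >= max)                      /* only possible when max is 0 */
        return -1;
    out[used] = '\0';
    return 0;
}
```

With the pattern 1=* and the CQL name title the helper produces 1=title, while a pattern without an asterisk is copied unchanged.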
1270 The following CQL patterns are recognized:
1272 <varlistentry><term>
1273 <literal>index.</literal><replaceable>set</replaceable><literal>.</literal><replaceable>name</replaceable>
1277 This pattern is invoked when a CQL index, such as
1278 dc.title is converted. <replaceable>set</replaceable>
1279 and <replaceable>name</replaceable> are the context set and index
1281 Typically, the RPN specifies an equivalent use attribute.
1284 For terms not bound by an index the pattern
1285 <literal>index.cql.serverChoice</literal> is used.
1286 Here, the prefix <literal>cql</literal> is defined as
1287 <literal>http://www.loc.gov/zing/cql/cql-indexes/v1.0/</literal>.
1288 If this pattern is not defined, the mapping will fail.
1292 <literal>index.</literal><replaceable>set</replaceable><literal>.*</literal>
1293 is used when no other index pattern is matched.
1297 <varlistentry><term>
1298 <literal>qualifier.</literal><replaceable>set</replaceable><literal>.</literal><replaceable>name</replaceable>
1303 For backwards compatibility, this is recognised as a synonym of
1304 <literal>index.</literal><replaceable>set</replaceable><literal>.</literal><replaceable>name</replaceable>
1308 <varlistentry><term>
1309 <literal>relation.</literal><replaceable>relation</replaceable>
1313 This pattern specifies how a CQL relation is mapped to RPN.
1314 <replaceable>relation</replaceable> is the name of the relation
1315 operator. Since <literal>=</literal> is used as
1316 separator between CQL pattern and RPN, CQL relations
1317 including <literal>=</literal> cannot be
1318 used directly. To avoid a conflict, the names
1319 <literal>ge</literal>,
1320 <literal>eq</literal>,
1321 <literal>le</literal>,
1322 must be used for CQL operators, greater-than-or-equal,
1323 equal, less-than-or-equal respectively.
1324 The RPN pattern is supposed to include a relation attribute.
1327 For terms not bound by a relation, the pattern
1328 <literal>relation.scr</literal> is used. If the pattern
1329 is not defined, the mapping will fail.
1332 The special pattern, <literal>relation.*</literal> is used
1333 when no other relation pattern is matched.
1338 <varlistentry><term>
1339 <literal>relationModifier.</literal><replaceable>mod</replaceable>
1343 This pattern specifies how a CQL relation modifier is mapped to RPN.
1344 The RPN pattern is usually a relation attribute.
1349 <varlistentry><term>
1350 <literal>structure.</literal><replaceable>type</replaceable>
1354 This pattern specifies how a CQL structure is mapped to RPN.
1355 Note that this CQL pattern is somewhat similar to
1356 CQL pattern <literal>relation</literal>.
1357 The <replaceable>type</replaceable> is a CQL relation.
1360 The pattern, <literal>structure.*</literal> is used
1361 when no other structure pattern is matched.
1362 Usually, the RPN equivalent specifies a structure attribute.
1367 <varlistentry><term>
1368 <literal>position.</literal><replaceable>type</replaceable>
1372 This pattern specifies how the anchor (position) of
1373 CQL is mapped to RPN.
1374 The <replaceable>type</replaceable> is one
1375 of <literal>first</literal>, <literal>any</literal>,
1376 <literal>last</literal>, <literal>firstAndLast</literal>.
1379 The pattern, <literal>position.*</literal> is used
1380 when no other position pattern is matched.
1385 <varlistentry><term>
1386 <literal>set.</literal><replaceable>prefix</replaceable>
1390 This specification defines a CQL context set for a given prefix.
1391 The value on the right hand side is the URI for the set -
1392 <emphasis>not</emphasis> RPN. All prefixes used in
1393 index patterns must be defined this way.
1398 <varlistentry><term>
1399 <literal>set</literal>
1403 This specification defines a default CQL context set for index names.
1404 The value on the right hand side is the URI for the set.
1411 <example id="example.cql.to.rpn.mapping">
1412 <title>CQL to RPN mapping file</title>
1414 This simple file defines two context sets, three indexes and three
1415 relations, a position pattern and a default structure.
1417 <programlisting><![CDATA[
1418 set.cql = http://www.loc.gov/zing/cql/context-sets/cql/v1.1/
1419 set.dc = http://www.loc.gov/zing/cql/dc-indexes/v1.0/
1421 index.cql.serverChoice = 1=1016
1422 index.dc.title = 1=4
1423 index.dc.subject = 1=21
1429 position.any = 3=3 6=1
1435 With the mappings above, the CQL query
1439 is converted to the PQF:
1441 @attr 1=1016 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "computer"
1443 by rules <literal>index.cql.serverChoice</literal>,
1444 <literal>relation.scr</literal>, <literal>structure.*</literal>,
1445 <literal>position.any</literal>.
1452 is rejected, since <literal>position.right</literal> is
1458 >my = "http://www.loc.gov/zing/cql/dc-indexes/v1.0/" my.title = x
1462 @attr 1=4 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "x"
1466 <example id="example.cql.to.rpn.string">
1467 <title>CQL to RPN string attributes</title>
1469 In this example we allow any index to be passed to RPN as
1472 <programlisting><![CDATA[
1473 # Identifiers for prefixes used in this file. (index.*)
1474 set.cql = info:srw/cql-context-set/1/cql-v1.1
1475 set.rpn = http://bogus/rpn
1476 set = http://bogus/rpn
1478 # The default index when none is specified by the query
1479 index.cql.serverChoice = 1=any
1488 The <literal>http://bogus/rpn</literal> context set is also the default
1489 so we can make queries such as
1493 which is converted to
1495 @attr 2=3 @attr 4=1 @attr 3=3 @attr 1=title "a"
1499 <example id="example.cql.to.rpn.bathprofile">
1500 <title>CQL to RPN using Bath Profile</title>
1502 The file <filename>etc/pqf.properties</filename> has mappings from
1503 the Bath Profile and Dublin Core to RPN.
1504 If YAZ is installed as a package it's usually located
1505 in <filename>/usr/share/yaz/etc</filename> and part of the
1506 development package, such as <literal>libyaz-dev</literal>.
1510 <sect3 id="cql.xcql"><title>CQL to XCQL conversion</title>
1512 Conversion from CQL to XCQL is trivial and does not
1513 require a mapping to be defined.
1514 There are three functions to choose from depending on the
1515 way you wish to store the resulting output (XML buffer
1518 int cql_to_xml_buf(struct cql_node *cn, char *out, int max);
1519 void cql_to_xml(struct cql_node *cn,
1520 void (*pr)(const char *buf, void *client_data),
1522 void cql_to_xml_stdio(struct cql_node *cn, FILE *f);
1524 Function <function>cql_to_xml_buf</function> converts
1525 to XCQL and stores result in a user supplied buffer of a given
1529 <function>cql_to_xml</function> writes the result in
1530 a user defined output stream.
1531 <function>cql_to_xml_stdio</function> writes to a
1537 <sect1 id="tools.oid"><title>Object Identifiers</title>
1540 The basic YAZ representation of an OID is an array of integers,
1541 terminated with the value -1. This integer is of type
1542 <literal>Odr_oid</literal>.
1545 Fundamental OID operations and the type <literal>Odr_oid</literal>
1546 are defined in <filename>yaz/oid_util.h</filename>.
1549 An OID can either be declared as an automatic variable or it can
1550 be allocated using the memory utilities or ODR/NMEM. It's
1551 guaranteed that an OID can fit in <literal>OID_SIZE</literal> integers.
1553 <example id="tools.oid.bib1.1"><title>Create OID on stack</title>
1555 We can create an OID for the Bib-1 attribute set with:
1557 Odr_oid bib1[OID_SIZE];
1569 An OID may also be filled from a string-based representation using
1570 dots (.). This is achieved by function
1572 int oid_dotstring_to_oid(const char *name, Odr_oid *oid);
1574 This function returns 0 if name could be converted; -1 otherwise.
1576 <example id="tools.oid.bib1.2"><title>Using oid_dotstring_to_oid</title>
1578 We can fill the Bib-1 attribute set OID more easily with:
1580 Odr_oid bib1[OID_SIZE];
1581 oid_dotstring_to_oid("1.2.840.10003.3.1", bib1);
1586 We can also allocate an OID dynamically on an ODR stream with:
1588 Odr_oid *odr_getoidbystr(ODR o, const char *str);
1590 This creates an OID from string-based representation using dots.
1591 This function takes an &odr; stream as parameter. This stream is used to
1592 allocate memory for the data elements, which is released on a
1593 subsequent call to <function>odr_reset()</function> on that stream.
1596 <example id="tools.oid.bib1.3"><title>Using odr_getoidbystr</title>
1598 We can create an OID for the Bib-1 attribute set with:
1600 Odr_oid *bib1 = odr_getoidbystr(odr, "1.2.840.10003.3.1");
1608 char *oid_oid_to_dotstring(const Odr_oid *oid, char *oidbuf)
1610 does the reverse of <function>oid_dotstring_to_oid</function>. It
1611 converts an OID to the string-based representation using dots.
1612 The supplied char buffer <literal>oidbuf</literal> holds the resulting
1613 string and must be at least <literal>OID_STR_MAX</literal> in size.
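The two conversions can be pictured with a small stand-alone model. The code below does not call YAZ; demo_dotstring_to_oid and demo_oid_to_dotstring are hypothetical stand-ins that work on plain int arrays, and the values chosen for DEMO_OID_SIZE and DEMO_OID_STR_MAX are assumptions made for the example rather than values taken from the YAZ headers.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_OID_SIZE 20       /* stand-in for OID_SIZE (assumed value) */
#define DEMO_OID_STR_MAX 256   /* stand-in for OID_STR_MAX (assumed value) */

/* Parse "1.2.840" into a -1 terminated array of at most DEMO_OID_SIZE ints.
   Returns 0 if name could be converted; -1 otherwise. */
int demo_dotstring_to_oid(const char *name, int *oid)
{
    int n = 0;

    assert(name != NULL && oid != NULL);
    for (;;) {
        int v = 0;

        if (!isdigit((unsigned char) *name))
            return -1;                 /* empty component or non-digit */
        while (isdigit((unsigned char) *name)) {
            int d = *name++ - '0';
            if (v > (INT_MAX - d) / 10)
                return -1;             /* component too large for an int */
            v = v * 10 + d;
        }
        if (n >= DEMO_OID_SIZE - 1)
            return -1;                 /* no room left for the terminator */
        oid[n++] = v;
        if (*name == '\0')
            break;
        if (*name != '.')
            return -1;
        name++;                        /* skip the dot */
    }
    oid[n] = -1;
    return 0;
}

/* Format oid into buf, which must hold DEMO_OID_STR_MAX bytes. Returns buf. */
char *demo_oid_to_dotstring(const int *oid, char *buf)
{
    size_t used = 0;
    int i;

    assert(oid != NULL && buf != NULL);
    buf[0] = '\0';
    for (i = 0; oid[i] != -1; i++) {
        int w = snprintf(buf + used, DEMO_OID_STR_MAX - used,
                         i ? ".%d" : "%d", oid[i]);
        if (w < 0 || (size_t) w >= DEMO_OID_STR_MAX - used)
            break;                     /* stop if the buffer is exhausted */
        used += (size_t) w;
    }
    return buf;
}
```

Parsing 1.2.840.10003.3.1 yields six components followed by the -1 terminator, and formatting that array gives back the same string. Names that are not purely numeric, such as Bib-1, are rejected by this model.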
1617 OIDs can be copied with <function>oid_oidcpy</function> which takes
1618 two OID lists as arguments. Alternatively, an OID copy can be allocated
1619 on an ODR stream with:
1621 Odr_oid *odr_oiddup(ODR odr, const Odr_oid *o);
1626 OIDs can be compared with <function>oid_oidcmp</function> which returns
1627 zero if the two OIDs provided are identical; non-zero otherwise.
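Because an OID is just an array terminated by -1, copying and comparing reduce to simple loops. The following sketch shows the idea on plain int arrays; demo_oid_len, demo_oid_cpy and demo_oid_cmp are hypothetical names used for illustration and are not the YAZ functions themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Number of components before the -1 terminator. */
int demo_oid_len(const int *oid)
{
    int n = 0;

    assert(oid != NULL);
    while (oid[n] != -1)
        n++;
    return n;
}

/* Copy src, terminator included, into dst; the caller provides the room. */
void demo_oid_cpy(int *dst, const int *src)
{
    assert(dst != NULL && src != NULL);
    while ((*dst++ = *src++) != -1)
        ;
}

/* Return 0 if a and b are identical; non-zero otherwise. */
int demo_oid_cmp(const int *a, const int *b)
{
    assert(a != NULL && b != NULL);
    while (*a == *b && *a != -1) {
        a++;
        b++;
    }
    /* Either both terminators were reached (equal) or the values differ. */
    return (*a > *b) - (*a < *b);
}
```

Since the terminator takes part in the comparison, an OID that is a proper prefix of another compares as different from it.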
1630 <sect2 id="tools.oid.database"><title>OID database</title>
1632 From YAZ version 3 and later, the oident system has been replaced
1633 by an OID database. The name is something of a misnomer, since the old oident
1634 system was also a database.
1637 The OID database is really just a map between named Object Identifiers
1638 (string) and their OID raw equivalents. Most operations either
1639 convert from string to OID or the other way around.
1642 Unfortunately, whenever we supply a string we must also specify the
1643 <emphasis>OID class</emphasis>. The class is necessary because some
1644 strings correspond to multiple OIDs. An example of such a string is
1645 <literal>Bib-1</literal> which may either be an attribute-set
1646 or a diagnostic-set.
1649 Applications using the YAZ database should include
1650 <filename>yaz/oid_db.h</filename>.
1653 A YAZ database handle is of type <literal>yaz_oid_db_t</literal>.
1654 Actually that's a pointer. You need not deal with that.
1655 YAZ has a built-in database which can be considered "constant" for
1657 We can get hold of that by using function <function>yaz_oid_std</function>.
1660 All functions with prefix <function>yaz_string_to_oid</function>
1661 convert from class + string to OID. We have variants of this
1662 operation due to different memory allocation strategies.
1665 All functions with prefix
1666 <function>yaz_oid_to_string</function> convert from OID to string
1670 <example id="tools.oid.bib1.4"><title>Create OID with YAZ DB</title>
1672 We can create an OID for the Bib-1 attribute set on the ODR stream
1676 yaz_string_to_oid_odr(yaz_oid_std(), CLASS_ATTSET, "Bib-1", odr);
1678 This is more complex than using <function>odr_getoidbystr</function>.
1679 You would only use <function>yaz_string_to_oid_odr</function> when the
1680 string (here Bib-1) is supplied by a user or configuration.
1685 <sect2 id="tools.oid.std"><title>Standard OIDs</title>
1688 All the object identifiers in the standard OID database as returned
1689 by <function>yaz_oid_std</function> can be referenced directly in a
1690 program as a constant OID.
1691 Each constant OID is prefixed with <literal>yaz_oid_</literal> -
1692 followed by OID class (lowercase) - then by OID name (normalized and
1696 See <xref linkend="list-oids"/> for a list of all object identifiers
1698 These are declared in <filename>yaz/oid_std.h</filename> but are
1699 included by <filename>yaz/oid_db.h</filename> as well.
1702 <example id="tools.oid.bib1.5"><title>Use a built-in OID</title>
1704 We can allocate our own OID filled with the constant OID for
1707 Odr_oid *bib1 = odr_oiddup(o, yaz_oid_attset_bib1);
1713 <sect1 id="tools.nmem"><title>Nibble Memory</title>
1716 Sometimes when you need to allocate and construct a large,
1717 interconnected complex of structures, it can be a bit of a pain to
1718 release the associated memory again. For the structures describing the
1719 Z39.50 PDUs and related structures, it is convenient to use the
1720 memory-management system of the &odr; subsystem (see
1721 <xref linkend="odr.use"/>). However, in some circumstances
1722 where you might otherwise benefit from using a simple nibble memory
1723 management system, it may be impractical to use
1724 <function>odr_malloc()</function> and <function>odr_reset()</function>.
1725 For this purpose, the memory manager which also supports the &odr;
1726 streams is made available in the NMEM module. The external interface
1727 to this module is given in the <filename>nmem.h</filename> file.
1731 The following prototypes are given:
1735 NMEM nmem_create(void);
1736 void nmem_destroy(NMEM n);
1737 void *nmem_malloc(NMEM n, size_t size);
1738 void nmem_reset(NMEM n);
1739 size_t nmem_total(NMEM n);
1740 void nmem_init(void);
1741 void nmem_exit(void);
1745 The <function>nmem_create()</function> function returns a pointer to a
1746 memory control handle, which can be released again by
1747 <function>nmem_destroy()</function> when no longer needed.
1748 The function <function>nmem_malloc()</function> allocates a block of
1749 memory of the requested size. A call to <function>nmem_reset()</function>
1750 or <function>nmem_destroy()</function> will release all memory allocated
1751 on the handle since it was created (or since the last call to
1752 <function>nmem_reset()</function>). The function
1753 <function>nmem_total()</function> returns the number of bytes currently
1754 allocated on the handle.
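The idea behind such a handle can be shown with a toy region allocator. This is a sketch of the general technique only and makes no claim about how NMEM is implemented: the demo_ names, the 4096-byte chunk size and the alignment union are all assumptions made for the example. Memory is nibbled off larger chunks, individual allocations are never freed, and everything is released in one go by reset or destroy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_CHUNK 4096                 /* default chunk payload (assumed) */

union demo_align { long double ld; long long ll; void *p; };
#define DEMO_ALIGN sizeof(union demo_align)

struct demo_chunk {
    struct demo_chunk *next;
    size_t cap;                         /* payload capacity in bytes */
    size_t used;                        /* payload bytes handed out */
    union demo_align data[];            /* payload, suitably aligned */
};

struct demo_nmem {
    struct demo_chunk *chunks;          /* newest chunk first */
    size_t total;                       /* bytes handed out since create/reset */
};

struct demo_nmem *demo_nmem_create(void)
{
    struct demo_nmem *n = malloc(sizeof(*n));
    if (n) {
        n->chunks = NULL;
        n->total = 0;
    }
    return n;
}

void *demo_nmem_malloc(struct demo_nmem *n, size_t size)
{
    struct demo_chunk *c;
    void *p;

    assert(n != NULL);
    /* Round up so that the next allocation stays aligned. */
    size = (size + DEMO_ALIGN - 1) / DEMO_ALIGN * DEMO_ALIGN;
    c = n->chunks;
    if (c == NULL || c->cap - c->used < size) {
        size_t cap = size > DEMO_CHUNK ? size : DEMO_CHUNK;
        c = malloc(offsetof(struct demo_chunk, data) + cap);
        if (c == NULL)
            return NULL;
        c->next = n->chunks;
        c->cap = cap;
        c->used = 0;
        n->chunks = c;
    }
    p = (char *) c->data + c->used;     /* nibble off the next piece */
    c->used += size;
    n->total += size;
    return p;
}

/* Release everything allocated on the handle; the handle stays usable. */
void demo_nmem_reset(struct demo_nmem *n)
{
    assert(n != NULL);
    while (n->chunks) {
        struct demo_chunk *next = n->chunks->next;
        free(n->chunks);
        n->chunks = next;
    }
    n->total = 0;
}

size_t demo_nmem_total(const struct demo_nmem *n)
{
    assert(n != NULL);
    return n->total;
}

void demo_nmem_destroy(struct demo_nmem *n)
{
    if (n) {
        demo_nmem_reset(n);
        free(n);
    }
}
```

The design choice to note is that there is no per-allocation free: callers build their interconnected structures with demo_nmem_malloc and drop them all at once, which is the property that makes this style convenient for PDU-like data.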
1758 The nibble memory pool is shared amongst threads. POSIX
1759 mutexes and WIN32 critical sections are used to keep the
1760 module thread safe. Function <function>nmem_init()</function>
1761 initializes the nibble memory library and it is called automatically
1762 the first time the <literal>YAZ.DLL</literal> is loaded. &yaz; uses
1763 function <function>DllMain</function> to achieve this. You should
1764 <emphasis>not</emphasis> call <function>nmem_init</function> or
1765 <function>nmem_exit</function> unless you're absolutely sure what
1766 you're doing. Note that in previous &yaz; versions you'd have to call
1767 <function>nmem_init</function> yourself.
1772 <sect1 id="tools.log"><title>Log</title>
1774 &yaz; has evolved a fairly complex log system which should be useful
1775 for debugging &yaz; itself, debugging applications that use &yaz;, and for
1776 production use of those applications.
1779 The log functions are declared in header <filename>yaz/log.h</filename>
1780 and implemented in <filename>src/log.c</filename>.
1781 Due to name clash with syslog and some math utilities the logging
1782 interface has been modified as of YAZ 2.0.29. The obsolete interface
1783 is still available in header file <filename>yaz/log.h</filename>.
1784 The key points of the interface are:
1787 void yaz_log(int level, const char *fmt, ...);
1789 void yaz_log_init(int level, const char *prefix, const char *name);
1790 void yaz_log_init_file(const char *fname);
1791 void yaz_log_init_level(int level);
1792 void yaz_log_init_prefix(const char *prefix);
1793 void yaz_log_time_format(const char *fmt);
1794 void yaz_log_init_max_size(int mx);
1796 int yaz_log_mask_str(const char *str);
1797 int yaz_log_module_level(const char *name);
1801 The reason for the whole log module is the <function>yaz_log</function>
1802 function. It takes a bitmask indicating the log levels, a
1803 <literal>printf</literal>-like format string, and a variable number of
1808 The <literal>log level</literal> is a bit mask that says on which level(s)
1809 the log entry should be made, and optionally sets some behaviour of the
1810 logging. In the simplest cases, it can be one of <literal>YLOG_FATAL,
1811 YLOG_DEBUG, YLOG_WARN, YLOG_LOG</literal>. Those can be combined with bits
1812 that modify the way the log entry is written: <literal>YLOG_ERRNO,
1813 YLOG_NOTIME, YLOG_FLUSH</literal>.
1814 Most of the rest of the bits are deprecated, and should not be used. Use
1815 the dynamic log levels instead.
1819 Applications that use &yaz; should not use <literal>YLOG_LOG</literal> for ordinary
1820 messages, but should make use of the dynamic loglevel system. This consists
1821 of two parts, defining the loglevel and checking it.
1825 To define the log levels, the (main) program should pass a string to
1826 <function>yaz_log_mask_str</function> to define which log levels are to be
1827 logged. This string should be a comma-separated list of log level names,
1828 and can contain both hard-coded names and dynamic ones. The log level
1829 calculation starts with <literal>YLOG_DEFAULT_LEVEL</literal> and adds a bit
1830 for each word it meets, unless the word starts with a '-', in which case it
1831 clears the bit. If the string <literal>'none'</literal> is found,
1832 all bits are cleared. Typically this string comes from the command-line,
1833 often identified by <literal>-v</literal>. The
1834 <function>yaz_log_mask_str</function> returns a log level that should be
1835 passed to <function>yaz_log_init_level</function> for it to take effect.
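The calculation described above can be sketched as a stand-alone function. This is a simplified model, not the code behind yaz_log_mask_str: the bit values, the four level names and the default level are hypothetical, and unknown names are simply ignored here rather than being registered as new dynamic levels.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values; the real YLOG_* constants are defined by YAZ. */
#define DEMO_FATAL 0x01
#define DEMO_DEBUG 0x02
#define DEMO_WARN  0x04
#define DEMO_LOG   0x08
#define DEMO_DEFAULT_LEVEL (DEMO_FATAL | DEMO_WARN | DEMO_LOG)

static const struct { const char *name; int bit; } demo_levels[] = {
    { "fatal", DEMO_FATAL }, { "debug", DEMO_DEBUG },
    { "warn", DEMO_WARN },   { "log", DEMO_LOG },
};

/* Look up a level name of the given length; 0 means unknown. */
static int demo_bit(const char *name, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof(demo_levels) / sizeof(demo_levels[0]); i++)
        if (strlen(demo_levels[i].name) == len &&
            memcmp(demo_levels[i].name, name, len) == 0)
            return demo_levels[i].bit;
    return 0;   /* unknown names are ignored in this model */
}

/* Start from the default level, then add or clear one bit per word. */
int demo_mask_str(const char *str)
{
    int level = DEMO_DEFAULT_LEVEL;

    assert(str != NULL);
    while (*str) {
        size_t len = strcspn(str, ",");          /* length of this word */
        int negate = (len > 0 && *str == '-');
        const char *word = str + negate;
        size_t wlen = len - (size_t) negate;

        if (!negate && wlen == 4 && memcmp(word, "none", 4) == 0)
            level = 0;                           /* 'none' clears all bits */
        else if (negate)
            level &= ~demo_bit(word, wlen);      /* '-word' clears a bit */
        else
            level |= demo_bit(word, wlen);       /* 'word' sets a bit */
        str += len;
        if (*str == ',')
            str++;
    }
    return level;
}
```

In this model the string none,debug leaves only the debug bit set, while -warn removes the warn bit from the default level. With the real API, the value returned by yaz_log_mask_str is what gets passed on to yaz_log_init_level.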
1839 Each module should check which log bits it should use by calling
1840 <function>yaz_log_module_level</function> with a suitable name for the
1841 module. The name is cleared from a preceding path and an extension, if any,
1842 so it is quite possible to use <literal>__FILE__</literal> for it. If the
1843 name has been passed to <function>yaz_log_mask_str</function>, the routine
1844 returns a non-zero bitmask, which should then be used in subsequent calls
1845 to yaz_log. (It can also be tested, so as to avoid unnecessary calls to
1846 yaz_log, in time-critical places, or when the log entry would take time
1851 YAZ uses the following dynamic log levels:
1852 <literal>server, session, request, requestdetail</literal> for the server
1854 <literal>zoom</literal> for the ZOOM client API.
1855 <literal>ztest</literal> for the simple test server.
1856 <literal>malloc, nmem, odr, eventl</literal> for internal debugging of YAZ itself.
1857 Of course, any program using YAZ is welcome to define as many new ones as
1862 By default the log is written to stderr, but this can be changed by a call
1863 to <function>yaz_log_init_file</function> or
1864 <function>yaz_log_init</function>. If the log is directed to a file, the
1865 file size is checked at every write, and if it exceeds the limit given in
1866 <function>yaz_log_init_max_size</function>, the log is rotated. The
1867 rotation keeps one old version (with a <literal>.1</literal> appended to
1868 the name). The size defaults to 1GB. Setting it to zero will disable the
1873 A typical yaz-log looks like this
1874 13:23:14-23/11 yaz-ztest(1) [session] Starting session from tcp:127.0.0.1 (pid=30968)
1875 13:23:14-23/11 yaz-ztest(1) [request] Init from 'YAZ' (81) (ver 2.0.28) OK
1876 13:23:17-23/11 yaz-ztest(1) [request] Search Z: @attrset Bib-1 foo OK:7 hits
1877 13:23:22-23/11 yaz-ztest(1) [request] Present: [1] 2+2 OK 2 records returned
1878 13:24:13-23/11 yaz-ztest(1) [request] Close OK
1882 The log entries start with a time stamp. This can be omitted by setting the
1883 <literal>YLOG_NOTIME</literal> bit in the loglevel. That way, automated tests
1884 can produce identical log files that are easy to diff. The
1885 format of the time stamp can be set with
1886 <function>yaz_log_time_format</function>, which takes a format string just
1887 like <function>strftime</function>.
1891 Next in a log line comes the prefix, often the name of the program. For
1892 yaz-based servers, it can also contain the session number. Then
1893 comes one or more logbits in square brackets, depending on the logging
1894 level set by <function>yaz_log_init_level</function> and the loglevel
1895 passed to <function>yaz_log</function>. Finally comes the format
1896 string and additional values passed to <function>yaz_log</function>
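The anatomy of a log line can be reproduced with standard C alone. The helper below is hypothetical and does not use the YAZ log module; it assembles a time stamp, a prefix, one log bit name in square brackets and a message. The strftime pattern %H:%M:%S-%d/%m is simply the one that matches the sample lines shown earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper, not YAZ code.
   Builds "<time> <prefix> [<level>] <msg>" in out.
   Returns 0 on success, -1 if the time stamp or the line did not fit. */
int demo_log_line(const struct tm *tm, const char *time_fmt,
                  const char *prefix, const char *level, const char *msg,
                  char *out, size_t max)
{
    char stamp[64];
    int n;

    assert(tm != NULL && time_fmt != NULL && prefix != NULL);
    assert(level != NULL && msg != NULL && out != NULL);
    if (strftime(stamp, sizeof(stamp), time_fmt, tm) == 0)
        return -1;                  /* empty or oversized time stamp */
    n = snprintf(out, max, "%s %s [%s] %s", stamp, prefix, level, msg);
    return (n < 0 || (size_t) n >= max) ? -1 : 0;
}
```

Called with a struct tm for 23 November, 13:24:13, the prefix yaz-ztest(1), the level request and the message Close OK, the helper produces a line with the same layout as the last sample line.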
1900 The log level <literal>YLOG_LOGLVL</literal>, enabled by the string
1901 <literal>loglevel</literal>, will log all the log-level affecting
1902 operations. This can come in handy if you need to know what other log
1903 levels would be useful. Grep the logfile for <literal>[loglevel]</literal>.
1907 The log system is almost independent of the rest of &yaz;; the only
1908 important dependency is on <filename>nmem</filename>, and that only for
1909 the semaphore definition there.
1913 The dynamic log levels and log rotation were introduced in &yaz; 2.0.28. At
1914 the same time, the log bit names were changed from
1915 <literal>LOG_something</literal> to <literal>YLOG_something</literal>,
1916 to avoid collision with <filename>syslog.h</filename>.
1921 <sect1 id="marc"><title>MARC</title>
1924 YAZ provides a fast utility for working with MARC records.
1925 Early versions of the MARC utility only allowed decoding of ISO2709.
1926 Today the utility can both encode to and decode from a variety of formats.
1929 #include <yaz/marcdisp.h>
1931 /* create handler */
1932 yaz_marc_t yaz_marc_create(void);
1934 void yaz_marc_destroy(yaz_marc_t mt);
1936 /* set XML mode YAZ_MARC_LINE, YAZ_MARC_SIMPLEXML, ... */
1937 void yaz_marc_xml(yaz_marc_t mt, int xmlmode);
1938 #define YAZ_MARC_LINE 0
1939 #define YAZ_MARC_SIMPLEXML 1
1940 #define YAZ_MARC_OAIMARC 2
1941 #define YAZ_MARC_MARCXML 3
1942 #define YAZ_MARC_ISO2709 4
1943 #define YAZ_MARC_XCHANGE 5
1944 #define YAZ_MARC_CHECK 6
1945 #define YAZ_MARC_TURBOMARC 7
1947 /* supply iconv handle for character set conversion .. */
1948 void yaz_marc_iconv(yaz_marc_t mt, yaz_iconv_t cd);
1950 /* set debug level, 0=none, 1=more, 2=even more, .. */
1951 void yaz_marc_debug(yaz_marc_t mt, int level);
1953 /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
1954 On success, result in *result with size *rsize. */
1955 int yaz_marc_decode_buf(yaz_marc_t mt, const char *buf, int bsize,
1956 const char **result, size_t *rsize);
1958 /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
1959 On success, result in WRBUF */
1960 int yaz_marc_decode_wrbuf(yaz_marc_t mt, const char *buf,
1961 int bsize, WRBUF wrbuf);
1966 The synopsis is just a basic subset of all functionality. Refer
1967 to the actual header file <filename>marcdisp.h</filename> for
1972 A MARC conversion handle must be created by using
1973 <function>yaz_marc_create</function> and destroyed
1974 by calling <function>yaz_marc_destroy</function>.
1977 All other functions operate on a <literal>yaz_marc_t</literal> handle.
1978 The output is specified by a call to <function>yaz_marc_xml</function>.
1979 The <literal>xmlmode</literal> must be one of
1982 <term>YAZ_MARC_LINE</term>
1985 A simple line-by-line format suitable for display but not
1986 recommended for further (machine) processing.
1992 <term>YAZ_MARC_MARCXML</term>
1995 <ulink url="&url.marcxml;">MARCXML</ulink>.
2001 <term>YAZ_MARC_ISO2709</term>
2004 ISO2709 (sometimes just referred to as "MARC").
2010 <term>YAZ_MARC_XCHANGE</term>
2013 <ulink url="&url.marcxchange;">MarcXchange</ulink>.
2019 <term>YAZ_MARC_CHECK</term>
2022 Pseudo format for validation only. Does not generate
2023 any real output except diagnostics.
2029 <term>YAZ_MARC_TURBOMARC</term>
2032 XML format with same semantics as MARCXML but more compact
2033 and geared towards fast processing with XSLT. Refer to
2034 <xref linkend="tools.turbomarc"/> for more information.
2042 The actual conversion functions are
2043 <function>yaz_marc_decode_buf</function> and
2044 <function>yaz_marc_decode_wrbuf</function> which decode and encode
2045 a MARC record. The former function operates on simple buffers; the latter
2046 stores the resulting record in a WRBUF handle (WRBUF is a simple string
2049 <example id="example.marc.display">
2050 <title>Display of MARC record</title>
2052 The following program snippet illustrates how the MARC API may
2053 be used to convert a MARC record to the line-by-line format:
2054 <programlisting><![CDATA[
2055 void print_marc(const char *marc_buf, int marc_buf_size)
2057 const char *result; /* for result buf (owned by mt) */
2058 size_t result_len; /* for size of result */
2059 yaz_marc_t mt = yaz_marc_create();
2060 yaz_marc_xml(mt, YAZ_MARC_LINE);
2061 if (yaz_marc_decode_buf(mt, marc_buf, marc_buf_size,
2062 &result, &result_len) > 0) /* > 0 means success */
2063 fwrite(result, result_len, 1, stdout);
2064 yaz_marc_destroy(mt); /* note that result is now freed... */
2070 <sect2 id="tools.turbomarc">
2071 <title>TurboMARC</title>
2073 TurboMARC is yet another XML encoding of a MARC record. The format
2074 was designed for fast processing with XSLT.
2078 Pazpar2 uses XSLT to convert an XML encoded MARC record to an internal
2079 representation. This conversion mostly checks the tag of a MARC field
2080 to determine the basic rules in the conversion. This check is
2081 costly when that tag is encoded as an attribute, as in MARCXML.
2082 Having the tag value as the element name instead makes processing
2083 many times faster (at least for Libxslt).
2086 TurboMARC is encoded as follows:
2089 Record elements are part of namespace
2090 "<literal>http://www.indexdata.com/turbomarc</literal>".
2093 A record is enclosed in element <literal>r</literal>.
2096 A collection of records is enclosed in element
2097 <literal>collection</literal>.
2100 The leader is encoded as element <literal>l</literal> with the
2101 leader content as its (text) value.
2104 A control field is encoded as element <literal>c</literal> concatenated
2105 with the tag value of the control field if the tag value
2106 matches the regular expression <literal>[a-zA-Z0-9]*</literal>.
2107 If the tag value does not match the regular expression
2108 <literal>[a-zA-Z0-9]*</literal> the control field is encoded
2109 as element <literal>c</literal> and attribute <literal>code</literal>
2110 will hold the tag value.
2111 This rule ensures that in the rare cases where a tag value might
2112 result in non-well-formed XML, YAZ encodes it as a coded attribute
2116 The control field content is the text value of this element.
2117 Indicators are encoded as attribute names
2118 <literal>i1</literal>, <literal>i2</literal>, etc., and
2119 corresponding values for each indicator.
2122 A data field is encoded as element <literal>d</literal> concatenated
2123 with the tag value of the data field or using the attribute
2124 <literal>code</literal> as described in the rules for control fields.
2125 The children of the data field element are subfield elements.
2126 Each subfield element is encoded as <literal>s</literal>
2127 concatenated with the subfield code.
2128 The text of the subfield element is the contents of the subfield.
2129 Indicators are encoded as attributes for the data field element similar
2130 to the encoding for control fields.
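Put together, the rules above give records of the following shape. The fragment is an illustrative sketch: the leader, tags and field contents are made-up sample values, and the last data field uses a deliberately odd tag to show the fallback to the code attribute.

```xml
<collection xmlns="http://www.indexdata.com/turbomarc">
  <r>
    <l>00366nam  22001698a 4500</l>
    <c001>   11224466 </c001>
    <c003>DLC</c003>
    <d245 i1="1" i2="0">
      <sa>How to program a computer</sa>
      <sc>Jack Collins</sc>
    </d245>
    <d260 i1=" " i2=" ">
      <sa>Penfield :</sa>
      <sb>Unpress,</sb>
    </d260>
    <!-- tag "9 9" does not match [a-zA-Z0-9]*, so it moves to code -->
    <d code="9 9" i1=" " i2=" ">
      <sa>local note</sa>
    </d>
  </r>
</collection>
```

Because the tag is part of the element name, an XSLT template can match a field directly by name, for example on d245, instead of testing an attribute value on every data field.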
2137 <sect1 id="tools.retrieval">
2138 <title>Retrieval Facility</title>
2140 YAZ version 2.1.20 or later includes a Retrieval facility tool
2141 which allows an SRU/Z39.50 server to describe itself and perform record
2142 conversions. The idea is the following:
2147 An SRU/Z39.50 client sends a retrieval request which includes
2148 a combination of the following parameters: syntax (format),
2149 schema (or element set name).
2155 The retrieval facility is invoked with parameters in a
2156 server/proxy. The retrieval facility matches the parameters against a set of
2157 "supported" retrieval types.
2158 If there is no match, the retrieval signals an error
2159 (syntax and / or schema not supported).
2165 For a successful match, the backend is invoked with the same
2166 or altered retrieval parameters (syntax, schema). If
2167 a record is received from the backend, it is converted to the
2168 frontend name / syntax.
2174 The resulting record is sent back to the client and tagged with
2175 the frontend syntax / schema.
2182 The Retrieval facility is driven by an XML configuration. The
2183 configuration is neither Z39.50 ZeeRex nor SRU ZeeRex. But it
2184 should be easy to generate both of them from the XML configuration
2185 (unfortunately, the two versions
2186 of ZeeRex differ substantially in this regard).
2188 <sect2 id="tools.retrieval.format">
2189 <title>Retrieval XML format</title>
2191 All elements should be covered by namespace
2192 <literal>http://indexdata.com/yaz</literal> .
2193 The root element node must be <literal>retrievalinfo</literal>.
2196 The <literal>retrievalinfo</literal> must include one or
2197 more <literal>retrieval</literal> elements. Each
2198 <literal>retrieval</literal> defines a specific combination of
2199 syntax, name and identifier supported by this retrieval service.
2202 The <literal>retrieval</literal> element may include any of the
2203 following attributes:
2205 <varlistentry><term><literal>syntax</literal> (REQUIRED)</term>
2208 Defines the record syntax. Possible values are any
2209 of the names defined in YAZ' OID database or a raw
2214 <varlistentry><term><literal>name</literal> (OPTIONAL)</term>
2217 Defines the name of the retrieval format. This can be
2218 any string. For SRU, the value is equivalent to the schema (short-hand);
2219 for Z39.50 it is equivalent to the simple element set name.
2220 For YAZ 3.0.24 and later this name may be specified as a glob
2221 expression with operators
2222 <literal>*</literal> and <literal>?</literal>.
2226 <varlistentry><term><literal>identifier</literal> (OPTIONAL)</term>
2229 Defines the URI schema name of the retrieval format. This can be
2230 any string. For SRU, the value is equivalent to the URI schema.
2231 For Z39.50, there is no equivalent.
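<para>
As a hedged illustration of the three attributes described above, the
following sketch shows a minimal configuration. The glob pattern
<literal>marc*</literal> and the particular combination of entries are
assumptions made for this example only; they are not taken from a real
deployment.
</para>
<programlisting><![CDATA[
<retrievalinfo xmlns="http://indexdata.com/yaz">
  <!-- syntax only: a record syntax named in the OID database -->
  <retrieval syntax="usmarc"/>
  <!-- name given as a glob expression (YAZ 3.0.24 or later);
       the pattern marc* is hypothetical -->
  <retrieval syntax="xml" name="marc*"
             identifier="info:srw/schema/1/marcxml-v1.1"/>
</retrievalinfo>
]]></programlisting>
<para>
With such a configuration, a request for syntax <literal>xml</literal>
and name <literal>marcxml</literal> should match the second entry, since
<literal>marc*</literal> is a glob expression that covers that name.
</para>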
2238 The <literal>retrieval</literal> element may include one
2239 <literal>backend</literal> element. If a <literal>backend</literal>
2240 element is given, it specifies how the records are retrieved by
2241 some backend and how the records are converted from the backend to
2245 The attributes <literal>name</literal> and <literal>syntax</literal>
2246 may be specified for the <literal>backend</literal> element. The
2247 semantics of these attributes are equivalent to those for the
2248 <literal>retrieval</literal>. However, these values are passed to
2252 The <literal>backend</literal> element may include one or more
2253 conversion instructions (as child elements). The supported
2256 <varlistentry><term><literal>marc</literal></term>
2259 The <literal>marc</literal> element specifies a conversion
2260 to and from ISO2709-encoded MARC and
2261 <ulink url="&url.marcxml;">&acro.marcxml;</ulink>/MarcXchange.
2262 The following attributes may be specified:
2265 <varlistentry><term><literal>inputformat</literal> (REQUIRED)</term>
2268 Format of input. Supported values are
2269 <literal>marc</literal> (for ISO2709) and <literal>xml</literal>
2270 (for MARCXML/MarcXchange).
2275 <varlistentry><term><literal>outputformat</literal> (REQUIRED)</term>
2278 Format of output. Supported values are
2279 <literal>line</literal> (MARC line format),
2280 <literal>marcxml</literal> (for MARCXML),
2281 <literal>marc</literal> (ISO2709), and
2282 <literal>marcxchange</literal> (for MarcXchange).
2287 <varlistentry><term><literal>inputcharset</literal> (OPTIONAL)</term>
2290 Encoding of input. For XML input formats, this need not
2291 be given, but for ISO2709-based input formats, this should
2292 be set to the encoding used. For MARC21 records, a common
2293 inputcharset value would be <literal>marc-8</literal>.
2298 <varlistentry><term><literal>outputcharset</literal> (OPTIONAL)</term>
2301 Encoding of output. If outputformat is XML-based, it is
2302 strongly recommended to use <literal>utf-8</literal>.
2311 <varlistentry><term><literal>xslt</literal></term>
2314 The <literal>xslt</literal> element specifies a conversion
2315 via &acro.xslt;. The following attributes may be specified:
2318 <varlistentry><term><literal>stylesheet</literal> (REQUIRED)</term>
2333 <sect2 id="tools.retrieval.examples">
2334 <title>Retrieval Facility Examples</title>
2335 <example id="tools.retrieval.marc21">
2336 <title>MARC21 backend</title>
2338 A typical way to use the retrieval facility is to enable XML
2339 for servers that only support ISO2709-encoded MARC21 records.
2341 <programlisting><![CDATA[
2343 <retrieval syntax="usmarc" name="F"/>
2344 <retrieval syntax="usmarc" name="B"/>
2345 <retrieval syntax="xml" name="marcxml"
2346 identifier="info:srw/schema/1/marcxml-v1.1">
2347 <backend syntax="usmarc" name="F">
2348 <marc inputformat="marc" outputformat="marcxml"
2349 inputcharset="marc-8"/>
2352 <retrieval syntax="xml" name="dc">
2353 <backend syntax="usmarc" name="F">
2354 <marc inputformat="marc" outputformat="marcxml"
2355 inputcharset="marc-8"/>
2356 <xslt stylesheet="MARC21slim2DC.xsl"/>
2363 This means that our frontend supports:
2367 MARC21 F(ull) records.
2372 MARC21 B(rief) records.
2384 Dublin Core records.
2391 <sect2 id="tools.retrieval.api">
2394 It should be easy to use the retrieval facility from applications. Refer to
2396 <filename>yaz/retrieval.h</filename> and
2397 <filename>yaz/record_conv.h</filename>.
2403 <!-- Keep this comment at the end of the file
2408 sgml-minimize-attributes:nil
2409 sgml-always-quote-attributes:t
2412 sgml-parent-document: "yaz.xml"
2413 sgml-local-catalogs: nil
2414 sgml-namecase-general:t