1 <chapter id="tools"><title>Supporting Tools</title>
In support of the service API (primarily the ASN module, which
provides the programmatic interface to the Z39.50 APDUs), &yaz; contains
a collection of tools that support the development of applications.
9 <sect1 id="tools.query"><title>Query Syntax Parsers</title>
12 Since the type-1 (RPN) query structure has no direct, useful string
13 representation, every origin application needs to provide some form of
14 mapping from a local query notation or representation to a
15 <token>Z_RPNQuery</token> structure. Some programmers will prefer to
16 construct the query manually, perhaps using
17 <function>odr_malloc()</function> to simplify memory management.
18 The &yaz; distribution includes three separate, query-generating tools
19 that may be of use to you.
22 <sect2 id="PQF"><title>Prefix Query Format</title>
25 Since RPN or reverse polish notation is really just a fancy way of
26 describing a suffix notation format (operator follows operands), it
27 would seem that the confusion is total when we now introduce a prefix
28 notation for RPN. The reason is one of simple laziness - it's somewhat
29 simpler to interpret a prefix format, and this utility was designed
30 for maximum simplicity, to provide a baseline representation for use
31 in simple test applications and scripting environments (like Tcl). The
32 demonstration client included with YAZ uses the PQF.
The PQF has been adopted by other parties developing Z39.50
software. It is often referred to as Prefix Query Notation (PQN).
The PQF is defined by the pquery module in the YAZ library.
There are two sets of functions with similar behavior. The first
set operates on a PQF parser handle; the second does not. The first
set is more flexible than the second, which is obsolete and
is provided only for backwards compatibility.
The functions in the first set all operate on a PQF parser handle:
#include &lt;yaz/pquery.h&gt;
55 YAZ_PQF_Parser yaz_pqf_create (void);
57 void yaz_pqf_destroy (YAZ_PQF_Parser p);
59 Z_RPNQuery *yaz_pqf_parse (YAZ_PQF_Parser p, ODR o, const char *qbuf);
61 Z_AttributesPlusTerm *yaz_pqf_scan (YAZ_PQF_Parser p, ODR o,
62 Odr_oid **attributeSetId, const char *qbuf);
65 int yaz_pqf_error (YAZ_PQF_Parser p, const char **msg, size_t *off);
A PQF parser is created and destroyed by the functions
<function>yaz_pqf_create</function> and
<function>yaz_pqf_destroy</function> respectively.
The function <function>yaz_pqf_parse</function> parses the query given
by the string <literal>qbuf</literal>. If parsing was successful,
a Z39.50 RPN Query is returned which is created using the ODR stream
<literal>o</literal>. If parsing failed, a NULL pointer is returned.
The function <function>yaz_pqf_scan</function> takes a scan query in
<literal>qbuf</literal>. If parsing was successful, the function
returns a pointer to the attributes plus term and modifies
<literal>attributeSetId</literal> to hold the attribute set for the
scan request; both are allocated using the ODR stream <literal>o</literal>.
If parsing failed, <function>yaz_pqf_scan</function> returns a NULL pointer.
Error information for bad queries can be obtained by a call to
<function>yaz_pqf_error</function>, which returns an error code,
sets <literal>*msg</literal> to point to an error description,
and sets <literal>*off</literal> to the offset within the last
query where parsing failed.
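The offset reported through <literal>*off</literal> is most useful when it is shown to the user next to the query. The following is a minimal sketch of that idea; <literal>pqf_error_marker</literal> is a hypothetical helper and not part of &yaz;, and it only assumes that the offset is a byte position within the query string.

```c
#include <string.h>

/* Hypothetical helper, not part of YAZ: fills out with a marker line that
 * places '^' under byte position off of the query. Returns 0 on success,
 * -1 if off lies beyond the end of the query or out (max bytes) is too
 * small to hold the marker line. */
int pqf_error_marker(const char *query, size_t off, char *out, size_t max)
{
    size_t i;

    if (off > strlen(query) || off + 2 > max)
        return -1;
    for (i = 0; i < off; i++)
        out[i] = ' ';          /* pad up to the failing position */
    out[off] = '^';
    out[off + 1] = '\0';
    return 0;
}
```

An application could print the query on one line and the marker on the next, passing the offset obtained from <function>yaz_pqf_error</function>.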
The second set of functions is declared as follows:
#include &lt;yaz/pquery.h&gt;
94 Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);
96 Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
97 Odr_oid **attributeSetP, const char *qbuf);
99 int p_query_attset (const char *arg);
102 The function <function>p_query_rpn()</function> takes as arguments an
103 &odr; stream (see section <link linkend="odr">The ODR Module</link>)
104 to provide a memory source (the structure created is released on
105 the next call to <function>odr_reset()</function> on the stream), a
106 protocol identifier (one of the constants <token>PROTO_Z3950</token> and
<token>PROTO_SR</token>), and
finally a null-terminated string holding the query string.
111 If the parse went well, <function>p_query_rpn()</function> returns a
112 pointer to a <literal>Z_RPNQuery</literal> structure which can be
113 placed directly into a <literal>Z_SearchRequest</literal>.
If parsing failed because of a syntax error, a NULL pointer is returned.
The function <literal>p_query_attset</literal> specifies which attribute set
to use if the query doesn't specify one by the
<literal>@attrset</literal> operator.
It returns 0 if the argument is a
valid attribute set specifier; otherwise it returns -1.
125 The grammar of the PQF is as follows:
129 query ::= top-set query-struct.
131 top-set ::= [ '@attrset' string ]
133 query-struct ::= attr-spec | simple | complex | '@term' term-type query
135 attr-spec ::= '@attr' [ string ] string query-struct
137 complex ::= operator query-struct query-struct.
139 operator ::= '@and' | '@or' | '@not' | '@prox' proximity.
141 simple ::= result-set | term.
143 result-set ::= '@set' string.
147 proximity ::= exclusion distance ordered relation which-code unit-code.
149 exclusion ::= '1' | '0' | 'void'.
151 distance ::= integer.
153 ordered ::= '1' | '0'.
155 relation ::= integer.
157 which-code ::= 'known' | 'private' | integer.
159 unit-code ::= integer.
161 term-type ::= 'general' | 'numeric' | 'string' | 'oid' | 'datetime' | 'null'.
165 You will note that the syntax above is a fairly faithful
166 representation of RPN, except for the Attribute, which has been
167 moved a step away from the term, allowing you to associate one or more
168 attributes with an entire query structure. The parser will
169 automatically apply the given attributes to each term as required.
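To make the prefix structure of the grammar concrete, the sketch below converts a small subset of PQF to a parenthesised infix form. It is illustrative only and is not the &yaz; parser: it assumes tokens are separated by single blanks, and it understands only <literal>@and</literal>, <literal>@or</literal>, <literal>@not</literal> and bare terms (no quoting, attributes or result sets). The names <literal>pqf_to_infix</literal> and <literal>append</literal> are hypothetical.

```c
#include <string.h>

/* Append s to the NUL-terminated string in out (capacity max bytes). */
static int append(char *out, size_t max, const char *s)
{
    if (strlen(out) + strlen(s) + 1 > max)
        return -1;
    strcat(out, s);
    return 0;
}

/* Consume one query-struct from *p and append its infix form to out,
 * which must already hold a NUL-terminated string (usually ""). Returns
 * 0 on success, -1 on a missing operand or if out would overflow.
 * Tokens longer than 63 bytes are not handled by this sketch. */
int pqf_to_infix(const char **p, char *out, size_t max)
{
    char tok[64];
    size_t n = 0;

    while (**p == ' ')
        (*p)++;
    while (**p && **p != ' ' && n < sizeof tok - 1)
        tok[n++] = *(*p)++;
    tok[n] = '\0';
    if (n == 0)
        return -1;                       /* operator without operand */

    if (strcmp(tok, "@and") && strcmp(tok, "@or") && strcmp(tok, "@not"))
        return append(out, max, tok);    /* a bare term */

    /* The operator comes first; its two operands follow recursively. */
    if (append(out, max, "(") || pqf_to_infix(p, out, max)
        || append(out, max, " ") || append(out, max, tok + 1)
        || append(out, max, " ") || pqf_to_infix(p, out, max)
        || append(out, max, ")"))
        return -1;
    return 0;
}
```

Because every operator takes exactly two operands, no parentheses or precedence rules are needed in the prefix form; the recursion alone recovers the grouping.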
173 The @attr operator is followed by an attribute specification
174 (<literal>attr-spec</literal> above). The specification consists
175 of an optional attribute set, an attribute type-value pair and
176 a sub-query. The attribute type-value pair is packed in one string:
177 an attribute type, an equals sign, and an attribute value, like this:
178 <literal>@attr 1=1003</literal>.
179 The type is always an integer but the value may be either an
180 integer or a string (if it doesn't start with a digit character).
181 A string attribute-value is encoded as a Type-1 ``complex''
182 attribute with the list of values containing the single string
183 specified, and including no semantic indicators.
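The packing of type and value into one string can be sketched as follows. This is a simplified illustration of the rule described above (numeric type, then an equals sign, then a value that is numeric only if it starts with a digit); <literal>struct attr_pair</literal> and <literal>split_attr_pair</literal> are hypothetical names, not part of &yaz;.

```c
#include <ctype.h>
#include <stdlib.h>

struct attr_pair {
    int type;               /* attribute type, always numeric */
    int is_numeric;         /* 1 if the value starts with a digit */
    long num_value;         /* valid when is_numeric is 1 */
    const char *str_value;  /* points into the input when is_numeric is 0 */
};

/* Split a string such as "1=1003" or "1=/book/title" into its parts.
 * Returns 0 on success, -1 if the string is not of the form type=value. */
int split_attr_pair(const char *s, struct attr_pair *ap)
{
    char *end;
    long type = strtol(s, &end, 10);

    if (end == s || *end != '=' || end[1] == '\0')
        return -1;
    ap->type = (int) type;
    ap->is_numeric = isdigit((unsigned char) end[1]) != 0;
    ap->num_value = ap->is_numeric ? strtol(end + 1, NULL, 10) : 0;
    ap->str_value = ap->is_numeric ? NULL : end + 1;
    return 0;
}
```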
Version 3 of the Z39.50 specification defines various encodings of terms.
188 Use <literal>@term </literal> <replaceable>type</replaceable>
189 <replaceable>string</replaceable>,
190 where type is one of: <literal>general</literal>,
191 <literal>numeric</literal> or <literal>string</literal>
192 (for InternationalString).
193 If no term type has been given, the <literal>general</literal> form
194 is used. This is the only encoding allowed in both versions 2 and 3
195 of the Z39.50 standard.
198 <sect3 id="PQF-prox">
199 <title>Using Proximity Operators with PQF</title>
202 This is an advanced topic, describing how to construct
203 queries that make very specific requirements on the
204 relative location of their operands.
205 You may wish to skip this section and go straight to
206 <link linkend="pqf-examples">the example PQF queries</link>.
211 Most Z39.50 servers do not support proximity searching, or
212 support only a small subset of the full functionality that
213 can be expressed using the PQF proximity operator. Be
214 aware that the ability to <emphasis>express</emphasis> a
215 query in PQF is no guarantee that any given server will
216 be able to <emphasis>execute</emphasis> it.
222 The proximity operator <literal>@prox</literal> is a special
223 and more restrictive version of the conjunction operator
224 <literal>@and</literal>. Its semantics are described in
section 3.7.2 (Proximity) of the Z39.50 standard itself, which
226 can be read on-line at
227 <ulink url="&url.z39.50.proximity;"/>
In PQF, the proximity operation is represented by a sequence of the form
233 @prox <replaceable>exclusion</replaceable> <replaceable>distance</replaceable> <replaceable>ordered</replaceable> <replaceable>relation</replaceable> <replaceable>which-code</replaceable> <replaceable>unit-code</replaceable>
in which the meanings of the parameters are as described in
the standard, and they can take the following values:
238 <listitem><formalpara><title>exclusion</title><para>
239 0 = false (i.e. the proximity condition specified by the
240 remaining parameters must be satisfied) or
241 1 = true (the proximity condition specified by the
remaining parameters must <emphasis>not</emphasis> be satisfied).
244 </para></formalpara></listitem>
245 <listitem><formalpara><title>distance</title><para>
246 An integer specifying the difference between the locations
247 of the operands: e.g. two adjacent words would have
248 distance=1 since their locations differ by one unit.
249 </para></formalpara></listitem>
250 <listitem><formalpara><title>ordered</title><para>
251 1 = ordered (the operands must occur in the order the
252 query specifies them) or
253 0 = unordered (they may appear in either order).
254 </para></formalpara></listitem>
255 <listitem><formalpara><title>relation</title><para>
256 Recognised values are
260 4 (greaterThanOrEqual),
263 </para></formalpara></listitem>
264 <listitem><formalpara><title>which-code</title><para>
265 <literal>known</literal>
(the unit-code parameter is taken from the well-known list
of alternatives described below) or
270 <literal>private</literal>
(the unit-code parameter has semantics specific to an
274 out-of-band agreement such as a profile).
275 </para></formalpara></listitem>
276 <listitem><formalpara><title>unit-code</title><para>
277 If the which-code parameter is <literal>known</literal>
278 then the recognised values are
290 If which-code is <literal>private</literal> then the
291 acceptable values are determined by the profile.
292 </para></formalpara></listitem>
294 (The numeric values of the relation and well-known unit-code
295 parameters are taken straight from
296 <ulink url="&url.z39.50.proximity.asn1;"
297 >the ASN.1</ulink> of the proximity structure in the standard.)
301 <sect3 id="pqf-examples"><title>PQF queries</title>
303 <example id="example.pqf.simple.terms">
304 <title>PQF queries using simple terms</title>
313 <example id="pqf.example.pqf.boolean.operators">
314 <title>PQF boolean operators</title>
317 @or "dylan" "zimmerman"
319 @and @or dylan zimmerman when
321 @and when @or dylan zimmerman
325 <example id="example.pqf.result.sets">
326 <title>PQF references to result sets</title>
331 @and @set seta @set setb
335 <example id="example.pqf.attributes">
336 <title>Attributes for terms</title>
341 @attr 1=4 @attr 4=1 "self portrait"
343 @attrset exp1 @attr 1=1 CategoryList
345 @attr gils 1=2008 Copenhagen
347 @attr 1=/book/title computer
351 <example id="example.pqf.proximity">
352 <title>PQF Proximity queries</title>
355 @prox 0 3 1 2 k 2 dylan zimmerman
Here the parameters 0, 3, 1, 2, k and 2 represent exclusion,
distance, ordered, relation, which-code and unit-code, in that order:
363 exclusion = 0: the proximity condition must hold
366 distance = 3: the terms must be three units apart
369 ordered = 1: they must occur in the order they are specified
372 relation = 2: lessThanOrEqual (to the distance of 3 units)
375 which-code is ``known'', so the standard unit-codes are used
381 So the whole proximity query means that the words
382 <literal>dylan</literal> and <literal>zimmerman</literal> must
383 both occur in the record, in that order, differing in position
384 by three or fewer words (i.e. with two or fewer words between
385 them.) The query would find ``Bob Dylan, aka. Robert
386 Zimmerman'', but not ``Bob Dylan, born as Robert Zimmerman''
387 since the distance in this case is four.
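The reasoning in this example can be written down as a small predicate over word positions. The sketch below is only one reading of the semantics described above, not code from &yaz; or from any server: <literal>prox_match</literal> is a hypothetical function, positions are counted in whatever unit the unit-code selects, and the relation numbering (1 to 6 for less-than, less-than-or-equal, equal, greater-than-or-equal, greater-than and not-equal) is assumed to follow the ASN.1 of the standard.

```c
#include <stdlib.h>

/* Relation codes, assumed to match the ASN.1 of the proximity structure. */
enum { REL_LT = 1, REL_LE, REL_EQ, REL_GE, REL_GT, REL_NE };

/* Returns 1 if operands at positions pos_a and pos_b satisfy the
 * proximity test, 0 if they do not, -1 for an unknown relation code. */
int prox_match(long pos_a, long pos_b, int exclusion, long distance,
               int ordered, int relation)
{
    long diff = pos_b - pos_a;
    int ok;

    if (relation < REL_LT || relation > REL_NE)
        return -1;
    if (ordered && diff <= 0) {
        ok = 0;                  /* right operand does not follow left */
    } else {
        diff = labs(diff);       /* unordered: only the gap matters */
        switch (relation) {
        case REL_LT: ok = diff <  distance; break;
        case REL_LE: ok = diff <= distance; break;
        case REL_EQ: ok = diff == distance; break;
        case REL_GE: ok = diff >= distance; break;
        case REL_GT: ok = diff >  distance; break;
        default:     ok = diff != distance; break;
        }
    }
    return exclusion ? !ok : ok; /* exclusion inverts the condition */
}
```

With ``Bob Dylan, aka. Robert Zimmerman'' the two words sit at positions 2 and 5, so <literal>prox_match(2, 5, 0, 3, 1, 2)</literal> yields 1; with ``Bob Dylan, born as Robert Zimmerman'' they sit at positions 2 and 6 and the result is 0.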
391 <example id="example.pqf.search.term.type">
392 <title>PQF specification of search term type</title>
395 @term string "a UTF-8 string, maybe?"
399 <example id="example.pqf.mixed.queries">
400 <title>PQF mixed queries</title>
403 @or @and bob dylan @set Result-1
405 @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
407 @and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109
411 The last of these examples is a spatial search: in
412 <ulink url="http://www.gils.net/prof_v2.html#sec_7_4"
413 >the GILS attribute set</ulink>,
2038 indicates West Bounding Coordinate and
2039 indicates East Bounding Coordinate,
417 so the query is for areas extending from -114 degrees
418 to no more than -109 degrees.
425 <sect2 id="CCL"><title>CCL</title>
Not all users enjoy typing in prefix query structures and numerical
attribute values, even in a minimalistic test client. In the library
world, the more intuitive Common Command Language (CCL, ISO 8777)
has enjoyed some popularity, especially before the widespread
availability of graphical interfaces. It is still useful in
applications where, for one reason or another, you need to provide a
symbolic language for expressing boolean query structures.
437 <sect3 id="ccl.syntax">
438 <title>CCL Syntax</title>
The CCL parser obeys the following grammar for the FIND argument.
The syntax is annotated by comments on the lines prefixed by
<literal>--</literal>.
447 CCL-Find ::= CCL-Find Op Elements
450 Op ::= "and" | "or" | "not"
451 -- The above means that Elements are separated by boolean operators.
453 Elements ::= '(' CCL-Find ')'
456 | Qualifiers Relation Terms
457 | Qualifiers Relation '(' CCL-Find ')'
458 | Qualifiers '=' string '-' string
459 -- Elements is either a recursive definition, a result set reference, a
460 -- list of terms, qualifiers followed by terms, qualifiers followed
461 -- by a recursive definition or qualifiers in a range (lower - upper).
463 Set ::= 'set' = string
464 -- Reference to a result set
466 Terms ::= Terms Prox Term
468 -- Proximity of terms.
472 -- This basically means that a term may include a blank
474 Qualifiers ::= Qualifiers ',' string
476 -- Qualifiers is a list of strings separated by comma
Relation ::= '=' | '&gt;=' | '&lt;=' | '&lt;&gt;' | '&gt;' | '&lt;'
479 -- Relational operators. This really doesn't follow the ISO8777
483 -- Proximity operator
487 <example id="example.ccl.queries">
488 <title>CCL queries</title>
490 The following queries are all valid:
502 (dylan and bob) or set=1
506 Assuming that the qualifiers <literal>ti</literal>,
507 <literal>au</literal>
508 and <literal>date</literal> are defined we may use:
514 au=(bob dylan and slow train coming)
516 date>1980 and (ti=((self portrait)))
522 <sect3 id="ccl.qualifiers">
523 <title>CCL Qualifiers</title>
526 Qualifiers are used to direct the search to a particular searchable
527 index, such as title (ti) and author indexes (au). The CCL standard
528 itself doesn't specify a particular set of qualifiers, but it does
529 suggest a few short-hand notations. You can customize the CCL parser
530 to support a particular set of qualifiers to reflect the current target
531 profile. Traditionally, a qualifier would map to a particular
use-attribute within the BIB-1 attribute set. It is also
possible to set other attributes, such as the structure attribute.
538 A CCL profile is a set of predefined CCL qualifiers that may be
539 read from a file or set in the CCL API.
540 The YAZ client reads its CCL qualifiers from a file named
541 <filename>default.bib</filename>. There are four types of
542 lines in a CCL profile: qualifier specification,
543 qualifier alias, comments and directives.
545 <sect4 id="ccl.qualifier.specification">
546 <title>Qualifier specification</title>
548 A qualifier specification is of the form:
552 <replaceable>qualifier-name</replaceable>
553 [<replaceable>attributeset</replaceable><literal>,</literal>]<replaceable>type</replaceable><literal>=</literal><replaceable>val</replaceable>
554 [<replaceable>attributeset</replaceable><literal>,</literal>]<replaceable>type</replaceable><literal>=</literal><replaceable>val</replaceable> ...
558 where <replaceable>qualifier-name</replaceable> is the name of the
559 qualifier to be used (eg. <literal>ti</literal>),
560 <replaceable>type</replaceable> is attribute type in the attribute
561 set (Bib-1 is used if no attribute set is given) and
562 <replaceable>val</replaceable> is attribute value.
The <replaceable>type</replaceable> can be specified either as an
integer or as a single letter:
<literal>u</literal> for use,
<literal>r</literal> for relation, <literal>p</literal> for position,
<literal>s</literal> for structure, <literal>t</literal> for truncation
or <literal>c</literal> for completeness.
569 The attributes for the special qualifier name <literal>term</literal>
570 are used when no CCL qualifier is given in a query.
571 <table id="ccl.common.bib1.attributes">
572 <title>Common Bib-1 attributes</title>
574 <colspec colwidth="2*" colname="type"></colspec>
575 <colspec colwidth="9*" colname="description"></colspec>
579 <entry>Description</entry>
584 <entry><literal>u=</literal><replaceable>value</replaceable></entry>
Use attribute (1). Common use attributes are
1 Personal-name, 4 Title, 7 ISBN, 8 ISSN, 30 Date,
62 Abstract, 1003 Author, 1016 Any. Specify the value
as an integer.
594 <entry><literal>r=</literal><replaceable>value</replaceable></entry>
596 Relation attribute (2). Common values are
1 &lt;, 2 &lt;=, 3 =, 4 &gt;=, 5 &gt;, 6 &lt;&gt;,
598 100 phonetic, 101 stem, 102 relevance, 103 always matches.
603 <entry><literal>p=</literal><replaceable>value</replaceable></entry>
605 Position attribute (3). Values: 1 first in field, 2
606 first in any subfield, 3 any position in field.
611 <entry><literal>s=</literal><replaceable>value</replaceable></entry>
613 Structure attribute (4). Values: 1 phrase, 2 word,
614 3 key, 4 year, 5 date, 6 word list, 100 date (un),
615 101 name (norm), 102 name (un), 103 structure, 104 urx,
616 105 free-form-text, 106 document-text, 107 local-number,
617 108 string, 109 numeric string.
622 <entry><literal>t=</literal><replaceable>value</replaceable></entry>
624 Truncation attribute (5). Values: 1 right, 2 left,
3 left &amp; right, 100 none, 101 process #, 102 regular-1,
626 103 regular-2, 104 CCL.
631 <entry><literal>c=</literal><replaceable>value</replaceable></entry>
633 Completeness attribute (6). Values: 1 incomplete subfield,
634 2 complete subfield, 3 complete field.
643 Refer to <xref linkend="bib1"/> or the complete
644 <ulink url="&url.z39.50.attset.bib1;">list of Bib-1 attributes</ulink>
647 It is also possible to specify non-numeric attribute values,
648 which are used in combination with certain types.
649 The special combinations are:
651 <table id="ccl.special.attribute.combos">
652 <title>Special attribute combos</title>
654 <colspec colwidth="2*" colname="name"></colspec>
655 <colspec colwidth="9*" colname="description"></colspec>
659 <entry>Description</entry>
664 <entry><literal>s=pw</literal></entry><entry>
665 The structure is set to either word or phrase depending
666 on the number of tokens in a term (phrase-word).
670 <entry><literal>s=al</literal></entry><entry>
Each token in the term is ANDed (and-list).
672 This does not set the structure at all.
676 <row><entry><literal>s=ol</literal></entry><entry>
Each token in the term is ORed (or-list).
678 This does not set the structure at all.
682 <row><entry><literal>r=o</literal></entry><entry>
Allows ranges and the operators greater-than, less-than, ...
685 This sets Bib-1 relation attribute accordingly (relation
ordered). A query construct is only treated as a range if
a dash is used and it is surrounded by white space. So
<literal>-1980</literal> is treated as the term
<literal>"-1980"</literal>, not <literal>&lt;= 1980</literal>.
If <literal>- 1980</literal> is used, however, that is
treated as a range.
695 <row><entry><literal>r=r</literal></entry><entry>
696 Similar to <literal>r=o</literal> but assumes that terms
697 are non-negative (not prefixed with <literal>-</literal>).
698 Thus, a dash will always be treated as a range.
699 The construct <literal>1980-1990</literal> is
700 treated as a range with <literal>r=r</literal> but as a
701 single term <literal>"1980-1990"</literal> with
702 <literal>r=o</literal>. The special attribute
703 <literal>r=r</literal> is available in YAZ 2.0.24 or later.
707 <row><entry><literal>t=l</literal></entry><entry>
708 Allows term to be left-truncated.
709 If term is of the form <literal>?x</literal>, the resulting
710 Type-1 term is <literal>x</literal> and truncation is left.
714 <row><entry><literal>t=r</literal></entry><entry>
715 Allows term to be right-truncated.
716 If term is of the form <literal>x?</literal>, the resulting
717 Type-1 term is <literal>x</literal> and truncation is right.
721 <row><entry><literal>t=n</literal></entry><entry>
If the term does not include <literal>?</literal>, the
truncation attribute is set to none (100).
727 <row><entry><literal>t=b</literal></entry><entry>
Allows the term to be both left and right truncated.
If the term is of the form <literal>?x?</literal>, the
resulting term is <literal>x</literal> and truncation is
set to both left and right.
738 <example id="example.ccl.profile"><title>CCL profile</title>
740 Consider the following definition:
751 <literal>ti</literal> and <literal>au</literal> both set
752 structure attribute to phrase (s=1).
753 <literal>ti</literal>
754 sets the use-attribute to 4. <literal>au</literal> sets the
756 When no qualifiers are used in the query the structure-attribute is
757 set to free-form-text (105) (rule for <literal>term</literal>).
758 The <literal>date</literal> sets the relation attribute to
759 the relation used in the CCL query and sets the use attribute
You can combine attributes. To search for "ranked title" you can use
766 ti,ranked=knuth computer
768 which will set relation=ranked, use=title, structure=phrase.
775 is a valid query. But
783 <sect4 id="ccl.qualifier.alias">
784 <title>Qualifier alias</title>
786 A qualifier alias is of the form:
789 <replaceable>q</replaceable>
790 <replaceable>q1</replaceable> <replaceable>q2</replaceable> ..
793 which declares <replaceable>q</replaceable> to
794 be an alias for <replaceable>q1</replaceable>,
795 <replaceable>q2</replaceable>... such that the CCL
796 query <replaceable>q=x</replaceable> is equivalent to
797 <replaceable>q1=x or q2=x or ...</replaceable>.
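The equivalence stated above can be illustrated by expanding an alias into the OR of its targets at the string level. This is only a sketch of the rewriting rule; the CCL parser works on its own internal structures, and <literal>expand_alias</literal> is a hypothetical function.

```c
#include <stdio.h>

/* Write "q1=term or q2=term or ..." into out (capacity max bytes) for
 * the count qualifier names in targets. Returns 0 on success, -1 if
 * the result does not fit. */
int expand_alias(const char *const *targets, size_t count,
                 const char *term, char *out, size_t max)
{
    size_t i, used = 0;

    if (max == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < count; i++) {
        int n = snprintf(out + used, max - used, "%s%s=%s",
                         i ? " or " : "", targets[i], term);
        if (n < 0 || (size_t) n >= max - used)
            return -1;           /* output truncated */
        used += (size_t) n;
    }
    return 0;
}
```

For an alias declared over <literal>ti</literal> and <literal>au</literal>, expanding the term <literal>knuth</literal> gives <literal>ti=knuth or au=knuth</literal>.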
801 <sect4 id="ccl.comments">
802 <title>Comments</title>
Lines consisting only of white space and lines that begin with the
character <literal>#</literal> are treated as comments.
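A reader of CCL profiles has to tell the four line types apart. The sketch below shows one possible way to do so and is not the logic used by <function>ccl_qual_file</function>: it assumes that a directive starts with <literal>@</literal>, that a qualifier specification contains at least one <literal>=</literal>, that anything else is an alias, and that leading white space before <literal>#</literal> or <literal>@</literal> is tolerated. All names are hypothetical.

```c
#include <ctype.h>
#include <string.h>

enum demo_line_kind {
    DEMO_COMMENT, DEMO_DIRECTIVE, DEMO_SPEC, DEMO_ALIAS
};

/* Classify one line of a CCL profile (without its trailing newline). */
enum demo_line_kind demo_classify_line(const char *line)
{
    const char *p = line;

    while (isspace((unsigned char) *p))
        p++;
    if (*p == '\0' || *p == '#')
        return DEMO_COMMENT;     /* blank line or # comment */
    if (*p == '@')
        return DEMO_DIRECTIVE;   /* @directive value */
    /* type=val pairs mark a specification; bare names mark an alias */
    return strchr(p, '=') ? DEMO_SPEC : DEMO_ALIAS;
}
```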
809 <sect4 id="ccl.directives">
810 <title>Directives</title>
Directive specifications take the form
814 <para><literal>@</literal><replaceable>directive</replaceable> <replaceable>value</replaceable>
816 <table id="ccl.directives.table">
817 <title>CCL directives</title>
819 <colspec colwidth="2*" colname="name"></colspec>
820 <colspec colwidth="8*" colname="description"></colspec>
821 <colspec colwidth="1*" colname="default"></colspec>
825 <entry>Description</entry>
826 <entry>Default</entry>
831 <entry>truncation</entry>
832 <entry>Truncation character</entry>
833 <entry><literal>?</literal></entry>
837 <entry>Specifies how multiple fields are to be
838 combined. There are two modes: <literal>or</literal>:
839 multiple qualifier fields are ORed,
840 <literal>merge</literal>: attributes for the qualifier
841 fields are merged and assigned to one term.
843 <entry><literal>merge</literal></entry>
<entry>Specifies whether CCL operators and qualifiers should be
compared with case sensitivity or not. Specify 0 for
case sensitive; 1 for case insensitive.</entry>
850 <entry><literal>0</literal></entry>
855 <entry>Specifies token for CCL operator AND.</entry>
856 <entry><literal>and</literal></entry>
861 <entry>Specifies token for CCL operator OR.</entry>
862 <entry><literal>or</literal></entry>
867 <entry>Specifies token for CCL operator NOT.</entry>
868 <entry><literal>not</literal></entry>
873 <entry>Specifies token for CCL operator SET.</entry>
874 <entry><literal>set</literal></entry>
882 <title>CCL API</title>
884 All public definitions can be found in the header file
885 <filename>ccl.h</filename>. A profile identifier is of type
<literal>CCL_bibset</literal>. A profile must be created with a call
887 to the function <function>ccl_qual_mk</function> which returns a profile
888 handle of type <literal>CCL_bibset</literal>.
892 To read a file containing qualifier definitions the function
893 <function>ccl_qual_file</function> may be convenient. This function
894 takes an already opened <literal>FILE</literal> handle pointer as
895 argument along with a <literal>CCL_bibset</literal> handle.
899 To parse a simple string with a FIND query use the function
902 struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str,
903 int *error, int *pos);
which takes the CCL profile (<literal>bibset</literal>) and query
(<literal>str</literal>) as input. Upon successful completion the RPN
tree is returned. If an error occurs, such as a syntax error, the integer
pointed to by <literal>error</literal> holds the error code and
<literal>pos</literal> holds the offset inside the query string at which
the parsing failed.
915 An English representation of the error may be obtained by calling
916 the <literal>ccl_err_msg</literal> function. The error codes are
917 listed in <filename>ccl.h</filename>.
921 To convert the CCL RPN tree (type
922 <literal>struct ccl_rpn_node *</literal>)
923 to the Z_RPNQuery of YAZ the function <function>ccl_rpn_query</function>
must be used. This function, which is part of YAZ, is implemented in
<filename>yaz-ccl.c</filename>.
After calling this function the CCL RPN tree is probably no longer
needed. The function <literal>ccl_rpn_delete</literal> destroys the CCL RPN tree.
931 A CCL profile may be destroyed by calling the
932 <function>ccl_qual_rm</function> function.
936 The token names for the CCL operators may be changed by setting the
937 globals (all type <literal>char *</literal>)
938 <literal>ccl_token_and</literal>, <literal>ccl_token_or</literal>,
939 <literal>ccl_token_not</literal> and <literal>ccl_token_set</literal>.
940 An operator may have aliases, i.e. there may be more than one name for
941 the operator. To do this, separate each alias with a space character.
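Matching a word from the query against such a space-separated alias list can be sketched as follows. The function <literal>token_matches</literal> is hypothetical and merely illustrates the convention; the alias <literal>und</literal> in the usage note is an invented example.

```c
#include <ctype.h>
#include <string.h>

/* Return 1 if word equals one of the space-separated names in aliases,
 * 0 otherwise. Comparison ignores case when case_insensitive is set. */
int token_matches(const char *aliases, const char *word,
                  int case_insensitive)
{
    size_t wlen = strlen(word);
    const char *p = aliases;

    if (wlen == 0)
        return 0;
    while (*p) {
        size_t len = strcspn(p, " ");    /* length of this alias */
        if (len == wlen) {
            size_t i = 0;
            while (i < len) {
                int a = (unsigned char) p[i];
                int b = (unsigned char) word[i];
                if (case_insensitive) {
                    a = tolower(a);
                    b = tolower(b);
                }
                if (a != b)
                    break;
                i++;
            }
            if (i == len)
                return 1;
        }
        p += len;
        while (*p == ' ')
            p++;                         /* skip to the next alias */
    }
    return 0;
}
```

With an alias string of <literal>"and und"</literal>, both <literal>and</literal> and <literal>und</literal> would be accepted as the AND operator.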
945 <sect2 id="cql"><title>CQL</title>
947 <ulink url="&url.cql;">CQL</ulink>
(Common Query Language) was defined for the
<ulink url="&url.sru;">SRU</ulink> protocol.
In many ways CQL has a similar syntax to CCL,
but its objective is different. Where CCL aims to be
952 an end-user language, CQL is <emphasis>the</emphasis> protocol
953 query language for SRU.
957 If you are new to CQL, read the
958 <ulink url="&url.cql.intro;">Gentle Introduction</ulink>.
962 The CQL parser in &yaz; provides the following:
966 It parses and validates a CQL query.
971 It generates a C structure that allows you to convert
972 a CQL query to some other query language, such as SQL.
The parser converts a valid CQL query to PQF, thus providing a
way to use CQL for both SRU servers and Z39.50 targets at the
same time.
984 The parser converts CQL to
985 <ulink url="&url.xcql;">XCQL</ulink>.
986 XCQL is an XML representation of CQL.
987 XCQL is part of the SRU specification. However, since SRU
988 supports CQL only, we don't expect XCQL to be widely used.
989 Furthermore, CQL has the advantage over XCQL that it is
995 <sect3 id="cql.parsing"><title>CQL parsing</title>
997 A CQL parser is represented by the <literal>CQL_parser</literal>
998 handle. Its contents should be considered &yaz; internal (private).
#include &lt;yaz/cql.h&gt;
1002 typedef struct cql_parser *CQL_parser;
1004 CQL_parser cql_parser_create(void);
1005 void cql_parser_destroy(CQL_parser cp);
1007 A parser is created by <function>cql_parser_create</function> and
1008 is destroyed by <function>cql_parser_destroy</function>.
To parse a CQL query string, the following function is provided:
1014 int cql_parser_string(CQL_parser cp, const char *str);
A CQL query is parsed by the function <function>cql_parser_string</function>,
which takes a query <parameter>str</parameter>.
1018 If the query was valid (no syntax errors), then zero is returned;
1019 otherwise -1 is returned to indicate a syntax error.
int cql_parser_stream(CQL_parser cp,
int (*getbyte)(void *client_data),
void (*ungetbyte)(int b, void *client_data),
void *client_data);
1028 int cql_parser_stdio(CQL_parser cp, FILE *f);
The functions <function>cql_parser_stream</function> and
<function>cql_parser_stdio</function> parse a CQL query,
just like <function>cql_parser_string</function>.
1033 The only difference is that the CQL query can be
1034 fed to the parser in different ways.
1035 The <function>cql_parser_stream</function> uses a generic
1036 byte stream as input. The <function>cql_parser_stdio</function>
1037 uses a <literal>FILE</literal> handle which is opened for reading.
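A generic byte stream for <function>cql_parser_stream</function> is a pair of callbacks sharing one <literal>client_data</literal> pointer. The sketch below feeds bytes from an in-memory string. The names <literal>struct mem_stream</literal>, <literal>mem_getbyte</literal> and <literal>mem_ungetbyte</literal> are hypothetical, and the convention that end of input is signalled by returning 0 is an assumption, since the text above does not specify it.

```c
#include <stddef.h>

struct mem_stream {
    const char *buf;   /* NUL-terminated query text */
    size_t off;        /* index of the next byte to hand out */
};

/* Matches the getbyte callback type: return the next byte, or 0 at
 * the end of the input (assumed end-of-input convention). */
int mem_getbyte(void *client_data)
{
    struct mem_stream *ms = client_data;

    if (ms->buf[ms->off] == '\0')
        return 0;
    return (unsigned char) ms->buf[ms->off++];
}

/* Matches the ungetbyte callback type: push the last byte back so the
 * next mem_getbyte call returns it again. */
void mem_ungetbyte(int b, void *client_data)
{
    struct mem_stream *ms = client_data;

    if (b != 0 && ms->off > 0)
        ms->off--;
}
```

An application would initialise a <literal>struct mem_stream</literal> with the query and an offset of zero, then pass both callbacks and the address of the structure to <function>cql_parser_stream</function>.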
1041 <sect3 id="cql.tree"><title>CQL tree</title>
If the query string is valid, the CQL parser
generates a tree representing the structure of the
CQL query.
<function>cql_parser_result</function> returns
a pointer to the root node of the resulting tree.
1055 Each node in a CQL tree is represented by a
1056 <literal>struct cql_node</literal>.
1057 It is defined as follows:
1059 #define CQL_NODE_ST 1
1060 #define CQL_NODE_BOOL 2
1070 struct cql_node *modifiers;
1074 struct cql_node *left;
1075 struct cql_node *right;
1076 struct cql_node *modifiers;
1081 There are two node types: search term (ST) and boolean (BOOL).
1082 A modifier is treated as a search term too.
The search term node has six members:
1089 <literal>index</literal>: index for search term.
1090 If an index is unspecified for a search term,
1091 <literal>index</literal> will be NULL.
<literal>index_uri</literal>: index URI for search term
1097 or NULL if none could be resolved for the index.
1102 <literal>term</literal>: the search term itself.
1107 <literal>relation</literal>: relation for search term.
1112 <literal>relation_uri</literal>: relation URI for search term.
<literal>modifiers</literal>: relation modifiers for search
term. The <literal>modifiers</literal> list itself consists of cql_nodes,
each of type <literal>ST</literal>.
The boolean node represents <literal>and</literal>,
<literal>or</literal> and <literal>not</literal>, as well as
proximity.
<literal>left</literal> and <literal>right</literal>: the left
and right operands, respectively.
1138 <literal>modifiers</literal>: proximity arguments.
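Code that consumes a CQL tree is typically a recursive walk that branches on the node type. The sketch below uses a simplified stand-in structure, <literal>struct demo_node</literal>, whose member names follow the descriptions above but which is not the real <literal>struct cql_node</literal> layout; consult <filename>cql.h</filename> for the actual definition.

```c
#include <stddef.h>

enum { DEMO_NODE_ST = 1, DEMO_NODE_BOOL = 2 };

/* Simplified stand-in for a CQL tree node (hypothetical layout). */
struct demo_node {
    int which;                        /* DEMO_NODE_ST or DEMO_NODE_BOOL */
    const char *index;                /* ST: NULL if no index was given */
    const char *relation;             /* ST */
    const char *term;                 /* ST */
    const char *op;                   /* BOOL: "and", "or", "not", ... */
    struct demo_node *left, *right;   /* BOOL operands */
};

/* Count the search terms in the tree. */
int demo_count_terms(const struct demo_node *n)
{
    if (n == NULL)
        return 0;
    if (n->which == DEMO_NODE_ST)
        return 1;
    return demo_count_terms(n->left) + demo_count_terms(n->right);
}

/* Count the search terms that carry no explicit index. */
int demo_count_unindexed(const struct demo_node *n)
{
    if (n == NULL)
        return 0;
    if (n->which == DEMO_NODE_ST)
        return n->index == NULL;
    return demo_count_unindexed(n->left) + demo_count_unindexed(n->right);
}
```

The same pattern (handle the search term case, otherwise recurse into the left and right operands) applies when converting the tree to another query language.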
1145 <sect3 id="cql.to.pqf"><title>CQL to PQF conversion</title>
Conversion to PQF (and Z39.50 RPN) is complicated by the fact
that the resulting RPN depends on the Z39.50 target
capabilities (combinations of supported attributes).
In addition, CQL and SRU operate on index prefixes
(URIs or strings), whereas RPN uses Object Identifiers
for attribute sets.
1155 The CQL library of &yaz; defines a <literal>cql_transform_t</literal>
1156 type. It represents a particular mapping between CQL and RPN.
1157 This handle is created and destroyed by the functions:
1159 cql_transform_t cql_transform_open_FILE (FILE *f);
1160 cql_transform_t cql_transform_open_fname(const char *fname);
1161 void cql_transform_close(cql_transform_t ct);
The first two functions create a transformation handle from
1164 either an already open FILE or from a filename respectively.
The handle is destroyed by <function>cql_transform_close</function>,
after which the handle must no longer be referenced.
1171 When a <literal>cql_transform_t</literal> handle has been created
1172 you can convert to RPN.
1174 int cql_transform_buf(cql_transform_t ct,
1175 struct cql_node *cn, char *out, int max);
1177 This function converts the CQL tree <literal>cn</literal>
1178 using handle <literal>ct</literal>.
1179 For the resulting PQF, you supply a buffer <literal>out</literal>
1180 which must be able to hold at least <literal>max</literal>
1184 If conversion failed, <function>cql_transform_buf</function>
1185 returns a non-zero SRU error code; otherwise zero is returned
1186 (conversion successful). The meanings of the numeric error
1187 codes are listed in the SRU specifications at
1188 <ulink url="&url.sru.diagnostics.list;"/>
1191 If conversion fails, more information can be obtained by calling
1193 int cql_transform_error(cql_transform_t ct, char **addinfop);
1195 This function returns the most recently returned numeric
1196 error-code and sets the string-pointer at
1197 <literal>*addinfop</literal> to point to a string containing
1198 additional information about the error that occurred: for
1199 example, if the error code is 15 (``Illegal or unsupported context
1200 set''), the additional information is the name of the requested
1201 context set that was not recognised.
1204 The SRU error-codes may be translated into brief human-readable
1205 error messages using
1207 const char *cql_strerror(int code);
1211 If you wish to be able to produce a PQF result in a different
1212 way, there are two alternatives.
1214 void cql_transform_pr(cql_transform_t ct,
1215 struct cql_node *cn,
1216 void (*pr)(const char *buf, void *client_data),
1219 int cql_transform_FILE(cql_transform_t ct,
1220 struct cql_node *cn, FILE *f);
1222 The former function produces output to a user-defined
1223 output stream. The latter writes the result to an already
1224 open <literal>FILE</literal>.
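As a rough sketch of the callback style used by <function>cql_transform_pr</function>, the following self-contained C fragment shows a callback with the same signature as <literal>pr</literal> that collects the pieces it receives into a fixed buffer passed through <literal>client_data</literal>. The names <literal>collect_buf</literal>, <literal>collect_pr</literal> and <literal>emit_demo</literal> are hypothetical and not part of &yaz;; <literal>emit_demo</literal> merely stands in for the library call so that the fragment compiles without &yaz;.

```c
#include <string.h>

/* Hypothetical collector passed through client_data. */
struct collect_buf {
    char data[256];   /* collected output, always NUL-terminated */
    size_t used;      /* number of characters stored so far */
    int overflow;     /* set to 1 once a piece did not fit */
};

/* Same signature as the pr argument:
   void (*pr)(const char *buf, void *client_data) */
void collect_pr(const char *buf, void *client_data)
{
    struct collect_buf *c = client_data;
    size_t len = strlen(buf);

    if (c->overflow || c->used + len >= sizeof c->data)
    {
        c->overflow = 1;   /* drop this and all later pieces */
        return;
    }
    memcpy(c->data + c->used, buf, len + 1);   /* copies the NUL as well */
    c->used += len;
}

/* Hypothetical stand-in for the library: emits a PQF string in pieces. */
void emit_demo(void (*pr)(const char *buf, void *client_data),
               void *client_data)
{
    pr("@attr 1=4 ", client_data);
    pr("@attr 2=3 ", client_data);
    pr("\"x\"", client_data);
}
```

In a real application the address of the collector would be passed as the <literal>client_data</literal> argument, and the <literal>overflow</literal> flag checked after the call.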
1227 <sect3 id="cql.to.rpn">
1228 <title>Specification of CQL to RPN mappings</title>
1230 The file supplied to functions
1231 <function>cql_transform_open_FILE</function> and
1232 <function>cql_transform_open_fname</function> follows
1233 a structure found in many Unix utilities.
1234 It consists of mapping specifications - one per line.
1235 Lines starting with <literal>#</literal> are ignored (comments).
1238 Each line is of the form
1240 <replaceable>CQL pattern</replaceable><literal> = </literal> <replaceable> RPN equivalent</replaceable>
1244 An RPN pattern is a simple attribute list. Each attribute pair
1247 [<replaceable>set</replaceable>] <replaceable>type</replaceable><literal>=</literal><replaceable>value</replaceable>
1249 The attribute <replaceable>set</replaceable> is optional.
1250 The <replaceable>type</replaceable> is the attribute type,
1251 <replaceable>value</replaceable> the attribute value.
1254 The character <literal>*</literal> (asterisk) has special meaning
1255 when used in the RPN pattern.
1256 Each occurrence of <literal>*</literal> is substituted with the
1257 CQL matching name (index, relation, qualifier etc).
1258 This facility can be used to copy a CQL name verbatim to the RPN result.
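The line format and the asterisk substitution described above can be sketched in plain C. This is only an illustration under stated assumptions, not the parser used by &yaz;: the helper names <literal>split_mapping_line</literal> and <literal>expand_star</literal> are hypothetical, and the sketch assumes that a line is split at its first <literal>=</literal> and that surrounding whitespace is insignificant.

```c
#include <ctype.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    char *end;
    while (isspace((unsigned char) *s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char) end[-1]))
        end--;
    *end = '\0';
    return s;
}

/* Hypothetical helper: split "CQL pattern = RPN equivalent" at the first '='.
   Returns 1 on success, 0 for comments, blank lines and lines without '='.
   The line buffer is modified; *pattern and *rpn point into it. */
int split_mapping_line(char *line, char **pattern, char **rpn)
{
    char *eq;
    line = trim(line);
    if (*line == '\0' || *line == '#')
        return 0;
    eq = strchr(line, '=');
    if (!eq)
        return 0;
    *eq = '\0';
    *pattern = trim(line);
    *rpn = trim(eq + 1);
    return 1;
}

/* Hypothetical helper: copy rpn to out, replacing each '*' with name.
   Returns 0 on success, -1 if out (of size max) is too small. */
int expand_star(const char *rpn, const char *name, char *out, size_t max)
{
    size_t used = 0, nlen = strlen(name);
    if (max == 0)
        return -1;
    for (; *rpn; rpn++)
    {
        if (*rpn == '*')
        {
            if (used + nlen >= max)
                return -1;
            memcpy(out + used, name, nlen);
            used += nlen;
        }
        else
        {
            if (used + 1 >= max)
                return -1;
            out[used++] = *rpn;
        }
    }
    out[used] = '\0';
    return 0;
}
```

Splitting at the first <literal>=</literal> keeps attribute pairs such as <literal>1=4</literal> intact on the right-hand side.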
1261 The following CQL patterns are recognized:
1263 <varlistentry><term>
1264 <literal>index.</literal><replaceable>set</replaceable><literal>.</literal><replaceable>name</replaceable>
1268 This pattern is invoked when a CQL index, such as
1269 dc.title is converted. <replaceable>set</replaceable>
1270 and <replaceable>name</replaceable> are the context set and index
1272 Typically, the RPN specifies an equivalent use attribute.
1275 For terms not bound by an index the pattern
1276 <literal>index.cql.serverChoice</literal> is used.
1277 Here, the prefix <literal>cql</literal> is defined as
1278 <literal>http://www.loc.gov/zing/cql/cql-indexes/v1.0/</literal>.
1279 If this pattern is not defined, the mapping will fail.
1283 <literal>index.</literal><replaceable>set</replaceable><literal>.*</literal>
1284 is used when no other index pattern is matched.
1288 <varlistentry><term>
1289 <literal>qualifier.</literal><replaceable>set</replaceable><literal>.</literal><replaceable>name</replaceable>
1294 For backwards compatibility, this is recognised as a synonym of
1295 <literal>index.</literal><replaceable>set</replaceable><literal>.</literal><replaceable>name</replaceable>
1299 <varlistentry><term>
1300 <literal>relation.</literal><replaceable>relation</replaceable>
1304 This pattern specifies how a CQL relation is mapped to RPN.
1305 <replaceable>relation</replaceable> is the name of the relation
1306 operator. Since <literal>=</literal> is used as the
1307 separator between CQL pattern and RPN, CQL relations
1308 including <literal>=</literal> cannot be
1309 used directly. To avoid a conflict, the names
1310 <literal>ge</literal>,
1311 <literal>eq</literal> and
1312 <literal>le</literal>
1313 must be used for the CQL operators greater-than-or-equal,
1314 equal and less-than-or-equal, respectively.
1315 The RPN pattern is supposed to include a relation attribute.
1318 For terms not bound by a relation, the pattern
1319 <literal>relation.scr</literal> is used. If the pattern
1320 is not defined, the mapping will fail.
1323 The special pattern <literal>relation.*</literal> is used
1324 when no other relation pattern is matched.
1329 <varlistentry><term>
1330 <literal>relationModifier.</literal><replaceable>mod</replaceable>
1334 This pattern specifies how a CQL relation modifier is mapped to RPN.
1335 The RPN pattern is usually a relation attribute.
1340 <varlistentry><term>
1341 <literal>structure.</literal><replaceable>type</replaceable>
1345 This pattern specifies how a CQL structure is mapped to RPN.
1346 Note that this CQL pattern is somewhat similar to
1347 the CQL pattern <literal>relation</literal>.
1348 The <replaceable>type</replaceable> is a CQL relation.
1351 The pattern <literal>structure.*</literal> is used
1352 when no other structure pattern is matched.
1353 Usually, the RPN equivalent specifies a structure attribute.
1358 <varlistentry><term>
1359 <literal>position.</literal><replaceable>type</replaceable>
1363 This pattern specifies how the anchor (position) of
1364 CQL is mapped to RPN.
1365 The <replaceable>type</replaceable> is one
1366 of <literal>first</literal>, <literal>any</literal>,
1367 <literal>last</literal>, <literal>firstAndLast</literal>.
1370 The pattern <literal>position.*</literal> is used
1371 when no other position pattern is matched.
1376 <varlistentry><term>
1377 <literal>set.</literal><replaceable>prefix</replaceable>
1381 This specification defines a CQL context set for a given prefix.
1382 The value on the right hand side is the URI for the set -
1383 <emphasis>not</emphasis> RPN. All prefixes used in
1384 index patterns must be defined this way.
1389 <varlistentry><term>
1390 <literal>set</literal>
1394 This specification defines a default CQL context set for index names.
1395 The value on the right hand side is the URI for the set.
1402 <example id="example.cql.to.rpn.mapping">
1403 <title>CQL to RPN mapping file</title>
1405 This simple file defines two context sets, three indexes and three
1406 relations, a position pattern and a default structure.
1408 <programlisting><![CDATA[
1409 set.cql = http://www.loc.gov/zing/cql/context-sets/cql/v1.1/
1410 set.dc = http://www.loc.gov/zing/cql/dc-indexes/v1.0/
1412 index.cql.serverChoice = 1=1016
1413 index.dc.title = 1=4
1414 index.dc.subject = 1=21
1420 position.any = 3=3 6=1
1426 With the mappings above, the CQL query
1430 is converted to the PQF:
1432 @attr 1=1016 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "computer"
1434 by rules <literal>index.cql.serverChoice</literal>,
1435 <literal>relation.scr</literal>, <literal>structure.*</literal>,
1436 <literal>position.any</literal>.
1443 is rejected, since <literal>position.right</literal> is
1449 >my = "http://www.loc.gov/zing/cql/dc-indexes/v1.0/" my.title = x
1453 @attr 1=4 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "x"
1457 <example id="example.cql.to.rpn.string">
1458 <title>CQL to RPN string attributes</title>
1460 In this example we allow any index to be passed to RPN as
1463 <programlisting><![CDATA[
1464 # Identifiers for prefixes used in this file. (index.*)
1465 set.cql = info:srw/cql-context-set/1/cql-v1.1
1466 set.rpn = http://bogus/rpn
1467 set = http://bogus/rpn
1469 # The default index when none is specified by the query
1470 index.cql.serverChoice = 1=any
1479 The <literal>http://bogus/rpn</literal> context set is also the default
1480 so we can make queries such as
1484 which is converted to
1486 @attr 2=3 @attr 4=1 @attr 3=3 @attr 1=title "a"
1490 <example id="example.cql.to.rpn.bathprofile">
1491 <title>CQL to RPN using Bath Profile</title>
1493 The file <filename>etc/pqf.properties</filename> has mappings from
1494 the Bath Profile and Dublin Core to RPN.
1495 If YAZ is installed as a package it's usually located
1496 in <filename>/usr/share/yaz/etc</filename> and part of the
1497 development package, such as <literal>libyaz-dev</literal>.
1501 <sect3 id="cql.xcql"><title>CQL to XCQL conversion</title>
1503 Conversion from CQL to XCQL is trivial and does not
1504 require a mapping to be defined.
1505 There are three functions to choose from depending on the
1506 way you wish to store the resulting output (XML buffer
1509 int cql_to_xml_buf(struct cql_node *cn, char *out, int max);
1510 void cql_to_xml(struct cql_node *cn,
1511 void (*pr)(const char *buf, void *client_data),
1513 void cql_to_xml_stdio(struct cql_node *cn, FILE *f);
1515 Function <function>cql_to_xml_buf</function> converts
1516 to XCQL and stores the result in a user-supplied buffer of a given
1520 <function>cql_to_xml</function> writes the result in
1521 a user-defined output stream.
1522 <function>cql_to_xml_stdio</function> writes to a
1528 <sect1 id="tools.oid"><title>Object Identifiers</title>
1531 The basic YAZ representation of an OID is an array of integers,
1532 terminated with the value -1. Each integer is of type
1533 <literal>Odr_oid</literal>.
1536 Fundamental OID operations and the type <literal>Odr_oid</literal>
1537 are defined in <filename>yaz/oid_util.h</filename>.
1540 An OID can either be declared as an automatic variable or it can
1541 be allocated using the memory utilities or ODR/NMEM. It's
1542 guaranteed that an OID can fit in <literal>OID_SIZE</literal> integers.
1544 <example id="tools.oid.bib1.1"><title>Create OID on stack</title>
1546 We can create an OID for the Bib-1 attribute set with:
1548 Odr_oid bib1[OID_SIZE];
1560 An OID may also be filled from a string-based representation using
1561 dots (.). This is achieved by the function
1563 int oid_dotstring_to_oid(const char *name, Odr_oid *oid);
1565 This function returns 0 if name could be converted; -1 otherwise.
1567 <example id="tools.oid.bib1.2"><title>Using oid_dotstring_to_oid</title>
1569 We can fill the Bib-1 attribute set OID more easily with:
1571 Odr_oid bib1[OID_SIZE];
1572 oid_dotstring_to_oid("1.2.840.10003.3.1", bib1);
1577 We can also allocate an OID dynamically on an ODR stream with:
1579 Odr_oid *odr_getoidbystr(ODR o, const char *str);
1581 This creates an OID from string-based representation using dots.
1582 This function takes an &odr; stream as a parameter. This stream is used to
1583 allocate memory for the data elements, which is released on a
1584 subsequent call to <function>odr_reset()</function> on that stream.
1587 <example id="tools.oid.bib1.3"><title>Using odr_getoidbystr</title>
1589 We can create an OID for the Bib-1 attribute set with:
1591 Odr_oid *bib1 = odr_getoidbystr(odr, "1.2.840.10003.3.1");
1599 char *oid_oid_to_dotstring(const Odr_oid *oid, char *oidbuf);
1601 does the reverse of <function>oid_dotstring_to_oid</function>. It
1602 converts an OID to the string-based representation using dots.
1603 The supplied char buffer <literal>oidbuf</literal> holds the resulting
1604 string and must be at least <literal>OID_STR_MAX</literal> in size.
1608 OIDs can be copied with <function>oid_oidcpy</function> which takes
1609 two OID lists as arguments. Alternatively, an OID copy can be allocated
1610 on an ODR stream with:
1612 Odr_oid *odr_oiddup(ODR odr, const Odr_oid *o);
1617 OIDs can be compared with <function>oid_oidcmp</function> which returns
1618 zero if the two OIDs provided are identical; non-zero otherwise.
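To make the representation concrete, here is a minimal, self-contained sketch of an OID as an array of integers terminated by -1, with conversion to and from the dotted string form and a comparison. It does not use &yaz;: the <literal>sketch_</literal> functions are hypothetical stand-ins that only mirror the return conventions described above, plain <literal>int</literal> is used in place of <literal>Odr_oid</literal>, and the capacity <literal>SKETCH_OID_SIZE</literal> is an assumed value rather than the real <literal>OID_SIZE</literal>.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_OID_SIZE 20   /* assumed capacity, including the -1 terminator */

/* Parse "1.2.840" into {1, 2, 840, -1}. Returns 0 on success, -1 on error. */
int sketch_dotstring_to_oid(const char *name, int *oid)
{
    int n = 0;
    for (;;)
    {
        char *end;
        long v;
        if (*name < '0' || *name > '9' || n >= SKETCH_OID_SIZE - 1)
            return -1;             /* component must start with a digit */
        errno = 0;
        v = strtol(name, &end, 10);
        if (errno == ERANGE || v > INT_MAX)
            return -1;
        oid[n++] = (int) v;
        if (*end == '\0')
        {
            oid[n] = -1;           /* terminate the list */
            return 0;
        }
        if (*end != '.')
            return -1;
        name = end + 1;            /* another component must follow the dot */
    }
}

/* Format oid into buf (size max). Returns buf, or NULL if it does not fit. */
char *sketch_oid_to_dotstring(const int *oid, char *buf, size_t max)
{
    size_t used = 0;
    int i;
    if (max == 0)
        return NULL;
    buf[0] = '\0';
    for (i = 0; oid[i] >= 0; i++)
    {
        int w = snprintf(buf + used, max - used, "%s%d",
                         i ? "." : "", oid[i]);
        if (w < 0 || (size_t) w >= max - used)
            return NULL;
        used += (size_t) w;
    }
    return buf;
}

/* Returns zero if the two OIDs are identical; non-zero otherwise. */
int sketch_oidcmp(const int *a, const int *b)
{
    while (*a == *b && *a >= 0)
    {
        a++;
        b++;
    }
    return *a != *b;
}
```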
1621 <sect2 id="tools.oid.database"><title>OID database</title>
1623 From YAZ version 3 and later, the oident system has been replaced
1624 by an OID database. OID database is a misnomer: the old oident
1625 system was also a database.
1628 The OID database is really just a map between named Object Identifiers
1629 (string) and their raw OID equivalents. Most operations either
1630 convert from string to OID or the other way around.
1633 Unfortunately, whenever we supply a string we must also specify the
1634 <emphasis>OID class</emphasis>. The class is necessary because some
1635 strings correspond to multiple OIDs. An example of such a string is
1636 <literal>Bib-1</literal> which may either be an attribute-set
1637 or a diagnostic-set.
1640 Applications using the YAZ database should include
1641 <filename>yaz/oid_db.h</filename>.
1644 A YAZ database handle is of type <literal>yaz_oid_db_t</literal>.
1645 Actually that's a pointer, but you need not deal with that.
1646 YAZ has a built-in database which can be considered "constant" for
1648 We can get hold of it by using function <function>yaz_oid_std</function>.
1651 All functions with prefix <function>yaz_string_to_oid</function>
1652 convert from class + string to OID. We have variants of this
1653 operation due to different memory allocation strategies.
1656 All functions with prefix
1657 <function>yaz_oid_to_string</function> convert from OID to string
1661 <example id="tools.oid.bib1.4"><title>Create OID with YAZ DB</title>
1663 We can create an OID for the Bib-1 attribute set on the ODR stream
1667 yaz_string_to_oid_odr(yaz_oid_std(), CLASS_ATTSET, "Bib-1", odr);
1669 This is more complex than using <function>odr_getoidbystr</function>.
1670 You would only use <function>yaz_string_to_oid_odr</function> when the
1671 string (here Bib-1) is supplied by a user or configuration.
1676 <sect2 id="tools.oid.std"><title>Standard OIDs</title>
1679 All the object identifiers in the standard OID database as returned
1680 by <function>yaz_oid_std</function> can be referenced directly in a
1681 program as a constant OID.
1682 Each constant OID is prefixed with <literal>yaz_oid_</literal> -
1683 followed by OID class (lowercase) - then by OID name (normalized and
1687 See <xref linkend="list-oids"/> for a list of all object identifiers
1689 These are declared in <filename>yaz/oid_std.h</filename> but are
1690 included by <filename>yaz/oid_db.h</filename> as well.
1693 <example id="tools.oid.bib1.5"><title>Use a built-in OID</title>
1695 We can allocate our own OID filled with the constant OID for
1698 Odr_oid *bib1 = odr_oiddup(o, yaz_oid_attset_bib1);
1704 <sect1 id="tools.nmem"><title>Nibble Memory</title>
1707 Sometimes when you need to allocate and construct a large,
1708 interconnected complex of structures, it can be a bit of a pain to
1709 release the associated memory again. For the structures describing the
1710 Z39.50 PDUs and related structures, it is convenient to use the
1711 memory-management system of the &odr; subsystem (see
1712 <xref linkend="odr.use"/>). However, in some circumstances
1713 where you might otherwise benefit from using a simple nibble memory
1714 management system, it may be impractical to use
1715 <function>odr_malloc()</function> and <function>odr_reset()</function>.
1716 For this purpose, the memory manager which also supports the &odr;
1717 streams is made available in the NMEM module. The external interface
1718 to this module is given in the <filename>nmem.h</filename> file.
1722 The following prototypes are given:
1726 NMEM nmem_create(void);
1727 void nmem_destroy(NMEM n);
1728 void *nmem_malloc(NMEM n, size_t size);
1729 void nmem_reset(NMEM n);
1730 size_t nmem_total(NMEM n);
1731 void nmem_init(void);
1732 void nmem_exit(void);
1736 The <function>nmem_create()</function> function returns a pointer to a
1737 memory control handle, which can be released again by
1738 <function>nmem_destroy()</function> when no longer needed.
1739 The function <function>nmem_malloc()</function> allocates a block of
1740 memory of the requested size. A call to <function>nmem_reset()</function>
1741 or <function>nmem_destroy()</function> will release all memory allocated
1742 on the handle since it was created (or since the last call to
1743 <function>nmem_reset()</function>). The function
1744 <function>nmem_total()</function> returns the number of bytes currently
1745 allocated on the handle.
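The allocate-many, release-at-once idea behind these functions can be illustrated with a small stand-alone allocator. This is a deliberately simplified sketch, not the NMEM implementation: the <literal>arena_</literal> names are hypothetical, every request is served by its own <function>malloc</function> call rather than carved from larger blocks, and the total simply counts the requested bytes.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation. The max_align_t member
   (C11) keeps the payload that follows suitably aligned. */
typedef union arena_block {
    struct {
        union arena_block *next;
        size_t size;
    } h;
    max_align_t align;
} arena_block;

typedef struct arena {
    arena_block *blocks;   /* everything allocated since the last reset */
    size_t total;          /* requested bytes currently allocated */
} arena;

arena *arena_create(void)
{
    return calloc(1, sizeof(arena));
}

void *arena_malloc(arena *a, size_t size)
{
    arena_block *b = malloc(sizeof(*b) + size);
    if (!b)
        return NULL;
    b->h.next = a->blocks;
    b->h.size = size;
    a->blocks = b;
    a->total += size;
    return b + 1;          /* payload starts right after the header */
}

/* Release every allocation made on the handle; the handle stays usable. */
void arena_reset(arena *a)
{
    while (a->blocks)
    {
        arena_block *next = a->blocks->h.next;
        free(a->blocks);
        a->blocks = next;
    }
    a->total = 0;
}

size_t arena_total(const arena *a)
{
    return a->total;
}

void arena_destroy(arena *a)
{
    if (a)
    {
        arena_reset(a);
        free(a);
    }
}
```

The caller never frees individual allocations; one reset or destroy call releases the whole interconnected structure.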
1749 The nibble memory pool is shared amongst threads. POSIX
1750 mutexes and WIN32 critical sections are introduced to keep the
1751 module thread safe. Function <function>nmem_init()</function>
1752 initializes the nibble memory library and it is called automatically
1753 the first time the <literal>YAZ.DLL</literal> is loaded. &yaz; uses
1754 function <function>DllMain</function> to achieve this. You should
1755 <emphasis>not</emphasis> call <function>nmem_init</function> or
1756 <function>nmem_exit</function> unless you're absolutely sure what
1757 you're doing. Note that in previous &yaz; versions you'd have to call
1758 <function>nmem_init</function> yourself.
1763 <sect1 id="tools.log"><title>Log</title>
1765 &yaz; has evolved a fairly complex log system which should be useful
1766 for debugging &yaz; itself, debugging applications that use &yaz;, and for
1767 production use of those applications.
1770 The log functions are declared in header <filename>yaz/log.h</filename>
1771 and implemented in <filename>src/log.c</filename>.
1772 Due to a name clash with syslog and some math utilities, the logging
1773 interface has been modified as of YAZ 2.0.29. The obsolete interface
1774 is still available in header file <filename>yaz/log.h</filename>.
1775 The key points of the interface are:
1778 void yaz_log(int level, const char *fmt, ...);
1780 void yaz_log_init(int level, const char *prefix, const char *name);
1781 void yaz_log_init_file(const char *fname);
1782 void yaz_log_init_level(int level);
1783 void yaz_log_init_prefix(const char *prefix);
1784 void yaz_log_time_format(const char *fmt);
1785 void yaz_log_init_max_size(int mx);
1787 int yaz_log_mask_str(const char *str);
1788 int yaz_log_module_level(const char *name);
1792 The reason for the whole log module is the <function>yaz_log</function>
1793 function. It takes a bitmask indicating the log levels, a
1794 <literal>printf</literal>-like format string, and a variable number of
1799 The <literal>log level</literal> is a bit mask, that says on which level(s)
1800 the log entry should be made, and optionally set some behaviour of the
1801 logging. In the simplest cases, it can be one of <literal>YLOG_FATAL,
1802 YLOG_DEBUG, YLOG_WARN, YLOG_LOG</literal>. Those can be combined with bits
1803 that modify the way the log entry is written: <literal>YLOG_ERRNO,
1804 YLOG_NOTIME, YLOG_FLUSH</literal>.
1805 Most of the rest of the bits are deprecated, and should not be used. Use
1806 the dynamic log levels instead.
1810 Applications that use &yaz; should not use <literal>YLOG_LOG</literal> for ordinary
1811 messages, but should make use of the dynamic loglevel system. This consists
1812 of two parts, defining the loglevel and checking it.
1816 To define the log levels, the (main) program should pass a string to
1817 <function>yaz_log_mask_str</function> to define which log levels are to be
1818 logged. This string should be a comma-separated list of log level names,
1819 and can contain both hard-coded names and dynamic ones. The log level
1820 calculation starts with <literal>YLOG_DEFAULT_LEVEL</literal> and adds a bit
1821 for each word it meets, unless the word starts with a '-', in which case it
1822 clears the bit. If the string <literal>'none'</literal> is found,
1823 all bits are cleared. Typically this string comes from the command-line,
1824 often identified by <literal>-v</literal>. The
1825 <function>yaz_log_mask_str</function> returns a log level that should be
1826 passed to <function>yaz_log_init_level</function> for it to take effect.
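The level calculation described above can be sketched as follows. This is an illustration only, not the code behind <function>yaz_log_mask_str</function>: the function <literal>sk_mask_str</literal>, the <literal>SK_</literal> bit values and the assumed default level are all hypothetical, and unknown names are simply ignored here, whereas the real function also handles dynamic level names.

```c
#include <string.h>

/* Hypothetical bit values; the real YLOG_ constants differ. */
#define SK_FATAL 0x01
#define SK_DEBUG 0x02
#define SK_WARN  0x04
#define SK_LOG   0x08
#define SK_DEFAULT (SK_FATAL | SK_WARN | SK_LOG)   /* assumed default level */

static const struct {
    const char *name;
    int bit;
} sk_names[] = {
    { "fatal", SK_FATAL },
    { "debug", SK_DEBUG },
    { "warn", SK_WARN },
    { "log", SK_LOG }
};

/* Look up a name of the given length; returns 0 for unknown names. */
static int sk_lookup(const char *name, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof sk_names / sizeof *sk_names; i++)
        if (strlen(sk_names[i].name) == len &&
            strncmp(sk_names[i].name, name, len) == 0)
            return sk_names[i].bit;
    return 0;
}

/* Start from the default level, set a bit for each word, clear it when
   the word is prefixed with '-', and clear everything on "none". */
int sk_mask_str(const char *str)
{
    int level = SK_DEFAULT;
    while (*str)
    {
        size_t len = strcspn(str, ",");
        int negate = (len > 0 && *str == '-');
        const char *word = str + negate;
        size_t wlen = len - (size_t) negate;

        if (!negate && wlen == 4 && strncmp(word, "none", 4) == 0)
            level = 0;
        else if (negate)
            level &= ~sk_lookup(word, wlen);
        else
            level |= sk_lookup(word, wlen);
        str += len;
        if (*str == ',')
            str++;
    }
    return level;
}
```

Because words are processed left to right, a string such as <literal>none,fatal</literal> first clears all bits and then enables only the fatal level.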
1830 Each module should check which log bits it should use by calling
1831 <function>yaz_log_module_level</function> with a suitable name for the
1832 module. The name is stripped of any preceding path and extension,
1833 so it is quite possible to use <literal>__FILE__</literal> for it. If the
1834 name has been passed to <function>yaz_log_mask_str</function>, the routine
1835 returns a non-zero bitmask, which should then be used in subsequent calls
1836 to <function>yaz_log</function>. (It can also be tested, so as to avoid unnecessary calls to
1837 <function>yaz_log</function> in time-critical places, or when the log entry would take time
1842 &yaz; uses the following dynamic log levels:
1843 <literal>server, session, request, requestdetail</literal> for the server
1845 <literal>zoom</literal> for the ZOOM client API.
1846 <literal>ztest</literal> for the simple test server.
1847 <literal>malloc, nmem, odr, eventl</literal> for internal debugging of &yaz; itself.
1848 Of course, any program using &yaz; is welcome to define as many new ones as
1853 By default the log is written to stderr, but this can be changed by a call
1854 to <function>yaz_log_init_file</function> or
1855 <function>yaz_log_init</function>. If the log is directed to a file, the
1856 file size is checked at every write, and if it exceeds the limit given in
1857 <function>yaz_log_init_max_size</function>, the log is rotated. The
1858 rotation keeps one old version (with a <literal>.1</literal> appended to
1859 the name). The size defaults to 1GB. Setting it to zero will disable the
1864 A typical yaz-log looks like this
1865 13:23:14-23/11 yaz-ztest(1) [session] Starting session from tcp:127.0.0.1 (pid=30968)
1866 13:23:14-23/11 yaz-ztest(1) [request] Init from 'YAZ' (81) (ver 2.0.28) OK
1867 13:23:17-23/11 yaz-ztest(1) [request] Search Z: @attrset Bib-1 foo OK:7 hits
1868 13:23:22-23/11 yaz-ztest(1) [request] Present: [1] 2+2 OK 2 records returned
1869 13:24:13-23/11 yaz-ztest(1) [request] Close OK
1873 The log entries start with a time stamp. This can be omitted by setting the
1874 <literal>YLOG_NOTIME</literal> bit in the loglevel. This way automatic tests
1875 can produce identical log files that are easy to diff. The
1876 format of the time stamp can be set with
1877 <function>yaz_log_time_format</function>, which takes a format string just
1878 like <function>strftime</function>.
1882 Next in a log line comes the prefix, often the name of the program. For
1883 yaz-based servers, it can also contain the session number. Then
1884 come one or more logbits in square brackets, depending on the logging
1885 level set by <function>yaz_log_init_level</function> and the loglevel
1886 passed to <function>yaz_log</function>. Finally comes the format
1887 string and additional values passed to <function>yaz_log</function>
1891 The log level <literal>YLOG_LOGLVL</literal>, enabled by the string
1892 <literal>loglevel</literal>, will log all the log-level affecting
1893 operations. This can come in handy if you need to know what other log
1894 levels would be useful. Grep the logfile for <literal>[loglevel]</literal>.
1898 The log system is almost independent of the rest of &yaz;. The only
1899 important dependency is on <filename>nmem</filename>, and that only for
1900 using the semaphore definition there.
1904 The dynamic log levels and log rotation were introduced in &yaz; 2.0.28. At
1905 the same time, the log bit names were changed from
1906 <literal>LOG_something</literal> to <literal>YLOG_something</literal>,
1907 to avoid collision with <filename>syslog.h</filename>.
1912 <sect1 id="marc"><title>MARC</title>
1915 YAZ provides a fast utility for working with MARC records.
1916 Early versions of the MARC utility only allowed decoding of ISO2709.
1917 Today the utility can both encode and decode a variety of formats.
1920 #include <yaz/marcdisp.h>
1922 /* create handler */
1923 yaz_marc_t yaz_marc_create(void);
1925 void yaz_marc_destroy(yaz_marc_t mt);
1927 /* set XML mode YAZ_MARC_LINE, YAZ_MARC_SIMPLEXML, ... */
1928 void yaz_marc_xml(yaz_marc_t mt, int xmlmode);
1929 #define YAZ_MARC_LINE 0
1930 #define YAZ_MARC_SIMPLEXML 1
1931 #define YAZ_MARC_OAIMARC 2
1932 #define YAZ_MARC_MARCXML 3
1933 #define YAZ_MARC_ISO2709 4
1934 #define YAZ_MARC_XCHANGE 5
1935 #define YAZ_MARC_CHECK 6
1936 #define YAZ_MARC_TURBOMARC 7
1938 /* supply iconv handle for character set conversion .. */
1939 void yaz_marc_iconv(yaz_marc_t mt, yaz_iconv_t cd);
1941 /* set debug level, 0=none, 1=more, 2=even more, .. */
1942 void yaz_marc_debug(yaz_marc_t mt, int level);
1944 /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
1945 On success, result in *result with size *rsize. */
1946 int yaz_marc_decode_buf(yaz_marc_t mt, const char *buf, int bsize,
1947 const char **result, size_t *rsize);
1949 /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
1950 On success, result in WRBUF */
1951 int yaz_marc_decode_wrbuf(yaz_marc_t mt, const char *buf,
1952 int bsize, WRBUF wrbuf);
1957 The synopsis is just a basic subset of all functionality. Refer
1958 to the actual header file <filename>marcdisp.h</filename> for
1963 A MARC conversion handle must be created by using
1964 <function>yaz_marc_create</function> and destroyed
1965 by calling <function>yaz_marc_destroy</function>.
1968 All other functions operate on a <literal>yaz_marc_t</literal> handle.
1969 The output is specified by a call to <function>yaz_marc_xml</function>.
1970 The <literal>xmlmode</literal> must be one of
1973 <term>YAZ_MARC_LINE</term>
1976 A simple line-by-line format suitable for display but not
1977 recommended for further (machine) processing.
1983 <term>YAZ_MARC_MARCXML</term>
1986 <ulink url="&url.marcxml;">MARCXML</ulink>.
1992 <term>YAZ_MARC_ISO2709</term>
1995 ISO2709 (sometimes just referred to as "MARC").
2001 <term>YAZ_MARC_XCHANGE</term>
2004 <ulink url="&url.marcxchange;">MarcXchange</ulink>.
2010 <term>YAZ_MARC_CHECK</term>
2013 Pseudo format for validation only. Does not generate
2014 any real output except diagnostics.
2020 <term>YAZ_MARC_TURBOMARC</term>
2023 XML format with same semantics as MARCXML but more compact
2024 and geared towards fast processing with XSLT. Refer to
2025 <xref linkend="tools.turbomarc"/> for more information.
2033 The actual conversion functions are
2034 <function>yaz_marc_decode_buf</function> and
2035 <function>yaz_marc_decode_wrbuf</function> which decode and encode
2036 a MARC record. The former function operates on simple buffers, the
2037 latter stores the resulting record in a WRBUF handle (WRBUF is a simple string
2040 <example id="example.marc.display">
2041 <title>Display of MARC record</title>
2043 The following program snippet illustrates how the MARC API may
2044 be used to convert a MARC record to the line-by-line format:
2045 <programlisting><![CDATA[
2046 void print_marc(const char *marc_buf, int marc_buf_size)
2048 const char *result; /* for result buf (owned by mt) */
2049 size_t result_len; /* for size of result */
2050 yaz_marc_t mt = yaz_marc_create();
2051 yaz_marc_xml(mt, YAZ_MARC_LINE);
2052 if (yaz_marc_decode_buf(mt, marc_buf, marc_buf_size,
2053 &result, &result_len) > 0) /* >0 means success */
2054 fwrite(result, result_len, 1, stdout);
2055 yaz_marc_destroy(mt); /* note that result is now freed... */
2061 <sect2 id="tools.turbomarc">
2062 <title>TurboMARC</title>
2064 TurboMARC is yet another XML encoding of a MARC record. The format
2065 was designed for fast processing with XSLT.
2069 Pazpar2 uses XSLT to convert an XML encoded MARC record to an internal
2070 representation. This conversion mostly checks the tag of a MARC field
2071 to determine the basic rules in the conversion. This check is
2072 costly when the tag is encoded as an attribute in MARCXML.
2073 Having the tag value as the element name instead makes processing
2074 many times faster (at least for Libxslt).
2077 TurboMARC is encoded as follows:
2080 Record elements are part of namespace
2081 "<literal>http://www.indexdata.com/turbomarc</literal>".
2084 A record is enclosed in element <literal>r</literal>.
2087 A collection of records is enclosed in element
2088 <literal>collection</literal>.
2091 The leader is encoded as element <literal>l</literal> with the
2092 leader content as its (text) value.
2095 A control field is encoded as element <literal>c</literal> concatenated
2096 with the tag value of the control field if the tag value
2097 matches the regular expression <literal>[a-zA-Z0-9]*</literal>.
2098 If the tag value does not match the regular expression
2099 <literal>[a-zA-Z0-9]*</literal> the control field is encoded
2100 as element <literal>c</literal> and attribute <literal>code</literal>
2101 will hold the tag value.
2102 This rule ensures that in the rare cases where a tag value might
2103 result in non-well-formed XML, YAZ encodes it as a coded attribute
2107 The control field content is the text value of this element.
2108 Indicators are encoded as attribute names
2109 <literal>i1</literal>, <literal>i2</literal>, etc. and
2110 corresponding values for each indicator.
2113 A data field is encoded as element <literal>d</literal> concatenated
2114 with the tag value of the data field or using the attribute
2115 <literal>code</literal> as described in the rules for control fields.
2116 The children of the data field element are subfield elements.
2117 Each subfield element is encoded as <literal>s</literal>
2118 concatenated with the subfield code.
2119 The text of the subfield element is the contents of the subfield.
2120 Indicators are encoded as attributes for the data field element similar
2121 to the encoding for control fields.
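The element naming rule for control and data fields can be sketched in a few lines of C. The helper <literal>turbomarc_element_name</literal> is hypothetical and not part of &yaz;; it only decides whether the tag becomes part of the element name or has to be carried in the <literal>code</literal> attribute, and leaves the actual XML output (including attribute escaping) to the caller.

```c
#include <stdio.h>

/* Returns 1 if every character of tag is in [a-zA-Z0-9], else 0.
   An empty tag matches, as with the regular expression [a-zA-Z0-9]*. */
static int tag_is_plain(const char *tag)
{
    for (; *tag; tag++)
    {
        char c = *tag;
        if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
              (c >= '0' && c <= '9')))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: build the element name for a field.
   kind is 'c' (control field) or 'd' (data field).
   Returns 0 if the tag became part of the name, 1 if the tag must be
   carried in the code attribute instead, and -1 if name (of size max)
   is too small. */
int turbomarc_element_name(char kind, const char *tag,
                           char *name, size_t max)
{
    int plain = tag_is_plain(tag);
    int w = snprintf(name, max, "%c%s", kind, plain ? tag : "");
    if (w < 0 || (size_t) w >= max)
        return -1;
    return plain ? 0 : 1;
}
```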
2128 <sect1 id="tools.retrieval">
2129 <title>Retrieval Facility</title>
2131 YAZ version 2.1.20 or later includes a Retrieval facility tool
2132 which allows an SRU/Z39.50 server to describe itself and perform record
2133 conversions. The idea is the following:
2138 An SRU/Z39.50 client sends a retrieval request which includes
2139 a combination of the following parameters: syntax (format),
2140 schema (or element set name).
2146 The retrieval facility is invoked with parameters in a
2147 server/proxy. The retrieval facility matches the parameters against a set of
2148 "supported" retrieval types.
2149 If there is no match, the retrieval facility signals an error
2150 (syntax and / or schema not supported).
2156 For a successful match, the backend is invoked with the same
2157 or altered retrieval parameters (syntax, schema). If
2158 a record is received from the backend, it is converted to the
2159 frontend name / syntax.
2165 The resulting record is sent back to the client and tagged with
2166 the frontend syntax / schema.
2173 The Retrieval facility is driven by an XML configuration. The
2174 configuration is neither Z39.50 ZeeRex nor SRU ZeeRex. But it
2175 should be easy to generate both of them from the XML configuration
2176 (unfortunately the two versions
2177 of ZeeRex differ substantially in this regard).
2179 <sect2 id="tools.retrieval.format">
2180 <title>Retrieval XML format</title>
2182 All elements should be covered by namespace
2183 <literal>http://indexdata.com/yaz</literal>.
2184 The root element node must be <literal>retrievalinfo</literal>.
2187 The <literal>retrievalinfo</literal> must include one or
2188 more <literal>retrieval</literal> elements. Each
2189 <literal>retrieval</literal> defines a specific combination of
2190 syntax, name and identifier supported by this retrieval service.
2193 The <literal>retrieval</literal> element may include any of the
2194 following attributes:
2196 <varlistentry><term><literal>syntax</literal> (REQUIRED)</term>
2199 Defines the record syntax. Possible values are any
2200 of the names defined in the YAZ OID database or a raw
2205 <varlistentry><term><literal>name</literal> (OPTIONAL)</term>
Defines the name of the retrieval format. This can be
any string. For SRU, the value is equivalent to the schema (short-hand);
for Z39.50 it is equivalent to the simple element set name.
2211 For YAZ 3.0.24 and later this name may be specified as a glob
2212 expression with operators
2213 <literal>*</literal> and <literal>?</literal>.
2217 <varlistentry><term><literal>identifier</literal> (OPTIONAL)</term>
Defines the URI schema name of the retrieval format. This can be
any string. For SRU, the value is equivalent to the URI schema.
2222 For Z39.50, there is no equivalent.
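<para>
 The following fragment sketches how the three attributes might be
 combined. The names <literal>F</literal> and
 <literal>marc*</literal> are illustrative assumptions; the glob form
 requires YAZ 3.0.24 or later.
</para>
<programlisting><![CDATA[
<retrievalinfo xmlns="http://indexdata.com/yaz">
  <!-- record syntax by OID database name, plain string as name -->
  <retrieval syntax="usmarc" name="F"/>
  <!-- name given as a glob expression, plus a URI identifier for SRU -->
  <retrieval syntax="xml" name="marc*"
             identifier="info:srw/schema/1/marcxml-v1.1"/>
</retrievalinfo>
]]></programlisting>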
The <literal>retrieval</literal> element may include one
<literal>backend</literal> element. If a <literal>backend</literal>
element is given, it specifies how the records are retrieved by
some backend and how the records are converted from the backend to
the frontend.
The attributes <literal>name</literal> and <literal>syntax</literal>
may be specified for the <literal>backend</literal> element. The
semantics of these attributes are equivalent to those for the
<literal>retrieval</literal> element. However, these values are passed to
the backend.
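<para>
 As an illustration of altered retrieval parameters, the sketch below
 accepts the element set name <literal>brief</literal> from the
 client but requests <literal>B</literal> from the backend. Both
 names are hypothetical, and no conversion instructions are given.
</para>
<programlisting><![CDATA[
<retrievalinfo xmlns="http://indexdata.com/yaz">
  <!-- frontend: what the client may ask for -->
  <retrieval syntax="usmarc" name="brief">
    <!-- backend: what is actually requested from the backend -->
    <backend syntax="usmarc" name="B"/>
  </retrieval>
</retrievalinfo>
]]></programlisting>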
The <literal>backend</literal> element may include one or more
conversion instructions (as child elements). The supported
conversions are:
2247 <varlistentry><term><literal>marc</literal></term>
The <literal>marc</literal> element specifies a conversion
to and from ISO2709-encoded MARC and
2252 <ulink url="&url.marcxml;">&acro.marcxml;</ulink>/MarcXchange.
2253 The following attributes may be specified:
2256 <varlistentry><term><literal>inputformat</literal> (REQUIRED)</term>
Format of input. Supported values are
<literal>marc</literal> (for ISO2709) and <literal>xml</literal>
(for MARCXML/MarcXchange).
2266 <varlistentry><term><literal>outputformat</literal> (REQUIRED)</term>
Format of output. Supported values are
<literal>line</literal> (MARC line format),
<literal>marcxml</literal> (for MARCXML),
<literal>marc</literal> (ISO2709) and
<literal>marcxchange</literal> (for MarcXchange).
2278 <varlistentry><term><literal>inputcharset</literal> (OPTIONAL)</term>
Encoding of input. For XML input formats, this need not
be given, but for ISO2709-based input formats, this should
be set to the encoding used. For MARC21 records, a common
inputcharset value would be <literal>marc-8</literal>.
2289 <varlistentry><term><literal>outputcharset</literal> (OPTIONAL)</term>
Encoding of output. If outputformat is XML-based, it is
strongly recommended to use <literal>utf-8</literal>.
2302 <varlistentry><term><literal>xslt</literal></term>
2305 The <literal>xslt</literal> element specifies a conversion
2306 via &acro.xslt;. The following attributes may be specified:
2309 <varlistentry><term><literal>stylesheet</literal> (REQUIRED)</term>
2324 <sect2 id="tools.retrieval.examples">
2325 <title>Retrieval Facility Examples</title>
2326 <example id="tools.retrieval.marc21">
2327 <title>MARC21 backend</title>
A typical way to use the retrieval facility is to enable XML
for servers that only support ISO2709-encoded MARC21 records.
<programlisting><![CDATA[
<retrievalinfo xmlns="http://indexdata.com/yaz">
  <retrieval syntax="usmarc" name="F"/>
  <retrieval syntax="usmarc" name="B"/>
  <retrieval syntax="xml" name="marcxml"
             identifier="info:srw/schema/1/marcxml-v1.1">
    <backend syntax="usmarc" name="F">
      <marc inputformat="marc" outputformat="marcxml"
            inputcharset="marc-8"/>
    </backend>
  </retrieval>
  <retrieval syntax="xml" name="dc">
    <backend syntax="usmarc" name="F">
      <marc inputformat="marc" outputformat="marcxml"
            inputcharset="marc-8"/>
      <xslt stylesheet="MARC21slim2DC.xsl"/>
    </backend>
  </retrieval>
</retrievalinfo>
]]></programlisting>
2354 This means that our frontend supports:
2358 MARC21 F(ull) records.
2363 MARC21 B(rief) records.
Dublin Core records.
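<para>
 The conversion may also run in the opposite direction. The sketch
 below assumes a backend that only delivers MARCXML, under the
 hypothetical element set name <literal>marcxml</literal>, while the
 frontend offers ISO2709 records. The output encoding
 <literal>marc-8</literal> is likewise an assumption.
</para>
<programlisting><![CDATA[
<retrievalinfo xmlns="http://indexdata.com/yaz">
  <retrieval syntax="usmarc" name="F">
    <backend syntax="xml" name="marcxml">
      <!-- XML in, ISO2709 out -->
      <marc inputformat="xml" outputformat="marc"
            outputcharset="marc-8"/>
    </backend>
  </retrieval>
</retrievalinfo>
]]></programlisting>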
2382 <sect2 id="tools.retrieval.api">
It should be easy to use the retrieval system from applications. Refer
to the headers
2387 <filename>yaz/retrieval.h</filename> and
2388 <filename>yaz/record_conv.h</filename>.