<!-- $Header: /home/cvsroot/yaz/doc/tools.xml,v 1.1 2001-01-04 13:36:25 adam Exp $ -->
<chapter><title>Supporting Tools</title>
In support of the service API (primarily the ASN module, which
provides the programmatic interface to the Z39.50 APDUs), YAZ contains
a collection of tools that support the development of applications.
<sect1><title>Query Syntax Parsers</title>
Since the type-1 (RPN) query structure has no direct, useful string
representation, every origin application needs to provide some form of
mapping from a local query notation or representation to a
<token>Z_RPNQuery</token> structure. Some programmers will prefer to
construct the query manually, perhaps using <function>odr_malloc()</function>
to simplify memory management. The &yaz; distribution includes two separate,
query-generating tools that may be of use to you.
<sect2><title id="PQF">Prefix Query Format</title>
Since RPN, or reverse Polish notation, is really just a fancy way of
describing a suffix notation format (operator follows operands), it
would seem that the confusion is total when we now introduce a prefix
notation for RPN. The reason is one of simple laziness - it's somewhat
simpler to interpret a prefix format, and this utility was designed
for maximum simplicity, to provide a baseline representation for use
in simple test applications and scripting environments (like Tcl). The
demonstration client included with YAZ uses the PQF.
The PQF is defined by the pquery module in the YAZ library. The
<filename>pquery.h</filename> file provides the declarations of the
following functions:
Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);
Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
    Odr_oid **attributeSetP, const char *qbuf);
int p_query_attset (const char *arg);
The function <function>p_query_rpn()</function> takes as arguments an
&odr; stream (see section <link linkend="odr">The ODR Module</link>)
to provide a memory source (the structure created is released on
the next call to <function>odr_reset()</function> on the stream), a
protocol identifier (one of the constants <token>PROTO_Z3950</token> and
<token>PROTO_SR</token>), and finally a null-terminated string holding
the query.
If the parse went well, <function>p_query_rpn()</function> returns a
pointer to a <literal>Z_RPNQuery</literal> structure which can be
placed directly into a <literal>Z_SearchRequest</literal>.
The <literal>p_query_attset</literal> function specifies which attribute
set to use if the query doesn't specify one with the
<literal>@attrset</literal> operator.
It returns 0 if the argument is a valid attribute set specifier;
otherwise it returns -1.
The grammar of the PQF is as follows:
Query ::= [ AttSet ] QueryStruct.
QueryStruct ::= { Attribute } Simple | Complex.
Attribute ::= '@attr' AttributeType '=' AttributeValue.
AttributeType ::= integer.
AttributeValue ::= integer.
Complex ::= Operator QueryStruct QueryStruct.
Operator ::= '@and' | '@or' | '@not' | '@prox' Proximity.
Simple ::= ResultSet | Term.
ResultSet ::= '@set' string.
Term ::= string | '"' string '"'.
Proximity ::= Exclusion Distance Ordered Relation WhichCode UnitCode.
Exclusion ::= '1' | '0' | 'void'.
Distance ::= integer.
Ordered ::= '1' | '0'.
Relation ::= integer.
WhichCode ::= 'known' | 'private' | integer.
UnitCode ::= integer.
You will note that the syntax above is a fairly faithful
representation of RPN, except for the Attribute, which has been
moved a step away from the term, allowing you to associate one or more
attributes with an entire query structure. The parser will
automatically apply the given attributes to each term as required.
The following are all examples of valid queries in the PQF.
@or "dylan" "zimmerman"
@or @and bob dylan @set Result-1
@attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
@attr 4=1 @attr 1=4 "self portrait"
@prox 0 3 1 2 k 2 dylan zimmerman
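The prefix layout of the grammar makes a recursive-descent reading very short. The sketch below is not the pquery module and does not build a Z_RPNQuery; it is a self-contained illustration, assuming a subset of the grammar (the operators @and, @or and @not with bare or quoted terms), that renders a PQF string in infix form. The names pqf_to_infix, parse_struct, next_token and append are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Read the next token from *p into tok (capacity cap, must be > 0).
 * Returns 0 at end of input, 1 for a bare token, 2 for a "quoted" one
 * (the quotes themselves are dropped). Over-long tokens are truncated. */
static int next_token(const char **p, char *tok, size_t cap)
{
    size_t n = 0;
    int quoted;
    while (isspace((unsigned char) **p))
        (*p)++;
    if (**p == '\0')
        return 0;
    quoted = (**p == '"');
    if (quoted)
        (*p)++;
    while (**p && (quoted ? **p != '"' : !isspace((unsigned char) **p))) {
        if (n + 1 < cap)
            tok[n++] = **p;
        (*p)++;
    }
    if (quoted && **p == '"')
        (*p)++;                         /* skip the closing quote */
    tok[n] = '\0';
    return quoted ? 2 : 1;
}

/* Append src to out; returns 0 if it does not fit in cap bytes. */
static int append(char *out, size_t cap, const char *src)
{
    size_t used = strlen(out), add = strlen(src);
    if (used + add + 1 > cap)
        return 0;
    memcpy(out + used, src, add + 1);
    return 1;
}

/* Subset of: QueryStruct ::= Term | Operator QueryStruct QueryStruct */
static int parse_struct(const char **p, char *out, size_t cap)
{
    char tok[128];
    int kind = next_token(p, tok, sizeof tok);
    if (kind == 0)
        return 0;                       /* an operand is missing */
    if (kind == 1 && (!strcmp(tok, "@and") || !strcmp(tok, "@or") ||
                      !strcmp(tok, "@not"))) {
        char op[8];
        strcpy(op, tok + 1);            /* operator name without '@' */
        return append(out, cap, "(") && parse_struct(p, out, cap) &&
               append(out, cap, " ") && append(out, cap, op) &&
               append(out, cap, " ") && parse_struct(p, out, cap) &&
               append(out, cap, ")");
    }
    if (kind == 1 && tok[0] == '@')
        return 0;                       /* operator outside this subset */
    return append(out, cap, tok);       /* a plain term */
}

/* Render a PQF string in infix form. Returns 1 on success, 0 on a
 * syntax error, an unsupported operator, or a too-small buffer. */
int pqf_to_infix(const char *pqf, char *out, size_t cap)
{
    char extra[2];
    const char *p = pqf;
    assert(pqf != NULL && out != NULL);
    if (cap == 0)
        return 0;
    out[0] = '\0';
    if (!parse_struct(&p, out, cap))
        return 0;
    return next_token(&p, extra, sizeof extra) == 0;  /* no trailing tokens */
}
```

Under these assumptions the first example query above is rendered as (dylan or zimmerman). Queries that use @set, @attr or @prox are rejected by this sketch, since those productions are left out.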
<sect2><title id="CCL">Common Command Language</title>
Not all users enjoy typing in prefix query structures and numerical
attribute values, even in a minimalistic test client. In the library
world, the more intuitive Common Command Language (or ISO 8777) has
enjoyed some popularity - especially before the widespread
availability of graphical interfaces. It is still useful in
applications where you need to provide a
symbolic language for expressing boolean query structures.
The EUROPAGATE research project working under the Libraries programme
of the European Commission's DG XIII has, amongst other useful tools,
implemented a general-purpose CCL parser which produces an output
structure that can be trivially converted to the internal RPN
representation of YAZ (the <literal>Z_RPNQuery</literal> structure).
Since the CCL utility - along with the rest of the software
produced by EUROPAGATE - is made freely available under a liberal license, it
is included as a supplement to YAZ.
<sect3><title>CCL Syntax</title>
The CCL parser obeys the following grammar for the FIND argument.
The syntax is annotated by the lines prefixed by
<literal>--</literal>.
CCL-Find ::= CCL-Find Op Elements
Op ::= "and" | "or" | "not"
-- The above means that Elements are separated by boolean operators.
Elements ::= '(' CCL-Find ')'
  | Qualifiers Relation Terms
  | Qualifiers Relation '(' CCL-Find ')'
  | Qualifiers '=' string '-' string
-- Elements is either a recursive definition, a result set reference, a
-- list of terms, qualifiers followed by terms, qualifiers followed
-- by a recursive definition or qualifiers in a range (lower - upper).
Set ::= 'set' = string
-- Reference to a result set
Terms ::= Terms Prox Term
-- Proximity of terms.
-- This basically means that a term may include a blank
Qualifiers ::= Qualifiers ',' string
-- Qualifiers is a list of strings separated by comma
Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<'
-- Relational operators. This really doesn't follow the ISO8777
-- Proximity operator
The following queries are all valid:
(dylan and bob) or set=1
Assuming that the qualifiers <literal>ti</literal>, <literal>au</literal>
and <literal>date</literal> are defined we may use:
au=(bob dylan and slow train coming)
date>1980 and (ti=((self portrait)))
<sect3><title>CCL Qualifiers</title>
Qualifiers are used to direct the search to a particular searchable
index, such as the title (ti) and author (au) indexes. The CCL standard
itself doesn't specify a particular set of qualifiers, but it does
suggest a few short-hand notations. You can customize the CCL parser
to support a particular set of qualifiers to reflect the current target
profile. Traditionally, a qualifier would map to a particular
use-attribute within the BIB-1 attribute set. However, you could also
define qualifiers that would set, for example, the
Consider a scenario where the target supports ranked searches in the
title-index. In this case, the user could specify
ti,ranked=knuth computer
and the <literal>ranked</literal> qualifier would map to
structure=free-form-text (4=105) and <literal>ti</literal> would map
to title (1=4).
A "profile" with a set of predefined CCL qualifiers can be read from a
file. The YAZ client reads its CCL qualifiers from a file named
<filename>default.bib</filename>. Each line in the file has the form:
<replaceable>qualifier-name</replaceable>
<replaceable>type</replaceable>=<replaceable>val</replaceable> <replaceable>type</replaceable>=<replaceable>val</replaceable> ...
where <replaceable>qualifier-name</replaceable> is the name of the
qualifier to be used (e.g. <literal>ti</literal>),
<replaceable>type</replaceable> is a BIB-1 category type and
<replaceable>val</replaceable> is the corresponding BIB-1 attribute value.
The <replaceable>type</replaceable> can be numeric, or it may be one of
<literal>u</literal> (use), <literal>r</literal> (relation),
<literal>p</literal> (position), <literal>s</literal> (structure),
<literal>t</literal> (truncation) or <literal>c</literal> (completeness).
The <replaceable>qualifier-name</replaceable> <literal>term</literal> has a
special meaning. The types and values for this definition are used when
<emphasis>no</emphasis> qualifiers are present.
Consider the following definition:
Two qualifiers are defined, <literal>ti</literal> and <literal>au</literal>.
They both set the structure-attribute to phrase (1). <literal>ti</literal>
sets the use-attribute to 4. <literal>au</literal> sets the use-attribute
to 1. When no qualifiers are used in the query the structure-attribute is
set to free-form-text (105).
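To make the line format concrete, here is a minimal sketch of how one qualifier line could be split into numeric type/value pairs. It is not the parser used by YAZ. It assumes that values are plain integers, that the letters u, r, p, s, t and c stand for the BIB-1 attribute types 1 to 6 in that order (consistent with 1=4 for title and 4=105 for free-form-text above), and that a line such as "ti u=4 s=1" matches the definition just described. The names qual_def, type_number and parse_qual_line are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ATTRS 8

struct qual_def {
    char name[32];          /* qualifier name, e.g. "ti" */
    int  type[MAX_ATTRS];   /* BIB-1 attribute type, e.g. 1 = use */
    int  value[MAX_ATTRS];  /* BIB-1 attribute value */
    int  count;             /* number of type/value pairs */
};

/* Map a type field (letter or number) to a numeric type; 0 if invalid. */
static int type_number(const char *s)
{
    static const char letters[] = "urpstc";     /* types 1..6, in order */
    char *end;
    long n;
    if (s[0] != '\0' && s[1] == '\0') {
        const char *hit = strchr(letters, s[0]);
        if (hit)
            return (int) (hit - letters) + 1;
    }
    n = strtol(s, &end, 10);
    return (end != s && *end == '\0' && n > 0) ? (int) n : 0;
}

/* Parse "name type=val type=val ..." into q. Returns 1 on success. */
int parse_qual_line(const char *line, struct qual_def *q)
{
    char copy[256];
    char *word;
    assert(line != NULL && q != NULL);
    if (strlen(line) >= sizeof copy)
        return 0;
    strcpy(copy, line);                 /* strtok modifies its input */
    word = strtok(copy, " \t\r\n");
    if (!word || strlen(word) >= sizeof q->name)
        return 0;
    strcpy(q->name, word);
    q->count = 0;
    while ((word = strtok(NULL, " \t\r\n")) != NULL) {
        char *eq = strchr(word, '=');
        char *end;
        long val;
        int type;
        if (!eq || q->count == MAX_ATTRS)
            return 0;
        *eq = '\0';                     /* split "type=val" in place */
        type = type_number(word);
        val = strtol(eq + 1, &end, 10);
        if (type == 0 || end == eq + 1 || *end != '\0')
            return 0;
        q->type[q->count] = type;
        q->value[q->count] = (int) val;
        q->count++;
    }
    return 1;
}
```

With these assumptions, "ti u=4 s=1" yields the pairs 1=4 and 4=1, and "term s=105" yields the single pair 4=105.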
<sect3><title>CCL API</title>
All public definitions can be found in the header file
<filename>ccl.h</filename>. A profile identifier is of type
<literal>CCL_bibset</literal>. A profile must be created with a call to
the function <function>ccl_qual_mk</function>, which returns a profile
handle of type <literal>CCL_bibset</literal>.
To read a file containing qualifier definitions the function
<function>ccl_qual_file</function> may be convenient. This function takes
a pointer to an already opened <literal>FILE</literal> as argument
along with a <literal>CCL_bibset</literal> handle.
To parse a simple string with a FIND query use the function
struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str,
    int *error, int *pos);
which takes the CCL profile (<literal>bibset</literal>) and query
(<literal>str</literal>) as input. Upon successful completion the RPN
tree is returned. If an error occurs, such as a syntax error, the integer
pointed to by <literal>error</literal> holds the error code and
<literal>pos</literal> holds the offset inside the query string in which
An English description of the error may be obtained by calling
the <literal>ccl_err_msg</literal> function. The error codes are listed in
<filename>ccl.h</filename>.
To convert the CCL RPN tree (type <literal>struct ccl_rpn_node *</literal>)
to the Z_RPNQuery of YAZ the function <function>ccl_rpn_query</function>
must be used. This function, which is part of YAZ, is implemented in
<filename>yaz-ccl.c</filename>.
After calling this function the CCL RPN tree is probably no longer
needed. The <literal>ccl_rpn_delete</literal> function destroys the CCL
RPN tree.
A CCL profile may be destroyed by calling the <function>ccl_qual_rm</function>
The token names for the CCL operators may be changed by setting the
globals (all of type <literal>char *</literal>)
<literal>ccl_token_and</literal>, <literal>ccl_token_or</literal>,
<literal>ccl_token_not</literal> and <literal>ccl_token_set</literal>.
An operator may have aliases, i.e. there may be more than one name for
the operator. To do this, separate each alias with a space character.
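The alias convention can be illustrated with a few lines of C. The sketch below is not taken from the CCL parser; it only shows one way a word could be compared against a space-separated alias string such as "and &" (an illustrative value). The function name token_matches is hypothetical, and the comparison is assumed to be case-sensitive, which the real parser may or may not be.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if word equals one of the space-separated names in aliases,
 * otherwise 0. An empty word never matches. */
int token_matches(const char *aliases, const char *word)
{
    size_t wlen;
    const char *p = aliases;
    assert(aliases != NULL && word != NULL);
    wlen = strlen(word);
    if (wlen == 0)
        return 0;
    while (*p) {
        size_t len;
        while (*p == ' ')               /* skip separators */
            p++;
        len = strcspn(p, " ");          /* length of the next alias */
        if (len == wlen && memcmp(p, word, len) == 0)
            return 1;
        p += len;
    }
    return 0;
}
```

With the alias string "and &", both "and" and "&" match, while "an" does not.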
<sect1><title>Object Identifiers</title>
The basic YAZ representation of an OID is an array of integers,
terminated with the value -1. The &odr; module provides two
utility functions to create and copy this type of data element:
Odr_oid *odr_getoidbystr(ODR o, char *str);
Creates an OID based on a string-based representation using dots (.)
to separate elements in the OID.
Odr_oid *odr_oiddup(ODR odr, Odr_oid *o);
Creates a copy of the OID referenced by the <emphasis>o</emphasis> parameter.
Both functions take an &odr; stream as parameter. This stream is used to
allocate memory for the data elements, which is released on a
subsequent call to <function>odr_reset()</function> on that stream.
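The relationship between the dotted string form and the integer array terminated by -1 can be sketched without any YAZ dependency. The function below is an illustration only: unlike odr_getoidbystr() it writes into a buffer supplied by the caller instead of allocating from an ODR stream, and the name sketch_oid_from_str and its max parameter are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a dotted OID string such as "1.2.840.10003.5.10" into dst as
 * integers followed by the terminator -1. dst must have room for max
 * ints, terminator included. Returns dst, or NULL if the string is
 * malformed or does not fit. Arcs too large for int are not detected. */
int *sketch_oid_from_str(const char *str, int *dst, int max)
{
    int n = 0;
    const char *p = str;
    assert(str != NULL && dst != NULL);
    for (;;) {
        char *end;
        long v;
        if (*p < '0' || *p > '9')
            return NULL;            /* every element starts with a digit */
        v = strtol(p, &end, 10);
        if (n + 1 >= max)
            return NULL;            /* no room for value plus terminator */
        dst[n++] = (int) v;
        if (*end == '\0')
            break;                  /* last element read */
        if (*end != '.')
            return NULL;            /* only dots may separate elements */
        p = end + 1;
    }
    dst[n] = -1;
    return dst;
}
```

Given "1.2.840.10003.5.10" and a buffer of eight integers, the sketch stores 1, 2, 840, 10003, 5, 10 and then -1.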
The OID module provides a higher-level representation of the
family of object identifiers which describe the Z39.50 protocol and its
related objects. The definition of the module interface is given in
the <filename>oid.h</filename> file.
The interface is mainly based on the <literal>oident</literal> structure. The
definition of this structure looks like this:
typedef struct oident
int oidsuffix[OID_SIZE];
The proto field takes one of the values
If you don't care about talking to SR-based implementations (few
exist, and they may become fewer still if and when the ISO SR and ANSI
Z39.50 documents are merged into a single standard), you can ignore
this field on incoming packages, and always set it to PROTO_Z3950
for outgoing packages.
The oclass field takes one of the values
corresponding to the OID classes defined by the Z39.50 standard.
Finally, the value field takes one of the values
again, corresponding to the specific OIDs defined by the standard.
The desc field contains a brief, mnemonic name for the OID in question.
struct oident *oid_getentbyoid(int *o);
takes as argument an OID, and returns a pointer to a static area
containing an <literal>oident</literal> structure. You typically use
this function when you receive a PDU containing an OID, and you wish
to branch out depending on the specific OID value.
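The kind of lookup described here can be pictured as a search through a table of known OIDs. The sketch below is not the oident database: the table holds only two well-known Z39.50 OIDs, the structure sketch_entry and the function sketch_describe are hypothetical, and the descriptive strings are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_entry {
    int oid[8];             /* elements followed by the terminator -1 */
    const char *desc;       /* illustrative description */
};

static const struct sketch_entry sketch_table[] = {
    { {1, 2, 840, 10003, 3, 1, -1},  "BIB-1 attribute set" },
    { {1, 2, 840, 10003, 5, 10, -1}, "USMARC record syntax" },
};

/* Return the description of a known OID, or NULL if it is not listed. */
const char *sketch_describe(const int *oid)
{
    size_t i;
    assert(oid != NULL);
    for (i = 0; i < sizeof sketch_table / sizeof sketch_table[0]; i++) {
        const int *a = sketch_table[i].oid;
        const int *b = oid;
        while (*a == *b && *a >= 0) {   /* walk while elements agree */
            a++;
            b++;
        }
        if (*a < 0 && *b < 0)           /* both ended together: a match */
            return sketch_table[i].desc;
    }
    return NULL;
}
```

An application would typically branch on the result of such a lookup, for example to select a decoder for the record syntax that was received.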
int *oid_ent_to_oid(struct oident *ent, int *dst);
Takes as argument an <literal>oident</literal> structure - in which
the <literal>proto</literal>, <literal>oclass</literal>, and
<literal>value</literal> fields are assumed to be set correctly -
and returns a pointer to the buffer given by <literal>dst</literal>,
holding the
representation of the corresponding OID. The function returns
NULL and the array dst is unchanged if the mapping could not take place.
The array <literal>dst</literal> should be at least of size
<literal>OID_SIZE</literal>.
The <function>oid_ent_to_oid()</function> function can be used whenever
you need to prepare a PDU containing one or more OIDs. The separation of
the <literal>protocol</literal> element from the remainder of the
OID-description makes it simple to write applications that can
communicate with either Z39.50 or OSI SR-based applications.
oid_value oid_getvalbyname(const char *name);
takes as argument a mnemonic OID name, and returns the
<literal>value</literal> field of the first entry in the database that
contains the given name in its <literal>desc</literal> field.
Finally, the module provides the following utility functions, whose
meaning should be obvious:
void oid_oidcpy(int *t, int *s);
void oid_oidcat(int *t, int *s);
int oid_oidcmp(int *o1, int *o2);
int oid_oidlen(int *o);
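For readers who prefer code to prose, the following sketch shows what such helpers can look like for integer arrays terminated by -1. These are illustrative re-implementations under hypothetical names (sketch_oidlen and so on), not the YAZ source; in particular, the sign convention of the comparison result is an assumption, with only the zero-for-equal case taken as given.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements before the -1 terminator. */
int sketch_oidlen(const int *o)
{
    int len = 0;
    assert(o != NULL);
    while (o[len] >= 0)
        len++;
    return len;
}

/* Copy s, terminator included, into t. t must be large enough. */
void sketch_oidcpy(int *t, const int *s)
{
    assert(t != NULL && s != NULL);
    while ((*t++ = *s++) >= 0)
        ;                           /* stops after copying the -1 */
}

/* Append s to the OID already stored in t. t must be large enough. */
void sketch_oidcat(int *t, const int *s)
{
    sketch_oidcpy(t + sketch_oidlen(t), s);
}

/* Compare two OIDs element by element. Returns 0 if they are equal,
 * -1 if a sorts before b (a shorter prefix sorts first), 1 otherwise. */
int sketch_oidcmp(const int *a, const int *b)
{
    assert(a != NULL && b != NULL);
    while (*a == *b && *a >= 0) {
        a++;
        b++;
    }
    if (*a == *b)
        return 0;                   /* both reached the terminator */
    return *a < *b ? -1 : 1;
}
```

Because the terminator is part of the array, the caller is responsible for sizing the destination of the copy and append helpers, which is presumably why the text above asks for buffers of at least OID_SIZE elements.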
The OID module has been criticized - and perhaps rightly so
- for needlessly abstracting the
representation of OIDs. Other toolkits use a simple
string-representation of OIDs with good results. In practice, we have
found the interface comfortable and quick to work with, and it is a
simple matter (for what it's worth) to create applications compatible with
both ISO SR and Z39.50. Finally, the use of the <literal>oident</literal>
database is by no means mandatory. You can easily create your
own system for representing OIDs, as long as it is compatible with the
low-level integer-array representation of the ODR module.
<sect1><title>Nibble Memory</title>
Sometimes when you need to allocate and construct a large,
interconnected complex of structures, it can be a bit of a pain to
release the associated memory again. For the structures describing the
Z39.50 PDUs and related structures, it is convenient to use the
memory-management system of the &odr; subsystem (see
<link linkend="odr-use">Using ODR</link>). However, in some circumstances
where you might otherwise benefit from using a simple nibble memory
management system, it may be impractical to use
<function>odr_malloc()</function> and <function>odr_reset()</function>.
For this purpose, the memory manager which also supports the &odr; streams
is made available in the NMEM module. The external interface to this
module is given in the <filename>nmem.h</filename> file.
The following prototypes are given:
NMEM nmem_create(void);
void nmem_destroy(NMEM n);
void *nmem_malloc(NMEM n, int size);
void nmem_reset(NMEM n);
int nmem_total(NMEM n);
void nmem_init(void);
The <function>nmem_create()</function> function returns a pointer to a
memory control handle, which can be released again by
<function>nmem_destroy()</function> when no longer needed.
The function <function>nmem_malloc()</function> allocates a block of
memory of the requested size. A call to <function>nmem_reset()</function> or
<function>nmem_destroy()</function> will release all memory allocated on
the handle since it was created (or since the last call to
<function>nmem_reset()</function>). The function
<function>nmem_total()</function> returns the number of bytes currently
allocated on the handle.
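The idea behind such a handle can be sketched as a small arena allocator: individual allocations are carved out of larger blocks, and the whole lot is freed in one step. The code below is an illustration under hypothetical names (sketch_nmem, sketch_malloc and so on), not the NMEM implementation; the block size of 4096 bytes is arbitrary, the total counts requested bytes rather than bytes obtained from the system, and no locking is shown.

```c
#include <assert.h>
#include <stdlib.h>

/* Union used to keep every returned pointer suitably aligned. */
typedef union { long double ld; void *p; long long ll; } sketch_align;

struct sketch_block {
    struct sketch_block *next;
    size_t used;                /* bytes handed out from this block */
    size_t size;                /* payload capacity in bytes */
    sketch_align data[];        /* payload (C99 flexible array member) */
};

struct sketch_nmem {
    struct sketch_block *blocks;    /* most recent block first */
    size_t total;                   /* bytes requested since create/reset */
};

#define SKETCH_BLOCK 4096           /* arbitrary default block size */

struct sketch_nmem *sketch_create(void)
{
    return calloc(1, sizeof(struct sketch_nmem));
}

void *sketch_malloc(struct sketch_nmem *n, size_t size)
{
    struct sketch_block *b;
    size_t need;
    assert(n != NULL);
    b = n->blocks;
    /* Round the request up to a multiple of the alignment unit. */
    need = (size + sizeof(sketch_align) - 1)
           / sizeof(sketch_align) * sizeof(sketch_align);
    if (!b || b->size - b->used < need) {
        size_t cap = need > SKETCH_BLOCK ? need : SKETCH_BLOCK;
        b = malloc(sizeof *b + cap);
        if (!b)
            return NULL;
        b->next = n->blocks;
        b->used = 0;
        b->size = cap;
        n->blocks = b;
    }
    b->used += need;
    n->total += size;
    return (char *) b->data + (b->used - need);
}

/* Release every block at once; the handle itself stays usable. */
void sketch_reset(struct sketch_nmem *n)
{
    assert(n != NULL);
    while (n->blocks) {
        struct sketch_block *next = n->blocks->next;
        free(n->blocks);
        n->blocks = next;
    }
    n->total = 0;
}

size_t sketch_total(const struct sketch_nmem *n)
{
    assert(n != NULL);
    return n->total;
}

void sketch_destroy(struct sketch_nmem *n)
{
    if (n) {
        sketch_reset(n);
        free(n);
    }
}
```

The design choice worth noting is that there is no per-allocation free: pointers returned by the allocation function stay valid until the next reset or destroy on the handle, which is what makes building large interconnected structures convenient.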
The nibble memory pool is shared amongst threads. POSIX
mutexes and WIN32 Critical Sections are used to keep the
module thread safe. On WIN32, the function <function>nmem_init()</function>
initialises the Critical Section handle and should be called once before any
other nmem function is used.