1 <!-- $Id: tools.xml,v 1.2 2001-07-19 23:29:40 adam Exp $ -->
2 <chapter><title>Supporting Tools</title>
In support of the service API (primarily the ASN module, which
provides the programmatic interface to the Z39.50 APDUs), YAZ contains
a collection of tools that support the development of applications.
10 <sect1><title>Query Syntax Parsers</title>
13 Since the type-1 (RPN) query structure has no direct, useful string
14 representation, every origin application needs to provide some form of
15 mapping from a local query notation or representation to a
16 <token>Z_RPNQuery</token> structure. Some programmers will prefer to
17 construct the query manually, perhaps using
18 <function>odr_malloc()</function> to simplify memory management.
19 The &yaz; distribution includes two separate, query-generating tools
20 that may be of use to you.
23 <sect2><title id="PQF">Prefix Query Format</title>
26 Since RPN or reverse polish notation is really just a fancy way of
27 describing a suffix notation format (operator follows operands), it
28 would seem that the confusion is total when we now introduce a prefix
29 notation for RPN. The reason is one of simple laziness - it's somewhat
30 simpler to interpret a prefix format, and this utility was designed
31 for maximum simplicity, to provide a baseline representation for use
32 in simple test applications and scripting environments (like Tcl). The
33 demonstration client included with YAZ uses the PQF.
The PQF is defined by the pquery module in the YAZ library. The
<filename>pquery.h</filename> file declares the following functions:
Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);

Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
                                    Odr_oid **attributeSetP, const char *qbuf);

int p_query_attset (const char *arg);
The function <function>p_query_rpn()</function> takes as arguments an
&odr; stream (see section <link linkend="odr">The ODR Module</link>)
to provide a memory source (the structure created is released on
the next call to <function>odr_reset()</function> on the stream), a
protocol identifier (one of the constants <token>PROTO_Z3950</token> and
<token>PROTO_SR</token>), and finally a null-terminated string holding
the query.
59 If the parse went well, <function>p_query_rpn()</function> returns a
60 pointer to a <literal>Z_RPNQuery</literal> structure which can be
61 placed directly into a <literal>Z_SearchRequest</literal>.
The function <function>p_query_attset()</function> specifies which attribute set
to use if the query doesn't specify one with the
<literal>@attrset</literal> operator.
It returns 0 if the argument is a
valid attribute set specifier; otherwise it returns -1.
73 The grammar of the PQF is as follows:
Query ::= [ AttSet ] QueryStruct.
QueryStruct ::= { Attribute } Simple | Complex.
Attribute ::= '@attr' AttributeType '=' AttributeValue.
AttributeType ::= integer.
AttributeValue ::= integer.
Complex ::= Operator QueryStruct QueryStruct.
Operator ::= '@and' | '@or' | '@not' | '@prox' Proximity.
Simple ::= ResultSet | Term.
ResultSet ::= '@set' string.
Term ::= string | '"' string '"'.
Proximity ::= Exclusion Distance Ordered Relation WhichCode UnitCode.
Exclusion ::= '1' | '0' | 'void'.
Distance ::= integer.
Ordered ::= '1' | '0'.
Relation ::= integer.
WhichCode ::= 'known' | 'private' | integer.
UnitCode ::= integer.
You will note that the syntax above is a fairly faithful
representation of RPN, except for the Attribute, which has been
moved a step away from the term, allowing you to associate one or more
attributes with an entire query structure. The parser will
automatically apply the given attributes to each term as required.
123 The following are all examples of valid queries in the PQF.
@or "dylan" "zimmerman"
@or @and bob dylan @set Result-1
@attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
@attr 4=1 @attr 1=4 "self portrait"
@prox 0 3 1 2 k 2 dylan zimmerman
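The grammar above is small enough to check with a recursive-descent routine. The following is a minimal sketch of such a checker, not the pquery implementation: the names <literal>pqf_lex</literal>, <literal>pqf_next</literal>, <literal>pqf_is_attr</literal>, <literal>pqf_struct</literal> and <literal>pqf_valid</literal> are hypothetical. It assumes that attributes may precede any query structure (as the examples show), it ignores the optional attribute set prefix, and it only counts the six proximity fields without validating them.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical lexer state for this sketch; not a YAZ type. */
struct pqf_lex {
    const char *p;   /* current read position in the query string */
    char tok[128];   /* text of the most recent token, quotes removed */
    int quoted;      /* nonzero if that token was written in double quotes */
};

/* Fetch the next token: 1 = token read, 0 = end of input, -1 = error. */
static int pqf_next(struct pqf_lex *lx)
{
    size_t n = 0;

    while (isspace((unsigned char)*lx->p))
        lx->p++;
    if (*lx->p == '\0')
        return 0;
    lx->quoted = (*lx->p == '"');
    if (lx->quoted)
        lx->p++;
    while (*lx->p != '\0' &&
           (lx->quoted ? *lx->p != '"' : !isspace((unsigned char)*lx->p))) {
        if (n + 1 >= sizeof(lx->tok))
            return -1;                 /* token too long for this sketch */
        lx->tok[n++] = *lx->p++;
    }
    if (lx->quoted) {
        if (*lx->p != '"')
            return -1;                 /* unterminated quoted term */
        lx->p++;
    }
    lx->tok[n] = '\0';
    return 1;
}

/* AttributeType '=' AttributeValue, both integers, e.g. "4=1". */
static int pqf_is_attr(const char *s)
{
    const char *eq = strchr(s, '=');
    const char *c;

    if (eq == NULL || eq == s || eq[1] == '\0')
        return 0;
    for (c = s; *c != '\0'; c++)
        if (c != eq && !isdigit((unsigned char)*c))
            return 0;
    return 1;
}

/* QueryStruct: consume exactly one query structure; 1 on success. */
static int pqf_struct(struct pqf_lex *lx)
{
    int i;

    if (pqf_next(lx) != 1)
        return 0;
    while (!lx->quoted && strcmp(lx->tok, "@attr") == 0) {
        if (pqf_next(lx) != 1 || lx->quoted || !pqf_is_attr(lx->tok))
            return 0;
        if (pqf_next(lx) != 1)
            return 0;
    }
    if (lx->quoted)
        return 1;                            /* quoted Term */
    if (strcmp(lx->tok, "@prox") == 0) {
        for (i = 0; i < 6; i++)              /* six Proximity fields, not validated */
            if (pqf_next(lx) != 1)
                return 0;
        return pqf_struct(lx) && pqf_struct(lx);
    }
    if (!strcmp(lx->tok, "@and") || !strcmp(lx->tok, "@or") ||
        !strcmp(lx->tok, "@not"))
        return pqf_struct(lx) && pqf_struct(lx);
    if (strcmp(lx->tok, "@set") == 0)
        return pqf_next(lx) == 1;            /* ResultSet name */
    return lx->tok[0] != '@';                /* bare Term; unknown operators rejected */
}

/* Returns 1 if the whole string is one query structure, otherwise 0. */
int pqf_valid(const char *query)
{
    struct pqf_lex lx;

    assert(query != NULL);
    lx.p = query;
    lx.tok[0] = '\0';
    lx.quoted = 0;
    return pqf_struct(&lx) && pqf_next(&lx) == 0;
}
```

Calling <literal>pqf_valid()</literal> on each of the example queries above accepts them, while a query such as <literal>@and dylan</literal> is rejected because the operator is missing its second operand.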
145 <sect2><title id="CCL">Common Command Language</title>
Not all users enjoy typing in prefix query structures and numerical
attribute values, even in a minimalistic test client. In the library
world, the more intuitive Common Command Language (or ISO 8777) has
enjoyed some popularity, especially before the widespread
availability of graphical interfaces. It is still useful in
applications that need to provide a
symbolic language for expressing boolean query structures.
The EUROPAGATE research project working under the Libraries programme
of the European Commission's DG XIII has, amongst other useful tools,
implemented a general-purpose CCL parser which produces an output
structure that can be trivially converted to the internal RPN
representation of YAZ (the <literal>Z_RPNQuery</literal> structure).
Since the CCL utility, along with the rest of the software
produced by EUROPAGATE, is made freely available under a liberal license, it
is included as a supplement to YAZ.
168 <sect3><title>CCL Syntax</title>
The CCL parser obeys the following grammar for the FIND argument.
The syntax is annotated by comments on the lines prefixed by
<literal>--</literal>.
CCL-Find ::= CCL-Find Op Elements
Op ::= "and" | "or" | "not"
-- The above means that Elements are separated by boolean operators.
Elements ::= '(' CCL-Find ')'
          | Qualifiers Relation Terms
          | Qualifiers Relation '(' CCL-Find ')'
          | Qualifiers '=' string '-' string
-- Elements is either a recursive definition, a result set reference, a
-- list of terms, qualifiers followed by terms, qualifiers followed
-- by a recursive definition or qualifiers in a range (lower - upper).
Set ::= 'set' = string
-- Reference to a result set
Terms ::= Terms Prox Term
-- Proximity of terms.
-- This basically means that a term may include a blank
Qualifiers ::= Qualifiers ',' string
-- Qualifiers is a list of strings separated by comma
Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<'
-- Relational operators. This really doesn't follow the ISO8777
-- Proximity operator
218 The following queries are all valid:
(dylan and bob) or set=1
234 Assuming that the qualifiers <literal>ti</literal>, <literal>au</literal>
235 and <literal>date</literal> are defined we may use:
au=(bob dylan and slow train coming)
date>1980 and (ti=((self portrait)))
248 <sect3><title>CCL Qualifiers</title>
Qualifiers are used to direct the search to a particular searchable
index, such as the title (ti) and author (au) indexes. The CCL standard
itself doesn't specify a particular set of qualifiers, but it does
suggest a few short-hand notations. You can customize the CCL parser
to support a particular set of qualifiers to reflect the current target
profile. Traditionally, a qualifier would map to a particular
use-attribute within the BIB-1 attribute set. However, you could also
258 define qualifiers that would set, for example, the
Consider a scenario where the target supports ranked searches in the
title index. In this case, the user could specify
ti,ranked=knuth computer
Here <literal>ranked</literal> would map to structure=free-form-text
(4=105) and <literal>ti</literal> would map to title (1=4).
A "profile" with a set of predefined CCL qualifiers can be read from a
file. The YAZ client reads its CCL qualifiers from a file named
<filename>default.bib</filename>. Each line in the file has the form:
<replaceable>qualifier-name</replaceable>
<replaceable>type</replaceable>=<replaceable>val</replaceable>
<replaceable>type</replaceable>=<replaceable>val</replaceable> ...
where <replaceable>qualifier-name</replaceable> is the name of the
qualifier to be used (e.g. <literal>ti</literal>),
<replaceable>type</replaceable> is a BIB-1 category type and
<replaceable>val</replaceable> is the corresponding BIB-1 attribute
value.
The <replaceable>type</replaceable> is either numeric or one of the letters
<literal>u</literal> (use), <literal>r</literal> (relation),
<literal>p</literal> (position), <literal>s</literal> (structure),
<literal>t</literal> (truncation) or <literal>c</literal> (completeness).
The <replaceable>qualifier-name</replaceable> <literal>term</literal>
has a special meaning:
the types and values for this definition are used when
<emphasis>no</emphasis> qualifiers are present.
304 Consider the following definition:
313 Two qualifiers are defined, <literal>ti</literal> and
314 <literal>au</literal>.
315 They both set the structure-attribute to phrase (1).
316 <literal>ti</literal>
317 sets the use-attribute to 4. <literal>au</literal> sets the
319 When no qualifiers are used in the query the structure-attribute is
320 set to free-form-text (105).
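To make the line format concrete, here is a minimal sketch of how one qualifier line could be split into attribute type/value pairs. It is not the parser used by <function>ccl_qual_file</function>; the names <literal>qual_def</literal>, <literal>attr_type</literal> and <literal>parse_qual_line</literal> are hypothetical. The sketch assumes the usual BIB-1 numbering of attribute types (use=1, relation=2, position=3, structure=4, truncation=5, completeness=6) for the one-letter forms.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ATTRS 8   /* arbitrary limit chosen for this sketch */

/* Hypothetical in-memory form of one qualifier line; not a ccl.h type. */
struct qual_def {
    char name[32];
    int n;                  /* number of type=val pairs stored */
    int type[MAX_ATTRS];    /* BIB-1 attribute type, e.g. 1 for use */
    int value[MAX_ATTRS];   /* BIB-1 attribute value, e.g. 4 for title */
};

/* Map "u", "r", "p", "s", "t", "c" or a positive number to a type; -1 on error. */
static int attr_type(const char *s)
{
    static const char letters[] = "urpstc";
    const char *hit;
    char *end;
    long v;

    if (s[0] != '\0' && s[1] == '\0' && (hit = strchr(letters, s[0])) != NULL)
        return (int)(hit - letters) + 1;
    v = strtol(s, &end, 10);
    if (end != s && *end == '\0' && v > 0)
        return (int)v;
    return -1;
}

/* Parse "name type=val type=val ..."; returns 0 on success, -1 on error. */
int parse_qual_line(const char *line, struct qual_def *q)
{
    char buf[256];
    char *tok, *eq, *end;

    assert(line != NULL && q != NULL);
    if (strlen(line) >= sizeof(buf))
        return -1;
    strcpy(buf, line);
    tok = strtok(buf, " \t\r\n");
    if (tok == NULL || strlen(tok) >= sizeof(q->name))
        return -1;
    strcpy(q->name, tok);
    q->n = 0;
    while ((tok = strtok(NULL, " \t\r\n")) != NULL) {
        long v;
        int t;

        if (q->n == MAX_ATTRS || (eq = strchr(tok, '=')) == NULL)
            return -1;
        *eq = '\0';                       /* split "type=val" in place */
        t = attr_type(tok);
        v = strtol(eq + 1, &end, 10);
        if (t < 0 || end == eq + 1 || *end != '\0')
            return -1;
        q->type[q->n] = t;
        q->value[q->n] = (int)v;
        q->n++;
    }
    return 0;
}
```

With this sketch, the line <literal>ti u=4 s=1</literal> yields the pairs (1,4) and (4,1), that is, use=title and structure=phrase, which matches the <literal>ti</literal> qualifier described above.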
324 <sect3><title>CCL API</title>
All public definitions can be found in the header file
<filename>ccl.h</filename>. A profile is identified by a handle of type
<literal>CCL_bibset</literal>, which is created by calling
the function <function>ccl_qual_mk</function>.
To read a file containing qualifier definitions, the function
<function>ccl_qual_file</function> may be convenient. This function
takes a pointer to an already opened <literal>FILE</literal> as
argument, along with a <literal>CCL_bibset</literal> handle.
341 To parse a simple string with a FIND query use the function
struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str,
                                   int *error, int *pos);
which takes the CCL profile (<literal>bibset</literal>) and query
(<literal>str</literal>) as input. Upon successful completion the RPN
tree is returned. If an error occurs, such as a syntax error, the integer
pointed to by <literal>error</literal> holds the error code and
<literal>pos</literal> holds the offset within the query string at which
the error was detected.
An English description of the error may be obtained by calling
the <function>ccl_err_msg</function> function. The error codes are
listed in <filename>ccl.h</filename>.
To convert the CCL RPN tree (type
<literal>struct ccl_rpn_node *</literal>)
to the <literal>Z_RPNQuery</literal> of YAZ, use the function
<function>ccl_rpn_query</function>. This function, which is part of YAZ,
is implemented in <filename>yaz-ccl.c</filename>.
After calling this function the CCL RPN tree is probably no longer
needed. The function <function>ccl_rpn_delete</function> destroys the CCL RPN tree.
373 A CCL profile may be destroyed by calling the
374 <function>ccl_qual_rm</function> function.
378 The token names for the CCL operators may be changed by setting the
379 globals (all type <literal>char *</literal>)
380 <literal>ccl_token_and</literal>, <literal>ccl_token_or</literal>,
381 <literal>ccl_token_not</literal> and <literal>ccl_token_set</literal>.
382 An operator may have aliases, i.e. there may be more than one name for
383 the operator. To do this, separate each alias with a space character.
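The alias convention can be illustrated with a small matcher. This is a sketch only, not the code used by the CCL parser: the helper name <literal>alias_match</literal> and the alias list <literal>"and und"</literal> are hypothetical, and the comparison is case-sensitive for brevity (whether the real parser folds case is not covered here).

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if `word` equals one of the space-separated names in
 * `aliases`, otherwise 0. Hypothetical helper for illustration.
 */
int alias_match(const char *aliases, const char *word)
{
    size_t wlen;
    const char *p = aliases;

    assert(aliases != NULL && word != NULL);
    wlen = strlen(word);
    if (wlen == 0)
        return 0;
    while (*p != '\0') {
        size_t len;

        while (*p == ' ')            /* skip separators between aliases */
            p++;
        len = strcspn(p, " ");       /* length of the current alias */
        if (len == wlen && memcmp(p, word, len) == 0)
            return 1;
        p += len;
    }
    return 0;
}
```

Given an alias list such as <literal>"and und"</literal>, both <literal>and</literal> and <literal>und</literal> match, while a prefix such as <literal>an</literal> does not.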
388 <sect1><title>Object Identifiers</title>
The basic YAZ representation of an OID is an array of integers,
terminated with the value -1. The &odr; module provides two
utility functions to create and copy this type of data element:
Odr_oid *odr_getoidbystr(ODR o, char *str);
401 Creates an OID based on a string-based representation using dots (.)
402 to separate elements in the OID.
Odr_oid *odr_oiddup(ODR odr, Odr_oid *o);
Creates a copy of the OID referenced by the <emphasis>o</emphasis> parameter.
412 Both functions take an &odr; stream as parameter. This stream is used to
413 allocate memory for the data elements, which is released on a
414 subsequent call to <function>odr_reset()</function> on that stream.
The OID module provides a higher-level representation of the
family of object identifiers which describe the Z39.50 protocol and its
related objects. The definition of the module interface is given in
the <filename>oid.h</filename> file.
425 The interface is mainly based on the <literal>oident</literal> structure.
426 The definition of this structure looks like this:
typedef struct oident
    int oidsuffix[OID_SIZE];
441 The proto field takes one of the values
450 If you don't care about talking to SR-based implementations (few
451 exist, and they may become fewer still if and when the ISO SR and ANSI
452 Z39.50 documents are merged into a single standard), you can ignore
453 this field on incoming packages, and always set it to PROTO_Z3950
454 for outgoing packages.
458 The oclass field takes one of the values
480 corresponding to the OID classes defined by the Z39.50 standard.
482 Finally, the value field takes one of the values
540 again, corresponding to the specific OIDs defined by the standard.
544 The desc field contains a brief, mnemonic name for the OID in question.
struct oident *oid_getentbyoid(int *o);
556 takes as argument an OID, and returns a pointer to a static area
557 containing an <literal>oident</literal> structure. You typically use
558 this function when you receive a PDU containing an OID, and you wish
559 to branch out depending on the specific OID value.
int *oid_ent_to_oid(struct oident *ent, int *dst);
Takes as argument an <literal>oident</literal> structure, in which
the <literal>proto</literal>, <literal>oclass</literal>, and
<literal>value</literal> fields are assumed to be set correctly,
and returns a pointer to the buffer given by <literal>dst</literal>,
which then holds the
representation of the corresponding OID. The function returns
NULL and leaves the array <literal>dst</literal> unchanged if no mapping
could be made.
The array <literal>dst</literal> should be at least of size
<literal>OID_SIZE</literal>.
The <function>oid_ent_to_oid()</function> function can be used whenever
you need to prepare a PDU containing one or more OIDs. The separation of
the <literal>protocol</literal> element from the remainder of the
OID-description makes it simple to write applications that can
communicate with either Z39.50 or OSI SR-based applications.
oid_value oid_getvalbyname(const char *name);
takes as argument a mnemonic OID name, and returns the
<literal>value</literal> field of the first entry in the database that
contains the given name in its <literal>desc</literal> field.
605 Finally, the module provides the following utility functions, whose
606 meaning should be obvious:
void oid_oidcpy(int *t, int *s);
void oid_oidcat(int *t, int *s);
int oid_oidcmp(int *o1, int *o2);
int oid_oidlen(int *o);
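For readers who prefer to see the semantics spelled out, here is one plausible reading of these four utilities over -1 terminated arrays. The <literal>sketch_</literal> names are hypothetical and this is not the YAZ source; in particular, the ordering returned by the comparison (negative, zero, positive) is an assumption, and the destination arrays are assumed to be large enough.

```c
#include <assert.h>

/* Number of arcs before the -1 terminator. */
int sketch_oidlen(const int *o)
{
    int n = 0;

    assert(o != 0);
    while (o[n] >= 0)
        n++;
    return n;
}

/* Copy s to t, terminator included; t must have room. */
void sketch_oidcpy(int *t, const int *s)
{
    while (*s >= 0)
        *t++ = *s++;
    *t = -1;
}

/* Append the arcs of s to the OID already stored in t. */
void sketch_oidcat(int *t, const int *s)
{
    sketch_oidcpy(t + sketch_oidlen(t), s);
}

/* 0 if equal; otherwise the sign of the first differing element. */
int sketch_oidcmp(const int *o1, const int *o2)
{
    while (*o1 == *o2 && *o1 >= 0) {
        o1++;
        o2++;
    }
    if (*o1 == *o2)
        return 0;
    return *o1 > *o2 ? 1 : -1;
}
```

Because the terminator -1 is smaller than any valid arc, a shorter OID that is a prefix of a longer one compares as smaller in this sketch.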
The OID module has been criticized, and perhaps rightly so,
for needlessly abstracting the
representation of OIDs. Other toolkits use a simple
string-representation of OIDs with good results. In practice, we have
found the interface comfortable and quick to work with, and it is a
simple matter (for what it's worth) to create applications compatible
with both ISO SR and Z39.50. Finally, the use of the
<literal>oident</literal> database is by no means mandatory.
You can easily create your own system for representing OIDs, as long
627 as it is compatible with the low-level integer-array representation
634 <sect1><title>Nibble Memory</title>
637 Sometimes when you need to allocate and construct a large,
638 interconnected complex of structures, it can be a bit of a pain to
639 release the associated memory again. For the structures describing the
640 Z39.50 PDUs and related structures, it is convenient to use the
641 memory-management system of the &odr; subsystem (see
642 <link linkend="odr-use">Using ODR</link>). However, in some circumstances
643 where you might otherwise benefit from using a simple nibble memory
644 management system, it may be impractical to use
645 <function>odr_malloc()</function> and <function>odr_reset()</function>.
646 For this purpose, the memory manager which also supports the &odr;
647 streams is made available in the NMEM module. The external interface
648 to this module is given in the <filename>nmem.h</filename> file.
652 The following prototypes are given:
NMEM nmem_create(void);
void nmem_destroy(NMEM n);
void *nmem_malloc(NMEM n, int size);
void nmem_reset(NMEM n);
int nmem_total(NMEM n);
void nmem_init(void);
The <function>nmem_create()</function> function returns a pointer to a
memory control handle, which can be released again by
<function>nmem_destroy()</function> when no longer needed.
The function <function>nmem_malloc()</function> allocates a block of
memory of the requested size. A call to <function>nmem_reset()</function>
or <function>nmem_destroy()</function> will release all memory allocated
on the handle since it was created (or since the last call to
<function>nmem_reset()</function>). The function
<function>nmem_total()</function> returns the number of bytes currently
allocated on the handle.
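The idea behind nibble memory can be shown with a small arena allocator. The sketch below is not the NMEM implementation: the <literal>arena_</literal> names and the 4096-byte block size are assumptions, blocks are returned to <function>free()</function> rather than kept in a shared pool, and no locking is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ARENA_BLOCK 4096   /* assumed default block size */

/* Allocation sizes are rounded up to this type's size to keep alignment. */
typedef union { long double ld; long long ll; void *p; } arena_align;

struct arena_block {
    struct arena_block *next;
    char *buf;             /* storage handed out in small pieces */
    size_t used, cap;
};

struct arena {
    struct arena_block *blocks;   /* most recent block first */
    size_t total;                 /* bytes requested since create/reset */
};

struct arena *arena_create(void)
{
    return calloc(1, sizeof(struct arena));
}

void *arena_malloc(struct arena *a, size_t size)
{
    struct arena_block *b;
    size_t need;
    void *p;

    assert(a != NULL);
    b = a->blocks;
    need = (size + sizeof(arena_align) - 1) / sizeof(arena_align)
           * sizeof(arena_align);
    if (b == NULL || b->cap - b->used < need) {
        size_t cap = need > ARENA_BLOCK ? need : ARENA_BLOCK;

        b = malloc(sizeof(*b));
        if (b == NULL)
            return NULL;
        b->buf = malloc(cap);
        if (b->buf == NULL) {
            free(b);
            return NULL;
        }
        b->cap = cap;
        b->used = 0;
        b->next = a->blocks;
        a->blocks = b;
    }
    p = b->buf + b->used;
    b->used += need;
    a->total += size;
    return p;
}

/* Release everything allocated on the handle; the handle stays usable. */
void arena_reset(struct arena *a)
{
    assert(a != NULL);
    while (a->blocks != NULL) {
        struct arena_block *next = a->blocks->next;

        free(a->blocks->buf);
        free(a->blocks);
        a->blocks = next;
    }
    a->total = 0;
}

size_t arena_total(const struct arena *a)
{
    return a->total;
}

void arena_destroy(struct arena *a)
{
    arena_reset(a);
    free(a);
}
```

The point of the design is that individual allocations are never freed one by one: a whole interconnected structure is built on one handle and then dropped with a single reset or destroy call.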
The nibble memory pool is shared among threads. POSIX
mutexes and WIN32 Critical Sections are used to keep the
module thread safe. On WIN32, the function <function>nmem_init()</function>
initialises the Critical Section handle and should be called once
before any other nmem function is used.
690 <!-- Keep this comment at the end of the file
695 sgml-minimize-attributes:nil
696 sgml-always-quote-attributes:t
699 sgml-parent-document: "yaz.xml"
700 sgml-local-catalogs: "../../docbook/docbook.cat"
701 sgml-namecase-general:t