<!-- $Id: tools.xml,v 1.11 2002-05-30 20:57:31 adam Exp $ -->
<chapter id="tools"><title>Supporting Tools</title>
In support of the service API (primarily the ASN module, which
provides the programmatic interface to the Z39.50 APDUs), &yaz; contains
a collection of tools that support the development of applications.
<sect1 id="tools.query"><title>Query Syntax Parsers</title>
Since the type-1 (RPN) query structure has no direct, useful string
representation, every origin application needs to provide some form of
mapping from a local query notation or representation to a
<token>Z_RPNQuery</token> structure. Some programmers will prefer to
construct the query manually, perhaps using
<function>odr_malloc()</function> to simplify memory management.
The &yaz; distribution includes two separate query-generating tools
that may be of use to you.
<sect2><title id="PQF">Prefix Query Format</title>
Since RPN or reverse polish notation is really just a fancy way of
describing a suffix notation format (operator follows operands), it
would seem that the confusion is total when we now introduce a prefix
notation for RPN. The reason is one of simple laziness - it's somewhat
simpler to interpret a prefix format, and this utility was designed
for maximum simplicity, to provide a baseline representation for use
in simple test applications and scripting environments (like Tcl). The
demonstration client included with YAZ uses the PQF.
The PQF is defined by the pquery module in the YAZ library. The
<filename>pquery.h</filename> file provides the declarations of the
following functions:

Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);

Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
                                    Odr_oid **attributeSetP, const char *qbuf);

int p_query_attset (const char *arg);
The function <function>p_query_rpn()</function> takes as arguments an
&odr; stream (see section <link linkend="odr">The ODR Module</link>)
to provide a memory source (the structure created is released on
the next call to <function>odr_reset()</function> on the stream), a
protocol identifier (one of the constants <token>PROTO_Z3950</token> or
<token>PROTO_SR</token>), and finally a null-terminated string
holding the query.
If the parse went well, <function>p_query_rpn()</function> returns a
pointer to a <literal>Z_RPNQuery</literal> structure which can be
placed directly into a <literal>Z_SearchRequest</literal>.
The function <function>p_query_attset()</function> specifies which
attribute set to use if the query doesn't specify one with the
<literal>@attrset</literal> operator.
It returns 0 if the argument is a valid attribute set specifier;
otherwise it returns -1.
The grammar of the PQF is as follows:

query ::= top-set query-struct.
top-set ::= [ '@attrset' string ]
query-struct ::= attr-spec | simple | complex
attr-spec ::= '@attr' [ string ] string query-struct
complex ::= operator query-struct query-struct.
operator ::= '@and' | '@or' | '@not' | '@prox' proximity.
simple ::= result-set | term.
result-set ::= '@set' string.
term ::= string.
proximity ::= exclusion distance ordered relation which-code unit-code.
exclusion ::= '1' | '0' | 'void'.
distance ::= integer.
ordered ::= '1' | '0'.
relation ::= integer.
which-code ::= 'known' | 'private' | integer.
unit-code ::= integer.
You will note that the syntax above is a fairly faithful
representation of RPN, except for the attribute specification, which
has been moved a step away from the term, allowing you to associate one
or more attributes with an entire query structure. The parser will
automatically apply the given attributes to each term as required.
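The attribute propagation described above can be illustrated with a minimal sketch of a recursive-descent parser for a subset of the PQF. It is not the pquery module: the names <function>pqf_to_text()</function> and <literal>struct pqf_parser</literal> are hypothetical, the sketch writes a readable infix string instead of building a <literal>Z_RPNQuery</literal>, it drops the optional attribute-set name of <literal>@attr</literal>, and it rejects <literal>@prox</literal> and <literal>@attrset</literal>. Each operator is acted on as soon as it is read, which is the simplicity a prefix format buys.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#define MAX_ATTRS 16
#define IS_OP(ps, name) (!(ps)->quoted && strcmp((ps)->tok, (name)) == 0)
struct pqf_parser {              /* hypothetical; not YAZ's pquery state  */
    const char *p;               /* current position in the query         */
    char tok[128];               /* current token (truncated if longer)   */
    int quoted;                  /* 1 if the token was "double quoted"    */
    char attrs[MAX_ATTRS][32];   /* type=value attributes in scope ...    */
    int nattrs;                  /* ... and how many of them there are    */
    char *out;                   /* output buffer, its fill and its size  */
    size_t outlen, outmax;
};
/* Read the next blank-separated or quoted token; returns 0 at end of input. */
static int next_token(struct pqf_parser *ps)
{
    size_t n = 0;
    while (isspace((unsigned char) *ps->p))
        ps->p++;
    if (!*ps->p)
        return 0;
    if ((ps->quoted = (*ps->p == '"')) != 0)
        ps->p++;                 /* skip the opening quote */
    while (*ps->p && (ps->quoted ? *ps->p != '"'
                                 : !isspace((unsigned char) *ps->p))) {
        if (n < sizeof ps->tok - 1)
            ps->tok[n++] = *ps->p;
        ps->p++;
    }
    if (ps->quoted && *ps->p == '"')
        ps->p++;                 /* skip the closing quote */
    ps->tok[n] = '\0';
    return 1;
}

/* Append s to the output, truncating silently when the buffer is full. */
static void emit(struct pqf_parser *ps, const char *s)
{
    while (*s && ps->outlen + 1 < ps->outmax)
        ps->out[ps->outlen++] = *s++;
    ps->out[ps->outlen] = '\0';
}

/* Parse one query-struct: @attr, @and/@or/@not, @set or a term. */
static int parse_struct(struct pqf_parser *ps)
{
    int i, ok;
    if (!next_token(ps))
        return 0;                            /* missing operand */
    if (IS_OP(ps, "@attr")) {
        if (ps->nattrs == MAX_ATTRS || !next_token(ps)
            || (!strchr(ps->tok, '=') && !next_token(ps))) /* skip set name */
            return 0;
        snprintf(ps->attrs[ps->nattrs], sizeof ps->attrs[0], "%s", ps->tok);
        ps->nattrs++;                        /* in scope for this subtree */
        ok = parse_struct(ps);
        ps->nattrs--;
        return ok;
    }
    if (IS_OP(ps, "@and") || IS_OP(ps, "@or") || IS_OP(ps, "@not")) {
        const char *op = IS_OP(ps, "@and") ? " and "
                       : IS_OP(ps, "@or") ? " or " : " not ";
        emit(ps, "(");
        if (!parse_struct(ps))
            return 0;
        emit(ps, op);
        if (!parse_struct(ps))
            return 0;
        emit(ps, ")");
        return 1;
    }
    if (IS_OP(ps, "@set")) {                 /* result-set reference */
        if (!next_token(ps))
            return 0;
        emit(ps, "set=");
        emit(ps, ps->tok);
        return 1;
    }
    if (!ps->quoted && ps->tok[0] == '@')
        return 0;                            /* @prox, @attrset: not handled */
    emit(ps, "\""); emit(ps, ps->tok); emit(ps, "\"");   /* the term itself */
    for (i = 0; i < ps->nattrs; i++) {       /* then each attribute in scope */
        emit(ps, i ? "," : "[");
        emit(ps, ps->attrs[i]);
    }
    if (ps->nattrs)
        emit(ps, "]");
    return 1;
}

/* Returns 1 on success (text form in out), 0 on a syntax error; outmax > 0. */
int pqf_to_text(const char *query, char *out, size_t outmax)
{
    struct pqf_parser ps = { .p = query, .out = out, .outmax = outmax };
    out[0] = '\0';
    return parse_struct(&ps) && !next_token(&ps);   /* no trailing tokens */
}
```

With this sketch, the query <literal>@attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"</literal> comes out as <literal>("bob dylan"[4=1,1=1] and "slow train coming"[4=1,1=4])</literal>: the outer <literal>4=1</literal> reaches both terms, while each inner <literal>@attr</literal> applies only to its own subtree.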
The following are all examples of valid queries in the PQF.

@or "dylan" "zimmerman"
@or @and bob dylan @set Result-1
@attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
@attr 4=1 @attr 1=4 "self portrait"
@prox 0 3 1 2 k 2 dylan zimmerman
@and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109
<sect2><title id="CCL">Common Command Language</title>
Not all users enjoy typing in prefix query structures and numerical
attribute values, even in a minimalistic test client. In the library
world, the more intuitive Common Command Language (ISO 8777) has
enjoyed some popularity, especially before the widespread
availability of graphical interfaces. It is still useful in
applications where, for one reason or another, you need to provide a
symbolic language for expressing boolean query structures.
The <ulink url="http://europagate.dtv.dk/">EUROPAGATE</ulink>
research project working under the Libraries programme
of the European Commission's DG XIII has, amongst other useful tools,
implemented a general-purpose CCL parser which produces an output
structure that can be trivially converted to the internal RPN
representation of &yaz; (the <literal>Z_RPNQuery</literal> structure).
Since the CCL utility, along with the rest of the software
produced by EUROPAGATE, is made freely available under a liberal
license, it is included as a supplement to &yaz;.
<sect3><title>CCL Syntax</title>
The CCL parser obeys the following grammar for the FIND argument.
The syntax is annotated in the lines prefixed by
<literal>--</literal>.
CCL-Find ::= CCL-Find Op Elements
           | Elements.

Op ::= "and" | "or" | "not"
-- The above means that Elements are separated by boolean operators.

Elements ::= '(' CCL-Find ')'
           | Set
           | Terms
           | Qualifiers Relation Terms
           | Qualifiers Relation '(' CCL-Find ')'
           | Qualifiers '=' string '-' string
-- Elements is either a recursive definition, a result set reference, a
-- list of terms, qualifiers followed by terms, qualifiers followed
-- by a recursive definition or qualifiers in a range (lower - upper).

Set ::= 'set' '=' string
-- Reference to a result set

Terms ::= Terms Prox Term
        | Term
-- Proximity of terms.

Term ::= Term string
       | string
-- This basically means that a term may include a blank

Qualifiers ::= Qualifiers ',' string
             | string
-- Qualifiers is a list of strings separated by comma

Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<'
-- Relational operators. This really doesn't follow the ISO8777
-- standard.

Prox ::= '%' | '!'
-- Proximity operator
The following queries are all valid:

(dylan and bob) or set=1

Assuming that the qualifiers <literal>ti</literal>, <literal>au</literal>
and <literal>date</literal> are defined, we may use:

au=(bob dylan and slow train coming)
date>1980 and (ti=((self portrait)))
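The first step of a CCL parser is splitting the FIND argument into tokens. The following is a hypothetical sketch (the function <function>ccl_tokenize()</function> and the constant <literal>TOK_MAX</literal> are invented here, not taken from the EUROPAGATE code) that recognizes the parentheses, commas, relations and proximity operators of the grammar above and treats everything else as words. Boolean operators and <literal>set</literal> come out as ordinary words, and a range value such as <literal>1980-1990</literal> stays a single word; the parser proper would deal with both.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define TOK_MAX 32

/* Hypothetical CCL tokenizer; not the EUROPAGATE/YAZ implementation.
   Splits a FIND argument into parentheses, commas, relations
   (=, >=, <=, <>, >, <), proximity operators (% and !) and words.
   "and", "or", "not" and "set" come out as ordinary words; a parser would
   recognize them by comparing words with the operator names. Returns the
   number of tokens, or -1 if more than max tokens are needed. */
int ccl_tokenize(const char *q, char tok[][TOK_MAX], int max)
{
    int n = 0;
    while (*q) {
        size_t len, copy;
        if (isspace((unsigned char) *q)) {
            q++;
            continue;
        }
        if (n == max)
            return -1;
        if (strchr("(),%!", *q))
            len = 1;                      /* single-character token */
        else if ((q[0] == '>' || q[0] == '<') && q[1] == '=')
            len = 2;                      /* >= and <= */
        else if (q[0] == '<' && q[1] == '>')
            len = 2;                      /* <> */
        else if (strchr("=<>", *q))
            len = 1;                      /* =, < and > */
        else {
            len = 0;                      /* a word runs up to a delimiter */
            while (q[len] && !isspace((unsigned char) q[len])
                   && !strchr("(),%!=<>", q[len]))
                len++;
        }
        copy = len < TOK_MAX ? len : TOK_MAX - 1;  /* truncate long words */
        memcpy(tok[n], q, copy);
        tok[n][copy] = '\0';
        n++;
        q += len;
    }
    return n;
}
```

For <literal>date>1980 and (ti=((self portrait)))</literal> the sketch yields 14 tokens, with <literal>></literal> and <literal>=</literal> as separate relation tokens and each parenthesis as a token of its own.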
<sect3><title>CCL Qualifiers</title>
Qualifiers are used to direct the search to a particular searchable
index, such as title (ti) and author indexes (au). The CCL standard
itself doesn't specify a particular set of qualifiers, but it does
suggest a few short-hand notations. You can customize the CCL parser
to support a particular set of qualifiers to reflect the current target
profile. Traditionally, a qualifier would map to a particular
use-attribute within the BIB-1 attribute set. However, you could also
define qualifiers that would set, for example, the
relation-attribute.
Consider a scenario where the target supports ranked searches in the
title-index. In this case, the user could specify

ti,ranked=knuth computer

and the <literal>ranked</literal> qualifier would map to relation=relevance
(2=102) and the <literal>ti</literal> qualifier would map to title (1=4).
A "profile" with a set of predefined CCL qualifiers can be read from a
file. The YAZ client reads its CCL qualifiers from a file named
<filename>default.bib</filename>. Each line in the file has the form:

<replaceable>qualifier-name</replaceable> <replaceable>type</replaceable>=<replaceable>val</replaceable> <replaceable>type</replaceable>=<replaceable>val</replaceable> ...

where <replaceable>qualifier-name</replaceable> is the name of the
qualifier to be used (e.g. <literal>ti</literal>),
<replaceable>type</replaceable> is a BIB-1 category type and
<replaceable>val</replaceable> is the corresponding BIB-1 attribute
value.
The <replaceable>type</replaceable> can be either numeric or one of
<literal>u</literal> (use), <literal>r</literal> (relation),
<literal>p</literal> (position), <literal>s</literal> (structure),
<literal>t</literal> (truncation) or <literal>c</literal> (completeness).
The <replaceable>qualifier-name</replaceable> <literal>term</literal>
has a special meaning:
the types and values for this definition are used when
<emphasis>no</emphasis> qualifiers are present.
Consider the following definition:

ti       u=4 s=1
au       u=1 s=1
term     s=105

Two qualifiers are defined, <literal>ti</literal> and
<literal>au</literal>.
They both set the structure-attribute to phrase (1).
<literal>ti</literal>
sets the use-attribute to 4. <literal>au</literal> sets the
use-attribute to 1.
When no qualifiers are used in the query, the structure-attribute is
set to free-form-text (105).
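A sketch of how one line of such a file could be parsed follows. It is a hypothetical example, not the code behind <function>ccl_qual_file</function>: the names <function>parse_qual_line()</function> and <literal>struct qual_attr</literal> are made up, and it assumes the letter codes map to the BIB-1 attribute types 1 (use) through 6 (completeness) in the order listed above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of parsing one line of a qualifier file such as
   "ti u=4 s=1"; not the parser used by ccl_qual_file(). */
struct qual_attr { int type; int value; };

/* BIB-1 attribute type for a letter (u, r, p, s, t, c -> 1..6) or a number.
   Sets *end just past the type; returns -1 if the type is not recognized. */
static int bib1_type(const char *s, const char **end)
{
    static const char letters[] = "urpstc";
    const char *hit = *s ? strchr(letters, *s) : NULL;
    if (isdigit((unsigned char) *s)) {
        char *e;
        int type = (int) strtol(s, &e, 10);
        *end = e;
        return type;
    }
    *end = s;
    if (!hit)
        return -1;
    *end = s + 1;
    return (int) (hit - letters) + 1;
}

/* Copies the qualifier name to name (namemax > 1) and stores up to maxattrs
   type=value pairs in attrs. Returns the number of pairs, or -1 on error. */
int parse_qual_line(const char *line, char *name, size_t namemax,
                    struct qual_attr *attrs, int maxattrs)
{
    const char *p = line;
    size_t n = 0;
    int count = 0;

    while (isspace((unsigned char) *p))
        p++;
    for (; *p && !isspace((unsigned char) *p); p++)
        if (n + 1 < namemax)
            name[n++] = *p;                 /* over-long names are truncated */
    name[n] = '\0';
    if (n == 0)
        return -1;                          /* blank line: no qualifier */
    for (;;) {
        const char *end;
        char *vend;
        int type;
        while (isspace((unsigned char) *p))
            p++;
        if (!*p)
            return count;
        if (count == maxattrs)
            return -1;
        type = bib1_type(p, &end);
        if (type < 0 || *end != '=' || !isdigit((unsigned char) end[1]))
            return -1;                      /* expected type=value */
        attrs[count].type = type;
        attrs[count].value = (int) strtol(end + 1, &vend, 10);
        count++;
        p = vend;
    }
}
```

Applied to the line <literal>ti u=4 s=1</literal>, the sketch yields the pairs (1, 4) and (4, 1), that is use=4 and structure=1.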
<sect3><title>CCL API</title>
All public definitions can be found in the header file
<filename>ccl.h</filename>. A profile identifier is of type
<literal>CCL_bibset</literal>. A profile must be created by calling
the function <function>ccl_qual_mk</function>, which returns a profile
handle of type <literal>CCL_bibset</literal>.
To read a file containing qualifier definitions the function
<function>ccl_qual_file</function> may be convenient. This function
takes an already opened <literal>FILE</literal> handle pointer as
argument along with a <literal>CCL_bibset</literal> handle.
To parse a simple string with a FIND query, use the function

struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str,
                                   int *error, int *pos);

which takes the CCL profile (<literal>bibset</literal>) and query
(<literal>str</literal>) as input. Upon successful completion the RPN
tree is returned. If an error occurs, such as a syntax error, the integer
pointed to by <literal>error</literal> holds the error code and the
integer pointed to by <literal>pos</literal> holds the offset inside the
query string at which parsing failed.
An English representation of the error may be obtained by calling
the <literal>ccl_err_msg</literal> function. The error codes are
listed in <filename>ccl.h</filename>.
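Because <literal>pos</literal> is an offset into the query string, an application can point at the place where parsing stopped. The helper below is hypothetical (its name and output format are invented for this example) and only formats the text; the description of the error itself would still come from <literal>ccl_err_msg</literal>.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the CCL API: renders the query with a
   caret under the offset that ccl_find_str() reported through pos.
   The offset is clamped to the query; buf receives at most size bytes. */
void show_error_pos(const char *query, int pos, char *buf, size_t size)
{
    int len = (int) strlen(query);
    if (pos < 0)
        pos = 0;
    if (pos > len)
        pos = len;
    /* "%*s" with an empty string prints exactly pos spaces */
    snprintf(buf, size, "%s\n%*s^", query, pos, "");
}
```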
To convert the CCL RPN tree (type
<literal>struct ccl_rpn_node *</literal>)
to the <literal>Z_RPNQuery</literal> of YAZ, the function
<function>ccl_rpn_query</function> must be used. This function, which is
part of YAZ, is implemented in <filename>yaz-ccl.c</filename>.
After calling this function the CCL RPN tree is probably no longer
needed. The function <literal>ccl_rpn_delete</literal> destroys the CCL RPN tree.
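Converting a tree to an RPN query is a plain recursive walk, because an operator node maps directly to a prefix operator followed by its two operands. The sketch below shows this with an invented tree type and writes PQF text rather than a <literal>Z_RPNQuery</literal>; it is not <function>ccl_rpn_query</function>, and <literal>struct sketch_node</literal> and <function>tree_to_pqf()</function> are hypothetical. Only the use attribute is carried over, to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical query tree, loosely in the spirit of a CCL RPN tree; the
   type and field names are invented and are not struct ccl_rpn_node. */
enum sketch_kind { SK_AND, SK_OR, SK_NOT, SK_TERM };

struct sketch_node {
    enum sketch_kind kind;
    const struct sketch_node *left, *right;  /* operands of an operator */
    const char *term;                         /* text of an SK_TERM node */
    int use;                                  /* BIB-1 use value, 0 = none */
};

/* Append the prefix (PQF) form of n to buf, keeping *len < size.
   Returns 0 on success, -1 if the buffer is too small. */
static int to_prefix(const struct sketch_node *n, char *buf, size_t size,
                     size_t *len)
{
    static const char *const ops[] = { "@and ", "@or ", "@not " };
    int r;

    if (n->kind == SK_TERM && n->use)
        r = snprintf(buf + *len, size - *len, "@attr 1=%d \"%s\"",
                     n->use, n->term);
    else if (n->kind == SK_TERM)
        r = snprintf(buf + *len, size - *len, "\"%s\"", n->term);
    else
        r = snprintf(buf + *len, size - *len, "%s", ops[n->kind]);
    if (r < 0 || (size_t) r >= size - *len)
        return -1;                            /* output would be truncated */
    *len += (size_t) r;
    if (n->kind == SK_TERM)
        return 0;
    /* operator first, then both operands: all that prefix notation needs */
    if (to_prefix(n->left, buf, size, len) < 0 || *len + 1 >= size)
        return -1;
    buf[(*len)++] = ' ';
    buf[*len] = '\0';
    return to_prefix(n->right, buf, size, len);
}

int tree_to_pqf(const struct sketch_node *root, char *buf, size_t size)
{
    size_t len = 0;
    if (size == 0)
        return -1;
    buf[0] = '\0';
    return to_prefix(root, buf, size, &len);
}
```

A tree for <literal>ti=self portrait and au=dylan</literal>, with use attributes 4 and 1 chosen for the two terms, comes out as <literal>@and @attr 1=4 "self portrait" @attr 1=1 "dylan"</literal>.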
A CCL profile may be destroyed by calling the
<function>ccl_qual_rm</function> function.

The token names for the CCL operators may be changed by setting the
globals (all of type <literal>char *</literal>)
<literal>ccl_token_and</literal>, <literal>ccl_token_or</literal>,
<literal>ccl_token_not</literal> and <literal>ccl_token_set</literal>.
An operator may have aliases, i.e. there may be more than one name for
the operator. To do this, separate each alias with a space character.
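Alias matching of this kind can be sketched as a scan over the space-separated list. The function <function>token_matches()</function> is hypothetical, not part of the CCL API, and it compares exactly (case-sensitively); it makes no claim about how the real parser compares words. Taking the word as a pointer plus a length lets it be called on a token that sits inside a larger query buffer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: returns 1 if the word (wordlen characters, not
   necessarily NUL-terminated) equals one of the space-separated aliases,
   e.g. aliases = "and &". Comparison is exact; returns 0 otherwise. */
int token_matches(const char *aliases, const char *word, size_t wordlen)
{
    const char *p = aliases;
    while (*p) {
        size_t len;
        while (*p == ' ')
            p++;                          /* skip separators */
        len = strcspn(p, " ");            /* length of the next alias */
        if (len > 0 && len == wordlen && memcmp(p, word, len) == 0)
            return 1;
        p += len;
    }
    return 0;
}
```

For example, if <literal>ccl_token_and</literal> were set to <literal>"and AND"</literal>, a check like this would accept both spellings as the AND operator.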
<sect1 id="tools.oid"><title>Object Identifiers</title>
The basic YAZ representation of an OID is an array of integers,
terminated with the value -1. The &odr; module provides two
utility functions to create and copy data elements of this type:

Odr_oid *odr_getoidbystr(ODR o, char *str);

Creates an OID based on a string-based representation using dots (.)
to separate elements in the OID.

Odr_oid *odr_oiddup(ODR odr, Odr_oid *o);

Creates a copy of the OID referenced by the <emphasis>o</emphasis>
parameter.
Both functions take an &odr; stream as parameter. This stream is used to
allocate memory for the data elements, which is released on a
subsequent call to <function>odr_reset()</function> on that stream.
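To make the integer-array representation concrete, here is a sketch of what the two functions do, using <function>malloc()</function> in place of an &odr; stream. The names <function>oid_from_dotted()</function> and <function>oid_dup()</function> are hypothetical; the real functions allocate from the stream and their memory is released by <function>odr_reset()</function>, whereas results here must be freed with <function>free()</function>.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: parse "1.2.840.10003.3.1" into a newly allocated,
   -1-terminated int array. Returns NULL on a syntax error or if memory
   runs out. */
int *oid_from_dotted(const char *str)
{
    size_t n = 1, i = 0;
    const char *p;
    int *oid;

    for (p = str; *p; p++)              /* components = dots + 1 */
        if (*p == '.')
            n++;
    oid = malloc((n + 1) * sizeof *oid);
    if (!oid)
        return NULL;
    for (p = str;;) {
        char *end;
        if (!isdigit((unsigned char) *p))
            break;                      /* empty or non-numeric component */
        oid[i++] = (int) strtol(p, &end, 10);
        if (*end == '\0') {
            oid[i] = -1;                /* terminator */
            return oid;
        }
        if (*end != '.')
            break;
        p = end + 1;
    }
    free(oid);
    return NULL;
}

/* Hypothetical sketch: return a malloc()ed copy of a -1-terminated OID. */
int *oid_dup(const int *o)
{
    size_t n = 0;
    int *copy;
    while (o[n] != -1)
        n++;
    copy = malloc((n + 1) * sizeof *copy);   /* room for the -1 as well */
    if (copy)
        memcpy(copy, o, (n + 1) * sizeof *copy);
    return copy;
}
```

Parsing <literal>1.2.840.10003.3.1</literal> (the BIB-1 attribute set) gives the array {1, 2, 840, 10003, 3, 1, -1}.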
The OID module provides a higher-level representation of the
family of object identifiers which describe the Z39.50 protocol and its
related objects. The definition of the module interface is given in
the <filename>oid.h</filename> file.

The interface is mainly based on the <literal>oident</literal> structure.
The definition of this structure looks like this:

typedef struct oident
{
    oid_proto proto;
    oid_class oclass;
    oid_value value;
    int oidsuffix[OID_SIZE];
    char *desc;
} oident;
The proto field takes one of the protocol identifier values, such as
<literal>PROTO_Z3950</literal> and <literal>PROTO_SR</literal>.
If you don't care about talking to SR-based implementations (few
exist, and they may become fewer still if and when the ISO SR and ANSI
Z39.50 documents are merged into a single standard), you can ignore
this field on incoming packages, and always set it to PROTO_Z3950
for outgoing packages.

The oclass field takes one of the values
corresponding to the OID classes defined by the Z39.50 standard.
Finally, the value field takes one of the values
corresponding to the specific OIDs defined by the standard.
The desc field contains a brief, mnemonic name for the OID in question.

The function

struct oident *oid_getentbyoid(int *o);

takes as argument an OID, and returns a pointer to a static area
containing an <literal>oident</literal> structure. You typically use
this function when you receive a PDU containing an OID, and you wish
to branch out depending on the specific OID value.
The function

int *oid_ent_to_oid(struct oident *ent, int *dst);

takes as argument an <literal>oident</literal> structure (in which
the <literal>proto</literal>, <literal>oclass</literal> and
<literal>value</literal> fields are assumed to be set correctly)
and returns a pointer to the buffer given by <literal>dst</literal>,
which then contains the
representation of the corresponding OID. The function returns
NULL and leaves the array <literal>dst</literal> unchanged if no mapping
could be made.
The array <literal>dst</literal> should be at least of size
<literal>OID_SIZE</literal>.
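The separation of protocol and OID suffix can be sketched as prefix concatenation. This is an assumption-laden illustration, not the OID module's code: the enum, the constant <literal>SKETCH_OID_SIZE</literal> and <function>sketch_ent_to_oid()</function> are invented, and it assumes the Z39.50 arc 1.2.840.10003 and the ISO SR arc 1.0.10163 as the two prefixes, with the entry supplying the remaining components.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the idea behind oid_ent_to_oid(): the protocol
   selects a prefix and the entry supplies the suffix. Both prefixes are
   assumptions of this example (Z39.50: 1.2.840.10003, SR: 1.0.10163). */
#define SKETCH_OID_SIZE 20

enum sketch_proto { SKETCH_PROTO_Z3950, SKETCH_PROTO_SR };

static const int z3950_prefix[] = { 1, 2, 840, 10003, -1 };
static const int sr_prefix[] = { 1, 0, 10163, -1 };

/* Writes prefix + suffix (both -1-terminated) into dst, which must hold at
   least SKETCH_OID_SIZE ints, and returns dst. Returns NULL and leaves dst
   unchanged if the result would not fit. */
int *sketch_ent_to_oid(enum sketch_proto proto, const int *suffix, int *dst)
{
    const int *prefix = proto == SKETCH_PROTO_SR ? sr_prefix : z3950_prefix;
    size_t np = 0, ns = 0, i;

    while (prefix[np] != -1)
        np++;
    while (suffix[ns] != -1)
        ns++;
    if (np + ns + 1 > SKETCH_OID_SIZE)   /* +1 for the terminator */
        return NULL;
    for (i = 0; i < np; i++)
        dst[i] = prefix[i];
    for (i = 0; i < ns; i++)
        dst[np + i] = suffix[i];
    dst[np + ns] = -1;
    return dst;
}
```

With the suffix 3.1 the sketch yields 1.2.840.10003.3.1 (BIB-1) for Z39.50, and 1.0.10163.3.1 for SR under the assumed SR prefix; only the protocol argument changes.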
The <function>oid_ent_to_oid()</function> function can be used whenever
you need to prepare a PDU containing one or more OIDs. The separation of
the <literal>proto</literal> element from the remainder of the
OID-description makes it simple to write applications that can
communicate with either Z39.50 or OSI SR-based applications.
The function

oid_value oid_getvalbyname(const char *name);

takes as argument a mnemonic OID name, and returns the
<literal>value</literal> field of the first entry in the database that
contains the given name in its <literal>desc</literal> field.
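Both lookups amount to table searches. The sketch below uses a two-entry table with invented type and function names (<literal>struct sketch_ent</literal>, <function>sketch_getentbyoid()</function>, <function>sketch_getvalbyname()</function>) and exact, case-sensitive name matching; only the two OIDs, BIB-1 (1.2.840.10003.3.1) and USMARC (1.2.840.10003.5.10), are the registered Z39.50 values. Returning a pointer into a static table mirrors the static area mentioned for <function>oid_getentbyoid()</function>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical OID table with lookups in both directions; the entries,
   enum and function names are invented for illustration. */
enum sketch_value { SKETCH_VAL_BIB1, SKETCH_VAL_USMARC, SKETCH_VAL_NONE };

struct sketch_ent {
    enum sketch_value value;
    int oid[8];                  /* -1-terminated */
    const char *desc;            /* mnemonic name */
};

static const struct sketch_ent table[] = {
    { SKETCH_VAL_BIB1,   { 1, 2, 840, 10003, 3, 1, -1 },  "Bib-1" },
    { SKETCH_VAL_USMARC, { 1, 2, 840, 10003, 5, 10, -1 }, "USMARC" },
};

/* 1 if two -1-terminated OIDs are identical. */
static int oid_equal(const int *a, const int *b)
{
    while (*a == *b && *a != -1) {
        a++;
        b++;
    }
    return *a == *b;
}

/* Pointer into the static table, or NULL for an unknown OID. */
const struct sketch_ent *sketch_getentbyoid(const int *oid)
{
    size_t i;
    for (i = 0; i < sizeof table / sizeof *table; i++)
        if (oid_equal(table[i].oid, oid))
            return &table[i];
    return NULL;
}

/* value of the first entry whose desc equals name, else SKETCH_VAL_NONE. */
enum sketch_value sketch_getvalbyname(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof table / sizeof *table; i++)
        if (strcmp(table[i].desc, name) == 0)
            return table[i].value;
    return SKETCH_VAL_NONE;
}
```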
Finally, the module provides the following utility functions, whose
meaning should be obvious:

void oid_oidcpy(int *t, int *s);
void oid_oidcat(int *t, int *s);
int oid_oidcmp(int *o1, int *o2);
int oid_oidlen(int *o);
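For readers who prefer to see the semantics spelled out, here are hypothetical re-implementations on -1-terminated arrays. The <literal>sketch_</literal> names are invented, and the strcmp-like sign convention of <function>sketch_oidcmp()</function> is a choice made for this example rather than a documented property of <function>oid_oidcmp</function>.

```c
#include <assert.h>

/* Hypothetical re-implementations of the OID utilities for -1-terminated
   int arrays; not taken from YAZ. The caller makes sure t is large enough
   (e.g. OID_SIZE ints). */
int sketch_oidlen(const int *o)
{
    int n = 0;
    while (o[n] != -1)
        n++;
    return n;                       /* components, excluding the -1 */
}

void sketch_oidcpy(int *t, const int *s)
{
    while ((*t++ = *s++) != -1)
        ;                           /* copies the terminator too */
}

void sketch_oidcat(int *t, const int *s)
{
    sketch_oidcpy(t + sketch_oidlen(t), s);   /* overwrites t's terminator */
}

/* strcmp-style ordering: negative, zero or positive. A proper prefix
   sorts before the longer OID. */
int sketch_oidcmp(const int *o1, const int *o2)
{
    while (*o1 == *o2 && *o1 != -1) {
        o1++;
        o2++;
    }
    if (*o1 == *o2)
        return 0;
    if (*o1 == -1)
        return -1;
    if (*o2 == -1)
        return 1;
    return *o1 < *o2 ? -1 : 1;
}
```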
The OID module has been criticized, perhaps rightly so, for
needlessly abstracting the
representation of OIDs. Other toolkits use a simple
string-representation of OIDs with good results. In practice, we have
found the interface comfortable and quick to work with, and it is a
simple matter (for what it's worth) to create applications compatible
with both ISO SR and Z39.50. Finally, the use of the
<literal>oident</literal> database is by no means mandatory.
You can easily create your own system for representing OIDs, as long
as it is compatible with the low-level integer-array representation
of the &odr; module.
<sect1 id="tools.nmem"><title>Nibble Memory</title>
Sometimes when you need to allocate and construct a large,
interconnected complex of structures, it can be a bit of a pain to
release the associated memory again. For the structures describing the
Z39.50 PDUs and related structures, it is convenient to use the
memory-management system of the &odr; subsystem (see
<link linkend="odr-use">Using ODR</link>). However, in some circumstances
where you might otherwise benefit from using a simple nibble memory
management system, it may be impractical to use
<function>odr_malloc()</function> and <function>odr_reset()</function>.
For this purpose, the memory manager which also supports the &odr;
streams is made available in the NMEM module. The external interface
to this module is given in the <filename>nmem.h</filename> file.
The following prototypes are given:

NMEM nmem_create(void);
void nmem_destroy(NMEM n);
void *nmem_malloc(NMEM n, int size);
void nmem_reset(NMEM n);
int nmem_total(NMEM n);
void nmem_init(void);
void nmem_exit(void);
The <function>nmem_create()</function> function returns a pointer to a
memory control handle, which can be released again by
<function>nmem_destroy()</function> when no longer needed.
The function <function>nmem_malloc()</function> allocates a block of
memory of the requested size. A call to <function>nmem_reset()</function>
or <function>nmem_destroy()</function> will release all memory allocated
on the handle since it was created (or since the last call to
<function>nmem_reset()</function>). The function
<function>nmem_total()</function> returns the number of bytes currently
allocated on the handle.
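The nibble-memory idea can be sketched as a chain of large blocks from which small allocations are carved, all released at once on reset. This is a hypothetical illustration with invented <literal>sketch_nmem_</literal> names, not YAZ's implementation: it assumes 16-byte alignment is sufficient for any request, counts requested bytes in <function>sketch_nmem_total()</function>, and does no locking at all.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical arena ("nibble memory") allocator with the same shape as the
   NMEM interface; not YAZ's nmem.c, and without any locking. */
#define SKETCH_ALIGN 16            /* assumed strictest alignment needed */
#define SKETCH_BLOCK 4096          /* default block size */
#define ALIGN_UP(n) (((n) + SKETCH_ALIGN - 1) / SKETCH_ALIGN * SKETCH_ALIGN)

struct sketch_block {
    struct sketch_block *next;
    size_t size, used;             /* payload follows the (aligned) header */
};

typedef struct sketch_nmem {
    struct sketch_block *blocks;   /* most recent block first */
    size_t total;                  /* bytes requested since create/reset */
} *SKETCH_NMEM;

SKETCH_NMEM sketch_nmem_create(void)
{
    SKETCH_NMEM n = malloc(sizeof *n);
    if (n) {
        n->blocks = NULL;
        n->total = 0;
    }
    return n;
}

void *sketch_nmem_malloc(SKETCH_NMEM n, size_t size)
{
    size_t hdr = ALIGN_UP(sizeof(struct sketch_block));
    size_t need = ALIGN_UP(size ? size : 1);
    struct sketch_block *b = n->blocks;
    char *p;

    if (!b || b->size - b->used < need) {   /* start a new block */
        size_t cap = need > SKETCH_BLOCK ? need : SKETCH_BLOCK;
        b = malloc(hdr + cap);
        if (!b)
            return NULL;
        b->size = cap;
        b->used = 0;
        b->next = n->blocks;
        n->blocks = b;
    }
    p = (char *) b + hdr + b->used;         /* carve from the current block */
    b->used += need;
    n->total += size;
    return p;
}

/* Release every block at once; the handle itself stays usable. */
void sketch_nmem_reset(SKETCH_NMEM n)
{
    while (n->blocks) {
        struct sketch_block *next = n->blocks->next;
        free(n->blocks);
        n->blocks = next;
    }
    n->total = 0;
}

size_t sketch_nmem_total(SKETCH_NMEM n)
{
    return n->total;
}

void sketch_nmem_destroy(SKETCH_NMEM n)
{
    if (n) {
        sketch_nmem_reset(n);
        free(n);
    }
}
```

Requests larger than the default block size get a block of their own, so a single large allocation never fails merely because blocks are small; the price is that the unused tail of the previous block is left idle until the next reset.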
The nibble memory pool is shared amongst threads. POSIX
mutexes and WIN32 critical sections are used to keep the
module thread safe. The function <function>nmem_init()</function>
initializes the nibble memory library and it is called automatically
the first time <literal>YAZ.DLL</literal> is loaded; &yaz; uses the
function <function>DllMain</function> to achieve this. You should
<emphasis>not</emphasis> call <function>nmem_init</function> or
<function>nmem_exit</function> unless you're absolutely sure what
you're doing. Note that in previous &yaz; versions you had to call
<function>nmem_init</function> yourself.
<!-- Keep this comment at the end of the file
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-parent-document: "yaz.xml"
sgml-local-catalogs: nil
sgml-namecase-general:t