1 <!doctype linuxdoc system>
3 <title>Specifying and Using Application (Database) Profiles
4 <author>Index Data, <tt/info@index.ping.dk/
YAZ includes a subsystem to manage complex database records, driven
by a set of configuration tables that reflect a given profile.
Multiple database profiles can coexist in the same server, or even
the same database. The record management system is responsible for
associating a given record with a specific profile, and processing it
accordingly. This document describes the various file formats for data
and configuration files which are understood by the module.
<item>The subsystem described herein is under development. Not
everything may work exactly as described, and details of the interface
may change as the module matures.
26 <item>The exact workings of the subsystem may depend on the
27 application in which it is used. This document focuses on the use of
28 the module in the <bf/ZServer/ system which is distributed by Index
29 Data as a companion to <bf/YAZ/.
35 The retrieval facilities of Z39.50 are extremely flexible and powerful.
36 They allow any level of structuring of database records. They allow
37 controlled re-use of attribute sets (for searching) and tag sets (for
38 retrieval) between application profiles; they allow precise selection
39 of the desired sub-elements of a database record; they allow different
40 variants of a given data element to be represented and selected in a
41 structured way; and finally they allow the exchange of any type and
42 amount of data to be represented in a single database record.
44 These powerful retrieval facilities are a recent addition to the
45 protocol, and along with the flexible searching facilities, they make
46 the protocol an extremely capable tool for precise, structured
47 access to information systems. The retrieval facilities add new
48 levels of flexibility and control to the protocol, which add to its
49 value outside of its traditional domain of the library systems world.
The new facilities, however, also add new complexity to the protocol,
which is already troubled by a too-steep learning curve. We have seen
many good projects severely hindered or even thwarted by the sheer
complexity of implementing the Z39.50 protocol.
56 At the same time, we feel that the most complex and powerful
57 facilities of the protocol (Explain, structured retrieval, etc.), are
58 also what the protocol needs to become more widespread, and to fulfill
59 what we perceive to be its most noble potential: To provide
60 everybody with standardised, well-structured access to the
61 information resources of the world.
63 The purpose of <bf/YAZ/, then, and of this module as well, is to
64 <it/simplify/ the use of the protocol for programmers and
65 administrators, by providing simple APIs and configuration systems to
66 access the functionality of the protocol. The <bf/Retrieval/ module
67 deals specifically with the advanced retrieval functions which were
68 added to the protocol with version 3, or Z39.50-1994.
72 <sect1>External Data (record) Representation
75 The <bf/Retrieval/ module will eventually support a wide range of
76 input formats, ranging from MARC data to USENET news archives. This
77 section introduces what we think of as the <it/canonical/ format - the
78 one that gives the most general access to the various elements of the
79 retrieval functionality.
The basic model presented by the Z39.50 retrieval system is that of a
recursively defined tree structure, containing a list of tagged elements,
which may in turn contain either data or more lists of tagged elements, and
so on.
86 We elect to represent this structuring externally by using an
87 &dquot;SGML-like&dquot; syntax. The <it/internal/ representation will
90 Consider a record describing an information resource (such a record is
91 sometimes known as a <it/locator record/). It might contain a field
92 describing the distributor of the information resource, which might in
93 turn be partitioned into various fields providing details about the
94 distributor, like this:
<Name> USGS/WRD &etago;Name>
<Organization> USGS/WRD &etago;Organization>
<Street-Address>
U.S. GEOLOGICAL SURVEY, 505 MARQUETTE, NW
&etago;Street-Address>
<City> ALBUQUERQUE &etago;City>
<State> NM &etago;State>
<Zip-Code> 87102 &etago;Zip-Code>
<Country> USA &etago;Country>
<Telephone> (505) 766-5560 &etago;Telephone>
111 This is how data that the retrieval module reads from an input file
Depending on the database profile that is being used, however, it is
likely that the data won't look like this when it is transmitted from
the server to the client. Typically, the client will prefer to
receive the data in a more rigid syntax, such as USMARC or GRS-1. To
save transmission time and avoid ambiguities of language, the
individual tags or field names, above, might be translated into
numbers which are known by both the client and the server (by
referring to a tag set).
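
As a rough illustration of the tree model described above, the
following Python sketch reads a small record in the SGML-like syntax
into nested nodes. It is not part of YAZ: the function
<tt/parse_record/, the node layout, and the enclosing
<tt/Distributor/ tag are assumptions made for the example, and the
real module may well treat whitespace and errors differently.

```python
import re

# Hypothetical helper, not part of YAZ. A node is a (tag, children) pair,
# where each child is either a data string or another node.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w-]*)>")


def parse_record(text):
    """Parse SGML-like text into a list of (tag, children) nodes."""
    root = ("#root", [])
    stack = [root]
    pos = 0
    for match in TOKEN.finditer(text):
        # Text between two tags is data belonging to the innermost open tag.
        data = text[pos:match.start()].strip()
        if data:
            stack[-1][1].append(data)
        pos = match.end()
        closing, name = match.group(1), match.group(2)
        if closing:
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError(f"unexpected closing tag: {name}")
            stack.pop()
        else:
            node = (name, [])
            stack[-1][1].append(node)
            stack.append(node)
    if len(stack) != 1:
        raise ValueError(f"unclosed tag: {stack[-1][0]}")
    tail = text[pos:].strip()
    if tail:
        root[1].append(tail)
    return root[1]


# The enclosing Distributor tag is assumed for illustration.
sample = """
<Distributor>
  <Name> USGS/WRD </Name>
  <City> ALBUQUERQUE </City>
</Distributor>
"""
tree = parse_record(sample)
```

Each element becomes a pair of tag name and child list, so a
sub-element such as the city is reached by walking the children of
its parent node.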
123 The retrieval module supports various types of conversions that might
124 be carried out by the server based on requests from the client. To do
125 this, it needs a set of configuration files to describe the
126 application profile that the given record adheres to.
128 <sect1>The Abstract Syntax
131 The abstract syntax definition (ARS) is the focal point of the
132 application profile description. For a given profile, it may state any
133 or all of the following:
136 <item>The object identifier of the database schema associated with the
137 profile, so that it can be referred to by the client.
139 <item>The attribute set (which can possibly be a compound of multiple
140 sets) which applies in the profile. This is used when indexing and
141 searching the records belonging to the given profile.
143 <item>The Tag set (again, this can consist of several different sets).
144 This is used when reading the records from a file, to recognize the
145 different tags, and when transmitting the record to the client -
146 mapping the tags to their numerical representation, if they are
149 <item>The variant set which is used in the profile. This provides a
150 vocabulary for specifying the <it/types/ of data that appear inside
153 <item>Element set names, which are a shorthand way for the client to
154 ask for a subset of the data elements contained in a record. Element
155 set names, in the retrieval module, are mapped to <it/element
156 specifications/, which contain information equivalent to the
157 <it/Espec-1/ syntax of Z39.50.
159 <item>Map tables, which may specify mappings to <it/other/ database
160 profiles, if desired.
162 <item>Possibly, a set of rules describing the mapping of elements to a
<item>A list of element descriptions (this is the actual ARS of the
profile), which lists the ways in which the various tags can be used
and organized hierarchically.
170 Several of the entries above simply refer to other files, which describe the
173 <sect>The Configuration Files
176 This section describes the syntax and use of the various tables which
177 are used by the retrieval module.
179 The number of different file types may appear daunting at first, but
180 each type corresponds fairly clearly to a single aspect of the Z39.50
181 retrieval facilities. Further, the average database administrator
182 who's simply reusing an existing profile for which tables already
183 exist, shouldn't have to worry too much about these tables.
185 <sect1>The Abstract Syntax (.abs) Files
188 The name of this file type is slightly misleading, since, apart from
189 the actual abstract syntax of the profile, it also includes most of
190 the other definitions that go into a database profile.
192 When a record in the canonical, SGML-like format is read from a file
193 or from the database, the first tag of the file should reference the
194 profile that governs the layout of the record. If the first tag of the
195 record is <tt><gils></tt>, the system will look for the profile
196 definition in the file <tt/gils.abs/. Profile definitions are cached,
197 so they only have to be read once during the lifespan of the current
200 The file may contain the following directives:
203 <tag>name <it/symbolic-name/</tag> This provides a shorthand name or
204 description for the profile. Mostly useful for diagnostic purposes.
206 <tag>reference <it/OID-name/</tag> The reference name of the OID for
207 the profile. The reference names can be found in the <bf/util/
210 <tag>attset <it/filename/</tag> The attribute set that is used for
211 indexing and searching records belonging to this profile.
<tag>tagset <it/filename/</tag> The tag set (if any) that describes
the fields of the records.
216 <tag>varset <it/filename/</tag> The variant set used in the profile.
218 <tag>maptab <it/filename/</tag> (repeatable) This points to a
219 conversion table that might be used if the client asks for the record
220 in a different schema from the native one.
222 <tag>marc <it/filename/</tag> Points to a file containing parameters
223 for representing the record contents in the ISO2709 syntax. Read the
224 description of the MARC representation facility below.
<tag>esetname <it/name filename/</tag> (repeatable) Associates the
given element set name with an element selection file. If an at sign
(@) is given in place of the filename, this corresponds to a null
mapping for the given element set name.
<tag>elm <it/path name attribute/</tag> (repeatable) Adds an element
to the abstract record syntax of the schema. The <it/path/ follows the
syntax which is suggested by the Z39.50 document - that is, a sequence
of tags separated by slashes (/). Each tag is given as a
comma-separated pair of tag type and tag value, surrounded by
parentheses. The <it/name/ is the name of the element, and the
<it/attribute/ specifies what attribute to use when indexing the
element. A ! in place of the attribute name is equivalent to
specifying an attribute name identical to the element name. A - in
place of the attribute name specifies that no indexing is to take
place for the given element.
244 NOTE: The mechanism for controlling indexing is inadequate for
245 complex databases, and will probably be moved into a separate
246 configuration table eventually.
249 The following is an excerpt from the abstract syntax file for the GILS
254 reference GILS-schema
259 maptab gils-usmarc.map
263 esetname VARIANT gils-variant.est # for WAIS-compliance
264 esetname B gils-b.est
265 esetname G gils-g.est
266 esetname W gils-b.est
elm (1,14) localControlNumber Local-number
elm (1,16) dateOfLastModification Date/time-last-modified
elm (4,1) controlIdentifier Identifier-standard
elm (2,6) abstract Abstract
elm (4,52) originator -
elm (4,53) accessConstraints !
elm (4,54) useConstraints !
elm (4,70) availability -
elm (4,70)/(4,90) distributor -
elm (4,70)/(4,90)/(2,7) distributorName !
elm (4,70)/(4,90)/(2,10) distributorOrganization !
elm (4,70)/(4,90)/(4,2) distributorStreetAddress !
elm (4,70)/(4,90)/(4,3) distributorCity !
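
To make the <bf/elm/ syntax concrete, here is a minimal Python sketch
that splits one directive into its path, element name, and indexing
attribute, following the rules given above for ! and -. The function
name <tt/parse_elm/ and the returned tuple are assumptions for
illustration only; this is not how the YAZ module itself is
implemented.

```python
import re

# One path component: "(type,value)" with an integer tag type.
TAG = re.compile(r"^\((\d+),([^()]+)\)$")


def parse_elm(line):
    """Split an 'elm path name attribute' line (hypothetical helper)."""
    keyword, path, name, attribute = line.split()
    if keyword != "elm":
        raise ValueError("not an elm directive")
    tags = []
    for part in path.split("/"):
        match = TAG.match(part)
        if match is None:
            raise ValueError(f"malformed tag in path: {part}")
        value = match.group(2)
        # Keep numeric tag values as integers, anything else as a string.
        tags.append((int(match.group(1)), int(value) if value.isdigit() else value))
    if attribute == "!":
        attribute = name      # index under the element's own name
    elif attribute == "-":
        attribute = None      # no indexing for this element
    return tags, name, attribute


distributor_name = parse_elm("elm (4,70)/(4,90)/(2,7) distributorName !")
originator = parse_elm("elm (4,52) originator -")
```

A path component with a missing parenthesis, such as <tt/(2,10/, is
rejected with an error in this sketch rather than silently accepted.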
288 <sect1>The Attribute Set (.att) Files
291 This file type describes the <bf/Use/ elements of an attribute set.
292 It contains the following directives.
296 <tag>name <it/symbolic-name/</tag> This provides a shorthand name or
297 description for the attribute set. Mostly useful for diagnostic purposes.
299 <tag>reference <it/OID-name/</tag> The reference name of the OID for
300 the attribute set. The reference names can be found in the <bf/util/
303 <tag>ordinal <it/integer/</tag> This value will be used to represent the
304 attribute set in the index. Care should be taken that each attribute
305 set has a unique ordinal value.
<tag>include <it/filename/</tag> This directive, which can be
repeated, is used to include another attribute set as a part of the
current one. This is used when a new attribute set is defined as an
extension to another set. For instance, many new attribute sets are
defined as extensions to the <bf/bib-1/ set. This is an important
feature of the retrieval system of Z39.50, as it ensures the highest
possible level of interoperability: those access points of your
database which are derived from the external set (say, bib-1) can be used
even by clients who are unaware of the new set.
317 <tag>att <it/att-value att-name [local-value]/</tag> This
319 introduces a new attribute to the set. The attribute value is stored
320 in the index (unless a <it/local-value/ is given, in which case this
321 is stored). The name is used to refer to the attribute from the
322 <it/abstract syntax/.
325 This is an excerpt from the GILS attribute set definition. Notice how
326 the file describing the <it/bib-1/ attribute set is referenced.
330 reference GILS-attset
334 att 2001 distributorName
335 att 2002 indexTermsControlled
337 att 2004 accessConstraints
338 att 2005 useConstraints
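
The rule for which value ends up in the index can be sketched in a
few lines of Python. The helper <tt/parse_att/ is hypothetical, and
treating the optional local value as an integer is an assumption; the
second example line, with a local value of 7, is invented purely to
show that case.

```python
def parse_att(line):
    """Parse 'att att-value att-name [local-value]' (hypothetical helper).

    Returns (name, value, stored), where 'stored' is what would be kept
    in the index: the local value if one is given, else the attribute value.
    """
    fields = line.split()
    if len(fields) not in (3, 4) or fields[0] != "att":
        raise ValueError(f"not an att directive: {line!r}")
    value = int(fields[1])
    name = fields[2]
    # Assumption: the optional local value is an integer.
    stored = int(fields[3]) if len(fields) == 4 else value
    return name, value, stored


plain = parse_att("att 2001 distributorName")
with_local = parse_att("att 2001 distributorName 7")  # invented local value
```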
341 <sect1>The Tag Set (.tag) Files
This file type defines the tagset of the profile, possibly by
referencing other tag sets (most tag sets, for instance, will include
tagsetG and tagsetM from the Z39.50 specification). The file may
contain the following directives.
350 <tag>name <it/symbolic-name/</tag> This provides a shorthand name or
351 description for the tag set. Mostly useful for diagnostic purposes.
353 <tag>reference <it/OID-name/</tag> The reference name of the OID for
354 the tag set. The reference names can be found in the <bf/util/
357 <tag>type <it/integer/</tag> The type number of the tag within the schema
360 <tag>include <it/filename/</tag> (repeatable) This directive is used
361 to include the definitions of other tag sets into the current one.
<tag>tag <it/number names type/</tag> (repeatable) Introduces a new
tag to the set. The <it/number/ is the tag number as used in the protocol
(there is currently no mechanism for specifying string tags, but this
would be quick work to add). The <it/names/ parameter
is a list of names by which the tag should be recognized in the input
file format. The names should be separated by slashes (/). The
<it/type/ is the recommended datatype of the tag. It should be one of
377 <item>generalizedtime
385 The following is an excerpt from the TagsetG definition file.
394 tag 3 publicationPlace string
395 tag 4 publicationDate string
396 tag 5 documentId string
397 tag 6 abstract string
399 tag 8 date generalizedtime
400 tag 9 bodyOfDisplay string
401 tag 10 organization string
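
As an illustration of how the slash-separated names might be used
when reading input, the Python sketch below parses <bf/tag/ lines and
builds a name-to-number lookup. The helper names are hypothetical,
and the line with two names (<tt>date/dateOfRecord</tt>) is invented
to show the slash syntax; it is not taken from the TagsetG file.

```python
def parse_tag(line):
    """Parse 'tag number names type' into (number, [names], datatype)."""
    keyword, number, names, datatype = line.split()
    if keyword != "tag":
        raise ValueError(f"not a tag directive: {line!r}")
    return int(number), names.split("/"), datatype


def build_lookup(lines):
    """Map every recognised input name to its numeric tag (hypothetical)."""
    lookup = {}
    for line in lines:
        number, names, _datatype = parse_tag(line)
        for name in names:
            lookup[name] = number
    return lookup


lookup = build_lookup([
    "tag 6 abstract string",
    "tag 8 date/dateOfRecord generalizedtime",  # second name is invented
])
```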
404 <sect1>The Variant Set (.var) Files
407 The variant set file is a straightforward representation of the
408 variant set definitions associated with the protocol. At present, only
409 the <it/Variant-1/ set is known.
411 These are the directives allowed in the file.
414 <tag>name <it/symbolic-name/</tag> This provides a shorthand name or
415 description for the variant set. Mostly useful for diagnostic purposes.
417 <tag>reference <it/OID-name/</tag> The reference name of the OID for
418 the variant set, if one is required. The reference names can be found
419 in the <bf/util/ module of <bf/YAZ/.
421 <tag>class <it/integer class-name/</tag> (repeatable) Introduces a new
422 class to the variant set.
<tag>type <it/integer type-name datatype/</tag> (repeatable) Adds a
new type to the current class (the one introduced by the most recent
<bf/class/ directive). The type names belong to the same name space as
the one used in the tag set definition file.
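
The way a <bf/type/ directive attaches to the most recent <bf/class/
directive can be sketched as a small state machine. This Python
fragment is illustrative only: <tt/parse_varset/ and its result
layout are assumptions, and the class line in the example is assumed
rather than quoted from the Variant-1 file.

```python
def parse_varset(lines):
    """Group 'type' directives under the preceding 'class' directive."""
    classes = {}
    current = None  # number of the class introduced most recently
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "class":
            current = int(fields[1])
            classes[current] = {"name": fields[2], "types": {}}
        elif fields[0] == "type":
            if current is None:
                raise ValueError("type directive before any class directive")
            # type <integer> <type-name> <datatype>
            classes[current]["types"][int(fields[1])] = (fields[2], fields[3])
    return classes


varset = parse_varset([
    "class 1 variantId",             # assumed class line
    "type 1 variantId octetstring",
])
```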
430 The following is an excerpt from the file describing the variant set
439 type 1 variantId octetstring
448 <sect1>The Element Set (.est) Files
451 The element set specification files describe a selection of a subset
452 of the elements of a database record. The element selection mechanism
453 is equivalent to the one supplied by the <it/Espec-1/ syntax of the
454 Z39.50 specification. In fact, the internal representation of an
455 element set specification is identical to the <it/Espec-1/ structure,
456 and we'll refer you to the description of that structure for most of
457 the detailed semantics of the directives below.
460 NOTE: Not all of the Espec-1 functionality has been implemented yet.
461 The fields that are mentioned below all work as expected, unless
465 The directives available in the element set file are as follows:
<tag>defaultVariantSetId <it/OID-name/</tag> If variants are used in
the following, this should provide the name of the variant set used
(it's not currently possible to specify a different set in the
individual variant request). In almost all cases (certainly all
profiles known to us), the name <tt/Variant-1/ should be given here.
<tag>defaultVariantRequest <it/variant-request/</tag> This directive
provides a default variant request for
use when the individual element requests (see below) do not contain a
variant request. Variant requests consist of a blank-separated list of
variant components. A variant component is a comma-separated,
parenthesized triple of variant class, type, and value (the two former
values being represented as integers). The value can currently only be
entered as a string (this will change to depend on the definition of
the variant in question). The special value (@) is interpreted as a
485 <tag>simpleElement <it/path ['variant' variant-request]/</tag>
486 This corresponds to a simple element request in <it/Espec-1/. The
487 path consists of a sequence of tag-selectors, where each of these can
<item>A simple tag, consisting of a comma-separated type-value pair in
parentheses, possibly followed by a colon (:) followed by an
occurrences-specification (see below). The tag-value can be a number
or a string. If the first character is an apostrophe ('), this forces
the value to be interpreted as a string, even if it appears to be numerical.
496 <item>A WildThing, represented as a question mark (?), possibly
497 followed by a colon (:) followed by an occurrences specification (see
<item>A WildPath, represented as an asterisk (*). Note that the last
element of the path should not be a WildPath.
503 The occurrences-specification can be either the string <tt/all/, the
504 string <tt/last/, or an explicit value-range. The value-range is
505 represented as an integer (the starting point), possibly followed by a
506 plus (+) and a second integer (the number of elements, default being
The variant-request has the same syntax as the defaultVariantRequest
above. Note that it may sometimes be useful to give an empty variant
request, simply to disable the default for a specific set of fields
(we aren't certain if this is proper <it/Espec-1/, but it works in
this implementation).
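
The two small grammars described above, variant requests and
occurrences-specifications, can be sketched in Python as follows.
Both helpers are hypothetical and are not the parser used by the
module. The sketch leaves the count as <tt/None/ when no plus part is
given instead of assuming a default, passes the special value @
through unchanged, and does not handle variant values that contain
blanks. The class and type numbers in the example are arbitrary.

```python
import re

# "(class,type,value)" with integer class and type, string value.
COMPONENT = re.compile(r"\((\d+),(\d+),([^()]*)\)")
# "start" or "start+count".
RANGE = re.compile(r"(\d+)(?:\+(\d+))?")


def parse_variant_request(text):
    """Parse a blank-separated list of variant components."""
    components = []
    for part in text.split():
        match = COMPONENT.fullmatch(part)
        if match is None:
            raise ValueError(f"malformed variant component: {part}")
        components.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return components


def parse_occurrences(spec):
    """Return 'all', 'last', or a (start, count) pair; count may be None."""
    if spec in ("all", "last"):
        return spec
    match = RANGE.fullmatch(spec)
    if match is None:
        raise ValueError(f"malformed occurrences-specification: {spec}")
    count = int(match.group(2)) if match.group(2) else None
    return int(match.group(1)), count


request = parse_variant_request("(1,1,abc) (2,3,@)")  # arbitrary numbers
empty = parse_variant_request("")                      # disables the default
span = parse_occurrences("3+2")
```

An empty request parses to an empty list, which matches the use
mentioned above of giving an empty variant request to switch off the
default.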
516 The following is an example of an element specification belonging to
528 <sect1>The Schema Mapping (.map) Files
Sometimes, the client might wish to receive a database record in
a schema that differs from the native schema of the record. For
instance, a client might only know how to process WAIS records, while
the database record is represented in a more specific schema, such as
GILS. In this module, a mapping of data to one of the MARC formats is
also thought of as a schema mapping (mapping the elements of the
record into fields consistent with the given MARC specification, prior
to actually converting the data to the ISO2709 format). This use of the
object identifier for USMARC as a schema identifier represents an
overloading of the OID which might not be entirely proper. However,
it reflects the dual role of schema identifier and record syntax which
is assumed by the MARC family in Z39.50.
545 NOTE: The schema-mapping functions are so far limited to a
546 straightforward mapping of elements. This should be extended with
547 mechanisms for conversions of the element contents, and conditional
548 mappings of elements based on the record contents.
551 These are the directives of the schema mapping file format:
554 <tag>targetName <it/name/</tag> A symbolic name for the target schema
555 of the table. Useful mostly for diagnostic purposes.
557 <tag>targetRef <it/OID-name/</tag> An OID name for the target schema.
558 This is used, for instance, by a server receiving a request to present
559 a record in a different schema from the native one. The name, again,
560 is found in the <bf/oid/ module of <bf/YAZ/.
562 <tag>map <it/element-name target-path/</tag> (repeatable) Adds
563 an element mapping rule to the table.
566 <sect1>The MARC (ISO2709) Representation (.mar) Files
569 This file provides rules for representing a record in the ISO2709
570 format. The rules pertain mostly to the values of the constant-length
571 header of the record.
573 <it>NOTE: This will be described better.</it>
578 Copyright © 1995, Index Data.
580 Permission to use, copy, modify, distribute, and sell this software and
581 its documentation, in whole or in part, for any purpose, is hereby granted,
584 1. This copyright and permission notice appear in all copies of the
585 software and its documentation. Notices of copyright or attribution
586 which appear at the beginning of any file must remain unchanged.
588 2. The names of Index Data or the individual authors may not be used to
589 endorse or promote products derived from this software without specific
590 prior written permission.
592 THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT WARRANTY OF ANY KIND,
593 EXPRESS, IMPLIED, OR OTHERWISE, INCLUDING WITHOUT LIMITATION, ANY
594 WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
595 IN NO EVENT SHALL INDEX DATA BE LIABLE FOR ANY SPECIAL, INCIDENTAL,
596 INDIRECT OR CONSEQUENTIAL DAMAGES OF ANY KIND, OR ANY DAMAGES
597 WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER OR
598 NOT ADVISED OF THE POSSIBILITY OF DAMAGE, AND ON ANY THEORY OF
599 LIABILITY, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
602 <sect>About Index Data
605 Index Data is a consulting and software-development enterprise that
606 specialises in library and information management systems. Our
607 interests and expertise span a broad range of related fields, and one
608 of our primary, long-term objectives is the development of a powerful
609 information management
610 system with open network interfaces and hypermedia capabilities.
We make this software available free of charge, under a fairly unrestrictive
license, as a service to the networking community, and to further the
development of quality software for open network communication.
616 We'll be happy to answer questions about the software, and about ourselves
622 DK-2200 København N&nl
629 Email: info@index.ping.dk