<?xml version="1.0" encoding="iso-8859-1" standalone="no" ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<!-- $Id: marc_indexing.xml,v 1.2 2003-08-21 10:29:18 adam Exp $ -->
<book id="marc_indexing">
<title>Indexing of MARC records by Zebra</title>
<simpara>Zebra is suitable for distributing MARC records via Z39.50. There are
several ways to describe how MARC records are indexed.
This document presents them.
<title>Simple indexing of MARC records</title>
<para>Simple indexing is not described yet.</para>
<chapter id="extended">
<title>Extended indexing of MARC records</title>
<para>Extended indexing of MARC records helps when you need to index a
combination of subfields, index only part of a field,
or use the embedded fields of a MARC record during indexing.
<para>Extended indexing of MARC records additionally allows you:
<para>to index data in the LEADER of a MARC record</para>
<para>to index data in control fields (with fixed length)</para>
<para>to use indicator values during indexing</para>
<para>to index linked fields for UNIMARC-based formats</para>
<note><para>Compared with simple indexing, extended indexing
may make the indexing of MARC records about 2-3 times
slower.</para></note>
<title>The index-formula</title>
<para>First, we need to define the term <emphasis>index-formula</emphasis>
for MARC records. This term helps in understanding the notation Zebra uses for extended indexing of MARC records.
Our definition is based on the document <ulink url="http://www.rba.ru/rusmarc/soft/Z39-50.htm">"The
table of conformity for Z39.50 use attributes and RUSMARC fields"</ulink>.
The document is available only in Russian.</para>
<para>The <emphasis>index-formula</emphasis> is a combination of subfields written in the following form:</para>
71-00$a, $g, $h ($c){.$b ($c)} , (1)
<para>Zebra supports right truncation, a Bib-1 attribute.
In this case, the <emphasis>index-formula</emphasis> (1) consists of
forms defined in the same way as (1).</para>
<note><para>The original MARC record may lack some of the elements included in the <emphasis>index-formula</emphasis>.</para>
<para>This notation uses the following operands:
<listitem><para>It denotes a whitespace character.</para></listitem>
<listitem><para>The position may contain any value defined by the MARC format.
For example, the <emphasis>index-formula</emphasis></para>
<para>includes</para>
<listitem><para>Repeatable elements are enclosed in curly braces {}. For example, the
<emphasis>index-formula</emphasis></para>
71-00$a, $g, $h ($c){.$b ($c)} , (3)
<para>includes</para>
71-00$a, $g, $h ($c). $b ($c)
71-00$a, $g, $h ($c). $b ($c). $b ($c)
71-00$a, $g, $h ($c). $b ($c). $b ($c). $b ($c)
<note><para>All other operands are the same as those accepted in the MARC world.</para>
<sect1 id="notation">
<title>Notation of <emphasis>index-formula</emphasis> for Zebra</title>
<para>Extended indexing overloads the <literal>path</literal> of an
<literal>elm</literal> definition in Zebra's abstract syntax file
(<literal>.abs</literal> file). This means that names beginning with
<literal>"mc-"</literal> are interpreted by Zebra as an
<emphasis>index-formula</emphasis>. The database index is created and
linked with an <emphasis>access point</emphasis> (Bib-1 use attribute)
according to this formula.</para>
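<para>The <literal>"mc-"</literal> rule can be illustrated with a small Python sketch.
It is not part of Zebra and does not reproduce Zebra's own parser. The function name
<literal>parse_elm_line</literal>, the returned keys and the sample lines are assumptions made
for illustration only. The sketch assumes an <literal>elm</literal> line consists of the
keyword, a path, a name and an optional index specification, separated by whitespace.</para>
<programlisting><![CDATA[
```python
def parse_elm_line(line):
    """Split an .abs 'elm' line into path, name and index specification.

    Hypothetical helper: returns None for lines that are not 'elm' definitions.
    """
    parts = line.split()
    if len(parts) < 3 or parts[0] != "elm":
        return None
    path, name = parts[1], parts[2]
    is_formula = path.startswith("mc-")
    return {
        "path": path,
        "name": name,
        "spec": " ".join(parts[3:]),
        # A path beginning with "mc-" carries an index-formula.
        "is_index_formula": is_formula,
        "formula": path[3:] if is_formula else None,
    }


# Sample lines (the second one is an ordinary, made-up path).
for sample in ("elm mc-71.00_$a,_$g Author !:w,!:p", "elm title Title !:w"):
    info = parse_elm_line(sample)
    print(info["path"], info["is_index_formula"], info["formula"])
```
]]></programlisting>
<para>Only the first sample line is treated as an index-formula, because only its path begins
with <literal>"mc-"</literal>.</para>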
<para>For example, the <emphasis>index-formula</emphasis></para>
71-00$a, $g, $h ($c){.$b ($c)} , (4)
<para>looks like this in the <literal>.abs</literal> file:</para>
mc-71.00_$a,_$g,_$h_(_$c_){.$b_(_$c_)}
<para>The notation of the <emphasis>index-formula</emphasis> uses the following operands:
<listitem><para>It denotes a whitespace character.</para></listitem>
<listitem><para>The position may contain any value defined by the MARC format. For example, the
<emphasis>index-formula</emphasis></para>
<para>matches <literal>mc-70._1_$a,_$g_</literal> and includes</para>
<listitem><para>Repeatable elements are enclosed in curly braces {}. For example, the
<emphasis>index-formula</emphasis></para>
71-00$a, $g, $h ($c){.$b ($c)} , (6)
<para>matches <literal>mc-71.00_$a,_$g,_$h_(_$c_){.$b_(_$c_)}</literal> and
71.00_$a,_$g,_$h_(_$c_).$b_(_$c_)
71.00_$a,_$g,_$h_(_$c_).$b_(_$c_).$b_(_$c_)
71.00_$a,_$g,_$h_(_$c_).$b_(_$c_).$b_(_$c_).$b_(_$c_)
<term>&lt;...&gt;</term>
<listitem><para>An embedded <emphasis>index-formula</emphasis> (for linked fields) is enclosed in &lt;&gt;. For example, the
<emphasis>index-formula</emphasis></para>
4--#-$170-#1$a, $g ($c) , (7)
<para>matches <literal>mc-4.._._$1&lt;70._1_$a,_$g_(_$c_)&gt;_</literal> and
463_._$1&lt;70._1_$a,_$g_(_$c_)&gt;_
<para>All other operands are the same as those accepted in the MARC world.</para>
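<para>The repeatable-element operand can be illustrated with a short Python sketch.
It is not Zebra code. The function name <literal>expand_repeats</literal> and the
<literal>max_repeats</literal> parameter are hypothetical, and the sketch assumes at most one
non-nested group in curly braces. Following the forms listed for this operand, it starts from
one repetition; whether zero repetitions also count is not stated here.</para>
<programlisting><![CDATA[
```python
import re


def expand_repeats(formula, max_repeats=3):
    """Return the forms obtained by repeating the {...} group 1..max_repeats times.

    Hypothetical helper: handles a single, non-nested {...} group only.
    """
    match = re.search(r"\{([^{}]*)\}", formula)
    if match is None:
        # No repeatable element: the formula stands for itself.
        return [formula]
    head = formula[:match.start()]
    body = match.group(1)
    tail = formula[match.end():]
    return [head + body * count + tail for count in range(1, max_repeats + 1)]


for form in expand_repeats("71.00_$a,_$g,_$h_(_$c_){.$b_(_$c_)}"):
    print(form)
```
]]></programlisting>
<para>With the default of three repetitions, the sketch prints the three expanded forms
of <literal>71.00_$a,_$g,_$h_(_$c_){.$b_(_$c_)}</literal> listed for this operand.</para>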
<title>Examples</title>
<para>indexing LEADER</para>
<para>Use the keyword "ldr" to index the leader. For example, to index data from
positions 6 and 7 of the LEADER:
elm mc-ldr[6] Record-type !
elm mc-ldr[7] Bib-level !
<para>indexing data from control fields</para>
<para>indexing the date (the time the record was added to the database)</para>
elm mc-008[0-5] Date/time-added-to-db !
<para>or, for RUSMARC, where this data is held in field 100</para>
elm mc-100___$a[0-7]_ Date/time-added-to-db !
<para>using indicators while indexing</para>
<para>For RUSMARC, the <emphasis>index-formula</emphasis>
<literal>70-#1$a, $g</literal> matches</para>
elm mc-70._1_$a,_$g_ Author !:w,!:p
<para>When Zebra finds a field matching the <literal>"70."</literal> pattern, it checks
the indicators. In this case the first indicator must be whitespace and the
second must be 1; otherwise the field is not indexed.
<para>indexing embedded (linked) fields for UNIMARC-based formats</para>
<para>For RUSMARC, the <emphasis>index-formula</emphasis>
<literal>4--#-$170-#1$a, $g ($c)</literal> matches</para>
elm mc-4.._._$1&lt;70._1_$a,_$g_(_$c_)&gt;_ Author !:w,!:p
<para>Data are extracted from a record if the field matches the
<literal>"4.._."</literal> pattern and the data in the linked field match the embedded
<emphasis>index-formula</emphasis> <literal>70._1_$a,_$g_(_$c_)</literal>.</para>
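<para>The way a field is compared with a pattern such as <literal>"4.._."</literal> can be
sketched in Python. This only illustrates the notation and is not how Zebra is implemented.
The function names are hypothetical. Splitting the pattern into a three-character tag part
(<literal>4..</literal>) and a two-character indicator part (<literal>_.</literal>) is an
assumption based on the notation described in this document.</para>
<programlisting><![CDATA[
```python
def matches(pattern, value):
    """Compare value with pattern: '.' is any character, '_' is a space."""
    if len(pattern) != len(value):
        return False
    for expected, actual in zip(pattern, value):
        if expected == ".":
            continue  # any value is accepted at this position
        if expected == "_":
            if actual != " ":
                return False  # whitespace is required here
        elif expected != actual:
            return False  # every other character must match literally
    return True


def field_matches(tag_pattern, indicator_pattern, tag, indicators):
    """Check both the tag and the two indicators of a field (hypothetical helper)."""
    return matches(tag_pattern, tag) and matches(indicator_pattern, indicators)


print(field_matches("4..", "_.", "463", " 0"))  # True: tag 4xx, first indicator blank
print(field_matches("4..", "_.", "463", "10"))  # False: first indicator is not blank
print(field_matches("4..", "_.", "700", " 0"))  # False: tag does not start with 4
```
]]></programlisting>
<para>The same comparison would then be applied to the linked field against the embedded
<emphasis>index-formula</emphasis>; the sketch does not cover that step.</para>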