doc/querymodel.xml

   1  <chapter id="querymodel">
   2   <title>Query Model</title>
   3
   4   <section id="querymodel-overview">
   5    <title>Query Model Overview</title>
   6
   7    <section id="querymodel-query-languages">
   8     <title>Query Languages</title>
   9
  10     <para>
  11      &zebra; is born as a networking Information Retrieval engine adhering
  12      to the international standards
  13      <ulink url="&url.z39.50;">&acro.z3950;</ulink> and
  14      <ulink url="&url.sru;">&acro.sru;</ulink>,
  15      and implement the
  16      type-1 Reverse Polish Notation (&acro.rpn;) query
  17      model defined there.
  18      Unfortunately, this model has only defined a binary
  19      encoded representation, which is used as transport packaging in
  20      the &acro.z3950; protocol layer. This representation is not human
  21      readable, nor defines any convenient way to specify queries.
  22     </para>
  23     <para>
  24      Since the type-1 (&acro.rpn;)
  25      query structure has no direct, useful string
  26      representation, every client application needs to provide some
  27      form of mapping from a local query notation or representation to it.
  28     </para>
  29
  30
  31     <section id="querymodel-query-languages-pqf">
  32      <title>Prefix Query Format (&acro.pqf;)</title>
  33      <para>
  34       Index Data has defined a textual representation in the
  35       <ulink url="&url.yaz.pqf;">Prefix Query Format</ulink>, short
  36       <emphasis>&acro.pqf;</emphasis>, which maps
  37       one-to-one to binary encoded
  38       <emphasis>type-1 &acro.rpn;</emphasis> queries.
  39       &acro.pqf; has been adopted by other
  40       parties developing &acro.z3950; software, and is often referred to as
  41       <emphasis>Prefix Query Notation</emphasis>, or in short
  42       &acro.pqn;. See
  43       <xref linkend="querymodel-rpn"/> for further explanations and
  44       descriptions of &zebra;'s capabilities.
  45      </para>
  46     </section>
  47
  48     <section id="querymodel-query-languages-cql">
  49      <title>Common Query Language (&acro.cql;)</title>
  50      <para>
  51       The query model of the type-1 &acro.rpn;,
  52       expressed in &acro.pqf;/&acro.pqn; is natively supported.
  53       On the other hand, the default &acro.sru;
  54       web services <emphasis>Common Query Language</emphasis>
  55       <ulink url="&url.cql;">&acro.cql;</ulink> is not natively supported.
  56      </para>
  57      <para>
  58       &zebra; can be configured to understand and map &acro.cql; to &acro.pqf;. See
  59       <xref linkend="querymodel-cql-to-pqf"/>.
  60      </para>
  61     </section>
  62
  63    </section>
  64
  65    <section id="querymodel-operation-types">
  66     <title>Operation types</title>
  67     <para>
  68      &zebra; supports all of the three different
  69      &acro.z3950;/&acro.sru; operations defined in the
  70      standards: explain, search,
  71      and scan. A short description of the
  72      functionality and purpose of each is quite in order here.
  73     </para>
  74
  75     <section id="querymodel-operation-type-explain">
  76      <title>Explain Operation</title>
  77      <para>
  78       The <emphasis>syntax</emphasis> of &acro.z3950;/&acro.sru; queries is
  79       well known to any client, but the specific
  80       <emphasis>semantics</emphasis> - taking into account a
  81       particular servers functionalities and abilities - must be
  82       discovered from case to case. Enters the
  83       explain operation, which provides the means for learning which
  84       <emphasis>fields</emphasis> (also called
  85       <emphasis>indexes</emphasis> or <emphasis>access points</emphasis>)
  86       are provided, which default parameter the server uses, which
  87       retrieve document formats are defined, and which specific parts
  88       of the general query model are supported.
  89      </para>
  90      <para>
  91       The &acro.z3950; embeds the explain operation
  92       by performing a
  93       search in the magic
  94       <literal>IR-Explain-1</literal> database;
  95       see <xref linkend="querymodel-exp1"/>.
  96      </para>
  97      <para>
  98       In &acro.sru;, explain is an entirely  separate
  99       operation, which returns an ZeeRex &acro.xml; record according to the
 100       structure defined by the protocol.
 101      </para>
 102      <para>
 103       In both cases, the information gathered through
 104       explain operations can be used to
 105       auto-configure a client user interface to the servers
 106       capabilities.
 107      </para>
 108     </section>
 109
 110     <section id="querymodel-operation-type-search">
 111      <title>Search Operation</title>
 112      <para>
 113       Search and retrieve interactions are the raison d'être.
 114       They are used to query the remote database and
 115       return search result documents.  Search queries span from
 116       simple free text searches to nested complex boolean queries,
 117       targeting specific indexes, and possibly enhanced with many
 118       query semantic specifications. Search interactions are the heart
 119       and soul of &acro.z3950;/&acro.sru; servers.
 120      </para>
 121     </section>
 122
 123     <section id="querymodel-operation-type-scan">
 124      <title>Scan Operation</title>
 125      <para>
 126       The scan operation is a helper functionality,
 127       which operates on one index or access point a time.
 128      </para>
 129      <para>
 130       It provides
 131       the means to investigate the content of specific indexes.
 132       Scanning an index returns a handful of terms actually found in
 133       the indexes, and in addition the scan
 134       operation returns the number of documents indexed by each term.
 135       A search client can use this information to propose proper
 136       spelling of search terms, to auto-fill search boxes, or to
 137       display  controlled vocabularies.
 138      </para>
 139     </section>
 140
 141    </section>
 142
 143   </section>
 144
 145   <section id="querymodel-rpn">
 146    <title>&acro.rpn; queries and semantics</title>
 147    <para>
 148     The <ulink url="&url.yaz.pqf;">&acro.pqf; grammar</ulink>
 149     is documented in the &yaz; manual, and shall not be
 150     repeated here. This textual &acro.pqf; representation
 151     is not transmitted to &zebra; during search, but it is in the
 152     client mapped to the equivalent &acro.z3950; binary
 153     query parse tree.
 154    </para>
 155
 156    <section id="querymodel-rpn-tree">
 157     <title>&acro.rpn; tree structure</title>
 158     <para>
 159      The &acro.rpn; parse tree - or the equivalent textual representation in &acro.pqf; -
 160      may start with one specification of the
 161      <emphasis>attribute set</emphasis> used. Following is a query
 162      tree, which
 163      consists of <emphasis>atomic query parts (&acro.apt;)</emphasis> or
 164      <emphasis>named result sets</emphasis>, eventually
 165      paired by <emphasis>boolean binary operators</emphasis>, and
 166      finally  <emphasis>recursively combined </emphasis> into
 167      complex query trees.
 168     </para>
 169
 170     <section id="querymodel-attribute-sets">
 171      <title>Attribute sets</title>
 172      <para>
 173       Attribute sets define the exact meaning and semantics of queries
 174       issued. &zebra; comes with some predefined attribute set
 175       definitions, others can easily be defined and added to the
 176       configuration.
 177      </para>
 178
 179      <table id="querymodel-attribute-sets-table" frame="top">
 180       <title>Attribute sets predefined in &zebra;</title>
 181       <tgroup cols="4">
 182        <thead>
 183         <row>
 184          <entry>Attribute set</entry>
 185          <entry>&acro.pqf; notation (Short hand)</entry>
 186          <entry>Status</entry>
 187          <entry>Notes</entry>
 188         </row>
 189        </thead>
 190
 191        <tbody>
 192         <row>
 193          <entry>Explain</entry>
 194          <entry><literal>exp-1</literal></entry>
 195          <entry>Special attribute set used on the special automagic
 196           <literal>IR-Explain-1</literal> database to gain information on
 197           server capabilities, database names, and database
 198           and semantics.</entry>
 199          <entry>predefined</entry>
 200         </row>
 201         <row>
 202          <entry>&acro.bib1;</entry>
 203          <entry><literal>bib-1</literal></entry>
 204          <entry>Standard &acro.pqf; query language attribute set which defines the
 205           semantics of &acro.z3950; searching. In addition, all of the
 206           non-use attributes (types 2-14) define the hard-wired
 207           &zebra; internal query
 208           processing.</entry>
 209          <entry>default</entry>
 210         </row>
 211         <row>
 212          <entry>GILS</entry>
 213          <entry><literal>gils</literal></entry>
 214          <entry>Extension to the &acro.bib1; attribute set.</entry>
 215          <entry>predefined</entry>
 216         </row>
 217        </tbody>
 218       </tgroup>
 219      </table>
 220
 221      <para>
 222       The use attributes (type 1) mappings  the
 223       predefined attribute sets are found in the
 224       attribute set configuration files <filename>tab/*.att</filename>.
 225      </para>
 226
 227      <note>
 228       <para>
 229        The &zebra; internal query processing is modeled after
 230        the &acro.bib1; attribute set, and the non-use
 231        attributes type 2-6 are hard-wired in. It is therefore essential
 232        to be familiar with <xref linkend="querymodel-bib1-nonuse"/>.
 233       </para>
 234      </note>
 235
 236     </section>
 237
 238     <section id="querymodel-boolean-operators">
 239      <title>Boolean operators</title>
 240      <para>
 241       A pair of sub query trees, or of atomic queries, is combined
 242       using the standard boolean operators into new query trees.
 243       Thus, boolean operators are always internal nodes in the query tree.
 244      </para>
 245
 246      <table id="querymodel-boolean-operators-table" frame="top">
 247       <title>Boolean operators</title>
 248       <tgroup cols="3">
 249        <thead>
 250         <row>
 251          <entry>Keyword</entry>
 252          <entry>Operator</entry>
 253          <entry>Description</entry>
 254         </row>
 255        </thead>
 256        <tbody>
 257         <row><entry><literal>@and</literal></entry>
 258          <entry>binary AND operator</entry>
 259          <entry>Set intersection of two atomic queries hit sets</entry>
 260         </row>
 261         <row><entry><literal>@or</literal></entry>
 262          <entry>binary OR operator</entry>
 263          <entry>Set union of two atomic queries hit sets</entry>
 264         </row>
 265         <row><entry><literal>@not</literal></entry>
 266          <entry>binary AND NOT operator</entry>
 267          <entry>Set complement of two atomic queries hit sets</entry>
 268         </row>
 269         <row><entry><literal>@prox</literal></entry>
 270          <entry>binary PROXIMITY operator</entry>
 271          <entry>Set intersection of two atomic queries hit sets. In
 272           addition, the intersection set is purged for all
 273           documents which do not satisfy the requested query
 274           term proximity. Usually a proper subset of the AND
 275           operation.</entry>
 276         </row>
 277        </tbody>
 278       </tgroup>
 279      </table>
 280
 281      <para>
 282       For example, we can combine the terms
 283       <emphasis>information</emphasis> and <emphasis>retrieval</emphasis>
 284       into different searches in the default index of the default
 285       attribute set as follows.
 286       Querying for the union of all documents containing the
 287       terms <emphasis>information</emphasis> OR
 288       <emphasis>retrieval</emphasis>:
 289       <screen>
 290        Z> find @or information retrieval
 291       </screen>
 292      </para>
 293      <para>
 294       Querying for the intersection of all documents containing the
 295       terms <emphasis>information</emphasis> AND
 296       <emphasis>retrieval</emphasis>:
 297       The hit set is a subset of the corresponding
 298       OR query.
 299       <screen>
 300        Z> find @and information retrieval
 301       </screen>
 302      </para>
 303      <para>
 304       Querying for the intersection of all documents containing the
 305       terms <emphasis>information</emphasis> AND
 306       <emphasis>retrieval</emphasis>, taking proximity into account:
 307       The hit set is a subset of the corresponding
 308       AND query
 309       (see the <ulink url="&url.yaz.pqf;">&acro.pqf; grammar</ulink> for
 310       details on the proximity operator):
 311       <screen>
 312        Z> find @prox 0 3 0 2 k 2 information retrieval
 313       </screen>
 314      </para>
 315      <para>
 316       Querying for the intersection of all documents containing the
 317       terms <emphasis>information</emphasis> AND
 318       <emphasis>retrieval</emphasis>, in the same order and near each
 319       other as described in the term list.
 320       The hit set is a subset of the corresponding
 321       PROXIMITY query.
 322       <screen>
 323        Z> find "information retrieval"
 324       </screen>
 325      </para>
 326     </section>
 327
 328
 329     <section id="querymodel-atomic-queries">
 330      <title>Atomic queries (&acro.apt;)</title>
 331      <para>
 332       Atomic queries are the query parts which work on one access point
 333       only. These consist of <emphasis>an attribute list</emphasis>
 334       followed by a <emphasis>single term</emphasis> or a
 335       <emphasis>quoted term list</emphasis>, and are often called
 336       <emphasis>Attributes-Plus-Terms (&acro.apt;)</emphasis> queries.
 337      </para>
 338      <para>
 339       Atomic (&acro.apt;) queries are always leaf nodes in the &acro.pqf; query tree.
 340       UN-supplied non-use attributes types 2-12 are either inherited from
 341       higher nodes in the query tree, or are set to &zebra;'s default values.
 342       See <xref linkend="querymodel-bib1"/> for details.
 343      </para>
 344
 345      <table id="querymodel-atomic-queries-table" frame="top">
 346       <title>Atomic queries (&acro.apt;)</title>
 347       <tgroup cols="3">
 348        <thead>
 349         <row>
 350          <entry>Name</entry>
 351          <entry>Type</entry>
 352          <entry>Notes</entry>
 353         </row>
 354        </thead>
 355        <tbody>
 356         <row>
 357          <entry><emphasis>attribute list</emphasis></entry>
 358          <entry>List of <emphasis>orthogonal</emphasis> attributes</entry>
 359          <entry>Any of the orthogonal attribute types may be omitted,
 360           these are inherited from higher query tree nodes, or if not
 361           inherited, are set to the default &zebra; configuration values.
 362          </entry>
 363         </row>
 364         <row>
 365          <entry><emphasis>term</emphasis></entry>
 366          <entry>single <emphasis>term</emphasis>
 367           or <emphasis>quoted term list</emphasis>   </entry>
 368          <entry>Here the search terms or list of search terms is added
 369           to the query</entry>
 370         </row>
 371        </tbody>
 372       </tgroup>
 373      </table>
 374      <para>
 375       Querying for the term <emphasis>information</emphasis> in the
 376       default index using the default attribute set, the server choice
 377       of access point/index, and the default non-use attributes.
 378       <screen>
 379        Z> find information
 380       </screen>
 381      </para>
 382      <para>
 383       Equivalent query fully specified including all default values:
 384       <screen>
 385        Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 information
 386       </screen>
 387      </para>
 388
 389      <para>
 390       Finding all documents which have the term
 391       <emphasis>debussy</emphasis> in the title field.
 392       <screen>
 393        Z> find @attr 1=4 debussy
 394       </screen>
 395      </para>
 396
 397      <para>
 398       The <emphasis>scan</emphasis> operation is only supported with
 399       atomic &acro.apt; queries, as it is bound to one access point at a
 400       time. Boolean query trees are not allowed during
 401       <emphasis>scan</emphasis>.
 402      </para>
 403
 404      <para>
 405       For example, we might want to scan the title index, starting with
 406       the term
 407       <emphasis>debussy</emphasis>, and displaying this and the
 408       following terms in lexicographic order:
 409       <screen>
 410        Z> scan @attr 1=4 debussy
 411       </screen>
 412      </para>
 413     </section>
 414
 415
 416     <section id="querymodel-resultset">
 417      <title>Named Result Sets</title>
 418      <para>
 419       Named result sets are supported in &zebra;, and result sets can be
 420       used as operands without limitations. It follows that named
 421       result sets are leaf nodes in the &acro.pqf; query tree, exactly as
 422       atomic &acro.apt; queries are.
 423      </para>
 424      <para>
 425       After the execution of a search, the result set is available at
 426       the server, such that the client can use it for subsequent
 427       searches or retrieval requests. The Z30.50 standard actually
 428       stresses the fact that result sets are volatile. It may cease
 429       to exist at any time point after search, and the server will
 430       send a diagnostic to the effect that the requested
 431       result set does not exist any more.
 432      </para>
 433
 434      <para>
 435       Defining a named result set and re-using it in the next query,
 436       using <application>yaz-client</application>. Notice that the client, not
 437       the server, assigns the string '1' to the
 438       named result set.
 439       <screen>
 440        Z> f @attr 1=4 mozart
 441        ...
 442        Number of hits: 43, setno 1
 443        ...
 444        Z> f @and @set 1 @attr 1=4 amadeus
 445        ...
 446        Number of hits: 14, setno 2
 447       </screen>
 448      </para>
 449
 450      <note>
 451       <para>
 452        Named result sets are only supported by the &acro.z3950; protocol.
 453        The &acro.sru; web service is stateless, and therefore the notion of
 454        named result sets does not exist when accessing a &zebra; server by
 455        the &acro.sru; protocol.
 456       </para>
 457      </note>
 458     </section>
 459
 460     <section id="querymodel-use-string">
 461      <title>&zebra;'s special access point of type 'string'</title>
 462      <para>
 463       The numeric <emphasis>use (type 1)</emphasis> attribute is usually
 464       referred to from a given
 465       attribute set. In addition, &zebra; let you use
 466       <emphasis>any internal index
 467        name defined in your configuration</emphasis>
 468       as use attribute value. This is a great feature for
 469       debugging, and when you do
 470       not need the complexity of defined use attribute values. It is
 471       the preferred way of accessing &zebra; indexes directly.
 472      </para>
 473      <para>
 474       Finding all documents which have the term list "information
 475       retrieval" in an &zebra; index, using its internal full string
 476       name. Scanning the same index.
 477       <screen>
 478        Z> find @attr 1=sometext "information retrieval"
 479        Z> scan @attr 1=sometext aterm
 480       </screen>
 481      </para>
 482      <para>
 483       Searching or scanning
 484       the bib-1 use attribute 54 using its string name:
 485       <screen>
 486        Z> find @attr 1=Code-language eng
 487        Z> scan @attr 1=Code-language ""
 488       </screen>
 489      </para>
 490      <para>
 491       It is possible to search
 492       in any silly string index - if it's defined in your
 493       indexing rules and can be parsed by the &acro.pqf; parser.
 494       This is definitely not the recommended use of
 495       this facility, as it might confuse your users with some very
 496       unexpected results.
 497       <screen>
 498        Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"
 499       </screen>
 500      </para>
 501      <para>
 502       See also <xref linkend="querymodel-pqf-apt-mapping"/> for details, and
 503       <xref linkend="zebrasrv-sru"/>
 504       for the &acro.sru; &acro.pqf; query extension using string names as a fast
 505       debugging facility.
 506      </para>
 507     </section>
 508
 509     <section id="querymodel-use-xpath">
 510      <title>&zebra;'s special access point of type 'XPath'
 511       for &acro.grs1; filters</title>
 512      <para>
 513       As we have seen above, it is possible (albeit seldom a great
 514       idea) to emulate
 515       <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink> based
 516       search by defining <emphasis>use (type 1)</emphasis>
 517       <emphasis>string</emphasis> attributes which in appearance
 518       <emphasis>resemble XPath queries</emphasis>. There are two
 519       problems with this approach: first, the XPath-look-alike has to
 520       be defined at indexing time, no new undefined
 521       XPath queries can entered at search time, and second, it might
 522       confuse users very much that an XPath-alike index name in fact
 523       gets populated from a possible entirely different &acro.xml; element
 524       than it pretends to access.
 525      </para>
 526      <para>
 527       When using the &acro.grs1; Record Model
 528       (see  <xref linkend="grs"/>), we have the
 529       possibility to embed <emphasis>life</emphasis>
 530       XPath expressions
 531       in the &acro.pqf; queries, which are here called
 532       <emphasis>use (type 1)</emphasis> <emphasis>xpath</emphasis>
 533       attributes. You must enable the
 534       <literal>xpath enable</literal> directive in your
 535       <literal>.abs</literal> configuration files.
 536      </para>
 537      <note>
 538       <para>
 539        Only a <emphasis>very</emphasis> restricted subset of the
 540        <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink>
 541        standard is supported as the &acro.grs1; record model is simpler than
 542        a full &acro.xml; &acro.dom; structure. See the following examples for
 543        possibilities.
 544       </para>
 545      </note>
 546      <para>
 547       Finding all documents which have the term "content"
 548       inside a text node found in a specific &acro.xml; &acro.dom;
 549       <emphasis>subtree</emphasis>, whose starting element is
 550       addressed by XPath.
 551       <screen>
 552        Z> find @attr 1=/root content
 553        Z> find @attr 1=/root/first content
 554       </screen>
 555       <emphasis>Notice that the
 556        XPath must be absolute, i.e., must start with '/', and that the
 557        XPath <literal>descendant-or-self</literal> axis followed by a
 558        text node selection <literal>text()</literal> is implicitly
 559        appended to the stated XPath.
 560       </emphasis>
 561       It follows that the above searches are interpreted as:
 562       <screen>
 563        Z> find @attr 1=/root//text() content
 564        Z> find @attr 1=/root/first//text() content
 565       </screen>
 566      </para>
 567
 568      <para>
 569       Searching inside attribute strings is possible:
 570       <screen>
 571        Z> find @attr 1=/link/@creator morten
 572       </screen>
 573      </para>
 574
 575      <para>
 576       Filter the addressing XPath by a predicate working on exact
 577       string values in
 578       attributes (in the &acro.xml; sense) can be done: return all those docs which
 579       have the term "english" contained in one of all text sub nodes of
 580       the subtree defined by the XPath
 581       <literal>/record/title[@lang='en']</literal>. And similar
 582       predicate filtering.
 583       <screen>
 584        Z> find @attr 1=/record/title[@lang='en'] english
 585        Z> find @attr 1=/link[@creator='sisse'] sibelius
 586        Z> find @attr 1=/link[@creator='sisse']/description[@xml:lang='da'] sibelius
 587       </screen>
 588      </para>
 589
 590      <para>
 591       Combining numeric indexes, boolean expressions,
 592       and xpath based searches is possible:
 593       <screen>
 594        Z> find @attr 1=/record/title @and foo bar
 595        Z> find @and @attr 1=/record/title foo @attr 1=4 bar
 596       </screen>
 597      </para>
 598      <para>
 599       Escaping &acro.pqf; keywords and other non-parseable XPath constructs
 600       with <literal>'{ }'</literal> to prevent client-side &acro.pqf; parsing
 601       syntax errors:
 602       <screen>
 603        Z> find @attr {1=/root/first[@attr='danish']} content
 604        Z> find @attr {1=/record/@set} oai
 605       </screen>
 606      </para>
 607      <warning>
 608       <para>
 609        It is worth mentioning that these dynamic performed XPath
 610        queries are a performance bottleneck, as no optimized
 611        specialized indexes can be used. Therefore, avoid the use of
 612        this facility when speed is essential, and the database content
 613        size is medium to large.
 614       </para>
 615      </warning>
 616     </section>
 617    </section>
 618
 619    <section id="querymodel-exp1">
 620     <title>Explain Attribute Set</title>
 621     <para>
 622      The &acro.z3950; standard defines the
 623      <ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
 624      Exp-1, which is used to discover information
 625      about a server's search semantics and functional capabilities
 626      &zebra; exposes a  "classic"
 627      Explain database by base name <literal>IR-Explain-1</literal>, which
 628      is populated with system internal information.
 629     </para>
 630     <para>
 631      The attribute-set <literal>exp-1</literal> consists of a single
 632      use attribute (type 1).
 633     </para>
 634     <para>
 635      In addition, the non-Use
 636      &acro.bib1; attributes, that is, the types
 637      <emphasis>Relation</emphasis>, <emphasis>Position</emphasis>,
 638      <emphasis>Structure</emphasis>, <emphasis>Truncation</emphasis>,
 639      and <emphasis>Completeness</emphasis> are imported from
 640      the &acro.bib1; attribute set, and may be used
 641      within any explain query.
 642     </para>
 643
 644     <section id="querymodel-exp1-use">
 645      <title>Use Attributes (type = 1)</title>
 646      <para>
 647       The following Explain search attributes are supported:
 648       <literal>ExplainCategory</literal> (@attr 1=1),
 649       <literal>DatabaseName</literal> (@attr 1=3),
 650       <literal>DateAdded</literal> (@attr 1=9),
 651       <literal>DateChanged</literal>(@attr 1=10).
 652      </para>
 653      <para>
 654       A search in the use attribute  <literal>ExplainCategory</literal>
 655       supports only these predefined values:
 656       <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
 657       <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
 658      </para>
 659      <para>
 660       See <filename>tab/explain.att</filename> and the
 661       <ulink url="&url.z39.50;">&acro.z3950;</ulink> standard
 662       for more information.
 663      </para>
 664     </section>
 665
 666     <section id="querymodel-examples">
 667      <title>Explain searches with yaz-client</title>
 668      <para>
 669       Classic Explain only defines retrieval of Explain information
 670       via ASN.1. Practically no &acro.z3950; clients supports this. Fortunately
 671       they don't have to - &zebra; allows retrieval of this information
 672       in other formats:
 673       &acro.sutrs;, &acro.xml;,
 674       &acro.grs1; and  <literal>ASN.1</literal> Explain.
 675      </para>
 676
 677      <para>
 678       List supported categories to find out which explain commands are
 679       supported:
 680       <screen>
 681        Z> base IR-Explain-1
 682        Z> find @attr exp1 1=1 categorylist
 683        Z> form sutrs
 684        Z> show 1+2
 685       </screen>
 686      </para>
 687
 688      <para>
 689       Get target info, that is, investigate which databases exist at
 690       this server endpoint:
 691       <screen>
 692        Z> base IR-Explain-1
 693        Z> find @attr exp1 1=1 targetinfo
 694        Z> form xml
 695        Z> show 1+1
 696        Z> form grs-1
 697        Z> show 1+1
 698        Z> form sutrs
 699        Z> show 1+1
 700       </screen>
 701      </para>
 702
 703      <para>
 704       List all supported databases, the number of hits
 705       is the number of databases found, which most commonly are the
 706       following two:
 707       the <literal>Default</literal> and the
 708       <literal>IR-Explain-1</literal> databases.
 709       <screen>
 710        Z> base IR-Explain-1
 711        Z> find @attr exp1 1=1 databaseinfo
 712        Z> form sutrs
 713        Z> show 1+2
 714       </screen>
 715      </para>
 716
 717      <para>
 718       Get database info record for database <literal>Default</literal>.
 719       <screen>
 720        Z> base IR-Explain-1
 721        Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
 722       </screen>
 723       Identical query with explicitly specified attribute set:
 724       <screen>
 725        Z> base IR-Explain-1
 726        Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
 727       </screen>
 728      </para>
 729
 730      <para>
 731       Get attribute details record for database
 732       <literal>Default</literal>.
 733       This query is very useful to study the internal &zebra; indexes.
 734       If records have been indexed using the <literal>alvis</literal>
 735       &acro.xslt; filter, the string representation names of the known indexes can be
 736       found.
 737       <screen>
 738        Z> base IR-Explain-1
 739        Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
 740       </screen>
 741       Identical query with explicitly specified attribute set:
 742       <screen>
 743        Z> base IR-Explain-1
 744        Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
 745       </screen>
 746      </para>
 747     </section>
 748
 749    </section>
 750
 751    <section id="querymodel-bib1">
 752     <title>&acro.bib1; Attribute Set</title>
 753     <para>
 754      Most of the information contained in this section is an excerpt of
 755      the ATTRIBUTE SET &acro.bib1; (&acro.z3950;-1995) SEMANTICS
 756      found at <ulink url="&url.z39.50.attset.bib1.1995;">. The &acro.bib1;
 757       Attribute Set Semantics</ulink> from 1995, also in an updated
 758      <ulink url="&url.z39.50.attset.bib1;">&acro.bib1;
 759       Attribute Set</ulink>
 760      version from 2003. Index Data is not the copyright holder of this
 761      information, except for the configuration details, the listing of
 762      &zebra;'s capabilities, and the example queries.
 763     </para>
 764
 765
 766     <section id="querymodel-bib1-use">
 767      <title>Use Attributes (type 1)</title>
 768
 769      <para>
 770       A use attribute specifies an access point for any atomic query.
 771       These access points are highly dependent on the attribute set used
 772       in the query, and are user configurable using the following
 773       default configuration files:
 774       <filename>tab/bib1.att</filename>,
 775       <filename>tab/dan1.att</filename>,
 776       <filename>tab/explain.att</filename>, and
 777       <filename>tab/gils.att</filename>.
 778      </para>
 779      <para>
 780       For example, some few &acro.bib1; use
 781       attributes from the  <filename>tab/bib1.att</filename> are:
 782       <screen>
 783        att 1               Personal-name
 784        att 2               Corporate-name
 785        att 3               Conference-name
 786        att 4               Title
 787        ...
 788        att 1009            Subject-name-personal
 789        att 1010            Body-of-text
 790        att 1011            Date/time-added-to-db
 791        ...
 792        att 1016            Any
 793        att 1017            Server-choice
 794        att 1018            Publisher
 795        ...
 796        att 1035            Anywhere
 797        att 1036            Author-Title-Subject
 798       </screen>
 799      </para>
 800      <para>
 801       New attribute sets can be added by adding new
 802       <filename>tab/*.att</filename> configuration files, which need to
 803       be sourced in the main configuration <filename>zebra.cfg</filename>.
 804      </para>
 805      <para>
 806       In addition, &zebra; allows the access of
 807       <emphasis>internal index names</emphasis> and <emphasis>dynamic
 808        XPath</emphasis> as use attributes; see
 809       <xref linkend="querymodel-use-string"/> and
 810       <xref linkend="querymodel-use-xpath"/>.
 811      </para>
 812
 813      <para>
 814       Phrase search for <emphasis>information retrieval</emphasis> in
 815       the title-register, scanning the same register afterwards:
 816       <screen>
 817        Z> find @attr 1=4 "information retrieval"
 818        Z> scan @attr 1=4 information
 819       </screen>
 820      </para>
 821     </section>
 822
 823    </section>
 824
 825
 826    <section id="querymodel-bib1-nonuse">
 827     <title>&zebra; general Bib1 Non-Use Attributes (type 2-6)</title>
 828
 829     <section id="querymodel-bib1-relation">
 830      <title>Relation Attributes (type 2)</title>
 831
 832      <para>
 833       Relation attributes describe the relationship of the access
 834       point (left side
 835       of the relation) to the search term as qualified by the attributes (right
 836       side of the relation), e.g., Date-publication &lt;= 1975.
 837      </para>
 838
 839      <table id="querymodel-bib1-relation-table" frame="top">
 840       <title>Relation Attributes (type 2)</title>
 841       <tgroup cols="3">
 842        <thead>
 843         <row>
 844          <entry>Relation</entry>
 845          <entry>Value</entry>
 846          <entry>Notes</entry>
 847         </row>
 848        </thead>
 849        <tbody>
 850         <row>
 851          <entry>Less than</entry>
 852          <entry>1</entry>
 853          <entry>supported</entry>
 854         </row>
 855         <row>
 856          <entry>Less than or equal</entry>
 857          <entry>2</entry>
 858          <entry>supported</entry>
 859         </row>
 860         <row>
 861          <entry>Equal</entry>
 862          <entry>3</entry>
 863          <entry>default</entry>
 864         </row>
 865         <row>
 866          <entry>Greater or equal</entry>
 867          <entry>4</entry>
 868          <entry>supported</entry>
 869         </row>
 870         <row>
 871          <entry>Greater than</entry>
 872          <entry>5</entry>
 873          <entry>supported</entry>
 874         </row>
 875         <row>
 876          <entry>Not equal</entry>
 877          <entry>6</entry>
 878          <entry>unsupported</entry>
 879         </row>
 880         <row>
 881          <entry>Phonetic</entry>
 882          <entry>100</entry>
 883          <entry>unsupported</entry>
 884         </row>
 885         <row>
 886          <entry>Stem</entry>
 887          <entry>101</entry>
 888          <entry>unsupported</entry>
 889         </row>
 890         <row>
 891          <entry>Relevance</entry>
 892          <entry>102</entry>
 893          <entry>supported</entry>
 894         </row>
 895         <row>
 896          <entry>AlwaysMatches</entry>
 897          <entry>103</entry>
 898          <entry>supported *</entry>
 899         </row>
 900        </tbody>
 901       </tgroup>
 902      </table>
 903      <note>
 904       <para>
 905        AlwaysMatches searches are only supported if alwaysmatches indexing
 906        has been enabled. See <xref linkend="default-idx-file"/>
 907       </para>
 908      </note>
 909
 910      <para>
 911       The relation attributes 1-5 are supported and work exactly as
 912       expected.
 913       All ordering operations are based on a lexicographical ordering,
 914       <emphasis>except</emphasis> when the
 915       structure attribute numeric (109) is used. In
 916       this case, ordering is numerical. See
 917       <xref linkend="querymodel-bib1-structure"/>.
 918       <screen>
 919        Z> find @attr 1=Title @attr 2=1 music
 920        ...
 921        Number of hits: 11745, setno 1
 922        ...
 923        Z> find @attr 1=Title @attr 2=2 music
 924        ...
 925        Number of hits: 11771, setno 2
 926        ...
 927        Z> find @attr 1=Title @attr 2=3 music
 928        ...
 929        Number of hits: 532, setno 3
 930        ...
 931        Z> find @attr 1=Title @attr 2=4 music
 932        ...
 933        Number of hits: 11463, setno 4
 934        ...
 935        Z> find @attr 1=Title @attr 2=5 music
 936        ...
 937        Number of hits: 11419, setno 5
 938       </screen>
 939      </para>
 940
 941      <para>
 942       The relation attribute
 943       <emphasis>Relevance (102)</emphasis> is supported, see
 944       <xref linkend="administration-ranking"/> for full information.
 945      </para>
 946
 947      <para>
 948       Ranked search for <emphasis>information retrieval</emphasis> in
 949       the title-register:
 950       <screen>
 951        Z> find @attr 1=4 @attr 2=102 "information retrieval"
 952       </screen>
 953      </para>
 954
 955      <para>
 956       The relation attribute
 957       <emphasis>AlwaysMatches (103)</emphasis> is in the default
 958       configuration
 959       supported in conjecture with structure attribute
 960       <emphasis>Phrase (1)</emphasis> (which may be omitted by
 961       default).
 962       It can be configured to work with other structure attributes,
 963       see the configuration file
 964       <filename>tab/default.idx</filename> and
 965       <xref linkend="querymodel-pqf-apt-mapping"/>.
 966      </para>
 967      <para>
 968       <emphasis>AlwaysMatches (103)</emphasis> is a
 969       great way to discover how many documents have been indexed in a
 970       given field. The search term is ignored, but needed for correct
 971       &acro.pqf; syntax. An empty search term may be supplied.
 972       <screen>
 973        Z> find @attr 1=Title  @attr 2=103  ""
 974        Z> find @attr 1=Title  @attr 2=103  @attr 4=1 ""
 975       </screen>
 976      </para>
 977
 978
 979     </section>
 980
 981     <section id="querymodel-bib1-position">
 982      <title>Position Attributes (type 3)</title>
 983
 984      <para>
 985       The position attribute specifies the location of the search term
 986       within the field or subfield in which it appears.
 987      </para>
 988
 989      <table id="querymodel-bib1-position-table" frame="top">
 990       <title>Position Attributes (type 3)</title>
 991       <tgroup cols="3">
 992        <thead>
 993         <row>
 994          <entry>Position</entry>
 995          <entry>Value</entry>
 996          <entry>Notes</entry>
 997         </row>
 998        </thead>
 999        <tbody>
1000         <row>
1001          <entry>First in field </entry>
1002          <entry>1</entry>
1003          <entry>supported *</entry>
1004         </row>
1005         <row>
1006          <entry>First in subfield</entry>
1007          <entry>2</entry>
1008          <entry>supported *</entry>
1009         </row>
1010         <row>
1011          <entry>Any position in field</entry>
1012          <entry>3</entry>
1013          <entry>default</entry>
1014         </row>
1015        </tbody>
1016       </tgroup>
1017      </table>
1018
1019      <note>
1020       <para>
1021        &zebra; only supports first-in-field seaches if the
1022        <literal>firstinfield</literal> is enabled for the index
1023        Refer to <xref linkend="default-idx-file"/>.
1024        &zebra; does not distinguish between first in field and
1025        first in subfield. They result in the same hit count.
1026        Searching for first position in (sub)field in only supported in &zebra;
1027        2.0.2 and later.
1028       </para>
1029      </note>
1030     </section>
1031
1032     <section id="querymodel-bib1-structure">
1033      <title>Structure Attributes (type 4)</title>
1034
1035      <para>
1036       The structure attribute specifies the type of search
1037       term. This causes the search to be mapped on
1038       different &zebra; internal indexes, which must have been defined
1039       at index time.
1040      </para>
1041
1042      <para>
1043       The possible values of the
1044       <literal>structure attribute (type 4)</literal> can be defined
1045       using the configuration file <filename>tab/default.idx</filename>.
1046       The default configuration is summarized in this table.
1047      </para>
1048
1049      <table id="querymodel-bib1-structure-table" frame="top">
1050       <title>Structure Attributes (type 4)</title>
1051       <tgroup cols="3">
1052        <thead>
1053         <row>
1054          <entry>Structure</entry>
1055          <entry>Value</entry>
1056          <entry>Notes</entry>
1057         </row>
1058        </thead>
1059        <tbody>
1060         <row>
1061          <entry>Phrase </entry>
1062          <entry>1</entry>
1063          <entry>default</entry>
1064         </row>
1065         <row>
1066          <entry>Word</entry>
1067          <entry>2</entry>
1068          <entry>supported</entry>
1069         </row>
1070         <row>
1071          <entry>Key</entry>
1072          <entry>3</entry>
1073          <entry>supported</entry>
1074         </row>
1075         <row>
1076          <entry>Year</entry>
1077          <entry>4</entry>
1078          <entry>supported</entry>
1079         </row>
1080         <row>
1081          <entry>Date (normalized)</entry>
1082          <entry>5</entry>
1083          <entry>supported</entry>
1084         </row>
1085         <row>
1086          <entry>Word list</entry>
1087          <entry>6</entry>
1088          <entry>supported</entry>
1089         </row>
1090         <row>
1091          <entry>Date (un-normalized)</entry>
1092          <entry>100</entry>
1093          <entry>unsupported</entry>
1094         </row>
1095         <row>
1096          <entry>Name (normalized) </entry>
1097          <entry>101</entry>
1098          <entry>unsupported</entry>
1099         </row>
1100         <row>
1101          <entry>Name (un-normalized) </entry>
1102          <entry>102</entry>
1103          <entry>unsupported</entry>
1104         </row>
1105         <row>
1106          <entry>Structure</entry>
1107          <entry>103</entry>
1108          <entry>unsupported</entry>
1109         </row>
1110         <row>
1111          <entry>Urx</entry>
1112          <entry>104</entry>
1113          <entry>supported</entry>
1114         </row>
1115         <row>
1116          <entry>Free-form-text</entry>
1117          <entry>105</entry>
1118          <entry>supported</entry>
1119         </row>
1120         <row>
1121          <entry>Document-text</entry>
1122          <entry>106</entry>
1123          <entry>supported</entry>
1124         </row>
1125         <row>
1126          <entry>Local-number</entry>
1127          <entry>107</entry>
1128          <entry>supported</entry>
1129         </row>
1130         <row>
1131          <entry>String</entry>
1132          <entry>108</entry>
1133          <entry>unsupported</entry>
1134         </row>
1135         <row>
1136          <entry>Numeric string</entry>
1137          <entry>109</entry>
1138          <entry>supported</entry>
1139         </row>
1140        </tbody>
1141       </tgroup>
1142      </table>
1143      <para>
1144       The structure attribute values
1145       <literal>Word list (6)</literal>
1146       is supported, and maps to the boolean <literal>AND</literal>
1147       combination of words supplied. The word list is useful when
1148       Google-like bag-of-word queries need to be translated from a GUI
1149       query language to &acro.pqf;.  For example, the following queries
1150       are equivalent:
1151       <screen>
1152        Z> find @attr 1=Title @attr 4=6 "mozart amadeus"
1153        Z> find @attr 1=Title  @and mozart amadeus
1154       </screen>
1155      </para>
1156
1157      <para>
1158       The structure attribute value
1159       <literal>Free-form-text (105)</literal> and
1160       <literal>Document-text (106)</literal>
1161       are supported, and map both to the boolean <literal>OR</literal>
1162       combination of words supplied. The following queries
1163       are equivalent:
1164       <screen>
1165        Z> find @attr 1=Body-of-text @attr 4=105 "bach salieri teleman"
1166        Z> find @attr 1=Body-of-text @attr 4=106 "bach salieri teleman"
1167        Z> find @attr 1=Body-of-text @or bach @or salieri teleman
1168       </screen>
1169       This <literal>OR</literal> list of terms is very useful in
1170       combination with relevance ranking:
1171       <screen>
1172        Z> find @attr 1=Body-of-text @attr 2=102 @attr 4=105 "bach salieri teleman"
1173       </screen>
1174      </para>
1175
1176      <para>
1177       The structure attribute value
1178       <literal>Local number (107)</literal>
1179       is supported, and maps always to the &zebra; internal document ID,
1180       irrespectively which use attribute is specified. The following queries
1181       have exactly the same unique record in the hit set:
1182       <screen>
1183        Z> find @attr 4=107 10
1184        Z> find @attr 1=4 @attr 4=107 10
1185        Z> find @attr 1=1010 @attr 4=107 10
1186       </screen>
1187      </para>
1188
1189      <para>
1190       In
1191       the GILS schema (<literal>gils.abs</literal>), the
1192       west-bounding-coordinate is indexed as type <literal>n</literal>,
1193       and is therefore searched by specifying
1194       <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
1195       To match all those records with west-bounding-coordinate greater
1196       than -114 we use the following query:
1197       <screen>
1198        Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
1199       </screen>
1200      </para>
1201      <note>
1202       <para>
1203        The exact mapping between &acro.pqf; queries and &zebra; internal indexes
1204        and index types is explained in
1205        <xref linkend="querymodel-pqf-apt-mapping"/>.
1206       </para>
1207      </note>
1208     </section>
1209
1210
1211     <section id="querymodel-bib1-truncation">
1212      <title>Truncation Attributes (type = 5)</title>
1213
1214      <para>
1215       The truncation attribute specifies whether variations of one or
1216       more characters are allowed between search term and hit terms, or
1217       not. Using non-default truncation attributes will broaden the
1218       document hit set of a search query.
1219      </para>
1220
1221      <table id="querymodel-bib1-truncation-table" frame="top">
1222       <title>Truncation Attributes (type 5)</title>
1223       <tgroup cols="3">
1224        <thead>
1225         <row>
1226          <entry>Truncation</entry>
1227          <entry>Value</entry>
1228          <entry>Notes</entry>
1229         </row>
1230        </thead>
1231        <tbody>
1232         <row>
1233          <entry>Right truncation </entry>
1234          <entry>1</entry>
1235          <entry>supported</entry>
1236         </row>
1237         <row>
1238          <entry>Left truncation</entry>
1239          <entry>2</entry>
1240          <entry>supported</entry>
1241         </row>
1242         <row>
1243          <entry>Left and right truncation</entry>
1244          <entry>3</entry>
1245          <entry>supported</entry>
1246         </row>
1247         <row>
1248          <entry>Do not truncate</entry>
1249          <entry>100</entry>
1250          <entry>default</entry>
1251         </row>
1252         <row>
1253          <entry>Process # in search term</entry>
1254          <entry>101</entry>
1255          <entry>supported</entry>
1256         </row>
1257         <row>
1258          <entry>RegExpr-1 </entry>
1259          <entry>102</entry>
1260          <entry>supported</entry>
1261         </row>
1262         <row>
1263          <entry>RegExpr-2</entry>
1264          <entry>103</entry>
1265          <entry>supported</entry>
1266         </row>
1267        </tbody>
1268       </tgroup>
1269      </table>
1270
1271      <para>
1272       The truncation attribute values 1-3 perform the obvious way:
1273       <screen>
1274        Z> scan @attr 1=Body-of-text  schnittke
1275        ...
1276        * schnittke (81)
1277        schnittkes (31)
1278        schnittstelle (1)
1279        ...
1280        Z> find @attr 1=Body-of-text  @attr 5=1 schnittke
1281        ...
1282        Number of hits: 95, setno 7
1283        ...
1284        Z> find @attr 1=Body-of-text  @attr 5=2 schnittke
1285        ...
1286        Number of hits: 81, setno 6
1287        ...
1288        Z> find @attr 1=Body-of-text  @attr 5=3 schnittke
1289        ...
1290        Number of hits: 95, setno 8
1291       </screen>
1292      </para>
1293
1294      <para>
1295       The truncation attribute value
1296       <literal>Process # in search term (101)</literal> is a
1297       poor-man's regular expression search. It maps
1298       each <literal>#</literal> to <literal>.*</literal>, and
1299       performs then a <literal>Regexp-1 (102)</literal> regular
1300       expression search. The following two queries are equivalent:
1301       <screen>
1302        Z> find @attr 1=Body-of-text  @attr 5=101 schnit#ke
1303        Z> find @attr 1=Body-of-text  @attr 5=102 schnit.*ke
1304        ...
1305        Number of hits: 89, setno 10
1306       </screen>
1307      </para>
1308
1309      <para>
1310       The truncation attribute value
1311       <literal>Regexp-1 (102)</literal> is a normal regular search,
1312       see <xref linkend="querymodel-regular"/> for details.
1313       <screen>
1314        Z> find @attr 1=Body-of-text  @attr 5=102 schnit+ke
1315        Z> find @attr 1=Body-of-text  @attr 5=102 schni[a-t]+ke
1316       </screen>
1317      </para>
1318
1319      <para>
1320       The truncation attribute value
1321       <literal>Regexp-2 (103) </literal> is a &zebra; specific extension
1322       which allows <emphasis>fuzzy</emphasis> matches. One single
1323       error in spelling of search terms is allowed, i.e., a document
1324       is hit if it includes a term which can be mapped to the used
1325       search term by one character substitution, addition, deletion or
1326       change of position.
1327       <screen>
1328        Z> find @attr 1=Body-of-text  @attr 5=100 schnittke
1329        ...
1330        Number of hits: 81, setno 14
1331        ...
1332        Z> find @attr 1=Body-of-text  @attr 5=103 schnittke
1333        ...
1334        Number of hits: 103, setno 15
1335        ...
1336       </screen>
1337      </para>
1338     </section>
1339
1340     <section id="querymodel-bib1-completeness">
1341      <title>Completeness Attributes (type = 6)</title>
1342
1343
1344      <para>
1345       The <literal>Completeness Attributes (type = 6)</literal>
1346       is used to specify that a given search term or term list is  either
1347       part of the terms of a given index/field
1348       (<literal>Incomplete subfield (1)</literal>), or is
1349       what literally is found in the entire field's index
1350       (<literal>Complete field (3)</literal>).
1351      </para>
1352
1353      <table id="querymodel-bib1-completeness-table" frame="top">
1354       <title>Completeness Attributes (type = 6)</title>
1355       <tgroup cols="3">
1356        <thead>
1357         <row>
1358          <entry>Completeness</entry>
1359          <entry>Value</entry>
1360          <entry>Notes</entry>
1361         </row>
1362        </thead>
1363        <tbody>
1364         <row>
1365          <entry>Incomplete subfield</entry>
1366          <entry>1</entry>
1367          <entry>default</entry>
1368         </row>
1369         <row>
1370          <entry>Complete subfield</entry>
1371          <entry>2</entry>
1372          <entry>deprecated</entry>
1373         </row>
1374         <row>
1375          <entry>Complete field</entry>
1376          <entry>3</entry>
1377          <entry>supported</entry>
1378         </row>
1379        </tbody>
1380       </tgroup>
1381      </table>
1382
1383      <para>
1384       The <literal>Completeness Attributes (type = 6)</literal>
1385       is only partially and conditionally
1386       supported in the sense that it is ignored if the hit index is
1387       not of structure <literal>type="w"</literal> or
1388       <literal>type="p"</literal>.
1389      </para>
1390      <para>
1391       <literal>Incomplete subfield (1)</literal> is the default, and
1392       makes &zebra; use
1393       register <literal>type="w"</literal>, whereas
1394       <literal>Complete field (3)</literal> triggers
1395       search and scan in index <literal>type="p"</literal>.
1396      </para>
1397      <para>
1398       The <literal>Complete subfield (2)</literal> is a reminiscent
1399       from the  happy &acro.marc;
1400       binary format days. &zebra; does not support it, but maps silently
1401       to <literal>Complete field (3)</literal>.
1402      </para>
1403
1404      <note>
1405       <para>
1406        The exact mapping between &acro.pqf; queries and &zebra; internal indexes
1407        and index types is explained in
1408        <xref linkend="querymodel-pqf-apt-mapping"/>.
1409       </para>
1410      </note>
1411     </section>
1412
1413    </section>
1414
1415   </section>
1416
1417   <section id="querymodel-zebra">
1418    <title>Extended &zebra; &acro.rpn; Features</title>
1419    <para>
1420     The &zebra; internal query engine has been extended to specific needs
1421     not covered by the <literal>bib-1</literal> attribute set query
1422     model. These extensions are <emphasis>non-standard</emphasis>
1423     and <emphasis>non-portable</emphasis>: most functional extensions
1424     are modeled over the <literal>bib-1</literal> attribute set,
1425     defining type 7 and higher values.
1426     There are also the special
1427     <literal>string</literal> type index names for the
1428     <literal>idxpath</literal> attribute set.
1429    </para>
1430
1431    <section id="querymodel-zebra-attr-allrecords">
1432     <title>&zebra; specific retrieval of all records</title>
1433     <para>
1434      &zebra; defines a hardwired <literal>string</literal> index name
1435      called <literal>_ALLRECORDS</literal>. It matches any record
1436      contained in the database, if used in conjunction with
1437      the relation attribute
1438      <literal>AlwaysMatches (103)</literal>.
1439     </para>
1440     <para>
1441      The <literal>_ALLRECORDS</literal> index name is used for total database
1442      export. The search term is ignored, it may be empty.
1443      <screen>
1444       Z> find @attr 1=_ALLRECORDS @attr 2=103 ""
1445      </screen>
1446     </para>
1447     <para>
1448      Combination with other index types can be made. For example, to
1449      find all records which are <emphasis>not</emphasis> indexed in
1450      the <literal>Title</literal> register, issue one of the two
1451      equivalent queries:
1452      <screen>
1453       Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=Title @attr 2=103 ""
1454       Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=4 @attr 2=103 ""
1455      </screen>
1456     </para>
1457     <warning>
1458      <para>
1459       The special string index <literal>_ALLRECORDS</literal> is
1460       experimental, and the provided functionality and syntax may very
1461       well change in future releases of &zebra;.
1462      </para>
1463     </warning>
1464    </section>
1465
1466    <section id="querymodel-zebra-attr-search">
1467     <title>&zebra; specific Search Extensions to all Attribute Sets</title>
1468     <para>
1469      &zebra; extends the &acro.bib1; attribute types, and these extensions are
1470      recognized regardless of attribute
1471      set used in a <literal>search</literal> operation query.
1472     </para>
1473
1474     <table id="querymodel-zebra-attr-search-table" frame="top">
1475      <title>&zebra; Search Attribute Extensions</title>
1476      <tgroup cols="4">
1477       <thead>
1478        <row>
1479         <entry>Name</entry>
1480         <entry>Value</entry>
1481         <entry>Operation</entry>
1482         <entry>&zebra; version</entry>
1483        </row>
1484       </thead>
1485       <tbody>
1486        <row>
1487         <entry>Embedded Sort</entry>
1488         <entry>7</entry>
1489         <entry>search</entry>
1490         <entry>1.1</entry>
1491        </row>
1492        <row>
1493         <entry>Term Set</entry>
1494         <entry>8</entry>
1495         <entry>search</entry>
1496         <entry>1.1</entry>
1497        </row>
1498        <row>
1499         <entry>Rank Weight</entry>
1500         <entry>9</entry>
1501         <entry>search</entry>
1502         <entry>1.1</entry>
1503        </row>
1504        <row>
1505         <entry>Term Reference</entry>
1506         <entry>10</entry>
1507         <entry>search</entry>
1508         <entry>1.4</entry>
1509        </row>
1510        <row>
1511         <entry>Local Approx Limit</entry>
1512         <entry>11</entry>
1513         <entry>search</entry>
1514         <entry>1.4</entry>
1515        </row>
1516        <row>
1517         <entry>Global Approx Limit</entry>
1518         <entry>12</entry>
1519         <entry>search</entry>
1520         <entry>2.0.8</entry>
1521        </row>
1522        <row>
1523         <entry>Maximum number of truncated terms (truncmax)</entry>
1524         <entry>13</entry>
1525         <entry>search</entry>
1526         <entry>2.0.10</entry>
1527        </row>
1528        <row>
1529         <entry>
1530          Specifies whether un-indexed fields should be ignored.
1531          A zero value (default) throws a diagnostic when an un-indexed
1532          field is specified. A non-zero value makes it return 0 hits.
1533         </entry>
1534         <entry>14</entry>
1535         <entry>search</entry>
1536         <entry>2.0.16</entry>
1537        </row>
1538       </tbody>
1539      </tgroup>
1540     </table>
1541
1542     <section id="querymodel-zebra-attr-sorting">
1543      <title>&zebra; Extension Embedded Sort Attribute (type 7)</title>
1544      <para>
1545       The embedded sort is a way to specify sort within a query - thus
1546       removing the need to send a Sort Request separately. It is both
1547       faster and does not require clients to deal with the Sort
1548       Facility.
1549      </para>
1550
1551      <para>
1552       All ordering operations are based on a lexicographical ordering,
1553       <emphasis>except</emphasis> when the
1554       <literal>structure attribute numeric (109)</literal> is used. In
1555       this case, ordering is numerical. See
1556       <xref linkend="querymodel-bib1-structure"/>.
1557      </para>
1558
1559      <para>
1560       The possible values after attribute <literal>type 7</literal> are
1561       <literal>1</literal> ascending and
1562       <literal>2</literal> descending.
1563       The attributes+term (&acro.apt;) node is separate from the
1564       rest and must be <literal>@or</literal>'ed.
1565       The term associated with &acro.apt; is the sorting level in integers,
1566       where <literal>0</literal> means primary sort,
1567       <literal>1</literal> means secondary sort, and so forth.
1568       See also <xref linkend="administration-ranking"/>.
1569      </para>
1570      <para>
1571       For example, searching for water, sort by title (ascending)
1572       <screen>
1573        Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
1574       </screen>
1575      </para>
1576      <para>
1577       Or, searching for water, sort by title ascending, then date descending
1578       <screen>
1579        Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
1580       </screen>
1581      </para>
1582     </section>
1583
1584     <!--
1585     &zebra; Extension Term Set Attribute
1586     From the manual text, I can not see what is the point with this feature.
1587     I think it makes more sense when there are multiple terms in a query, or
1588     something...
1589
1590     We decided 2006-06-03 to disable this feature, as it is covered by
1591     scan within a resultset. Better use ressources to upgrade this
1592     feature for good performance.
1593     -->
1594
1595     <!--
1596     <section id="querymodel-zebra-attr-estimation">
1597     <title>&zebra; Extension Term Set Attribute (type 8)</title>
1598     <para>
1599     The Term Set feature is a facility that allows a search to store
1600     hitting terms in a "pseudo" resultset; thus a search (as usual) +
1601     a scan-like facility. Requires a client that can do named result
1602     sets since the search generates two result sets. The value for
1603     attribute 8 is the name of a result set (string). The terms in
1604     the named term set are returned as &acro.sutrs; records.
1605    </para>
1606     <para>
1607     For example, searching  for u in title, right truncated, and
1608     storing the result in term set named 'aset'
1609     <screen>
1610     Z> find @attr 5=1 @attr 1=4 @attr 8=aset u
1611    </screen>
1612    </para>
1613     <warning>
1614     The model has one serious flaw: we don't know the size of term
1615     set. Experimental. Do not use in production code.
1616    </warning>
1617    </section>
1618     -->
1619
1620
1621     <section id="querymodel-zebra-attr-weight">
1622      <title>&zebra; Extension Rank Weight Attribute (type 9)</title>
1623      <para>
1624       Rank weight is a way to pass a value to a ranking algorithm - so
1625       that one &acro.apt; has one value - while another as a different one.
1626       See also <xref linkend="administration-ranking"/>.
1627      </para>
1628      <para>
1629       For example, searching  for utah in title with weight 30 as well
1630       as any with weight 20:
1631       <screen>
1632        Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
1633       </screen>
1634      </para>
1635     </section>
1636
1637     <section id="querymodel-zebra-attr-termref">
1638      <title>&zebra; Extension Term Reference Attribute (type 10)</title>
1639      <para>
1640       &zebra; supports the searchResult-1 facility.
1641       If the Term Reference Attribute (type 10) is
1642       given, that specifies a subqueryId value returned as part of the
1643       search result. It is a way for a client to name an &acro.apt; part of a
1644       query.
1645      </para>
1646
1647      <warning>
1648       <para>
1649        Experimental. Do not use in production code.
1650       </para>
1651      </warning>
1652
1653     </section>
1654
1655
1656
1657     <section id="querymodel-zebra-local-attr-limit">
1658      <title>Local Approximative Limit Attribute (type 11)</title>
1659      <para>
1660       &zebra; computes - unless otherwise configured -
1661       the exact hit count for every &acro.apt;
1662       (leaf) in the query tree. These hit counts are returned as part of
1663       the searchResult-1 facility in the binary encoded &acro.z3950; search
1664       response packages.
1665      </para>
1666      <para>
1667       By setting an estimation limit size of the resultset of the &acro.apt;
1668       leaves, &zebra; stops processing the result set when the limit
1669       length is reached.
1670       Hit counts under this limit are still precise, but hit counts over it
1671       are estimated using the statistics gathered from the chopped
1672       result set.
1673      </para>
1674      <para>
1675       Specifying a limit of <literal>0</literal> results in exact hit counts.
1676      </para>
1677      <para>
1678       For example, we might be interested in exact hit count for a, but
1679       for b we allow hit count estimates for 1000 and higher.
1680       <screen>
1681        Z> find @and a @attr 11=1000 b
1682       </screen>
1683      </para>
1684      <note>
1685       <para>
1686        The estimated hit count facility makes searches faster, as one
1687        only needs to process large hit lists partially.
1688        It is mostly used in huge databases, where you you want trade
1689        exactness of hit counts against speed of execution.
1690       </para>
1691      </note>
1692      <warning>
1693       <para>
1694        Do not use approximative hit count limits
1695        in conjunction with relevance ranking, as re-sorting of the
1696        result set only works when the entire result set has
1697        been processed.
1698       </para>
1699      </warning>
1700     </section>
1701
1702     <section id="querymodel-zebra-global-attr-limit">
1703      <title>Global Approximative Limit Attribute (type 12)</title>
1704      <para>
1705       By default &zebra; computes precise hit counts for a query as
1706       a whole. Setting attribute 12 makes it perform approximative
1707       hit counts instead. It has the same semantics as
1708       <literal>estimatehits</literal> for the <xref linkend="zebra-cfg"/>.
1709      </para>
1710      <para>
1711       The attribute (12) can occur anywhere in the query tree.
1712       Unlike regular attributes it does not relate to the leaf (&acro.apt;)
1713       - but to the whole query.
1714      </para>
1715      <warning>
1716       <para>
1717        Do not use approximative hit count limits
1718        in conjunction with relevance ranking, as re-sorting of the
1719        result set only works when the entire result set has
1720        been processed.
1721       </para>
1722      </warning>
1723     </section>
1724
1725    </section>
1726
1727    <section id="querymodel-zebra-attr-scan">
1728     <title>&zebra; specific Scan Extensions to all Attribute Sets</title>
1729     <para>
1730      &zebra; extends the Bib1 attribute types, and these extensions are
1731      recognized regardless of attribute
1732      set used in a scan operation query.
1733     </para>
1734     <table id="querymodel-zebra-attr-scan-table" frame="top">
1735      <title>&zebra; Scan Attribute Extensions</title>
1736      <tgroup cols="4">
1737       <thead>
1738        <row>
1739         <entry>Name</entry>
1740         <entry>Type</entry>
1741         <entry>Operation</entry>
1742         <entry>&zebra; version</entry>
1743        </row>
1744       </thead>
1745       <tbody>
1746        <row>
1747         <entry>Result Set Narrow</entry>
1748         <entry>8</entry>
1749         <entry>scan</entry>
1750         <entry>1.3</entry>
1751        </row>
1752        <row>
1753         <entry>Approximative Limit</entry>
1754         <entry>12</entry>
1755         <entry>scan</entry>
1756         <entry>2.0.20</entry>
1757        </row>
1758       </tbody>
1759      </tgroup>
1760     </table>
1761
1762     <section id="querymodel-zebra-attr-narrow">
1763      <title>&zebra; Extension Result Set Narrow (type 8)</title>
1764      <para>
1765       If attribute Result Set Narrow (type 8)
1766       is given for scan, the value is the name of a
1767       result set. Each hit count in scan is
1768       <literal>@and</literal>'ed with the result set given.
1769      </para>
1770      <para>
1771       Consider for example
1772       the case of scanning all title fields around the
1773       scanterm <emphasis>mozart</emphasis>, then refining the scan by
1774       issuing a filtering query for <emphasis>amadeus</emphasis> to
1775       restrict the scan to the result set of the query:
1776       <screen>
1777        Z> scan @attr 1=4 mozart
1778        ...
1779        * mozart (43)
1780        mozartforskningen (1)
1781        mozartiana (1)
1782        mozarts (16)
1783        ...
1784        Z> f @attr 1=4 amadeus
1785        ...
1786        Number of hits: 15, setno 2
1787        ...
1788        Z> scan @attr 1=4 @attr 8=2 mozart
1789        ...
1790        * mozart (14)
1791        mozartforskningen (0)
1792        mozartiana (0)
1793        mozarts (1)
1794        ...
1795       </screen>
1796      </para>
1797
1798      <para>
1799       &zebra; 2.0.2 and later is able to skip 0 hit counts. This, however,
1800       is known not to scale if the number of terms to skip is high.
1801       This most likely will happen if the result set is small (and
1802       result in many 0 hits).
1803      </para>
1804     </section>
1805
1806     <section id="querymodel-zebra-attr-approx">
1807      <title>&zebra; Extension Approximative Limit (type 12)</title>
1808      <para>
1809       The &zebra; Extension Approximative Limit (type 12) is a way to
1810       enable approximate hit counts for scan hit counts, in the same
1811       way as for search hit counts.
1812      </para>
1813     </section>
1814    </section>
1815
1816    <section id="querymodel-idxpath">
1817     <title>&zebra; special &acro.idxpath; Attribute Set for &acro.grs1; indexing</title>
1818     <para>
1819      The attribute-set <literal>idxpath</literal> consists of a single
1820      Use (type 1) attribute. All non-use attributes behave as normal.
1821     </para>
1822     <para>
1823      This feature is enabled when defining the
1824      <literal>xpath enable</literal> option in the &acro.grs1; filter
1825      <filename>*.abs</filename> configuration files. If one wants to use
1826      the special <literal>idxpath</literal> numeric attribute set, the
1827      main &zebra; configuration file <filename>zebra.cfg</filename>
1828      directive <literal>attset: idxpath.att</literal> must be enabled.
1829     </para>
1830     <warning>
1831      <para>
1832       The <literal>idxpath</literal> is deprecated, may not be
1833       supported in future &zebra; versions, and should definitely
1834       not be used in production code.
1835      </para>
1836     </warning>
1837
1838     <section id="querymodel-idxpath-use">
1839      <title>&acro.idxpath; Use Attributes (type = 1)</title>
1840      <para>
1841       This attribute set allows one to search &acro.grs1; filter indexed
1842       records by &acro.xpath; like structured index names.
1843      </para>
1844
1845      <warning>
1846       <para>
1847        The <literal>idxpath</literal> option defines hard-coded
1848        index names, which might clash with your own index names.
1849       </para>
1850      </warning>
1851
1852      <table id="querymodel-idxpath-use-table" frame="top">
1853       <title>&zebra; specific &acro.idxpath; Use Attributes (type 1)</title>
1854       <tgroup cols="4">
1855        <thead>
1856         <row>
1857          <entry>&acro.idxpath;</entry>
1858          <entry>Value</entry>
1859          <entry>String Index</entry>
1860          <entry>Notes</entry>
1861         </row>
1862        </thead>
1863        <tbody>
1864         <row>
1865          <entry>&acro.xpath; Begin</entry>
1866          <entry>1</entry>
1867          <entry>_XPATH_BEGIN</entry>
1868          <entry>deprecated</entry>
1869         </row>
1870         <row>
1871          <entry>&acro.xpath; End</entry>
1872          <entry>2</entry>
1873          <entry>_XPATH_END</entry>
1874          <entry>deprecated</entry>
1875         </row>
1876         <row>
1877          <entry>&acro.xpath; CData</entry>
1878          <entry>1016</entry>
1879          <entry>_XPATH_CDATA</entry>
1880          <entry>deprecated</entry>
1881         </row>
1882         <row>
1883          <entry>&acro.xpath; Attribute Name</entry>
1884          <entry>3</entry>
1885          <entry>_XPATH_ATTR_NAME</entry>
1886          <entry>deprecated</entry>
1887         </row>
1888         <row>
1889          <entry>&acro.xpath; Attribute CData</entry>
1890          <entry>1015</entry>
1891          <entry>_XPATH_ATTR_CDATA</entry>
1892          <entry>deprecated</entry>
1893         </row>
1894        </tbody>
1895       </tgroup>
1896      </table>
1897
1898      <para>
1899       See <filename>tab/idxpath.att</filename> for more information.
1900      </para>
1901      <para>
1902       Search for all documents starting with root element
1903       <literal>/root</literal> (either using the numeric or the string
1904       use attributes):
1905       <screen>
1906        Z> find @attrset idxpath @attr 1=1 @attr 4=3 root/
1907        Z> find @attr idxpath 1=1 @attr 4=3 root/
1908        Z> find @attr 1=_XPATH_BEGIN @attr 4=3 root/
1909       </screen>
1910      </para>
1911      <para>
1912       Search for all documents where specific nested &acro.xpath;
1913       <literal>/c1/c2/../cn</literal> exists. Notice the very
1914       counter-intuitive <emphasis>reverse</emphasis> notation!
1915       <screen>
1916        Z> find @attrset idxpath @attr 1=1 @attr 4=3 cn/cn-1/../c1/
1917        Z> find @attr 1=_XPATH_BEGIN @attr 4=3 cn/cn-1/../c1/
1918       </screen>
1919      </para>
1920      <para>
1921       Search for CDATA string <emphasis>text</emphasis> in any  element
1922       <screen>
1923        Z> find @attrset idxpath @attr 1=1016 text
1924        Z> find @attr 1=_XPATH_CDATA text
1925       </screen>
1926      </para>
1927      <para>
1928       Search for CDATA string <emphasis>anothertext</emphasis> in any
1929       attribute:
1930       <screen>
1931        Z> find @attrset idxpath @attr 1=1015 anothertext
1932        Z> find @attr 1=_XPATH_ATTR_CDATA anothertext
1933       </screen>
1934      </para>
1935      <para>
1936       Search for all documents with have an &acro.xml; element node
1937       including an &acro.xml;  attribute named <emphasis>creator</emphasis>
1938       <screen>
1939        Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator
1940        Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator
1941       </screen>
1942      </para>
1943      <para>
1944       Combining usual <literal>bib-1</literal> attribute set searches
1945       with <literal>idxpath</literal> attribute set searches:
1946       <screen>
1947        Z> find @and @attr idxpath 1=1 @attr 4=3 link/ @attr 1=4 mozart
1948        Z> find @and @attr 1=_XPATH_BEGIN @attr 4=3 link/ @attr 1=_XPATH_CDATA mozart
1949       </screen>
1950      </para>
1951      <para>
1952       Scanning is supported on all <literal>idxpath</literal>
1953       indexes, both specified as numeric use attributes, or as string
1954       index names.
1955       <screen>
1956        Z> scan  @attrset idxpath @attr 1=1016 text
1957        Z> scan  @attr 1=_XPATH_ATTR_CDATA anothertext
1958        Z> scan  @attrset idxpath @attr 1=3 @attr 4=3 ''
1959       </screen>
1960      </para>
1961
1962     </section>
1963    </section>
1964
1965
1966    <section id="querymodel-pqf-apt-mapping">
1967     <title>Mapping from &acro.pqf; atomic &acro.apt; queries to &zebra; internal
1968      register indexes</title>
1969     <para>
1970      The rules for &acro.pqf; &acro.apt; mapping are rather tricky to grasp in the
1971      first place. We deal first with the rules for deciding which
1972      internal register or string index to use, according to the use
1973      attribute or access point specified in the query. Thereafter we
1974      deal with the rules for determining the correct structure type of
1975      the named register.
1976     </para>
1977
1978     <section id="querymodel-pqf-apt-mapping-accesspoint">
1979      <title>Mapping of &acro.pqf; &acro.apt; access points</title>
1980      <para>
1981       &zebra; understands four fundamental different types of access
1982       points, of which only the
1983       <emphasis>numeric use attribute</emphasis> type access points
1984       are defined by the  <ulink url="&url.z39.50;">&acro.z3950;</ulink>
1985       standard.
1986       All other access point types are &zebra; specific, and non-portable.
1987      </para>
1988
1989      <table id="querymodel-zebra-mapping-accesspoint-types" frame="top">
1990       <title>Access point name mapping</title>
1991       <tgroup cols="4">
1992        <thead>
1993         <row>
1994          <entry>Access Point</entry>
1995          <entry>Type</entry>
1996          <entry>Grammar</entry>
1997          <entry>Notes</entry>
1998         </row>
1999        </thead>
2000        <tbody>
2001         <row>
2002          <entry>Use attribute</entry>
2003          <entry>numeric</entry>
2004          <entry>[1-9][1-9]*</entry>
2005          <entry>directly mapped to string index name</entry>
2006         </row>
2007         <row>
2008          <entry>String index name</entry>
2009          <entry>string</entry>
2010          <entry>[a-zA-Z](\-?[a-zA-Z0-9])*</entry>
2011          <entry>normalized name is used as internal string index name</entry>
2012         </row>
2013         <row>
2014          <entry>&zebra; internal index name</entry>
2015          <entry>zebra</entry>
2016          <entry>_[a-zA-Z](_?[a-zA-Z0-9])*</entry>
2017          <entry>hardwired internal string index name</entry>
2018         </row>
2019         <row>
2020          <entry>&acro.xpath; special index</entry>
2021          <entry>XPath</entry>
2022          <entry>/.*</entry>
2023          <entry>special xpath search for &acro.grs1; indexed records</entry>
2024         </row>
2025        </tbody>
2026       </tgroup>
2027      </table>
2028
2029      <para>
2030       <literal>Attribute set names</literal> and
2031       <literal>string index names</literal> are normalizes
2032       according to the following rules: all <emphasis>single</emphasis>
2033       hyphens <literal>'-'</literal> are stripped, and all upper case
2034       letters are folded to lower case.
2035      </para>
2036
2037      <para>
2038       <emphasis>Numeric use attributes</emphasis> are mapped
2039       to the &zebra; internal
2040       string index according to the attribute set definition in use.
2041       The default attribute set is &acro.bib1;, and may be
2042       omitted in the &acro.pqf; query.
2043      </para>
2044
2045      <para>
2046       According to normalization and numeric
2047       use attribute mapping, it follows that the following
2048       &acro.pqf; queries are considered equivalent (assuming the default
2049       configuration has not been altered):
2050       <screen>
2051        Z> find  @attr 1=Body-of-text serenade
2052        Z> find  @attr 1=bodyoftext serenade
2053        Z> find  @attr 1=BodyOfText serenade
2054        Z> find  @attr 1=bO-d-Y-of-tE-x-t serenade
2055        Z> find  @attr 1=1010 serenade
2056        Z> find  @attrset bib1 @attr 1=1010 serenade
2057        Z> find  @attrset bib1 @attr 1=1010 serenade
2058        Z> find  @attrset Bib1 @attr 1=1010 serenade
2059        Z> find  @attrset b-I-b-1 @attr 1=1010 serenade
2060       </screen>
2061      </para>
2062
2063      <para>
2064       The <emphasis>numerical</emphasis>
2065       <literal>use attributes (type 1)</literal>
2066       are interpreted according to the
2067       attribute sets which have been loaded in the
2068       <literal>zebra.cfg</literal> file, and are matched against specific
2069       fields as specified in the <literal>.abs</literal> file which
2070       describes the profile of the records which have been loaded.
2071       If no use attribute is provided, a default of
2072       &acro.bib1; Use Any (1016) is assumed.
2073       The predefined use attribute sets
2074       can be reconfigured by  tweaking the configuration files
2075       <filename>tab/*.att</filename>, and
2076       new attribute sets can be defined by adding similar files in the
2077       configuration path <literal>profilePath</literal> of the server.
2078      </para>
2079
2080      <para>
2081       String indexes can be accessed directly,
2082       independently which attribute set is in use. These are just
2083       ignored. The above mentioned name normalization applies.
2084       String index names are defined in the
2085       used indexing  filter configuration files, for example in the
2086       &acro.grs1;
2087       <filename>*.abs</filename> configuration files, or in the
2088       <literal>alvis</literal> filter &acro.xslt; indexing stylesheets.
2089      </para>
2090
2091      <para>
2092       &zebra; internal indexes can be accessed directly,
2093       according to the same rules as the user defined
2094       string indexes. The only difference is that
2095       &zebra; internal index names are hardwired,
2096       all uppercase and
2097       must start with the character <literal>'_'</literal>.
2098      </para>
2099
2100      <para>
2101       Finally, &acro.xpath; access points are only
2102       available using the &acro.grs1; filter for indexing.
2103       These access point names must start with the character
2104       <literal>'/'</literal>, they are <emphasis>not
2105        normalized</emphasis>, but passed unaltered to the &zebra; internal
2106       &acro.xpath; engine. See <xref linkend="querymodel-use-xpath"/>.
2107
2108      </para>
2109
2110
2111     </section>
2112
2113
2114     <section id="querymodel-pqf-apt-mapping-structuretype">
2115      <title>Mapping of &acro.pqf; &acro.apt; structure and completeness to
2116       register type</title>
2117      <para>
2118       Internally &zebra; has in its default configuration several
2119       different types of registers or indexes, whose tokenization and
2120       character normalization rules differ. This reflects the fact that
2121       searching fundamental different tokens like dates, numbers,
2122       bitfields and string based text needs different rule sets.
2123      </para>
2124
2125      <table id="querymodel-zebra-mapping-structure-types" frame="top">
2126       <title>Structure and completeness mapping to register types</title>
2127       <tgroup cols="4">
2128        <thead>
2129         <row>
2130          <entry>Structure</entry>
2131          <entry>Completeness</entry>
2132          <entry>Register type</entry>
2133          <entry>Notes</entry>
2134         </row>
2135        </thead>
2136        <tbody>
2137         <row>
2138          <entry>
2139           phrase (@attr 4=1), word (@attr 4=2),
2140           word-list (@attr 4=6),
2141           free-form-text  (@attr 4=105), or document-text (@attr 4=106)
2142          </entry>
2143          <entry>Incomplete field (@attr 6=1)</entry>
2144          <entry>Word ('w')</entry>
2145          <entry>Traditional tokenized and character normalized word index</entry>
2146         </row>
2147         <row>
2148          <entry>
2149           phrase (@attr 4=1), word (@attr 4=2),
2150           word-list (@attr 4=6),
2151           free-form-text  (@attr 4=105), or document-text (@attr 4=106)
2152          </entry>
2153          <entry>complete field' (@attr 6=3)</entry>
2154          <entry>Phrase ('p')</entry>
2155          <entry>Character normalized, but not tokenized index for phrase
2156           matches
2157          </entry>
2158         </row>
2159         <row>
2160          <entry>urx (@attr 4=104)</entry>
2161          <entry>ignored</entry>
2162          <entry>URX/URL ('u')</entry>
2163          <entry>Special index for URL web addresses</entry>
2164         </row>
2165         <row>
2166          <entry>numeric (@attr 4=109)</entry>
2167          <entry>ignored</entry>
2168          <entry>Numeric ('n')</entry>
2169          <entry>Special index for digital numbers</entry>
2170         </row>
2171         <row>
2172          <entry>key (@attr 4=3)</entry>
2173          <entry>ignored</entry>
2174          <entry>Null bitmap ('0')</entry>
2175          <entry>Used for non-tokenized and non-normalized bit sequences</entry>
2176         </row>
2177         <row>
2178          <entry>year (@attr 4=4)</entry>
2179          <entry>ignored</entry>
2180          <entry>Year ('y')</entry>
2181          <entry>Non-tokenized and non-normalized 4 digit numbers</entry>
2182         </row>
2183         <row>
2184          <entry>date (@attr 4=5)</entry>
2185          <entry>ignored</entry>
2186          <entry>Date ('d')</entry>
2187          <entry>Non-tokenized and non-normalized ISO date strings</entry>
2188         </row>
2189         <row>
2190          <entry>ignored</entry>
2191          <entry>ignored</entry>
2192          <entry>Sort ('s')</entry>
2193          <entry>Used with special sort attribute set (@attr 7=1, @attr 7=2)</entry>
2194         </row>
2195         <row>
2196          <entry>overruled</entry>
2197          <entry>overruled</entry>
2198          <entry>special</entry>
2199          <entry>Internal record ID register, used whenever
2200           Relation Always Matches (@attr 2=103) is specified</entry>
2201         </row>
2202        </tbody>
2203       </tgroup>
2204      </table>
2205
2206      <!-- see in util/zebramap.c -->
2207
2208      <para>
2209       If a <emphasis>Structure</emphasis> attribute of
2210       <emphasis>Phrase</emphasis> is used in conjunction with a
2211       <emphasis>Completeness</emphasis> attribute of
2212       <emphasis>Complete (Sub)field</emphasis>, the term is matched
2213       against the contents of the phrase (long word) register, if one
2214       exists for the given <emphasis>Use</emphasis> attribute.
2215       A phrase register is created for those fields in the
2216       &acro.grs1; <filename>*.abs</filename> file that contains a
2217       <literal>p</literal>-specifier.
2218       <screen>
2219        Z> scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven
2220        ...
2221        bayreuther festspiele (1)
2222        * beethoven bibliography database (1)
2223        benny carter (1)
2224        ...
2225        Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography"
2226        ...
2227        Number of hits: 0, setno 5
2228        ...
2229        Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography database"
2230        ...
2231        Number of hits: 1, setno 6
2232       </screen>
2233      </para>
2234
2235      <para>
2236       If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
2237       used in conjunction with <emphasis>Incomplete Field</emphasis> - the
2238       default value for <emphasis>Completeness</emphasis>, the
2239       search is directed against the normal word registers, but if the term
2240       contains multiple words, the term will only match if all of the words
2241       are found immediately adjacent, and in the given order.
2242       The word search is performed on those fields that are indexed as
2243       type <literal>w</literal> in the &acro.grs1; <filename>*.abs</filename> file.
2244       <screen>
2245        Z> scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven
2246        ...
2247        beefheart (1)
2248        * beethoven (18)
2249        beethovens (7)
2250        ...
2251        Z> find @attr 1=Title @attr 4=1 @attr 6=1 beethoven
2252        ...
2253        Number of hits: 18, setno 1
2254        ...
2255        Z> find @attr 1=Title @attr 4=1 @attr 6=1 "beethoven  bibliography"
2256        ...
2257        Number of hits: 2, setno 2
2258        ...
2259       </screen>
2260      </para>
2261
2262      <para>
2263       If the <emphasis>Structure</emphasis> attribute is
2264       <emphasis>Word List</emphasis>,
2265       <emphasis>Free-form Text</emphasis>, or
2266       <emphasis>Document Text</emphasis>, the term is treated as a
2267       natural-language, relevance-ranked query.
2268       This search type uses the word register, i.e. those fields
2269       that are indexed as type <literal>w</literal> in the
2270       &acro.grs1; <filename>*.abs</filename> file.
2271      </para>
2272
2273      <para>
2274       If the <emphasis>Structure</emphasis> attribute is
2275       <emphasis>Numeric String</emphasis> the term is treated as an integer.
2276       The search is performed on those fields that are indexed
2277       as type <literal>n</literal> in the &acro.grs1;
2278       <filename>*.abs</filename> file.
2279      </para>
2280
2281      <para>
2282       If the <emphasis>Structure</emphasis> attribute is
2283       <emphasis>URX</emphasis> the term is treated as a URX (URL) entity.
2284       The search is performed on those fields that are indexed as type
2285       <literal>u</literal> in the <filename>*.abs</filename> file.
2286      </para>
2287
2288      <para>
2289       If the <emphasis>Structure</emphasis> attribute is
2290       <emphasis>Local Number</emphasis> the term is treated as
2291       native &zebra; Record Identifier.
2292      </para>
2293
2294      <para>
2295       If the <emphasis>Relation</emphasis> attribute is
2296       <emphasis>Equals</emphasis> (default), the term is matched
2297       in a normal fashion (modulo truncation and processing of
2298       individual words, if required).
2299       If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
2300       <emphasis>Less Than or Equal</emphasis>,
2301       <emphasis>Greater than</emphasis>, or <emphasis>Greater than or
2302        Equal</emphasis>, the term is assumed to be numerical, and a
2303       standard regular expression is constructed to match the given
2304       expression.
2305       If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
2306       the standard natural-language query processor is invoked.
2307      </para>
2308
2309      <para>
2310       For the <emphasis>Truncation</emphasis> attribute,
2311       <emphasis>No Truncation</emphasis> is the default.
2312       <emphasis>Left Truncation</emphasis> is not supported.
2313       <emphasis>Process # in search term</emphasis> is supported, as is
2314       <emphasis>Regxp-1</emphasis>.
2315       <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
2316       search. As a default, a single error (deletion, insertion,
2317       replacement) is accepted when terms are matched against the register
2318       contents.
2319      </para>
2320
2321     </section>
2322    </section>
2323
2324    <section  id="querymodel-regular">
2325     <title>&zebra; Regular Expressions in Truncation Attribute (type = 5)</title>
2326
2327     <para>
2328      Each term in a query is interpreted as a regular expression if
2329      the truncation value is either <emphasis>Regxp-1 (@attr 5=102)</emphasis>
2330      or <emphasis>Regxp-2 (@attr 5=103)</emphasis>.
2331      Both query types follow the same syntax with the operands:
2332     </para>
2333
2334     <table id="querymodel-regular-operands-table" frame="top">
2335      <title>Regular Expression Operands</title>
2336      <tgroup cols="2">
2337       <tbody>
2338        <row>
2339         <entry><literal>x</literal></entry>
2340         <entry>Matches the character <literal>x</literal>.</entry>
2341        </row>
2342        <row>
2343         <entry><literal>.</literal></entry>
2344         <entry>Matches any character.</entry>
2345        </row>
2346        <row>
2347         <entry><literal>[ .. ]</literal></entry>
2348         <entry>Matches the set of characters specified;
2349          such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</entry>
2350        </row>
2351       </tbody>
2352      </tgroup>
2353     </table>
2354
2355     <para>
2356      The above operands can be combined with the following operators:
2357     </para>
2358
2359     <table id="querymodel-regular-operators-table" frame="top">
2360      <title>Regular Expression Operators</title>
2361      <tgroup cols="2">
2362       <tbody>
2363        <row>
2364         <entry><literal>x*</literal></entry>
2365         <entry>Matches <literal>x</literal> zero or more times.
2366          Priority: high.</entry>
2367        </row>
2368        <row>
2369         <entry><literal>x+</literal></entry>
2370         <entry>Matches <literal>x</literal> one or more times.
2371          Priority: high.</entry>
2372        </row>
2373        <row>
2374         <entry><literal>x?</literal></entry>
2375         <entry> Matches <literal>x</literal> zero or once.
2376          Priority: high.</entry>
2377        </row>
2378        <row>
2379         <entry><literal>xy</literal></entry>
2380         <entry> Matches <literal>x</literal>, then <literal>y</literal>.
2381          Priority: medium.</entry>
2382        </row>
2383        <row>
2384         <entry><literal>x|y</literal></entry>
2385         <entry> Matches either <literal>x</literal> or <literal>y</literal>.
2386          Priority: low.</entry>
2387        </row>
2388        <row>
2389         <entry><literal>( )</literal></entry>
2390         <entry>The order of evaluation may be changed by using parentheses.</entry>
2391        </row>
2392       </tbody>
2393      </tgroup>
2394     </table>
2395
2396     <para>
2397      If the first character of the <literal>Regxp-2</literal> query
2398      is a plus character (<literal>+</literal>) it marks the
2399      beginning of a section with non-standard specifiers.
2400      The next plus character marks the end of the section.
2401      Currently &zebra; only supports one specifier, the error tolerance,
2402      which consists one digit.
2403      <!-- TODO Nice thing, but what does
2404      that error tolerance digit *mean*? Maybe an example would be nice? -->
2405     </para>
2406
2407     <para>
2408      Since the plus operator is normally a suffix operator the addition to
2409      the query syntax doesn't violate the syntax for standard regular
2410      expressions.
2411     </para>
2412
2413     <para>
2414      For example, a phrase search with regular expressions  in
2415      the title-register is performed like this:
2416      <screen>
2417       Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
2418      </screen>
2419     </para>
2420
2421     <para>
2422      Combinations with other attributes are possible. For example, a
2423      ranked search with a regular expression:
2424      <screen>
2425       Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
2426      </screen>
2427     </para>
2428    </section>
2429
2430
2431    <!--
2432    <para>
2433    The RecordType parameter in the <literal>zebra.cfg</literal> file, or
2434    the <literal>-t</literal> option to the indexer tells &zebra; how to
2435    process input records.
2436    Two basic types of processing are available - raw text and structured
2437    data. Raw text is just that, and it is selected by providing the
2438    argument <literal>text</literal> to &zebra;. Structured records are
2439    all handled internally using the basic mechanisms described in the
2440    subsequent sections.
2441    &zebra; can read structured records in many different formats.
2442   </para>
2443    -->
2444   </section>
2445
2446
2447   <section id="querymodel-cql-to-pqf">
2448    <title>Server Side &acro.cql; to &acro.pqf; Query Translation</title>
2449    <para>
2450     Using the
2451     <literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
2452     &yaz; Frontend Virtual
2453     Hosts option, one can configure
2454     the &yaz; Frontend &acro.cql;-to-&acro.pqf;
2455     converter, specifying the interpretation of various
2456     <ulink url="&url.cql;">&acro.cql;</ulink>
2457     indexes, relations, etc. in terms of Type-1 query attributes.
2458     <!-- The  yaz-client config file -->
2459    </para>
2460    <para>
2461     For example, using server-side &acro.cql;-to-&acro.pqf; conversion, one might
2462     query a zebra server like this:
2463     <screen>
2464      <![CDATA[
2465      yaz-client localhost:9999
2466      Z> querytype cql
2467      Z> find text=(plant and soil)
2468      ]]>
2469     </screen>
2470     and - if properly configured - even static relevance ranking can
2471     be performed using &acro.cql; query syntax:
2472     <screen>
2473      <![CDATA[
2474      Z> find text = /relevant (plant and soil)
2475      ]]>
2476     </screen>
2477    </para>
2478
2479    <para>
2480     By the way, the same configuration can be used to
2481     search using client-side &acro.cql;-to-&acro.pqf; conversion:
2482     (the only difference is <literal>querytype cql2rpn</literal>
2483     instead of
2484     <literal>querytype cql</literal>, and the call specifying a local
2485     conversion file)
2486     <screen>
2487      <![CDATA[
2488      yaz-client -q local/cql2pqf.txt localhost:9999
2489      Z> querytype cql2rpn
2490      Z> find text=(plant and soil)
2491      ]]>
2492     </screen>
2493    </para>
2494
2495    <para>
2496     Exhaustive information can be found in the
2497     Section <ulink url="&url.yaz.cql2pqf;">&acro.cql; to &acro.rpn; conversion</ulink>
2498     in the &yaz; manual.
2499    </para>
2500    <!--
2501    <para>
2502    See
2503    <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html"/>
2504    for the Maintenance Agency's work-in-progress mapping of Dublin Core
2505    indexes to Attribute Architecture (util, XD and BIB-2)
2506    attributes.
2507   </para>
2508    -->
2509   </section>
2510
2511  </chapter>
2512
2513  <!-- Keep this comment at the end of the file
2514  Local variables:
2515  mode: sgml
2516  sgml-omittag:t
2517  sgml-shorttag:t
2518  sgml-minimize-attributes:nil
2519  sgml-always-quote-attributes:t
2520  sgml-indent-step:1
2521  sgml-indent-data:t
2522  sgml-parent-document: "idzebra.xml"
2523  sgml-local-catalogs: nil
2524  sgml-namecase-general:t
2525  End:
2526  -->