<chapter id="administration">
- <!-- $Id: administration.xml,v 1.8 2002-10-11 09:05:09 adam Exp $ -->
+ <!-- $Id: administration.xml,v 1.23 2006-02-15 12:08:47 marc Exp $ -->
<title>Administrating Zebra</title>
-
+ <!-- ### It's a bit daft that this chapter (which describes half of
+ the configuration-file formats) is separated from
+ "recordmodel-grs.xml" (which describes the other half) by the
+ instructions on running zebraidx and zebrasrv. Some careful
+ re-ordering is required here.
+ -->
+
<para>
Unlike many simpler retrieval systems, Zebra supports safe, incremental
updates to an existing index.
<para>
You can edit the configuration file with a normal text editor.
parameter names and values are separated by colons in the file. Lines
- starting with a hash sign (<literal>#</literal>) are
+ starting with a hash sign (<literal>#</literal>) are
treated as comments.
</para>
<varlistentry>
<term>
<emphasis>group</emphasis>
- .recordType[<emphasis>.name</emphasis>]:
+ .recordType[<emphasis>.name</emphasis>]:
<replaceable>type</replaceable>
</term>
<listitem>
group of records. If you plan to update/delete this type of
records later this should be specified as 1; otherwise it
should be 0 (default), to save register space.
+ <!-- ### this is the first mention of "register" -->
See <xref linkend="file-ids"/>.
</para>
</listitem>
</listitem>
</varlistentry>
<varlistentry>
+ <!-- ### probably a better place to define "register" -->
<term>register: <replaceable>register-location</replaceable></term>
<listitem>
<para>
<term>keyTmpDir: <replaceable>directory</replaceable></term>
<listitem>
<para>
- Directory in which temporary files used during zebraidx' update
+ Directory in which temporary files used during zebraidx's update
phase are stored.
</para>
</listitem>
</listitem>
</varlistentry>
<varlistentry>
- <term>profilePath: <literal>path</literal></term>
+ <term>profilePath: <replaceable>path</replaceable></term>
<listitem>
<para>
Specifies a path of profile specification files.
Specifies <replaceable>size</replaceable> of internal memory
to use for the zebraidx program.
The amount is given in megabytes - default is 4 (4 MB).
+ The more memory, the faster large updates happen, up to about
+ half the free memory available on the computer.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>tempfiles: <replaceable>Yes/Auto/No</replaceable></term>
+ <listitem>
+ <para>
+ Tells Zebra whether it should use temporary files when indexing. The
+ default is Auto, in which case Zebra uses temporary files only
+ if it would need more than <replaceable>memMax</replaceable>
+ megabytes of memory. This should be good for most uses.
</para>
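+ <para>
+ For example, the following fragment lets <literal>zebraidx</literal>
+ use up to 64 MB of memory. The value is illustrative only. Because
+ <literal>tempfiles</literal> is left at its default of Auto, temporary
+ files are used only for updates that need more memory than that:
+ </para>
+ <screen>
+ # illustrative value: allow zebraidx up to 64 MB of memory
+ memMax: 64
+ </screen>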
</listitem>
</varlistentry>
<para>
Specifies a directory base for Zebra. All relative paths
given (in profilePath, register, shadow) are based on this
- directory. This setting is useful if if you Zebra server
+ directory. This setting is useful if your Zebra server
is running in a different directory from where
<literal>zebra.cfg</literal> is located.
</para>
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>passwd: <replaceable>file</replaceable></term>
+ <listitem>
+ <para>
+ Specifies a file with a description of user accounts for Zebra.
+ The format is similar to that of Apache's htpasswd files
+ and UNIX passwd files. Non-empty lines not beginning with
+ <literal>#</literal> are considered account lines. There is one account per line.
+ A line consists of fields separated by a single colon character.
+ The first field is the username, the second is the password.
+ </para>
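+ <para>
+ A minimal sketch of such a file might look like this. The account
+ names and passwords are hypothetical, and the passwords are stored
+ in plain text:
+ </para>
+ <screen>
+ # username:password (hypothetical accounts)
+ admin:secret
+ guest:guest
+ </screen>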
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>passwd.c: <replaceable>file</replaceable></term>
+ <listitem>
+ <para>
+ Specifies a file with a description of user accounts for Zebra.
+ The file format is similar to that used by the passwd directive, except
+ that the passwords are encrypted. Use Apache's htpasswd or a similar tool
+ for maintenance.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>perm.<replaceable>user</replaceable>:
+ <replaceable>permstring</replaceable></term>
+ <listitem>
+ <para>
+ Specifies permissions (privileges) for a user who is allowed
+ to access Zebra via the passwd system. There are currently two kinds
+ of permissions: read (r) and write (w). By default,
+ users not listed in a permission directive are given the read
+ privilege. To specify permissions for a user with no
+ username (Z39.50 anonymous style), use
+ <literal>anonymous</literal>. The permstring consists of
+ a sequence of characters. Include character <literal>w</literal>
+ for write/update access, <literal>r</literal> for read access.
+ </para>
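+ <para>
+ For instance, the following sketch grants the user
+ <literal>admin</literal> both read and write access. Its last line
+ spells out the default read-only access for anonymous users. The
+ file name and user name are hypothetical.
+ </para>
+ <screen>
+ passwd: passwd.txt
+ perm.admin: rw
+ perm.anonymous: r
+ </screen>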
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>dbaccess: <replaceable>accessfile</replaceable></term>
+ <listitem>
+ <para>
+ Names a file which lists database subscriptions for individual users.
+ The access file should consist of lines of the form <literal>username:
+ dbnames</literal>, where dbnames is a list of database names, separated by
+ '+'. No whitespace is allowed in the database list.
+ </para>
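+ <para>
+ A sketch of an access file could read as follows. The user and
+ database names are hypothetical. Here <literal>admin</literal> may
+ use two databases and <literal>guest</literal> only one:
+ </para>
+ <screen>
+ admin:textbase+articles
+ guest:textbase
+ </screen>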
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
That is, when a client wishes to retrieve a record
following a search operation, the files are accessed from the place
where you originally put them - if you remove the files (without
- running <literal>zebraidx</literal> again, the client
- will receive a diagnostic message.
+ running <literal>zebraidx</literal> again), the server will return
+ diagnostic number 14 (<quote>System error in presenting records</quote>) to
+ the client.
</para>
<para>
<para>
<screen>
- profilePath: /usr/local/yaz
+ profilePath: /usr/local/idzebra/tab
attset: bib1.att
simple.recordType: text
simple.database: textbase
and then run <literal>zebraidx</literal> with the
<literal>update</literal> command.
</para>
+ <!-- ### what happens if a file contains multiple records? -->
</sect1>
<sect1 id="generic-ids">
</para>
<para>
- (see <xref linkend="data-model"/>
+ (see <xref linkend="record-model-grs"/>
for details of how the mapping between elements of your records and
searchable attributes is established).
</para>
<screen>
register: /d1:500M
-
shadow: /scratch1:100M /scratch2:200M
</screen>
</sect2>
</sect1>
+
+
+ <sect1 id="administration-ranking">
+ <title>Static and Dynamic Ranking</title>
+
+ <para>
+ Internally, Zebra uses inverted indexes to look up term occurrences
+ in documents. The results of lookups in different indexes can be
+ combined by the binary boolean operations <literal>AND</literal>,
+ <literal>OR</literal> and/or <literal>NOT</literal> (which
+ is in fact a binary <literal>AND NOT</literal> operation).
+ To ensure fast query execution, all indexes have to be sorted in
+ the same order, so that their lists can be merged in a single pass.
+ </para>
+ <para>
+ The indexes are normally sorted by ascending document
+ <literal>ID</literal>. Any query which does not invoke a special
+ re-ranking function will therefore retrieve its result set in
+ document <literal>ID</literal> order.
+ </para>
+ <para>
+ If one defines the
+ <screen>
+ staticrank: 1
+ </screen>
+ directive in the main Zebra configuration file, the internal document
+ keys used for ordering are augmented by a preceding integer, which
+ contains the static rank of a given document, and the index lists
+ are ordered
+ first by ascending static rank,
+ then by ascending document <literal>ID</literal>.
+ </para>
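+ <para>
+ For illustration, with hypothetical static ranks and document IDs,
+ the entries of an index list would appear in the order shown below.
+ Document 1 comes last despite having the lowest ID, because its
+ static rank is the worst:
+ </para>
+ <screen>
+ static rank   document ID
+      0              7
+      0             12
+      3              2
+      3              9
+     10              1
+ </screen>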
+ <para>
+ This implies that the default rank <literal>0</literal>
+ is the best rank at the
+ beginning of the list, and <literal>max int</literal>
+ is the worst static rank.
+ </para>
+ <para>
+ The experimental <literal>alvis</literal> filter provides a
+ directive to fetch static rank information out of the indexed XML
+ records. This orders <emphasis>all</emphasis> hit sets
+ by <emphasis>ascending</emphasis> static
+ rank, and documents which have the same static rank
+ by <emphasis>ascending</emphasis> document <literal>ID</literal>.
+ See <xref linkend="record-model-alvisxslt"/> for the gory details.
+ </para>
+ <para>
+ If one wants to fiddle a little with the static rank order,
+ one has to invoke additional re-ranking/re-ordering using dynamic
+ re-ranking or score functions. These functions return positive
+ integer scores, where the <emphasis>highest</emphasis> score is
+ <emphasis>best</emphasis>. As a result, the
+ hit sets will be sorted according to
+ <emphasis>descending</emphasis>
+ scores (in contrast
+ to the index lists, which are sorted according to
+ <emphasis>ascending</emphasis> rank number and document ID).
+ </para>
+ <!--
+ <para>
+ Those are defined in the zebra C source files
+ <screen>
+ "rank-1" : zebra/index/rank1.c
+ default TF/IDF like zebra dynamic ranking
+ "rank-static" : zebra/index/rankstatic.c
+ do-nothing dummy static ranking (this is just to prove
+ that the static rank can be used in dynamic ranking functions)
+ "zvrank" : zebra/index/zvrank.c
+ many different dynamic TF/IDF ranking functions
+ </screen>
+ </para>
+ -->
+ <para>
+ These functions are enabled in the Zebra configuration file by a
+ directive such as one of the following (use only one at a time!):
+ <screen>
+ rank: rank-1 # default
+ rank: rank-static # dummy
+ rank: zvrank # TF-IDF like
+ </screen>
+ Notice that <literal>rank-1</literal> and
+ <literal>zvrank</literal> do not use the static rank
+ information in the list keys. They will produce the same ordering
+ with or without static ranking enabled.
+ </para>
+ <para>
+ The dummy <literal>rank-static</literal> reranking/scoring
+ function simply returns
+ <literal>score = max int - staticrank</literal>
+ so that hit sets are ordered the same way whether or not it is
+ called.
+ Obviously, to combine static and dynamic ranking usefully, one wants
+ to make a new ranking
+ function, which is left
+ as an exercise for the reader.
+ </para>
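+ <para>
+ As a sketch, a configuration that enables static ranking and
+ selects the dummy scoring function would contain:
+ </para>
+ <screen>
+ staticrank: 1
+ rank: rank-static
+ </screen>
+ <para>
+ Consider static ranks of 0, 3 and 10.
+ <literal>rank-static</literal> returns the scores
+ <literal>max int</literal>, <literal>max int - 3</literal> and
+ <literal>max int - 10</literal>. Sorting these in descending order
+ gives back the ascending static rank order of the index lists.
+ </para>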
+
+ </sect1>
+
</chapter>
+
<!-- Keep this comment at the end of the file
Local variables:
mode: sgml