<chapter id="introduction">
<title>Introduction</title>
<title>Overview</title>
The Zebra system is a fielded free-text indexing and retrieval engine with a
Z39.50 frontend. You can use any commercial or freeware Z39.50 client
to access data stored in Zebra.

The Zebra server is our first step towards the development of a fully
configurable, open information system. Eventually, it will be paired
with a powerful Z39.50 client to support complex information
management tasks within almost any application domain. We're making
the server available now because it's no fun to be in the open
information retrieval business all by yourself. We want to allow
people with interesting data to make that data
available in interesting ways, without having to start out
by implementing yet another protocol stack from scratch.

This document is an introduction to the Zebra system. It explains
how to compile the software and how to prepare your first database.
It also explains how to configure the server to give you the
functionality that you need.

If you find the software interesting, you should join the support
mailing list by sending email to <literal>zebra-request@indexdata.dk</literal>.
<title>Features</title>
Some of the most important features of Zebra are listed below.

Supports updating: records can be added and deleted without
rebuilding the index from scratch.
The update procedure is tolerant of crashes or hard interrupts
during register updating, and registers can be reconstructed following a crash.
Registers can be safely updated even while users are accessing the server.

Supports large databases: files for indices, etc. can be
automatically partitioned over multiple disks.

Supports arbitrarily complex records: the base input format is an
SGML-like syntax which allows nested (structured) data elements, as
well as variant forms of data.

Supports arbitrary storage formats. A system of input filters driven by
regular expressions allows you to easily process most ASCII-based
data formats. SGML, ISO2709 (MARC), and raw text are also supported.

Supports boolean queries as well as relevance-ranking (free-text)
searching. Right truncation and masking in terms are supported, as
well as full regular expressions.

Supports multiple concrete syntaxes
for record exchange (depending on the configuration): GRS-1, SUTRS,
ISO2709 (*MARC). Records can be mapped between record syntaxes.

Supports approximate matching in registers (e.g., spelling mistakes).

Protocol facilities: Init, Search, Retrieve, Browse and Sort.

Piggy-backed presents are honored in the search-request.

Named result sets are supported.

Easily configured to support different application profiles, with
tables for attribute sets, tag sets, and abstract syntaxes.
Additional tables control facilities such as element mappings to
different schemas (e.g., GILS-to-USMARC).

Complex composition specifications using Espec-1 are partially
supported (simple element requests only).

Element Set Names are defined using the Espec-1 capability of the
system, and are given in configuration files as simple element
requests (and possibly variant requests).

Some variant support (not fully implemented yet).

Using the YAZ toolkit for the protocol implementation, the
server can utilise a plug-in XTI/mOSI implementation (not included) to
provide SR services over an OSI stack, as well as Z39.50 over TCP/IP.

Zebra runs on most Unix-like systems as well as Windows NT. A binary
distribution for Windows NT is forthcoming; so far, the installation
requires MSVC++ to compile the system (we use version 5.0).
<title>Future Work</title>
This is a beta release of the software, made available so that you can
look at it, try it out, and assess whether it can be of use to you.

These are some of the plans that we have for the software in the near
and far future, ordered approximately by their relative importance.
Items marked with an asterisk will be implemented first.

*Complete the support for variants.

*Finalize the data element <emphasis>include</emphasis> facility
to support multimedia data elements in records.

Add more sophisticated relevance ranking mechanisms. Add support for soundex
and stemming. Add relevance <emphasis remap="it">feedback</emphasis> support.

Complete EXPLAIN support.

Add support for very large records by implementing segmentation.

Support the Item Update extended service of the protocol.

We want to add a management system that allows you to
control your databases and configuration tables from a graphical
interface. We'll probably use Tcl/Tk to stay platform-independent.

Programmers thrive on user feedback. If you are interested in a facility that
you don't see mentioned here, or if there's something you think we
could do better, please send us an email. If you think it's all really
neat, you're welcome to drop us a line saying that, too. You'll find
contact info at the end of this file.