<chapter id="introduction">
<!-- $Id: introduction.xml,v 1.5 2002-04-10 14:47:49 heikki Exp $ -->
<title>Introduction</title>
<title>Overview</title>
<ulink url="http://www.indexdata.dk/zebra/">
system is a fielded free-text indexing and retrieval engine with a
Z39.50 front end. You can use our various toolkits or any commercial
or freeware Z39.50 client to access data stored in Zebra.
FIXME - not a "first step" but a part of a complete system! -H
The Zebra server is our first step towards the development of a fully
configurable, open information system. Eventually, it will be paired
with a powerful Z39.50 client to support complex information
management tasks within almost any application domain. We're making
the server available now because it's no fun to be in the open
information retrieval business all by yourself. We want to allow
people with interesting data to make it
available in interesting ways, without having to start out
by implementing yet another protocol stack from scratch.
This document is an introduction to the Zebra system. It explains
how to compile the software and how to prepare your first database.
It also explains how to configure the server to provide the
functionality that you need.
If you find the software interesting, you should visit the
<ulink url="http://www.indexdata.dk/zebra/">
Zebra web site</ulink>, where you can join the
<ulink url="http://www.indexdata.dk/mailman/listinfo/zebralist">
<title>Features</title>
This is a list of some of the most important features of the
Supports large databases: files for indices and other data can be
automatically partitioned across multiple disks.
Supports arbitrarily complex records: the base input format is an
SGML-like syntax that allows nested (structured) data elements, as
well as variant forms of data.
Robust updating: records can be added and deleted without
rebuilding the index from scratch.
The update procedure is tolerant of crashes or hard interrupts
during register updating: registers can be reconstructed following
Registers can be safely updated even while users are accessing
Supports arbitrary storage formats. A system of input filters driven by
regular expressions allows you to easily process most ASCII-based
data formats. SGML, XML, ISO2709 (MARC), and raw text are also
Supports Boolean queries as well as relevance-ranked (free-text)
searching. Right truncation and masking in terms are supported, as
well as full regular expressions.
Can import the data into Zebra's own storage, or just refer to
external files (such as HTML pages).
Supports multiple concrete syntaxes
for record exchange (depending on the configuration): GRS-1, SUTRS,
XML, ISO2709 (*MARC). Records can be mapped between record syntaxes
and schemas on the fly.
Supports approximate matching in registers (i.e., spelling mistakes,
Zebra is written in portable C, so it runs on most Unix-like systems
as well as Windows NT; a binary distribution for Windows NT is available.
Protocol facilities: Init, Search, Retrieve, Delete, Browse, and Sort.
FIXME - Itemupdate. (Remove delete until that time, confuses people) -H
Piggybacked presents are honored in the search request.
Named result sets are supported.
Easily configured to support different application profiles, with
tables for attribute sets, tag sets, and abstract syntaxes.
Additional tables control facilities such as element mappings to
different schemas (e.g., GILS-to-USMARC).
Complex composition specifications using Espec-1 are partially
supported (simple element requests only).
Element Set Names are defined using the Espec-1 capability of the
system, and are given in configuration files as simple element
requests (and possibly variant requests).
Some variant support (not fully implemented yet).
FIXME - Test if complete enough - is it worth mentioning at all -H
<title>Future Work</title>
These are some of the plans that we have for the software in the near
and far future, ordered approximately by relative importance.
asterisk will be implemented before the
FIXME - What are the current plans?
*Complete the support for variants.
*Finalize the data element <emphasis>include</emphasis> facility
to support multimedia data elements in records.
Add more sophisticated relevance ranking mechanisms.
Add support for soundex and stemming.
Add relevance <emphasis>feedback</emphasis> support.
Complete EXPLAIN support.
Add support for very large records by implementing segmentation and/or
Support the Item Update extended service of the protocol.
We want to add a management system that allows you to
control your databases and configuration tables from a graphical
Programmers thrive on user feedback. If you are interested in a
facility that you don't see mentioned here, or if there's something
you think we could do better, please send us an email.
If you think it's all really neat, you're welcome to drop us a line
saying that, too. You'll find contact information at the end of this file.
<!-- Keep this comment at the end of the file
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-parent-document: "zebra.xml"
sgml-local-catalogs: nil
sgml-namecase-general:t