<chapter id="introduction">
<!-- $Id: introduction.xml,v 1.6 2002-08-02 19:26:55 adam Exp $ -->
<title>Introduction</title>
<title>Overview</title>
<ulink url="http://www.indexdata.dk/zebra/">
system is a fielded free-text indexing and retrieval engine with a
Z39.50 front-end. You can use our various toolkits or any commercial
or freeware Z39.50 client to access data stored in Zebra.
FIXME - not a "first step" but a part of a complete system! -H
The Zebra server is our first step towards the development of a fully
configurable, open information system. Eventually, it will be paired
with a powerful Z39.50 client to support complex information
management tasks within almost any application domain. We're making
the server available now because it's no fun to be in the open
information retrieval business all by yourself. We want to allow
people with interesting data to make that data
available in interesting ways, without having to start out
by implementing yet another protocol stack from scratch.
This document is an introduction to the Zebra system. It tells you
how to compile the software and how to prepare your first database.
It also explains how to configure the server to give you the
functionality that you need.
If you find the software interesting, you should visit the
<ulink url="http://www.indexdata.dk/zebra/">
Zebra web site</ulink>, where you can join the
<ulink url="http://www.indexdata.dk/mailman/listinfo/zebralist">
<title>Features</title>
This is a list of some of the most important features of the
Supports large databases: files for indices, etc. can be
automatically partitioned over multiple disks.
Supports arbitrarily complex records: the base input format is an
SGML-like syntax that allows nested (structured) data elements, as
well as variant forms of data.
Robust updating - records can be added and deleted without
rebuilding the index from scratch.
The update procedure is tolerant of crashes or hard interrupts
during register updating - registers can be reconstructed following
Registers can be safely updated even while users are accessing
Supports arbitrary storage formats. A system of input filters driven by
regular expressions allows you to easily process most ASCII-based
data formats. SGML, XML, ISO2709 (MARC), and raw text are also
Supports Boolean queries as well as relevance-ranked (free-text)
searching. Right truncation and masking in terms are supported, as
well as full regular expressions.
Can import the data into Zebra's own storage, or just refer to
external files (such as HTML pages).
Supports multiple concrete syntaxes
for record exchange (depending on the configuration): GRS-1, SUTRS,
XML, ISO2709 (*MARC). Records can be mapped between record syntaxes
and schemas on the fly.
Supports approximate matching in registers (i.e., spelling mistakes,
Zebra is written in portable C, so it runs on most Unix-like systems
as well as Windows NT, for which a binary distribution is available.
Protocol facilities: Init, Search, Retrieve, Delete, Browse and Sort.
Piggy-backed presents are honored in the search-request.
Named result sets are supported.
Easily configured to support different application profiles, with
tables for attribute sets, tag sets, and abstract syntaxes.
Additional tables control facilities such as element mappings to
different schemas (e.g., GILS-to-USMARC).
Complex composition specifications using Espec-1 are partially
supported (simple element requests only).
Element Set Names are defined using the Espec-1 capability of the
system, and are given in configuration files as simple element
requests (and possibly variant requests).
<title>Future Work</title>
These are some of the plans that we have for the software in the near
and far future, ordered approximately by their relative importance.
asterisk will be implemented before the
FIXME - What are the current plans?
*Finalize the data element <emphasis>include</emphasis> facility
to support multimedia data elements in records.
Add more sophisticated relevance ranking mechanisms.
Add support for soundex and stemming.
Add relevance <emphasis>feedback</emphasis> support.
Complete EXPLAIN support.
Add support for very large records by implementing segmentation and/or
Support the Item Update extended service of the protocol.
We want to add a management system that allows you to
control your databases and configuration tables from a graphical
Programmers thrive on user feedback. If you are interested in a
facility that you don't see mentioned here, or if there's something
you think we could do better, please send us an email.
If you think it's all really neat, you're welcome to drop us a line
saying that, too. You'll find contact info at the end of this file.
<!-- Keep this comment at the end of the file
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-parent-document: "zebra.xml"
sgml-local-catalogs: nil
sgml-namecase-general:t