1 <chapter id="introduction">
2 <!-- $Id: introduction.xml,v 1.7 2002-08-05 08:27:05 quinn Exp $ -->
3 <title>Introduction</title>
6 <title>Overview</title>
10 <ulink url="http://www.indexdata.dk/zebra/">
12 server is a high-performance, general-purpose structured text
13 indexing and retrieval engine. It reads structured records in a
14 variety of input formats (e.g. email, XML, MARC) and allows access
15 to them through exact boolean search expressions and
16 relevance-ranked free-text queries.
20 Zebra supports large databases (more than ten gigabytes of data,
21 tens of millions of records). It supports incremental, safe
22 database updates on live systems. You can access data stored in
23 Zebra using a variety of Index Data tools (e.g. YAZ and PHP/YAZ) as
24 well as commercial and freeware Z39.50 clients and toolkits.
28 This document is an introduction to the Zebra system. It will tell you
29 how to compile the software, and how to prepare your first database.
30 It also explains how the server can be configured to give you the
31 functionality that you need.
36 If you find the software interesting, you should visit the
37 <ulink url="http://www.indexdata.dk/zebra/">
38 Zebra web site</ulink>, where you can join the
39 <ulink url="http://www.indexdata.dk/mailman/listinfo/zebralist">
47 <title>Features</title>
50 This is an overview of some of the most important features of the
59 Supports large databases - files for indices, etc. can be
60 automatically partitioned over multiple disks.
66 Supports arbitrarily complex records - base input format is an
67 SGML-like syntax which allows nested (structured) data elements, as
68 well as variant forms of data.
74 Robust updating - records can be added and deleted without
75 rebuilding the index from scratch.
76 The update procedure is tolerant of crashes or hard interrupts
77 during register updating - registers can be reconstructed following
79 Registers can be safely updated even while users are accessing
86 Supports arbitrary storage formats. A system of input filters driven by
87 regular expressions allows you to easily process most ASCII-based
88 data formats. SGML, XML, ISO2709 (MARC), and raw text are also
95 Supports boolean queries as well as relevance-ranking (free-text)
96 searching. Right truncation and masking in terms are supported, as
97 well as full regular expressions.
103 Can import the data into Zebra's own storage, or just refer to
104 external files (good for building indexes of "live"
111 Supports multiple concrete syntaxes
112 for record exchange (depending on the configuration): GRS-1, SUTRS,
113 XML, ISO2709 (*MARC). Records can be mapped between record syntaxes
114 and schema on the fly.
120 Supports approximate matching in registers (e.g. spelling mistakes,
127 Zebra is written in portable C, so it runs on most Unix-like systems
128 as well as Windows NT - a binary distribution for Windows NT is available.
137 Z39.50 protocol support:
144 Protocol facilities: Init, Search, Retrieve, Delete, Browse and Sort.
150 Piggy-backed presents are honored in the search-request.
156 Named result sets are supported.
161 Easily configured to support different application profiles, with
162 tables for attribute sets, tag sets, and abstract syntaxes.
163 Additional tables control facilities such as element mappings to
164 different schema (e.g. GILS-to-USMARC).
170 Complex composition specifications using Espec-1 are partially
171 supported (simple element requests only).
177 Element Set Names are defined using the Espec-1 capability of the
178 system, and are given in configuration files as simple element
179 requests (and possibly variant requests).
190 <title>Future Work</title>
193 These are some of the plans that we have for the software in the near
194 and far future, ordered roughly by relative importance.
202 Improved support for XML in search and retrieval. Eventually,
203 the goal is for Zebra to pull double duty as a flexible
204 information retrieval engine and high-performance XML
211 Access to search engine through SOAP/RPC API to allow the
212 construction of applications without requiring Z39.50 tools.
218 Finalisation and documentation of the Zebra API. Consider
219 exposing the API through SOAP as well (allowing updates,
220 database management).
226 Improved free-text searching. We're first and foremost octet jockeys and
227 we're actively looking for organisations or people who'd like
228 to contribute experience in relevance ranking and text
237 Programmers thrive on user feedback. If you are interested in a
238 facility that you don't see mentioned here, or if there's something
239 you think we could do better, please drop us an email.
240 If you think it's all really neat, you're welcome to drop us a line
241 saying that, too. You'll find contact info at the end of this file.
246 <!-- Keep this comment at the end of the file
251 sgml-minimize-attributes:nil
252 sgml-always-quote-attributes:t
255 sgml-parent-document: "zebra.xml"
256 sgml-local-catalogs: nil
257 sgml-namecase-general:t