1 <chapter id="introduction">
2 <!-- $Id: introduction.xml,v 1.8 2002-08-27 07:49:23 mike Exp $ -->
3 <title>Introduction</title>
6 <title>Overview</title>
9 <ulink url="http://www.indexdata.dk/zebra/">
11 Zebra</ulink> is a high-performance, general-purpose structured text
12 indexing and retrieval engine. It reads structured records in a
13 variety of input formats (e.g. email, XML, MARC) and allows access
14 to them through exact boolean search expressions and
15 relevance-ranked free-text queries.
19 Zebra supports large databases (more than ten gigabytes of data,
20 tens of millions of records). It supports safe, incremental
21 database updates on live systems. You can access data stored in
22 Zebra using a variety of Index Data tools (e.g. YAZ and PHP/YAZ) as
23 well as commercial and freeware Z39.50 clients and toolkits.
27 This document is an introduction to the Zebra system. It will tell you
28 how to compile the software, and how to prepare your first database.
29 It also explains how the server can be configured to give you the
30 functionality that you need.
35 If you find the software interesting, you should visit the
36 <ulink url="http://www.indexdata.dk/zebra/">
37 Zebra web site</ulink>, where you can join the
38 <ulink url="http://www.indexdata.dk/mailman/listinfo/zebralist">
46 <title>Features</title>
49 This is an overview of some of the most important features of the
58 Supports large databases - files for indices, etc. can be
59 automatically partitioned over multiple disks.
65 Supports arbitrarily complex records - base input format is an
66 SGML-like syntax which allows nested (structured) data elements, as
67 well as variant forms of data.
73 Robust updating - records can be added and deleted without
74 rebuilding the index from scratch.
75 The update procedure is tolerant of crashes or hard interrupts
76 during register updating - registers can be reconstructed following a crash.
78 Registers can be safely updated even while users are accessing
85 Supports arbitrary storage formats. A system of input filters driven by
86 regular expressions allows you to easily process most ASCII-based
87 data formats. SGML, XML, ISO2709 (MARC), and raw text are also supported.
94 Supports boolean queries as well as relevance-ranking (free-text)
95 searching. Right truncation and masking in terms are supported, as
96 well as full regular expressions.
102 Can import the data into Zebra's own storage, or just refer to
103 external files (good for building indexes of "live"
110 Supports multiple concrete syntaxes
111 for record exchange (depending on the configuration): GRS-1, SUTRS,
112 XML, ISO2709 (*MARC). Records can be mapped between record syntaxes
113 and schemas on the fly.
119 Supports approximate matching in registers (i.e. spelling mistakes,
126 Zebra is written in portable C, so it runs on most Unix-like systems
127 as well as Windows NT - a binary distribution for Windows NT is available.
136 Z39.50 protocol support:
143 Protocol facilities: Init, Search, Retrieve, Delete, Browse and Sort.
149 Piggy-backed presents are honored in the search-request.
155 Named result sets are supported.
160 Easily configured to support different application profiles, with
161 tables for attribute sets, tag sets, and abstract syntaxes.
162 Additional tables control facilities such as element mappings to
163 different schemas (e.g. GILS-to-USMARC).
169 Complex composition specifications using Espec-1 are partially
170 supported (simple element requests only).
176 Element Set Names are defined using the Espec-1 capability of the
177 system, and are given in configuration files as simple element
178 requests (and possibly variant requests).
189 <title>Applications</title>
191 Zebra has been deployed in numerous applications, in both the
192 academic and commercial worlds, in application domains as diverse
193 as bibliographic information, geospatial, ### (Help, guys!)
196 Notable applications include the following:
200 <title>DADS - the DTV Article Database Service</title>
202 DADS is a huge database of ### records, allowing students and
203 researchers at DTU (###) to search and order articles from several
204 different databases at once. The database contains
205 literature on all engineering subjects. It's available on-line
206 through a web gateway at
207 http://www.dtv.dk/search/index_e.htm
208 though only to members of the university.
211 ### Much more information needed.
217 <title>Future Work</title>
220 These are some of the plans that we have for the software in the near
221 and far future, listed roughly in order of importance.
229 Improved support for XML in search and retrieval. Eventually,
230 the goal is for Zebra to pull double duty as a flexible
231 information retrieval engine and high-performance XML
238 Access to the search engine through a SOAP/RPC API to allow the
239 construction of applications without requiring Z39.50 tools.
245 Finalisation and documentation of the Zebra API. Consider
246 exposing the API through SOAP as well (allowing updates,
247 database management).
253 Improved free-text searching. We're first and foremost octet jockeys and
254 we're actively looking for organisations or people who'd like
255 to contribute experience in relevance ranking and text
264 Programmers thrive on user feedback. If you are interested in a
265 facility that you don't see mentioned here, or if there's something
266 you think we could do better, please drop us an email.
267 If you think it's all really neat, you're welcome to drop us a line
268 saying that, too. You'll find contact info at the end of this file.
273 <!-- Keep this comment at the end of the file
278 sgml-minimize-attributes:nil
279 sgml-always-quote-attributes:t
282 sgml-parent-document: "zebra.xml"
283 sgml-local-catalogs: nil
284 sgml-namecase-general:t