<chapter id="architecture">
<!-- $Id: architecture.xml,v 1.5 2006-02-16 16:50:18 marc Exp $ -->
<title>Overview of Zebra Architecture</title>
<sect1 id="architecture-representation">
<title>Local Representation</title>
As mentioned earlier, Zebra places few restrictions on the type of
data that you can index and manage. Generally, whatever the form of
the data, it is parsed by an input filter specific to that format, and
turned into an internal structure that Zebra knows how to handle. This
process takes place whenever the record is accessed - for indexing and
The RecordType parameter in the <literal>zebra.cfg</literal> file, or
the <literal>-t</literal> option to the indexer, tells Zebra how to
process input records.
Two basic types of processing are available - raw text and structured
data. Raw text is just that, and it is selected by providing the
argument <emphasis>text</emphasis> to Zebra. Structured records are
all handled internally using the basic mechanisms described in the
Zebra can read structured records in many different formats.
How this is done is governed by additional parameters after the
"grs" keyword, separated by "." characters.
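<para>
As a rough illustration of the record type syntax described above, the
fragment below sketches how the setting might look in
<literal>zebra.cfg</literal>. The chosen values are assumptions made
for illustration only, and <literal>myfilter</literal> is a
hypothetical filter name; this is not a complete configuration.
</para>
<screen>
# zebra.cfg (illustrative fragment, not a complete configuration)

# Raw text records:
# recordType: text

# Structured records: "grs" followed by "."-separated parameters.
# Here "sgml" selects the input format.
recordType: grs.sgml

# A further parameter can follow, for example a filter name
# ("myfilter" is a hypothetical name):
# recordType: grs.regx.myfilter
</screen>
<para>
The same value could instead be given to the indexer on the command
line, for example <literal>-t grs.sgml</literal>.
</para>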
<sect1 id="architecture-maincomponents">
<title>Main Components</title>
The Zebra system is designed to support a wide range of data management
applications. The system can be configured to handle virtually any
kind of structured data. Each record in the system is associated with
a <emphasis>record schema</emphasis> which lends context to the data
elements of the record.
Any number of record schemas can coexist in the system.
Although it may be wise to use only a single schema within
one database, the system imposes no such restriction.
The Zebra indexer and information retrieval server consists of the
following main applications: the <literal>zebraidx</literal>
indexing maintenance utility, and the <literal>zebrasrv</literal>
information query and retrieval server. Both use some of the
same main components, which are presented here.
This virtual package installs all the necessary packages to start
working with Zebra - including utility programs, development libraries,
documentation and modules.
<literal>idzebra1.4</literal>
<sect2 id="componentcore">
<title>Core Zebra Module Containing Common Functionality</title>
- loads external filter modules used for presenting
the records in a search response.
- executes search requests in PQF/RPN, which are handed over from
the YAZ server frontend API
- calls resorting/reranking algorithms on the hit sets
- returns - possibly ranked - result sets, hit
numbers, and similar internal data to the YAZ server backend API.
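<para>
For readers unfamiliar with PQF (the prefix notation for RPN queries),
a few examples of the kind of query string handed to the core module
are sketched below. They assume the Bib-1 attribute set, in which use
attribute 4 denotes title and 1003 denotes author. The search terms are
made up for illustration, and whether such queries succeed depends on
how the target database is indexed.
</para>
<screen>
@attr 1=4 computer
@and @attr 1=4 computer @attr 1=1003 knuth
@or @attr 1=4 zebra @attr 1=4 yaz
</screen>
<para>
Each <literal>@attr 1=N</literal> prefix selects the access point for
the term that follows it, while <literal>@and</literal> and
<literal>@or</literal> combine the two operands that follow them.
</para>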
This package contains all run-time libraries for Zebra.
<literal>libidzebra1.4</literal>
This package includes documentation for Zebra in PDF and HTML.
<literal>idzebra1.4-doc</literal>
This package includes common essential Zebra configuration files.
<literal>idzebra1.4-common</literal>
<sect2 id="componentindexer">
<title>Zebra Indexer</title>
The core Zebra indexer, which
- loads external filter modules used for indexing data records of
- creates, updates and drops databases and indexes
This package contains Zebra utilities such as the zebraidx indexer
utility and the zebrasrv server.
<literal>idzebra1.4-utils</literal>
<sect2 id="componentsearcher">
<title>Zebra Searcher/Retriever</title>
The core Zebra searcher/retriever, which
This package contains Zebra utilities such as the zebraidx indexer
utility and the zebrasrv server, and their associated man pages.
<literal>idzebra1.4-utils</literal>
<sect2 id="componentyazserver">
<title>YAZ Server Frontend</title>
The YAZ server frontend is
a full-fledged stateful Z39.50 server taking client
connections, and forwarding search and scan requests to the
In addition to Z39.50 requests, the YAZ server frontend acts
as an HTTP server, honouring
<ulink url="http://www.loc.gov/standards/sru/srw/">SRW</ulink> SOAP requests and <ulink url="http://www.loc.gov/standards/sru/">SRU</ulink> REST requests. Moreover, it can
translate incoming <ulink url="http://www.loc.gov/standards/sru/cql/">CQL</ulink> queries to PQF/RPN queries, if
correctly configured.
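<para>
As a hedged sketch of what such a configuration can involve: the CQL to
PQF conversion in YAZ is driven by a properties file that maps CQL
indexes to attribute lists. The fragment below assumes a mapping of the
CQL index <literal>dc.title</literal> to Bib-1 use attribute 4 (title).
Both the mapping and the sample query are illustrative, and the exact
PQF produced depends on the remaining entries of the real file.
</para>
<screen>
# CQL-to-PQF mapping (illustrative fragment, not a complete file)
index.dc.title = 1=4

# With this mapping, the CQL query
#   dc.title = computer
# would be translated to a PQF query along the lines of
#   @attr 1=4 computer
</screen>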
YAZ is a toolkit that allows you to develop software using the
ANSI Z39.50/ISO23950 standard for information retrieval.
<ulink url="http://www.loc.gov/standards/sru/srw/">SRW</ulink>/<ulink url="http://www.loc.gov/standards/sru/">SRU</ulink>
<literal>libyazthread.so</literal>
<literal>libyaz.so</literal>
<literal>libyaz</literal>
<sect2 id="componentmodules">
<title>Record Models and Filter Modules</title>
all filter modules which do indexing and record display filtering:
This virtual package contains all base IDZebra filter modules. EMPTY ???
<literal>libidzebra1.4-modules</literal>
<sect3 id="componentmodulestext">
<title>TEXT Record Model and Filter Module</title>
Plain ASCII text filter
<literal>text module missing as deb file</literal>
<sect3 id="componentmodulesgrs">
<title>GRS Record Model and Filter Modules</title>
<xref linkend="record-model-grs"/>
- grs.danbib GRS filters of various kinds (*.abs files)
IDZebra filter grs.danbib (DBC DanBib records)
This package includes the grs.danbib filter, which parses DanBib records.
DanBib is the Danish Union Catalogue hosted by DBC
(Danish Bibliographic Centre).
<literal>libidzebra1.4-mod-grs-danbib</literal>
This package includes the grs.marc and grs.marcxml filters that allow
IDZebra to read MARC records based on ISO2709.
<literal>libidzebra1.4-mod-grs-marc</literal>
- grs.tcl GRS TCL scriptable filter
This package includes the grs.regx and grs.tcl filters.
<literal>libidzebra1.4-mod-grs-regx</literal>
<literal>libidzebra1.4-mod-grs-sgml not packaged yet ??</literal>
This package includes the grs.xml filter which uses <ulink url="http://expat.sourceforge.net/">Expat</ulink> to
parse records in XML and turn them into IDZebra's internal grs node.
<literal>libidzebra1.4-mod-grs-xml</literal>
<sect3 id="componentmodulesalvis">
<title>ALVIS Record Model and Filter Module</title>
<xref linkend="record-model-alvisxslt"/>
- alvis Experimental Alvis XSLT filter
<literal>mod-alvis.so</literal>
<literal>libidzebra1.4-mod-alvis</literal>
<sect3 id="componentmodulessafari">
<title>SAFARI Record Model and Filter Module</title>
<literal>safari module missing as deb file</literal>
<sect1 id="architecture-workflow">
<title>Indexing and Retrieval Workflow</title>
Records pass through three different states during processing in the
When records are accessed by the system, they are represented
in their local, or native, format. This might be SGML or HTML files,
News or Mail archives, or MARC records. If the system doesn't already
know how to read the type of data you need to store, you can set up an
input filter by preparing conversion rules based on regular
expressions and possibly augmented by a flexible scripting language
The input filter produces as output an internal representation,
When records are processed by the system, they are represented
as a tree structure built from tagged data elements hanging off a
root node. The tagged elements may contain data or yet more tagged
elements in a recursive structure. The system performs various
actions on this tree structure (indexing, element selection, schema
Before records are transmitted to the client, they are first
converted from the internal structure to a form suitable for exchange
over the network, according to the Z39.50 standard.
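<para>
To make the internal, tree-structured state described above a little
more concrete, the sketch below pictures a small record as tagged
elements hanging off a root node. The tag names and values are
hypothetical and chosen only for illustration; they do not correspond
to any particular schema.
</para>
<screen>
ROOT
   TITLE     "Zen and the Art of Motorcycle Maintenance"
   AUTHOR
      FIRST-NAME   "Robert"
      SURNAME      "Pirsig"
</screen>
<para>
Here <literal>TITLE</literal> holds data directly, while
<literal>AUTHOR</literal> holds further tagged elements, which is the
recursive nesting the text refers to.
</para>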
<!-- Keep this comment at the end of the file
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-parent-document: "zebra.xml"
sgml-local-catalogs: nil
sgml-namecase-general:t