1 $Id: ONEWS,v 1.1 2007-01-22 11:02:12 mike Exp $
3 Archived news for pre-2.0.0 Zebra. The Zebra source code was forked
4 after release 1.3.16: development on the 1.3.x branch continuing as
5 documented in the post-1.3.16 entries in this file. In the meantime,
6 the other branch was initially known as the 1.4.x branch and
7 eventually became the 2.0.x branch. Release news subsequent to 2.0.0
8 is in the NEWS file. During the period between the fork at 1.3.16 and
9 the formal release of 2.0.0, some but not all of the subsequent 1.3.x
10 changes were also made in the 1.4.x (= 2.0.x) branch, so some but not
11 all of the post-1.3.16 changes listed in this file also apply to
12 2.0.0. One day we'll annotate which do and don't apply.
16 Update code to use new YAZ log functions/defines.
20 Updated to use YLOG_-defines for YAZ, rather than the obsolete LOG-defines.
24 Fixed several compilation warnings. (gcc 4.1.2, -O3 -g -Wall)
26 Fixed bug #710: Duplicate keys for CDATA in xelm/melm rules.
30 Fixed bug #529: multiple simultaneous updates by extended services trashes
35 Fixed bug #47: Commit needs to check for roll-back.
37 Fixed bug #672: Trailing characters in password are ignored
39 Added extra presence check for tcl.h, because some systems have
40 tclConfig.sh installed even though Tcl C headers are missing.
44 Optimized melm performance.
46 Do not use sync(2) during commit (but rely on sync'd individual files).
50 Updated Debian package to use sym links for zebraidx, zebrasrv.
52 Fixed bug in isamb (isam:b) regarding ISAM tree splitting.
54 Fixed problem with file locking during commit phase. This error was
59 Fixed bug #602: Any extra terms appended to a wordlist with a hitcount
60 of 0, gets incorrect term count
62 Fixed bug #597: Support null missing key for sort.
66 Fixed bug #465: dup fields in ISO2709 retrieval.
68 Another fix for X-Path indexing. The previous fix, unfortunately, indexed
73 Fix X-Path attribute indexing. Bug #431.
77 Fixed bug #415: Strange truncation behavior.
79 Added 'melm' directive to absyn format to simplify config files
80 for MARC-style databases. See tab/marc21.abs for an example.
82 Added support for special element set _sysno_ which returns a
83 record ID for a record packed as a SUTRS record.
87 Documented authentication facility in Zebra. Added zebra.cfg directive
88 'passwd.c' which specifies user accounts file with encrypted passwords. The
89 directive 'passwd' specifies user accounts file with clear-text passwords.
90 The previous version of Zebra used plain/clear text depending on
91 configuration automatically. That caused upgrade trouble. Bug #356.
95 Depend on YAZ 2.0.18 or later in configure.
97 Fixed crash that could occur if ES update transaction failed.
99 Configure enables the use of the crypt API - if available.
101 Fixed bug #304: Fuzzy search regExpr-2 did not use proper error distance
104 Fixed bug #305: Scan now handles negative preferred-position-in-response.
106 Implemented the 'equivalent' directive for .chr-files.
108 --- 1.3.24 2005/02/09
110 For configure, support threading again. It was removed by mistake
113 Fixed bug #262: spaces in control fields in MARC returned.
115 Fixed bug #259: Second indicator lost in MARC records
117 --- 1.3.22 2005/01/23
119 Fixed bug #253: Setting group.database not honored
121 Fixed bug #252: Sort does not work.
123 Fixed bug #248: hit counts in combinatoric (and) searches in specific ..
125 --- 1.3.20 2005/01/17
127 Fixed bug #245: Time for getting records changes a lot based on record
130 Fixed bug #235: weird x-path results.
132 Avoid crash in ISAMB when isamb_pp_open gets ISAMB_POS = 0. In
133 this case EOF (no entries) is signalled. It fixes problem with
134 terms being deleted. See bug #109.
136 Fixed bug #169: Phrase term counts does not work. The bug exists in
139 Added mechanism to ignore leading articles when doing full-field indexing,
140 based on the character map files. See the manual for further discussion.
142 --- 1.3.18 2004/08/20
144 Fixed bug in record management. Releasing blocks could result in
147 Fixed bug in isam:b. A tree split could result in a lost item.
149 --- 1.3.17 2004/08/17
151 Add IDZebra.i to dist so that Perl extension builds again.
153 --- 1.3.16 2004/08/16
155 Added facility to make attributes in grs.regx and grs.tcl filters using the
156 data command with argument -attribute <name>. The content of data is
157 the value of the attribute. This command should be used inside a
158 begin element, end element section.
160 Update zebra.nsi to NSIS 2.
162 Added a new 'cut' directive to charmaps (.chr files) which specifies that
163 only characters after the cutting char should be indexed.
165 Update Perl internals so that it matches the current Zebra API.
166 The recordGroup structure is no longer available. A group of resources
167 can still be referenced by setting groupName=>.. in various methods.
169 Maximum number of records to be sorted in a result set can be
170 specified by setting "sortmax". Default is 1000.
172 Allow use of string use attributes for regular attribute sets. The
173 name matches the name given in the attribute set file. All strings
174 starting with / are considered X-Path as usual.
176 Fixed bug in grs.regx filter. 'end element' could pop off top tag
177 element for XML tree. It may only pop off if -record is given.
179 Added grs.danbib filter - for Danish Bibliographic Centre.
181 Rename CHANGELOG to NEWS.
183 For text filter, return only header if elementSetName=H. elementSetName=R
184 returns contents only. Other elementSetName returns both header+content.
186 Added test for charmap and rusmarc.
188 Added feature charmaps (.chr) so that characters may be specified in
191 Fixed problem with encoding directive for charmap(.chr) files.
193 Allow Remote insert/delete/replace/update with record, recordIdNumber
194 (sysno) and/or recordIdOpaque(user supplied record Id). If both
195 IDs are omitted internal record ID match is assumed (recordId: - in
198 --- 1.3.15 2004/01/15
200 Fix bug. X-Path attribute expressions with spaces in them now work.
202 Fix base address for MARC output.
204 --- 1.3.14 2003/11/29
206 Fix bug with shadow and result set handling.
208 Implement MARCXML to ISO2709 conversion.
210 --- 1.3.13 2003/09/26
212 Add missing examples for Windows install.
214 Fix bug in regx filter to make it "greedy" again. This bug appeared
219 --- 1.3.12 2003/09/08
221 Fix XML error handling. Stop XML parse immediately if an XML parse error
222 occurs (i.e. produce one error only).
224 Zebra ignores "unsupported use attribute" for individual databases
225 when searching multiple databases (unless all databases fail).
227 New filter grs.marcxml which works like grs.marc but produces MARCXML.
229 Added support for database deletion. It is possible to create/drop
230 a database from zebraidx utility. Note: only for isam:b.
232 Write zebrasrv.pid to lockdir.
234 Bug fix: result sets were not recovered correctly. Had to
235 add ODR handle for zebra_search_RPN in order to make it work.
237 Fixed a bug in regx filters that didn't do anchors (^) correctly.
239 Fixed a bug in searches with X-Path searches sometimes giving "extra"
242 Zebra server checks for zebrasrv.pid and refuses to start if it is already
243 locked by another (running) zebrasrv.
245 Fixed a bug with text being chunked in pieces for the grs.xml filter.
247 --- 1.3.11 2003/04/25
249 xelm code updates. xelm works regardless of the state of 'xpath enable/disable'.
250 Avoid -L/usr/lib since that is already default library path.
252 Allow multiple updates within one transaction.
254 Fixed a bug with >2GB files (overflow in integer expression).
256 --- 1.3.10 2003/04/01
258 Fix linker error for Perl module.
260 Fix bug in and operation which in some cases could result in "extra"
261 hits. Bug was introduced in 1.3.5.
263 Fix bug in handling of schema conversion when producing numeric tags.
269 Add missing files doc/zvrank.txt and doc/marc_indexing.xml.
273 Zvrank: an experimental ranking algorithm. See doc/zvrank.txt and
274 source in index/zvrank.c. Enable this by using rank: zvrank in zebra.cfg.
275 Contributed by Johannes Leveling <Johannes.Leveling at fernuni-hagen.de>
277 livrank: another experimental ranking algorithm. Source in livcode.c.
278 Enable this by using rank: livrank in zebra.cfg and use -DLIV_CODE=1
280 Contributed by Pete Mallinson, University of Liverpool.
282 Advanced MARC indexing. See doc/marc_indexing.xml
283 Oleg Kolobov <oleg at lib.tpu.ru>
285 Perl API updates and fixes.
286 Peter Popovics <pop at technomat.hu>
288 Fixed 'zebraidx delete'.
290 Implemented 'zebraidx clean'.
292 64-bit offsets for register files on WIN32 (no 2 GB limit).
294 Fixed a few memory leaks WRT sorting.
298 Fixed error handling: error code was not properly returned.
300 Support Truncation 104 (CCL).
304 Added missing source files for perl extension.
308 Implemented xelm directive.
310 Updated for newer version of YAZ (introduction of string schema).
312 Directory examples/zthes now part of distribution (was missing
313 in previous release).
315 New .abs directive, systag, that controls where to put retrieval
316 information. The directive takes two arguments: system tag, element name.
317 System tag is one of: rank, sysno, size.
321 Perl Filter and Perl API. By Peter Popovics.
323 For zebra.cfg, if no profilePath is specified, directory
324 (prefix)/share/idzebra/tab
327 Zebra Examples in examples. Zebra tests in test.
329 Bug fix: sort index was not properly modified on
330 record updates/deletes.
332 Fix handling of character entities for sgml filter.
334 Move data1 to Zebra (used to be part of YAZ).
338 Fix character encoding of scan response terms.
340 Fix character decoding of scan request terms.
342 Fix ESpec handling (requires YAZ 1.9.1)
344 Fix searches for complete fields.
348 When name zebra is used in a filename or directory 'idzebra' is used
349 instead to avoid confusion with GNU zebra (routing software).
351 Zebra server stops with a fatal error if config file cannot be read.
353 New config setting, followLinks, that controls whether update of files
354 should follow symbolic links. Set it to 1 (to enable) or 0 (to disable).
355 By default symbolic links are followed.
357 Fix MARC transfer. MARC fields had wrong data for multiple fields.
359 XML record reader moved from YAZ to Zebra, to make YAZ less
360 dependent on external libraries.
362 Zebra uses yaz_iconv, which is a mini iconv library supporting UTF-8,
363 UCS4, ISO-8859-1. This means that Zebra does UNICODE even
364 on systems that don't offer iconv.
366 XML record reader supports external system entities.
370 New .abs-directive "xpath" that takes one argument: "enable"
371 or "disable" to enable and disable XPath-indexing. If no "xpath"
372 directive is found in .abs-file, XPath-indexing is disabled to ensure
373 backwards compatibility. For missing .abs-files XPath-indexing is
374 enabled so that such records are searchable.
376 Zebra warns about missing .abs-file only once (for each type).
378 Fixed a bug in file update where already-inserted files could
383 Zebra license changed to GNU GPL.
385 XPath-like queries used when RPN string attributes are used, eg.
386 @attr 1=/portal/title sometitle
387 @attr 1=/portal/title[@xml:lang=da] danishtitle
388 @attr 1=/portal/title/@xml:lang da
389 @attr 1=//title sometitle
391 Zebra uses UTF-8 internally:
392 1) New setting "encoding" for zebra.cfg that specifies encoding for
393 OCTET terms in queries and record encoding for most transfer syntaxes
394 (except those that use International Strings, such as GRS-1).
395 2) The encoding of International strings is UTF-8 by default. It
396 may be changed by character set negotiation. If character set
397 negotiation is in effect and if records are selected for conversion
398 these'll be converted to the selected character set - thus overriding
399 the encoding setting in zebra.cfg.
400 3) New directive "encoding" in .abs-files. This specifies the external
401 character encoding for files indexed by zebra. However, if records
402 themselves have an XML header that specifies an encoding that'll be used
405 XML filter (-t grs.xml).
407 Multiple registers. New setting in resource 'root' that holds base
408 directory for register(s). A group of databases may be put in separate
409 register in directory root/reg by using db name 'reg/db1' ... 'reg/dbN'.
413 Fixes for Digital Unix
415 Implemented hits per term using USR:SearchResult-1.
417 New Zebra API. Locking system re-implemented.
419 --- 1.1.stable 2002/02/20
421 Rank weight can be controlled with attribute type 9. Default
422 value is 34. Recommended values between 1-36.
426 Updated for YAZ version 1.8.
428 Added support for termsets - a result set of terms matching
429 a given query. @attr 8=<set> creates a termset named <set>.
431 Added support for raw retrieval. Element Set Name R forces the
432 text filter which returns the record in its original form.
434 Added numerical sort - triggered by structure=numeric (4=109).
436 Remote record import using Z39.50 Extended Services and Segments.
438 Fixed bug where updating a database with user-defined attributes
439 could corrupt the register (bad storeKeys).
441 Multi-threaded version.
443 Fixed bug regarding proximity.
445 Documentation updates.
447 Fixed bug in record retrieval module that occurred on 64-bit OSF
452 Fixed bug in makefile for WIN32.
454 Fixed bug in configure script - used bash-specific features.
458 Added support for multiple records in one file for filter grs.sgml.
460 Changed record index structure. New layout is incompatible with
461 previous releases. Added setting "recordcompression" to control
462 compression of records. Possible values are "none" (no
463 compression) and bzip2 (compression using libbz2).
465 Added XML transfer syntax support for retrieval of structured records.
466 Schema in CompSpec is recognised in retrieval of structured records.
468 Changed Tcl record filter so that it attempts to read <filt>.tflt. If
469 that fails, the filter reads the file <filt>.flt (regx style filter).
471 Implemented new Tcl record filter - use grs.tcl.<filter> to enable it.
472 Zebra's configure script automatically attempts to locate Tcl. For
473 manual Tcl configuration use option --with-tclconfig=<path> to specify
474 where Tcl's library files are located.
476 Implemented "compression" of Dictionary and ISAM system. Dictionary
479 Added "tagsysno" directive to zebra.cfg to control under which tag the
480 system ID is placed. Use tagsysno: 0 to disable Zebra's system number
483 Added "tagrank" as above.
485 Changed file naming scheme for register files from <name>.mf.<no> to
488 Implemented "position"-flag for register type (as defined in
489 default.idx). When set to zero no position (or sequence number) is
490 saved in register for each word occurrence, thus saving some register
493 Implemented database mapping. Using mapdb one can specify a database
494 to be mapped to one or more physical databases. Usage:
495 mapdb <fromdb> <todb> ..
497 Added SOIF-filter. Thanks to Peter Valkenburg.
499 For the regx-filter "end element -record" may trigger a mark-of-record
500 if outer level is reached.
502 Tag sets may be typed in the reference to it. From the .abs-file the
503 "tagset" directive takes a third optional integer type for the tag set
504 referenced. From a .tag-file the "include" directive takes a third
505 optional type as well. The old "type" directive in the tag set itself
506 is still recognized but acts as the default type for the tag set.
508 Zebra supports the specification of arbitrary attribute sets, schemas
509 and tag sets, because of the change in YAZ' OID management system.
511 Fixed bug in Sort that caused it NOT to use character mapping as it
514 Zebra now uses GNU configure to generate Makefile(s).
516 Added un-optimised support for left and left/right truncation attributes.
518 Added support for relational operators on text when using RPN queries.
520 Added support for sort specifications in RPN queries. Type 7 specifies
521 'sort' where value 1=ascending, value 2=descending. The use attribute
522 specifies the field criteria as usual. The term specifies priority
523 where 0=first, 1=second, ...
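As an illustration of the sort attribute described above, here is a minimal
Python sketch that assembles only the sort part of an RPN query. The helper
name sort_clause and the use attribute value 4 are assumptions made for this
example; how the clause is combined with the rest of a query is not shown.

```python
def sort_clause(use_attr, ascending=True, priority=0):
    """Build the sort part of an RPN query (hypothetical helper).

    Type 7 selects sorting: value 1 = ascending, value 2 = descending.
    The use attribute (type 1) names the field and the term gives the
    priority, where 0 = first, 1 = second, and so on.
    """
    if priority < 0:
        raise ValueError("priority must be 0 (first), 1 (second), ...")
    direction = 1 if ascending else 2
    return f"@attr 7={direction} @attr 1={use_attr} {priority}"


# Primary ascending key on use attribute 4 (the value 4 is illustrative).
print(sort_clause(4))
# Secondary descending key on the same field.
print(sort_clause(4, ascending=False, priority=1))
```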
525 Changed the way use attributes are specified in the recordId
528 Maximum number of databases in one Zebra register increased.
530 New setting, databasePath, which specifies that first directory during
531 update traversal is the database name (instead of a fixed one).
533 New setting, explainDatabase, which specifies that databases are
536 Modified Zebra so that it works with ASN.1 compiled code for YAZ.
538 Implemented EXPLAIN database maintenance. Zebra automatically
539 generates and updates CategoryList, TargetInfo, DatabaseInfo,
540 AttributeSetInfo and AttributeDetails records at this stage. The
541 records may be transferred as GRS-1, SUTRS or Explain.
543 Fixed register spec so that colon isn't treated as size separator
544 unless followed by [0-9+-] in order to allow DOS drive specifications.
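The colon rule above can be sketched in Python as follows. This is not
Zebra's C implementation; the function name split_register_spec, the
single directory:size form and the sample size "100M" are assumptions
made for illustration.

```python
import re

# A colon counts as the size separator only when the next character is a
# digit, '+' or '-'. Drive prefixes such as "c:/" or "d:\" do not match.
_SIZE_SEP = re.compile(r":(?=[0-9+-])")


def split_register_spec(spec):
    """Split 'directory:size' into (directory, size).

    Returns (spec, None) when no size separator is present.
    """
    match = _SIZE_SEP.search(spec)
    if match is None:
        return spec, None
    return spec[:match.start()], spec[match.end():]


print(split_register_spec("c:/zebra/reg:100M"))
print(split_register_spec("d:\\data"))
```

The drive-letter colon is skipped because it is followed by a slash or
backslash, so only the final colon is taken as the separator.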
546 Fixed two bugs in ISAMC system.
548 Changed the way Zebra keeps its maintenance information about attribute
549 sets, available attributes, etc. Records in "SGML" notation using an
550 EXPLAIN schema are now used when appropriate.
552 Bug fix: Index didn't handle update/insert/delete of the same record
553 (i.e. same recordId) in one run (one invocation of zebraidx). Only the
554 first occurrence of a record is considered.
556 Most searches now return correct number of hits.
558 New modular ranking system. Interested programmers are encouraged to
559 inspect rank1.c and improve the algorithm.
561 Bug fix: Lock files weren't removed as they should on NT.
563 Implemented Z39.50 Sort. Zebra's sort handler uses use attributes to
564 specify a "sort register". Refer to the gils sample records which refer
565 to index type "s" which is specified as "sort" in the default.idx file.
566 Each sort criterion can either be Ascending or Descending and at most
567 three sort elements can be specified.
569 Bug fix: Character mapping didn't work for text files.
573 Simple ranked searches now return correct number of hits.
575 The test option (-s) only makes a read-lock on the index as well
576 as using read-only operations anywhere.
578 Moved towards generic character mapping. Configuration file default.idx
579 specifies character map files for register types w, p, u, etc.
581 Implemented "begin variant" for the sgml.regx-filter.
583 Fixed a few memory leaks.
585 Added support for C++, headers use extern "C" for public definitions.
587 Bug fix: The show records facility (-s) only displayed information for
588 the first record in a file (and not for every record in the file).
590 Added option "-f <n>" to limit the logging of record operations. After
591 <n> records have been processed no logging is performed (unless errors
594 Bug fix: the compressed ISAM system didn't handle update operations
597 Added setting, "maxResultSetSize", to hold the number of records to
598 save in a result set.
600 Bug fix: Complete phrase didn't work for search operations.
602 Bug fix: temporary result sets weren't deleted.
604 Reduced disk space for saved keys (storeKeys = 1).
606 Added optional, physical ANY (key replication)
608 Implemented proximity operator in search.
610 Bug fix: the path name buffers used by file match traversal routines
611 have been extended to support long file names.
613 New C(ompressed) ISAM system. To enable it, specify "isam: c" in the
614 configuration file. The resulting register without "storeKeys" is about
615 half the size, and the memory used by zebraidx during phase 2 (merge) is
616 reduced to a minimum.
618 Reworked the way Regexp-2 queries with error tolerance are handled and
619 specified. The documentation has been updated accordingly.
621 Bug fix: Zebrasrv didn't search correctly when queries contained masking
622 characters. This bug was introduced in 1.0a8.
624 Zebrasrv now tags records with the proper database name.
626 New settings, memMax and keyTmpDir.
628 Changed name of setting lockDir (previously called lockPath) and
629 setTmpDir (previously called tempSetPath).
631 Generalized and changed record type specifications. In short, there are:
633 grs.sgml structured, "SGML-like" syntax
634 grs.regx.<filter> structured, Regular expression filter
635 grs.marc.<abs> Reads *MARC records in the ISO2709 format. <abs>
636 is the name of an abstract syntax file.
637 Bug fix: Result sets weren't sorted in operations involving boolean
638 operations with "ranked" operands.
642 Added national character-handling subsystem.
646 Small modifications to input filters and profiles.
648 Added support for SOIF syntax (with private OID).
652 Fixed buffer-size problem in indexing.
654 Added compression to temporary files for updating.
656 Added phrase registers.
658 Added dynamic mapping of search attribute to multiple termlists (ANY).
660 Scan support in multiple databases/registers.
662 Configuration settings are case-insensitive and single dash (-)
663 characters are ignored in comparisons.
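A minimal Python sketch of the comparison rule above, assuming that every
'-' character is dropped before a case-insensitive comparison. The helper
names are hypothetical and this is not Zebra's own code.

```python
def normalize_setting(name):
    """Fold case and drop '-' so equivalent spellings compare equal."""
    return name.replace("-", "").lower()


def same_setting(a, b):
    """Return True when two setting names refer to the same setting."""
    return normalize_setting(a) == normalize_setting(b)


# Different spellings of one setting name compare equal.
print(same_setting("lockDir", "LOCK-DIR"))
# Distinct settings still compare unequal.
print(same_setting("lockDir", "keyTmpDir"))
```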
665 The index processing ignores empty files - warning given.
667 New option to zebraidx (-V) displays version information.
671 Fixed problem in file-update system.
673 Fixed problem in shadow system; register was sometimes corrupted after
678 Fixed problems in the ISAM subsystem. Caused difficulties when updating
681 Fixed small problem in SUTRS-filter. A newline was sometimes inserted before
682 the rank and record number.
684 Fixed bug in the isam subsystem - caused a malfunction when accessing
685 words which occurred more than 10000 times.
687 Distribution should now include YAZ (Z39.50 protocol stack) to simplify
690 Server can now run under inetd. Use option -i, and -w <directory> to
691 set working directory to desired location.
693 New zebraidx command: clean - removes temporary shadow files.
695 Fixed bug in ISAM system. Occurred rarely during register updates.
697 Logging during index merge phase is improved. The remaining running
700 Temporary files generated by zebraidx are removed after each run.
702 Bug fix: Dictionary didn't handle 8-bit characters correctly; was obvious
703 when doing scan operations in dictionaries with European characters.
707 A whole slew of updates, to make the first publicized release. Get the doc
712 Memory-problems in ISAM fixed. More blocktypes added to the default setup
713 to increase performance on larger databases.
715 Various minor changes in data management system.
719 A couple of portability-problems resolved.
721 Changed some malloc() to xmalloc().