X-Git-Url: http://lists.indexdata.com/cgi-bin?a=blobdiff_plain;f=doc%2Fquerymodel.xml;h=b067c98a4a6ce946896319d0161f46055a347590;hb=fa91adf8d4d03f1b04c2ad3e4be9bc9a487e9a51;hp=f1a7964c130122dae2de2ad2317ad2a8f59c15c9;hpb=f5497e3c3bcee00bc10b576506b5d7901bb7b3bd;p=idzebra-moved-to-github.git diff --git a/doc/querymodel.xml b/doc/querymodel.xml index f1a7964..b067c98 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,5 +1,5 @@ - + Query Model @@ -23,6 +23,8 @@ + + Prefix Query Format (PQF) @@ -98,7 +100,7 @@ may start with one specification of the attribute set used. Following is a query tree, which - consists of atomic query parts, eventually + consists of atomic query parts (APT), eventually paired by boolean binary operators, and finally recursively combined into complex query trees. @@ -119,7 +121,9 @@ - +
+ - + - + - + @@ -159,7 +163,9 @@ using the standard boolean operators into new query trees. -
Attribute sets predefined in Zebra
exp-1exp-1 Explain attribute set Special attribute set used on the special automagic IR-Explain-1 database to gain information on @@ -136,7 +140,7 @@ and semantics.
bib-1bib-1 Bib1 attribute set Standard PQF query language attribute set which defines the semantics of Z39.50 searching. In addition, all of the @@ -144,7 +148,7 @@ processing
gilsgils GILS attribute set Extension to the Bib1 attribute set.
+
+ - + - + - + - + - - + + - + - + @@ -1147,8 +1491,8 @@ The above operands can be combined with the following operators: - -
Boolean operators
@and
@and binary AND operator Set intersection of two atomic queries hit sets
@or
@or binary OR operator Set union of two atomic queries hit sets
@not
@not binary AND NOT operator Set complement of two atomic queries hit sets
@prox
@prox binary PROXIMITY operator Set intersection of two atomic queries hit sets. In addition, the intersection set is purged for all @@ -237,12 +243,13 @@ - Atomic queries + Atomic queries (APT) Atomic queries are the query parts which work on one access point only. These consist of an attribute list followed by a single term or a - quoted term list. + quoted term list, and are often called + Attributes-Plus-Terms (APT) queries. Unsupplied non-use attributes type 2-9 are either inherited from @@ -250,7 +257,9 @@ See for details. - +
+ + + Relation attributes describe the relationship of the access + point (left side + of the relation) to the search term as qualified by the attributes (right + side of the relation), e.g., Date-publication <= 1975. + - All operations are based on a lexicographical ordering, - expect in the case for the - following structure attributes: numeric(109). - +
Atomic queries
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Relation Attributes (type 2)
RelationValueNotes
Less than1supported
Less than or equal2supported
Equal3default
Greater or equal4supported
Greater than5supported
Not equal6unsupported
Phonetic100unsupported
Stem101unsupported
Relevance102supported
AlwaysMatches103unsupported
+ + The relation attribute + relevance (102) is supported, see + for full information. + + + + All ordering operations are based on a lexicographical ordering, + except when the + structure attribute numeric (109) is used. In + this case, ordering is numerical. See + . + + + Ranked search for information retrieval in - the title-register - (see for the glory details): + the title-register: Z> find @attr 1=4 @attr 2=102 "information retrieval" @@ -633,21 +734,173 @@
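A minimal sketch, in Python, of how an attributes-plus-term (APT) query such as the ranked search above can be assembled as a PQF string. The helper names `apt` and `combine` are hypothetical and not part of Zebra or YAZ; the sketch only builds the query text and does not send it to a server.

```python
def apt(term, attrs):
    """Build one attributes-plus-term (APT) query part in PQF.

    attrs is a list of (type, value) pairs, e.g. [(1, 4), (2, 102)]
    for use attribute 4 (title) with relation 102 (relevance).
    Hypothetical helper for illustration only.
    """
    if '"' in term:
        # Simplification of this sketch: quoting rules for embedded
        # double quotes are not handled here.
        raise ValueError("term must not contain a double quote")
    parts = ["@attr %d=%d" % (attr_type, value) for attr_type, value in attrs]
    # Quote the term so that a multi-word term stays a single operand.
    parts.append('"%s"' % term)
    return " ".join(parts)


def combine(operator, left, right):
    """Join two query trees with a binary operator such as and, or, not."""
    return "@%s %s %s" % (operator, left, right)


ranked = apt("information retrieval", [(1, 4), (2, 102)])
print(ranked)
```

The printed string is the operand of the `find` command shown in the example above; `combine("and", left, right)` places the operator before both operands, as prefix notation requires.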
- Position Attributes (type = 3) + Position Attributes (type 3) + - Only value of (any position(3) is supported. first in field(1), - and first in subfield(2) are unsupported but using them - does not trigger an error. + The position attribute specifies the location of the search term + within the field or subfield in which it appears. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Position Attributes (type 3)
PositionValueNotes
First in field 1unsupported
First in subfield2unsupported
Any position in field3default
+ + + The position attribute values first in field (1), + and first in subfield (2) are unsupported. + Using them does not trigger an error; the search silently defaults to + any position in field (3).
- Structure Attributes (type = 4) - + Structure Attributes (type 4) + + + The structure attribute specifies the type of search + term. This causes the search to be mapped on + different Zebra internal indexes, which must have been defined + at index time. + + + + The possible values of the + structure attribute (type 4) can be defined + using the configuration file + tab/default.idx. + The default configuration is summarized in this table. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Structure Attributes (type 4)
StructureValueNotes
Phrase 1default
Word2supported
Key3supported
Year4supported
Date (normalized)5supported
Word list6supported
Date (un-normalized)100unsupported
Name (normalized) 101unsupported
Name (un-normalized) 102unsupported
Structure103unsupported
Urx104supported
Free-form-text105supported
Document-text106supported
Local-number107supported
String108unsupported
Numeric string109supported
+ The structure attribute value local-number + (107) + is supported, and always maps to the Zebra internal document ID. + + + For example, in the GILS schema (gils.abs), the west-bounding-coordinate is indexed as type n, @@ -662,16 +915,88 @@ Truncation Attributes (type = 5) + + + The truncation attribute specifies whether variations of one or + more characters are allowed between search term and hit terms, or + not. Using non-default truncation attributes will broaden the + document hit set of a search query. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Truncation Attributes (type 5)
TruncationValueNotes
Right truncation 1supported
Left truncation2supported
Left and right truncation3supported
Do not truncate100default
Process # in search term101supported
RegExpr-1 102supported
RegExpr-2103supported
+ - Supported are: No truncation(100) which is the default, - Right trunation(1), Left truncation(2), - Left&Right truncation(3), - Process # in term(100) which maps - each # to .*, - Regexp-1(102) normal regular, Regexp-2(103) (regular with fuzzy), + Truncation attribute value + Process # in search term (101) is a + poor-man's regular expression search. It maps + each # to .*, and + then performs a Regexp-1 (102) regular + expression search. + + + Truncation attribute value + Regexp-1 (102) is a normal regular expression search, + see. + + + Truncation attribute value + Regexp-2 (103) is a Zebra specific extension + which allows fuzzy matches. One single + error in spelling of search terms is allowed, i.e., a document + is hit if it includes a term which can be mapped to the used + search term by one character substitution, addition, deletion or + change of position. + -
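The two behaviours described above can be sketched in Python. This is an illustration under stated assumptions, not Zebra's implementation: the function names are hypothetical, Python's `re` module stands in for Zebra's own regular expression matcher, and "change of position" is assumed here to mean a swap of two adjacent characters.

```python
import re


def hash_to_regexp(term):
    """Rewrite a 'Process # in search term' (101) term as a
    Regexp-1 (102) style pattern: each # becomes .*"""
    return term.replace("#", ".*")


def within_one_error(term, candidate):
    """Return True if candidate differs from term by at most one
    substitution, addition, deletion, or adjacent swap (assumed
    reading of the one-error rule for Regexp-2)."""
    if term == candidate:
        return True
    len_a, len_b = len(term), len(candidate)
    if abs(len_a - len_b) > 1:
        return False
    # Find the first position where the two strings differ.
    i = 0
    while i < min(len_a, len_b) and term[i] == candidate[i]:
        i += 1
    if len_a == len_b:
        # One substituted character: the rest must be identical.
        if term[i + 1:] == candidate[i + 1:]:
            return True
        # Two adjacent characters swapped.
        return (
            i + 1 < len_a
            and term[i] == candidate[i + 1]
            and term[i + 1] == candidate[i]
            and term[i + 2:] == candidate[i + 2:]
        )
    # One added or deleted character.
    longer, shorter = (term, candidate) if len_a > len_b else (candidate, term)
    return longer[i + 1:] == shorter[i:]


pattern = hash_to_regexp("informat#")
print(pattern)
print(bool(re.fullmatch(pattern, "information")))
print(within_one_error("retrieval", "retreival"))
```

The sketch leaves other regular expression metacharacters in the term untouched, since the text only specifies the mapping of # to .* before the regular expression search.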
@@ -697,38 +1022,46 @@ set used in a search operation query. - +
+ - + + - + + - + + - + + - + + @@ -844,7 +1177,8 @@ Zebra Extension Term Reference Attribute (type 10) - Zebra supports the searchResult-1 facility. If attribute 10 is + Zebra supports the searchResult-1 facility. + If the Term Reference Attribute (type 10) is given, that specifies a subqueryId value returned as part of the search result. It is a way for a client to name an APT part of a query. @@ -870,36 +1204,42 @@ recognized regardless of attribute set used in a scan operation query. -
Zebra Search Attribute Extensions
Name and TypeNameValue Operation Zebra version
Embedded Sort (type 7)Embedded Sort7 search 1.1
Term Set (type 8)Term Set8 search 1.1
Rank weight (type 9)Rank Weight9 search 1.1
Approx Limit (type 9)Approx Limit9 search 1.4
Term Reference (type 10)Term Reference10 search 1.4
+
+ - + + - + + - + +
Zebra Scan Attribute Extensions
Name and TypeNameType Operation Zebra version
Result Set Narrow (type 8)Result Set Narrow8 scan 1.3
Approximative Limit (type 9)Approximative Limit9 scan 1.4
- + Zebra Extension Result Set Narrow (type 8) - If attribute 8 is given for scan, the value is the name of a - result set. Each hit count in scan is @and'ed with the result set - given. + If attribute Result Set Narrow (type 8) + is given for scan, the value is the name of a + result set. Each hit count in scan is + @and'ed with the result set given.
xMatches the character x.xMatches the character x.
.. Matches any character.
[ .. ][ .. ] Matches the set of characters specified; such as [abc] or [a-c].
+
- - + - - + - - + - - + - - + - +
Regular Expression Operators
x*Matches x zero or more times. + x*Matches x zero or more times. Priority: high.
x+Matches x one or more times. + x+Matches x one or more times. Priority: high.
x? Matches x zero or once. + x? Matches x zero or once. Priority: high.
xy Matches x, then y. + xy Matches x, then y. Priority: medium.
x|y Matches either x or y. + x|y Matches either x or y. Priority: low.
( )( ) The order of evaluation may be changed by using parentheses.
- + - If the first character of the Regxp-2 query + If the first character of the Regexp-2 query is a plus character (+) it marks the beginning of a section with non-standard specifiers. The next plus character marks the end of the section. @@ -1213,8 +1557,7 @@ Combinations with other attributes are possible. For example, a - ranked search with a regular expression - (see for the glory details): + ranked search with a regular expression: Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval" @@ -1229,7 +1572,7 @@ process input records. Two basic types of processing are available - raw text and structured data. Raw text is just that, and it is selected by providing the - argument text to Zebra. Structured records are + argument text to Zebra. Structured records are all handled internally using the basic mechanisms described in the subsequent sections. Zebra can read structured records in many different formats.