X-Git-Url: http://lists.indexdata.com/cgi-bin?a=blobdiff_plain;f=doc%2Fquerymodel.xml;h=88c2fd7799e247f9968d09d4c746034e12eab9e1;hb=92e6edd8ee7ab998fe525ade78ae7859533bdd56;hp=d359d18a40d5168817b97536e859b2a18fed9314;hpb=997db1975fa2132c9bb155b69c86f1310f5136b4;p=idzebra-moved-to-github.git diff --git a/doc/querymodel.xml b/doc/querymodel.xml index d359d18..88c2fd7 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,5 +1,5 @@ - + Query Model @@ -149,7 +149,7 @@ - Prefix Query Format structure and syntax + Prefix Query Format syntax and semantics The PQF grammer is documented in the YAZ manual, and shall not be @@ -236,11 +236,9 @@ - The use attributes (type 1) of the predefined attribute sets can - be reconfigured by tweaking the files - tab/*.att. - New attribute sets can be defined by adding similar files in the - configuration path of the server. + The use attributes (type 1) mappings the + predefined attribute sets are found in the + attribute set configuration files tab/*.att. @@ -387,21 +385,21 @@ default index using the default attribite set, the server choice of access point/index, and the default non-use attributes. - Z> find "information" + Z> find information Equivalent query fully specified including all default values: - Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information" + Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 information - Finding all documents which have empty titles. Notice that the - empty term must be quoted, but is otherwise legal. + Finding all documents which have the term + debussy in the title field. 
- Z> find @attr 1=4 "" + Z> find @attr 1=4 debussy @@ -453,7 +451,7 @@ - Zebra's special use attribute type 1 of form 'string' + Zebra's special access point of type 'string' The numeric use (type 1) attribute is usually refered to from a given @@ -494,7 +492,7 @@ - See also for details, and + See also for details, and for the SRU PQF query extention using string names as a fast debugging facility. @@ -502,7 +500,7 @@ - Zebra's special use attribute type 1 of form 'XPath' + <title>Zebra's special access point of type 'XPath' for GRS filters As we have seen above, it is possible (albeit seldom a great @@ -612,8 +610,8 @@ Explain Attribute Set The Z39.50 standard defines the - Explainattribute set - exp-1, which is used to discover information + Explain attribute set + Exp-1, which is used to discover information about a server's search semantics and functional capabilities Zebra exposes a "classic" Explain database by base name IR-Explain-1, which @@ -621,7 +619,7 @@ The attribute-set exp-1 consists of a single - Use (type 1) attribute. + use attribute (type 1). In addition, the non-Use @@ -867,33 +865,81 @@ AlwaysMatches 103 - unsupported + supported + The relation attributes + 1-5 are supported and work exactly as + expected. + All ordering operations are based on a lexicographical ordering, + expect when the + structure attribute numeric (109) is used. In + this case, ordering is numerical. See + . + + Z> find @attr 1=Title @attr 2=1 music + ... + Number of hits: 11745, setno 1 + ... + Z> find @attr 1=Title @attr 2=2 music + ... + Number of hits: 11771, setno 2 + ... + Z> find @attr 1=Title @attr 2=3 music + ... + Number of hits: 532, setno 3 + ... + Z> find @attr 1=Title @attr 2=4 music + ... + Number of hits: 11463, setno 4 + ... + Z> find @attr 1=Title @attr 2=5 music + ... + Number of hits: 11419, setno 5 + + + + The relation attribute - relevance (102) is supported, see + Relevance (102) is supported, see for full information. 
- - - All ordering operations are based on a lexicographical ordering, - expect when the - structure attribute numeric (109) is used. In - this case, ordering is numerical. See - . - + + Ranked search for information retrieval in + the title-register: + + Z> find @attr 1=4 @attr 2=102 "information retrieval" + + - Ranked search for information retrieval in - the title-register: - - Z> find @attr 1=4 @attr 2=102 "information retrieval" - - + The relation attribute + AlwaysMatches (103) is in the default + configuration + supported in conjecture with structure attribute + Phrase (1) (which may be omitted by + default). + It can be configured to work with other structure attributes, + see the configuration file + tab/default.idx and + . + + + AlwaysMatches (103) is a + great way to discover how many documents have been indexed in a + given field. The search term is ignored, but needed for correct + PQF syntax. An empty search term may be supplied. + + Z> find @attr 1=Title @attr 2=103 "" + Z> find @attr 1=Title @attr 2=103 @attr 4=1 "" + + + + @@ -1118,7 +1164,7 @@ The exact mapping between PQF queries and Zebra internal indexes and index types is explained in - . + . @@ -1318,7 +1364,7 @@ The exact mapping between PQF queries and Zebra internal indexes and index types is explained in - . + . @@ -1340,6 +1386,39 @@ idxpath attribute set. + + Zebra specific retrieval of all records + + Zebra defines a hardwired string index name + called _ALLRECORDS. It matches any record + contained in the database, if used in conjunction with + the relation attribute + AlwaysMatches (103). + + + The _ALLRECORDS index name is used for total database + export. The search term is ignored, it may be empty. + + Z> find @attr 1=_ALLRECORDS @attr 2=103 "" + + + + Combination with other index types can be made. 
For example, to + find all records which are not indexed in + the Title register, issue one of the two + equivalent queries: + + Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=Title @attr 2=103 "" + Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=4 @attr 2=103 "" + + + + The special string index _ALLRECORDS is + experimental, and the provided functionality and syntax may very + well change in future releases of Zebra. + + + Zebra specific Search Extentions to all Attribute Sets @@ -1404,6 +1483,15 @@ faster and does not require clients to deal with the Sort Facility. + + + All ordering operations are based on a lexicographical ordering, + expect when the + structure attribute numeric (109) is used. In + this case, ordering is numerical. See + . + + The possible values after attribute type 7 are 1 ascending and @@ -1633,7 +1721,7 @@ This feature is enabled when defining the xpath enable option in the GRS filter - *.abs configuration files. If one wants to use + *.abs configuration files. If one wants to use the special idxpath numeric attribute set, the main Zebra configuraiton file zebra.cfg directive attset: idxpath.att must be enabled. @@ -1769,100 +1857,249 @@ - - Mapping from Bib1 Attributes to Zebra internal + <sect2 id="querymodel-pqf-apt-mapping"> + <title>Mapping from PQF atomic APT queries to Zebra internal register indexes - TO-DO - - - - + The rules for PQF APT mapping are rather tricky to grasp in the + first place. We deal first with the rules for deciding which + internal register or string index to use, according to the use + attribute or access point specified in the query. Thereafter we + deal with the rules for determining the correct structure type of + the named register. 
+ - + + Mapping of PQF APT access points - Use attributes are interpreted according to the - attribute sets which have been loaded in the - zebra.cfg file, and are matched against specific - fields as specified in the .abs file which - describes the profile of the records which have been loaded. - If no Use attribute is provided, a default of Bib-1 Any is assumed. + Zebra understands four fundamental different types of access + points, of which only the + numeric use attribute type access points + are defined by the Z39.50 + standard. + All other access point types are Zebra specific, and non-portable. - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Access point name mapping
Access Point | Type | Grammar | Notes
Use attribute | numeric | [1-9][0-9]* | directly mapped to string index name
String index name | string | [a-zA-Z](\-?[a-zA-Z0-9])* | normalized name is used as internal string index name
Zebra internal index name | zebra | _[a-zA-Z](_?[a-zA-Z0-9])* | hardwired internal string index name
XPATH special index | XPath | /.* | special xpath search for GRS indexed records
+ + + Attribute set names and + string index names are normalized + according to the following rules: all single + hyphens '-' are stripped, and all upper case + letters are folded to lower case. + + + + Numeric use attributes are mapped + to the Zebra internal + string index according to the attribute set definition in use. + The default attribute set is Bib-1, and may be + omitted in the PQF query. + + + + From normalization and numeric + use attribute mapping, it follows that the following + PQF queries are considered equivalent (assuming the default + configuration has not been altered): + + Z> find @attr 1=Body-of-text serenade + Z> find @attr 1=bodyoftext serenade + Z> find @attr 1=BodyOfText serenade + Z> find @attr 1=bO-d-Y-of-tE-x-t serenade + Z> find @attr 1=1010 serenade + Z> find @attrset Bib-1 @attr 1=1010 serenade + Z> find @attrset bib1 @attr 1=1010 serenade + Z> find @attrset Bib1 @attr 1=1010 serenade + Z> find @attrset b-I-b-1 @attr 1=1010 serenade + + + + + The numerical + use attributes (type 1) + are interpreted according to the + attribute sets which have been loaded in the + zebra.cfg file, and are matched against specific + fields as specified in the .abs file which + describes the profile of the records which have been loaded. + If no use attribute is provided, a default of + Bib-1 Use Any (1016) is + assumed. + The predefined use attribute sets + can be reconfigured by tweaking the configuration files + tab/*.att, and + new attribute sets can be defined by adding similar files in the + configuration path profilePath of the server. + + + + String indexes can be accessed directly, + independently of which attribute set is in use; any attribute + set given in the query is simply ignored. The name normalization described above applies. + String index names are defined in the + configuration files of the indexing filter in use, for example in the + GRS + *.abs configuration files, or in the + alvis filter XSLT indexing stylesheets. 
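The normalization rule described above (strip hyphens, fold to lower case) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this section, not Zebra's actual implementation; the function name `normalize_index_name` is hypothetical, and the sketch assumes that every hyphen is stripped, which is sufficient because the documented name grammar only permits single hyphens.

```python
def normalize_index_name(name: str) -> str:
    """Normalize an attribute set name or string index name.

    Assumption: all hyphens are removed and upper case letters are
    folded to lower case, as described in the text. This is a sketch,
    not code taken from Zebra.
    """
    return name.replace("-", "").lower()


# The string index spellings listed as equivalent in the text
index_spellings = ["Body-of-text", "bodyoftext", "BodyOfText", "bO-d-Y-of-tE-x-t"]
# The attribute set spellings listed as equivalent in the text
attrset_spellings = ["Bib-1", "bib1", "Bib1", "b-I-b-1"]

normalized_indexes = {normalize_index_name(s) for s in index_spellings}
normalized_attrsets = {normalize_index_name(s) for s in attrset_spellings}
```

Under this rule each group of spellings collapses to a single normalized name, which is why the queries listed above are treated as equivalent.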
+ + + + Zebra internal indexes can be accessed directly, + according to the same rules as the user defined + string indexes. The only difference is that + Zebra internal index names are hardwired, + all uppercase and + must start with the character '_'. + + + + Finally, XPATH access points are only + available using the GRS filter for indexing. + These access point names must start with the character + '/'; they are not + normalized, but passed unaltered to the Zebra internal + XPATH engine. See . + + + + +
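The four access point types described above can be told apart by the shape of the value given after `@attr 1=`. The following Python sketch classifies a value using the grammars from the access point table. It is a hedged illustration only: the function name `classify_access_point` and the returned labels are hypothetical, and the numeric pattern is assumed to allow zeros after the first digit so that values such as 1010 are accepted.

```python
import re

# Patterns adapted from the access point grammar table (assumptions noted above)
NUMERIC_USE = re.compile(r"[1-9][0-9]*")
STRING_INDEX = re.compile(r"[a-zA-Z](-?[a-zA-Z0-9])*")
ZEBRA_INTERNAL = re.compile(r"_[a-zA-Z](_?[a-zA-Z0-9])*")


def classify_access_point(value: str) -> str:
    """Return 'xpath', 'zebra', 'numeric' or 'string' for an access point value."""
    if value.startswith("/"):
        # XPath access points are passed on unaltered, so no further checks here
        return "xpath"
    if ZEBRA_INTERNAL.fullmatch(value):
        return "zebra"
    if NUMERIC_USE.fullmatch(value):
        return "numeric"
    if STRING_INDEX.fullmatch(value):
        return "string"
    raise ValueError(f"not a recognized access point: {value!r}")
```

For example, `classify_access_point("_ALLRECORDS")` yields `"zebra"`, while `classify_access_point("Body-of-text")` yields `"string"`.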
+ + + + Mapping of PQF APT structure and completeness to + register type + + Internally, Zebra has in its default configuration several + different types of registers or indexes, whose tokenization and + character normalization rules differ. This reflects the fact that + searching fundamentally different tokens such as dates, numbers, + bitfields and string-based text requires different rule sets. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Structure and completeness mapping to register types
Structure | Completeness | Register type | Notes
phrase (@attr 4=1), word (@attr 4=2), word-list (@attr 4=6), free-form-text (@attr 4=105), or document-text (@attr 4=106) | incomplete field (@attr 6=1) | Word ('w') | Traditional tokenized and character-normalized word index
phrase (@attr 4=1), word (@attr 4=2), word-list (@attr 4=6), free-form-text (@attr 4=105), or document-text (@attr 4=106) | complete field (@attr 6=3) | Phrase ('p') | Character-normalized, but not tokenized, index for phrase matches
urx (@attr 4=104) | ignored | URX/URL ('u') | Special index for URL web addresses
numeric (@attr 4=109) | ignored | Numeric ('n') | Special index for numbers
key (@attr 4=3) | ignored | Null bitmap ('0') | Used for non-tokenized and non-normalized bit sequences
year (@attr 4=4) | ignored | Year ('y') | Non-tokenized and non-normalized 4-digit numbers
date (@attr 4=5) | ignored | Date ('d') | Non-tokenized and non-normalized ISO date strings
ignored | ignored | Sort ('s') | Used with special sort attribute set (@attr 7=1, @attr 7=2)
overruled | overruled | special | Internal record ID register, used whenever Relation AlwaysMatches (@attr 2=103) is specified
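The table above amounts to a small lookup from structure, completeness and relation attributes to a register type. The Python sketch below restates it under several assumptions: the function name `register_type` is hypothetical, the defaults (structure 1, completeness 1) are the default values quoted elsewhere in this document, the numeric register is taken to be 'n' as in the prose on numeric searches, and completeness values other than 1 and 3 are rejected because the table does not cover them. The sort register is left out since it is selected by attribute type 7 rather than by structure.

```python
# Structures that use the word or phrase register, depending on completeness
WORD_LIKE = {1, 2, 6, 105, 106}

# Structures with a dedicated register; completeness is ignored for these
DEDICATED = {104: "u", 109: "n", 3: "0", 4: "y", 5: "d"}


def register_type(structure=1, completeness=1, relation=None):
    """Return the register type code for a PQF APT (sketch, not Zebra code)."""
    if relation == 103:
        # AlwaysMatches overrules structure and completeness
        return "special"
    if structure in DEDICATED:
        return DEDICATED[structure]
    if structure in WORD_LIKE:
        if completeness == 1:
            return "w"
        if completeness == 3:
            return "p"
        raise ValueError(f"completeness {completeness} is not covered by the table")
    raise ValueError(f"structure {structure} is not covered by the table")
```

A query such as `@attr 4=1 @attr 6=3` would then map to `register_type(1, 3)`, that is the phrase register 'p'.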
+ + + If a Structure attribute of Phrase is used in conjunction with a @@ -1871,9 +2108,23 @@ against the contents of the phrase (long word) register, if one exists for the given Use attribute. A phrase register is created for those fields in the - .abs file that contains a + GRS *.abs file that contains a p-specifier. - + + Z> scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven + ... + bayreuther festspiele (1) + * beethoven bibliography database (1) + benny carter (1) + ... + Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography" + ... + Number of hits: 0, setno 5 + ... + Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography database" + ... + Number of hits: 1, setno 6 + @@ -1884,7 +2135,23 @@ contains multiple words, the term will only match if all of the words are found immediately adjacent, and in the given order. The word search is performed on those fields that are indexed as - type w in the .abs file. + type w in the GRS *.abs file. + + Z> scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven + ... + beefheart (1) + * beethoven (18) + beethovens (7) + ... + Z> find @attr 1=Title @attr 4=1 @attr 6=1 beethoven + ... + Number of hits: 18, setno 1 + ... + Z> find @attr 1=Title @attr 4=1 @attr 6=1 "beethoven bibliography" + ... + Number of hits: 2, setno 2 + ... + @@ -1895,21 +2162,22 @@ natural-language, relevance-ranked query. This search type uses the word register, i.e. those fields that are indexed as type w in the - .abs file. + GRS *.abs file. If the Structure attribute is Numeric String the term is treated as an integer. The search is performed on those fields that are indexed - as type n in the .abs file. + as type n in the GRS + *.abs file. If the Structure attribute is URx the term is treated as a URX (URL) entity. The search is performed on those fields that are indexed as type - u in the .abs file. + u in the *.abs file. @@ -1944,6 +2212,8 @@ replacement) is accepted when terms are matched against the register contents. + +