+
+ </section>
+
+ <section id="relevance_ranking">
+ <title>Relevance ranking</title>
+ <para>
+ Pazpar2 uses a variant of the term frequency–inverse document frequency
+ (Tf-idf) ranking algorithm.
+ </para>
+ <para>
+ The Tf-part is straightforward to calculate and is based on the
+ documents that Pazpar2 fetches. The idf-part, however, is trickier
+ because the corpus at hand contains only the relevant documents, not the
+ irrelevant ones. Pazpar2 does not have the full corpus, only the
+ documents that match a particular search.
+ </para>
+ <para>
+ Computation of the Tf-part is based on the normalized documents.
+ The length, the position and the terms are thus normalized at this point.
+ The computation is performed for each document received from the
+ target, before merging takes place. The result of a TF-computation is
+ added to the TF-total of a cluster. Thus, if a document occurs twice,
+ the TF-part is doubled. This can be adjusted, however, because the
+ TF-part may be divided by the number of documents in a cluster.
+ </para>
+ <para>
+ The algorithm used by Pazpar2 has two phases. In phase one,
+ Pazpar2 computes a tf-array. This is done as records are
+ fetched from the database. The rank weight
+ <literal>w</literal> and the rank tweaks <literal>lead</literal>,
+ <literal>follow</literal> and <literal>length</literal> are in use here.
+ </para>
+ <screen><![CDATA[
+ tf[1,2,..N] = 0;
+ foreach document in a cluster
+   foreach field
+     w[1,2,..N] = 0;
+     for i = 1, .., N: (each term)
+       foreach pos (where term i occurs in field)
+         // w is configured weight for field
+         // pos is position of term in field
+         w[i] += w / (1 + log2(1 + lead*pos))
+         if (d > 0)
+           w[i] += w[i] * follow / (1 + log2(d))
+       // length: length of field (number of terms, that is)
+       if (length strategy is "linear")
+         tf[i] += w[i] / length;
+       else if (length strategy is "log")
+         tf[i] += w[i] / log2(length);
+       else if (length strategy is "none")
+         tf[i] += w[i];
+ ]]></screen>
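+ <para>
+ As a rough illustration of phase one, the following Python sketch mirrors
+ the pseudocode above. It is not the Pazpar2 implementation: the function
+ name <literal>cluster_tf</literal> and the input layout are hypothetical.
+ Since <literal>d</literal> is not defined in this section, the sketch
+ simply takes it as a value supplied with each occurrence.
+ </para>

```python
import math


def cluster_tf(cluster, terms, lead=0.0, follow=0.0, length_strategy="linear"):
    """Accumulate the tf-array over every document in one cluster.

    Hypothetical input layout (an assumption of this sketch):
      cluster: list of documents, each a list of field dicts
      field:   {"weight": w, "length": number of terms in the field,
                "occurrences": {term: [(pos, d), ...]}}
    """
    tf = [0.0] * len(terms)
    for document in cluster:
        for field in document:
            for i, term in enumerate(terms):
                w_i = 0.0
                for pos, d in field["occurrences"].get(term, []):
                    # Earlier positions count more when lead is positive.
                    w_i += field["weight"] / (1 + math.log2(1 + lead * pos))
                    # d is undefined in the text; treated as caller input here.
                    if d > 0:
                        w_i += w_i * follow / (1 + math.log2(d))
                if w_i == 0.0:
                    continue  # term absent from this field, nothing to add
                if length_strategy == "linear":
                    tf[i] += w_i / field["length"]
                elif length_strategy == "log":
                    # As in the pseudocode, a field of length 1 divides by zero.
                    tf[i] += w_i / math.log2(field["length"])
                elif length_strategy == "none":
                    tf[i] += w_i
                else:
                    raise ValueError("unknown length strategy: " + length_strategy)
    return tf


doc = [{"weight": 2.0, "length": 4, "occurrences": {"water": [(1, 0)]}}]
# The same document twice in a cluster doubles the tf value.
single = cluster_tf([doc], ["water", "supply"], lead=1.0)
double = cluster_tf([doc, doc], ["water", "supply"], lead=1.0)
```

+ <para>
+ In this sketch a cluster holding the same document twice yields exactly
+ twice the tf value of a cluster holding it once, which is the doubling
+ effect that the <literal>cluster</literal> tweak can compensate for.
+ </para>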
+ <para>
+ In phase two, Pazpar2 computes the idf-array and the final score.
+ This is done for each cluster as part of each show command.
+ The rank tweak <literal>cluster</literal> is in use here.
+ </para>
+ <screen><![CDATA[
+ // dococcur[i]: number of records where term i occurs
+ // doctotal: number of records
+ for i = 1, .., N: (each term)
+   if (dococcur[i] > 0)
+     idf[i] = log(1 + doctotal / dococcur[i])
+   else
+     idf[i] = 0;
+
+ relevance = 0;
+ for i = 1, .., N: (each term)
+   if (cluster is "yes")
+     tf[i] = tf[i] / cluster_size;
+   relevance += 100000 * tf[i] * idf[i];
+ ]]></screen>
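+ <para>
+ Phase two can be sketched in Python along the same lines. This is an
+ illustration only: the function names are hypothetical,
+ <literal>log</literal> is taken to be the natural logarithm, and the sketch
+ assumes that tf and idf are multiplied, as in conventional Tf-idf (which
+ also avoids dividing by an idf of zero).
+ </para>

```python
import math


def idf_vector(doctotal, dococcur):
    """idf[i] = log(1 + doctotal / dococcur[i]), or 0 when the term never occurs."""
    return [math.log(1 + doctotal / occ) if occ > 0 else 0.0 for occ in dococcur]


def relevance(tf, idf, cluster_size, cluster_tweak=True):
    """Combine tf and idf into one score for a cluster.

    Assumption: tf and idf are multiplied, as in conventional Tf-idf.
    """
    score = 0.0
    for tf_i, idf_i in zip(tf, idf):
        if cluster_tweak:
            # Average over the cluster so duplicates do not inflate the score.
            tf_i = tf_i / cluster_size
        score += 100000 * tf_i * idf_i
    return score


idf = idf_vector(doctotal=3, dococcur=[3, 0])
# A two-record cluster with doubled tf scores the same as a single record.
merged = relevance([0.5, 0.0], idf, cluster_size=2)
unmerged = relevance([0.25, 0.0], idf, cluster_size=1)
```

+ <para>
+ With the <literal>cluster</literal> tweak enabled, the two calls at the
+ end produce the same score, so a record that happens to be returned by
+ two targets is not ranked higher merely for being duplicated.
+ </para>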
+ <para>
+ For controlling the ranking parameters, refer to the
+ <link linkend="service-rank">rank</link> element of the
+ service definition.
+ Refer to the <link linkend="metadata-rank">rank</link> attribute
+ of the metadata element for how to control ranking for individual
+ metadata fields.
+ </para>
+ </section> <!-- relevance_ranking -->
+
+ <section id="masterkey_connect">
+ <title>Pazpar2 and MasterKey Connect</title>
+ <para>
+ MasterKey Connect is a hosted connector, or gateway, service that exposes
+ whatever searchable resources you need. Since the service exposes all
+ resources using Z39.50 (or SRU), it is easy to set up Pazpar2 to use the
+ service. In particular, since all connectors expose basically the same core
+ behavior, it is a good use of Pazpar2's mechanism for managing default
+ behaviors across similar databases.
+ </para>
+ <para>
+ After installation of Pazpar2, the directory
+ <filename>/etc/pazpar2/settings/mkc</filename> (location may
+ vary depending on installation preferences) contains an example setup that
+ searches two different resources through a MasterKey Connect demo account.
+ The file <filename>mkc.xml</filename> contains default parameters that will work for all
+ MasterKey Connect resources (if you decide to become a customer of the
+ service, you will substitute your own account credentials for
+ the guest/guest credentials). The other files contain specific information about
+ a couple of demonstration resources.
+ </para>
+
+ <para>
+ To play with the demo, create a symlink from
+ <filename>/etc/pazpar2/services-enabled/default.xml</filename>
+ to <filename>/etc/pazpar2/services-available/mkc.xml</filename>
+ and restart Pazpar2. You should now be able to search the two demo
+ resources using JSDemo or any user interface of your choice.
+ If you are interested in learning more about MasterKey Connect, or would like to
+ try out the service for free against your favorite online resource,
+ contact us at <email>info@indexdata.com</email>.
+ </para>