- Allow keywords to be used unquoted as search terms.

author mike <mike>

Thu, 14 Nov 2002 22:04:16 +0000 (22:04 +0000)

committer mike <mike>

Thu, 14 Nov 2002 22:04:16 +0000 (22:04 +0000)
diff --git a/Changes b/Changes

index 74d7c6e..6d7f3fa 100644 (file)
--- a/Changes
+++ b/Changes
@@ -1,21 +1,29 @@
-$Id: Changes,v 1.7 2002-11-12 22:37:48 mike Exp $
+$Id: Changes,v 1.8 2002-11-14 22:04:16 mike Exp $
  
  Revision history for "cql-java"
+See the bottom of this file for a list of things still to do.
  
  0.3  (IN PROGRESS)
+       - Allow keywords to be used unquoted as search terms.
+       - Add support for serverChoiceRelation (scr).
+       - Add support for prefix-mapping, as in
+               >dc="http://dublincore.org/" dc.title=fish
+         and
+               >"http://dublincore.org/" title=fish
+         ### But the XCQL output may need to be changed depending on
+             the result of the ZNG list's deliberations.
+       - Fix the parser to normalise relation modifiers to lower case.
         - Fix the CQLParser test harness not to emit an extraneous
           blank line at end of XCQL output.
-       - Fix the parser to normalise relation modifiers to lower case.
         - Fix CQLNode documentation to contain a link to YAZ's
           documentation of Prefix Query Format (PQF) rather than
           containing a rather unhelpful chunk of BNF.
-       - Change the source directory's Makefile so that it specifies
-         the appropriate -classpath by default.
-         ### undo this change!
         - Change the test/regression Makefile so that "make clean" now
           does what "make distclean" used to do - the distinction
           between them is pointless.
         - Fix a few typos in the documentation.
+       - Move the README file's old "THINGS TO DO" section to the end
+         of this file, the new "Still to do" section.
  
  0.2  Wed Nov  6 23:05:54 2002
         - Fix the order of proximity parameters in accordance with the
@@ -45,3 +53,31 @@ Revision history for "cql-java"
  0.1  Sun Nov  3 20:58:27 2002
         - First public release.
  
+--
+
+### Still to do
+       - Fix the bug where "9x" is parsed as two tokens, a TT_NUMBER
+         followed by a TT_WORD.  The problem here is that I don't
+         think it's actually possible to fix this without throwing
+         out StreamTokenizer and rolling our own, which we absolutely
+         _don't_ want to do.
+       - Write javadoc comments for CQLRelation and ModifierSet.
+       - Write "overview" file for the javadoc documentation.
+       - Some niceties for the cql-decompiling back-end:
+         * Don't emit redundant parentheses.
+         * Don't put spaces around relations that don't need them.
+       - Consider the utility of yet another back-end that translates
+         a CQLNode tree into JZKit's representation of a Type-1 query
+         tree.  That would be nice so that CQL could become a JZKit
+         query-type; but you could achieve the same effect by
+         generating PQF, and running that through JZKit's existing
+         PQF-to-Type-1 compiler.
+       - Many refinements to the random query generator:
+         * Generate relation modifiers
+         * Proximity support
+         * Don't always generate qualifier/relation for terms
+         * Better selection of qualifier (configurable?)
+         * Better selection of terms (from a dictionary file?)
+         * Introduce wildcard characters into generated terms
+         * Generate multi-word terms
+
diff --git a/README b/README

index 913a781..08a76a5 100644 (file)
--- a/README
+++ b/README
@@ -1,4 +1,4 @@
-$Id: README,v 1.17 2002-11-08 13:49:48 mike Exp $
+$Id: README,v 1.18 2002-11-14 22:04:16 mike Exp $
  
  cql-java - a free CQL compiler, and other CQL tools, for Java
  
@@ -114,35 +114,5 @@ All the other free CQL compilers everyone's going to write  :-)
  THINGS TO DO
  ------------
  
-* ### Fix bug where "9x" is parsed as two tokens, a TT_NUMBER followed
-  by a TT_WORD.  The problem here is that I don't think it's actually
-  possible to fix this without throwing out StreamTokenizer and
-  rolling our own, which we absolutely _don't_ want to do.
-
-* Allow keywords to be used unquoted as search terms.
-
-* Add support for serverChoiceRelation (scr).
-
-* Write javadoc comments for CQLRelation and ModifierSet.
-
-* Write "overview" file for the javadoc documentation.
-
-* Some niceties for the cql-decompiling back-end:
-       * don't emit redundant parentheses.
-       * don't put spaces around relations that don't need them.
-
-* Consider the utility of yet another back-end that translates a
-  CQLNode tree into a Type-1 query tree using the JZKit data
-  structures.  That would be nice so that CQL could become a JZKit
-  query-type; but you could achieve the same effect by generating PQN,
-  and running that through JZKit's existing PQN-to-Type-1 compiler.
-
-* Many refinements to the random query generator:
-       * Generate relation modifiers
-       * Proximity support
-       * Don't always generate qualifier/relation for terms
-       * Better selection of qualifier (configurable?)
-       * Better selection of terms (from a dictionary file?)
-       * Introduce wildcard characters into generated terms
-       * Generate multi-word terms
+[See the final "Still to do" section of the "Changes" file.]
  
diff --git a/src/org/z3950/zing/cql/CQLLexer.java b/src/org/z3950/zing/cql/CQLLexer.java

index 8d054d9..8ae5085 100644 (file)
--- a/src/org/z3950/zing/cql/CQLLexer.java
+++ b/src/org/z3950/zing/cql/CQLLexer.java
@@ -1,4 +1,4 @@
-// $Id: CQLLexer.java,v 1.4 2002-11-02 01:21:35 mike Exp $
+// $Id: CQLLexer.java,v 1.5 2002-11-14 22:04:16 mike Exp $
  
  package org.z3950.zing.cql;
  import java.io.StreamTokenizer;
@@ -35,6 +35,7 @@ class CQLLexer extends StreamTokenizer {
      static int TT_RELEVANT  = 1016;    // The "relevant" relation modifier
      static int TT_FUZZY     = 1017;    // The "fuzzy" relation modifier
      static int TT_STEM      = 1018;    // The "stem" relation modifier
+    static int TT_SCR       = 1019;    // The server choice relation
  
      // Support for keywords.  It would be nice to compile this linear
      // list into a Hashtable, but it's hard to store ints as hash
@@ -67,6 +68,7 @@ class CQLLexer extends StreamTokenizer {
         new Keyword(TT_RELEVANT, "relevant"),
         new Keyword(TT_FUZZY, "fuzzy"),
         new Keyword(TT_STEM, "stem"),
+       new Keyword(TT_SCR, "scr"),
      };
  
      // For halfDecentPushBack() and the code at the top of nextToken()
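The lexer comment above wishes the linear keyword list could be compiled into a hash table. A minimal sketch of that idea follows, assuming a Java version with generics and autoboxing (not available when this commit was made). The four token values are taken from the diff; the class name, the NOT_A_KEYWORD sentinel, the lookup() helper and the case-insensitive matching are all hypothetical and are not part of cql-java.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class KeywordTableSketch {
    // Token types copied from the CQLLexer hunk above.
    static final int TT_RELEVANT = 1016;
    static final int TT_FUZZY = 1017;
    static final int TT_STEM = 1018;
    static final int TT_SCR = 1019;

    // Hypothetical sentinel for "this word is not a keyword".
    static final int NOT_A_KEYWORD = -1;

    // Autoboxing lets ints be stored as Integer values in the map.
    static final Map<String, Integer> KEYWORDS = new HashMap<>();
    static {
        KEYWORDS.put("relevant", TT_RELEVANT);
        KEYWORDS.put("fuzzy", TT_FUZZY);
        KEYWORDS.put("stem", TT_STEM);
        KEYWORDS.put("scr", TT_SCR);
    }

    // Assumption: keywords are matched case-insensitively.
    static int lookup(String word) {
        Integer type = KEYWORDS.get(word.toLowerCase(Locale.ROOT));
        return type == null ? NOT_A_KEYWORD : type;
    }

    public static void main(String[] args) {
        System.out.println("scr  -> " + lookup("scr"));
        System.out.println("fish -> " + lookup("fish"));
    }
}
```

A map lookup replaces the scan over the Keyword array, at the cost of boxing each token type once when the table is built.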
diff --git a/src/org/z3950/zing/cql/CQLParser.java b/src/org/z3950/zing/cql/CQLParser.java

index 6329146..eadedef 100644 (file)
--- a/src/org/z3950/zing/cql/CQLParser.java
+++ b/src/org/z3950/zing/cql/CQLParser.java
@@ -1,4 +1,4 @@
-// $Id: CQLParser.java,v 1.19 2002-11-08 16:38:47 mike Exp $
+// $Id: CQLParser.java,v 1.20 2002-11-14 22:04:16 mike Exp $
  
  package org.z3950.zing.cql;
  import java.io.IOException;
@@ -12,7 +12,7 @@ import java.io.FileNotFoundException;
  /**
   * Compiles CQL strings into parse trees of CQLNode subtypes.
   *
- * @version    $Id: CQLParser.java,v 1.19 2002-11-08 16:38:47 mike Exp $
+ * @version    $Id: CQLParser.java,v 1.20 2002-11-14 22:04:16 mike Exp $
   * @see                <A href="http://zing.z3950.org/cql/index.html"
   *                     >http://zing.z3950.org/cql/index.html</A>
   */
@@ -45,39 +45,38 @@ public class CQLParser {
         lexer = new CQLLexer(cql, LEXDEBUG);
  
         lexer.nextToken();
-       debug("about to parse_query()");
-       CQLNode root = parse_query("srw.serverChoice", new CQLRelation("scr"));
-       // ### "scr" above should arguably be "="
+       debug("about to parseQuery()");
+       CQLNode root = parseQuery("srw.serverChoice", new CQLRelation("scr"));
         if (lexer.ttype != lexer.TT_EOF)
             throw new CQLParseException("junk after end: " + lexer.render());
  
         return root;
      }
  
-    private CQLNode parse_query(String qualifier, CQLRelation relation)
+    private CQLNode parseQuery(String qualifier, CQLRelation relation)
         throws CQLParseException, IOException {
-       debug("in parse_query()");
+       debug("in parseQuery()");
  
-       CQLNode term = parse_term(qualifier, relation);
+       CQLNode term = parseTerm(qualifier, relation);
         while (lexer.ttype != lexer.TT_EOF &&
                lexer.ttype != ')') {
             if (lexer.ttype == lexer.TT_AND) {
                 match(lexer.TT_AND);
-               CQLNode term2 = parse_term(qualifier, relation);
+               CQLNode term2 = parseTerm(qualifier, relation);
                 term = new CQLAndNode(term, term2);
             } else if (lexer.ttype == lexer.TT_OR) {
                 match(lexer.TT_OR);
-               CQLNode term2 = parse_term(qualifier, relation);
+               CQLNode term2 = parseTerm(qualifier, relation);
                 term = new CQLOrNode(term, term2);
             } else if (lexer.ttype == lexer.TT_NOT) {
                 match(lexer.TT_NOT);
-               CQLNode term2 = parse_term(qualifier, relation);
+               CQLNode term2 = parseTerm(qualifier, relation);
                 term = new CQLNotNode(term, term2);
             } else if (lexer.ttype == lexer.TT_PROX) {
                 match(lexer.TT_PROX);
                 CQLProxNode proxnode = new CQLProxNode(term);
                 gatherProxParameters(proxnode);
-               CQLNode term2 = parse_term(qualifier, relation);
+               CQLNode term2 = parseTerm(qualifier, relation);
                 proxnode.addSecondSubterm(term2);
                 term = (CQLNode) proxnode;
             } else {
@@ -90,32 +89,25 @@ public class CQLParser {
         return term;
      }
  
-    private CQLNode parse_term(String qualifier, CQLRelation relation)
+    private CQLNode parseTerm(String qualifier, CQLRelation relation)
         throws CQLParseException, IOException {
-       debug("in parse_term()");
+       debug("in parseTerm()");
  
         String word;
         while (true) {
             if (lexer.ttype == '(') {
                 debug("parenthesised term");
                 match('(');
-               CQLNode expr = parse_query(qualifier, relation);
+               CQLNode expr = parseQuery(qualifier, relation);
                 match(')');
                 return expr;
-           } else if (lexer.ttype != lexer.TT_WORD &&
-                      lexer.ttype != lexer.TT_NUMBER &&
-                      lexer.ttype != '"') {
-               throw new CQLParseException("expected qualifier or term, " +
-                                           "got " + lexer.render());
+           } else if (lexer.ttype == '>') {
+               match('>');
+               return parsePrefix(qualifier, relation);
             }
  
             debug("non-parenthesised term");
-           if (lexer.ttype == lexer.TT_NUMBER) {
-               word = lexer.render();
-           } else {
-               word = lexer.sval;
-           }
-           match(lexer.ttype);
+           word = matchSymbol("qualifier or term");
             if (!isBaseRelation())
                 break;
  
@@ -143,6 +135,21 @@ public class CQLParser {
         return node;
      }
  
+    private CQLNode parsePrefix(String qualifier, CQLRelation relation)
+       throws CQLParseException, IOException {
+       debug("prefix mapping");
+
+       String name = null;
+       String identifier = matchSymbol("prefix-name");
+       if (lexer.ttype == '=') {
+           match('=');
+           name = identifier;
+           identifier = matchSymbol("prefix-identifier");
+       }
+       CQLNode term = parseTerm(qualifier, relation);
+       return new CQLPrefixNode(name, identifier, term);
+    }
+
      private void gatherProxParameters(CQLProxNode node)
         throws CQLParseException, IOException {
         for (int i = 0; i < 4; i++) {
@@ -212,7 +219,8 @@ public class CQLParser {
         return (isProxRelation() ||
                 lexer.ttype == lexer.TT_ANY ||
                 lexer.ttype == lexer.TT_ALL ||
-               lexer.ttype == lexer.TT_EXACT);
+               lexer.ttype == lexer.TT_EXACT ||
+               lexer.ttype == lexer.TT_SCR);
      }
  
      private boolean isProxRelation() {
@@ -239,6 +247,43 @@ public class CQLParser {
               " (tmp=" + tmp + ")");
      }
  
+    private String matchSymbol(String expected)
+       throws CQLParseException, IOException {
+
+       debug("in matchSymbol()");
+       if (lexer.ttype == lexer.TT_WORD ||
+           lexer.ttype == lexer.TT_NUMBER ||
+           lexer.ttype == '"' ||
+           // The following is a complete list of keywords.  Because
+           // they're listed here, they can be used unquoted as
+           // qualifiers, terms, prefix names and prefix identifiers.
+           lexer.ttype == lexer.TT_AND ||
+           lexer.ttype == lexer.TT_OR ||
+           lexer.ttype == lexer.TT_NOT ||
+           lexer.ttype == lexer.TT_PROX ||
+           lexer.ttype == lexer.TT_ANY ||
+           lexer.ttype == lexer.TT_ALL ||
+           lexer.ttype == lexer.TT_EXACT ||
+           lexer.ttype == lexer.TT_pWORD ||
+           lexer.ttype == lexer.TT_SENTENCE ||
+           lexer.ttype == lexer.TT_PARAGRAPH ||
+           lexer.ttype == lexer.TT_ELEMENT ||
+           lexer.ttype == lexer.TT_ORDERED ||
+           lexer.ttype == lexer.TT_UNORDERED ||
+           lexer.ttype == lexer.TT_RELEVANT ||
+           lexer.ttype == lexer.TT_FUZZY ||
+           lexer.ttype == lexer.TT_STEM ||
+           lexer.ttype == lexer.TT_SCR) {
+           String symbol = (lexer.ttype == lexer.TT_NUMBER) ?
+               lexer.render() : lexer.sval;
+           match(lexer.ttype);
+           return symbol;
+       }
+
+       throw new CQLParseException("expected " + expected + ", " +
+                                   "got " + lexer.render());
+    }
+
  
      /**
       * Simple test-harness for the CQLParser class.
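The prefix-mapping logic in this file can be tried in isolation. The sketch below is a simplified stand-in, not the cql-java parser: it works on a pre-split list of string tokens instead of CQLLexer, and the class name, the PUNCTUATION list and the String[] return value are assumptions made for illustration. It shows the two accepted forms (name = identifier, or a bare identifier) and that a keyword such as "all" is accepted as a symbol.

```java
import java.util.Arrays;
import java.util.List;

class PrefixSketch {
    // Tokens that can never serve as a symbol in this sketch.
    private static final List<String> PUNCTUATION = Arrays.asList("=", ">", "(", ")");

    private final List<String> tokens;
    private int pos = 0;

    PrefixSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Current token, or null at end of input.
    String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    // Consume and return any word, keyword or not; reject punctuation
    // and end of input.
    String matchSymbol(String expected) {
        String tok = peek();
        if (tok == null || PUNCTUATION.contains(tok))
            throw new IllegalArgumentException("expected " + expected + ", got " + tok);
        pos++;
        return tok;
    }

    // Parse what follows '>': either "name = identifier" or a bare identifier.
    // Returns {name, identifier}; name is null when no "name =" part is given.
    String[] parsePrefix() {
        String name = null;
        String identifier = matchSymbol("prefix-name");
        if ("=".equals(peek())) {
            pos++;
            name = identifier;
            identifier = matchSymbol("prefix-identifier");
        }
        return new String[] { name, identifier };
    }

    public static void main(String[] args) {
        // The keyword "all" is used here as a prefix name.
        PrefixSketch p = new PrefixSketch(
            Arrays.asList("all", "=", "http://dublincore.org/", "title"));
        String[] mapping = p.parsePrefix();
        System.out.println(mapping[0] + " -> " + mapping[1]);
    }
}
```

After parsePrefix() returns, the next token is the start of the term that the mapping governs, which is where the real parser calls parseTerm().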
diff --git a/src/org/z3950/zing/cql/CQLPrefix.java b/src/org/z3950/zing/cql/CQLPrefix.java

new file mode 100644 (file)

index 0000000..42edfc1
--- /dev/null
+++ b/src/org/z3950/zing/cql/CQLPrefix.java
@@ -0,0 +1,34 @@
+// $Id: CQLPrefix.java,v 1.1 2002-11-14 22:04:16 mike Exp $
+
+package org.z3950.zing.cql;
+import java.lang.String;
+
+/**
+ * Represents a CQL prefix mapping from short name to long identifier.
+ *
+ * @version    $Id: CQLPrefix.java,v 1.1 2002-11-14 22:04:16 mike Exp $
+ */
+public class CQLPrefix {
+    /**
+     * The short name of the prefix mapping - that is, the prefix
+     * itself, such as <TT>dc</TT>, as it might be used in a qualifier
+     * like <TT>dc.title</TT>.
+     */
+    String name;
+
+    /**
+     * The full identifier of the prefix mapping - that is, the long
+     * name, such as <TT>http://dublincore.org/</TT>, to which the
+     * short prefix name is mapped.
+     */
+    String identifier;
+
+    /**
+     * Creates a new CQLPrefix mapping, which maps the specified name
+     * to the specified identifier.
+     */
+    CQLPrefix(String name, String identifier) {
+       this.name = name;
+       this.identifier = identifier;
+    }
+}
diff --git a/src/org/z3950/zing/cql/CQLPrefixNode.java b/src/org/z3950/zing/cql/CQLPrefixNode.java

new file mode 100644 (file)

index 0000000..43a526c
--- /dev/null
+++ b/src/org/z3950/zing/cql/CQLPrefixNode.java
@@ -0,0 +1,60 @@
+// $Id: CQLPrefixNode.java,v 1.1 2002-11-14 22:04:16 mike Exp $
+
+package org.z3950.zing.cql;
+import java.lang.String;
+import java.util.Properties;
+
+
+/**
+ * Represents a prefix node in a CQL parse-tree.
+ *
+ * @version    $Id: CQLPrefixNode.java,v 1.1 2002-11-14 22:04:16 mike Exp $
+ */
+public class CQLPrefixNode extends CQLNode {
+    /**
+     * The prefix definition that governs the subtree.
+     */
+    public CQLPrefix prefix;
+
+    /**
+     * The root of a parse-tree representing the part of the query
+     * that is governed by this prefix definition.
+     */ 
+    public CQLNode subtree;
+
+    /**
+     * Creates a new CQLPrefixNode inducing a mapping from the
+     * specified qualifier-set name to the specified identifier across
+     * the specified subtree.
+     */
+    public CQLPrefixNode(String name, String identifier, CQLNode subtree) {
+       this.prefix = new CQLPrefix(name, identifier);
+       this.subtree = subtree;
+    }
+
+    public String toXCQL(int level) {
+       String maybeName = "";
+       if (prefix.name != null)
+           maybeName = indent(level+1) + "<name>" + prefix.name + "</name>\n";
+
+       return (indent(level) + "<prefix>\n" + maybeName +
+               indent(level+1) +
+                   "<identifier>" + prefix.identifier + "</identifier>\n" +
+               subtree.toXCQL(level+1) +
+               indent(level) + "</prefix>\n");
+    }
+
+    public String toCQL() {
+       // ### We don't always need parens around the operand
+       return ">" + (prefix.name == null ? "" : prefix.name + "=") +
+           "\"" + prefix.identifier + "\" (" + subtree.toCQL() + ")";
+    }
+
+    public String toPQF(Properties config) throws PQFTranslationException {
+       // Prefixes and their identifiers don't actually play any role
+       // in PQF translation, since the meanings of the qualifiers,
+       // including their prefixes if any, are instead wired into
+       // `config'.
+       return subtree.toPQF(config);
+    }
+}
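A standalone sketch of the XCQL rendering for a prefix node, for checking what well-formed output with an optional name element might look like. It is not the cql-java implementation: the class name is hypothetical, the two-space indent() is an assumption (the real indent() lives in CQLNode and is not shown in this diff), and the subtree is passed in as an already-rendered string. The Changes entry in this commit also notes that the XCQL output may yet change.

```java
class PrefixXcqlSketch {
    // Assumption: two spaces per nesting level.
    static String indent(int level) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < level; i++)
            sb.append("  ");
        return sb.toString();
    }

    // Renders a <prefix> element; the <name> child is omitted when name is null.
    static String toXCQL(int level, String name, String identifier,
                         String subtreeXcql) {
        StringBuilder sb = new StringBuilder();
        sb.append(indent(level)).append("<prefix>\n");
        if (name != null)
            sb.append(indent(level + 1))
              .append("<name>").append(name).append("</name>\n");
        sb.append(indent(level + 1))
          .append("<identifier>").append(identifier).append("</identifier>\n");
        sb.append(subtreeXcql);
        sb.append(indent(level)).append("</prefix>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toXCQL(0, "dc", "http://dublincore.org/", ""));
    }
}
```

The sketch does no XML escaping, so an identifier containing characters such as & or < would need escaping before it is emitted.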
diff --git a/src/org/z3950/zing/cql/Makefile b/src/org/z3950/zing/cql/Makefile

index 4bc961e..029ca04 100644 (file)
--- a/src/org/z3950/zing/cql/Makefile
+++ b/src/org/z3950/zing/cql/Makefile
@@ -1,13 +1,20 @@
-# $Id: Makefile,v 1.10 2002-11-12 22:38:35 mike Exp $
+# $Id: Makefile,v 1.11 2002-11-14 22:04:16 mike Exp $
+#
+# Your Java compiler, and javadoc, will require that this source
+# directory is on the classpath.  The best way to do that is just to
+# add the cql-java distribution's "src" subdirectory to your CLASSPATH
+# environment variable, like this:
+#      CLASSPATH=$CLASSPATH:/where/ever/you/unpacked/it/cql-java-VERSION/src
  
  DOCDIR = ../../../../../docs
  
  OBJ = Utils.class \
         CQLNode.class CQLTermNode.class CQLBooleanNode.class \
         CQLAndNode.class CQLOrNode.class CQLNotNode.class \
-       CQLRelation.class CQLProxNode.class ModifierSet.class \
-       CQLParser.class CQLLexer.class CQLParseException.class \
-       CQLGenerator.class MissingParameterException.class \
+       CQLProxNode.class CQLPrefixNode.class CQLPrefix.class \
+       CQLRelation.class ModifierSet.class \
+       CQLParser.class CQLLexer.class CQLGenerator.class \
+       CQLParseException.class MissingParameterException.class \
         PQFTranslationException.class \
         UnknownQualifierException.class UnknownRelationException.class \
         UnknownRelationModifierException.class UnknownPositionException.class
@@ -15,15 +22,6 @@ OBJ = Utils.class \
  ../../../../../lib/cql-java.jar: $(OBJ)
         cd ../../../..; jar cf ../lib/cql-java.jar org/z3950/zing/cql/*.class
  
-# ### FIX THIS COMMENT!
-# Your Java compiler will require that this source directory is on the
-# classpath.  Generally, you can use the rules below, which set the
-# classpath suitably.  But that will break if you need other elements
-# in the CLASSPATH too.  If that's the situation you're in, take the
-# "-classpath ../../../.." flag out of the rules below, and set your
-# CLASSPATH environment variable to include
-#      /where/ever/you/unpacked/it/cql-java-VERSION/src
-#
  %.class: %.java
         javac $<
  
diff --git a/test/regression/queries.raw b/test/regression/queries.raw

index 5366fb9..67daa48 100644 (file)
--- a/test/regression/queries.raw
+++ b/test/regression/queries.raw
@@ -1,4 +1,5 @@
-# Simple
+
+# Simple 
  
  cat
  "cat"
@@ -9,6 +10,8 @@ xml:element
  "prox/>=/5/word"
  ("cat")
  ((dog))
+all
+prox
  
  # index relation term
  
@@ -23,6 +26,7 @@ dc.title any/stem fish
  dc.fish all/stem/fuzzy "fish chips"
  (title any frog)
  ((dc.title any/stem "frog pond"))
+dc.title scr "fish frog chicken"
  
  # Simple Boolean
  
@@ -31,22 +35,24 @@ cat and fish
  cat not frog
  (cat not frog)
  "cat" not "fish food"
-xml and "prox///word/"
+xml and "prox///"
+fred and any
+((fred or all))
  a or b and c not d
  
  # I/R/T plus Boolean
  
  bath.author any fish and dc.title all "cat dog"
-(title any/stem "fish dog" or "and")
+(title any/stem "fish dog" or and)
  
  # Prox
  
  cat prox hat
  cat prox/=/3/word/ordered hat
  cat prox//3 hat
-"fish food" prox///sentence "and"
-title all "chips frog" prox//5/word "any"
-(dc.author exact "jones" prox//5 title >= "smith")
+"fish food" prox///sentence and
+title all "chips frog" prox/>=/5 exact
+(dc.author exact "jones" prox/</5/element title >= "smith")
  ((cat prox hat))
  
  # Special characters
@@ -65,22 +71,21 @@ cat?dog
  
  # Lame searches
  
-"any" or "all:stem" and "all" exact "any" prox///word "prox"="fuzzy"
-((((((((("any")))))))))
-
+any or all:stem and all exact any prox prox=fuzzy
+(((((((((any)))))))))
+("")
  
  # Invalid searches [should error]
  
  >
  ===
  cat or
-index any
+index any 
  index any/wrong term
  a prox/wrong b
  ()
  (a
  index any fish)
  (cat any dog or ())
-fred and any
-((fred or all))
-sorry = (mike)
+title = ("illegal parentheses")
+"quoted" any "illegal quotes"