- Add support for serverChoiceRelation (scr).
- Add support for prefix-mapping, as in
>dc="http://dublincore.org/" dc.title=fish
and
>"http://dublincore.org/" title=fish
### But the XCQL output may need to be changed depending on
the result of the ZNG list's deliberations.
- Move the README file's old "THINGS TO DO" section to the end
of this file, the new "Still to do" section.
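
Note for reviewers: the prefix-mapping syntax above is an optional short name followed by "=", then the identifier, with the mapping governing the term that follows. Below is a minimal sketch of that one-token lookahead. It assumes the query has already been split into tokens and the leading ">" consumed. PrefixSketch, Mapping and this parsePrefix(List) are hypothetical names for illustration only, not the cql-java classes touched by this change.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: these names are hypothetical, not cql-java API.
class PrefixSketch {
    /** Result of reading one prefix mapping. */
    static final class Mapping {
        final String name;        // short name such as "dc"; null if omitted
        final String identifier;  // long identifier such as a URI
        final int consumed;       // how many tokens the mapping used up

        Mapping(String name, String identifier, int consumed) {
            this.name = name;
            this.identifier = identifier;
            this.consumed = consumed;
        }
    }

    /**
     * Reads [name "="] identifier from the front of the token list.
     * Assumes the leading ">" has already been matched by the caller.
     */
    static Mapping parsePrefix(List<String> tokens) {
        if (tokens.isEmpty())
            throw new IllegalArgumentException("expected prefix-name");

        String name = null;
        String identifier = tokens.get(0);
        int consumed = 1;
        // One token of lookahead: an "=" means the first symbol was the name.
        if (tokens.size() > 1 && tokens.get(1).equals("=")) {
            if (tokens.size() < 3)
                throw new IllegalArgumentException("expected prefix-identifier");
            name = identifier;
            identifier = tokens.get(2);
            consumed = 3;
        }
        return new Mapping(name, identifier, consumed);
    }

    public static void main(String[] args) {
        List<String> named = Arrays.asList(
            "dc", "=", "http://dublincore.org/", "dc.title", "=", "fish");
        List<String> anonymous = Arrays.asList(
            "http://dublincore.org/", "title", "=", "fish");

        Mapping m1 = parsePrefix(named);
        Mapping m2 = parsePrefix(anonymous);
        System.out.println(m1.name + " -> " + m1.identifier);
        System.out.println(m2.name + " -> " + m2.identifier);
    }
}
```

In the second case no "=" follows the first symbol, so the name stays null and the symbol is taken as the identifier; the remaining tokens (title = fish) are left for the ordinary term parser.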
-$Id: Changes,v 1.7 2002-11-12 22:37:48 mike Exp $
+$Id: Changes,v 1.8 2002-11-14 22:04:16 mike Exp $
Revision history for "cql-java"
+See the bottom of this file for a list of things still to do.
0.3 (IN PROGRESS)
+ - Allow keywords to be used unquoted as search terms.
+ - Add support for serverChoiceRelation (scr).
+ - Add support for prefix-mapping, as in
+ >dc="http://dublincore.org/" dc.title=fish
+ and
+ >"http://dublincore.org/" title=fish
+ ### But the XCQL output may need to be changed depending on
+ the result of the ZNG list's deliberations.
+ - Fix the parser to normalise relation modifiers to lower case.
- Fix the CQLParser test harness not to emit an extraneous
blank line at end of XCQL output.
- - Fix the parser to normalise relation modifiers to lower case.
- Fix CQLNode documentation to contain a link to YAZ's
documentation of Prefix Query Format (PQF) rather than
containing a rather unhelpful chunk of BNF.
- - Change the source directory's Makefile so that it specifies
- the appropriate -classpath by default.
- ### undo this change!
- Change the test/regression Makefile so that "make clean" now
does what "make distclean" used to do - the distinction
between them is pointless.
- Fix a few typos in the documentation.
+ - Move the README file's old "THINGS TO DO" section to the end
+ of this file, the new "Still to do" section.
0.2 Wed Nov 6 23:05:54 2002
- Fix the order of proximity parameters in accordance with the
0.1 Sun Nov 3 20:58:27 2002
- First public release.
+--
+
+### Still to do
+ - Fix the bug where "9x" is parsed as two tokens, a TT_NUMBER
+ followed by a TT_WORD. The problem here is that I don't
+ think it's actually possible to fix this without throwing
+ out StreamTokenizer and rolling our own, which we absolutely
+ _don't_ want to do.
+ - Write javadoc comments for CQLRelation and ModifierSet.
+ - Write "overview" file for the javadoc documentation.
+ - Some niceties for the cql-decompiling back-end:
+ * Don't emit redundant parentheses.
+ * Don't put spaces around relations that don't need them.
+ - Consider the utility of yet another back-end that translates
+ a CQLNode tree into JZKit's representation of a Type-1 query
+ tree. That would be nice so that CQL could become a JZKit
+ query-type; but you could achieve the same effect by
+ generating PQF, and running that through JZKit's existing
+ PQF-to-Type-1 compiler.
+ - Many refinements to the random query generator:
+ * Generate relation modifiers
+ * Proximity support
+ * Don't always generate qualifier/relation for terms
+ * Better selection of qualifier (configurable?)
+ * Better selection of terms (from a dictionary file?)
+ * Introduce wildcard characters into generated terms
+ * Generate multi-word terms
+
-$Id: README,v 1.17 2002-11-08 13:49:48 mike Exp $
+$Id: README,v 1.18 2002-11-14 22:04:16 mike Exp $
cql-java - a free CQL compiler, and other CQL tools, for Java
THINGS TO DO
------------
-* ### Fix bug where "9x" is parsed as two tokens, a TT_NUMBER followed
- by a TT_WORD. The problem here is that I don't think it's actually
- possible to fix this without throwing out StreakTokenizer and
- rolling our own, which we absolutely _don't_ want to do.
-
-* Allow keywords to be used unquoted as search terms.
-
-* Add support for serverChoiceRelation (scr).
-
-* Write javadoc comments for CQLRelation and ModifierSet.
-
-* Write "overview" file for the javadoc documentation.
-
-* Some niceties for the cql-decompiling back-end:
- * don't emit redundant parentheses.
- * don't put spaces around relations that don't need them.
-
-* Consider the utility of yet another back-end that translates a
- CQLNode tree into a Type-1 query tree using the JZKit data
- structures. That would be nice so that CQL could become a JZKit
- query-type; but you could achieve the same effect by generating PQN,
- and running that through JZKit's existing PQN-to-Type-1 compiler.
-
-* Many refinements to the random query generator:
- * Generate relation modifiers
- * Proximity support
- * Don't always generate qualifier/relation for terms
- * Better selection of qualifier (configurable?)
- * Better selection of terms (from a dictionary file?)
- * Introduce wildcard characters into generated terms
- * Generate multi-word terms
+[See the final "Still to do" section of the "Changes" file.]
-// $Id: CQLLexer.java,v 1.4 2002-11-02 01:21:35 mike Exp $
+// $Id: CQLLexer.java,v 1.5 2002-11-14 22:04:16 mike Exp $
package org.z3950.zing.cql;
import java.io.StreamTokenizer;
static int TT_RELEVANT = 1016; // The "relevant" relation modifier
static int TT_FUZZY = 1017; // The "fuzzy" relation modifier
static int TT_STEM = 1018; // The "stem" relation modifier
+ static int TT_SCR = 1019; // The server choice relation
// Support for keywords. It would be nice to compile this linear
// list into a Hashtable, but it's hard to store ints as hash
new Keyword(TT_RELEVANT, "relevant"),
new Keyword(TT_FUZZY, "fuzzy"),
new Keyword(TT_STEM, "stem"),
+ new Keyword(TT_SCR, "scr"),
};
// For halfDecentPushBack() and the code at the top of nextToken()
-// $Id: CQLParser.java,v 1.19 2002-11-08 16:38:47 mike Exp $
+// $Id: CQLParser.java,v 1.20 2002-11-14 22:04:16 mike Exp $
package org.z3950.zing.cql;
import java.io.IOException;
/**
* Compiles CQL strings into parse trees of CQLNode subtypes.
*
- * @version $Id: CQLParser.java,v 1.19 2002-11-08 16:38:47 mike Exp $
+ * @version $Id: CQLParser.java,v 1.20 2002-11-14 22:04:16 mike Exp $
* @see <A href="http://zing.z3950.org/cql/index.html"
* >http://zing.z3950.org/cql/index.html</A>
*/
lexer = new CQLLexer(cql, LEXDEBUG);
lexer.nextToken();
- debug("about to parse_query()");
- CQLNode root = parse_query("srw.serverChoice", new CQLRelation("scr"));
- // ### "scr" above should arguably be "="
+ debug("about to parseQuery()");
+ CQLNode root = parseQuery("srw.serverChoice", new CQLRelation("scr"));
if (lexer.ttype != lexer.TT_EOF)
throw new CQLParseException("junk after end: " + lexer.render());
return root;
}
- private CQLNode parse_query(String qualifier, CQLRelation relation)
+ private CQLNode parseQuery(String qualifier, CQLRelation relation)
throws CQLParseException, IOException {
- debug("in parse_query()");
+ debug("in parseQuery()");
- CQLNode term = parse_term(qualifier, relation);
+ CQLNode term = parseTerm(qualifier, relation);
while (lexer.ttype != lexer.TT_EOF &&
lexer.ttype != ')') {
if (lexer.ttype == lexer.TT_AND) {
match(lexer.TT_AND);
- CQLNode term2 = parse_term(qualifier, relation);
+ CQLNode term2 = parseTerm(qualifier, relation);
term = new CQLAndNode(term, term2);
} else if (lexer.ttype == lexer.TT_OR) {
match(lexer.TT_OR);
- CQLNode term2 = parse_term(qualifier, relation);
+ CQLNode term2 = parseTerm(qualifier, relation);
term = new CQLOrNode(term, term2);
} else if (lexer.ttype == lexer.TT_NOT) {
match(lexer.TT_NOT);
- CQLNode term2 = parse_term(qualifier, relation);
+ CQLNode term2 = parseTerm(qualifier, relation);
term = new CQLNotNode(term, term2);
} else if (lexer.ttype == lexer.TT_PROX) {
match(lexer.TT_PROX);
CQLProxNode proxnode = new CQLProxNode(term);
gatherProxParameters(proxnode);
- CQLNode term2 = parse_term(qualifier, relation);
+ CQLNode term2 = parseTerm(qualifier, relation);
proxnode.addSecondSubterm(term2);
term = (CQLNode) proxnode;
} else {
return term;
}
- private CQLNode parse_term(String qualifier, CQLRelation relation)
+ private CQLNode parseTerm(String qualifier, CQLRelation relation)
throws CQLParseException, IOException {
- debug("in parse_term()");
+ debug("in parseTerm()");
String word;
while (true) {
if (lexer.ttype == '(') {
debug("parenthesised term");
match('(');
- CQLNode expr = parse_query(qualifier, relation);
+ CQLNode expr = parseQuery(qualifier, relation);
match(')');
return expr;
- } else if (lexer.ttype != lexer.TT_WORD &&
- lexer.ttype != lexer.TT_NUMBER &&
- lexer.ttype != '"') {
- throw new CQLParseException("expected qualifier or term, " +
- "got " + lexer.render());
+ } else if (lexer.ttype == '>') {
+ match('>');
+ return parsePrefix(qualifier, relation);
}
debug("non-parenthesised term");
- if (lexer.ttype == lexer.TT_NUMBER) {
- word = lexer.render();
- } else {
- word = lexer.sval;
- }
- match(lexer.ttype);
+ word = matchSymbol("qualifier or term");
if (!isBaseRelation())
break;
return node;
}
+ private CQLNode parsePrefix(String qualifier, CQLRelation relation)
+ throws CQLParseException, IOException {
+ debug("prefix mapping");
+
+ String name = null;
+ String identifier = matchSymbol("prefix-name");
+ if (lexer.ttype == '=') {
+ match('=');
+ name = identifier;
+ identifier = matchSymbol("prefix-identifier");
+ }
+ CQLNode term = parseTerm(qualifier, relation);
+ return new CQLPrefixNode(name, identifier, term);
+ }
+
private void gatherProxParameters(CQLProxNode node)
throws CQLParseException, IOException {
for (int i = 0; i < 4; i++) {
return (isProxRelation() ||
lexer.ttype == lexer.TT_ANY ||
lexer.ttype == lexer.TT_ALL ||
- lexer.ttype == lexer.TT_EXACT);
+ lexer.ttype == lexer.TT_EXACT ||
+ lexer.ttype == lexer.TT_SCR);
}
private boolean isProxRelation() {
" (tmp=" + tmp + ")");
}
+ private String matchSymbol(String expected)
+ throws CQLParseException, IOException {
+
+ debug("in matchSymbol()");
+ if (lexer.ttype == lexer.TT_WORD ||
+ lexer.ttype == lexer.TT_NUMBER ||
+ lexer.ttype == '"' ||
+ // The following is a complete list of keywords. Because
+ // they're listed here, they can be used unquoted as
+ // qualifiers, terms, prefix names and prefix identifiers.
+ lexer.ttype == lexer.TT_AND ||
+ lexer.ttype == lexer.TT_OR ||
+ lexer.ttype == lexer.TT_NOT ||
+ lexer.ttype == lexer.TT_PROX ||
+ lexer.ttype == lexer.TT_ANY ||
+ lexer.ttype == lexer.TT_ALL ||
+ lexer.ttype == lexer.TT_EXACT ||
+ lexer.ttype == lexer.TT_pWORD ||
+ lexer.ttype == lexer.TT_SENTENCE ||
+ lexer.ttype == lexer.TT_PARAGRAPH ||
+ lexer.ttype == lexer.TT_ELEMENT ||
+ lexer.ttype == lexer.TT_ORDERED ||
+ lexer.ttype == lexer.TT_UNORDERED ||
+ lexer.ttype == lexer.TT_RELEVANT ||
+ lexer.ttype == lexer.TT_FUZZY ||
+ lexer.ttype == lexer.TT_STEM ||
+ lexer.ttype == lexer.TT_SCR) {
+ String symbol = (lexer.ttype == lexer.TT_NUMBER) ?
+ lexer.render() : lexer.sval;
+ match(lexer.ttype);
+ return symbol;
+ }
+
+ throw new CQLParseException("expected " + expected + ", " +
+ "got " + lexer.render());
+ }
+
/**
* Simple test-harness for the CQLParser class.
--- /dev/null
+// $Id: CQLPrefix.java,v 1.1 2002-11-14 22:04:16 mike Exp $
+
+package org.z3950.zing.cql;
+import java.lang.String;
+
+/**
+ * Represents a CQL prefix mapping from short name to long identifier.
+ *
+ * @version $Id: CQLPrefix.java,v 1.1 2002-11-14 22:04:16 mike Exp $
+ */
+public class CQLPrefix {
+ /**
+ * The short name of the prefix mapping - that is, the prefix
+ * itself, such as <TT>dc</TT>, as it might be used in a qualifier
+ * like <TT>dc.title</TT>.
+ */
+ String name;
+
+ /**
+ * The full identifier of the prefix mapping - that is, the long
+ * name that the prefix stands for, such as
+ * <TT>http://dublincore.org/</TT>, to which a short name like
+ * <TT>dc</TT> might be mapped.
+ */
+ String identifier;
+
+ /**
+ * Creates a new CQLPrefix mapping, which maps the specified name
+ * to the specified identifier.
+ */
+ CQLPrefix(String name, String identifier) {
+ this.name = name;
+ this.identifier = identifier;
+ }
+}
--- /dev/null
+// $Id: CQLPrefixNode.java,v 1.1 2002-11-14 22:04:16 mike Exp $
+
+package org.z3950.zing.cql;
+import java.lang.String;
+import java.util.Properties;
+
+
+/**
+ * Represents a prefix node in a CQL parse-tree.
+ *
+ * @version $Id: CQLPrefixNode.java,v 1.1 2002-11-14 22:04:16 mike Exp $
+ */
+public class CQLPrefixNode extends CQLNode {
+ /**
+ * The prefix definition that governs the subtree.
+ */
+ public CQLPrefix prefix;
+
+ /**
+ * The root of a parse-tree representing the part of the query
+ * that is governed by this prefix definition.
+ */
+ public CQLNode subtree;
+
+ /**
+ * Creates a new CQLPrefixNode inducing a mapping from the
+ * specified qualifier-set name to the specified identifier across
+ * the specified subtree.
+ */
+ public CQLPrefixNode(String name, String identifier, CQLNode subtree) {
+ this.prefix = new CQLPrefix(name, identifier);
+ this.subtree = subtree;
+ }
+
+ public String toXCQL(int level) {
+ String maybeName = "";
+ if (prefix.name != null)
+ maybeName = indent(level+1) + "<name>" + prefix.name + "</name>\n";
+
+ return (indent(level) + "<prefix>\n" + maybeName +
+ indent(level+1) +
+ "<identifier>" + prefix.identifier + "</identifier>\n" +
+ subtree.toXCQL(level+1) +
+ indent(level) + "</prefix>\n");
+ }
+
+ public String toCQL() {
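+ // ### We don't always need parens around the operand
+ // The name is optional, so omit "name=" when there isn't one
+ String maybeName = (prefix.name == null) ? "" : prefix.name + "=";
+ return ">" + maybeName + "\"" + prefix.identifier + "\" " +
+ "(" + subtree.toCQL() + ")";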
+ }
+
+ public String toPQF(Properties config) throws PQFTranslationException {
+ // Prefixes and their identifiers don't actually play any role
+ // in PQF translation, since the meanings of the qualifiers,
+ // including their prefixes if any, are instead wired into
+ // `config'.
+ return subtree.toPQF(config);
+ }
+}
-# $Id: Makefile,v 1.10 2002-11-12 22:38:35 mike Exp $
+# $Id: Makefile,v 1.11 2002-11-14 22:04:16 mike Exp $
+#
+# Your Java compiler, and javadoc, will require that this source
+# directory is on the classpath. The best way to do that is just to
+# add the cql-java distribution's "src" subdirectory to your CLASSPATH
+# environment variable, like this:
+# CLASSPATH=$CLASSPATH:/where/ever/you/unpacked/it/cql-java-VERSION/src
DOCDIR = ../../../../../docs
OBJ = Utils.class \
CQLNode.class CQLTermNode.class CQLBooleanNode.class \
CQLAndNode.class CQLOrNode.class CQLNotNode.class \
- CQLRelation.class CQLProxNode.class ModifierSet.class \
- CQLParser.class CQLLexer.class CQLParseException.class \
- CQLGenerator.class MissingParameterException.class \
+ CQLProxNode.class CQLPrefixNode.class CQLPrefix.class \
+ CQLRelation.class ModifierSet.class \
+ CQLParser.class CQLLexer.class CQLGenerator.class \
+ CQLParseException.class MissingParameterException.class \
PQFTranslationException.class \
UnknownQualifierException.class UnknownRelationException.class \
UnknownRelationModifierException.class UnknownPositionException.class
../../../../../lib/cql-java.jar: $(OBJ)
cd ../../../..; jar cf ../lib/cql-java.jar org/z3950/zing/cql/*.class
-# ### FIX THIS COMMENT!
-# Your Java compiler will require that this source directory is on the
-# classpath. Generally, you can use the rules below, which set the
-# classpath suitably. But that will break if you need other elements
-# in the CLASSPATH too. If that's the situation you're in, take the
-# "-classpath ../../../.." flag out of the rules below, and set your
-# CLASSPATH environment variable to include
-# /where/ever/you/unpacked/it/cql-java-VERSION/src
-#
%.class: %.java
javac $<
-# Simple
+
+# Simple
cat
"cat"
"prox/>=/5/word"
("cat")
((dog))
+all
+prox
# index relation term
dc.fish all/stem/fuzzy "fish chips"
(title any frog)
((dc.title any/stem "frog pond"))
+dc.title scr "fish frog chicken"
# Simple Boolean
cat not frog
(cat not frog)
"cat" not "fish food"
-xml and "prox///word/"
+xml and "prox///"
+fred and any
+((fred or all))
a or b and c not d
# I/R/T plus Boolean
bath.author any fish and dc.title all "cat dog"
-(title any/stem "fish dog" or "and")
+(title any/stem "fish dog" or and)
# Prox
cat prox hat
cat prox/=/3/word/ordered hat
cat prox//3 hat
-"fish food" prox///sentence "and"
-title all "chips frog" prox//5/word "any"
-(dc.author exact "jones" prox//5 title >= "smith")
+"fish food" prox///sentence and
+title all "chips frog" prox/>=/5 exact
+(dc.author exact "jones" prox/</5/element title >= "smith")
((cat prox hat))
# Special characters
# Lame searches
-"any" or "all:stem" and "all" exact "any" prox///word "prox"="fuzzy"
-((((((((("any")))))))))
-
+any or all:stem and all exact any prox prox=fuzzy
+(((((((((any)))))))))
+("")
# Invalid searches [should error]
>
===
cat or
-index any
+index any
index any/wrong term
a prox/wrong b
()
(a
index any fish)
(cat any dog or ())
-fred and any
-((fred or all))
-sorry = (mike)
+title = ("illegal parentheses")
+"quoted" any "illegal quotes"