<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.4//EN"
 "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"
 <!ENTITY % local SYSTEM "local.ent">
 <!ENTITY % entities SYSTEM "entities.ent">
 <!ENTITY % idcommon SYSTEM "common/common.ent">
<refentry id="yaz-icu">
<productname>YAZ</productname>
<productnumber>&version;</productnumber>
<info><orgname>Index Data</orgname></info>
<refentrytitle>yaz-icu</refentrytitle>
<manvolnum>1</manvolnum>
<refmiscinfo class="manual">Commands</refmiscinfo>
<refname>yaz-icu</refname>
<refpurpose>YAZ ICU utility</refpurpose>
<command>yaz-icu</command>
<arg choice="opt" rep="repeat">commands</arg>
<arg>-c <replaceable>config</replaceable></arg>
<arg>-p <replaceable>type</replaceable></arg>
<refsect1><title>DESCRIPTION</title>
<command>yaz-icu</command> is a utility which demonstrates
the ICU chain module of YAZ (<filename>yaz/icu.h</filename>).
<refsect1><title>OPTIONS</title>
<term>-c <replaceable>config</replaceable></term>
Specifies the file containing the ICU chain configuration
<term>-p <replaceable>type</replaceable></term>
Specifies extra information to be printed about the ICU system.
If <replaceable>type</replaceable> is <literal>c</literal>,
the available ICU converters are printed.
If <replaceable>type</replaceable> is <literal>l</literal>,
the available locales are printed.
If <replaceable>type</replaceable> is <literal>t</literal>,
the available transliterators are printed.
Specifies that the output should include the sort key as well. Note that
sort keys differ between ICU versions.
Specifies that the output should be XML based rather than
<refsect1><title>ICU chain configuration</title>
The ICU chain configuration specifies one or more rules to convert
text data into tokens. The configuration format is XML based.
The top-level element must be named <literal>icu_chain</literal>.
The <literal>icu_chain</literal> element has one required attribute,
<literal>locale</literal>, which specifies the ICU locale to be used
in the conversion steps.
The <literal>icu_chain</literal> element must include child elements,
each of which specifies a conversion step. The conversion is performed
in the order in which the conversion steps are specified.
Each conversion element takes one attribute, <literal>rule</literal>,
which serves as the argument to the conversion step.
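As a minimal, illustrative sketch (assuming the case-mapping element is
<literal>casemap</literal> with rule <literal>l</literal> for lower case,
and the <literal>tokenize</literal> rule <literal>w</literal> for word
boundaries), the following chain first splits its input into words and
then lower-cases each word, in the order the steps are listed:
<screen><![CDATA[
<icu_chain locale="en">
  <!-- illustrative only: step 1 splits the input into words -->
  <tokenize rule="w"/>
  <!-- step 2 lower-cases each resulting token -->
  <casemap rule="l"/>
</icu_chain>
]]></screen>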
The following conversion elements are available:
Converts case; the rule specifies how:
<para>Lower case using ICU function u_strToLower.</para>
<para>Upper case using ICU function u_strToUpper.</para>
<para>Title case using ICU function u_strToTitle.</para>
<para>Fold case using ICU function u_strFoldCase.</para>
This is a meta step which specifies that a term/token is to
be displayed. This term is retrieved in an application
using the function icu_chain_token_display (<filename>yaz/icu.h</filename>).
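For example, in the following hypothetical chain the
<literal>display</literal> step is placed before lower-casing. Assuming
the display step records each token as it is at that point in the chain,
icu_chain_token_display then returns the word in its original case, while
the final output of the chain is lower case:
<screen><![CDATA[
<icu_chain locale="en">
  <tokenize rule="w"/>
  <!-- assumed: display captures the token before casemap runs -->
  <display/>
  <casemap rule="l"/>
</icu_chain>
]]></screen>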
<term>transform</term>
Specifies an ICU transform rule using a transliterator
The rule attribute is the transliterator identifier.
See <ulink url="&url.icu.transform;">ICU Transforms</ulink> for
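For example, assuming the standard ICU transliterators
<literal>Any-Latin</literal> and <literal>Latin-ASCII</literal> are
available in the installed ICU version, the following hypothetical step
uses a compound identifier (IDs separated by a semicolon) to convert
text to Latin script and then to plain ASCII:
<screen><![CDATA[
<!-- illustrative compound transliterator ID -->
<transform rule="Any-Latin; Latin-ASCII"/>
]]></screen>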
<term>transliterate</term>
Specifies a rule-based transliterator.
The rule attribute is the custom transformation rule to be used.
See <ulink url="&url.icu.transform;">ICU Transforms</ulink> for
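As an illustration, the following hypothetical step uses ICU rule syntax
(<literal>source > target</literal>, with rules separated by semicolons)
to replace the lower-case German umlauts with two-letter spellings:
<screen><![CDATA[
<!-- illustrative custom rules; upper-case umlauts are not handled -->
<transliterate rule="ä > ae; ö > oe; ü > ue;"/>
]]></screen>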
<term>tokenize</term>
Breaks (tokenizes) a string into components using the
ICU functions ubrk_open, ubrk_setText, etc. The rule is
<para>Line. ICU: UBRK_LINE.</para>
<para>Sentence. ICU: UBRK_SENTENCE.</para>
<para>Word. ICU: UBRK_WORD.</para>
<para>Character. ICU: UBRK_CHARACTER.</para>
<para>Title. ICU: UBRK_TITLE.</para>
<refsect1><title>EXAMPLES</title>
The following command analyzes the text in the file <filename>text</filename>
using the ICU chain configuration <filename>chain.xml</filename>:
cat text | yaz-icu -c chain.xml
The file <filename>chain.xml</filename> might look as follows:
<icu_chain locale="en">
  <transform rule="[:Control:] Any-Remove"/>
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <transliterate rule="xy > z"/>
<refsect1><title>SEE ALSO</title>
<refentrytitle>yaz</refentrytitle>
<manvolnum>7</manvolnum>
<ulink url="&url.icu;">ICU Home</ulink>
<ulink url="&url.icu.transform;">ICU Transforms</ulink>
<!-- Keep this comment at the end of the file
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-parent-document:nil
sgml-local-catalogs: nil
sgml-namecase-general:t