<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN"
"http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd"
<!ENTITY % local SYSTEM "local.ent">
<!ENTITY % entities SYSTEM "entities.ent">
<!ENTITY % idcommon SYSTEM "common/common.ent">
<refentry id="yaz-icu">
<productname>YAZ</productname>
<productnumber>&version;</productnumber>
<refentrytitle>yaz-icu</refentrytitle>
<manvolnum>1</manvolnum>
<refname>yaz-icu</refname>
<refpurpose>YAZ ICU utility</refpurpose>
<command>yaz-icu</command>
<arg choice="opt" rep="repeat">commands</arg>
<arg>-c <replaceable>config</replaceable></arg>
<arg>-p <replaceable>type</replaceable></arg>
<refsect1><title>DESCRIPTION</title>
<command>yaz-icu</command> is a utility that demonstrates
the ICU chain module of YAZ (<filename>yaz/icu.h</filename>).
<refsect1><title>OPTIONS</title>
<term>-c <replaceable>config</replaceable></term>
Specifies the file containing ICU chain configuration
<term>-p <replaceable>type</replaceable></term>
Specifies extra information to be printed about the ICU system.
If <replaceable>type</replaceable> is <literal>c</literal>,
the available ICU converters are printed.
If <replaceable>type</replaceable> is <literal>l</literal>,
the available locales are printed.
If <replaceable>type</replaceable> is <literal>t</literal>,
the available transliterators are printed.
<term>-x</term>
Specifies that output should be XML based rather than
<refsect1><title>ICU chain configuration</title>
The ICU chain configuration specifies one or more rules that convert
text data into tokens. The configuration format is XML-based.
The top-level element must be named <literal>icu_chain</literal>.
The <literal>icu_chain</literal> element has one required attribute,
<literal>locale</literal>, which specifies the ICU locale to be used
in the conversion steps.
The <literal>icu_chain</literal> element must contain child elements,
each of which specifies a conversion step. The steps are performed
in the order in which they are specified.
Each conversion element takes one attribute, <literal>rule</literal>,
which serves as the argument to the conversion step.
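As a minimal sketch of the structure described above, the following
chain sets the locale to <literal>en</literal> and applies two
conversion steps in the order given. The rule values are illustrative
assumptions: the <literal>transform</literal> rule is the one used in
the EXAMPLES section, and <literal>w</literal> is assumed to select
word breaking for <literal>tokenize</literal>.
<screen><![CDATA[
<icu_chain locale="en">
  <!-- step 1: remove control characters -->
  <transform rule="[:Control:] Any-Remove"/>
  <!-- step 2: split the result into tokens (assumed: w = word) -->
  <tokenize rule="w"/>
</icu_chain>
]]></screen>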
The following conversion elements are available:
Converts case. The rule attribute specifies how:
<para>Lower case using ICU function u_strToLower.</para>
<para>Upper case using ICU function u_strToUpper.</para>
<para>Title case using ICU function u_strToTitle.</para>
<para>Fold case using ICU function u_strFoldCase.</para>
This is a meta step that marks a term/token for display.
An application retrieves this term with the function
icu_chain_token_display (<filename>yaz/icu.h</filename>).
<term>transform</term>
Specifies an ICU transform rule. The rule attribute holds the
custom transformation rule to be used, written in the text-based
format defined by the ICU transform system. See
<ulink url="&url.icu.transform;">ICU Transforms</ulink> for
<term>tokenize</term>
Breaks (tokenizes) a string into components using
ICU functions such as ubrk_open and ubrk_setText. The rule is
<para>Line. ICU: UBRK_LINE.</para>
<para>Sentence. ICU: UBRK_SENTENCE.</para>
<para>Word. ICU: UBRK_WORD.</para>
<para>Character. ICU: UBRK_CHARACTER.</para>
<para>Title. ICU: UBRK_TITLE.</para>
<refsect1><title>EXAMPLES</title>
The following command analyzes text in file <filename>text</filename>
using ICU chain configuration <filename>chain.xml</filename>:
cat text | yaz-icu -c chain.xml
The file <filename>chain.xml</filename> might look as follows:
<icu_chain locale="en">
  <transform rule="[:Control:] Any-Remove"/>
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
<refsect1><title>SEE ALSO</title>
<refentrytitle>yaz</refentrytitle>
<manvolnum>7</manvolnum>
<ulink url="&url.icu;">ICU Home</ulink>
<ulink url="&url.icu.transform;">ICU Transforms</ulink>
<!-- Keep this comment at the end of the file
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-parent-document:nil
sgml-local-catalogs: nil
sgml-namecase-general:t