1 <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN"
2 "http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd"
4 <!ENTITY % local SYSTEM "local.ent">
6 <!ENTITY % entities SYSTEM "entities.ent">
8 <!ENTITY % idcommon SYSTEM "common/common.ent">
11 <refentry id="yaz-icu">
13 <productname>YAZ</productname>
14 <productnumber>&version;</productnumber>
18 <refentrytitle>yaz-icu</refentrytitle>
19 <manvolnum>1</manvolnum>
23 <refname>yaz-icu</refname>
24 <refpurpose>YAZ ICU utility</refpurpose>
29 <command>yaz-icu</command>
30 <arg choice="opt" rep="repeat">commands</arg>
31 <arg>-c <replaceable>config</replaceable></arg>
32 <arg>-p <replaceable>type</replaceable></arg>
38 <refsect1><title>DESCRIPTION</title>
40 <command>yaz-icu</command> is a utility that demonstrates
41 the ICU chain module of YAZ (<filename>yaz/icu.h</filename>).
45 <refsect1><title>OPTIONS</title>
48 <term>-c <replaceable>config</replaceable></term>
50 Specifies the file containing ICU chain configuration
56 <term>-p <replaceable>type</replaceable></term>
58 Specifies extra information to be printed about the ICU system.
59 If <replaceable>type</replaceable> is <literal>c</literal>,
60 the ICU converters are printed.
61 If <replaceable>type</replaceable> is <literal>l</literal>,
62 the available locales are printed.
63 If <replaceable>type</replaceable> is <literal>t</literal>,
64 the available transliterators are printed.
71 Specifies that the output should also include the sort key. Note that
72 sort keys differ between ICU versions.
79 Specifies that output should be XML based rather than
86 <refsect1><title>ICU chain configuration</title>
88 The ICU chain configuration specifies one or more rules to convert
89 text data into tokens. The configuration format is XML based.
92 The top-level element must be named <literal>icu_chain</literal>.
93 The <literal>icu_chain</literal> element has one required attribute,
94 <literal>locale</literal>, which specifies the ICU locale to be used
95 in the conversion steps.
98 The <literal>icu_chain</literal> element must include child elements,
99 each of which specifies a conversion step. The conversions are performed
100 in the order in which the conversion steps are specified.
101 Each conversion element takes one attribute, <literal>rule</literal>,
102 which serves as the argument to the conversion step.
105 The following conversion elements are available:
111 Converts case; the <literal>rule</literal> attribute specifies how:
117 <para>Lowercase using ICU function u_strToLower.</para>
124 <para>Uppercase using ICU function u_strToUpper.</para>
131 <para>Title case using ICU function u_strToTitle.</para>
138 <para>Fold case using ICU function u_strFoldCase.</para>
149 This is a meta step which specifies that a term/token is to
150 be displayed. An application retrieves this term
151 using the function icu_chain_token_display (<filename>yaz/icu.h</filename>).
156 <term>transform</term>
158 Specifies an ICU transform rule. The rule attribute is the
159 custom transformation rule to be used. It is written in the text-based
160 format offered by the ICU transform system. See
161 <ulink url="&url.icu.transform;">ICU Transforms</ulink> for
167 <term>tokenize</term>
169 Breaks (tokenizes) a string into components using
170 ICU functions such as ubrk_open and ubrk_setText. The rule is
176 <para>Line. ICU: UBRK_LINE.</para>
183 <para>Sentence. ICU: UBRK_SENTENCE.</para>
190 <para>Word. ICU: UBRK_WORD.</para>
197 <para>Character. ICU: UBRK_CHARACTER.</para>
204 <para>Title. ICU: UBRK_TITLE.</para>
216 <refsect1><title>EXAMPLES</title>
218 The following command analyzes text in file <filename>text</filename>
219 using ICU chain configuration <filename>chain.xml</filename>:
221 cat text | yaz-icu -c chain.xml
223 The file <filename>chain.xml</filename> might look as follows:
225 <icu_chain locale="en">
226 <transform rule="[:Control:] Any-Remove"/>
228 <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
236 <refsect1><title>SEE ALSO</title>
239 <refentrytitle>yaz</refentrytitle>
240 <manvolnum>7</manvolnum>
244 <ulink url="&url.icu;">ICU Home</ulink>
247 <ulink url="&url.icu.transform;">ICU Transforms</ulink>
252 <!-- Keep this comment at the end of the file
257 sgml-minimize-attributes:nil
258 sgml-always-quote-attributes:t
261 sgml-parent-document:nil
262 sgml-local-catalogs: nil
263 sgml-namecase-general:t