1 <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN"
2 "http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd"
4 <!ENTITY % local SYSTEM "local.ent">
6 <!ENTITY % entities SYSTEM "entities.ent">
8 <!ENTITY % idcommon SYSTEM "common/common.ent">
11 <refentry id="yaz-icu">
13 <productname>YAZ</productname>
14 <productnumber>&version;</productnumber>
18 <refentrytitle>yaz-icu</refentrytitle>
19 <manvolnum>1</manvolnum>
23 <refname>yaz-icu</refname>
24 <refpurpose>YAZ ICU utility</refpurpose>
29 <command>yaz-icu</command>
30 <arg choice="opt" rep="repeat">commands</arg>
31 <arg>-c <replaceable>config</replaceable></arg>
32 <arg>-p <replaceable>type</replaceable></arg>
38 <refsect1><title>DESCRIPTION</title>
40 <command>yaz-icu</command> is a utility that demonstrates
41 the ICU chain module of YAZ (<filename>yaz/icu.h</filename>).
45 <refsect1><title>OPTIONS</title>
48 <term>-c <replaceable>config</replaceable></term>
50 Specifies the file containing ICU chain configuration
56 <term>-p <replaceable>type</replaceable></term>
58 Specifies extra information to be printed about the ICU system.
59 If <replaceable>type</replaceable> is <literal>c</literal>,
60 the ICU converters are printed.
61 If <replaceable>type</replaceable> is <literal>l</literal>,
62 the available locales are printed.
63 If <replaceable>type</replaceable> is <literal>t</literal>,
64 the available transliterators are printed.
71 Specifies that the output should include the sort key as well. Note that
72 sort keys differ between ICU versions.
79 Specifies that output should be XML based rather than
86 <refsect1><title>ICU chain configuration</title>
88 The ICU chain configuration specifies one or more rules to convert
89 text data into tokens. The configuration format is XML based.
92 The top-level element must be named <literal>icu_chain</literal>.
93 The <literal>icu_chain</literal> element has one required attribute,
94 <literal>locale</literal>, which specifies the ICU locale to be used
95 in the conversion steps.
98 The <literal>icu_chain</literal> element must contain child elements,
99 each of which specifies a conversion step. The steps are performed
100 in the order in which they are specified.
101 Each conversion element takes one attribute, <literal>rule</literal>,
102 which serves as the argument to the conversion step.
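<para>
 As a minimal sketch of the structure described above, the following
 configuration declares a chain for the English locale with two
 conversion steps, which are applied from top to bottom. The particular
 steps and rule values are chosen for illustration only, and the sketch
 assumes that XML comments are ignored by the configuration parser.
</para>
<screen><![CDATA[
<icu_chain locale="en">
  <!-- step 1: strip control characters -->
  <transform rule="[:Control:] Any-Remove"/>
  <!-- step 2: remove whitespace and punctuation from the result of step 1 -->
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
</icu_chain>
]]></screen>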
105 The following conversion elements are available:
111 Converts case; the rule attribute specifies how:
117 <para>Lowercase using ICU function u_strToLower. </para>
124 <para>Upper case using ICU function u_strToUpper.</para>
131 <para>To title using ICU function u_strToTitle.</para>
138 <para>Fold case using ICU function u_strFoldCase.</para>
149 This is a meta step which specifies that a term/token is to
150 be displayed. An application retrieves this term
151 using the function icu_chain_token_display (<filename>yaz/icu.h</filename>).
156 <term>transform</term>
158 Specifies an ICU transform rule using a transliterator
160 The rule attribute is the transliterator identifier.
161 See <ulink url="&url.icu.transform;">ICU Transforms</ulink> for
167 <term>transliterate</term>
169 Specifies a rule-based transliterator.
170 The rule attribute is the custom transformation rule to be used.
171 See <ulink url="&url.icu.transform;">ICU Transforms</ulink> for
177 <term>tokenize</term>
179 Breaks (tokenizes) a string into components using
180 ICU functions such as ubrk_open and ubrk_setText. The rule is
186 <para>Line. ICU: UBRK_LINE.</para>
193 <para>Sentence. ICU: UBRK_SENTENCE.</para>
200 <para>Word. ICU: UBRK_WORD.</para>
207 <para>Character. ICU: UBRK_CHARACTER.</para>
214 <para>Title. ICU: UBRK_TITLE.</para>
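<para>
 The tokenizing and case-mapping steps above can be combined as in the
 following sketch, which splits the input at word boundaries and then
 lower-cases each token. The single-letter rule values
 (<literal>w</literal> for word boundaries, <literal>l</literal> for
 lower case) and the element name <literal>casemap</literal> are
 assumptions here; verify them against the YAZ version in use.
</para>
<screen><![CDATA[
<icu_chain locale="en">
  <!-- assumed rule value: w selects word boundaries (UBRK_WORD) -->
  <tokenize rule="w"/>
  <!-- assumed element name and rule value: l selects lower casing (u_strToLower) -->
  <casemap rule="l"/>
</icu_chain>
]]></screen>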
226 <refsect1><title>EXAMPLES</title>
228 The following command analyzes text in file <filename>text</filename>
229 using ICU chain configuration <filename>chain.xml</filename>:
231 cat text | yaz-icu -c chain.xml
233 The file <filename>chain.xml</filename> might look as follows:
235 <icu_chain locale="en">
236 <transform rule="[:Control:] Any-Remove"/>
238 <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
239 <transliterate rule="xy > z"/>
247 <refsect1><title>SEE ALSO</title>
250 <refentrytitle>yaz</refentrytitle>
251 <manvolnum>7</manvolnum>
255 <ulink url="&url.icu;">ICU Home</ulink>
258 <ulink url="&url.icu.transform;">ICU Transforms</ulink>
263 <!-- Keep this comment at the end of the file
268 sgml-minimize-attributes:nil
269 sgml-always-quote-attributes:t
272 sgml-parent-document:nil
273 sgml-local-catalogs: nil
274 sgml-namecase-general:t