README  2014/07/18

Onigmo (Oniguruma-mod)  --  (C) K.Takata <kentkt AT csc DOT jp>

https://github.com/k-takata/Onigmo

Onigmo is a regular expressions library forked from Oniguruma.
Some of the new features introduced in Perl 5.10+ can be used.

Some patches have been merged from Ruby 2.x.

See also the Wiki page:
https://github.com/k-takata/Onigmo/wiki


Main New features:
  Regular Expressions (depends on the syntax):
    \K, \R, \X, (?(cond)yes|no)
    (?adlu), \g{name}, \g{n}, (?&name), (?n), (?R), (?0)
    (?P<name>...), (?P=name), (?P>name)

  API:
    onig_search_gpos (for Perl-compatible \G)

  Encoding:
    CP932

  Syntax:
    Python


New Source Files:
  enc/cp932.c                 CP932 encoding.
  enc/jis/props.h             JIS character properties data.
  enc/jis/props.kwd           JIS character properties data.
  enc/unicode/casefold.h      Unicode case folding data.
  enc/unicode/name2ctype.h    Unicode properties data.

  onig.py                     onig.dll/libonig.so loader.
  testpy.py                   test program.

  tool/download-ucd.sh        downloads the Unicode Character Database (UCD).
  tool/case-folding.rb        generates casefold.h from UCD.
  tool/convert-jis-props.sh   converts props.kwd to props.h.
  tool/convert-name2ctype.sh  converts name2ctype.kwd to name2ctype.h.
  tool/enc-unicode.rb         generates name2ctype.kwd from UCD.

  win32/Makefile.mingw        Makefile for Win32 (MinGW)
  win32/makedef.py            creates onig.def.
  win32/onig.rc               resource file for onig.dll.


ToDo:
  * Reduce the size of the Unicode character data.
  * (?|...) (branch reset groups)
  * Improve (?(cond)yes|no) (support look-ahead/look-behind assertions as
    conditions).


Oniguruma's README follows:
======================================================================
README  2007/05/31

Oniguruma  ----   (C) K.Kosako <sndgk393 AT ybb DOT ne DOT jp>

http://www.geocities.jp/kosako3/oniguruma/

Oniguruma is a regular expressions library.
The distinguishing characteristic of this library is that a different
character encoding can be specified for each regular expression object.

Supported character encodings:

  ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE,
  EUC-JP, EUC-TW, EUC-KR, EUC-CN,
  Shift_JIS, Big5, GB18030, KOI8-R, CP1251,
  ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5,
  ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10,
  ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16

* GB18030: contributed by KUBO Takehiro
* CP1251:  contributed by Byte
------------------------------------------------------------

License

  BSD license.


Install

  Case 1: Unix and Cygwin platforms

    1. ./configure
    2. make
    3. make install

    * uninstall

      make uninstall

    * test (ASCII/EUC-JP)

      make atest

    * configuration check

      onig-config --cflags
      onig-config --libs
      onig-config --prefix
      onig-config --exec-prefix



  Case 2: Win32 platform (VC++)

    1. copy win32\Makefile Makefile
    2. copy win32\config.h config.h
    3. nmake

    onig_s.lib:  static link library
    onig.dll:    dynamic link library

    * test (ASCII/Shift_JIS)
    4. copy win32\testc.c testc.c
    5. nmake ctest



Regular Expressions

  See doc/RE (or doc/RE.ja for Japanese).


Usage

  Include oniguruma.h in your program (Oniguruma API).
  See doc/API for the Oniguruma API.

  To suppress the definition of the UChar type (== unsigned char) in
  oniguruma.h, define ONIG_ESCAPE_UCHAR_COLLISION before including
  oniguruma.h.

  To suppress the definition of the regex_t type in oniguruma.h,
  define ONIG_ESCAPE_REGEX_T_COLLISION before including oniguruma.h.

  Example of a compile/link command line on Unix or Cygwin
  (assuming prefix == /usr/local):

    cc sample.c -L/usr/local/lib -lonig


  To use the static link library (onig_s.lib) on Win32, add the option
  -DONIG_EXTERN=extern to the C compiler command line.



Sample Programs

  sample/simple.c    minimal example (Oniguruma API)
  sample/names.c     example of the named group callback.
  sample/encode.c    example of some encodings.
  sample/listcap.c   example of the capture history.
  sample/posix.c     POSIX API sample.
  sample/sql.c       example of the variable meta characters.
                     (SQL-like pattern matching)

Test Programs
  sample/syntax.c    Perl, Java and ASIS syntax test.
  sample/crnl.c      --enable-crnl-as-line-terminator test


Source Files

  oniguruma.h        Oniguruma API header file. (public)
  onig-config.in     configuration check program template.

  regenc.h           character encodings framework header file.
  regint.h           internal definitions
  regparse.h         internal definitions for regparse.c and regcomp.c
  regcomp.c          compiling and optimization functions
  regenc.c           character encodings framework.
  regerror.c         error message function
  regext.c           extended API functions. (deluxe version API)
  regexec.c          search and match functions
  regparse.c         parsing functions.
  regsyntax.c        pattern syntax functions and built-in syntax definitions.
  regtrav.c          capture history tree data traversal functions.
  regversion.c       version info function.
  st.h               hash table functions header file
  st.c               hash table functions

  oniggnu.h          GNU regex API header file. (public)
  reggnu.c           GNU regex API functions

  onigposix.h        POSIX API header file. (public)
  regposerr.c        POSIX error message function.
  regposix.c         POSIX API functions.

  enc/mktable.c      character type table generator.
  enc/ascii.c        ASCII encoding.
  enc/euc_jp.c       EUC-JP encoding.
  enc/euc_tw.c       EUC-TW encoding.
  enc/euc_kr.c       EUC-KR, EUC-CN encoding.
  enc/sjis.c         Shift_JIS encoding.
  enc/big5.c         Big5 encoding.
  enc/gb18030.c      GB18030 encoding.
  enc/koi8.c         KOI8 encoding.
  enc/koi8_r.c       KOI8-R encoding.
  enc/cp1251.c       CP1251 encoding.
  enc/iso8859_1.c    ISO-8859-1 encoding. (Latin-1)
  enc/iso8859_2.c    ISO-8859-2 encoding. (Latin-2)
  enc/iso8859_3.c    ISO-8859-3 encoding. (Latin-3)
  enc/iso8859_4.c    ISO-8859-4 encoding. (Latin-4)
  enc/iso8859_5.c    ISO-8859-5 encoding. (Cyrillic)
  enc/iso8859_6.c    ISO-8859-6 encoding. (Arabic)
  enc/iso8859_7.c    ISO-8859-7 encoding. (Greek)
  enc/iso8859_8.c    ISO-8859-8 encoding. (Hebrew)
  enc/iso8859_9.c    ISO-8859-9 encoding. (Latin-5 or Turkish)
  enc/iso8859_10.c   ISO-8859-10 encoding. (Latin-6 or Nordic)
  enc/iso8859_11.c   ISO-8859-11 encoding. (Thai)
  enc/iso8859_13.c   ISO-8859-13 encoding. (Latin-7 or Baltic Rim)
  enc/iso8859_14.c   ISO-8859-14 encoding. (Latin-8 or Celtic)
  enc/iso8859_15.c   ISO-8859-15 encoding. (Latin-9 or West European with Euro)
  enc/iso8859_16.c   ISO-8859-16 encoding.
                     (Latin-10 or South-Eastern European with Euro)
  enc/utf8.c         UTF-8 encoding.
  enc/utf16_be.c     UTF-16BE encoding.
  enc/utf16_le.c     UTF-16LE encoding.
  enc/utf32_be.c     UTF-32BE encoding.
  enc/utf32_le.c     UTF-32LE encoding.
  enc/unicode.c      Unicode information data.

  win32/Makefile     Makefile for Win32 (VC++)
  win32/config.h     config.h for Win32



ToDo

  ?  case fold flag: Katakana <-> Hiragana.
  ?  add ONIG_OPTION_NOTBOS/NOTEOS. (\A, \z, \Z)
 ??  \X (== \PM\pM*)
 ??  implement syntax behavior ONIG_SYN_CONTEXT_INDEP_ANCHORS.
 ??  transmission stopper. (return ONIG_STOP from match_at())

and I'm thankful to Akinori MUSHA.


Mail Address: K.Kosako <sndgk393 AT ybb DOT ne DOT jp>