README  2014/07/18

Onigmo (Oniguruma-mod)  --  (C) K.Takata <kentkt AT csc DOT jp>

https://github.com/k-takata/Onigmo

Onigmo is a regular expression library forked from Oniguruma.
Some of the new features introduced in Perl 5.10+ can be used.

Some patches are merged from Ruby 2.x.

See also the Wiki page:
https://github.com/k-takata/Onigmo/wiki


Main New features:
  Regular Expressions (depends on the syntax):
    \K, \R, \X, (?(cond)yes|no)
    (?adlu), \g{name}, \g{n}, (?&name), (?n), (?R), (?0)
    (?P<name>...), (?P=name), (?P>name)

  API:
    onig_search_gpos (for Perl-compatible \G)

  Encoding:
    CP932

  Syntax:
    Python


New Source Files:
  enc/cp932.c                  CP932 encoding.
  enc/jis/props.h              JIS character properties data.
  enc/jis/props.kwd            JIS character properties data.
  enc/unicode/casefold.h       Unicode case folding data.
  enc/unicode/name2ctype.h     Unicode properties data.

  onig.py                      onig.dll/libonig.so loader.
  testpy.py                    test program.

  tool/download-ucd.sh         downloads Unicode Character Database (UCD).
  tool/case-folding.rb         generates casefold.h from UCD.
  tool/convert-jis-props.sh    converts props.kwd to props.h.
  tool/convert-name2ctype.sh   converts name2ctype.kwd to name2ctype.h.
  tool/enc-unicode.rb          generates name2ctype.kwd from UCD.

  win32/Makefile.mingw         Makefile for Win32 (MinGW)
  win32/makedef.py             creates onig.def.
  win32/onig.rc                resource file for onig.dll.


ToDo:
  * Reduce the size of Unicode Character Data.
  * Support branch reset groups: (?|...)
  * Improve (?(cond)yes|no) to support look-ahead/behind assertions
    as conditions.


Oniguruma's README follows:
======================================================================
README  2007/05/31

Oniguruma  ----   (C) K.Kosako <sndgk393 AT ybb DOT ne DOT jp>

http://www.geocities.jp/kosako3/oniguruma/

Oniguruma is a regular expression library.
A characteristic of this library is that a different character encoding
can be specified for each regular expression object.

Supported character encodings:

  ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE,
  EUC-JP, EUC-TW, EUC-KR, EUC-CN,
  Shift_JIS, Big5, GB18030, KOI8-R, CP1251,
  ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5,
  ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10,
  ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16

* GB18030: contributed by KUBO Takehiro
* CP1251:  contributed by Byte
------------------------------------------------------------

License

  BSD license.


Install

 Case 1: Unix and Cygwin platform

   1. ./configure
   2. make
   3. make install

   * uninstall

     make uninstall

   * test (ASCII/EUC-JP)

     make atest

   * configuration check

     onig-config --cflags
     onig-config --libs
     onig-config --prefix
     onig-config --exec-prefix



 Case 2: Win32 platform (VC++)

   1. copy win32\Makefile Makefile
   2. copy win32\config.h config.h
   3. nmake

      onig_s.lib:  static link library
      onig.dll:    dynamic link library

   * test (ASCII/Shift_JIS)
   4. copy win32\testc.c testc.c
   5. nmake ctest



Regular Expressions

  See doc/RE (or doc/RE.ja for Japanese).


Usage

  Include oniguruma.h in your program. (Oniguruma API)
  See doc/API for the Oniguruma API.

  To suppress the definition of the UChar type (== unsigned char)
  in oniguruma.h, define ONIG_ESCAPE_UCHAR_COLLISION before
  including oniguruma.h.

  To suppress the definition of the regex_t type in oniguruma.h,
  define ONIG_ESCAPE_REGEX_T_COLLISION before including oniguruma.h.

  Example of the compile/link command line on Unix or Cygwin
  (prefix == /usr/local case):

    cc sample.c -L/usr/local/lib -lonig


  To use the static link library (onig_s.lib) on Win32,
  add the option -DONIG_EXTERN=extern to the C compiler.

Sample Programs

  sample/simple.c    minimal example (Oniguruma API)
  sample/names.c     example of the named group callback.
  sample/encode.c    example of some encodings.
  sample/listcap.c   example of the capture history.
  sample/posix.c     POSIX API sample.
  sample/sql.c       example of the variable meta characters.
                     (SQL-like pattern matching)

Test Programs
  sample/syntax.c    Perl, Java and ASIS syntax test.
  sample/crnl.c      --enable-crnl-as-line-terminator test

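  For readers who only need the POSIX API, the following sketch shows
  the usual regcomp/regexec/regerror/regfree sequence. It is not the
  shipped sample/posix.c. It includes the system <regex.h> so that it
  builds without the library installed; the assumption is that
  onigposix.h can be included instead and exposes the same function
  names. The helper name posix_first_match is made up for
  illustration.

```c
/* Hypothetical sketch. With this library, include "onigposix.h"
   instead of <regex.h> (assumed to provide the same names). */
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Copies the first substring of text that matches the POSIX extended
   regular expression pattern into out (at most outsz - 1 bytes).
   Returns 1 on a match, 0 on no match, -1 on a compile error. */
static int posix_first_match(const char *pattern, const char *text,
                             char *out, size_t outsz)
{
  regex_t reg;
  regmatch_t pm[1];
  int r;

  r = regcomp(&reg, pattern, REG_EXTENDED);
  if (r != 0) {
    char msg[128];
    regerror(r, &reg, msg, sizeof(msg));
    fprintf(stderr, "regcomp: %s\n", msg);
    return -1;
  }

  /* pm[0] receives the offsets of the whole match. */
  r = regexec(&reg, text, 1, pm, 0);
  if (r == 0 && outsz > 0) {
    size_t len = (size_t)(pm[0].rm_eo - pm[0].rm_so);
    if (len >= outsz)
      len = outsz - 1;
    memcpy(out, text + pm[0].rm_so, len);
    out[len] = '\0';
  }

  regfree(&reg);
  return r == 0 ? 1 : 0;
}
```

  For example, posix_first_match("[0-9]+", "abc 2014 def", buf,
  sizeof(buf)) returns 1 and leaves "2014" in buf.

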
Source Files

  oniguruma.h        Oniguruma API header file. (public)
  onig-config.in     configuration check program template.

  regenc.h           character encodings framework header file.
  regint.h           internal definitions
  regparse.h         internal definitions for regparse.c and regcomp.c
  regcomp.c          compiling and optimization functions
  regenc.c           character encodings framework.
  regerror.c         error message function
  regext.c           extended API functions. (deluxe version API)
  regexec.c          search and match functions
  regparse.c         parsing functions.
  regsyntax.c        pattern syntax functions and built-in syntax definitions.
  regtrav.c          capture history tree data traverse functions.
  regversion.c       version info function.
  st.h               hash table functions header file
  st.c               hash table functions

  oniggnu.h          GNU regex API header file. (public)
  reggnu.c           GNU regex API functions

  onigposix.h        POSIX API header file. (public)
  regposerr.c        POSIX error message function.
  regposix.c         POSIX API functions.

  enc/mktable.c      character type table generator.
  enc/ascii.c        ASCII encoding.
  enc/euc_jp.c       EUC-JP encoding.
  enc/euc_tw.c       EUC-TW encoding.
  enc/euc_kr.c       EUC-KR, EUC-CN encoding.
  enc/sjis.c         Shift_JIS encoding.
  enc/big5.c         Big5 encoding.
  enc/gb18030.c      GB18030 encoding.
  enc/koi8.c         KOI8 encoding.
  enc/koi8_r.c       KOI8-R encoding.
  enc/cp1251.c       CP1251 encoding.
  enc/iso8859_1.c    ISO-8859-1 encoding. (Latin-1)
  enc/iso8859_2.c    ISO-8859-2 encoding. (Latin-2)
  enc/iso8859_3.c    ISO-8859-3 encoding. (Latin-3)
  enc/iso8859_4.c    ISO-8859-4 encoding. (Latin-4)
  enc/iso8859_5.c    ISO-8859-5 encoding. (Cyrillic)
  enc/iso8859_6.c    ISO-8859-6 encoding. (Arabic)
  enc/iso8859_7.c    ISO-8859-7 encoding. (Greek)
  enc/iso8859_8.c    ISO-8859-8 encoding. (Hebrew)
  enc/iso8859_9.c    ISO-8859-9 encoding. (Latin-5 or Turkish)
  enc/iso8859_10.c   ISO-8859-10 encoding. (Latin-6 or Nordic)
  enc/iso8859_11.c   ISO-8859-11 encoding. (Thai)
  enc/iso8859_13.c   ISO-8859-13 encoding. (Latin-7 or Baltic Rim)
  enc/iso8859_14.c   ISO-8859-14 encoding. (Latin-8 or Celtic)
  enc/iso8859_15.c   ISO-8859-15 encoding. (Latin-9 or West European with Euro)
  enc/iso8859_16.c   ISO-8859-16 encoding.
                     (Latin-10 or South-Eastern European with Euro)
  enc/utf8.c         UTF-8 encoding.
  enc/utf16_be.c     UTF-16BE encoding.
  enc/utf16_le.c     UTF-16LE encoding.
  enc/utf32_be.c     UTF-32BE encoding.
  enc/utf32_le.c     UTF-32LE encoding.
  enc/unicode.c      Unicode information data.

  win32/Makefile     Makefile for Win32 (VC++)
  win32/config.h     config.h for Win32



ToDo

  ? case fold flag: Katakana <-> Hiragana.
  ? add ONIG_OPTION_NOTBOS/NOTEOS. (\A, \z, \Z)
 ?? \X (== \PM\pM*)
 ?? implement syntax behavior ONIG_SYN_CONTEXT_INDEP_ANCHORS.
 ?? transmission stopper. (return ONIG_STOP from match_at())

and I'm thankful to Akinori MUSHA.


Mail Address: K.Kosako <sndgk393 AT ybb DOT ne DOT jp>