Estimating Parameters from an Original Dictionary/Corpus

$Id: learn.html 65 2007-01-30 00:52:53Z taku-ku $;

Overview

Parameters (cost values) can be estimated from a training corpus. MeCab itself is designed to be independent of any particular part-of-speech scheme, so you can build an analyzer based on your own part-of-speech scheme, dictionary, and corpus. Parameter estimation uses Conditional Random Fields (CRF).

Processing flow

[Figure: data flow diagram]

Parameter estimation is divided into several subtasks, which the following sections explain in order.

Preparing the seed dictionary

MeCab dictionaries are written in CSV. The seed dictionary and the distribution dictionary share essentially the same format.

The following are example dictionary entries.

進学校,0,0,0,名詞,一般,*,*,*,*,進学校,シンガクコウ,シンガクコー
梅暦,0,0,0,名詞,一般,*,*,*,*,梅暦,ウメゴヨミ,ウメゴヨミ
気圧,0,0,0,名詞,一般,*,*,*,*,気圧,キアツ,キアツ
水中翼船,0,0,0,名詞,一般,*,*,*,*,水中翼船,スイチュウヨクセン,スイチューヨクセン

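The first four columns are mandatory: the surface form, the left connection state number, the right connection state number, and the cost. The left connection state number, the right connection state number, and the cost are not used in the seed dictionary, so set them to 0.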

Columns 5 and onward are items called "features". To keep the system general, MeCab does not distinguish between the kinds of information attached to a word, such as part of speech, conjugation, reading, and pronunciation; it treats all of them as features. You may attach as many features as CSV allows. However, the meaning of each column must be consistent across entries (for example, column 5 is the part of speech and column 6 is the part-of-speech subcategory). Normally, the more general features are listed first, starting from the lowest feature number (for example: part of speech, part-of-speech subcategory, conjugation type, conjugation form, base form, reading, pronunciation).

Internally, features are handled as an array, so they are sometimes referred to as feature 0, feature 1, and so on. Keeping track of which feature number holds which kind of information (part of speech, reading, and so on) is your own responsibility.
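The column layout described above can be sketched in a few lines of Python. This is only an illustration, not part of MeCab: the names SeedEntry and parse_seed_entry are hypothetical, and the sample line is an ipadic-style entry with nine features.

```python
import csv
from typing import List, NamedTuple


class SeedEntry(NamedTuple):
    surface: str
    left_id: int
    right_id: int
    cost: int
    features: List[str]  # feature 0, feature 1, ... (0-indexed array)


def parse_seed_entry(line: str) -> SeedEntry:
    """Split one CSV dictionary line into the mandatory columns and the features."""
    fields = next(csv.reader([line]))
    if len(fields) < 4:
        raise ValueError("a dictionary entry needs at least 4 columns")
    surface, left_id, right_id, cost = fields[:4]
    return SeedEntry(surface, int(left_id), int(right_id), int(cost), fields[4:])


entry = parse_seed_entry("気圧,0,0,0,名詞,一般,*,*,*,*,気圧,キアツ,キアツ")
print(entry.surface)      # surface form
print(entry.features[0])  # feature 0: part of speech in this ipadic-style layout
print(entry.features[7])  # feature 7: reading in this ipadic-style layout
```

Keeping the features as a plain list mirrors the way they are referred to by number; which number means what still has to be tracked by whoever maintains the dictionary.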

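The examples above are from ipadic, where the feature sequence is defined as part of speech, part-of-speech subcategories 1 to 3, conjugation type, conjugation form, base form, reading, and pronunciation.

MeCab does not perform conjugation processing. For words that conjugate, you must expand every conjugated form in the dictionary yourself.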


連れ出す,0,0,0,動詞,自立,*,*,五段・サ行,基本形,連れ出す,ツレダス,ツレダス
連れ出さ,0,0,0,動詞,自立,*,*,五段・サ行,未然形,連れ出す,ツレダサ,ツレダサ
連れ出そ,0,0,0,動詞,自立,*,*,五段・サ行,未然ウ接続,連れ出す,ツレダソ,ツレダソ
連れ出し,0,0,0,動詞,自立,*,*,五段・サ行,連用形,連れ出す,ツレダシ,ツレダシ
連れ出せ,0,0,0,動詞,自立,*,*,五段・サ行,仮定形,連れ出す,ツレダセ,ツレダセ
連れ出せ,0,0,0,動詞,自立,*,*,五段・サ行,命令e,連れ出す,ツレダセ,ツレダセ
連れ出しゃ,0,0,0,動詞,自立,*,*,五段・サ行,仮定縮約1,連れ出す,ツレダシャ,ツレダシャ
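Because every conjugated form has to be listed explicitly, it can help to generate such entries from a stem. The sketch below is an assumption-laden illustration rather than a MeCab tool: SA_GODAN_FORMS and expand_verb are hypothetical names, and the suffix table covers only the seven forms shown in the entries above.

```python
# Hypothetical table for one verb class:
# (surface suffix, conjugation form label, reading suffix).
SA_GODAN_FORMS = [
    ("す", "基本形", "ス"),
    ("さ", "未然形", "サ"),
    ("そ", "未然ウ接続", "ソ"),
    ("し", "連用形", "シ"),
    ("せ", "仮定形", "セ"),
    ("せ", "命令e", "セ"),
    ("しゃ", "仮定縮約1", "シャ"),
]


def expand_verb(stem, stem_reading, conj_type="五段・サ行"):
    """Return one seed-dictionary CSV line per conjugated form of the verb."""
    base_form = stem + "す"
    lines = []
    for suffix, form, reading_suffix in SA_GODAN_FORMS:
        reading = stem_reading + reading_suffix
        fields = [stem + suffix, "0", "0", "0",      # surface, left id, right id, cost
                  "動詞", "自立", "*", "*",           # part of speech and subcategories
                  conj_type, form, base_form,        # conjugation type, form, base form
                  reading, reading]                  # reading, pronunciation
        lines.append(",".join(fields))
    return lines


for line in expand_verb("連れ出", "ツレダ"):
    print(line)
```

The left id, right id, and cost are written as 0 because the seed dictionary does not use them.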

Preparing the configuration files

dicrc

This file specifies various aspects of the dictionary's behavior. The following is the minimum configuration.

cost-factor = 800
bos-feature = BOS/EOS,*,*,*,*,*,*,*,*
eval-size = 6
unk-eval-size = 4
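As a quick sanity check before training, the settings above can be read back with a few lines of Python. This is a minimal sketch, not MeCab's own parser: parse_dicrc is a hypothetical helper, and treating lines that start with ; or # as comments is an assumption.

```python
def parse_dicrc(text):
    """Parse 'key = value' lines into a dict of strings."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith((";", "#")):  # assumed comment markers
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'key = value' line: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


DICRC = """\
cost-factor = 800
bos-feature = BOS/EOS,*,*,*,*,*,*,*,*
eval-size = 6
unk-eval-size = 4
"""

settings = parse_dicrc(DICRC)
print(settings["cost-factor"])
print(len(settings["bos-feature"].split(",")))  # number of elements in the BOS/EOS feature
```

Counting the elements of bos-feature is a cheap way to confirm that it has the same number of features as the dictionary entries.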

char.def

This file defines unknown-word processing. Morphological analysis of Japanese usually handles unknown words according to character type. MeCab lets you specify in detail which characters belong to which character type, and also what kind of unknown-word processing is applied to each character type.

The beginning of the file defines the category names and the unknown-word processing behavior of each category.

category name  invocation timing (0/1)  grouping (0/1)  length (0, 1, 2 ... n)

Example

KANJI          0 0 2
SYMBOL         1 1 0
NUMERIC        1 1 0
ALPHA          1 1 0
HIRAGANA       0 1 2 

Next, define which UCS2 code points each category covers.

codepoint  default category name  compatible category name 1  compatible category name 2 ..

or

low_codepoint..high_codepoint  default category name  compatible category name 1  compatible category name 2 ..

Example

0x0009 SPACE
0x30A1..0x30FF  KATAKANA
0x30FC          KATAKANA HIRAGANA  # ー

Code points are UCS2 (Unicode) values written in hexadecimal with a 0x prefix.

The first category is the default category for that code point. Additional compatible categories can be listed after it. In the example above, the long vowel mark "ー" is KATAKANA by default but also has HIRAGANA as a compatible category. During grouping, a compatible category is treated as belonging to the same group.
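The code point mapping can be illustrated with a small lookup. The sketch below is not MeCab's implementation: parse_codepoint_line and categories_of are hypothetical names, and two behaviors are assumptions made for the example, namely that a later line overrides an earlier one for the same code point and that unmapped characters fall back to DEFAULT.

```python
def parse_codepoint_line(line):
    """Return (low, high, [default, compatible...]) or None for blank/comment lines."""
    body = line.split("#", 1)[0].split()
    if not body:
        return None
    span, categories = body[0], body[1:]
    if not categories:
        raise ValueError(f"no category given: {line!r}")
    low_text, _, high_text = span.partition("..")
    low = int(low_text, 16)                       # int() accepts the 0x prefix with base 16
    high = int(high_text, 16) if high_text else low
    return low, high, categories


def categories_of(ch, mappings):
    """Look up the category list for one character (later lines win: an assumption)."""
    found = ["DEFAULT"]                           # assumed fallback for unmapped characters
    for low, high, categories in mappings:
        if low <= ord(ch) <= high:
            found = categories
    return found


CHAR_DEF = """\
0x0009 SPACE
0x30A1..0x30FF  KATAKANA
0x30FC          KATAKANA HIRAGANA  # long vowel mark
"""

mappings = [m for m in map(parse_codepoint_line, CHAR_DEF.splitlines()) if m]
print(categories_of("ア", mappings))
print(categories_of("ー", mappings))
```

The first element of the returned list plays the role of the default category and the remaining elements are the compatible categories.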

The following is a concrete example of char.def.

DEFAULT        0 1 0  # DEFAULT is a mandatory category!
SPACE          0 1 0  
KANJI          0 0 2
SYMBOL         1 1 0
NUMERIC        1 1 0
ALPHA          1 1 0
HIRAGANA       0 1 2 
KATAKANA       1 1 0
KANJINUMERIC   1 1 0
GREEK          1 1 0
CYRILLIC       1 1 0

# SPACE
0x0020 SPACE  # DO NOT REMOVE THIS LINE,  0x0020 is reserved for SPACE
0x00D0 SPACE
0x0009 SPACE
0x000B SPACE
0x000A SPACE

# ASCII
0x0021..0x002F SYMBOL
0x0030..0x0039 NUMERIC

... 

# KATAKANA
0x30A1..0x30FF  KATAKANA
0x31F0..0x31FF  KATAKANA  # Small KU .. Small RO
0x30FC          KATAKANA HIRAGANA  # ー

unk.def

This is the dictionary for unknown words.

DEFAULT,0,0,0,記号,一般,*,*,*,*,*
SPACE,0,0,0,記号,空白,*,*,*,*,*
KANJI,0,0,0,名詞,一般,*,*,*,*,*
KANJI,0,0,0,名詞,サ変接続,*,*,*,*,*
HIRAGANA,0,0,0,名詞,一般,*,*,*,*,*
HIRAGANA,0,0,0,名詞,サ変接続,*,*,*,*,*
HIRAGANA,0,0,0,名詞,固有名詞,地域,一般,*,*,*
...

This dictionary file uses the category names defined in char.def in place of the surface form. It defines which feature sequences are assigned to each category. A single category may have more than one feature sequence. After training, appropriate cost values are assigned automatically.

rewrite.def

This file defines the mapping from the feature sequence to the internal-state feature sequences.

CRF accumulates statistics using three kinds of information: unigram, left-context bigram, and right-context bigram. In an example such as 「美しい川」, three features are derived for each morpheme from the features defined in the dictionary: the unigram feature, the left-context feature (the feature of the morpheme as seen from its left side), and the right-context feature (the feature of the morpheme as seen from its right side). rewrite.def defines the mapping from the dictionary features to each of these internal features.

Concretely, this is achieved by defining the mapping functions appropriately.

rewrite.def has three sections: [unigram rewrite], [left rewrite], and [right rewrite].

Each section header is followed by mapping rules, one per line. A mapping rule is written in the form

match pattern  rewrite target

Mapping rules are scanned in order from the top, and the first one that matches is used.

Simple regular expressions can be used in the match pattern.

The rewrite target can refer to each element of the feature (each CSV field) through the macros $1, $2, $3, and so on.

Example

[unigram rewrite]
# Drop the pronunciation and use POS 1, 2, 3, 4, conjugation type, conjugation form, base form, and reading
*,*,*,*,*,*,*,*  $1,$2,$3,$4,$5,$6,$7,$8
# Ignore the reading when there is none
*,*,*,*,*,*,*    $1,$2,$3,$4,$5,$6,$7,*

[left rewrite]
(助詞|助動詞),*,*,*,*,*,(ない|無い)    $1,$2,$3,$4,$5,$6,無い
(助詞|助動詞),終助詞,*,*,*,*,(よ|ヨ)   $1,$2,$3,$4,$5,$6,よ

...

[right rewrite]
(助詞|助動詞),*,*,*,*,*,(ない|無い)    $1,$2,$3,$4,$5,$6,無い
(助詞|助動詞),終助詞,*,*,*,*,(よ|ヨ)   $1,$2,$3,$4,$5,$6,よ

..
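The first-match behavior of the mapping rules can be sketched as follows. This is an illustration only, not MeCab's matcher: element_matches and rewrite are hypothetical names, and the sketch assumes that * matches anything, that (A|B) matches either alternative exactly, that any other pattern element must match exactly, and that a rule applies when the feature sequence has at least as many elements as the pattern.

```python
import re


def element_matches(pattern, value):
    """Match one CSV element against '*', '(A|B|...)', or a literal."""
    if pattern == "*":
        return True
    if pattern.startswith("(") and pattern.endswith(")"):
        return value in pattern[1:-1].split("|")
    return pattern == value


def rewrite(features, rules):
    """Apply the first matching (pattern, target) rule; return None if none match."""
    for pattern, target in rules:
        elements = pattern.split(",")
        if len(features) < len(elements):
            continue
        if all(element_matches(p, f) for p, f in zip(elements, features)):
            # $1 refers to the first feature element, $2 to the second, and so on.
            expanded = re.sub(r"\$(\d+)", lambda m: features[int(m.group(1)) - 1], target)
            return expanded.split(",")
    return None


UNIGRAM_RULES = [
    ("*,*,*,*,*,*,*,*", "$1,$2,$3,$4,$5,$6,$7,$8"),
    ("*,*,*,*,*,*,*", "$1,$2,$3,$4,$5,$6,$7,*"),
]
LEFT_RULES = [
    ("(助詞|助動詞),*,*,*,*,*,(ない|無い)", "$1,$2,$3,$4,$5,$6,無い"),
]

noun = "名詞,一般,*,*,*,*,気圧,キアツ,キアツ".split(",")
print(rewrite(noun, UNIGRAM_RULES))
```

Because rules are tried from the top, more specific patterns need to appear before more general ones.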

feature.def

This file defines the templates used to extract CRF features from the internal-state feature sequences.

Each line corresponds to one template. Lines beginning with UNIGRAM are unigram templates, and lines beginning with BIGRAM are connection (bigram) templates.

Each template can use macros such as %F[n], %L[n], %R[n], and %t, as in the example below.

Example

UNIGRAM W0:%F[6]
UNIGRAM W1:%F[0]/%F[6]
UNIGRAM W2:%F[0],%F?[1]/%F[6]
UNIGRAM W3:%F[0],%F[1],%F?[2]/%F[6]
UNIGRAM W4:%F[0],%F[1],%F[2],%F?[3]/%F[6]

UNIGRAM T0:%t
UNIGRAM T1:%F[0]/%t
UNIGRAM T2:%F[0],%F?[1]/%t
UNIGRAM T3:%F[0],%F[1],%F?[2]/%t
UNIGRAM T4:%F[0],%F[1],%F[2],%F?[3]/%t

BIGRAM B00:%L[0]/%R[0]
BIGRAM B01:%L[0],%L?[1]/%R[0]
BIGRAM B02:%L[0]/%R[0],%R?[1]
BIGRAM B03:%L[0]/%R[0],%R[1],%R?[2]
BIGRAM B04:%L[0],%L?[1]/%R[0],%R[1],%R?[2]
BIGRAM B05:%L[0]/%R[0],%R[1],%R[2],%R?[3]
BIGRAM B06:%L[0],%L?[1]/%R[0],%R[1],%R[2],%R?[3]
... 
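To make the templates above more concrete, here is a rough sketch of how such a template could be expanded into a feature string. It rests on assumptions that the text does not spell out: that %F[n], %L[n], and %R[n] stand for element n of the unigram, left-context, and right-context features, and that the ? variant discards the whole template when that element is *. The function name expand is hypothetical, and %t is left untouched.

```python
import re

MACRO = re.compile(r"%([FLR])(\??)\[(\d+)\]")


def expand(template, unigram=(), left=(), right=()):
    """Expand %F[n] / %L[n] / %R[n]; return None if a '?' macro hits '*'."""
    sources = {"F": unigram, "L": left, "R": right}
    skipped = False

    def substitute(match):
        nonlocal skipped
        kind, optional, index = match.group(1), match.group(2), int(match.group(3))
        value = sources[kind][index]          # IndexError if the element does not exist
        if optional and value == "*":
            skipped = True                    # assumed meaning of the '?' variant
        return value

    result = MACRO.sub(substitute, template)
    return None if skipped else result


noun = "名詞,一般,*,*,*,*,気圧,キアツ,キアツ".split(",")
print(expand("W1:%F[0]/%F[6]", unigram=noun))
print(expand("W3:%F[0],%F[1],%F?[2]/%F[6]", unigram=noun))
```

Under these assumptions, the W0 to W4 templates produce progressively more specific features, and the ? variants keep a template from firing when the finer subcategory is unspecified.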

Preparing the training corpus

The training data is written in the same format as MeCab's default output.

太郎	名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
花子	名詞,固有名詞,人名,名,*,*,花子,ハナコ,ハナコ
が	助詞,格助詞,一般,*,*,*,が,ガ,ガ
好き	名詞,形容動詞語幹,*,*,*,*,好き,スキ,スキ
だ	助動詞,*,*,*,特殊・ダ,基本形,だ,ダ,ダ
．	記号,句点,*,*,*,*,．,．,．
EOS
焼酎	名詞,一般,*,*,*,*,焼酎,ショウチュウ,ショーチュー
好き	名詞,形容動詞語幹,*,*,*,*,好き,スキ,スキ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
おやじ	名詞,一般,*,*,*,*,おやじ,オヤジ,オヤジ
．	記号,句点,*,*,*,*,．,．,．
EOS
...

The first tab-delimited field is the surface string. It is followed by the feature array written as a CSV string. A line containing only EOS marks the end of each sentence.
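A corpus in this format is easy to check programmatically before training. The reader below is a minimal sketch with a hypothetical name, read_corpus; it assumes every token line contains a tab and that tokens after the last EOS line are incomplete and can be ignored.

```python
def read_corpus(lines):
    """Group 'surface<TAB>feature,feature,...' lines into sentences ended by EOS."""
    sentences, current = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "EOS":
            sentences.append(current)
            current = []
        elif line:
            surface, features = line.split("\t", 1)
            current.append((surface, features.split(",")))
    return sentences  # tokens after the last EOS are dropped in this sketch


SAMPLE = [
    "の\t助詞,連体化,*,*,*,*,の,ノ,ノ\n",
    "おやじ\t名詞,一般,*,*,*,*,おやじ,オヤジ,オヤジ\n",
    "EOS\n",
]

for sentence in read_corpus(SAMPLE):
    print([surface for surface, _ in sentence])
```

A check such as "every token has the same number of features as the dictionary entries" can be layered on top of this reader.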

Building the binary dictionary for training

Let WORK be the current working directory. Create two directories, seed and final, under WORK.

cd $WORK
mkdir seed final

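Copy the files described above into the seed directory: the seed dictionary CSV files, the configuration files (dicrc, char.def, unk.def, rewrite.def, feature.def), and the training corpus (the file named corpus in the listing below).

Example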

% cd $WORK/seed
% ls 
Adj.csv          Interjection.csv   Noun.name.csv    Noun.verbal.csv  Symbol.csv        rewrite.def
Adnominal.csv    Noun.adjv.csv      Noun.number.csv  Others.csv       Verb.csv          unk.def
Adverb.csv       Noun.adverbal.csv  Noun.org.csv     Postp-col.csv    char.def
Auxil.csv        Noun.csv           Noun.others.csv  Postp.csv        corpus
Conjunction.csv  Noun.demonst.csv   Noun.place.csv   Prefix.csv       dicrc
Filler.csv       Noun.nai.csv       Noun.proper.csv  Suffix.csv       feature.def

Run the following commands to build the binary dictionary for training.

% cd $WORK/seed
% /usr/local/libexec/mecab/mecab-dict-index


The -d and -o options can also be used, as follows.
% /usr/local/libexec/mecab/mecab-dict-index -d $WORK/seed -o $WORK/seed

Training the CRF parameters

% cd $WORK/seed
% /usr/local/libexec/mecab/mecab-cost-train -c 1.0 corpus model


The dictionary can also be specified with -d, as follows.
% /usr/local/libexec/mecab/mecab-cost-train -d $WORK/seed -c 1.0 $WORK/seed/corpus $WORK/seed/model

mecab-cost-train consumes a large amount of memory while building the binary model. Building the binary model in a separate process, as follows, keeps memory consumption down.

% /usr/local/libexec/mecab/mecab-cost-train -y -c 1.0 corpus model
% /usr/local/libexec/mecab/mecab-cost-train -b model.txt model

The hyperparameter C determines the "strength" of training. A larger C makes the model fit the training data as closely as possible, at the risk of overfitting. A smaller C guards against overfitting, at the risk of not learning enough. The only way to find an appropriate C is heuristically, through a model selection technique such as cross-validation. The default value is 1.0.

The -f option sets a feature frequency threshold. For example, -f 3 uses only features that occur at least 3 times in the training data. An appropriate threshold can only be found heuristically, through a model selection technique such as cross-validation.
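The effect of a frequency threshold can be pictured with a toy example. The snippet below only illustrates the idea of keeping features seen at least a given number of times; it is not how mecab-cost-train is implemented, and frequent_features and the feature strings are made up for the illustration.

```python
from collections import Counter


def frequent_features(feature_lists, min_freq):
    """Return the set of features that occur at least min_freq times overall."""
    counts = Counter(f for features in feature_lists for f in features)
    return {f for f, c in counts.items() if c >= min_freq}


# Hypothetical feature strings extracted from three training tokens.
observed = [
    ["W1:名詞/気圧", "T0:KANJI"],
    ["W1:名詞/焼酎", "T0:KANJI"],
    ["W1:名詞/気圧", "T0:KANJI"],
]

print(sorted(frequent_features(observed, 3)))
print(sorted(frequent_features(observed, 2)))
```

Raising the threshold shrinks the feature set, which is why its value trades model size against coverage and is best chosen by validation.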

During training, information like the following is printed.

reading corpus ... adding virtual node: 名詞,固有名詞,地域,一般,*,*,東日,トウニチ,トウニチ
adding virtual node: 副詞,助詞類接続,*,*,*,*,かなり,カナリ,カナリ

Number of sentences: 32
Number of features:  47547
eta:                 0.00010
freq:                1
C(sigma^2):          1.00000

iter=0 err=1.00000 F=0.41186 target=1691.68869 diff=1.00000
iter=1 err=1.00000 F=0.68727 target=1077.14848 diff=0.36327
iter=2 err=0.87500 F=0.81904 target=621.20311 diff=0.42329
iter=3 err=0.81250 F=0.86354 target=384.72432 diff=0.38068
iter=4 err=0.68750 F=0.93685 target=233.72722 diff=0.39248
..
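When comparing runs with different settings, it can be convenient to pull the per-iteration values out of this output. The parser below is a small sketch that assumes the iter=... line layout shown above; ITER_LINE and parse_progress are hypothetical names.

```python
import re

ITER_LINE = re.compile(
    r"iter=(\d+) err=([\d.]+) F=([\d.]+) target=([\d.]+) diff=([\d.]+)"
)


def parse_progress(log_text):
    """Return one dict per 'iter=...' line; other lines are ignored."""
    rows = []
    for line in log_text.splitlines():
        match = ITER_LINE.search(line)
        if match:
            iteration, err, f, target, diff = match.groups()
            rows.append({
                "iter": int(iteration),
                "err": float(err),
                "F": float(f),
                "target": float(target),
                "diff": float(diff),
            })
    return rows


LOG = """\
Number of sentences: 32
iter=0 err=1.00000 F=0.41186 target=1691.68869 diff=1.00000
iter=1 err=1.00000 F=0.68727 target=1077.14848 diff=0.36327
"""

for row in parse_progress(LOG):
    print(row["iter"], row["F"], row["target"])
```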

Building the distribution dictionary

% cd $WORK/seed
% /usr/local/libexec/mecab/mecab-dict-gen -o ../final -m model


The dictionary can also be specified with -d and -o, as follows.
% /usr/local/libexec/mecab/mecab-dict-gen -o $WORK/final -d $WORK/seed -m $WORK/seed/model

The distribution dictionary must be written to a directory different from the seed dictionary. Normally, the distribution dictionary directory final is archived and distributed to users.

Building the binary dictionary for analysis

% cd $WORK/final
% /usr/local/libexec/mecab/mecab-dict-index 


The -d and -o options can also be used, as follows.
% /usr/local/libexec/mecab/mecab-dict-index -d $WORK/final -o $WORK/final

Now try analyzing some text with the dictionary just built.

% mecab -d $WORK/final

焼酎好きのおやじ．

焼酎	名詞,一般,*,*,*,*,焼酎,ショウチュウ,ショーチュー
好き	名詞,形容動詞語幹,*,*,*,*,好き,スキ,スキ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
おやじ	名詞,一般,*,*,*,*,おやじ,オヤジ,オヤジ
．	記号,句点,*,*,*,*,．,．,．
EOS

Evaluation

Prepare test data. The test data is written in the same format as MeCab's default output.

First, use mecab-test-gen to extract only the sentences (test.sen) from the test corpus (test).

% /usr/local/libexec/mecab/mecab-test-gen < test > test.sen

Analyze test.sen with the dictionary built earlier.

% mecab -d $WORK/final test.sen > test.result

Run the evaluation script mecab-system-eval. The first argument is the system output, and the second argument is the file with the correct answers.

% /usr/local/libexec/mecab/mecab-system-eval test.result test
                    precision          recall              F
LEVEL 0:    98.6887(647112/655710) 98.9793(647112/653785) 98.8338
LEVEL 1:    98.2163(644014/655710) 98.5055(644014/653785) 98.3607
LEVEL 2:    97.2230(637501/655710) 97.5093(637501/653785) 97.3659
LEVEL 4:    96.8367(634968/655710) 97.1218(634968/653785) 96.9791

The -l option specifies which feature levels are used for evaluation.
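The figures in the table can be reproduced from the counts in parentheses. The sketch below assumes that the first denominator is the number of morphemes in the system output, the second is the number in the correct data, and F is the harmonic mean of precision and recall; precision_recall_f is a hypothetical helper, not part of mecab-system-eval.

```python
def precision_recall_f(correct, system_total, gold_total):
    """Return precision, recall, and F as percentages."""
    precision = correct / system_total
    recall = correct / gold_total
    f = 2 * precision * recall / (precision + recall)
    return 100 * precision, 100 * recall, 100 * f


# Counts taken from the LEVEL 0 row above.
p, r, f = precision_recall_f(647112, 655710, 653785)
print(f"{p:.4f} {r:.4f} {f:.4f}")
```

Running the same function on the other rows is a simple way to confirm how a row of the evaluation output should be read.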

