$Id: dic-detail.html 65 2007-01-30 00:52:53Z taku-ku $;
Understanding the structure of the word dictionary lets you use MeCab as a general-purpose text conversion tool. For example, hiragana-to-katakana conversion, romaji-to-hiragana conversion, and Auto Link can all be done with MeCab alone.
To build a word dictionary, you need to create at least the following files.
dic.csv: the word dictionary.
Entries are added as CSV lines like the following.
test,1223,1223,6058,foo,bar,baz
The first four columns are mandatory: the surface form, the left context ID, the right context ID, and the cost. The cost must fit within the range of a short int (16-bit integer).
The fifth and later columns are user-defined CSV fields. In principle you can add any content, as long as it is valid CSV.
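As an illustration of the entry layout described above, here is a minimal Python sketch that splits one line into the four mandatory columns and the user-defined remainder, and checks the 16-bit cost range. The function name `parse_entry` and the dictionary keys are hypothetical, and Python's `csv` module is only assumed to be close enough to the CSV dialect MeCab accepts.

```python
import csv
import io

# Range of a signed 16-bit integer (short int).
SHORT_MIN, SHORT_MAX = -32768, 32767


def parse_entry(line):
    """Split one dictionary line into mandatory columns and user-defined fields.

    Hypothetical helper for illustration; not part of MeCab.
    """
    row = next(csv.reader(io.StringIO(line)))
    if len(row) < 4:
        raise ValueError("an entry needs at least 4 columns")
    left_id, right_id, cost = (int(value) for value in row[1:4])
    if not SHORT_MIN <= cost <= SHORT_MAX:
        raise ValueError(f"cost {cost} does not fit in a 16-bit integer")
    return {
        "surface": row[0],
        "left_id": left_id,
        "right_id": right_id,
        "cost": cost,
        "features": row[4:],  # columns 5 and later are free-form
    }


entry = parse_entry("test,1223,1223,6058,foo,bar,baz")
print(entry)
```

A check like this can catch out-of-range costs before the dictionary is compiled.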
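matrix.def: the connection matrix. The first line gives the size of the matrix (preceding-context size, following-context size). Each subsequent line gives a preceding context ID, a following context ID, and the cost for that pair.
When a word A is followed by a word B, the right context ID of A is used as the preceding context ID and the left context ID of B is used as the following context ID. In other words, the IDs registered in the word dictionary are the keys for looking up the connection matrix. The cost must fit within the range of a short int (16-bit integer).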
100 120
0 0 1
0 1 10
0 2 5
In the example above, the preceding-context size is 100 and the following-context size is 120. The transition cost from preceding context 0 to following context 1 is 10.
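The lookup described above can be sketched in Python as follows. This is only an illustration of the file layout, not MeCab's own loader; the names `parse_matrix` and `connection_cost` are hypothetical.

```python
MATRIX_DEF = """\
100 120
0 0 1
0 1 10
0 2 5
"""


def parse_matrix(text):
    """Read the size line and the (preceding ID, following ID, cost) triples."""
    lines = [line.split() for line in text.strip().splitlines()]
    size = (int(lines[0][0]), int(lines[0][1]))
    costs = {}
    for prev_id, next_id, cost in lines[1:]:
        costs[(int(prev_id), int(next_id))] = int(cost)
    return size, costs


def connection_cost(costs, right_id_of_a, left_id_of_b):
    """Cost of word A followed by word B, keyed by A's right ID and B's left ID."""
    return costs[(right_id_of_a, left_id_of_b)]


size, costs = parse_matrix(MATRIX_DEF)
print(size)                          # (100, 120)
print(connection_cost(costs, 0, 1))  # 10
```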
char.def: the rules for unknown-word processing. See the unknown-word processing documentation for details.
The following is the minimal configuration (DEFAULT and SPACE).
DEFAULT 1 0 0
SPACE 0 1 0
0x0020 SPACE
unk.def: the part-of-speech table for unknown words. See the unknown-word processing documentation for details.
The following is the minimal configuration (DEFAULT and SPACE).
DEFAULT,0,0,0,*
SPACE,0,0,0,*
Running the following command creates the binary dictionary used for analysis.
% /usr/local/libexec/mecab/mecab-dict-index
The example directory contains several sample applications.
Let's implement Auto Link in the style of Hatena Keyword.
dic.csv: write the keyword as the word and the URL corresponding to the keyword as the part of speech. A single connection state is sufficient, so set both the left and right context IDs to 0. Set the cost so that longer keywords are preferred. For example, a function like the following works well.
cost = (int)max(-36000, -400 * (length^1.5))
url.csv
Google,0,0,-5878,http://www.google.com/
Yahoo,0,0,-4472,http://www.yahoo.com/
ChaSen,0,0,-5878,http://chasen.org/
京都,0,0,-3200,http://www.city.kyoto.jp/
...
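The sample costs are consistent with a negative coefficient and with the length measured in bytes of the EUC-JP encoded keyword (the two-character Japanese entry has cost -3200, which matches a length of 4). The Python sketch below reproduces them under that assumption; `keyword_cost` is a hypothetical name, and the byte-length interpretation is inferred from the sample values rather than stated in the text.

```python
def keyword_cost(keyword, encoding="euc-jp"):
    """Cost that favors longer keywords: more bytes give a lower (better) cost.

    Assumes the length is the byte length in the dictionary encoding.
    """
    length = len(keyword.encode(encoding))
    # int() truncates toward zero, like the (int) cast in the formula.
    return int(max(-36000, -400 * length ** 1.5))


for word in ["Google", "Yahoo", "ChaSen", "京都"]:
    print(word, keyword_cost(word))
```

Because the cost decreases as the keyword gets longer, a long keyword is preferred over a sequence of shorter ones covering the same text.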
matrix.def: with a single state, the connection matrix is 1 x 1. The connection cost from preceding context 0 to following context 0 is set to 0.
1 1
0 0 0
char.def: the minimal configuration (DEFAULT and SPACE).
DEFAULT 1 0 0
SPACE 0 1 0
0x0020 SPACE
unk.def: the minimal configuration (DEFAULT and SPACE).
DEFAULT,0,0,0,*
SPACE,0,0,0,*
dicrc: define an output format named autolink and make it the default output.
cost-factor = 800
bos-feature = BOS/EOS
output-format-type=autolink
node-format-autolink = <a href="%H">%M</a>
unk-format-autolink = %M
eos-format-autolink = \n
% /usr/local/libexec/mecab/mecab-dict-index -f euc-jp -c euc-jp
reading ./unk.def .. 2
emitting double-array: 100% |###########################################|
reading ./dic.csv .. 4
emitting double-array: 100% |###########################################|
emitting matrix : 100% |###########################################
done!
% mecab -d .
京都に行った.
<a href="http://www.city.kyoto.jp/">京都</a>に行った.
YahooとGoogle
<a href="http://www.yahoo.com/">Yahoo</a>と<a href="http://www.google.com/">Google</a>
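For hiragana-to-katakana conversion, dic.csv contains a single hiragana character as the word and the corresponding single katakana character as the part of speech. A single connection state is sufficient, so set both the left and right context IDs to 0. Since there is no ambiguity, set the cost to 0.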
う゛,0,0,0,ヴ
あ,0,0,0,ア
い,0,0,0,イ
う,0,0,0,ウ
え,0,0,0,エ
お,0,0,0,オ
ぁ,0,0,0,ァ
ぃ,0,0,0,ィ
ぅ,0,0,0,ゥ
ぇ,0,0,0,ェ
ぉ,0,0,0,ォ
か,0,0,0,カ
き,0,0,0,キ
く,0,0,0,ク
け,0,0,0,ケ
こ,0,0,0,コ
が,0,0,0,ガ
ぎ,0,0,0,ギ
ぐ,0,0,0,グ
げ,0,0,0,ゲ
ご,0,0,0,ゴ
さ,0,0,0,サ
し,0,0,0,シ
す,0,0,0,ス
せ,0,0,0,セ
そ,0,0,0,ソ
ざ,0,0,0,ザ
じ,0,0,0,ジ
ず,0,0,0,ズ
ぜ,0,0,0,ゼ
ぞ,0,0,0,ゾ
た,0,0,0,タ
ち,0,0,0,チ
つ,0,0,0,ツ
て,0,0,0,テ
と,0,0,0,ト
だ,0,0,0,ダ
ぢ,0,0,0,ヂ
づ,0,0,0,ヅ
で,0,0,0,デ
ど,0,0,0,ド
っ,0,0,0,ッ
な,0,0,0,ナ
に,0,0,0,ニ
ぬ,0,0,0,ヌ
ね,0,0,0,ネ
の,0,0,0,ノ
は,0,0,0,ハ
ひ,0,0,0,ヒ
ふ,0,0,0,フ
へ,0,0,0,ヘ
ほ,0,0,0,ホ
ば,0,0,0,バ
び,0,0,0,ビ
ぶ,0,0,0,ブ
べ,0,0,0,ベ
ぼ,0,0,0,ボ
ぱ,0,0,0,パ
ぴ,0,0,0,ピ
ぷ,0,0,0,プ
ぺ,0,0,0,ペ
ぽ,0,0,0,ポ
ま,0,0,0,マ
み,0,0,0,ミ
む,0,0,0,ム
め,0,0,0,メ
も,0,0,0,モ
ゃ,0,0,0,ャ
や,0,0,0,ヤ
ゅ,0,0,0,ュ
ゆ,0,0,0,ユ
ょ,0,0,0,ョ
よ,0,0,0,ヨ
ら,0,0,0,ラ
り,0,0,0,リ
る,0,0,0,ル
れ,0,0,0,レ
ろ,0,0,0,ロ
ゎ,0,0,0,ヮ
わ,0,0,0,ワ
ゐ,0,0,0,ヰ
ゑ,0,0,0,ヱ
を,0,0,0,ヲ
ん,0,0,0,ン
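Rows of this shape can also be generated instead of typed by hand. The Python sketch below relies on the fact that, in Unicode, each katakana character from ァ to ン sits exactly 0x60 code points above its hiragana counterpart (ぁ to ん). The function name `hiragana_rows` is hypothetical, rows come out in Unicode order, and a two-character sequence such as う゛ is not covered and would need its own row.

```python
def hiragana_rows():
    """Build 'hiragana,0,0,0,katakana' rows for ぁ (U+3041) through ん (U+3093)."""
    rows = []
    for code in range(0x3041, 0x3094):
        hiragana = chr(code)
        katakana = chr(code + 0x60)  # katakana block offset
        rows.append(f"{hiragana},0,0,0,{katakana}")
    return rows


rows = hiragana_rows()
print(len(rows))  # 83
print(rows[1])    # あ,0,0,0,ア
```

The resulting lines would still have to be written out in the dictionary's encoding (euc-jp in this example) before compiling.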
matrix.def: with a single state, the connection matrix is 1 x 1. The connection cost from preceding context 0 to following context 0 is set to 0.
1 1
0 0 0
char.def: the minimal configuration (DEFAULT and SPACE).
DEFAULT 1 0 0
SPACE 0 1 0
0x0020 SPACE
unk.def: the minimal configuration (DEFAULT and SPACE).
DEFAULT,0,0,0,*
SPACE,0,0,0,*
dicrc: define an output format named katakana and make it the default output.
dictionary-charset = euc-jp
cost-factor = 800
bos-feature = BOS/EOS
output-format-type=katakana
node-format-katakana = %H
unk-format-katakana = %M
eos-format-katakana = \n
% /usr/local/libexec/mecab/mecab-dict-index -f euc-jp -c euc-jp
reading ./unk.def .. 2
emitting double-array: 100% |###########################################|
reading ./dic.csv .. 4
emitting double-array: 100% |###########################################|
emitting matrix : 100% |###########################################
done!
% mecab -d .
これはてすとです
コレハテストデス