Zhu Xiaomin
Step One: Download GATE 5.1 from https://sourceforge.net/projects/gate/files/gate/5.1/gate-5.1-build3431-installer-win.exe/download
Step Two: Install the program.
Step Three: Go to C:/Program files/GATE-5.1/plugins/Lang_Chinese/resources/models/model-paum-pku-utf8.zip and unzip this file to a location you will remember; you will need it for the modelURL parameter in Step Four.
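For readers who prefer a script to a file manager, Step Three can be sketched in Python with the standard zipfile module. The archive path is the one given above; the destination folder C:/GATE-models is a hypothetical example, so substitute any location you will remember.

```python
import zipfile
from pathlib import Path

# Archive location from Step Three; adjust if GATE is installed elsewhere.
MODEL_ZIP = Path("C:/Program files/GATE-5.1/plugins/Lang_Chinese/resources/models/model-paum-pku-utf8.zip")
# Hypothetical destination folder: pick any location you will remember.
DEST_DIR = Path("C:/GATE-models")


def unzip_model(zip_path: Path, dest_dir: Path) -> list[str]:
    """Extract every entry of zip_path into dest_dir and return the entry names."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


if __name__ == "__main__":
    if MODEL_ZIP.exists():
        names = unzip_model(MODEL_ZIP, DEST_DIR)
        print(f"Extracted {len(names)} entries to {DEST_DIR}")
    else:
        print(f"Model archive not found: {MODEL_ZIP}")
```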
Step Four:
1. Start GATE
2. File, open Manage CREOLE Plugins
3. Find Lang_Chinese and click in the box under “Load Now”, OK
4. Processing Resources (right-click), type in Chinese Segmenter, OK
5. Language Resources (right-click), New, GATE Corpus, name the corpus (for example, Chinesecorpus), OK
6. Right-click Chinesecorpus, Populate, browse to the folder that contains the corpus so that its path appears in Directory URL, click the pencil symbol, type txt, Add, OK, type utf-8 in the Encoding field, OK
7. Right-click Applications, select Corpus Pipeline, OK
8. Double-click Corpus Pipeline, then double-click Chinese Segmenter in Loaded Processing Resources so that it moves into Selected Processing Resources. Make sure the following parameters are set:
learningAlg = PAUM
learningMode = SEGMENTING
modelURL = model-paum-pku-utf8 (the location where you unzipped the file in Step Three)
textCode = utf8
textFilesURL = (browse to the corpus folder)
9. Click Run this Application. Depending on the size of the corpus, this can take some time (approximately 5 minutes for 40 texts).
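Items 6 and 8 rely on two things: the corpus folder holding .txt files saved as UTF-8, and modelURL and textFilesURL pointing at the right folders. The Python sketch below is one way to check both before running the pipeline. The folder names are hypothetical, and it assumes GATE accepts ordinary file: URLs for these parameters (its dialog may display the shorter file:/C:/... form).

```python
from pathlib import Path

# Hypothetical folders; replace them with your own.
MODEL_DIR = Path("C:/GATE-models/model-paum-pku-utf8")
CORPUS_DIR = Path("C:/Chinesecorpus")


def non_utf8_files(corpus_dir: Path) -> list[Path]:
    """Return the .txt files in corpus_dir whose bytes are not valid UTF-8."""
    bad = []
    for path in sorted(corpus_dir.glob("*.txt")):
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return bad


def to_file_url(path: Path) -> str:
    """Turn a local path into an absolute file: URL."""
    return path.resolve().as_uri()


if __name__ == "__main__":
    if CORPUS_DIR.is_dir():
        for path in non_utf8_files(CORPUS_DIR):
            print(f"Not UTF-8, re-save before populating: {path.name}")
    print("modelURL     =", to_file_url(MODEL_DIR))
    print("textFilesURL =", to_file_url(CORPUS_DIR))
```

A file saved as GBK, for instance, would be reported here and should be re-saved as UTF-8 before Populate.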
(Provided by Zhu Xiaomin on June 11, 2010)
GATE was developed by Hamish Cunningham et al. [The University of Sheffield (http://gate.ac.uk/)] (2001-2010).
Step One:
Install Java (JRE) on your computer. You can download Java from http://sdlc-esd.sun.com/ESD6/JSCDL/jre/6u18-b79/jxpiinstall.exe?AuthParam=1269156422_b6361febd3fd5bf0c616837bde692629&GroupName=JSC&FilePath=/ESD6/JSCDL/jre/6u18-b79/jxpiinstall.exe&File=jxpiinstall.exe&BHost=javadl.sun.com
Check your Java version:
1. Click Start
2. Type cmd and press Enter
3. This opens the command prompt window
4. Type java -version and press Enter
5. You will get a message like: java version “
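The same check can be scripted. This Python sketch runs java -version and pulls out the quoted version string; note that java prints this banner to standard error, not standard output. The helper names are illustrative and not part of any Java tool.

```python
import re
import subprocess
from typing import Optional


def parse_java_version(output: str) -> Optional[str]:
    """Return the quoted version string from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def installed_java_version() -> Optional[str]:
    """Run `java -version`; return the reported version, or None if java cannot be run."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except OSError:  # e.g. java is not installed or not on the PATH
        return None
    # java -version writes its banner to stderr, so check both streams.
    return parse_java_version(result.stderr + result.stdout)


if __name__ == "__main__":
    version = installed_java_version()
    print(f"Java version: {version}" if version else "Java was not found on the PATH")
```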
Step Two
Download the Stanford Postagger from http://nlp.stanford.edu/software/stanford-postagger-full-2010-05-26.tgz
Step Three
Unzip the file to a location of your choice using archive manager software such as WinRAR, 7-Zip, or WinZip.
You might want to rename the unzipped folder to stanTagger. I do this because the original name is too long: stanford-postagger-full-
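Step Three can also be done without an archive manager. This Python sketch uses the standard tarfile module to unpack the .tgz and rename its single top-level folder to stanTagger. The archive name comes from the Step Two download link; the destination C:/tools is a hypothetical example.

```python
import tarfile
from pathlib import Path

# Archive name as it appears in the Step Two link; the destination is hypothetical.
ARCHIVE = Path("stanford-postagger-full-2010-05-26.tgz")
DEST_DIR = Path("C:/tools")


def unpack_and_rename(archive: Path, dest_dir: Path, new_name: str = "stanTagger") -> Path:
    """Extract a .tgz archive into dest_dir and rename its top-level folder to new_name."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # The first path component of each member is the archive's top-level folder.
        top_levels = {Path(n).parts[0] for n in tar.getnames() if Path(n).parts}
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")  # safer extraction where supported
        else:
            tar.extractall(dest_dir)
    if len(top_levels) != 1:
        raise ValueError(f"expected one top-level folder, found {sorted(top_levels)}")
    target = dest_dir / new_name
    (dest_dir / top_levels.pop()).rename(target)
    return target


if __name__ == "__main__":
    if ARCHIVE.exists():
        print("Unpacked to", unpack_and_rename(ARCHIVE, DEST_DIR))
    else:
        print(f"Archive not found: {ARCHIVE}")
```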
Step Four
In the stanTagger folder, create two folders to hold your files, e.g. myCorpus and myTaggedCorpus. Then put some text files (or your corpus) in myCorpus. Make sure there are no spaces in your file names: for example, use writtenArgument.txt instead of written Argument.txt.
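A small Python sketch of Step Four: it creates myCorpus and myTaggedCorpus and removes spaces from .txt file names, turning written Argument.txt into writtenArgument.txt. The stanTagger location is hypothetical, and a file whose space-free name already exists is skipped rather than overwritten.

```python
from pathlib import Path


def prepare_folders(tagger_dir: Path) -> tuple[Path, Path]:
    """Create myCorpus and myTaggedCorpus inside tagger_dir (no error if they exist)."""
    corpus = tagger_dir / "myCorpus"
    tagged = tagger_dir / "myTaggedCorpus"
    corpus.mkdir(parents=True, exist_ok=True)
    tagged.mkdir(parents=True, exist_ok=True)
    return corpus, tagged


def remove_spaces_from_names(corpus: Path) -> list[tuple[str, str]]:
    """Rename 'written Argument.txt' to 'writtenArgument.txt' etc.; return (old, new) pairs."""
    renamed = []
    for path in sorted(corpus.glob("*.txt")):  # sorted() lists files before any rename
        if " " in path.name:
            target = path.with_name(path.name.replace(" ", ""))
            if target.exists():
                continue  # never overwrite; rename this one by hand
            path.rename(target)
            renamed.append((path.name, target.name))
    return renamed


if __name__ == "__main__":
    TAGGER_DIR = Path("C:/tools/stanTagger")  # hypothetical location of the renamed folder
    if TAGGER_DIR.is_dir():
        corpus, _ = prepare_folders(TAGGER_DIR)
        for old, new in remove_spaces_from_names(corpus):
            print(f"{old} -> {new}")
```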
Step Five
1. Open the command prompt window as described in Step One
2. Go to the folder that contains the Stanford Tagger. This is how you do it:
cd [place where you unzipped the Stanford Postagger]\stanTagger
3. Run the program in the command prompt window:
For tagging one segmented Chinese text:
java -mx
For tagging more than one segmented Chinese text:
FOR %a IN ([place where the Stanford Postagger is unzipped]\stanTagger\myCorpus\*.txt) DO java -mx
4. After typing the command above, press Enter
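The FOR loop in item 3 can be mirrored in Python, which also lets you collect each result in myTaggedCorpus. Because the full java command depends on your tagger setup, this sketch takes it as a list you supply, with "{input}" marking where each file path goes (a hypothetical convention), and saves the command's standard output under the same file name. That output layout and the stanTagger location are assumptions for illustration.

```python
import subprocess
from pathlib import Path
from typing import Optional


def tag_corpus(corpus: Path, out_dir: Path, command: list[str],
               cwd: Optional[Path] = None) -> list[Path]:
    """Run `command` once per .txt file in corpus, like the FOR loop above.

    Each "{input}" element of command is replaced by the file's absolute path,
    and the command's stdout is saved under the same file name in out_dir.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(corpus.glob("*.txt")):
        args = [str(src.resolve()) if part == "{input}" else part for part in command]
        # check=True stops the loop if the command fails for a file.
        result = subprocess.run(args, capture_output=True, check=True, cwd=cwd)
        target = out_dir / src.name
        # Keep the raw bytes so the tagger's output encoding is not altered.
        target.write_bytes(result.stdout)
        written.append(target)
    return written


if __name__ == "__main__":
    TAGGER_DIR = Path("C:/tools/stanTagger")  # hypothetical location of stanTagger
    # Fill in your single-text java command from item 3 as a list of arguments,
    # using "{input}" where the file name goes. Left empty here on purpose.
    COMMAND: list[str] = []
    if COMMAND and TAGGER_DIR.is_dir():
        results = tag_corpus(TAGGER_DIR / "myCorpus", TAGGER_DIR / "myTaggedCorpus",
                             COMMAND, cwd=TAGGER_DIR)
        for path in results:
            print("Tagged:", path.name)
```

Passing cwd=TAGGER_DIR plays the role of the cd command in item 2, so relative paths in your java command resolve against stanTagger.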
(June 11, 2010)