Using PowerGrep and ParaConc to Process English and Chinese Texts
Zhu Yubin
I. Instructions for the cleaning-up and tagging of English texts
II. Instructions for the cleaning-up and tagging of Chinese texts
I. Instructions for the cleaning-up and tagging of English texts
1. Put the cursor in the search box of PowerGrep and press the Enter key four times; four Άs will appear in a vertical line in the search box. In the replacement box, put
</P>Ά
<P>
(In this way, every paragraph ending is tagged with </P> and the new paragraph that follows it is tagged with <P>.)
2. If a line in the texts is separated from the following line by two Άs, put two Άs in the search box, and then put one space (press the space bar once) in the replacement box. (All the lines that were unnecessarily separated within one paragraph will be joined together.)
3. If all the sentences in the English texts are separated by two spaces, put two spaces (press the space bar twice) in the search box, and then put </S><S> in the replacement box.
4. In the search box, put </P> while in the replacement box, put </S></P>.
5. In the search box, put <P> while in the replacement box, put <P><S>.
With the above five steps, all the sentences in the texts will begin with <S> and end with </S>, and all the paragraphs start with <P> and finish with </P>.
6. Check the beginning and the end of the text and correct any mistaken tags.
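The six steps above can be sketched outside PowerGrep as plain string replacements. This is only an illustrative sketch, not part of the PowerGrep procedure: the function name `tag_english` is hypothetical, each Ά is assumed to stand for one line break, and step 6 is approximated by wrapping the whole text in an opening and a closing tag pair.

```python
def tag_english(text: str) -> str:
    """Apply steps 1-5 to a plain English text and patch both ends (step 6)."""
    text = text.strip("\n")
    # Step 1: four consecutive line breaks mark a paragraph boundary.
    text = text.replace("\n\n\n\n", "</P>\n<P>")
    # Step 2: two consecutive line breaks inside a paragraph become one space.
    text = text.replace("\n\n", " ")
    # Step 3: two spaces between sentences become a sentence boundary.
    text = text.replace("  ", "</S><S>")
    # Step 4: close the last sentence of every paragraph.
    text = text.replace("</P>", "</S></P>")
    # Step 5: open the first sentence of every paragraph.
    text = text.replace("<P>", "<P><S>")
    # Step 6 (assumption): the first paragraph has no opening tags and the
    # last one has no closing tags, so add them here.
    return "<P><S>" + text + "</S></P>"


sample = "First sentence.  Second\n\nsentence continues.\n\n\n\nNew paragraph here."
print(tag_english(sample))
```

The order of the replacements matters: the paragraph breaks (four line breaks) must be handled before the in-paragraph breaks (two line breaks), otherwise every paragraph boundary would be turned into two spaces.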
II. Instructions for the cleaning-up and tagging of Chinese texts
1. Put
Ά
(Press the Enter key once and the space bar four times.)
in the search box, and then put
</P>Ά
Ά
<P>
in the replacement box.
2. Put
^[\w|,]
in the search box, and put
</S><S>.
in the replacement box.
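(In PowerGrep's regular expressions, ^ stands for exclusion when it is the first character inside square brackets, as in [^\w,], while | stands for alternation when it is used outside square brackets.)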
3.1 Put
。^[]
in the search box, and put
。</S><S>
in the replacement box.
3.2 Put
!^[]
in the search box, and put
!</S><S>
in the replacement box.
3.3 Put
?^[]
in the search box, and put
?</S><S>
in the replacement box.
4. Replace <P> with <P><S>.
5. Replace </P> with </S></P>.
6. Use TextPro to combine separate lines in paragraphs.
Note: In the above instructions, a blank stands for one space, that is, press the space bar once. TextPro is provided by
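The Chinese procedure can be approximated in the same way. The sketch below is an assumption-laden illustration rather than the PowerGrep patterns themselves: the name `tag_chinese` and the pattern `SENT_END` are hypothetical, each Ά is taken to be one line break, the exclusion after 。, ! and ? is assumed to mean "not followed by a closing quotation mark", full-width ！ and ？ are accepted as well, and lines are joined with `re.sub` instead of TextPro. Step 2 is left out because its pattern cannot be recovered with certainty.

```python
import re

# Sentence-final punctuation, unless a closing quotation mark, a paragraph
# end tag or the end of the text follows (the quotation marks are an assumption).
SENT_END = re.compile(r"([。！？!?])(?![”’」』]|</P>|$)")


def tag_chinese(text: str) -> str:
    """Tag paragraphs and sentences in a Chinese text indented by four spaces."""
    text = text.strip()
    # Step 6, done first here: join lines broken inside a paragraph, i.e.
    # line breaks that are not followed by a four-space indent.
    text = re.sub(r"\n(?! {4})", "", text)
    # Step 1: a line break plus four spaces starts a new paragraph.
    text = text.replace("\n    ", "</P>\n\n<P>")
    # Steps 3.1-3.3: close the sentence after 。 ! ? and open the next one.
    text = SENT_END.sub(r"\1</S><S>", text)
    # Steps 4 and 5: open and close the outer sentences of each paragraph.
    text = text.replace("<P>", "<P><S>")
    text = text.replace("</P>", "</S></P>")
    # The first and last paragraphs still need their outer tags.
    return "<P><S>" + text + "</S></P>"


raw = "    今天天气很好。我们去\n公园吧！\n    你去吗？我去。"
print(tag_chinese(raw))
```

Joining the lines before tagging keeps the line breaks that step 1 writes between </P> and <P> from being removed again.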
With the above steps, all the sentences and paragraphs are delimited with tags. If we want to use ParaConc to align and search the parallel texts, the following steps are necessary.
1. Change </S><S> to
</S>Ά
Ά<S>
2. Change <P><S> to
<P>Ά
Ά<S>
3. Change </S></P> to
</S>Ά
Ά</P>
4. Check the alignment by hand; human intervention is necessary for correct alignment.
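Steps 1 to 3 of the ParaConc preparation amount to three literal replacements. The following sketch assumes that each Ά is one line break, so that every tag pair is split by two line breaks; the function name `split_tags_for_paraconc` is hypothetical and is not a ParaConc or PowerGrep feature.

```python
def split_tags_for_paraconc(tagged: str) -> str:
    """Separate adjacent tags with two line breaks, as in steps 1-3."""
    # Step 1: sentence boundary.
    tagged = tagged.replace("</S><S>", "</S>\n\n<S>")
    # Step 2: paragraph start.
    tagged = tagged.replace("<P><S>", "<P>\n\n<S>")
    # Step 3: paragraph end.
    tagged = tagged.replace("</S></P>", "</S>\n\n</P>")
    return tagged


print(split_tags_for_paraconc("<P><S>A.</S><S>B.</S></P>"))
```

The manual check in step 4 is still needed afterwards, since the replacements only reposition the tags and do not verify that source and target sentences correspond.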