Radical Count is a program built on `ucount`. Given one or more files and a Kangxi radical expression, it counts the occurrences of characters with the corresponding radical in the input file(s).
Radical Count is a program built on `ucount`. Given one or more files and a Kangxi radical expression, it counts the occurrences of characters with the corresponding radical in the input file(s). If you are using the index mode, please refer to the "No." column of this [list](https://www.yellowbridge.com/chinese/radicals.php) for reference.
For more information on the Kangxi Radical System, please visit: https://en.wikipedia.org/wiki/Kangxi_radical or https://www.yellowbridge.com/chinese/radicals.php
## **Installation**
To build Radical Count, type `make radicalcount` into the terminal.
To build Radical Count, run `make radicalcount` in the `build` directory.
## **How to Run Radical Count**
Go into the bin directory and run the following command to use Radical Count.
./radicalcount <Radical Expression> <Path(s) of Input File(s)>
./radicalcount [OPTIONS] <Radical Expression> <Path(s) of Input File(s)>
For sample testcases, please refer to [radicaltest.xml](https://cs-git-research.cs.surrey.sfu.ca/cameron/parabix-devel/blob/delta-radicalgrep/QA/radicaltest/radicaltest.xml).
## Example 1: Single Radical Input
...
...
@@ -22,7 +24,7 @@ To build Radical Count, type `make radicalcount` into the terminal.
## Example 2: Double Radical Input
Input: 氵_子_ ../QA/radicaltest/testfiles/test1
Input: 氵_子_ ../../QA/radicaltest/testfiles/test1
Output: 氵: 3 ../../QA/radicaltest/testfiles/test1
子: 4 ../../QA/radicaltest/testfiles/test1
...
...
@@ -45,7 +47,7 @@ To build Radical Count, type `make radicalcount` into the terminal.
As mentioned before, it is possible to have more than one radical with the same index. In this example, 火 and 灬 are both radical 86 in the Kangxi dictionary. Similar cases include 氵and 水, as well as 忄and 心. These just get counted together.
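
The "counted together" behaviour can be illustrated with a minimal sketch. This is not the Radical Count implementation: `decode_utf8` and `count_members` are hypothetical helpers, and the two-character set stands in for the full Unicode set of radical 86. It assumes well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper: decodes UTF-8 text into code points.
// Assumes well-formed input; no validation is performed.
inline std::u32string decode_utf8(const std::string& text) {
    std::u32string out;
    std::size_t i = 0;
    while (i < text.size()) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        char32_t cp = 0;
        std::size_t len = 1;
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        // Fold in the continuation bytes (6 payload bits each).
        for (std::size_t k = 1; k < len && i + k < text.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(text[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical helper: counts the characters of `text` that belong to one
// radical's set. Characters written with 火 and with 灬 share a single set,
// so they contribute to the same total.
inline std::size_t count_members(const std::string& text,
                                 const std::set<char32_t>& radical_set) {
    std::size_t count = 0;
    for (char32_t cp : decode_utf8(text)) {
        if (radical_set.count(cp) != 0) ++count;
    }
    return count;
}
```

With a sample set containing only 灯 and 点, calling `count_members("灯点水", sample_set)` counts both characters under the one radical and skips 水.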
Radical Grep is a tool built off of icgrep. It searches for the given Chinese radicals, and returns the sentence(s) that correspond with the input. Note that radicals will be processed according to the Kangxi Radical-Stroke indices.
Radical Grep is a tool built off of icgrep. It searches for the given Chinese radicals, and returns the sentence(s) that correspond with the input. Note that radicals will be processed according to the Kangxi Radical-Stroke system. If you are using the index mode, please refer to the "No." column of this [list](https://www.yellowbridge.com/chinese/radicals.php) for reference.
## **Introduction**
The 214 Kangxi radicals are sorted in increasing order by stroke count. The system was originally introduced in 1615, and many modern Chinese dictionaries still use it. In our program, we used the Unihan `kRSKangxi` property to generate Unicode sets for all 214 radicals. One important point to note is that some radicals share the same index. For instance, 火 and 灬 are both the 86th radical of the dictionary and map to the same Unicode set. Thus, 灯 and 点 are characters in the same set.
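
As a rough illustration of radicals sharing an index, the sketch below maps a few radical characters to their Kangxi indices. The table and the function names `radical_index_table` and `radical_index` are hypothetical and cover only the examples mentioned in this documentation; the project's real table is generated from the Unihan data.

```cpp
#include <map>
#include <string>

// Hypothetical sample table (not the project's actual table): radical
// characters, stored as UTF-8 strings, mapped to their Kangxi index.
inline const std::map<std::string, int>& radical_index_table() {
    static const std::map<std::string, int> table = {
        {"水", 85}, {"氵", 85},  // two written forms, one index
        {"火", 86}, {"灬", 86},
    };
    return table;
}

// Returns the Kangxi index of a radical, or -1 if it is not in this sample.
inline int radical_index(const std::string& radical) {
    const auto& table = radical_index_table();
    const auto it = table.find(radical);
    return it == table.end() ? -1 : it->second;
}
```

Because both forms resolve to the same index, a lookup for 灬 selects the same Unicode set as a lookup for 火.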
...
...
@@ -10,15 +9,13 @@ For more information on the Kangxi Radical System, please visit: https://en.wiki
## **Installation**
To build radical grep, the working environment needs to have all requirements of the icgrep build met. This can be done with the `make` command on the terminal.
To build only Radical Grep and its dependencies, run `make radicalgrep`.
To build only Radical Grep and its dependencies, run `make radicalgrep` in the `build` directory.
## **How to Run Radical Grep**
To run Radical Grep, run the following commands in the bin directory:
To run Radical Grep, run the following commands in the build/bin directory:
./radicalgrep <Radical Expression> <Path of Input File>
./radicalgrep [OPTIONS] <Radical Expression> <Path(s) of Input File>
For sample testcases, please refer to [radicaltest.xml](https://cs-git-research.cs.surrey.sfu.ca/cameron/parabix-devel/blob/delta-radicalgrep/QA/radicaltest/radicaltest.xml).
...
...
@@ -43,7 +40,7 @@ In the first iteration, Radical Grep takes in pre-programmed inputs and returns
In the second iteration, Radical Grep can take Kangxi radical character(s) as input (e.g. 子_ or 氵_子_). It returns the sentence with the corresponding radicals marked in red text. Iteration 2 of Radical Grep can be run using the same input format as iteration 1.
Another program, `Radical Count`, was implemented in this iteration. The program and relevant documentation can be found [here](https://cs-git-research.cs.surrey.sfu.ca/cameron/parabix-devel/blob/delta-radicalgrep/README-radicalcount.md).
Another program, `Radical Count`, was implemented in this iteration. The documentation can be found [here](https://cs-git-research.cs.surrey.sfu.ca/cameron/parabix-devel/blob/delta-radicalgrep/README-radicalcount.md) and the program can be found in parabix-devel/tools/wc/radical_count.
## Changelog
...
...
@@ -160,17 +157,16 @@ If the file to be searched is written in simplified Chinese and the radical expr
Output: 中國大陸的國標碼使用漢**語**拼音排列
部首檢字也有其局限性,**許**多漢字難以歸部
###### **Output is printed in red on the terminal.**
###### **Output is printed in red on the terminal. Colorization was turned on for this iteration.**
## **Iteration 3: Adding New Features & Final Iteration**
Plans for iteration 3 include:
1. ~~Implement switch between two search modes, users can choose any search mode; kangxi radical indices and actual kangxi radical.~~ **(Done)**
2. Add more functions/command line flags.
3. ~~Fix colorization issue.~~ **(Done)**
4. Implement a search mode which allows for both index and kangxi radical. (e.g. 水_143_)
5. Implement a search mode like this 水_{火/水}_.
6. Graphical interface (If time allows.)
2. ~~Fix colorization issue.~~ **(Done)**
3. Implement a search mode which allows for both index and kangxi radical. (e.g. 水_143_)
4. Implement a search mode like this 水_{火/水}_.
5. Graphical interface (If time allows.)
## Changelog
...
...
@@ -207,9 +203,9 @@ If the user enters uses Radical Grep in index mode and searchs for index 0 or an
## **References**
* [Unicode Standard Annex #38: Unihan](http://www.unicode.org/reports/tr38/)
* Unihan Database (Unihan.zip)
* [UCD-Scripts](https://cs-git-research.cs.surrey.sfu.ca/cameron/parabix-devel/tree/master/UCD-scripts) was used in [unihan-scripts](https://cs-git-research.cs.surrey.sfu.ca/cameron/parabix-devel/tree/delta-radicalgrep/unihan-scripts)
@@ -33,7 +33,7 @@ This is a map that lists all 214 radicals and their corresponding Unicode set th
Instead of using a numeric key, the actual Kangxi radical is used and mapped to its corresponding value. Note that one Unicode set may belong to different radicals (e.g. 水 and 氵 both map to set 85).
* `get_uset()`:
This function maps the inputted radical to the corresponding UnicodeSet predefined in [kRSKangXi.h](https://cs-git-research.cs.surrey.sfu.ca/cameron/parabix-devel/blob/delta-radicalgrep/include/unicode/data/kRSKangXi.h).
This function maps the inputted radical to the corresponding UnicodeSet predefined in `radical_table`. If the program is in index mode (`-i`), the function looks for the requested radical in `_unicodeset_radical_table` and checks if the input is valid. If the input is invalid, the program prints an error message and terminates.
Members of class `RadicalValuesEnumerator`:
* `parse_input()`:
...
...
@@ -50,7 +50,7 @@ This file is the main framework of Radical Grep. The LLVM input parser takes in
1. Get the input and the file(s) to be searched.
2. Analyze the input to get the corresponding radical Unicode set(s).
3. Search each file.
4. Recolour the matching words in the sentence and output the result.
4. If colourization is on, recolour the matching words in the sentence and output the result. Otherwise, return the matching sentences as-is.
### Variables
* `input_radical`: The inputted radical expression.
...
...
@@ -59,6 +59,10 @@ This file is the main framework of Radical Grep. The LLVM input parser takes in
* `allfiles`: Stores the filepaths. When a file has finished being processed, it gets popped from the vector so that a new file can be looked at.
* `indexMode`: An optional command flag; indicates if radical indices are being used.
* `color`: Command options for colourization.
* `radicalREs`: Stores the return value of `generateREs()`.
### Functions
...
...
@@ -77,4 +81,4 @@ This file is the main framework of Radical Grep. The LLVM input parser takes in
**Authored by Team Delta:** Anna Tang, Lexie Yu (Yu Ruo Nan), Pan Chu Wen
@@ -46,7 +46,7 @@ This map lists all 214 Kangxi radical sets and their respective keys. The key of
Instead of using a numeric key, the actual Kangxi radical is used and mapped to its corresponding value. Note that one Unicode set may belong to different radicals (e.g. 水 and 氵 both map to set 85).
* `get_uset()`:
This function finds the requested Unicode set for the inputted radical, from `radical_table`. In `radicalcount.cpp`, this is invoked with the object `ucd_radical`.
This function maps the inputted radical to the corresponding UnicodeSet predefined in `radical_table`. If the program is in index mode (`-i`), the function looks for the requested radical in `_unicodeset_radical_table` and checks if the input is valid. If the input is invalid, the program prints an error message and terminates. In `radicalcount.cpp`, this function is invoked with the object `ucd_radical`.
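
The index-mode validity check described here might look roughly like the sketch below. It is not the project's `get_uset()`: `parse_radical_index` is a hypothetical helper, the accepted range of 1 to 214 is an assumption based on there being 214 Kangxi radicals, and it returns 0 for invalid input rather than printing an error and terminating.

```cpp
#include <string>

// Hypothetical helper, not the project's get_uset(): converts an index-mode
// argument to an integer. Returns the index if it is a decimal number in the
// assumed valid range 1..214, and 0 otherwise (0 is never a valid index).
inline int parse_radical_index(const std::string& input) {
    // At most three digits are needed for 1..214; this also avoids overflow.
    if (input.empty() || input.size() > 3) {
        return 0;
    }
    int value = 0;
    for (const char ch : input) {
        if (ch < '0' || ch > '9') {
            return 0;  // non-digit characters make the input invalid
        }
        value = value * 10 + (ch - '0');
    }
    return (value >= 1 && value <= 214) ? value : 0;
}
```

A caller could treat a return value of 0 as the signal to report the error and exit, which keeps the check itself easy to test.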
* `parse_input()`:
This function parses the inputted radical expression (e.g. 氵_ or 氵_子_ ) and stores it in a variable of type `input_radical`, which is predefined to represent a pair data structure. In `radicalcount.cpp`, `ci` is used to hold the parsed expression.
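
To make the expression format concrete, here is a minimal sketch of splitting an expression such as 氵_子_ into its radicals. It is not the project's `parse_input()`: `split_radical_expression` is a hypothetical function, it returns a vector rather than the `input_radical` pair type, and it assumes every radical is terminated by `_`. Splitting byte-wise is safe because the ASCII underscore never occurs inside a multi-byte UTF-8 sequence.

```cpp
#include <string>
#include <vector>

// Hypothetical splitter, not the project's parse_input(): collects each
// radical that is terminated by '_' in the expression.
inline std::vector<std::string> split_radical_expression(const std::string& expr) {
    std::vector<std::string> radicals;
    std::string current;
    for (const char ch : expr) {
        if (ch == '_') {
            if (!current.empty()) {
                radicals.push_back(current);
            }
            current.clear();
        } else {
            current.push_back(ch);  // accumulate the bytes of one radical
        }
    }
    // Assumption of this sketch: text without a closing '_' is ignored.
    return radicals;
}
```

For the expression 氵_子_ this yields two entries, 氵 and 子, each of which can then be looked up separately.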
...
...
@@ -57,5 +57,5 @@ The Radical Count program uses the kRSKangxi property to distinguish all 214 rad
**Authored by Team Delta:** Anna Tang, Lexie Yu (Yu Ruo Nan), Pan Chu Wen
static cl::list<std::string> inputfiles(cl::Positional, cl::desc("<Input File>"), cl::OneOrMore, cl::cat(radicalgrepFlags)); // search for multiple input files is supported
static cl::opt<bool> indexMode("i", cl::desc("Use radical index instead of the radical character to perform search."), cl::init(false), cl::cat(radicalgrepFlags));
// Adapted from grep_interface.cpp
//static cl::OptionCategory colorization("Colorization Options", "Turn on or turn off colorization for output.");
ColoringType ColorFlag;
// options for colourization; (e.g. -c auto)
static cl::opt<ColoringType, true> Color("c", cl::desc("Set the colorization of the output."),
...
...
@@ -83,7 +81,6 @@ int main(int argc, char* argv[])