`radicalgrep.cpp` is the main framework for the Radical Grep program. The auxiliary functions and radical-set maps can be found in `radical_interface.h`.
### **kRSKangXi.h**
The Radical Count program uses the kRSKangXi property to distinguish all 214 radicals in the [Kangxi Radical System](https://en.wikipedia.org/wiki/Kangxi_radical). In `kRSKangXi.h`, we have a Unicode set for each radical, where each set contains the codepoint ranges for the Chinese characters with the corresponding radical. We generated this header file by using [unihan-scripts](https://cs-git-research.cs.surrey.sfu.ca/cameron/parabix-devel/tree/delta-radicalgrep/unihan-scripts), based on [Unihan_RadicalStrokeCounts.txt](https://cs-git-research.cs.surrey.sfu.ca/cameron/parabix-devel/blob/delta-radicalgrep/unihan-scripts/Unihan/Unihan_RadicalStrokeCounts.txt) of the Unihan database.
Radical Grep uses the kRSKangXi property to distinguish all 214 radicals in the [Kangxi Radical System](https://en.wikipedia.org/wiki/Kangxi_radical). In `kRSKangXi.h`, we have a Unicode set for each radical, where each set contains the codepoint ranges for the Chinese characters with the corresponding radical. We generated this header file by using [unihan-scripts](https://cs-git-research.cs.surrey.sfu.ca/cameron/parabix-devel/tree/delta-radicalgrep/unihan-scripts), based on [Unihan_RadicalStrokeCounts.txt](https://cs-git-research.cs.surrey.sfu.ca/cameron/parabix-devel/blob/delta-radicalgrep/unihan-scripts/Unihan/Unihan_RadicalStrokeCounts.txt) of the Unihan database.
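To make the idea of "a set of codepoint ranges per radical" concrete, here is a minimal sketch of such a set with a membership test. The type `CodepointRangeSet`, the helper `makeExampleSet()`, and the ranges themselves are assumptions made up for illustration; they are not the `UCD::UnicodeSet` class or the data found in `kRSKangXi.h`.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Unicode set: a sorted list of inclusive,
// non-overlapping codepoint ranges [first, second].
struct CodepointRangeSet {
    std::vector<std::pair<char32_t, char32_t>> ranges;

    // Binary search for the first range whose upper bound is >= cp,
    // then check that cp is not below that range's lower bound.
    bool contains(char32_t cp) const {
        auto it = std::lower_bound(
            ranges.begin(), ranges.end(), cp,
            [](const std::pair<char32_t, char32_t>& range, char32_t value) {
                return range.second < value;
            });
        return it != ranges.end() && it->first <= cp;
    }
};

// Illustrative ranges only; these are NOT the values stored in kRSKangXi.h.
inline CodepointRangeSet makeExampleSet() {
    CodepointRangeSet set;
    set.ranges = {{0x6C34, 0x6C3A}, {0x6C41, 0x6C41}, {0x6C5F, 0x6C60}};
    return set;
}
```

Storing ranges rather than individual codepoints keeps each set compact, and keeping them sorted lets a lookup run in logarithmic time.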
This is a map that lists all 214 radicals and their corresponding Unicode sets, which are predefined in [kRSKangXi.h](https://cs-git-research.cs.surrey.sfu.ca/cameron/parabix-devel/blob/delta-radicalgrep/include/unicode/data/kRSKangXi.h). This is not used in the current iteration, but will be implemented later on.
Instead of a numeric key, the actual Kangxi radical is used as the key and mapped to its corresponding value. Note that different radical characters may map to the same Unicode set (e.g. 水 and 氵 both map to set 85).
* `get_uset()`:
This function maps the inputted radical to the corresponding UnicodeSet predefined in `radical_table`. If the program is in index mode (`-i`), the function looks for the requested radical in `_unicodeset_radical_table` and checks if the input is valid. If the input is invalid, the function prints an error message and terminates the program.
This function maps the inputted radical to the corresponding UnicodeSet predefined in `radical_table`. If the program is in index mode (`-i`), the function looks for the requested radical in `_unicodeset_radical_table` and checks if the input is valid. In mixed mode (`-m`), the function searches for the radical in both tables. If the input is invalid, the function prints an error message and terminates the program.
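The mode-dependent lookup described above can be sketched as follows. This is only an illustration under stated assumptions: `lookupRadical()`, `parseIndex()`, and `radicalToIndex()` are hypothetical names, the table holds just three sample entries, and the sketch returns `0` for invalid input instead of printing an error and terminating as `get_uset()` is described to do.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical table from radical character to Kangxi radical number.
// Only three sample entries are listed; 水 and 氵 share number 85.
inline const std::map<std::string, int>& radicalToIndex() {
    static const std::map<std::string, int> table = {
        {"水", 85}, {"氵", 85}, {"子", 39},
    };
    return table;
}

// Parses a decimal radical index in 1..214; returns 0 if the text is not one.
inline int parseIndex(const std::string& text) {
    if (text.empty() || text.size() > 3) return 0;
    for (unsigned char c : text) {
        if (!std::isdigit(c)) return 0;
    }
    const int value = std::stoi(text);
    return (value >= 1 && value <= 214) ? value : 0;
}

// Returns the Kangxi radical number for the input, or 0 if the input is invalid.
// Index mode accepts only numeric indices, the default mode accepts only
// radical characters, and mixed mode accepts either.
inline int lookupRadical(const std::string& input, bool indexMode, bool mixedMode) {
    const bool tryIndex = indexMode || mixedMode;
    const bool tryCharacter = !indexMode || mixedMode;
    if (tryIndex) {
        const int index = parseIndex(input);
        if (index != 0) return index;
    }
    if (tryCharacter) {
        const auto it = radicalToIndex().find(input);
        if (it != radicalToIndex().end()) return it->second;
    }
    return 0;
}
```

For example, `lookupRadical("氵", false, false)` and `lookupRadical("85", true, false)` both resolve to radical 85, while passing `"氵"` in index mode is rejected.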
Members of class `RadicalValuesEnumerator`:
* `parse_input()`:
This function parses the inputted radical expression (e.g. 氵_ or 氵_子_) and stores it in a vector `radical_list`. The variable `radical_num` stores the number of inputted radicals.
This function parses the inputted radical expression (e.g. 氵_ or 氵_子_) and stores it in a vector `radical_list`. In alt mode (`-alt`), further parsing must be done. Radicals that are not bounded by parentheses are put in storage buffer vectors `zi` and `zi2`. The radicals in the parentheses are sent to `reParse()` for further processing.
* `reParse()`:
This function is used in alt mode and tokenizes the radicals that are bounded by the parentheses. When given a radical expression of `X_Y_{A/B}_`, `reParse()` tokenizes `{A/B}` with '/' as the delimiter. `A` and `B` are pushed into the vector `reTemp`.
* `createREs()`:
This function finds the inputted radical from `radical_list`, and searches for it by invoking `get_uset()`.
This function takes the radicals that have been parsed by `parse_input()` and `reParse()`, and returns a vector `REs`. `REs` is a regular expression that represents the inputted radical expression, and contains "alt" nodes for each radical character retrieved from `radical_list`, `reTemp`, `zi` and `zi2`.
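The parsing steps above can be sketched in a simplified form. The functions `splitOn()` and `parseRadicalExpression()` are hypothetical, and the sketch assumes that `_` separates positions and that `/` separates alternatives inside a `{...}` group. Instead of the separate `radical_list`, `zi`, `zi2` and `reTemp` vectors, it returns one list of alternatives per position, so it is not the class's actual data layout.

```cpp
#include <string>
#include <vector>

// Splits text on a single-byte delimiter and drops empty pieces.
// This is safe for UTF-8 input because ASCII delimiters never occur
// inside a multi-byte sequence.
inline std::vector<std::string> splitOn(const std::string& text, char delimiter) {
    std::vector<std::string> pieces;
    std::string current;
    for (char c : text) {
        if (c == delimiter) {
            if (!current.empty()) pieces.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    if (!current.empty()) pieces.push_back(current);
    return pieces;
}

// Returns one entry per character position; each entry lists the radicals
// that are acceptable at that position.
inline std::vector<std::vector<std::string>> parseRadicalExpression(const std::string& expression) {
    std::vector<std::vector<std::string>> positions;
    for (const std::string& token : splitOn(expression, '_')) {
        const bool isGroup = token.size() >= 2 && token.front() == '{' && token.back() == '}';
        if (isGroup) {
            // Strip the braces, then split the alternatives on '/'.
            positions.push_back(splitOn(token.substr(1, token.size() - 2), '/'));
        } else {
            positions.push_back(std::vector<std::string>{token});
        }
    }
    return positions;
}
```

With this layout, `X_Y_{A/B}_` yields three positions, the last of which holds the two alternatives `A` and `B`; each position could then be turned into an "alt" node over the Unicode sets of its radicals.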
## **radicalgrep.cpp**
...
...
@@ -59,26 +62,39 @@ This file is the main framework of Radical Grep. The LLVM input parser takes in
* `allfiles`: Stores the filepaths. When a file has finished being processed, it gets popped from the vector so that a new file can be looked at.
* `indexMode`: An optional command flag; indicates if radical indices are being used.
* `radicalREs`: Stores the return value of `generateREs()`.
* `color`: Command options for colourization.
* `matchFound`: Indicates if matching line(s) have been found in the files.
### Command Line Flags
* `indexMode`: Indicates if radical indices are being used.
* `mixMode`: Indicates if radical indices and radical characters are being used in conjunction.
* `altMode`: Indicates if alternative character options are provided.
* `Color`: Command options for colourization.
* `LineNumberOption`: Prints the line number with each outputted line.
* `WithFilenameOption`: Prints the file name with each outputted line.
* `CLKCountingOption`: Prints the runtime of the search.
### Functions
* `generateREs(std::string input_radical)`: This function parses the input and gets the REs by invoking `createREs()`.
* `generateREs(std::string input_radical)`: This function parses the input and returns the regular expression of `input_radical`.
* `setColoring()`: Defined in file `grep_engine.cpp`; it changes the terminal's text colour to red for the characters with the corresponding radicals.
* `initFileResult(allfiles)`: Defined in file `grep_engine.cpp`; this is a construction and initialization function. As the name suggests, it initializes the input paths, path size and so on.
* `initREs(radicalREs)`: Defined in file `grep_engine.cpp`; this is also a construction and initialization function. It takes care of all Unicode-related tasks of the REs provided by `generateREs()`.
* `initREs(radicalREs)`: Defined in file `grep_engine.cpp`; this is also a construction and initialization function. It takes care of all Unicode-related tasks of the regular expression provided by `generateREs()`.
* `grepCodeGen()`: Defined in file `grep_engine.cpp`; this is the main function of Radical Grep. It is a code generation function, which returns the number of equivalent characters found.
* `grepCodeGen()`: Defined in file `grep_engine.cpp`; this is the main function of Radical Grep. It generates the grep pipeline.
* `searchAllFiles()`: Defined in file `grep_engine.cpp`; this function searches all the files for matches at the same time. If any results are found, it returns true; otherwise, it returns false.
**Authored by Team Delta:** Anna Tang, Lexie Yu (Yu Ruonan), Pan Chuwen
**Authored by Team Delta:** Anna Tang, Lexie Yu (Yu Ruo Nan), Pan Chu Wen
//A functor used to invoke get_uset() in createREs()
static UnicodeSetTable ucd_radical;
const UCD::UnicodeSet&& UnicodeSetTable::get_uset(string radical, bool indexMode, bool mixedMode) //Map the input radical to the corresponding UnicodeSet predefined in kRSKangXi.h
const UCD::UnicodeSet&& get_uset(string radical, bool indexMode, bool mixedMode); //Map the input radical to the corresponding UnicodeSet predefined in kRSKangXi.h
//Map the input radical to the corresponding UnicodeSet predefined in kRSKangXi.h
static cl::list<std::string> inputfiles(cl::Positional, cl::desc("<Input File>"), cl::OneOrMore, cl::cat(radicalgrepFlags)); //search for multiple input files is supported
//category for Radical Grep specific cmd line flags
static cl::OptionCategory radicalgrepFlags("Command Flags", "Options for Radical Grep");
//Input; the radical expression & file(s) to search
//Radical Grep Input Flags; index mode, mixed mode, and alt mode
static cl::opt<bool> indexMode("i", cl::desc("Use radical index instead of the radical character to perform search.\n Link to Radical Indices: https://www.yellowbridge.com/chinese/radicals.php"), cl::init(false), cl::cat(radicalgrepFlags));
static cl::opt<bool> mixMode("m", cl::desc("Use both radical character and radical index to perform search."), cl::init(false), cl::cat(radicalgrepFlags));
static cl::opt<bool> altMode("alt", cl::desc("Use regular expressions to search for multiple phrases."), cl::init(false), cl::cat(radicalgrepFlags));
//Adapted from grep_interface.cpp
//Adapted from grep_interface.cpp; icgrep output flags - colourization, line number, file name, runtime
ColoringType ColorFlag;
static cl::opt<ColoringType, true> Color("c", cl::desc("Set the colorization of the output."),
static cl::opt<ColoringType, true> Color("c", cl::desc("Set the colorization of the output."), //Turn on/off colourization
    cl::values(clEnumValN(alwaysColor, "always", "Turn on colorization when outputting to a file and terminal"),
               clEnumValN(autoColor, "auto", "Turn on colorization only when outputting to terminal"),
               clEnumValN(neverColor, "never", "Turn off output colorization")
static cl::opt<bool, true> LineNumberOption("n", cl::location(LineNumberFlag), cl::desc("Show the line number with each matching line."), cl::cat(radicalgrepFlags));
static cl::opt<bool, true> WithFilenameOption("h", cl::location(WithFilenameFlag), cl::desc("Show the file name with each matching line."), cl::cat(radicalgrepFlags));
static cl::opt<bool, true> CLKCountingOption("clk", cl::location(CLKCountingFlag), cl::desc("Show the runtime of the function."), cl::cat(radicalgrepFlags));
std::vector<fs::path> allfiles; //Store all path of files
std::vector<fs::path> allfiles; //Stores all the inputted file paths
std::vector<re::RE*> generateREs(std::string input_radical, bool altMode); //This function parses the input and gets the results