Unix + Python scripting to generate Chinese (中文) stats
- 7/28/2023 — See proj-3 directory and examine chatlog.txt file to see bash tr one-liner. Also in OneNote.
- input.txt = 846 unique characters.
- output.txt = 733 unique characters, meaning we stripped out 113 Latin characters, punctuation, etc. using the Python script.
- 8/10/2023 — Reran 6c.py in proj-3 and added new undesired characters from output.txt to list of punctuation to be excluded.
- input.txt = 990 lines
- output.txt = 779 unique characters after deleting undesired Latin characters, punctuation, etc. using the Python script.
- 2/15/2025 — old scripts etc. are stored in
  ~/Dropbox/0-tues. Might refer to old filenames zhongwen.md and ch-food.md.
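
For future reference, here is a minimal sketch of the kind of filtering described above: count the unique characters, strip Latin letters, punctuation, and similar, then count again. This is not the actual 6c.py. The names `EXTRA_EXCLUDED`, `is_unwanted`, `strip_unwanted`, `report`, and `process_file` are made up for illustration, and the exclusion rules (ASCII, anything with LATIN in its Unicode name, and the punctuation/symbol/separator/control categories) are an assumption about what "undesired" means.

```python
import unicodedata

# Hypothetical hand-maintained list of extra characters to drop,
# standing in for the "list of punctuation to be excluded".
EXTRA_EXCLUDED = set("·～")


def is_unwanted(ch):
    """Return True for characters this sketch treats as noise."""
    if ch in EXTRA_EXCLUDED or ch.isspace() or ch.isascii():
        return True
    # Catches accented and full-width Latin letters outside ASCII.
    if "LATIN" in unicodedata.name(ch, ""):
        return True
    # P = punctuation, S = symbol, Z = separator, C = control/other.
    return unicodedata.category(ch)[0] in ("P", "S", "Z", "C")


def strip_unwanted(text):
    """Keep only the characters that is_unwanted() does not reject."""
    return "".join(ch for ch in text if not is_unwanted(ch))


def report(text):
    """Return (unique before, unique after, unique removed)."""
    before = len(set(text))
    after = len(set(strip_unwanted(text)))
    return before, after, before - after


def process_file(src, dst):
    """Read src, write the filtered text to dst, and return the counts."""
    with open(src, encoding="utf-8") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(strip_unwanted(text))
    return report(text)
```

Calling `process_file("input.txt", "output.txt")` would return the before/after unique-character counts in the same shape as the numbers logged above. Any new stray characters spotted in output.txt can be appended to `EXTRA_EXCLUDED` before rerunning.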