On the Idiosyncrasies of the Mandarin Chinese Classifier System

Code and data for "On the Idiosyncrasies of the Mandarin Chinese Classifier System" (NAACL 2019).

Code Dependencies

The code requires Python 3. It uses pickle, which is part of the Python standard library, and the third-party packages tqdm and nltk, which need to be installed.

Data Format

The noun_classifier and noun_adj_classifier files are pickled Python Counter objects. Their keys are (noun, classifier) pairs and (noun, adjective, classifier) tuples, respectively, and each value is the number of times that key appears in our corpus. The data are in Mandarin Chinese, written in traditional characters. The supersense files are pickled Python dictionaries whose keys are noun or adjective supersenses and whose values are Counters. In each of these Counters, the keys are the (noun, adjective, classifier) tuples that fall under that supersense, and the values are the number of times each tuple appears in our corpus.
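As a rough illustration of this format, the sketch below round-trips a small Counter through pickle and inspects it. The helper name load_counts and the toy counts are assumptions made for the example and are not taken from the repository. To inspect the real data, the same helper could be pointed at noun_classifier.pkl.

```python
import pickle
import tempfile
from collections import Counter
from pathlib import Path


def load_counts(path):
    """Load a pickled object (hypothetical helper, not part of the repository).

    Only unpickle files that come from a trusted source.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


# Toy stand-in for noun_classifier.pkl: (noun, classifier) -> count.
# The counts are invented for illustration only.
toy = Counter({("書", "本"): 3, ("狗", "隻"): 2, ("書", "個"): 1})

with tempfile.TemporaryDirectory() as tmp:
    toy_path = Path(tmp) / "toy_noun_classifier.pkl"
    with open(toy_path, "wb") as f:
        pickle.dump(toy, f)
    counts = load_counts(toy_path)

# Total number of (noun, classifier) tokens and the most frequent pair.
total = sum(counts.values())
top_pair, top_count = counts.most_common(1)[0]

# Marginal counts per classifier, summing over nouns.
classifier_totals = Counter()
for (noun, classifier), n in counts.items():
    classifier_totals[classifier] += n
```

The supersense files would load the same way, except that the result is expected to be a dictionary mapping each supersense to a Counter of this kind.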

Running Experiments

To compute the mutual information between classifiers and nouns, run

python compute_mutual_information.py -ICN

To compute the mutual information between classifiers and adjectives, run

python compute_mutual_information.py -ICA

To compute the mutual information between classifiers and noun supersenses, run

python compute_mutual_information.py -ICNS

To compute the mutual information between classifiers and adjective supersenses, run

python compute_mutual_information.py -ICAS

To compute the mutual information between classifiers and noun synsets, run

python compute_mutual_information.py -ICS
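For readers unfamiliar with the quantity being computed, the sketch below shows a plug-in (maximum likelihood) estimate of mutual information from a Counter of pairs, such as (noun, classifier) counts. It is a minimal illustration under that assumption and is not necessarily how compute_mutual_information.py estimates it. The function name mutual_information is hypothetical.

```python
import math
from collections import Counter


def mutual_information(pair_counts):
    """Plug-in estimate of I(X; Y) in bits from a Counter of (x, y) -> count.

    Hypothetical helper for illustration; not taken from the repository.
    """
    total = sum(pair_counts.values())

    # Marginal counts for each side of the pair.
    x_counts, y_counts = Counter(), Counter()
    for (x, y), n in pair_counts.items():
        x_counts[x] += n
        y_counts[y] += n

    mi = 0.0
    for (x, y), n in pair_counts.items():
        if n == 0:
            continue  # zero-count pairs contribute nothing
        # p(x, y) / (p(x) * p(y)) simplifies to n * total / (count_x * count_y)
        ratio = n * total / (x_counts[x] * y_counts[y])
        mi += (n / total) * math.log2(ratio)
    return mi


# Two toy cases with invented counts: fully dependent and fully independent.
dependent = Counter({("a", "x"): 1, ("b", "y"): 1})
independent = Counter({("a", "x"): 1, ("a", "y"): 1, ("b", "x"): 1, ("b", "y"): 1})

mi_dependent = mutual_information(dependent)
mi_independent = mutual_information(independent)
```

In the dependent case each noun-like symbol determines its partner, so the estimate is one bit. In the independent case the joint distribution factorizes and the estimate is zero.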
