PASS is a Dutch data-to-text system for soccer reports. A description of the system can be found at https://www.aclweb.org/anthology/W17-3513 and an evaluation of the system's quality can be found at http://www.aclweb.org/anthology/C18-1082 . PASS was developed for Python 3 and consists of several modules:
- Topic collection module: collects topics from the match data and gives them a chronological order
- Lookup module: opens the template database and retrieves all the template categories and corresponding templates that could be used to describe an event
- Ruleset module: checks for each template category if the conditions to use said category have been matched
- Template selection module: selects a template from the possible templates in a weighted random fashion
- Governing module: walks through every topic in a stepwise order and interacts with all the other modules necessary to generate the text
- Text collection module: takes the text describing each event and combines them in a predetermined order
- Template filler module: empty slots in the templates are filled with the right kind of information
- Information variety module: ensures that certain types of information in the report will not be repeated
- Reference variety module: tries to spot the same referent in two subsequent sentences. If the module is able to find this, it will use a different form to address the referent in the second sentence
You can find the templates for the win/loss and neutral conditions in Databases. The collected soccer data can be found in the InfoXMLs and NewInfoXMLs folders (Dutch Leagues 2015-16 COMPLETE2 gives more information about the data in InfoXMLs).
Only one library is needed to run PASS: BeautifulSoup (https://www.crummy.com/software/BeautifulSoup/bs4/doc/).
To generate reports immediately, you can either do it via an IDE such as PyCharm, using PASS.py: change the path on line 47 to a data file of your desire, and indicate whether you want to save the result or not.
Using PASS via command line is possible as well: delete or comment out lines 40 and 47 from PASS.py and then use the following command:
python3 PASS.py main(link_to_data, save_state (y or n))
For participants in the ReproGen shared task, an extra folder (Evaluation) was created, containing a Microsoft Word document with the texts and questionnaire as it was distributed to participants for the study described in the 2017 paper. Do note that this document is fully in Dutch. The database containing the results of this study is available at: https://figshare.com/articles/dataset/PASS_evaluation_results/14866692 .