This module contains methods to preprocess and prepare data collected from the JetBrains Academy/Hyperskill educational platform for further analysis. More information about the platform and the use of Hyperstyle can be found here.
Execute one of the following commands with necessary arguments:
`poetry run preprocess_submissions [arguments]`

or

`docker run hyperstyle-analysis-prod:<VERSION> poetry run preprocess_submissions [arguments]`

Required arguments:
| Argument | Description |
|---|---|
| submissions_path | Path to the .csv file with submissions. The file must contain the following columns: id, step (or step_id), code, user_id, time. The client column is optional. |
| preprocessed_submissions_path | Path to the output .csv file with preprocessed submissions with issues. If not provided, the output will be printed to the console. |
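
The column requirements above can be checked before running the tool. The following is a minimal sketch using only the standard library; `missing_columns` and the sample data are hypothetical names introduced for illustration and are not part of this module.

```python
import csv
import io

# Columns listed as mandatory in the table above.
REQUIRED_COLUMNS = {"id", "code", "user_id", "time"}
# The step column may appear under either name.
STEP_COLUMNS = {"step", "step_id"}


def missing_columns(header):
    """Return the required columns that are absent from a CSV header."""
    present = set(header)
    missing = sorted(REQUIRED_COLUMNS - present)
    if not (STEP_COLUMNS & present):
        missing.append("step (or step_id)")
    return missing


# Hypothetical in-memory sample standing in for the submissions file.
sample = io.StringIO("id,step_id,code,user_id,time\n1,1,print(1),42,2021-01-01\n")
header = next(csv.reader(sample))
print(missing_columns(header))
```

An empty list means the header satisfies the requirements; the optional client column is deliberately not checked.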
Optional arguments:
| Argument | Description |
|---|---|
| --users-to-submissions-path | Path to file with user to submission relation (if data is not presented in submissions dataset or was anonymized). |
| --diff-ratio | Ratio used to remove submissions whose number of lines changes by more than diff-ratio times. Default is 10.0. |
| --max-attempts | Remove submission series with more than max-attempts attempts. Default is 5. |
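
To make the two filters more concrete, here is a rough sketch of one possible reading of them. It assumes that a series is the ordered list of one user's attempts on one step, that the line change is measured against the previously kept attempt, and that an empty submission counts as one line. The functions `exceeds_diff_ratio` and `filter_series` are hypothetical and may differ from what the tool actually does.

```python
def count_lines(code):
    """Number of lines in a submission, treating empty code as one line."""
    return max(len(code.splitlines()), 1)


def exceeds_diff_ratio(previous_code, current_code, diff_ratio=10.0):
    """True if the line count grew or shrank by more than diff_ratio times."""
    prev, cur = count_lines(previous_code), count_lines(current_code)
    return max(prev, cur) / min(prev, cur) > diff_ratio


def filter_series(codes, diff_ratio=10.0, max_attempts=5):
    """Filter one series of code strings (assumed: one user, one step)."""
    if len(codes) > max_attempts:
        return []  # the whole series is dropped
    kept = []
    for code in codes:
        # Compare against the last kept attempt (an assumption of this sketch).
        if kept and exceeds_diff_ratio(kept[-1], code, diff_ratio):
            continue
        kept.append(code)
    return kept


series = ["a", "\n".join(["x"] * 11), "a\nb"]
print(len(filter_series(series)))
```

With the defaults, the second attempt in `series` is dropped because it jumps from 1 line to 11 lines, while a series of six attempts would be removed entirely.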
The output .csv file will be saved to preprocessed_submissions_path and will contain all data from the .csv file in submissions_path plus several additional columns:

- `group`: the index of the group among all students' submission series, starting from 0;
- `attempt`: the index of the attempt within the current group, starting from 0;
- `total_attempts`: the total number of attempts in the current group.
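
The additional columns can be illustrated with a small standard-library sketch. It assumes that a group is the set of submissions made by one user on one step, ordered by time; the function `add_attempt_columns` and its `first_attempt` parameter are hypothetical, and the numbering base is exposed as a parameter because it is an assumption of this sketch.

```python
from collections import defaultdict


def add_attempt_columns(rows, first_attempt=0):
    """Annotate submission dicts with group, attempt and total_attempts.

    Assumption: a group is every submission by one user on one step,
    ordered by time.
    """
    series = defaultdict(list)
    for row in rows:
        series[(row["user_id"], row["step_id"])].append(row)

    result = []
    for group, key in enumerate(sorted(series)):
        attempts = sorted(series[key], key=lambda r: r["time"])
        for offset, row in enumerate(attempts):
            result.append({
                **row,
                "group": group,
                "attempt": first_attempt + offset,
                "total_attempts": len(attempts),
            })
    return result


# Hypothetical rows; only the columns needed for grouping are shown.
rows = [
    {"id": 2, "user_id": 7, "step_id": 1, "time": 20},
    {"id": 1, "user_id": 7, "step_id": 1, "time": 10},
    {"id": 3, "user_id": 8, "step_id": 1, "time": 15},
]
for row in add_attempt_columns(rows):
    print(row["id"], row["group"], row["attempt"], row["total_attempts"])
```

Every row in a group shares the same `group` and `total_attempts` values, and `attempt` increases with submission time.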
An example of a preprocessed_submissions_path file can be found in the tests:
| id | step_id | code | group | attempt | total_attempts |
|---|---|---|---|---|---|
| 1 | 1 | e = 2.718281828459045 # put your python code here print('%.5f' % e) | 0 | 1 | 3 |
| 2 | 1 | e = 2.718281828459045 # put your python code here print('%.5f' % e) | 0 | 2 | 3 |
| ... | ... | ... | ... | ... | ... |