The list_result_files() function fails with an unhelpful error message when it encounters directories or files that don't follow the expected ethoscope naming convention (e.g., backup directories with .backup suffix).
This issue can be difficult to debug because:
- The error message doesn't identify which file/directory is problematic
- Manual testing with properly formatted data works fine, masking the issue
- Large datasets make manual inspection impractical (we have 21000 folders as of now...)
Error Message
Error: 1 parsing failure
Traceback:
1. build_query(result_dir, query, index_file)
2. list_result_files(result_dir, index_file)
3. parse_datetime(files_info$datetime)
4. parse_time(match[, 2], format = "%H-%M-%S")
5. readr::stop_for_problems(out)
Root Cause
The function uses list.files(result_dir, recursive=T, pattern="*\\.db$") which can pick up files in directories with non-standard names. When it encounters a directory named like 2025-07-02_15-32-33.backup, it tries to parse 15-32-33.backup as a time string, causing readr::parse_time() to fail.
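To make the failure concrete, the sketch below splits the example path using base R only and shows which string ends up being treated as the time. The path, the variable names, and the strict HH-MM-SS pattern are illustrative assumptions, not code taken from scopr.

```r
# Relative path, as list.files(..., recursive = TRUE) would return it
path <- "some_machine_id/ETHOSCOPE_XXX/2025-07-02_15-32-33.backup/file.db"

# Fields: machine id, machine name, datetime directory, file name
fields <- strsplit(path, "/", fixed = TRUE)[[1]]
datetime_field <- fields[3]

# Splitting on "_" leaves the ".backup" suffix attached to the time part
parts <- strsplit(datetime_field, "_", fixed = TRUE)[[1]]
date_part <- parts[1]
time_part <- parts[2]

# A strict HH-MM-SS check rejects the suffixed value
time_is_valid <- grepl("^\\d{2}-\\d{2}-\\d{2}$", time_part)
print(time_part)
print(time_is_valid)
```

The date part is still well formed, which is why only the time parser reports a problem.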
Steps to Reproduce
- Create a directory structure with a malformed directory name:
/some_machine_id/ETHOSCOPE_XXX/2025-07-02_15-32-33.backup/file.db
- Run list_result_files() on the parent directory
- Observe the cryptic parsing failure error
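The steps above can be scripted in a temporary directory, which may be handy for a regression test. This is a minimal sketch: the directory names mirror the example path, result_dir is an arbitrary temporary location, and the final call is left commented out because it assumes scopr is installed and that the function is accessible as scopr::list_result_files().

```r
# Build the malformed structure under a throwaway directory
result_dir <- file.path(tempdir(), "ethoscope_repro")
unlink(result_dir, recursive = TRUE)
bad_dir <- file.path(
  result_dir, "some_machine_id", "ETHOSCOPE_XXX", "2025-07-02_15-32-33.backup"
)
dir.create(bad_dir, recursive = TRUE)
file.create(file.path(bad_dir, "file.db"))

# Same kind of listing the function performs: the backup copy is picked up
all_db_files <- list.files(result_dir, recursive = TRUE, pattern = "\\.db$")
print(all_db_files)

# Assumption: scopr is installed; this call should then hit the parsing failure
# scopr::list_result_files(result_dir)
```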
Current Workaround
Manually identify and remove/rename malformed directories:
find /path/to/data -name "*backup*"
rm -rf /path/to/malformed/directory
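Since find -name "*backup*" only catches names containing "backup", a non-destructive listing of every directory that breaks the naming convention may be a safer first step before deleting anything. The helper below is hypothetical (find_malformed_dirs is not part of scopr) and assumes the machine_id/machine_name/datetime layout shown in the example path.

```r
# Hypothetical helper: list datetime-level directories with non-standard names
find_malformed_dirs <- function(result_dir) {
  datetime_pattern <- "^\\d{4}-\\d{2}-\\d{2}_\\d{2}-\\d{2}-\\d{2}$"
  # Relative paths of all directories below result_dir
  dirs <- list.dirs(result_dir, recursive = TRUE, full.names = FALSE)
  # Assumption: datetime directories sit exactly three levels deep
  depth <- lengths(strsplit(dirs, "/", fixed = TRUE))
  candidates <- dirs[depth == 3]
  candidates[!grepl(datetime_pattern, basename(candidates))]
}

# Usage: inspect the result before renaming or removing anything
# find_malformed_dirs("/path/to/data")
```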
Proposed Solution
Enhance the list_result_files() function to:
- Validate datetime format before parsing:
  # Add validation step before calling parse_datetime()
  datetime_pattern <- "^\\d{4}-\\d{2}-\\d{2}_\\d{2}-\\d{2}-\\d{2}$"
  valid_datetime <- grepl(datetime_pattern, files_info$datetime)
  if (!all(valid_datetime)) {
    # Assumes files_info rows are in the same order as all_db_files
    invalid_files <- all_db_files[!valid_datetime]
    # Report the first few offenders in a single warning
    warning(
      "Found files with invalid datetime format:\n",
      paste0("  ", head(invalid_files, 5), collapse = "\n")
    )
    # Optionally filter out invalid files or stop with informative error
  }
- Provide more informative error messages:
  parse_datetime <- function(x) {
    match <- stringr::str_split(x, "_", simplify = TRUE)
    tryCatch({
      d <- parse_date(match[, 1])
      t <- parse_time(match[, 2], format = "%H-%M-%S")
      data.table::data.table(date = d, time = t)
    }, error = function(e) {
      # Re-parse quietly to identify the problematic entries
      parsed <- suppressWarnings(
        readr::parse_time(match[, 2], format = "%H-%M-%S")
      )
      problems <- readr::problems(parsed)
      if (nrow(problems) > 0) {
        stop(sprintf(
          "Failed to parse datetime from file paths. Problematic entries:\n%s\nCheck for malformed directory names or backup files.",
          paste(x[head(problems$row, 5)], collapse = "\n")
        ))
      }
      # No time-parsing problems found: re-raise the original error
      stop(e)
    })
  }
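The "optionally filter out invalid files" branch of the validation step could look roughly like the following. This is a base R sketch that works on the relative paths directly, so it does not depend on the internal files_info table; split_db_paths is a hypothetical helper, and it assumes the third path component is the datetime directory.

```r
# Hypothetical helper: partition relative .db paths by datetime validity
split_db_paths <- function(all_db_files) {
  datetime_pattern <- "^\\d{4}-\\d{2}-\\d{2}_\\d{2}-\\d{2}-\\d{2}$"
  fields <- strsplit(all_db_files, "/", fixed = TRUE)
  # Assumption: the third path component is the datetime directory
  datetime <- vapply(
    fields,
    function(f) if (length(f) >= 3) f[3] else NA_character_,
    character(1)
  )
  valid <- !is.na(datetime) & grepl(datetime_pattern, datetime)
  list(valid = all_db_files[valid], invalid = all_db_files[!valid])
}

paths <- c(
  "some_machine_id/ETHOSCOPE_XXX/2025-07-02_15-32-33/file.db",
  "some_machine_id/ETHOSCOPE_XXX/2025-07-02_15-32-33.backup/file.db"
)
res <- split_db_paths(paths)

# Name the skipped files so the user knows what to clean up
if (length(res$invalid) > 0) {
  message(
    "Skipping files with invalid datetime format:\n",
    paste0("  ", res$invalid, collapse = "\n")
  )
}
```

Filtering keeps queries on the remaining data working, while the message still points at the directories that need attention; stopping with an error instead would be the stricter choice.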
Environment
- R version: 4.5.1 (2025-06-13)
- scopr version: 0.3.3
- readr version: 2.1.5
- OS: [your OS]