Commit f8720f8

Bump version to 0.1.21, add Copilot instructions, and update parquet test for key metadata handling
1 parent b0b3607 commit f8720f8

3 files changed

Lines changed: 95 additions & 14 deletions

.github/copilot-instructions.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
+# CKutils Copilot Instructions
+
+## Project Overview
+
+CKutils is an R package providing high-performance utility functions for simulation modelling, with C++ backends via Rcpp. Key domains: data.table operations, statistical distributions (GAMLSS-compatible), and package management utilities.
+
+## Architecture
+
+- **R files** ([R/](R/)): Public API with roxygen2 documentation
+- **C++ files** ([src/](src/)): Performance-critical implementations using Rcpp
+- **Tests** ([inst/tinytest/](inst/tinytest/)): tinytest framework, validated against gamlss.dist
+
+### Key Modules
+| File | Purpose |
+|------|---------|
+| `R/rng_distr.R` | `fr*` random generation functions (BCPEo, BCT, NBI, SICHEL, etc.) |
+| `R/lookup_dt.R` | Fast key-based table lookups with `lookup_dt()`, `absorb_dt()` |
+| `R/dt_ops.R` | data.table utilities: `clone_dt()`, `del_dt_rows()` |
+| `src/distr_*.cpp` | Vectorised distribution functions (`fd*`, `fp*`, `fq*` for PDF/CDF/quantile) |
+| `src/lookup_dt.cpp` | C++ backend for `lookup_dt()` using data.table API |
+
+## Development Workflow
+
+### Pre-push validation (REQUIRED)
+```bash
+./check-before-push.sh
+```
+This runs: roxygen2 → R CMD build → R CMD INSTALL → R CMD check --as-cran → tinytest suite
+
+### Quick iteration during development
+```r
+# Regenerate documentation and install
+roxygen2::roxygenise()
+# Run tests
+tinytest::test_package("CKutils")
+```
+
+### Build requirements
+- C++17 (set in `src/Makevars`)
+- Dependencies: `data.table >= 1.18.0`, `Rcpp`, `dqrng`, `arrow >= 22.0.0`
+- Tests require: `gamlss.dist` for distribution validation
+
+## Coding Conventions
+
+### R Functions
+- Use **roxygen2** for all exports with `@export` tag
+- Document all parameters with types and defaults in `@param`
+- Include runnable `@examples` for every exported function
+- Prefix fast C++ wrappers with `f` (e.g., `fdBCPEo` vs gamlss.dist's `dBCPEo`)
+
+### Distribution Functions Pattern
+Each distribution (e.g., BCPEo) follows naming convention:
+- `fd*` - density (PDF), accepts `log_` parameter
+- `fp*` - probability (CDF), accepts `lower_tail` parameter
+- `fq*` - quantile (inverse CDF)
+- `fr*` - random generation, uses `dqrng::dqrunif` for high-quality RNG
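Note on the naming convention above: the `fd*`/`fp*`/`fq*`/`fr*` prefixes mirror base R's `d`/`p`/`q`/`r` families. The sketch below uses the base R normal distribution as a stand-in, because the CKutils function signatures are not shown in this diff. It only illustrates the relationships that a log-density flag, a lower-tail flag, and a quantile function would be expected to satisfy.

```r
# Stand-in example: base R normal functions, not CKutils functions.
x <- c(-1, 0, 1.5)

# A density returned on the log scale equals the log of the plain density.
log_density_ok <- isTRUE(all.equal(dnorm(x, log = TRUE), log(dnorm(x))))

# An upper-tail probability is the complement of the CDF.
upper_tail_ok <- isTRUE(all.equal(pnorm(x, lower.tail = FALSE), 1 - pnorm(x)))

# The quantile function inverts the CDF.
round_trip_ok <- isTRUE(all.equal(qnorm(pnorm(x)), x))
```

Checks of this shape are the kind of comparison the tinytest files could run for each `f*` function against its gamlss.dist counterpart.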
+
+### C++ Code
+- Use `recycling_helpers.h` for parameter recycling (all vectors same length)
+- Include SIMD hints where applicable: `#pragma GCC ivdep`
+- Validate inputs early with `Rcpp::stop()` for clear error messages
+- For data.table integration, include `<datatableAPI.h>`
+
+### data.table Operations
+- Functions modify tables **in place** by default (data.table semantics)
+- Use `copy()` explicitly when non-destructive behaviour is needed
+- Factor columns in `lookup_dt()` must have consecutive integer levels starting from 1
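Note on the in-place semantics above: a minimal sketch, using only data.table itself (no CKutils functions), of why `copy()` matters. The table and column names are made up for illustration.

```r
library(data.table)

dt <- data.table(id = 1:3, value = c(10, 20, 30))

alias <- dt           # plain assignment: both names refer to the same table
snapshot <- copy(dt)  # deep copy: independent of later changes to dt

dt[, value := value * 2]  # := updates the column by reference

alias$value     # reflects the update, because alias and dt are one object
snapshot$value  # keeps the original values
```

Any function that uses `:=` or `set*()` on its argument therefore changes the caller's table unless it works on a `copy()`.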
+
+## Testing Pattern
+
+Tests in `inst/tinytest/test-*.R` follow this structure:
+```r
+# Skip if optional dependency unavailable
+if (!requireNamespace("gamlss.dist", quietly = TRUE)) {
+  exit_file("gamlss.dist not available")
+}
+
+# Generate test data with seed for reproducibility
+data <- generate_test_data(n = 100, seed = 123)
+
+# Compare against reference implementation
+expect_equal(fdBCPEo(...), gamlss.dist::dBCPEo(...), info = "PDF matches reference")
+```
+
+## Common Gotchas
+
+1. **Lookup tables**: Keys must be factors OR consecutive 1-based integers; use `is_valid_lookup_tbl()` to validate
+2. **Random number generation**: Always use `dqrng::dqrunif` instead of base R's `runif` for reproducibility
+3. **Windows compatibility**: C++ code must handle in-place modifications carefully; prefer `clone()` in Rcpp
+4. **R CMD check**: Avoid non-portable compiler flags; `check-before-push.sh` sets safe defaults
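Note on gotcha 2: a minimal sketch of seeded draws with dqrng. The inverse-transform step at the end is an assumption about how a `fr*`-style generator could turn uniforms into draws from a distribution; the diff does not show the actual CKutils implementation, and `qnorm` is only a stand-in quantile function.

```r
library(dqrng)

# dqrng keeps its own generator state, seeded with dqset.seed().
dqset.seed(42)
u1 <- dqrunif(5)

dqset.seed(42)
u2 <- dqrunif(5)

# Re-seeding reproduces the same uniform draws.
same_draws <- identical(u1, u2)

# Illustrative inverse-transform sampling: push uniforms through a quantile
# function (stand-in: the standard normal).
dqset.seed(42)
z <- qnorm(dqrunif(5))
```

Mixing `runif()` and `dqrunif()` in one function would make results depend on two separate generator states, which is one reason to standardise on a single source of uniforms.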

DESCRIPTION

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 Package: CKutils
 Type: Package
 Title: Some Utility Functions I Use Regularly
-Version: 0.1.20
-Date: 2026-01-19
+Version: 0.1.21
+Date: 2026-01-20
 Authors@R: c(
 person("Chris", "Kypridemos", email = "christodoulosk@gmail.com", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-0746-9229")),
 person("Max", "Birkett", email = "pp0u8134@liverpool.ac.uk", role = "ctb", comment = c(ORCID = "0000-0002-6076-6820")),

inst/tinytest/test-misc_functions.R

Lines changed: 3 additions & 12 deletions
@@ -1528,21 +1528,12 @@ if (requireNamespace("arrow", quietly = TRUE)) {
 
 # Test 19: Create parquet with metadata and verify keys are restored
 temp_dir <- tempdir()
+
 temp_parquet_with_keys <- file.path(temp_dir, "test_with_keys.parquet")
 test_dt_keys <- data.table(id = 1:10, name = letters[1:10], value = rnorm(10))
 setkey(test_dt_keys, id)
-# Write with key metadata
-arrow::write_parquet(
-  test_dt_keys,
-  temp_parquet_with_keys,
-  properties = arrow::ParquetWriterProperties$create(
-    arrow::schema(test_dt_keys),
-    compression = "snappy"
-  ),
-  arrow_properties = arrow::ParquetArrowWriterProperties$create(
-    store_schema = TRUE
-  )
-)
+# Write initial parquet file
+arrow::write_parquet(test_dt_keys, temp_parquet_with_keys, compression = "snappy")
 # Add metadata manually for keys
 tbl <- arrow::read_parquet(temp_parquet_with_keys, as_data_frame = FALSE)
 tbl_with_meta <- tbl$ReplaceSchemaMetadata(
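Note on the simplified test above: the hunk ends before the metadata call is complete, so the sketch below is only one plausible shape for the write, annotate, and restore round trip. The metadata field name `example_keys`, the comma-separated encoding, and the second output file are all assumptions made for illustration; the field name and format that CKutils actually uses are not visible in this diff.

```r
library(data.table)
library(arrow)

dt <- data.table(id = 1:10, name = letters[1:10], value = rnorm(10))
setkey(dt, id)

path_plain <- tempfile(fileext = ".parquet")
path_meta <- tempfile(fileext = ".parquet")

# Step 1: write a plain parquet file.
write_parquet(dt, path_plain, compression = "snappy")

# Step 2: read it back as an Arrow Table and attach the key names as
# schema metadata ("example_keys" is a hypothetical field name).
tbl <- read_parquet(path_plain, as_data_frame = FALSE)
new_meta <- c(tbl$metadata, list(example_keys = paste(key(dt), collapse = ",")))
tbl_with_meta <- tbl$ReplaceSchemaMetadata(new_meta)
write_parquet(tbl_with_meta, path_meta)

# Step 3: read the annotated file and restore the key from the metadata.
restored <- read_parquet(path_meta, as_data_frame = FALSE)
stored_keys <- strsplit(restored$metadata$example_keys, ",")[[1]]
out <- as.data.table(as.data.frame(restored))
setkeyv(out, stored_keys)
```

Writing the annotated table to a separate file avoids overwriting a file that the first Arrow Table may still be reading from.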
