|
| 1 | +# CKutils Copilot Instructions |
| 2 | + |
| 3 | +## Project Overview |
| 4 | + |
| 5 | +CKutils is an R package providing high-performance utility functions for simulation modelling, with C++ backends via Rcpp. Key domains: data.table operations, statistical distributions (GAMLSS-compatible), and package management utilities. |
| 6 | + |
| 7 | +## Architecture |
| 8 | + |
| 9 | +- **R files** ([R/](R/)): Public API with roxygen2 documentation |
| 10 | +- **C++ files** ([src/](src/)): Performance-critical implementations using Rcpp |
| 11 | +- **Tests** ([inst/tinytest/](inst/tinytest/)): tinytest framework, validated against gamlss.dist |
| 12 | + |
| 13 | +### Key Modules |
| 14 | +| File | Purpose | |
| 15 | +|------|---------| |
| 16 | +| `R/rng_distr.R` | `fr*` random generation functions (BCPEo, BCT, NBI, SICHEL, etc.) | |
| 17 | +| `R/lookup_dt.R` | Fast key-based table lookups with `lookup_dt()`, `absorb_dt()` | |
| 18 | +| `R/dt_ops.R` | data.table utilities: `clone_dt()`, `del_dt_rows()` | |
| 19 | +| `src/distr_*.cpp` | Vectorised distribution functions (`fd*`, `fp*`, `fq*` for PDF/CDF/quantile) | |
| 20 | +| `src/lookup_dt.cpp` | C++ backend for `lookup_dt()` using data.table API | |
| 21 | + |
| 22 | +## Development Workflow |
| 23 | + |
| 24 | +### Pre-push validation (REQUIRED) |
| 25 | +```bash |
| 26 | +./check-before-push.sh |
| 27 | +``` |
| 28 | +This runs: roxygen2 → R CMD build → R CMD INSTALL → R CMD check --as-cran → tinytest suite |
| 29 | + |
| 30 | +### Quick iteration during development |
| 31 | +```r |
| 32 | +# Regenerate documentation (reinstall the package separately before testing) |
| 33 | +roxygen2::roxygenise() |
| 34 | +# Run tests |
| 35 | +tinytest::test_package("CKutils") |
| 36 | +``` |
| 37 | + |
| 38 | +### Build requirements |
| 39 | +- C++17 (set in `src/Makevars`) |
| 40 | +- Dependencies: `data.table >= 1.18.0`, `Rcpp`, `dqrng`, `arrow >= 22.0.0` |
| 41 | +- Tests require: `gamlss.dist` for distribution validation |
| 42 | + |
| 43 | +## Coding Conventions |
| 44 | + |
| 45 | +### R Functions |
| 46 | +- Use **roxygen2** for all exports with `@export` tag |
| 47 | +- Document all parameters with types and defaults in `@param` |
| 48 | +- Include runnable `@examples` for every exported function |
| 49 | +- Prefix fast C++ wrappers with `f` (e.g., `fdBCPEo` vs gamlss.dist's `dBCPEo`) |
| 50 | + |
| 51 | +### Distribution Functions Pattern |
| 52 | +Each distribution (e.g., BCPEo) follows this naming convention: |
| 53 | +- `fd*` - density (PDF), accepts `log_` parameter |
| 54 | +- `fp*` - probability (CDF), accepts `lower_tail` parameter |
| 55 | +- `fq*` - quantile (inverse CDF) |
| 56 | +- `fr*` - random generation, uses `dqrng::dqrunif` for high-quality RNG |
| 57 | + |
| 58 | +### C++ Code |
| 59 | +- Use `recycling_helpers.h` for parameter recycling (all vectors same length) |
| 60 | +- Include SIMD hints where applicable: `#pragma GCC ivdep` |
| 61 | +- Validate inputs early with `Rcpp::stop()` for clear error messages |
| 62 | +- For data.table integration, include `<datatableAPI.h>` |
| 63 | + |
| 64 | +### data.table Operations |
| 65 | +- Functions modify tables **in place** by default (data.table semantics) |
| 66 | +- Use `copy()` explicitly when non-destructive behaviour is needed |
| 67 | +- Factor columns in `lookup_dt()` must have consecutive integer levels starting from 1 |
| 68 | + |
| 69 | +## Testing Pattern |
| 70 | + |
| 71 | +Tests in `inst/tinytest/test-*.R` follow this structure: |
| 72 | +```r |
| 73 | +# Skip if optional dependency unavailable |
| 74 | +if (!requireNamespace("gamlss.dist", quietly = TRUE)) { |
| 75 | + exit_file("gamlss.dist not available") |
| 76 | +} |
| 77 | + |
| 78 | +# Generate test data with seed for reproducibility |
| 79 | +data <- generate_test_data(n = 100, seed = 123) |
| 80 | + |
| 81 | +# Compare against reference implementation |
| 82 | +expect_equal(fdBCPEo(...), gamlss.dist::dBCPEo(...), info = "PDF matches reference") |
| 83 | +``` |
| 84 | + |
| 85 | +## Common Gotchas |
| 86 | + |
| 87 | +1. **Lookup tables**: Keys must be factors OR consecutive 1-based integers; use `is_valid_lookup_tbl()` to validate |
| 88 | +2. **Random number generation**: Use `dqrng::dqrunif` rather than base R's `runif`, and seed it with `dqrng::dqset.seed()` so that results are reproducible |
| 89 | +3. **Windows compatibility**: C++ code must handle in-place modifications carefully; prefer `clone()` in Rcpp |
| 90 | +4. **R CMD check**: Avoid non-portable compiler flags; `check-before-push.sh` sets safe defaults |
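
The in-place behaviour described under "data.table Operations" can be illustrated with plain data.table calls. This is a minimal sketch that uses only `data.table` itself, not any CKutils function; the table and column names (`dt`, `alias`, `safe`, `value`) are made up for illustration.

```r
library(data.table)

dt <- data.table(id = 1:3, value = c(10, 20, 30))

# Plain assignment only creates a second name for the same table,
# so an update by reference through `alias` also changes `dt`.
alias <- dt
alias[, value := value * 2]
dt$value    # 20 40 60

# copy() returns an independent table that can be modified
# without touching the original.
safe <- copy(dt)
safe[, value := 0]
dt$value    # still 20 40 60
```

The same reasoning presumably applies to package functions that modify their input by reference: pass `copy(x)` whenever the caller still needs the original table.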