Commit b2cd883

Lots of updates to handbook PACT data
1 parent d668f76 commit b2cd883

30 files changed

+440
-3731
lines changed

01-database.qmd

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
---
2+
output-file: database.html
3+
---
4+
5+
# Dolt
6+
7+
The Research Information Gateway (RIG) is structured as a relational database. Relational databases like MySQL provide structured data storage with powerful querying capabilities, allowing complex relationships between different types of data to be efficiently modeled and accessed. The RIG uses Dolt as its database solution, which combines Git-style version control with MySQL database functionality, offering the benefits of both technologies in one integrated system.
8+
9+
## Benefits of Dolt
10+
11+
1. Complete data history tracking and rollback capabilities
12+
2. Git-like operations (fork, clone, branch, merge, push, pull)
13+
3. Collaborative data editing, with merges that detect conflicting changes so they can be resolved
14+
4. Flexible import and export of data through CLI, an R package ([`doltr`](https://github.com/ecohealthalliance/doltr)) and the DoltHub API
15+
5. MySQL compatibility for seamless integration with existing systems such as Africa CDC's [Knowledge Hub](https://khub.africacdc.org/)
16+
17+
This approach brings software development best practices to data management while maintaining a familiar SQL interface.
18+
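The Git-style operations listed above can be illustrated with a minimal sketch. It assumes Dolt is installed, that `user.name` and `user.email` have already been set with `dolt config`, and that the default branch is `main`. The `demo` table and the `edits` branch are hypothetical names chosen for illustration; they are not part of the RIG database.

```bash
# Sketch: version a small table in a scratch Dolt database.
# The function body runs in a subshell, so the caller's directory is untouched.
dolt_versioning_demo() (
  set -e                                  # stop at the first failing command
  workdir=$(mktemp -d)                    # scratch directory, left in place for inspection
  cd "$workdir"
  dolt init                               # create a new database, like `git init`
  dolt sql -q "CREATE TABLE demo (id INT PRIMARY KEY, title VARCHAR(100))"
  dolt sql -q "INSERT INTO demo VALUES (1, 'first record')"
  dolt add .                              # stage the new table
  dolt commit -m "Add demo table"
  dolt checkout -b edits                  # make changes on a separate branch
  dolt sql -q "UPDATE demo SET title = 'edited record' WHERE id = 1"
  dolt commit -a -m "Edit record on branch"
  dolt checkout main                      # assumes the default branch is `main`
  dolt merge edits                        # bring the branch's changes back
  dolt log -n 3                           # show the most recent commits
)
```

Running `dolt_versioning_demo` in a terminal prints the output of each step, ending with the recent commit history.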

01-dolt.qmd

Lines changed: 0 additions & 125 deletions
This file was deleted.

02-database-schema.qmd

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
# Schema
2+
3+
<details>
4+
<summary>The final unified table, <code>africa_unified</code>, integrates data from various sources with the following structure: (click to expand)</summary>
5+
6+
| Field | Description |
7+
|-------|-------------|
8+
| **activity_id** | Unique identifier for each activity or record |
9+
| **Title** | Name of the publication or activity |
10+
| **Publication Link** | URL to access the publication |
11+
| **Description** | Detailed explanation of the content |
12+
| **Cover Image** | Link to associated image |
13+
| **Data Category** | Primary classification of the data |
14+
| **Data sub-Category** | Secondary classification for more granular organization |
15+
| **Publication Category** | Type of publication (e.g., journal article, report) |
16+
| **Geographical Coverage** | Areas covered by the data |
17+
| **Geographical Coverage Country ISO** | Standard country codes for geographic areas |
18+
| **Citation Link** | Reference information for academic citation |
19+
| **Associated Authors** | Names of contributors |
20+
| **Activity Type** | Classification of action or research type |
21+
| **activity_start_date** | When the activity began |
22+
| **activity_end_date** | When the activity concluded |
23+
| **funder_name** | Organization(s) providing financial support |
24+
| **topic_name** | Subject matter classification |
25+
| **diseases** | Health conditions addressed |
26+
| **disease_types** | Categories of diseases covered |
27+
| **au_region_name** | African Union regional classification |
28+
| **data_source** | Origin of the data |
29+
</details>
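As an illustration of how the table above might be queried, here is a hedged sketch using the Dolt CLI. It assumes the current directory is a local clone of the database and that the lowercase field names in the table (`activity_id`, `topic_name`, `funder_name`, `activity_start_date`, `au_region_name`) are the literal column names. The function name `rig_activities_by_region` is hypothetical.

```bash
# Sketch: list up to ten activities for one African Union region as CSV.
# The region value is interpolated directly into the SQL string, so this is
# only suitable for trusted, interactive use.
rig_activities_by_region() {
  region=$1
  dolt sql -r csv -q "
    SELECT activity_id, topic_name, funder_name, activity_start_date
    FROM africa_unified
    WHERE au_region_name = '$region'
    ORDER BY activity_start_date
    LIMIT 10"
}
```

The region string passed as the first argument must match a value stored in `au_region_name`; the handbook does not list those values.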
30+
31+
## Note
32+
33+
The database contains identifying information about researchers and is currently private. Contact [Andrew Agaba](mailto:[email protected]) for more information and access to the database.

02-dolthub.qmd

Lines changed: 0 additions & 31 deletions
This file was deleted.

03-dolt-install.qmd

Lines changed: 101 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,101 @@
1+
---
2+
output-file: dolt-install.html
3+
---
4+
5+
# Installing Dolt {#sec-install-dolt}
6+
7+
Dolt is simple to install: it is a single program of roughly 100 megabytes. Download or compile that program and put it on your `PATH`. Follow the instructions below for your operating system:
8+
9+
::: {.panel-tabset}
10+
11+
## Windows {#sec-install-dolt-windows}
12+
13+
### winget {#sec-install-dolt-windows-winget .unnumbered}
14+
15+
```bash
16+
winget install dolt
17+
```
18+
19+
<br/>
20+
21+
### Chocolatey {#sec-install-dolt-windows-chocolatey .unnumbered}
22+
23+
```bash
24+
choco install dolt
25+
```
26+
27+
Both `.msi` files and `.zip` files are available.
28+
29+
<br/>
30+
31+
### Scoop {#sec-install-dolt-windows-scoop .unnumbered}
32+
33+
```bash
34+
scoop install dolt
35+
```
36+
37+
<br/>
38+
39+
### MSI Files {#sec-install-dolt-windows-msi .unnumbered}
40+
41+
The easiest way to install Dolt on Windows is to use the MSI files that are provided with each release. They can be found in the Assets section of every release. Grab the latest [here](https://github.com/dolthub/dolt/releases/latest).
42+
43+
<br/>
44+
45+
### `.zip` Archive {#sec-install-dolt-windows-zip .unnumbered}
46+
47+
For those who prefer to install Dolt manually, a zipped archive containing the required executables is provided. It can be found in the assets of the [latest release](https://github.com/dolthub/dolt/releases/latest).
48+
49+
## macOS {#sec-install-dolt-mac}
50+
51+
### Install Script {#sec-install-dolt-macos-install-script .unnumbered}
52+
53+
The install script for Linux also works on macOS, since macOS is a `*nix` system. It downloads the appropriate binary and places it in `/usr/local/bin`:
54+
55+
```bash
56+
sudo bash -c 'curl -L https://github.com/dolthub/dolt/releases/latest/download/install.sh | bash'
57+
```
58+
59+
<br/>
60+
61+
### Homebrew {#sec-install-dolt-macos-homebrew .unnumbered}
62+
63+
A Homebrew formula is available with every release, so Mac users who use Homebrew for package management can install Dolt with a single command:
64+
65+
```bash
66+
$ brew install dolt
67+
==> Downloading https://homebrew.bintray.com/bottles/dolt-0.18.3.catalina.bottle.tar.gz
68+
==> Downloading from https://d29vzk4ow07wi7.cloudfront.net/c03cc532d5045fa090cb4e0f141883685de3765bf1d221e400c750b3ae89e328?response-content-disposition=attachment%3Bfilename%3D%22dolt-0.18.3.catalina.bottle.tar.gz%22&Policy=eyJTdGF0
69+
######################################################################## 100.0%
70+
==> Pouring dolt-0.18.3.catalina.bottle.tar.gz
71+
🍺 /usr/local/Cellar/dolt/0.18.3: 7 files, 56.9MB
72+
```
73+
74+
This installs Dolt as follows:
75+
76+
```bash
77+
$ ls -ltr $(which dolt)
78+
lrwxr-xr-x 1 oscarbatori admin 30 Aug 26 16:49 /usr/local/bin/dolt -> ../Cellar/dolt/0.18.3/bin/dolt
79+
```
80+
81+
<br/>
82+
83+
### MacPorts {#sec-install-dolt-macos-macport .unnumbered}
84+
85+
On macOS, Dolt can also be installed from a [community-managed port](https://ports.macports.org/port/dolt/) using [MacPorts](https://www.macports.org/):
86+
87+
```bash
88+
sudo port install dolt
89+
```
90+
91+
## Linux {#sec-install-dolt-linux}
92+
93+
For Linux users, an installation script is available that detects your architecture, downloads the appropriate binary, and places it in `/usr/local/bin`:
94+
95+
```bash
96+
sudo bash -c 'curl -L https://github.com/dolthub/dolt/releases/latest/download/install.sh | bash'
97+
```
98+
99+
`sudo` is required because the script writes the binary to `/usr/local/bin`, a location that is normally on your `PATH` but usually writable only by root.
100+
101+
:::
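Whichever method is used, it can help to confirm that the `dolt` binary is reachable from the shell. The following is a minimal sketch; the function name `check_dolt` is hypothetical.

```bash
# Sketch: confirm that dolt is on the PATH and report its version.
check_dolt() {
  if command -v dolt >/dev/null 2>&1; then
    echo "dolt found at $(command -v dolt)"
    dolt version                          # prints the installed Dolt version
  else
    echo "dolt not found on PATH" >&2
    return 1
  fi
}
```

If `check_dolt` reports that Dolt is missing right after an install, opening a new terminal session is usually enough for the updated `PATH` to take effect.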

03-rstats.qmd

Lines changed: 0 additions & 12 deletions
This file was deleted.

04-dolthub.qmd

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
# DoltHub {#sec-dolthub}
2+
3+
[DoltHub](https://www.dolthub.com) is GitHub for Dolt databases - a platform to share, collaborate on, and manage Dolt databases. DoltHub hosts public data for free and provides a modern, secure web GUI for database management.
4+
5+
DoltHub acts as a [Dolt remote](https://docs.dolthub.com/concepts/dolt/git/remotes): you can [clone](https://docs.dolthub.com/cli-reference/cli#dolt-clone), [push](https://docs.dolthub.com/cli-reference/cli#dolt-push), [pull](https://docs.dolthub.com/cli-reference/cli#dolt-pull) and [fetch](https://docs.dolthub.com/cli-reference/cli#dolt-fetch) from it. It also adds collaborative features, including:
6+
7+
- [Permissions](https://docs.dolthub.com/concepts/dolthub/permissions)
8+
- [Pull requests](https://docs.dolthub.com/concepts/dolthub/prs)
9+
- [Issues](https://docs.dolthub.com/concepts/dolthub/issues)
10+
- [Forks](https://docs.dolthub.com/concepts/dolthub/forks)
11+
- A built-in SQL workbench for exploring and modifying databases through the web
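The remote operations mentioned above can be sketched from the command line. This is an illustration under stated assumptions: the owner and database names are placeholders supplied by the caller, and a private database would first require authenticating with `dolt login`. The function name `rig_clone_and_update` is hypothetical.

```bash
# Sketch: clone a DoltHub database if it is not present locally, then update it.
# Runs in a subshell so the caller's working directory is not changed.
rig_clone_and_update() (
  set -e
  owner=$1                                # DoltHub owner or organization (placeholder)
  database=$2                             # DoltHub database name (placeholder)
  if [ ! -d "$database/.dolt" ]; then
    dolt clone "$owner/$database"         # owner/database resolves to DoltHub
  fi
  cd "$database"
  dolt pull                               # fetch and merge the latest remote changes
  dolt log -n 5                           # show the five most recent commits
)
```

After local edits are committed, `dolt push origin main` would publish them, assuming the remote is named `origin`, the branch is `main`, and the account has write permission.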
12+
13+
## DoltHub API {#sec-dolthub-api}
14+
15+
DoltHub offers an [API](https://docs.dolthub.com/products/dolthub/api) with the following capabilities:
16+
17+
1. [Authentication](https://docs.dolthub.com/products/dolthub/api/authentication)
18+
2. [SQL API](https://docs.dolthub.com/products/dolthub/api/sql) - For read/write SQL queries to DoltHub databases
19+
3. [CSV API](https://docs.dolthub.com/products/dolthub/api/csv) - For downloading CSV versions of DoltHub tables
20+
4. [Database API](https://docs.dolthub.com/products/dolthub/api/database) - For interacting with databases and pull requests
21+
5. [Hooks](https://docs.dolthub.com/products/dolthub/api/hooks) - For receiving notifications about database changes
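As a rough sketch of the SQL API, a read query can be sent with `curl`. The URL pattern (`/api/v1alpha1/{owner}/{database}/{ref}` with the query in a `q` parameter) and the `authorization: token ...` header reflect the DoltHub API documentation as understood at the time of writing and should be checked against the linked pages. The function names and the `DOLTHUB_TOKEN` environment variable are assumptions made for this example.

```bash
# Sketch: build the SQL API URL for an owner, database, and branch.
dolthub_sql_url() {
  printf 'https://www.dolthub.com/api/v1alpha1/%s/%s/%s\n' "$1" "$2" "$3"
}

# Sketch: run a read query and print the JSON response.
# If DOLTHUB_TOKEN is set, it is sent as an API token (needed for private databases).
dolthub_query() {
  url=$(dolthub_sql_url "$1" "$2" "$3")
  query=$4
  # -G turns the urlencoded data into a ?q=... query string on a GET request
  if [ -n "${DOLTHUB_TOKEN:-}" ]; then
    curl -sS -G "$url" -H "authorization: token $DOLTHUB_TOKEN" \
      --data-urlencode "q=$query"
  else
    curl -sS -G "$url" --data-urlencode "q=$query"
  fi
}
```

A call takes the form `dolthub_query <owner> <database> <branch> "SELECT ..."`, where all four arguments are supplied by the user.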

05-rstats.qmd

Lines changed: 38 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,38 @@
1+
# R {#sec-rstats}
2+
3+
[R](https://cran.r-project.org) is a free and open-source programming language and software environment for statistical computing and graphics. Installation instructions are available on the [R Project website](https://cran.r-project.org/doc/manuals/r-release/R-admin.html). Developed in the early 1990s at the University of Auckland, R is widely used by statisticians, data analysts, and researchers across various fields.
4+
5+
R provides a comprehensive range of statistical and graphical techniques including linear and nonlinear modeling, statistical tests, time-series analysis, classification, and clustering. Its active community continuously develops new packages and extensions, making it powerful for data science applications. For the Research Information Gateway, R's ability to integrate with tools like Airtable through community-developed packages makes it particularly well-suited.
6+
7+
## R Data Processing Workflow
8+
9+
### Overview
10+
The Research Information Gateway uses R to process dataset files before importing them into Dolt. This workflow handles data cleaning, transformation, and standardization.
11+
12+
### Key Processing Steps
13+
14+
1. **Setup & Data Loading**
15+
- Installs required packages using pacman
16+
- Loads custom functions from the R folder
17+
- Reads source CSV files and reference data
18+
19+
2. **Data Transformation**
20+
- Converts dummy variables back to categorical data
21+
- Combines related fields (such as name components)
22+
- Converts separate year and month values into proper date formats
23+
24+
3. **Data Enrichment**
25+
- Maps entity names to standardized codes
26+
- Identifies regional classifications
27+
- Applies consistent mapping across related fields
28+
29+
4. **Data Cleanup**
30+
- Removes duplicates and empty columns
31+
- Standardizes NULL/NA/empty values
32+
- Validates field lengths
33+
34+
5. **Export & Database Import**
35+
- Exports processed data to CSV
36+
- Uses Dolt commands to import with appropriate primary keys
37+
38+
This workflow ensures data is properly structured, standardized, and ready for use in the Research Information Gateway's Dolt database.
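The final import step can be sketched with the Dolt CLI. This is not the project's actual script: it assumes the target table is `africa_unified`, that `activity_id` serves as the primary key, and that the command runs inside the local Dolt database directory. The function name `rig_import_csv` and the CSV path are hypothetical.

```bash
# Sketch: import a processed CSV into Dolt and commit the result.
rig_import_csv() {
  csv=$1                                  # path to the CSV exported by the R workflow
  # If the table can be queried, update it; otherwise create it with a primary key.
  if dolt sql -q "SELECT 1 FROM africa_unified LIMIT 1" >/dev/null 2>&1; then
    dolt table import -u africa_unified "$csv"
  else
    dolt table import -c --pk activity_id africa_unified "$csv"
  fi
  dolt add africa_unified                 # stage the changed table
  dolt commit -m "Import $(basename "$csv")"
}
```

Committing after each import means every data load appears as its own entry in the database history, which is what makes later rollback or comparison possible.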
