Commit 089fcba

Merge pull request #876 from elementary-data/prod
Prod
2 parents 09be383 + 017bb2e commit 089fcba

27 files changed

Lines changed: 953 additions & 333 deletions

docs/_snippets/column-metrics.mdx

Lines changed: 8 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,7 @@
1+
**Default monitors by type:**
2+
13
| Data quality metric | Column Type |
2-
|----------------------| ----------- |
4+
|----------------------|-------------|
35
| `null_count` | any |
46
| `null_percent` | any |
57
| `min_length` | string |
@@ -13,4 +15,9 @@
1315
| `zero_percent` | numeric |
1416
| `standard_deviation` | numeric |
1517
| `variance` | numeric |
18+
19+
**Opt-in monitors by type:**
20+
21+
| Data quality metric | Column Type |
22+
|----------------------|-------------|
1623
| `sum` | numeric |
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
<Accordion title="How are test configurations prioritized?">
2+
3+
Elementary's configuration is dbt-native and follows the same precedence rules as dbt configuration.
4+
The more granular and specific configuration overrides the less granular one.
5+
6+
Elementary searches and prioritizes configuration in the following order:
7+
8+
For models:
9+
1. Test arguments.
10+
2. Model configuration.
11+
3. Global vars in `dbt_project.yml`.
12+
13+
For sources:
14+
1. Test arguments.
15+
2. Table configuration.
16+
3. Source configuration.
17+
4. Global vars in `dbt_project.yml`.
18+
19+
20+
</Accordion>
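The ordering above can be illustrated with a minimal sketch. It uses `backfill_days` because the docs in this commit list it at both the var and test level; the model name `orders` and the file path in the comment are hypothetical, and the intermediate model, table and source levels are left out because their exact keys are not shown here.

```yaml
# dbt_project.yml
vars:
  # Global default: least specific, used when nothing overrides it
  backfill_days: 2
---
# models/schema.yml (hypothetical path)
models:
  - name: orders  # hypothetical model name
    tests:
      - elementary.volume_anomalies:
          # Test argument: most specific, so it takes priority over the global var
          backfill_days: 7
```

Under these assumptions, this test would use a detection period of 7 days, while tests that set nothing would fall back to the global value of 2.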
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
1+
---
2+
title: "Security and Privacy"
3+
sidebarTitle: "Security and privacy"
4+
---
5+
6+
## Security principles
7+
8+
Elementary Cloud is designed with the core principle of least privilege.
9+
Our cloud service does not require permission to access your data.
10+
Therefore, we instruct you to create a dedicated role for Elementary with `read only` access limited to the Elementary schema in your data warehouse.
11+
12+
As long as you follow the onboarding process instructions, it will be impossible for Elementary Cloud to read data from your warehouse that does not reside in the Elementary schema.
13+
This ensures that Elementary Cloud will not mistakenly access your data, and minimizes the risk in case of a data breach.
14+
Our product and architecture are always evolving, but our commitment to secure design always remains.
15+
16+
17+
## How it works
18+
19+
1. You install the Elementary dbt package in your dbt project and configure it to write to its own schema, the Elementary schema.
20+
2. The package writes test results, run results, logs and metadata to the Elementary schema.
21+
3. The cloud service only requires `read access` to the Elementary schema, not to schemas where your sensitive data is stored.
22+
4. The cloud service connects to sync the Elementary schema using an **encrypted connection** and a **static IP address** that you will need to add to your allowlist.
23+
24+
<Frame>
25+
<img
26+
src="https://res.cloudinary.com/diuctyblm/image/upload/v1682954617/security-architecture_mdmh8i.png"
27+
alt="Elementary cloud security"
28+
/>
29+
</Frame>
30+
31+
32+
## What is stored in the Elementary schema?
33+
34+
The Elementary schema stores only metadata, aggregated metrics and logs.
35+
You can find the details of the tables [here](/guides/modules-overview/dbt-package).
36+
37+
The only exception is `test_results_samples`, which can be disabled. This feature shows a small sample of raw failed rows for failed tests, to help you triage and understand the problem.
38+
To avoid this sampling, set the var `test_sample_rows_count: 0` in your `dbt_project.yml` (default is 5 sample rows).
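
A minimal sketch of that setting, assuming the var sits directly under `vars` in `dbt_project.yml`, as the other Elementary vars in these docs do:

```yaml dbt_project.yml
vars:
  # 0 disables the sampling of failed rows; the default is 5
  test_sample_rows_count: 0
```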
39+
40+
41+
## Secrets and data protection
42+
43+
- **Tokens and credentials** - For customer secrets (tokens and credentials) we use [AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html). Secrets Manager uses envelope encryption with AWS KMS keys and data keys to protect each secret value. Whenever the secret value in a secret changes, Secrets Manager generates a new data key to protect it. The data key is encrypted under a KMS key and stored in the metadata of the secret. [See this link for more details](https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html).
44+
- **Customer data (Elementary schema replica)** - The synced customer data is encrypted at rest using server-side encryption (AES-256).
45+
46+
## Compliance
47+
48+
[Contact us](mailto:legal@elementary-data.com) for auditing reports and penetration testing results.
49+
50+
## Have more questions?
51+
52+
We would be happy to answer!
53+
Reach out to us by [email](mailto:legal@elementary-data.com) or on [Slack](https://join.slack.com/t/elementary-community/shared_invite/zt-1b9vogqmq-y~IRhc2396CbHNBXLsrXcA).

docs/guides/add-elementary-tests.mdx

Lines changed: 32 additions & 23 deletions
@@ -2,8 +2,7 @@
22
title: "Add anomaly detection tests"
33
---
44

5-
After you [install the dbt package](/quickstart#install-the-dbt-package), you can add Elementary data anomaly detection
6-
tests.
5+
After you [install the dbt package](/quickstart#install-the-dbt-package), you can add Elementary data anomaly detection tests.
76

87
## Data anomaly detection dbt tests
98

@@ -27,13 +26,13 @@ alt="Demo"
2726
Monitors the row count of your table over time per time bucket (if configured without `timestamp_column`, will count table total rows).
2827

2928
Upon running the test, your data is split into time buckets (daily by default, configurable with the `time_bucket`
30-
field), and then we compute the row count per bucket for the last `days_back` days (by default 14).
29+
field), and then we compute the row count per bucket for the last [`days_back`](/guides/anomaly-detection-configuration/days-back) days (by default 14).
3130

3231
The test then compares the row count of buckets within the detection period (last 2 days by default, controlled by the
3332
`backfill_days` var), and compares it with the row count of the previous time buckets.
3433
If there were any anomalies during the detection period, the test will fail.
3534

36-
For advanced configuration of Elementary anomaly tests, please click [here](/guides/add-elementary-tests#advanced-configuration-for-your-elementary-tests)
35+
For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration).
3736

3837

3938
<CodeGroup>
@@ -96,7 +95,7 @@ The test then compares the freshness of buckets within the detection period (las
9695
`backfill_days` var), and compares it with the freshness of the previous time buckets.
9796
If there were any anomalies during the detection period, the test will fail.
9897

99-
For advanced configuration of Elementary anomaly tests, please click [here](/guides/add-elementary-tests#advanced-configuration-for-your-elementary-tests)
98+
For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration).
10099

101100
<CodeGroup>
102101

@@ -150,7 +149,7 @@ timestamp ("now") and the most recent event timestamp.
150149
- If both an `event_timestamp_column` and an `update_timestamp_column` are provided, the test will measure over time
151150
the difference between these two columns.
152151

153-
For advanced configuration of Elementary anomaly tests, please click [here](/guides/add-elementary-tests#advanced-configuration-for-your-elementary-tests)
152+
For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration).
154153

155154
<CodeGroup>
156155

@@ -199,7 +198,7 @@ It is best to configure it on low-cardinality fields.
199198
The test counts rows grouped by given columns/expressions, and can be configured using the `dimensions`
200199
and `where_expression` keys.
201200

202-
For advanced configuration of Elementary anomaly tests, please click [here](/guides/add-elementary-tests#advanced-configuration-for-your-elementary-tests)
201+
For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration).
203202

204203
<CodeGroup>
205204

@@ -265,7 +264,7 @@ Executes column level monitors and anomaly detection on all the columns of the t
265264
are [detailed here](/guides/data-anomaly-detection#tests-and-monitors-types) and can be configured using
266265
the `all_columns_anomalies` key.
267266

268-
For advanced configuration of Elementary anomaly tests, please click [here](/guides/add-elementary-tests#advanced-configuration-for-your-elementary-tests)
267+
For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration).
269268

270269
<CodeGroup>
271270

@@ -323,10 +322,13 @@ Executes column level monitors and anomaly detection. Specific monitors
323322
are [detailed here](/guides/data-anomaly-detection#tests-and-monitors-types) and can be configured using
324323
the `column_anomalies` key.
325324

326-
For advanced configuration of Elementary anomaly tests, please click [here](/guides/add-elementary-tests#advanced-configuration-for-your-elementary-tests)
325+
For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration).
327326

328327
<CodeGroup>
329328

329+
330+
For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration).
331+
330332
```yml Models
331333
version: 2
332334
@@ -563,28 +565,35 @@ sources:
563565

564566
## Configure your elementary anomaly detection tests
565567

566-
The elementary anomaly detection tests described above can work out-of-the-box with default configuration. However,
567-
we support additional configuration that can be used to customize their behavior, depending on your needs.
568-
Read all about data anomaly detection tests configuration [here](/guides/elementary-tests-configuration).
568+
<Tip>If your data set has a timestamp column that represents the creation time of a field, we highly recommend configuring it as a `timestamp_column`.</Tip>
569+
570+
To support different types of data sets, the tests expose configuration options that customize their behavior.
571+
Read more about [data anomaly detection tests configuration here](/guides/elementary-tests-configuration).
569572

570-
We recommend adding a tag to the tests so you could execute these in a dedicated run using the selection
571-
parameter `--select tag:elementary`.
572-
If you wish to only be warned on anomalies, configure the severity of the tests to warn.
573+
We recommend adding a tag to the tests so you can execute them in a dedicated run using the selection parameter `--select tag:elementary`.
574+
If you wish to only be warned on anomalies, configure the `severity` of the tests to `warn`.
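
Both recommendations can be combined on a single test, as in the sketch below. It assumes a dbt version that accepts a `config` block on generic tests; the model name `orders` is hypothetical.

```yaml
models:
  - name: orders  # hypothetical model name
    tests:
      - elementary.volume_anomalies:
          config:
            # Allows a dedicated run: dbt test --select tag:elementary
            tags: ["elementary"]
            # Report anomalies as warnings instead of failures
            severity: warn
```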
573575

574576

575577
## What happens on each test?
576578

577-
Upon running a test, your data is split into time buckets based on the `time_bucket` field and is limited by
578-
the `days_back` var. The test then compares a certain metric (e.g. row count) of the buckets that are within the detection
579-
period (`backfill_days`) to the row count of all the previous time buckets within the `days_back` period.
579+
Upon running a test, your data is split into time buckets based on the [`time_bucket`](/guides/anomaly-detection-configuration/time-bucket) field and is limited by
580+
the [`days_back`](/guides/anomaly-detection-configuration/days-back) var. The test then compares a certain metric (e.g. row count) of the buckets that are within the detection
581+
period ([`backfill_days`](/guides/anomaly-detection-configuration/backfill-days)) to the same metric of all the previous time buckets within the [`days_back`](/guides/anomaly-detection-configuration/days-back) period.
580582
If there were any anomalies in the detection period, the test will fail.
581-
On each test elementary package executes the relevant monitors, and searches for anomalies by comparing to historical
582-
metrics.
583-
At the end of the `dbt test` run, all results and collected metrics are merged into the elementary models.
583+
On each test, the Elementary package executes the relevant monitors and searches for anomalies by comparing the results to historical metrics.
584+
585+
<Tooltip tip="Anomaly detection tests core concepts">
586+
<img
587+
src="https://github.com/elementary-data/assets-hosting/master/anomaly_detection/elementary-anomaly-detection-core-concepts.png"
588+
alt="Elementary anomaly detection tests core concepts"
589+
/>
590+
</Tooltip>
591+
592+
To learn more, refer to [core concepts](/guides/how-anomaly-detection-works).
593+
584594

585595
## What does it mean when a test fails?
586596

587-
When a test fail, it means that an anomaly was detected on this metric and dataset. To learn more, refer
588-
to [anomaly detection](/guides/data-anomaly-detection#anomaly-detection).
597+
When a test fails, it means that an anomaly was detected on this metric and dataset. To learn more, refer to [core concepts](/guides/how-anomaly-detection-works) and [anomaly detection](/guides/data-anomaly-detection).
589598

590599

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
1+
---
2+
title: "anomaly_direction"
3+
sidebarTitle: "anomaly_direction"
4+
---
5+
6+
`anomaly_direction: both | spike | drop`
7+
8+
By default, data points are compared to the expected range and flagged if they fall below or above it.
9+
For some data monitors, you might only want to flag anomalies that are above the range and not below it, or vice versa.
10+
For example, when monitoring freshness, we only want to detect data delays and not data that is “early”.
11+
The `anomaly_direction` configuration sets which side of the expected range is flagged, and can be set to `both`, `spike` or `drop`.
12+
13+
14+
- _Default: `both`_
15+
- _Supported values: `both`, `spike`, `drop`_
16+
- _Relevant tests: All anomaly detection tests_
17+
- _Configuration level: test_
18+
19+
<Frame caption="anomaly_direction change impact">
20+
<img
21+
src="https://res.cloudinary.com/diuctyblm/image/upload/v1681301375/Anomaly%20detection%20tests/anomaly_direction_r1sdl9.png"
22+
alt="anomaly_direction change impact"
23+
/>
24+
</Frame>
25+
26+
<RequestExample>
27+
28+
```yaml test
29+
models:
30+
  - name: this_is_a_model
31+
    tests:
32+
33+
      - elementary.volume_anomalies:
34+
          anomaly_direction: drop
35+
36+
      - elementary.all_columns_anomalies:
37+
          column_anomalies:
38+
            - null_count
39+
            - missing_count
40+
            - zero_count
41+
          anomaly_direction: spike
42+
43+
```
44+
45+
</RequestExample>
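
The freshness case described in this page could be configured as in the sketch below. It assumes the `elementary.freshness_anomalies` test, a hypothetical `updated_at` column passed as `timestamp_column`, and that a data delay shows up as a freshness value above the expected range, which is why `spike` is used.

```yaml
models:
  - name: this_is_a_model
    tests:
      - elementary.freshness_anomalies:
          timestamp_column: updated_at  # hypothetical column name
          # Flag only values above the expected range (delays), not "early" data
          anomaly_direction: spike
```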
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
---
2+
title: "anomaly_sensitivity"
3+
sidebarTitle: "anomaly_sensitivity"
4+
---
5+
6+
`anomaly_sensitivity: [int]`
7+
8+
Configuration to define how the expected range is calculated.
9+
A sensitivity of 3 means that the expected range is within 3 standard deviations from the average of the training set.
10+
A smaller sensitivity narrows this range, so more values may be flagged as anomalies.
11+
Larger values have the opposite effect: the expected range widens, so fewer values are flagged as anomalies.
12+
13+
14+
- _Default: 3_
15+
- _Relevant tests: All anomaly detection tests_
16+
- _Configuration level: var, test config_
17+
18+
<Frame caption="anomaly_sensitivity change impact">
19+
<img
20+
src="https://res.cloudinary.com/diuctyblm/image/upload/v1683997014/anomaly_sensitivity_n9dpof.png"
21+
alt="anomaly_sensitivity change impact"
22+
/>
23+
</Frame>
24+
25+
<RequestExample>
26+
27+
```yaml dbt_project.yml
28+
vars:
29+
  anomaly_sensitivity: 3
30+
```
31+
32+
```yaml test
33+
models:
34+
  - name: this_is_a_model
35+
    tests:
36+
      - elementary.volume_anomalies:
37+
          anomaly_sensitivity: 3
38+
```
39+
40+
</RequestExample>
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
1+
---
2+
title: "backfill_days"
3+
sidebarTitle: "backfill_days"
4+
---
5+
6+
`backfill_days: [int]`
7+
8+
Configuration to define the detection period.
9+
If `backfill_days` is set to 2, only data points from the last 2 days are included in the detection period and can be flagged as anomalous.
10+
If `backfill_days` is set to 7, the detection period is 7 days long.
11+
12+
For incremental models, this is also the period for re-calculating metrics.
13+
If metrics for buckets in the backfill days were already calculated, Elementary will overwrite them. This allows Elementary to monitor recent backfills of data, if there were any.
14+
This configuration should be changed according to your data delays.
15+
16+
- _Default: 2_
17+
- _Relevant tests: Anomaly detection tests with `timestamp_column`_
18+
- _Configuration level: test, var_
19+
20+
<Frame caption="backfill_days change impact">
21+
<img
22+
src="https://res.cloudinary.com/diuctyblm/image/upload/v1681301376/Anomaly%20detection%20tests/backfill_days_mdjmon.png"
23+
alt="backfill_days change impact"
24+
/>
25+
</Frame>
26+
27+
28+
<RequestExample>
29+
30+
```yaml dbt_project.yml
31+
vars:
32+
  backfill_days: 2
33+
```
34+
35+
```yaml test
36+
models:
37+
  - name: this_is_a_model
38+
    tests:
39+
      - elementary.volume_anomalies:
40+
          backfill_days: 7
41+
```
42+
43+
</RequestExample>
44+
45+
46+
47+
#### How does it work?
48+
49+
The `backfill_days` param only works for tests that have a `timestamp_column` configured.
50+
51+
It works differently according to the table materialization:
52+
53+
- **Regular tables and views** - `backfill_days` defines the detection period.
54+
- **Incremental models and sources** - `backfill_days` defines the detection period, and the period for which metrics will be re-calculated.
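
Since `backfill_days` depends on a timestamp column, a test that relies on it would typically set both. The sketch below assumes `timestamp_column` is passed as a test argument and uses a hypothetical `updated_at` column.

```yaml
models:
  - name: this_is_a_model
    tests:
      - elementary.volume_anomalies:
          # Without a timestamp column, backfill_days has no effect
          timestamp_column: updated_at  # hypothetical column name
          backfill_days: 7
```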
55+
56+
