Measurement volume heuristic for faulty measurements detection by LDiazN · Pull Request #145 · ooni/data

LDiazN · 2026-02-10T12:02:21Z

This PR implements the volume analysis heuristic for detecting faulty measurements.

Creates the analysis function that inspects the measurement volume for probes in 1-minute groups
Saves the results to the database
Adds the corresponding Airflow task
Adds that task to the hourly_batch_measurement_processing DAG

closes #144
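
A minimal sketch of the 1-minute volume heuristic described above, written in plain Python rather than the SQL the pipeline runs. The function name `find_volume_anomalies`, the shape of the input, and the default threshold are assumptions for illustration and are not taken from this PR's code.

```python
from collections import Counter
from datetime import datetime

# Assumed default; it mirrors the `HAVING total >= 200` clause used in the
# review query in this thread, not a value confirmed by the PR's code.
VOLUME_THRESHOLD = 200


def find_volume_anomalies(measurements, threshold=VOLUME_THRESHOLD):
    """Return {(probe_key, minute_start): count} for groups at or above threshold.

    `measurements` is an iterable of (probe_key, measurement_start_time) pairs,
    where probe_key is any hashable value identifying a group of probes.
    """
    counts = Counter()
    for probe_key, start_time in measurements:
        # Truncate the timestamp to the start of its minute (the 1-minute group).
        minute_start = start_time.replace(second=0, microsecond=0)
        counts[(probe_key, minute_start)] += 1
    return {group: n for group, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Made-up sample data: three measurements from one probe group in one minute.
    key = ("XX", 64500, "web_connectivity")
    sample = [(key, datetime(2026, 1, 1, 0, 0, s)) for s in range(3)]
    print(find_volume_anomalies(sample, threshold=3))
```

Keeping the threshold as a parameter makes it easy to try other cut-offs against historical data before settling on one.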

codecov · 2026-02-10T12:03:01Z

Codecov Report

❌ Patch coverage is 96.42857% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.23%. Comparing base (e70e168) to head (44bd91b).

Files with missing lines	Patch %	Lines
oonipipeline/src/oonipipeline/tasks/volume.py	86.66%	2 Missing ⚠️
oonipipeline/tests/conftest.py	89.47%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #145      +/-   ##
==========================================
+ Coverage   82.77%   83.23%   +0.46%     
==========================================
  Files          78       81       +3     
  Lines        4871     4981     +110     
==========================================
+ Hits         4032     4146     +114     
+ Misses        839      835       -4

Flag	Coverage Δ
oonidata	`77.86% <ø> (ø)`
oonipipeline	`86.65% <96.42%> (+0.63%)`	⬆️


hellais · 2026-02-10T12:36:19Z

dags/pipeline.py

        op_make_observations_hourly
        >> op_make_analysis_hourly
        >> op_make_event_detector_hourly
+        >> op_make_volume_analysis_hourly


I don't think we need to make this depend on the previous steps. It can run on its own, separately from them, even concurrently.
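
To illustrate the difference this makes, here is a small sketch that models the two dependency layouts with the standard library's `graphlib`, without Airflow itself. The task names are simplified stand-ins for the operators in the diff, and `initially_ready` is a hypothetical helper.

```python
from graphlib import TopologicalSorter


def initially_ready(predecessors):
    """Return the set of tasks that can start before any other task finishes."""
    sorter = TopologicalSorter(predecessors)
    sorter.prepare()
    return set(sorter.get_ready())


# Each task maps to the set of tasks it waits for.
chained = {
    "make_analysis": {"make_observations"},
    "make_event_detector": {"make_analysis"},
    "make_volume_analysis": {"make_event_detector"},
}
# Same graph, but the volume analysis has no upstream dependency.
independent = {**chained, "make_volume_analysis": set()}

print(sorted(initially_ready(chained)))      # ['make_observations']
print(sorted(initially_ready(independent)))  # ['make_observations', 'make_volume_analysis']
```

With the chained layout the volume analysis waits for the whole hourly chain; without the dependency it can be scheduled at the start of the run.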

hellais · 2026-02-10T12:44:18Z

oonipipeline/src/oonipipeline/analysis/volume.py

+    query = """
+    SELECT
+        probe_cc, probe_asn, engine_version,
+        software_version, platform, architecture,


I would suggest adding the software_name key here as well.

hellais · 2026-02-10T12:52:36Z

I ran an analysis with the threshold you suggested above on the data from the first days of January (2026-01-01 up to, but not including, 2026-01-10) and I get 541 anomalies.

This is the query I ran:

    SELECT
        probe_cc, probe_asn, engine_version, software_name,
        software_version, platform, architecture,
        toStartOfMinute(measurement_start_time) as minute_start,
        test_name,
        count() as total
    FROM fastpath
    WHERE
        measurement_start_time >= '2026-01-01' AND
        measurement_start_time < '2026-01-10'
    GROUP BY probe_cc, probe_asn, engine_version, test_name, software_name, software_version, platform, architecture, minute_start
    HAVING total >= 200

I then counted the occurrences of anomalies by the ['probe_cc', 'probe_asn', 'platform', 'software_version', 'test_name', 'software_name'] key like this:

    # df_volume_anomaly holds the rows returned by the query above
    key = ['probe_cc', 'probe_asn', 'platform', 'software_version', 'test_name', 'software_name']
    df_volume_anomaly[key + ['total']].groupby(key).count().reset_index().to_markdown()

and get (here the total column is the number of anomalous 1-minute groups per key, which sums to 541):

probe_cc	probe_asn	platform	software_version	test_name	software_name	total
CN	9808	React OS	3.28.0-alpha	web_connectivity	ooniprobe-react-os	508
ES	3352	android	5.3.0	web_connectivity	ooniprobe-android-unattended	1
ES	3352	macos	3.28.0	web_connectivity	ooniprobe-cli	1
ES	57269	windows	3.26.0	web_connectivity	ooniprobe-cli-laligagate	19
IL	51825	android	5.3.0	web_connectivity	ooniprobe-android	2
MY	9930	linux	3.18.1	web_connectivity	miniooni	7
US	10796	android	5.3.0	web_connectivity	ooniprobe-android	2
US	11351	android	5.3.0	web_connectivity	ooniprobe-android	1

The mean of the per-minute totals for each key is:

probe_cc	probe_asn	platform	software_version	test_name	software_name	total mean
CN	9808	React OS	3.28.0-alpha	web_connectivity	ooniprobe-react-os	261.72
ES	3352	android	5.3.0	web_connectivity	ooniprobe-android-unattended	218
ES	3352	macos	3.28.0	web_connectivity	ooniprobe-cli	217
ES	57269	windows	3.26.0	web_connectivity	ooniprobe-cli-laligagate	225.895
IL	51825	android	5.3.0	web_connectivity	ooniprobe-android	363
MY	9930	linux	3.18.1	web_connectivity	miniooni	229.143
US	10796	android	5.3.0	web_connectivity	ooniprobe-android	741.5
US	11351	android	5.3.0	web_connectivity	ooniprobe-android	1248

This seems like a reasonable starting point: it captures a fair number of anomalies without being too noisy.
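
The two summaries above (number of anomalous minutes and mean per-minute total, per key) can be computed in one pass. The sketch below does this in plain Python on made-up rows; `summarize` is a hypothetical helper and the sample values are not taken from the real data.

```python
from collections import defaultdict
from statistics import mean


def summarize(rows, key_fields):
    """Group anomaly rows by key and return the count and mean of `total` per key."""
    totals_by_key = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        totals_by_key[key].append(row["total"])
    return {
        key: {"count": len(totals), "mean": mean(totals)}
        for key, totals in totals_by_key.items()
    }


# Made-up rows: each one is a 1-minute group that crossed the threshold.
rows = [
    {"probe_cc": "XX", "probe_asn": 64500, "total": 210},
    {"probe_cc": "XX", "probe_asn": 64500, "total": 230},
    {"probe_cc": "YY", "probe_asn": 64501, "total": 400},
]
for key, stats in summarize(rows, ["probe_cc", "probe_asn"]).items():
    print(key, stats)
```

With pandas, the same result should be obtainable with something like `df.groupby(key)["total"].agg(["count", "mean"])`.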

LDiazN added 6 commits February 6, 2026 15:22

Add fixture to test volume analysis

60d31ed

Add volume analysis tests

e433f92

Fixing testing of volume analysis query

ccf99d4

Fix broken tests

617edd8

Add assert to check expected time in event

5a5d590

Add task to analyse measurement volume for measurement anomalies

0fa1509

LDiazN requested a review from hellais February 10, 2026 12:02

LDiazN self-assigned this Feb 10, 2026

LDiazN added 3 commits February 10, 2026 13:07

simplify tests code

894e810

black reformat

c37eefb

Add volume analysis files

cf90993

hellais added the funder/otfcred2025 label Feb 10, 2026

hellais reviewed Feb 10, 2026

LDiazN added 4 commits February 10, 2026 13:59

Remove hourly analysis from dependencies

c1c2665

Add software_name key details and probe_id

985fdef

Change time definition to be similar to other tables

d78bba0

Change time to ts in tests

44bd91b

LDiazN wants to merge 13 commits into main from volume-analysis

2 participants
