Skip to content

Fix a race condition in the tests#118

Merged
JamesWrigley merged 1 commit into
masterfrom
racey-tests
Jan 15, 2025
Merged

Fix a race condition in the tests#118
JamesWrigley merged 1 commit into
masterfrom
racey-tests

Conversation

@JamesWrigley

Copy link
Copy Markdown
Member

Quick explanation of what the GC tests for RemoteChannels test does:

  1. Create RemoteChannels rr and fstore on worker 1 and worker 2 respectively from the master process. At this point only the master process knows about rr and fstore.
  2. Master process calls put!(fstore, rr), i.e. we remotecall worker 2 and put rr (which is owned worker 1 but is currently only known about by the master) into fstore.
  3. Remotecall into worker 1 and check that it knows about rr.

Step 3 should succeed despite us never previously explicitly communicating with worker 1, because serialize(::ClusterSerializer, ::RemoteChannel) will send a message to the owner of the RemoteChannel to inform them of its existence (see send_add_client()). This happens asynchronously in step 2, and on rare occasions worker 1 would not process that message before step 3, causing the test to fail.

Now we give the check 10s to succeed.

Cherry-picked from JuliaParallel/DistributedNext.jl#8.

@JamesWrigley JamesWrigley self-assigned this Jan 6, 2025
@codecov

codecov Bot commented Jan 6, 2025

Copy link
Copy Markdown

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.24%. Comparing base (3a43532) to head (f9d3d89).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #118      +/-   ##
==========================================
- Coverage   79.33%   79.24%   -0.09%     
==========================================
  Files          10       10              
  Lines        1916     1913       -3     
==========================================
- Hits         1520     1516       -4     
- Misses        396      397       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@JamesWrigley

Copy link
Copy Markdown
Member Author

(bump)

@vchuravy vchuravy left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment? I always find timeouts scary, since it may be able to fail.

Quick explanation of what the `GC tests for RemoteChannels` test does:
1. Create `RemoteChannel`s `rr` and `fstore` on worker 1 and worker 2
   respectively from the master process. At this point only the master process
   knows about `rr` and `fstore`.
2. Master process calls `put!(fstore, rr)`, i.e. we remotecall worker 2 and put
   `rr` (which is owned worker 1 but is currently only known about by the
   master) into `fstore`.
3. Remotecall into worker 1 and check that it knows about `rr`.

Step 3 should succeed despite us never previously explicitly communicating with
worker 1, because `serialize(::ClusterSerializer, ::RemoteChannel)` will send a
message to the owner of the `RemoteChannel` to inform them of its existence (see
`send_add_client()`). This happens asynchronously in step 2, and on rare
occasions worker 1 would not process that message before step 3, causing the
test to fail.

Now we give the check 10s to succeed.
@JamesWrigley

Copy link
Copy Markdown
Member Author

Sure, added in f9d3d89.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants