Fix some race conditions in xca6ll/hot/amcssth. by gareth-rees · Pull Request #83 · Ravenbrook/mps

gareth-rees · 2022-12-24T15:18:52Z

Use memory-model-aware builtins in GCC and Clang when a memory location may be written by one thread and read by another, avoiding race conditions due to out-of-order updates on ARM.
Call dylan_make_wrappers while the test is still single-threaded, preventing multiple threads from racing to call it.
Prevent dylan_init from creating a padding object, as we must not have an exact root pointing at a padding object.
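The second point, doing one-time setup before any thread exists, can be sketched as below. This is only an illustration under stated assumptions: `setup_once`, `worker` and `run_setup_demo` are hypothetical stand-ins and are not taken from amcssth or fmtdytst.c. It relies on the POSIX guarantee that `pthread_create` synchronizes memory, so writes made before thread creation are visible to the new thread.

```c
#include <pthread.h>
#include <stddef.h>

#define N_THREADS 4

static int setup_calls = 0;   /* written only while single-threaded */

/* Hypothetical stand-in for one-time setup such as dylan_make_wrappers. */
static void setup_once(void)
{
  ++setup_calls;
}

static void *worker(void *arg)
{
  /* Workers only read the shared state. The write in setup_once happened
   * before pthread_create, so no atomics are needed for this read. */
  int *seen = arg;
  *seen = setup_calls;
  return NULL;
}

/* Returns 1 if every worker observed exactly one setup call, else 0. */
int run_setup_demo(void)
{
  pthread_t threads[N_THREADS];
  int seen[N_THREADS];
  int created = 0, i, ok = 1;

  setup_calls = 0;
  setup_once();               /* still single-threaded: nothing to race with */

  for (i = 0; i < N_THREADS; ++i) {
    if (pthread_create(&threads[i], NULL, worker, &seen[i]) != 0) {
      ok = 0;
      break;
    }
    ++created;
  }
  for (i = 0; i < created; ++i) {
    pthread_join(threads[i], NULL);
    if (seen[i] != 1)
      ok = 0;
  }
  return ok;
}
```

Calling `run_setup_demo()` from a `main` function should return 1. The design point is that the setup call sits before the first `pthread_create`, which avoids needing a lock or a once-flag around it.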

Note that this does not fix all the race conditions—there is at least one more that we have not figured out—but it reduces the frequency of failures in amcssth to less than one in a hundred on my 8-core Apple M2.

Work towards #59 (Rare failure in amcssth in hot variety on Linux)

gareth-rees · 2023-01-13T20:28:39Z

@rptb1 You added #59 to the "Successfully merging this pull request may close these issues" list in the GitHub interface. I've removed it again because as far as I know this pull request does not fix that issue—it fixes some of the races but not all of them.


rptb1 · 2023-01-14T10:38:22Z

> @rptb1 You added #59 to the "Successfully merging this pull request may close these issues" list in the GitHub interface. I've removed it again because as far as I know this pull request does not fix that issue—it fixes some of the races but not all of them.

Do you think it's better practice (in GitHub) to only link completely fixed issues using the "Development" field, and link partially fixed issues from comments? It would make sense if GitHub closes those issues automatically.

(This could affect #97 )

gareth-rees · 2023-01-14T16:26:04Z

> Do you think it's better practice (in GitHub) to only link completely fixed issues using the "Development" field, and link partially fixed issues from comments? It would make sense if GitHub closes those issues automatically.

GitHub automatically closes the issues in the "Development" field when the pull request is merged, so if you don't want the issue to be closed automatically then you shouldn't put it there.


rptb1 · 2023-02-01T00:18:26Z

Executing proc.review.entry

Applying entry.universal and entry.impl
Source documents are available, and include: "Built-in Functions for Memory Model Aware Atomic Operations" (GCC), "LLVM Atomic Instructions and Concurrency Guide" (LLVM), and "Clang Language Extensions", section on "Atomic operations". Review status unknown! Two of these are nicely referenced from the code.

Entry passed.

Entry took 15 mins.

rptb1 · 2023-02-01T13:31:20Z

Executing proc.review.plan.

proc.review.plan.time: Less than 100 lines have changed. The change is low risk, since it only introduces some barriers, but reviewing them will need a good understanding of the source documents, which are approx 6500 words. So 100 lines @ 10 lines/min is about 10 mins of checking the code lines, plus 50 mins for source checking, so 1 hour for checking.
proc.review.plan.roles: @thejayps @rptb1 and @UNAA008 reviewing. Let's have at least two people doing proc.review.role.check.source, and one other for all the others.
proc.review.plan.schedule: Review will start 2023-02-02 15:00 UTC for about 2h.

Planning took 15 mins.

rptb1 · 2023-02-02T15:17:52Z

Executing proc.review.kickoff

Start 15:05.
proc.review.ko.roles: @UNAA008 and @thejayps concentrate on source consistency, @rptb1 does the rest and is proc.review.role.leader.
Reconvene for logging at 16:00.

Kickoff took 15 minutes.

UNAA008

I concentrated on understanding consistency of the atomic operations with the source material.
Macros defined in testlib.h appear consistent.
I was troubled by the comment about MSVC being Intel only.
I don't know if the MPS in general is asserted to conform to the C++11 memory model.
It took me 30 minutes to be reasonably sure I understood what the source material was saying.
I didn't understand the purpose of the changes in fmtdytst.c.
Possible minor defect in amcss.c - multiple declarations on one line? https://github.com/Ravenbrook/mps/pull/83/files#diff-2fe9be832421a20cd9426718eb3a93a0c04a930fa37eed15207bdeace863dfefR111

rptb1

Executing proc.review.check

Read #59
Partially read https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html#g_t_005f_005fatomic-Builtins
Examining use of __atomics.
M: I think amcssth may not be consistent with its purpose, which is to stress the AMC pool. Instead, it seems to be busy stressing itself, and not acting like a common mutator program by using locks. This is major because we might spend a lot of time fixing amcssth. #59 (comment) says "Still occurs if we turn off garbage collection by parking the area." Why should we be debugging such a case? Also, rule.code.simple, rule.code.justified, rule.code.independent.
M: The fact that the above arises suggests a problem with rule.generic.purpose.
Transformations appear to preserve correctness.
mi: branch/2021-01-25/amcssth-races seems to be an undocumented source?
proc.review.check.metrics: 2 Major, 3 minor, 35 mins checking, read entire doc, but focussed on diffs really. Problem: poor concentration due to ME/CFS.

rptb1 · 2023-02-02T15:43:12Z

code/testlib.h

+/* See <https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html>
+ * and <https://clang.llvm.org/docs/LanguageExtensions.html> */
+#define atomic_load(SRC, DEST) __atomic_load(SRC, DEST, __ATOMIC_ACQUIRE)
+#define atomic_store(DEST, SRC) __atomic_store(DEST, SRC, __ATOMIC_RELEASE)


m: Unclear why __ATOMIC_ACQUIRE and __ATOMIC_RELEASE as opposed to other possibilities. rule.generic.clear
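For readers unfamiliar with the pairing, here is a minimal sketch of the publication pattern that acquire and release orderings are usually chosen for. It reuses the two macros from the diff above. Everything else (`payload`, `ready`, `producer`, `consumer`, `run_publish_demo`) is hypothetical and not taken from the pull request, and the sketch does not claim to state the author's actual reasoning. It assumes GCC or Clang and POSIX threads.

```c
#include <pthread.h>
#include <stddef.h>

/* The two macros as they appear in the testlib.h diff. */
#define atomic_load(SRC, DEST) __atomic_load(SRC, DEST, __ATOMIC_ACQUIRE)
#define atomic_store(DEST, SRC) __atomic_store(DEST, SRC, __ATOMIC_RELEASE)

static int payload = 0;   /* plain data, published through the flag */
static int ready = 0;     /* shared flag, accessed only via the macros */

static void *producer(void *arg)
{
  int one = 1;
  (void)arg;
  payload = 42;                /* ordinary write */
  atomic_store(&ready, &one);  /* release: payload write is ordered before it */
  return NULL;
}

static void *consumer(void *arg)
{
  int *result = arg;
  int flag = 0;
  do {
    atomic_load(&ready, &flag);  /* acquire: later reads are ordered after it */
  } while (!flag);
  *result = payload;             /* sees 42 once the flag has been observed */
  return NULL;
}

/* Returns the payload value the consumer observed, or -1 on thread failure. */
int run_publish_demo(void)
{
  pthread_t p, c;
  int result = -1, zero = 0, one = 1;

  payload = 0;
  atomic_store(&ready, &zero);
  if (pthread_create(&c, NULL, consumer, &result) != 0)
    return -1;
  if (pthread_create(&p, NULL, producer, NULL) != 0) {
    atomic_store(&ready, &one);  /* unblock the consumer before giving up */
    pthread_join(c, NULL);
    return -1;
  }
  pthread_join(p, NULL);
  pthread_join(c, NULL);
  return result;
}
```

With `__ATOMIC_RELAXED` in both macros the flag accesses would still be atomic, but nothing would order the `payload` write before the flag store, which is the kind of out-of-order update that can show up on ARM. `__ATOMIC_SEQ_CST` would also be correct here but is stronger than this pattern requires.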

rptb1 · 2023-02-02T15:44:26Z

code/testlib.h

+#elif defined(MPS_BUILD_MV)
+
+/* Microsoft Visual C/C++ does not need memory-model-aware load and store as
+ * loads and stores of register-sized values are atomic on Intel. */


m: Is the atomicity important, or is it memory ordering? Should at least say which and why. Possible reasoning error also. rule.generic.self

thejayps

Found 2 minor (2m)
40 mins checking
Checked source code diffs, PR comments and partially consulted 3 interface documents for atomic operations.

thejayps · 2023-02-02T15:58:01Z

code/testlib.h

@@ -292,6 +292,25 @@ extern void randomize(int argc, char *argv[]);
 extern void testlib_init(int argc, char *argv[]);




m: Two separate documents describing the interface to llvm atomic operations exist and describe different interfaces.
https://clang.llvm.org/docs/LanguageExtensions.html#c11-atomic-operations
https://www.llvm.org/docs/Atomics.html

The former directs us to use a part of the interface to check that atomic operations exist. It seems that the code below is using the interface described in the second document. If my understanding of this is correct, then it isn't clear to me why one is used and not the other. My knowledge of this area isn't well developed so the clarity issue may be in the documentation of the interfaces rather than in the documentation of this pull request. However, some explanation by way of comment on this pull request of how these interfaces were identified and chosen could be useful for adding extra clarity.

@rptb1 summary: It's not clear to @thejayps why A not B. rule.generic.clarity
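To make the "A or B" question concrete, the sketch below contrasts the two source-level builtin families that Clang accepts: the `__c11_atomic_*` builtins, which work on `_Atomic` objects and are guarded by `__has_extension(c_atomic)` in the Clang Language Extensions document, and the GCC-compatible `__atomic_*` builtins, which work on ordinary objects and are the ones the testlib.h diff uses. The names `shared_int`, `SHARED_STORE`, `SHARED_LOAD`, `SHARED_INTERFACE`, `roundtrip` and `interface_name` are hypothetical, and the sketch makes no claim about why the pull request chose one family.

```c
/* Fallback for compilers that lack the Clang-style feature check. */
#ifndef __has_extension
#define __has_extension(x) 0
#endif

#if defined(__clang__) && __has_extension(c_atomic)
/* Clang's C11-style builtins require _Atomic-qualified objects. */
typedef _Atomic(int) shared_int;
#define SHARED_STORE(p, v) __c11_atomic_store((p), (v), __ATOMIC_RELEASE)
#define SHARED_LOAD(p)     __c11_atomic_load((p), __ATOMIC_ACQUIRE)
#define SHARED_INTERFACE   "__c11_atomic"
#elif defined(__GNUC__)
/* GCC-compatible builtins operate on ordinary objects. */
typedef int shared_int;
#define SHARED_STORE(p, v) __atomic_store_n((p), (v), __ATOMIC_RELEASE)
#define SHARED_LOAD(p)     __atomic_load_n((p), __ATOMIC_ACQUIRE)
#define SHARED_INTERFACE   "__atomic"
#else
#error "this sketch assumes GCC or Clang"
#endif

static shared_int cell;   /* zero-initialized static storage */

/* Stores a value through the selected interface and loads it back. */
int roundtrip(int value)
{
  SHARED_STORE(&cell, value);
  return SHARED_LOAD(&cell);
}

/* Reports which builtin family the preprocessor selected. */
const char *interface_name(void)
{
  return SHARED_INTERFACE;
}
```

One practical difference is visible in the typedef: the `__c11_atomic_*` family forces the shared variable to be declared `_Atomic`, whereas the `__atomic_*` family leaves existing declarations unchanged and compiles under both GCC and Clang.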

code/fmtdytst.c

rptb1 · 2023-02-02T16:46:50Z

Executing proc.review.log

Start time 16:03.
proc.review.log.sums: @UNAA008 0M, 1m. @thejayps 0M, 2m. @rptb1 2M, 3m.
NI for proc.review: You're not restricted to thinking about defects introduced by a change. Find any major defects.
NI for proc.review: The editor can escalate stuff and still get their change through. Review exit doesn't say "did you fix everything mentioned" it says "did you take action..."
"The road to hell is paved with correctness preserving diffs."
NM: @thejayps If this test does not fulfill its purpose then there isn't one that does the job.
Nm: @thejayps Tests are not readily identifiable based on their location within the directory structure. You need knowledge to know what might be a test and what is actually part of the MPS library and what isn't.
End time 16:45. Brainstorm at 16:55.

Logging took 42 mins.

rptb1 · 2023-02-02T17:24:37Z

Executing proc.review.brainstorm

Start time 16:55.
Fix some race conditions in xca6ll/hot/amcssth. #83 (review) point 4: @thejayps Regularly, randomly select documents, and argue for their deletion. @rptb1 I like this because it combats accretion. @rptb1 Similar: every document should have a review date, that review should include your point, and purpose checks, etc. Not just "does it work". Would need resource allocation. Regularly put @thejayps in charge of regularly doing things. :) There is a rule that we don't copy and paste code, but amcssth is totally copy-pasted. I, @rptb1 broke that and we're paying the price. How'd that happen? @UNAA008 says perhaps it was lack of rigor.
Fix some race conditions in xca6ll/hot/amcssth. #83 (comment) point 6: What jobs? What jobs are needed? How can we tell if any are not being done? There's a gap there. (That's just broadening the issue.) @UNAA008 Whenever you write a non-trivial function you should write test code alongside it. Connecting that code would justify that code, and we'd know there was coverage. Some sort of coverage mapping. @thejayps Tests need to be driven by requirements. There should be links between the two. @rptb1 These are breaks in the chain of justification: the "why links". Maybe we could be better at enforcing that any new stuff (or changed stuff) has an unbroken "why" chain. Especially new files.
@thejayps Brainstorm doesn't seem to permit new issues to come up. We could mention this possibility in proc.review.brainstorm. Capture and move on.
@UNAA008 What about stuff that occurs to me a day later? proc.review could mention that possibility. The conversation should continue, but how? Or maybe a followup meeting or one-to-ones by the leader. Or the proc.review.role.improver could ask people.

Possible new issues:

there's no index of tests
design.mps.tests doesn't talk about individual tests
tests designs are missing

End at 17:25. 30 mins bang on!

rptb1 · 2023-02-02T17:27:36Z

We haven't assigned proc.review.role.editor or proc.review.role.improver and @thejayps is away next week. We will perhaps do some of these as a group for training purposes and pair programming.


rptb1 · 2023-02-19T17:26:18Z

> 4. @thejayps Brainstorm doesn't seem to permit new issues to come up. We could mention this possibility in proc.review.brainstorm. Capture and move on.

Fixed in 8848b70


thejayps · 2023-02-21T10:11:16Z

> 5. @UNAA008 What about stuff that occurs to me a day later? proc.review could mention that possibility. The conversation should continue, but how? Or maybe a followup meeting or one-to-ones by the leader. Or the proc.review.role.improver could ask people.

Comment written by @rptb1 on @thejayps session:

So this is a bit meta-circular: @thejayps asked me about improving proc.review to mention capture of brainstorming thoughts that come up later. (And his asking me is an example of that!) I'm writing this to demonstrate what I would do about that:

I located the relevant brainstorm (above): Fix some race conditions in xca6ll/hot/amcssth. #83 (comment)
I checked that the review log (this pull request conversation) is still active (marked pending) and so will be seen again at some point
I added my thought (this comment).
The above could be summarised in proc.review as something like "If you think of something after review then append to the review log as long as it hasn't been closed, otherwise raise an issue."
Done!

gareth-rees requested review from UNAA008, rptb1 and thejayps December 24, 2022 15:19

gareth-rees force-pushed the 59-memory-aware branch from 4fd7155 to b71e18c Compare December 24, 2022 15:20

gareth-rees changed the title ~~[#59] Fix some race conditions in xca6ll/hot/amcssth.~~ Fix some race conditions in xca6ll/hot/amcssth. Dec 24, 2022

This was referenced Dec 24, 2022

Rare failure in amcssth in hot variety on Linux #59

Open

MMQA test cases fail on xca6ll #84

Open

gareth-rees force-pushed the branch/2022-12-23/hardened-runtime branch from f66d805 to 2ab4580 Compare January 6, 2023 18:32

rptb1 linked an issue Jan 13, 2023 that may be closed by this pull request

Rare failure in amcssth in hot variety on Linux #59

Open

rptb1 added the "test" (Issue with a test case) and "arch.a6" (Relates to arm64/aarch64) labels Jan 13, 2023

gareth-rees removed a link to an issue Jan 13, 2023

Rare failure in amcssth in hot variety on Linux #59

Open

gareth-rees changed the base branch from branch/2022-12-23/hardened-runtime to master January 13, 2023 20:31

gareth-rees force-pushed the 59-memory-aware branch from b71e18c to e089937 Compare January 13, 2023 20:34

Catch-up merge branch 'master' into 59-memory-aware to apply CI and p…

14adaee

…repare for review

UNAA008 reviewed Feb 2, 2023

View reviewed changes

rptb1 requested changes Feb 2, 2023

View reviewed changes

thejayps requested changes Feb 2, 2023

View reviewed changes

rptb1 mentioned this pull request Feb 3, 2023

Adapting Ravenbrook and MM group review procedure to public MPS #123

Draft

rptb1 added a commit that referenced this pull request Feb 3, 2023

Giving GitHub entities their official names and linking to GitHub doc…

6949ee7

…s. Clarifying reminders based on review testing <#83 (comment)>.

rptb1 added the "blocked" (Unable to proceed. See comment for reason.) label Feb 14, 2023

rptb1 mentioned this pull request Feb 16, 2023

Process improvements for 2023-02-10/17 #152

Merged

rptb1 added the "pending" (Something needs doing, even if closed.) label Feb 17, 2023

rptb1 added a commit that referenced this pull request Feb 19, 2023

Adding proc.review.brainstorm.new in response to <#83 (comment)> poin…

8848b70

…t 4.

thejayps added the "needs analaysis" (The issue needs analysis before it can be resolved.) and "optional" (Will cause failures / of benefit. Worth assigning resources.) labels Mar 27, 2023
