boards/arm/rp2040: allow flash write operation on rp2040 in SMP mode #17224

Merged
acassis merged 1 commit into apache:master from sumpfralle:rp2040-cpu-pause
Oct 26, 2025

@sumpfralle
Contributor

@sumpfralle sumpfralle commented Oct 21, 2025

Summary

Previously the function up_cpu_pause was used during flash write operations for preventing all other CPUs from executing code from flash.
The above function was removed in d8cb775.
Thus, writing to flash on rp2040 was only possible in non-SMP mode.

This PR allows write operations to the flash on rp2040 even in SMP mode. It blocks all but the current CPU for the duration of the critical function (write or erase).

The implementation is based on the previous implementation (see d8cb775).
It is slightly enhanced by supporting more than two CPUs and it avoids global variables.
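
To make the mechanism easier to picture, here is a minimal host-side sketch of the idea. It is not the code from this PR: POSIX threads stand in for the other CPUs, and every `sim_*` name is hypothetical. It only illustrates the two properties mentioned above, namely that any number of other CPUs can be parked and that the bookkeeping lives in a caller-owned struct rather than in global variables.

```c
/* Host-side simulation only: threads play the role of the other CPUs.
 * None of the sim_* names exist in NuttX; they are assumptions made for
 * this illustration.
 */

#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SIM_MAX_CPUS 8

/* Per-call state, owned by the caller (typically on its stack). */

struct sim_isolation
{
  atomic_bool release;                  /* set to let the parked CPUs go */
  atomic_int parked;                    /* CPUs currently waiting */
  int nothers;                          /* CPUs that were asked to park */
  pthread_t threads[SIM_MAX_CPUS - 1];  /* one thread per other CPU */
};

/* Runs on every "other CPU": check in, then wait until released. On real
 * hardware this loop must not execute from flash.
 */

static void *sim_pause_handler(void *arg)
{
  struct sim_isolation *iso = arg;

  atomic_fetch_add(&iso->parked, 1);
  while (!atomic_load(&iso->release))
    {
      sched_yield();
    }

  atomic_fetch_sub(&iso->parked, 1);
  return NULL;
}

/* Release all parked CPUs and wait until they have left the handler. */

static void sim_leave_isolation(struct sim_isolation *iso)
{
  atomic_store(&iso->release, true);
  for (int i = 0; i < iso->nothers; i++)
    {
      pthread_join(iso->threads[i], NULL);
    }
}

/* Park every CPU except the caller. Returns 0 once all of them have
 * checked in, or -1 if a thread could not be started.
 */

static int sim_enter_isolation(struct sim_isolation *iso, int ncpus)
{
  assert(ncpus >= 1 && ncpus <= SIM_MAX_CPUS);
  atomic_init(&iso->release, false);
  atomic_init(&iso->parked, 0);
  iso->nothers = 0;

  for (int i = 0; i < ncpus - 1; i++)
    {
      if (pthread_create(&iso->threads[i], NULL,
                         sim_pause_handler, iso) != 0)
        {
          sim_leave_isolation(iso);  /* release those already parked */
          return -1;
        }

      iso->nothers++;
    }

  while (atomic_load(&iso->parked) < iso->nothers)
    {
      sched_yield();
    }

  return 0;
}
```

A caller would bracket the flash operation with `sim_enter_isolation(&iso, ncpus)` and `sim_leave_isolation(&iso)`. The write or erase runs in between, while all other "CPUs" are known to be waiting.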

Closes: #16203

BEWARE: this is just a draft. Please take a look at the "discussion" at the end.

Impact

It is possible to write to flash on rp2040 even in SMP mode.

Testing

  1. enable CONFIG_PIPES and CONFIG_RP2040_FLASH_FILE_SYSTEM
  2. apply the following patch (disables "smartfs", which would block the mtd):
    --- a/boards/arm/rp2040/common/src/rp2040_common_bringup.c
    +++ b/boards/arm/rp2040/common/src/rp2040_common_bringup.c
    @@ -726,7 +726,7 @@ int rp2040_common_bringup(void)
         }
     #endif
    
    -#ifdef CONFIG_RP2040_FLASH_FILE_SYSTEM
    +#ifdef disabled_CONFIG_RP2040_FLASH_FILE_SYSTEM
    
       mtd_dev = rp2040_flash_mtd_initialize();
  3. run rp2040_flash_mtd_initialize(); in your startup code
  4. run in the NuttX shell: netcat -l 10000 | dd of=/dev/mtd bs=1k seek=512
  5. run on your host: nc -N "$nuttx_host" 10000 <filename
  6. wait for the dd process in the NuttX shell to finish once the netcat connection has been closed

Without the patch, this process would hang (after commenting out the removed up_cpu_pause calls).

Running the smp app (CONFIG_TESTING_SMP) during the above write operation does not cause issues.

Details to be discussed?

  1. The proposed implementation of the "pause all other CPUs" feature is generic enough to be helpful for other architectures, as well. E.g. the esp32_spiram.c and esp32s3_spiram.c use a similar implementation for guarding their flash accesses (see g_cpu_pause). rp2350 will surely need this, too, when flash write support is added.
    • Should I move this set of functions (*_smp_isolation) somewhere else (e.g. below sched/sched/)? Which name would be appropriate? (*_smp_isolation is just my silly choice)
    • Should I propose a separate pull request with the respective changes for esp32 in order to allow others to test them? (I do not have such a board)
    • Should I revive sched_note_cpu_pause? Currently it is probably unused.
  2. The implementation uses "spin_lock_t" (instead of bool flags). I picked this for readability. Is this a burden? Should I use bool instead?
  3. The implementation moves the "enter_critical_section" call closer to the flash write operation (inside the "SMP isolation" calls). Previously enter_critical_section was called before up_cpu_pause. Now it is called afterwards.
    • I am not sure why this is necessary, but the original call location leads to random hangs after writing around 50 kB.
  4. The functions enter_smp_isolation and leave_smp_isolation call sched_lock/sched_unlock. This may feel a bit excessive, but I think it is necessary:
    • a) nxsched_smp_call_async tries to run the "pause handler" directly on the current CPU, if this_cpu() is part of the given cpuset (see source). This would starve our current task and prevent the execution of "leave_smp_isolation". This behavior of "nxsched_smp_call_async" is probably a bug.
    • b) We should not allow other SMP-aware functions to run, because they may rely on SMP being enabled (causing deadlocks?). This is possible, since CONFIG_SMP is used everywhere as the indicator for "multiple processes running in parallel". This is not true during the "smp isolation" context created by these functions.
  5. At the moment, the blocked CPUs may receive and handle interrupts. At least I do not see anything preventing this. Maybe we should run "enter_critical_section" for each busy-locked CPU in order to avoid this?
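
Item 4a depends on which CPUs end up in the set handed to the cross-CPU call. As a small illustration (a sketch with a hypothetical helper name, not code from this PR), the set of CPUs to pause can be built so that it never contains the caller. A plain `uint32_t` bitmask stands in for the scheduler's CPU set type here, which is an assumption made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Return a bitmask with one bit set for every CPU except this_cpu.
 * other_cpus_mask is a hypothetical helper, not a NuttX function.
 */

static uint32_t other_cpus_mask(unsigned int ncpus, unsigned int this_cpu)
{
  assert(ncpus >= 1 && ncpus <= 32 && this_cpu < ncpus);

  /* Shifting a 32-bit value by 32 is undefined, so handle it separately. */

  uint32_t all = (ncpus == 32) ? UINT32_MAX
                               : ((UINT32_C(1) << ncpus) - 1);

  /* Clear the caller's bit so the pause handler is never aimed at it. */

  return all & ~(UINT32_C(1) << this_cpu);
}
```

On a two-core rp2040 this yields `0x2` when called from CPU 0 and `0x1` when called from CPU 1. The mask is only meaningful while the caller stays on `this_cpu`.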

@github-actions github-actions Bot added Arch: arm Issues related to ARM (32-bit) architecture Size: M The size of the change in this PR is medium labels Oct 21, 2025
@linguini1
Contributor

> Summary
>
> Previously the function up_cpu_pause was used during flash write operations for preventing all other CPUs from executing code from flash. The above function was removed in d8cb775. Thus, writing to flash on rp2040 was only possible in non-SMP mode.
>
> This PR allows write operations to the flash on rp2040 even in SMP mode. It blocks all but the current CPU for the duration of the critical function (write or erase).
>
> The implementation is based on the previous implementation (see d8cb775). It is slightly enhanced by supporting more than two CPUs and it avoids global variables.

Thank you for the detailed test information! It's always appreciated a lot.

Just to wrap my head around this, did d8cb775 break the RP2040 code for SMP flash writing entirely? As in, it worked before this commit and then began to fail after it (I assume you found this with git bisect)? If so, that is a huge issue with the way that commit was tested before merge, and we might want to check whether other things broke in the same way.

> Details to be discussed?
>
> 1. The proposed implementation of the "pause all other CPUs" feature is generic enough to be helpful for other architectures, as well. E.g. the `esp32_spiram.c` and `esp32s3_spiram.c` use a similar implementation for guarding their flash accesses (see `g_cpu_pause`). `rp2350` will surely need this, too, when flash write support is added.
>
>    * Should I move this set of functions (`*_smp_isolation`) somewhere else (e.g. below `sched/sched/`)? Which name would be appropriate? (`*_smp_isolation` is just my silly choice)

I think that's not a bad idea if it's generally applicable. I think it would be fine to merge this fix first since it solves (presumably) a regression, and then afterward you could see about re-factoring the solution to use your more general implementation?

>    * Should I propose a separate pull request with the respective changes for esp32 in order to allow others to test them? (I do not have such a board)

Yes, there are some people working on Espressif boards who can probably run your changes against their internal CI to catch anything. I would like to note that it's very much appreciated that you mention this, since there are a lot of PRs which only test one board and assume the changes apply just as well elsewhere.

>    * Should I revive `sched_note_cpu_pause`? Currently it is probably unused.

What I'm wondering is, do the up_cpu_pause/resume etc. functions removed in the linked commit not achieve the same task that you'd achieve by creating your general SMP isolation functions? Because then you could just revive the old implementations. I haven't looked thoroughly, so if your new implementation would be more streamlined (you mentioned no globals) then just go with that. I'm not familiar with the semantics of sched_note_cpu_pause vs up_cpu_pause, etc.

> 2. The implementation uses "spin_lock_t" (instead of `bool` flags). I picked this for readability. Is this a burden? Should I use `bool` instead?
>
> 3. The implementation moves the "enter_critical_section" call closer to the flash write operation (inside the "SMP isolation" calls). Previously `enter_critical_section` was called _before_ `up_cpu_pause`. Now it is called afterwards.
>
>    * I am not sure _why_ this is necessary, but the original call location leads to random hangs after writing around 50 kB.
>
> 4. The functions `enter_smp_isolation` and `leave_smp_isolation` call `sched_lock`/`sched_unlock`. This may feel a bit excessive, but I think it is necessary:

Can't really help with any of this, sorry. I'm not familiar enough with much of the scheduling logic to say anything of value.

>    * a) `nxsched_smp_call_async` tries to run the "pause handler" directly on the current CPU, if `this_cpu()` is part of the given cpuset (see [source](https://github.com/apache/nuttx/blob/master/sched/sched/sched_smp.c#L317)). This would starve our current task and prevent the execution of "leave_smp_isolation". This behavior of "nxsched_smp_call_async" is probably a bug.

Maybe you could raise an issue for it and post to the mailing list to get more eyes on this?

>    * b) We should not allow other SMP-aware functions to run, because they may rely on SMP being enabled (causing deadlocks?). This is possible, since `CONFIG_SMP` is used everywhere as the indicator for "multiple processes running in parallel". This is not true during the "smp isolation" context created by these functions.
>
> 5. At the moment, the blocked CPUs may receive and handle interrupts. At least I do not see anything preventing this. Maybe we should run "enter_critical_section" for each busy-locked CPU in order to avoid this?

Ditto, not informed enough to help with this either.

@sumpfralle
Contributor Author

Thanks for your comments!

> Just to wrap my head around this, did d8cb775 break the RP2040 code for SMP flash writing entirely?

It was a compile-time breakage. The referenced function was simply missing. See #16203. No other file contained a reference to that function.

Probably it is a rare use-case (for now) to write to the internal flash storage.

> What I'm wondering is, do the up_cpu_pause/resume etc. functions removed in the linked commit not achieve the same task that you'd achieve by creating your general SMP isolation functions?

This is not really my area of expertise, but I guess d8cb775 was meant to replace a dirty implementation (which was used in too many places) with a clean one. That is just a guess, though. I cannot really follow that commit message, probably because I do not understand NuttX's scheduling infrastructure well enough.

Previously the function "up_cpu_pause" was used for preventing all other
CPUs from executing code from flash.
The above function was removed in d8cb775.

Now flash operations work on rp2040 in SMP mode by blocking all but the
current CPU for the duration of the critical function (write or erase).

Closes: apache#16203

Signed-off-by: Lars Kruse <devel@sumpfralle.de>
@linguini1
Contributor

> It was a compile-time breakage. The referenced function was simply missing. See #16203. No other file contained a reference to that function.

Yikes, that's worse. CI should have caught it. Might have to add another RP2040 config which includes SmartFS for flash writing so it can be caught.

> Probably it is a rare use-case (for now) to write to the internal flash storage.

I think it shouldn't be. My team has done this several times already for rocketry applications and it's a good feature. If it breaks, that's quite bad. Thank you for your fix!

@sumpfralle sumpfralle marked this pull request as draft October 22, 2025 14:22
@sumpfralle
Contributor Author

Any comments regarding my proposal for fixing the flash write issue on rp2040?

Contributor

@acassis acassis left a comment

LGTM, please remove Draft and let others comment.

@xiaoxiang781216 xiaoxiang781216 marked this pull request as ready for review October 25, 2025 04:46
@acassis acassis merged commit 8c91052 into apache:master Oct 26, 2025
26 checks passed
Development

Successfully merging this pull request may close these issues.

[BUG] rp2040: up_cpu_pause and up_cpu_resume are referenced (these functions were removed before)
