-
Notifications
You must be signed in to change notification settings - Fork 140
ASoC: SOF: Intel: BYT: refine the IPC interrupt handling #1490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I have only tested this on BSW so wait to see the on device test result on baytrail from the CI. |
plbossart
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not even going to look at this as is.
You are changing the entire IPC without a shred of explanations and using registers in completely different ways.
Please.
Let's be serious, shall we?
| SHIM_IMRX_DONE); | ||
|
|
||
| if ((ipcx & SHIM_BYT_IPCX_DONE) && | ||
| (imrx & SHIM_IMRX_DONE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you changing this?
| SHIM_IMRX_BUSY); | ||
|
|
||
| if ((ipcd & SHIM_BYT_IPCD_BUSY) && | ||
| (imrx & SHIM_IMRX_BUSY)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you changing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keyonjie I get the rationale for masking the IPC's before waking the thread but you did not answer @plbossart 's question about why you are changing this.
In my opinion, this check for imrx & SHIM_IMRX_BUSY or imrx & SHIM_IMRX_DONE in the thread is meaningless. You would never get to the thread if they werent true with your changes.
| snd_sof_dsp_update_bits64_unlocked(sdev, BYT_DSP_BAR, | ||
| SHIM_IMRX, | ||
| SHIM_IMRX_DONE, | ||
| SHIM_IMRX_DONE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not touching this PR without a document explaining the ideas.
Sorry, I foresaw this challenge, but I thought I had explained clearly in the commit message. Let me think more about how to make it more straight-forward. |
We should mask interrupt once we got it in the handler, otherwise, we may get interrupt storm in case where the irq thread has no chance to be scheduled. Signed-off-by: Keyon Jie <[email protected]>
|
@keyonjie why not disable IPC interrupt (IMRX/IMRD?) in the handler and disable in the thread just like we do for HDA? |
Yes and no here. "why not disable IPC interrupt (IMRX/IMRD?) in the handler?" Yes for this, my PR is changing this to align with SKL+. "and disable in the thread" No for this in my PR, In BYT, there isn't a common bit for enabling/disabling IPC interrupt like HDA_DSP_ADSPIS_IPC in HDA case, that means we have to use IMRX for that. |
And is that a problem? |
|
@keyonjie We've worked with BYT for a very long time, so the bar to change anything is very very high now. |
|
Hi @plbossart @ranj063 I just filed an issue here: Where you will see the issue very straight-forward. |
Understood, sorry I worked on SKL+ at that time and didn't pay more attention to BYT refining. Please see the issue I filed here: #1492 |
I've run the module load/unload tests on Baytrail before, no issues. |
|
@plbossart as I commented in the issue #1492, both CHT and minnow board have the similar risk I described on #1492, we got multiple entries to the interrupt handler for a single interrupt, though it looks fine at module unload/reload on them. That's why we need to change the IPC interrupt handling here. see the dmesg logs below: On CHT: [ 6.469040] sof-audio-acpi 808622A8:00: ipc tx: 0x30030000 [ 12.833394] sof-audio-acpi 80860F28:01: booting DSP firmware |
|
@keyonjie Can we first try and understand the initial blocker, which was to deal with module load/unload? There does not seem to be any correlation nor causation between IPC issues and module removals. |
Per my understanding, there could be, I believe the scenario we hang there is because of a IPC interrupt storm happened. The fact that applying these will fix the hang issue proved that. And to say the least, I actually can't bear this kind of obvious risk in the code, can you? |
In God we trust, others bring data. Your theory of IPC rain storm may be right, but why doesn't it impact other platforms then? It needs to be backed by either experimental evidence or demonstration that the hang is removed with your fixes AND there is no regression on other platforms. I am not going to change one line of IPC code without careful explanation and extensive testing. The code works, maybe not optimally, on all other platforms, and has done so for many months, so the 'obvious risk' is not so obvious, sorry. The last 'obvious' IPC fix was reverted due to other issues which exposed an obvious lack of extensive testing. Not going to happen twice, sorry. It's fine to experiment, it's a completely different story to submit a PR and ask that it be merged. This PR does not have a 'Draft' or 'TEST' or 'RFC' status, so I will push back if I don't feel comfortable with the its maturity. |
I totally understand your point and I appreciate to that, IPC is a crucial part so any change to it should pass stress test, this is the right attitude to guarantee our code quality. Let me change the subject and call for more test on BYT/CHT/BSW. |
|
@mengdonglin @keqiaozhang Can I ask for a stress/extensive test on BTY/CHT/BSW with this PR applied, it is not expected to fix issues on BYT/CHT, so no regression with it applied is good enough to me. |
|
I ran one hour of stress-test. I don't see any regression issue. |
|
@fredoh9 Thank you. @plbossart @ranj063 The test on my side looks good on cht and bsw also. |
|
@plbossart @ranj063 Can you consider taking this one please, the logs in #1492 shows the risk without this and test shows no regression with the PR. |
no. I don't have a full picture of why this is needed and what your fix accomplishes and what it actually fixes, and if we have similar issues with other platforms. |
Why this is needed: What my fix accomplishes: What it actually fixes: Similar issues with other platforms? |
ok, let me make my point very simple: please work on D0i3 support as your first priority. Multi-tasking across completely unrelated platforms is inefficient. I will not even look at baytrail IPC until D0i3 is complete, merged and fully tested. |
That's true, thanks for your clear point. |
|
replaced by PR #2138 |
This is to fix #1492
We should mask interrupt once we got it in the handler, otherwise, we
may get interrupt storm in case where the irq thread has no chance to be
scheduled.
Signed-off-by: Keyon Jie [email protected]