ASoC: hdac_hda: fix deadlock after PCM open error #2208
plbossart
left a comment
looks ok but Fixes link sounds wrong?
- snd_hda_codec_pcm_put(pcm);
-
- return ret;
+ return hda_stream->ops.open(hda_stream, &hda_pvt->codec, substream);
The reference in the commit message seems incorrect: I reverted some changes in 5bd7044 ("ASoC: soc-dai: revert all changes to DAI startup/shutdown sequence").
@plbossart OK, that's indeed a bit of a complicated history. But it seems that in your commit "ASoC: soc-dai: revert all changes to DAI startup/shutdown sequence" you didn't revert all of the original changes. This part remains and triggers the HDMI problem:
-machine_err:
- i = rtd->num_codecs;
-
codec_dai_err:
- for_each_rtd_codec_dai_rollback(rtd, i, codec_dai)
+ for_each_rtd_codec_dai(rtd, i, codec_dai)
snd_soc_dai_shutdown(codec_dai, substream);
I.e. it removed the rollback and just called shutdown on all DAIs. Hmm, so what is the policy now: should shutdown() expect startup() to have succeeded or not? We've been going back and forth on this now. Looking at DAI implementations, most simply don't care about this, and I couldn't find a single instance that would hit bugs like the one we have in hdac-hda. But for any resource allocation done in startup(), this can be a big issue.
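The difference between the two error paths discussed above can be sketched with a toy model. This is a minimal illustration, not kernel code: `struct toy_dai`, `toy_dai_startup()`, `toy_dai_shutdown()` and `toy_open()` are hypothetical names, and the shutdown here naively assumes startup() succeeded, which is exactly the assumption in question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a codec DAI; NOT the kernel's struct snd_soc_dai. */
struct toy_dai {
	bool fail_startup;   /* force startup() to fail for this DAI */
	int resources;       /* resources acquired in startup() */
	int shutdown_calls;  /* how many times shutdown() ran */
};

static int toy_dai_startup(struct toy_dai *dai)
{
	if (dai->fail_startup)
		return -1;
	dai->resources++;
	return 0;
}

/* Naive shutdown: assumes startup() succeeded and releases unconditionally. */
static void toy_dai_shutdown(struct toy_dai *dai)
{
	dai->shutdown_calls++;
	dai->resources--;
}

/*
 * Start all DAIs in order. On error, either shut down only the DAIs
 * whose startup() succeeded (rollback == true), or shut down every DAI
 * (rollback == false), as in the hunk quoted above.
 */
static int toy_open(struct toy_dai *dais, size_t n, bool rollback)
{
	size_t i, limit;

	assert(dais != NULL);

	for (i = 0; i < n; i++) {
		if (toy_dai_startup(&dais[i]) < 0)
			goto err;
	}
	return 0;

err:
	/* i is the index of the DAI whose startup() failed. */
	limit = rollback ? i : n;
	for (i = 0; i < limit; i++)
		toy_dai_shutdown(&dais[i]);
	return -1;
}
```

With three DAIs where the second one fails, the rollback variant touches only the first DAI, while the shutdown-all variant drives `resources` negative on the two DAIs that never started. A shutdown() that has to cope with this needs its own "was I started?" check.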
I was only worried about the Fixes tag; you probably want to use the last known fix to avoid any misunderstanding/collisions. Or use two Fixes tags :-)
@plbossart Ack. Probably better to clarify. I'll try to cook up something... not exactly straightforward to explain.
@plbossart I ended up only mentioning your commit. The original patch actually didn't change the semantics yet as it tracked whether startup() had succeeded or not. When you reverted all the startup() tracking, the rollback change stayed in the code. Considering nothing else has stopped working, I think just fixing hdac_hda is a good way forward. Need to keep a close eye on soc-pcm.c changes though. It is pretty easy to break this again.
Force-pushed from 43cb27e to 56e4563.
plbossart
left a comment
Thanks @kv2019i
RanderWang
left a comment
LGTM
Commit 5bd7044 ("ASoC: soc-dai: revert all changes to DAI
startup/shutdown sequence"), introduced a slight change of semantics to
DAI startup/shutdown. If startup() returns an error, shutdown() is now
called for the DAI.
This causes a deadlock in hdac_hda which issues a call to
snd_hda_codec_pcm_put() in case open fails. Upon error, soc_pcm_open()
will call shutdown(), and pcm_put() ends up getting called twice. Result
is a deadlock on pcm->open_mutex, as snd_device_free() gets called from
within snd_pcm_open(). Typical task backtrace looks like this:
[ 334.244627] snd_pcm_dev_disconnect+0x49/0x340 [snd_pcm]
[ 334.244634] __snd_device_disconnect.part.0+0x2c/0x50 [snd]
[ 334.244640] __snd_device_free+0x7f/0xc0 [snd]
[ 334.244650] snd_hda_codec_pcm_put+0x87/0x120 [snd_hda_codec]
[ 334.244660] soc_pcm_open+0x6a0/0xbe0 [snd_soc_core]
[ 334.244676] ? dpcm_add_paths.isra.0+0x491/0x590 [snd_soc_core]
[ 334.244679] ? kfree+0x9a/0x230
[ 334.244686] dpcm_be_dai_startup+0x255/0x300 [snd_soc_core]
[ 334.244695] dpcm_fe_dai_open+0x20e/0xf30 [snd_soc_core]
[ 334.244701] ? snd_pcm_hw_rule_muldivk+0x110/0x110 [snd_pcm]
[ 334.244709] ? dpcm_be_dai_startup+0x300/0x300 [snd_soc_core]
[ 334.244714] ? snd_pcm_attach_substream+0x3c4/0x540 [snd_pcm]
[ 334.244719] snd_pcm_open_substream+0x69a/0xb60 [snd_pcm]
[ 334.244729] ? snd_pcm_release_substream+0x30/0x30 [snd_pcm]
[ 334.244732] ? __mutex_lock_slowpath+0x10/0x10
[ 334.244736] snd_pcm_open+0x1b3/0x3c0 [snd_pcm]
Fixes: 5bd7044 ("ASoC: soc-dai: revert all changes to DAI startup/shutdown sequence")
BugLink: thesofproject#2159
Signed-off-by: Kai Vehmanen <[email protected]>
Force-pushed from 56e4563 to c3063f0.
Sorry @plbossart @RanderWang, I need to ask for a quick re-review: I put the wrong commit ID in the message. It now references the correct one, as committed by Mark upstream.
Thank you @RanderWang!
Fixes: #2159
Fixes: #2084
In commit b56be80 ("ASoC: soc-pcm: call
snd_soc_dai_startup()/shutdown() once"), the handling of DAI startup
failures was changed such that shutdown is always called.
This causes a deadlock in hdac_hda which had a call to
snd_hda_codec_pcm_put() in case open failed. Upon error, soc_pcm_open()
will call shutdown(), so pcm_put() gets called twice. This leads to a
deadlock on pcm->open_mutex, as snd_device_free() gets called from
within snd_pcm_open(). Typical task backtrace looks like this:
[ 334.244627] snd_pcm_dev_disconnect+0x49/0x340 [snd_pcm]
[ 334.244634] __snd_device_disconnect.part.0+0x2c/0x50 [snd]
[ 334.244640] __snd_device_free+0x7f/0xc0 [snd]
[ 334.244650] snd_hda_codec_pcm_put+0x87/0x120 [snd_hda_codec]
[ 334.244660] soc_pcm_open+0x6a0/0xbe0 [snd_soc_core]
[ 334.244676] ? dpcm_add_paths.isra.0+0x491/0x590 [snd_soc_core]
[ 334.244679] ? kfree+0x9a/0x230
[ 334.244686] dpcm_be_dai_startup+0x255/0x300 [snd_soc_core]
[ 334.244695] dpcm_fe_dai_open+0x20e/0xf30 [snd_soc_core]
[ 334.244701] ? snd_pcm_hw_rule_muldivk+0x110/0x110 [snd_pcm]
[ 334.244709] ? dpcm_be_dai_startup+0x300/0x300 [snd_soc_core]
[ 334.244714] ? snd_pcm_attach_substream+0x3c4/0x540 [snd_pcm]
[ 334.244719] snd_pcm_open_substream+0x69a/0xb60 [snd_pcm]
[ 334.244729] ? snd_pcm_release_substream+0x30/0x30 [snd_pcm]
[ 334.244732] ? __mutex_lock_slowpath+0x10/0x10
[ 334.244736] snd_pcm_open+0x1b3/0x3c0 [snd_pcm]
Fixes: b56be80 ("ASoC: soc-pcm: call snd_soc_dai_startup()/shutdown() once")
BugLink: #2159
Signed-off-by: Kai Vehmanen [email protected]
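
The double put described in the commit message can be illustrated with a small reference-count model. This is a sketch under assumptions, not the driver code: `struct toy_pcm` and all `toy_*` functions are hypothetical stand-ins, the PCM is assumed to start with one reference held by its owner, and the deadlock itself is not reproduced. The model only counts how often the "free" path is reached from inside the open call.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the codec PCM object; models only the refcount. */
struct toy_pcm {
	int refs;        /* references currently held */
	int free_calls;  /* times the last reference was dropped */
};

static void toy_pcm_get(struct toy_pcm *pcm)
{
	pcm->refs++;
}

/* Dropping the last reference reaches the "free" path. */
static void toy_pcm_put(struct toy_pcm *pcm)
{
	if (--pcm->refs == 0)
		pcm->free_calls++;
}

/* Old startup: drops its own reference when open fails. */
static int toy_startup_old(struct toy_pcm *pcm, bool open_fails)
{
	toy_pcm_get(pcm);
	if (open_fails) {
		toy_pcm_put(pcm);
		return -1;
	}
	return 0;
}

/* Fixed startup: leaves the put to shutdown(). */
static int toy_startup_fixed(struct toy_pcm *pcm, bool open_fails)
{
	toy_pcm_get(pcm);
	return open_fails ? -1 : 0;
}

static void toy_hda_shutdown(struct toy_pcm *pcm)
{
	toy_pcm_put(pcm);
}

/* Core behaviour per the commit message: shutdown() also runs after a failed startup(). */
static int toy_soc_pcm_open(struct toy_pcm *pcm, bool fixed, bool open_fails)
{
	int ret = fixed ? toy_startup_fixed(pcm, open_fails)
			: toy_startup_old(pcm, open_fails);

	if (ret < 0)
		toy_hda_shutdown(pcm);
	return ret;
}
```

In this model, a failing open with the old startup performs two puts against one get, so the owner's reference is dropped and the free path runs while the open call is still in progress. With the fixed startup, the single put in shutdown balances the get and the owner's reference survives.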