kv2019i commented Jun 17, 2020

Fixes: #2159
Fixes: #2084

In commit b56be80 ("ASoC: soc-pcm: call
snd_soc_dai_startup()/shutdown() once"), the handling of DAI startup
failures was changed such that shutdown is always called.

This causes a deadlock in hdac_hda which had a call to
snd_hda_codec_pcm_put() in case open failed. Upon error, soc_pcm_open()
will call shutdown(), so pcm_put() gets called twice. This leads to a
deadlock on pcm->open_mutex, as snd_device_free() gets called from
within snd_pcm_open(). Typical task backtrace looks like this:

[ 334.244627] snd_pcm_dev_disconnect+0x49/0x340 [snd_pcm]
[ 334.244634] __snd_device_disconnect.part.0+0x2c/0x50 [snd]
[ 334.244640] __snd_device_free+0x7f/0xc0 [snd]
[ 334.244650] snd_hda_codec_pcm_put+0x87/0x120 [snd_hda_codec]
[ 334.244660] soc_pcm_open+0x6a0/0xbe0 [snd_soc_core]
[ 334.244676] ? dpcm_add_paths.isra.0+0x491/0x590 [snd_soc_core]
[ 334.244679] ? kfree+0x9a/0x230
[ 334.244686] dpcm_be_dai_startup+0x255/0x300 [snd_soc_core]
[ 334.244695] dpcm_fe_dai_open+0x20e/0xf30 [snd_soc_core]
[ 334.244701] ? snd_pcm_hw_rule_muldivk+0x110/0x110 [snd_pcm]
[ 334.244709] ? dpcm_be_dai_startup+0x300/0x300 [snd_soc_core]
[ 334.244714] ? snd_pcm_attach_substream+0x3c4/0x540 [snd_pcm]
[ 334.244719] snd_pcm_open_substream+0x69a/0xb60 [snd_pcm]
[ 334.244729] ? snd_pcm_release_substream+0x30/0x30 [snd_pcm]
[ 334.244732] ? __mutex_lock_slowpath+0x10/0x10
[ 334.244736] snd_pcm_open+0x1b3/0x3c0 [snd_pcm]

Fixes: b56be80 ("ASoC: soc-pcm: call snd_soc_dai_startup()/shutdown() once")
BugLink: #2159
Signed-off-by: Kai Vehmanen [email protected]
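
To make the double put concrete, here is a minimal user-space sketch of the failure described above. It is not the kernel code: `mock_pcm`, `mock_pcm_get()`, `mock_pcm_put()`, `mock_codec_open()`, `mock_codec_close()` and `mock_core_open_deadlocks()` are hypothetical stand-ins, the reference count starting at 1 is an assumption, and the deadlock is only recorded as a flag rather than reproduced with a real mutex.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the codec PCM object; not a kernel structure. */
struct mock_pcm {
	int refcount;        /* models the reference count dropped by pcm_put() */
	bool open_locked;    /* models pcm->open_mutex being held by the open path */
	bool would_deadlock; /* set when the last put happens while the lock is held */
};

static void mock_pcm_get(struct mock_pcm *pcm)
{
	pcm->refcount++;
}

static void mock_pcm_put(struct mock_pcm *pcm)
{
	assert(pcm->refcount > 0);
	/* Dropping the last reference frees the device, which would need the
	 * lock that the caller already holds. */
	if (--pcm->refcount == 0 && pcm->open_locked)
		pcm->would_deadlock = true;
}

/* Models the codec DAI open callback. put_on_error selects the old
 * behaviour (drop the reference when open fails). */
static int mock_codec_open(struct mock_pcm *pcm, bool open_fails,
			   bool put_on_error)
{
	mock_pcm_get(pcm);
	if (open_fails) {
		if (put_on_error)
			mock_pcm_put(pcm);
		return -1;
	}
	return 0;
}

/* Models the codec DAI shutdown callback, which always drops a reference. */
static void mock_codec_close(struct mock_pcm *pcm)
{
	mock_pcm_put(pcm);
}

/* Models the core open path: lock held, shutdown called when startup fails. */
static bool mock_core_open_deadlocks(bool put_on_error)
{
	struct mock_pcm pcm = {
		.refcount = 1, /* assumed base reference */
		.open_locked = true,
		.would_deadlock = false,
	};

	if (mock_codec_open(&pcm, true, put_on_error) < 0)
		mock_codec_close(&pcm);

	return pcm.would_deadlock;
}
```

Under these assumptions, `mock_core_open_deadlocks(true)` reports the problem and `mock_core_open_deadlocks(false)` does not, which mirrors the idea of leaving the put to shutdown alone.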

plbossart left a comment

Looks OK, but the Fixes tag seems wrong?

snd_hda_codec_pcm_put(pcm);

return ret;
return hda_stream->ops.open(hda_stream, &hda_pvt->codec, substream);

Member

The reference in the commit message seems incorrect; I reverted some changes in 5bd7044 ("ASoC: soc-dai: revert all changes to DAI startup/shutdown sequence").

Collaborator Author

@plbossart OK, that is indeed a somewhat complicated history. But it seems that your commit "ASoC: soc-dai: revert all changes to DAI startup/shutdown sequence" did not revert all of the original changes. This part remains and triggers the HDMI problem:

-machine_err:
-       i = rtd->num_codecs;
-
 codec_dai_err:
-       for_each_rtd_codec_dai_rollback(rtd, i, codec_dai)
+       for_each_rtd_codec_dai(rtd, i, codec_dai)
                snd_soc_dai_shutdown(codec_dai, substream);

I.e. it removed the rollback and now simply calls shutdown on all DAIs. Hmm, so what is the policy now: should shutdown() expect startup() to have succeeded or not? We've been going back and forth on this. Looking at DAI implementations, most simply don't care about this, and I couldn't find a single instance that would hit bugs like the one we have in hdac-hda. But for any resource allocation done in startup(), this can be a big issue.
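
The difference between the two error paths can be illustrated with a small user-space sketch. This is only a model under stated assumptions: `mock_dai`, `mock_dai_startup()`, `mock_dai_shutdown()` and `mock_failed_open_min_resources()` are hypothetical names, the three-DAI setup with a failing second DAI is made up for illustration, and `resources` stands in for whatever a real startup() might allocate.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a DAI; not a kernel structure. */
struct mock_dai {
	bool startup_fails;
	int resources; /* +1 on successful startup, -1 on every shutdown */
};

static int mock_dai_startup(struct mock_dai *dai)
{
	if (dai->startup_fails)
		return -1;
	dai->resources++;
	return 0;
}

static void mock_dai_shutdown(struct mock_dai *dai)
{
	/* Like a shutdown() that assumes startup() succeeded. */
	dai->resources--;
}

/*
 * Runs a failing open over three DAIs where the second startup fails.
 * rollback == true:  shut down only the DAIs that were started.
 * rollback == false: shut down every DAI, started or not.
 * Returns the lowest resource count seen; a negative value means some
 * DAI had shutdown called without a matching successful startup.
 */
static int mock_failed_open_min_resources(bool rollback)
{
	struct mock_dai dais[3] = {
		{ .startup_fails = false, .resources = 0 },
		{ .startup_fails = true,  .resources = 0 },
		{ .startup_fails = false, .resources = 0 },
	};
	size_t n = sizeof(dais) / sizeof(dais[0]);
	size_t started = 0;
	size_t limit;
	size_t i;
	int min = 0;

	for (i = 0; i < n; i++) {
		if (mock_dai_startup(&dais[i]) < 0)
			break;
		started++;
	}

	/* Error path: the second DAI always fails in this sketch. */
	limit = rollback ? started : n;
	for (i = 0; i < limit; i++)
		mock_dai_shutdown(&dais[i]);

	for (i = 0; i < n; i++)
		if (dais[i].resources < min)
			min = dais[i].resources;

	return min;
}
```

In this model the rollback variant leaves every DAI balanced, while the shutdown-all variant drives the failed and never-started DAIs negative. That is the case a shutdown() implementation has to tolerate if the core no longer tracks which startups succeeded.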

Member

I was only worried about the Fixes tag; you probably want to use the last known fix to avoid any misunderstanding/collisions. Or use two Fixes tags :-)

Collaborator Author

@plbossart Ack. Probably better to clarify. I'll try to cook up something... not exactly straightforward to explain.

Collaborator Author

@plbossart I ended up only mentioning your commit. The original patch actually didn't change the semantics yet as it tracked whether startup() had succeeded or not. When you reverted all the startup() tracking, the rollback change stayed in the code. Considering nothing else has stopped working, I think just fixing hdac_hda is a good way forward. Need to keep a close eye on soc-pcm.c changes though. It is pretty easy to break this again.

plbossart previously approved these changes Jun 18, 2020

Thanks @kv2019i

RanderWang previously approved these changes Jun 23, 2020

LGTM

Commit 5bd7044 ("ASoC: soc-dai: revert all changes to DAI
startup/shutdown sequence"), introduced a slight change of semantics
to DAI startup/shutdown. If startup() returns an error, shutdown()
is now called for the DAI.

This causes a deadlock in hdac_hda which issues a call to
snd_hda_codec_pcm_put() in case open fails. Upon error, soc_pcm_open()
will call shutdown(), and pcm_put() ends up getting called twice. Result
is a deadlock on pcm->open_mutex, as snd_device_free() gets called from
within snd_pcm_open(). Typical task backtrace looks like this:

[  334.244627]  snd_pcm_dev_disconnect+0x49/0x340 [snd_pcm]
[  334.244634]  __snd_device_disconnect.part.0+0x2c/0x50 [snd]
[  334.244640]  __snd_device_free+0x7f/0xc0 [snd]
[  334.244650]  snd_hda_codec_pcm_put+0x87/0x120 [snd_hda_codec]
[  334.244660]  soc_pcm_open+0x6a0/0xbe0 [snd_soc_core]
[  334.244676]  ? dpcm_add_paths.isra.0+0x491/0x590 [snd_soc_core]
[  334.244679]  ? kfree+0x9a/0x230
[  334.244686]  dpcm_be_dai_startup+0x255/0x300 [snd_soc_core]
[  334.244695]  dpcm_fe_dai_open+0x20e/0xf30 [snd_soc_core]
[  334.244701]  ? snd_pcm_hw_rule_muldivk+0x110/0x110 [snd_pcm]
[  334.244709]  ? dpcm_be_dai_startup+0x300/0x300 [snd_soc_core]
[  334.244714]  ? snd_pcm_attach_substream+0x3c4/0x540 [snd_pcm]
[  334.244719]  snd_pcm_open_substream+0x69a/0xb60 [snd_pcm]
[  334.244729]  ? snd_pcm_release_substream+0x30/0x30 [snd_pcm]
[  334.244732]  ? __mutex_lock_slowpath+0x10/0x10
[  334.244736]  snd_pcm_open+0x1b3/0x3c0 [snd_pcm]

Fixes: 5bd7044 ("ASoC: soc-dai: revert all changes to DAI startup/shutdown sequence")
BugLink: thesofproject#2159
Signed-off-by: Kai Vehmanen <[email protected]>
kv2019i dismissed stale reviews from RanderWang and plbossart via c3063f0 June 23, 2020 10:39
kv2019i force-pushed the topic/jsl-hdmi-fix branch from 56e4563 to c3063f0 June 23, 2020 10:39
kv2019i requested review from RanderWang and plbossart June 23, 2020 10:40

kv2019i commented Jun 23, 2020

Sorry @plbossart @RanderWang, I need to ask for a quick re-review. I put the wrong commit ID in the message; it now references the correct one, as committed by Mark upstream.

kv2019i commented Jun 24, 2020

Thank you @RanderWang !
