-
Notifications
You must be signed in to change notification settings - Fork 140
ASoC: SOF: rework FW ABI version handling on kernel side #3296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ASoC: SOF: rework FW ABI version handling on kernel side #3296
Conversation
|
Having a single version declared in kernel has not made sense in a long time. E.g. SOF trace.c already has code for ABI3.20, but public header has 3.18. Most of the kernel patches get backported to stable kernels, so this creates quite a mess in the end. Related discussions:
ABI change policy: FYI @RanderWang @bardliao @ujfalusi -- this impacts how we merge IPC4 support |
sound/soc/sof/ipc.c
Outdated
| return -EINVAL; | ||
| } | ||
| if (IS_ENABLED(CONFIG_SND_SOC_SOF_STRICT_ABI_CHECKS) && | ||
| SOF_ABI_VERSION_MINOR(v->abi_version) > SOF_ABI_MINOR) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, but it seems that we don't check ABI_MAJOR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bardliao It's checked earlier, a couple lines above this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kv2019i, it is a great thing to move the ABI number from uapi header, but do we really want it scattered around in three files?
Since we toss everything under sof-priv.h, can we do the same with this or have a variable in struct sof_dev and set it up during probe?
Also: there is a checkpatch warning for a typo.
We are supposed to check what the firmware is supporting, so I don't think it is going to affect things that much. Note: One way would be to 'lock' the major to 3 for IPC3 and 4 for IPC4 and use the minor/patch for feature selection, but we do not have feature parity between IPC3 and 4. |
That's a good question. It's just a very practical concern. Let's say ABI3.20+n has two changes, a new tplg token that needs parsing support, and new IPC field to some message. It is very unlikely that you will make a single kernel commit to update both. So in which commit you'd update the version in sof-priv? Now the norm is that no commit updates the version. When these get backported by e.g. Sasha Levin's scripts, there's no guarantee sof-priv.h value is accurate anymore. A number that is local to a single source-file, would seem much more easy to manage. If you update the tplg parser, you update the version in that commit. If ipc.c is updated, the version is bumped separately. If/when patches are backported, the correct versions comes along. My first version dropped the versions altogether. I'd actually maybe even prefer this. I kept them back as e.g. there's some code in control.c that actually sends the version kernel uses to FW. This seems a bit backwards as the version is supposed to be the FW version. Maybe we just hardcode a single version in control.c to work around this. |
To continue the line of thought, the use of a given IPC (if it can be used) only depends on the FW.
Right, but then the whole thing gots a bit silly. Imho.
Do you know how the firmware uses this information? Is it then handles messages in a different way or something? This two way version check is certainly interesting and it might be just enough if the kernel checks the fw version and throws up it's hands when it can not deal with a given version? |
|
@ujfalusi wrote:
That hasn't stopped us before. :) But yeah, I you are probably right. Let me work out a version that completely removes them. That will mean to drop CONFIG_SND_SOC_SOF_STRICT_ABI_CHECKS as well and there has been resistence in the past to that. But let's see how it looks like.
I think this is only used to extend the interface in a backwards-compatible way. Only examples I found are in smart_amp.c and codec_adapter.c. And these look a bit odd. If the version is not correct, an error is returned. But how does the ABI version relate to the binary blob content? I have no idea. Perhaps we need to dig a bit into the history of these parts. git blame points to certain @ufjalusi dude in commit 756bbe4 :) Other instance is from @RDharageswari , any insight on why version was used? Smart-amp is Maxim code (@ryans-lee commit 26875c7) and codec adapter is from @mrajwa (commit 969d008 in SOF) . Any inputs? |
If I could re-phrase the silly part...
I think the main problem is distro kernels vs linux-firmware, they pull in newer firmware easier than update the kernel. Or the firmware is in ROM.
That part is copied from the existing component get code... |
The definitions in sof/abi.h have been a verbatim copy of
the firmware header definition.
The header also defines a version and this has caused issues:
- kernel declares one version but as the interface supports
backwards-compatible minor version updates, many firmware
versions can be supported by any one kernel
- a single firmware MINOR version bump can cover multiple
changes to the interface and this can cause complex
dependencies to kernel patch backporting as kernel will
have multiple commits to support extensions identified
by a single MAJOR.MINOR version of firmware
Simplify the driver side definitions by removing the version
information from kernel side abi.h header. Also remove strict
version checks as these only make sense if kernel is updated
in an atomic way for each firmware release (which is not the case
especially in backported trees). Only leave check on the major
version number in place.
In control IPC implementation, kernel needs to send the firmware version
it implements. This is not widely used, but to keep the few existing
firmware components happy, declare a local SOF_CONTROL_ABI_VERSION that
matches the control implementation and send that in the IPC message.
If the firmware and its topology are newer than kernel, error
may occur in topology parsing. E.g. if a new DAI type is added, kernel
will return an error and is unable to pass this information to
the firmware. A warning was emitted in this case prior to this patch,
but that is no longer possible. This is considered an acceptable
compromise to simplify the version management.
BugLink: thesofproject/sof#4986
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
2f3a187 to
2c6c3fe
Compare
|
@ujfalusi wrote:
But that we have already done before. Basicly as long as major is the same, we can handle it. The most complicated part is how we handle the topology. But there too, if we detect tplg constructs we don't know how to parse, we raise an error and refuse to load (although this is not 100% perfect -- if a completely new type of data is added, old kernel just doesn't know how to pass that to FW). But this mostly affects new platforms (like the new platforms we recently added to SOF), but in these cases the distribution anyways has to update kernel to support these new platforms.
I uploaded a new version that just adds a local definition to use when sending control IPCs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I uploaded a new version that just adds a local definition to use when sending control IPCs.
Please update the github description (and commit message?). I was looking at 2c6c3fe and I saw nothing added to ipc.c or topology.c, only removed code. Confused me.
If you update the tplg parser, you update the version in that commit. If ipc.c is updated, the version is bumped separately. If/when patches are backported, the correct versions comes along.
Makes sense to me, each feature is "labelled" with the first version it was implemented it. All Python APIs are labelled like this.
kept them back as e.g. there's some code in control.c that actually sends the version kernel uses to FW
This seems a bit backwards as the version is supposed to be the FW version.
Most of the evidence in thesofproject/sof#4986 (comment) points at the ABI being the kernel version, not the firmware version!?
In reality I think the ABI is neither the kernel nor the firmware version. The "ABI" is versioning an abstract list of features shared between the two. semver.org is limited to asymmetrical, one-way backward compatibility whereas the ABI seems trying to achieve something much more symmetrical.
Most people miss this because they only look at the C structs in shared memory but this so-called "ABI" is actually a network protocol. Calling it a protocol makes its (symmetrical) nature and the current problem statements much more obvious.
I already tried to name it a "protocol" in July 2020 but this more accurate name did not catch on ;-)
thesofproject/sof#2920 (review)
| v->micro, v->tag); | ||
| dev_info(sdev->dev, | ||
| "Firmware: ABI %d:%d:%d Kernel ABI %d:%d:%d\n", | ||
| "Firmware: ABI %d:%d:%d\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "Firmware: ABI %d:%d:%d\n", | |
| "Firmware ABI: %d.%d.%d\n", |
| man->priv.data[0], man->priv.data[1], | ||
| man->priv.data[2], SOF_ABI_MAJOR, SOF_ABI_MINOR, | ||
| SOF_ABI_PATCH); | ||
| dev_info(scomp->dev, "Topology: ABI %d:%d:%d\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| dev_info(scomp->dev, "Topology: ABI %d:%d:%d\n", | |
| dev_info(scomp->dev, "Topology ABI: %d.%d.%d\n", |
| #include "sof-audio.h" | ||
|
|
||
| /* Interface version implemented for controls IPC */ | ||
| #define SOF_CONTROL_ABI_VERSION SOF_ABI_VER(3, 18, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the authoritative list of new features added to each MINOR increment? Is it https://github.com/orgs/thesofproject/projects/2? Either way, add a link to it somewhere in the source code?
|
@marc-hb wrote:
Done. @marc-hb wrote:
I removed this in latest version (and updated the commit). While I think would be a very pragmatic approach, it's true this would be quite adhoc as this is not common practise in kernel (unlike the case in Python where the practise is the established method).
I do agree it is indeed a protocol, but with regards to naming, the cat is already out of the bag I think. And this is not so extra-ordinary -- no ABI lives in isolation. E.g. there's the libc ABI or the syscall interface, but these are not just structs either, but it's an interface with established rules in addition to the binary struct definitions. On the roadmap we do the the tuples based IPC. This has been on hold for now ( thesofproject/sof#3207 ) , but still I think this is where we are going in the future. And note this is described in more protocol terms. Looking at the ABI extensions of past year (https://github.com/orgs/thesofproject/projects/2 ), I'm more certain we should go and remove a single "implements version X" from kernel. Rather we have kernel commits that implement standalone extensions that can be backported to old kernels (on their own) and have sane errors if either kernel or FW does not have some extension while the other side does. Really the only concern I see (in merging this PR), is some cases of topoligy ABI extensions where the kernel is unable to provide a meaningful error. Some cases are ok. Example, adding "Smart Amp" support in 3.16. The kernel side was implemented in commit "ASoC: SOF: topology: add support to smart amplifier". If this kernel commit is missing in the backport and user upgrades the FW and its topology to a newer version and uses a topology that uses Smart Amp, no warning would be printed by kernel (assuming this PR is meged). User would still get a kernel error "error: process loading failed" and driver probe would fail. OTOH, if the smart amp patch would be backported to an older kernel, the user would likely still get a warning in kernel dmesg as the commit to bump the version was not cherry-picked. There were 5 separate ABI extensions in 3.16. It would not be possible to put all the kernel changes in one commit. Same would happen for the recent change to add AMD and MTK platform support (ABI3.19 and ABI3.20). |
| return -EINVAL; | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really mind if we remove the checks, but now how do we update the shared files when there is a change to the IPC structures? This was precisely when we bumped the ABI on the kernel side, so now there's no information to tell you what the kernel WILL support.
Not convinced or missing the point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plbossart Problem is the kernel update to a new ABI is not atomic. FW is released once per quarter with multiple distinct changes covered by a single ABI change. Sometimes kernel will only implement a subset and cirtically, sometimes a kernel commit depending on ABI extension will get selectively backported (like will happen for the stop-dma fix which requires 3.20).
In this context, a single ABI version for kernel side implementation is rather impossible to maintain. We get problems like this: #3298 (comment) (i.e. can we declare ABI3.20 now)
We could change the approach and make the kernel version just refer to the shared files. That could be done more systematically, but would that be useful. The fact that headers are updated does not imply the implementation is updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be happy with a kernel level version that refers only to the header files, to help with an elimination process when dealing with bug reports.
If these files do not have the relevant version, then for sure the new features exposed by the firmware will not be used. That's already useful information.
If the files do have the relevant version, we would have to check if all the relevant patches provide the implementation for the ABI change, as you rightly noted it's not an atomic update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plbossart W.r.t. the option of tying the version to header alone. This will have some implications to our kernel upstreaming process. Let's say we have ABI extensions A, B and C in SOF2.1, bumping ABI to 3.21. Let's say we implement feature A in January and have kernel patches adding the definitions to kernel, and implementation in sof-dev. At this point, the version in kernel header is not correct as it only has the change of A, so it's a mix of ABU3.20 and ABI3.21. They don't match the header of same version in firmware (as SOF2.1 is not yet complete).
Plus we need to have a maintainer check the ABI and make a patch for every SOF release (e.g. my #3298 ).
If we are ok with this, then I can update #3298 and close this.
UPDATE: more issues. I noticed we are missing header bits in kernel. E.g. ABI3.19 added IMR support in firmware, but we have no support for it in kernel yet. (e.g. SOF_IPC_INFO_D3_PERSISTENT bit definition). So in practise sof-dev kernel has bits of ABI3.20 (stop-dma) but it does not yet have patches for ABI3.19. So I'd now have to go and look at the delta manually to make a kernel patch to bump the version of the interface files? Yuk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes we'll never get it completely right, but we can treat this value as informative for debug.
ok to remove the strict checks and the warning, we can make it an info level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plbossart We do still need to update the ABI process documentation to ensure the files get updated. Otherwise the data will be incorrect and not useful even for debug. I don't have time to do this now, so I'll close this PR. There also seems to be IPC4 changes coming that will anyways change the headers, so let's focus on those.
| return; | ||
|
|
||
| /* set the ABI header values */ | ||
| cdata->data->magic = SOF_ABI_MAGIC; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for ease in the future for changing ABI, we should probably use an "init_ipc" function so we only have to change in one place
| #define SOF_MAX_DSP_NUM_CORES 8 | ||
|
|
||
| /* IPC interface version implemented in driver */ | ||
| #define SOF_ABI_VERSION SOF_ABI_VER(3, 0, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we certain the kernel handles all versions back this far?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think this is correct @kv2019i. We do need to the minor versions I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I take that back. I got confused with the firmware ABI version which is stored in sdev. This is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cujomalainey The ABI process has evolved quite a bit since ABI3.0.0, but yes, we have no known issues with backward compatibility. In early times, the MINOR was bumped many times in the a single release window and first SOF release with major set to 3 was SOF1.3 and this was ABI3.7.0. So we have no public releases that are older than this. SOF1.4 was ABI3.11 and that is probably still in use in some systems (some distros still ship old firmware binaries but users update the kernel, so this will actually happen).
The definitions in sof/abi.h have been a verbatim copy of
the firmware header definition.
The header also defines a version and this has caused issues:
backwards-compatible minor version updates, many firmware
versions can be supported by any one kernel
changes to the interface and this can cause complex
dependencies to kernel patch backporting as kernel will
have multiple commits to support extensions identified
by a single MAJOR.MINOR version of firwmare
(see description in git commit, updated multiple times)
BugLink: thesofproject/sof#4986
Signed-off-by: Kai Vehmanen kai.vehmanen@linux.intel.com