feat: Add NETopKV function. #1251
cb2af0c to 1f0778c
* The Neon(TM) implementation of TopKV reduces execution time from 447.8 ms (scalar CPP) to 11.65 ms for the same workload (F32, C=1000, N=32000, k=3, 6 threads), achieving an approximate 38× speedup. This gain comes from SIMD vectorization, removal of per-element branches, and a more efficient inner loop.
* Resolves ARMCL-1227

Change-Id: Ifdf161ce4254dc5ecd57aff9ae22410facd31705
Signed-off-by: Pablo Marquez Tello <[email protected]>
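To illustrate what "removal of per-element branches" can look like, here is a minimal portable sketch; it is not the kernel from this PR. It assumes the per-sample result is "fewer than k classes score strictly higher than the target class" and uses a plain greater-than comparison. The function name in_top_k_branchless and the one-row-per-call layout are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of Compute Library.
// scores: the class scores of one sample; target: index of the true class.
// Returns 1 if fewer than k classes score strictly higher than the target.
inline std::uint8_t in_top_k_branchless(const std::vector<float> &scores, std::size_t target, std::uint32_t k)
{
    const float   target_score = scores[target];
    std::uint32_t rank         = 0;
    for (std::size_t c = 0; c < scores.size(); ++c)
    {
        // Add the comparison result (0 or 1) instead of branching on it.
        rank += static_cast<std::uint32_t>(scores[c] > target_score);
    }
    return static_cast<std::uint8_t>(rank < k);
}
```

Because the loop body has no if statement and no early exit, every iteration does the same work, which is the shape that SIMD code (auto-vectorized or written with intrinsics) generally needs.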
     * @publicapi
     */

    #include "arm_compute/core/Types.h"
Which types are needed from this file?
     */
    void configure(const ITensor *predictions, const ITensor *targets, ITensor *output, const unsigned int k);

    /** Static function to check if given info will lead to a valid configuration of @ref CPPTopKVKernel
This should be CpuTopKVKernel.
    #include "src/core/helpers/WindowHelpers.h"
    #include "src/cpu/kernels/topkv/list.h"

    #include <array>
vector? Also, we need to add cstdint.
    void
    configure(const ITensorInfo *predictions, const ITensorInfo *targets, ITensorInfo *output, const unsigned int k);

    /** Static function to check if given info will lead to a valid configuration of @ref CPPTopKVKernel
Can we just say "similar to @ref CpuTopKV::configure()" to reduce duplication? Have a look at arm_compute/runtime/experimental/operators/CpuActivation.h.
The same comment applies to every header.
    ARM_COMPUTE_TRACE_EVENT(ARM_COMPUTE_PROF_CAT_CPU, ARM_COMPUTE_PROF_LVL_CPU, "CpuTopKV::run");
    ARM_COMPUTE_ERROR_ON_MSG(tensors.empty(), "No inputs provided");

    NEScheduler::get().schedule_op(_kernel.get(), 0, _kernel->window(), tensors);
Instead of 0, we use Window::DimX.
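A minimal sketch of the point, using stand-in types rather than the Compute Library headers: the Window struct and the schedule_op function below are hypothetical mock-ups that model only the split-dimension argument. In Compute Library, Window::DimX is a named constant with value 0, so the two forms select the same dimension; the named form states which dimension the work is split along.

```cpp
#include <cstddef>

// Hypothetical stand-in for arm_compute::Window; only the dimension names are modelled.
struct Window
{
    static constexpr std::size_t DimX = 0;
    static constexpr std::size_t DimY = 1;
};

// Hypothetical stand-in for a scheduler call: returns the dimension it would split along.
inline std::size_t schedule_op(std::size_t split_dimension)
{
    return split_dimension;
}

// Both calls pick the same dimension; the second one documents the intent.
inline bool same_split_dimension()
{
    return schedule_op(0) == schedule_op(Window::DimX);
}
```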
    *reinterpret_cast<const T *>(predictions(Coordinates{static_cast<int>(c), static_cast<int>(i)}));
    const float v = static_cast<float>(vt);

    // Mirror CPPTopKVKernel: (a - b > epsilon)
What does this mean?
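One possible reading of the code comment, sketched under assumptions: a class counts as beating the target only when its score exceeds the target's score by more than machine epsilon, i.e. (a - b > epsilon), so scores that are equal to the target's do not increase the rank. The names greater_than and topkv_reference and the row-major layout below are hypothetical, and the code is a standalone illustration, not taken from CPPTopKVKernel.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: true only if a exceeds b by more than machine epsilon.
inline bool greater_than(float a, float b)
{
    return (a - b) > std::numeric_limits<float>::epsilon();
}

// Hypothetical standalone reference.
// predictions: N rows of num_classes scores, row-major; targets: N class ids.
// output[i] is 1 when fewer than k classes beat the target class of sample i.
inline std::vector<std::uint8_t> topkv_reference(const std::vector<float>         &predictions,
                                                 const std::vector<std::uint32_t> &targets,
                                                 std::size_t                       num_classes,
                                                 std::uint32_t                     k)
{
    std::vector<std::uint8_t> output(targets.size());
    for (std::size_t i = 0; i < targets.size(); ++i)
    {
        const float  *row          = predictions.data() + i * num_classes;
        const float   target_score = row[targets[i]];
        std::uint32_t rank         = 0;
        // Stop early once k classes have beaten the target: the answer is already 0.
        for (std::size_t c = 0; c < num_classes && rank < k; ++c)
        {
            if (greater_than(row[c], target_score))
            {
                ++rank;
            }
        }
        output[i] = static_cast<std::uint8_t>(rank < k);
    }
    return output;
}
```

With this reading, a class whose score ties with the target's does not push the target out of the top k.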
    {

    template <typename T>
    SimpleTensor<uint8_t> topkv(SimpleTensor<T> &predictions, SimpleTensor<uint32_t> &targets, uint32_t k)
I think we need to include:
- SimpleTensor.h
- TensorShape.h
- cstdint (because of int types)
- CoreTypes.h for DataType
- Coordinates.h
Anything else? We don't need the big Types.h file.
    #include "tests/IAccessor.h"
    #include "tests/validation/Helpers.h"
    #include "tests/validation/reference/TopKV.h"

I think we need to include cstdint, type_traits, random and algorithm STL headers for the functions used here.
    #include "tests/AssetsLibrary.h"
    #include "tests/framework/Asserts.h"
    #include "tests/framework/Fixture.h"
    #include "tests/framework/ParametersLibrary.h"
Do we need these headers?
- ParametersLibrary.h
- IAccessor.h
- tests/validation/Helpers.h
    #define ACL_TESTS_VALIDATION_FIXTURES_TOPKVLAYERFIXTURE_H

    #include "arm_compute/core/TensorShape.h"
    #include "arm_compute/core/Types.h"
Do we need anything particular from this file?
1f0778c to cb1481c