@morgolock
Contributor

  • The Neon(TM) implementation of TopKV cuts execution time from 447.8 ms (scalar CPP) to 11.65 ms on the same workload (F32, C=1000, N=32000, k=3, 6 threads), a speedup of roughly 38×. The gain comes from SIMD vectorization, removal of per-element branches, and a more efficient inner loop.

  • Resolves ARMCL-1227

Change-Id: Ifdf161ce4254dc5ecd57aff9ae22410facd31705

Signed-off-by: Pablo Marquez Tello <[email protected]>
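
To make the "removal of per-element branches" point concrete, here is a minimal scalar sketch of the TopKV check. It is not the kernel from this PR: the helper name `in_top_k`, the sample-major layout (`predictions[n * num_classes + c]`), and the default `epsilon` are assumptions chosen for illustration. The comparison result is added to the rank directly, which is the same shape a SIMD compare-and-accumulate loop would take.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper, not the library's kernel.
// Returns 1 if the target class of sample `n` ranks within the top k, else 0.
// Assumed layout: predictions[n * num_classes + c].
inline uint8_t in_top_k(const std::vector<float> &predictions,
                        std::size_t               num_classes,
                        std::size_t               n,
                        uint32_t                  target,
                        uint32_t                  k,
                        float epsilon = std::numeric_limits<float>::epsilon())
{
    const float *row          = predictions.data() + n * num_classes;
    const float  target_value = row[target];

    uint32_t rank = 0;
    for (std::size_t c = 0; c < num_classes; ++c)
    {
        // The comparison yields 0 or 1 and is accumulated without an if().
        rank += static_cast<uint32_t>((row[c] - target_value) > epsilon);
    }
    // The target is in the top k when fewer than k classes beat it.
    return static_cast<uint8_t>(rank < k);
}
```

For a sample with scores `{0.1, 0.7, 0.2}`, target class 1 has no class scoring higher, so the helper reports it as inside the top 1.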
* @publicapi
*/

#include "arm_compute/core/Types.h"
Contributor

Which types are needed from this file?

*/
void configure(const ITensor *predictions, const ITensor *targets, ITensor *output, const unsigned int k);

/** Static function to check if given info will lead to a valid configuration of @ref CPPTopKVKernel
Contributor

This should be CpuTopKVKernel.

#include "src/core/helpers/WindowHelpers.h"
#include "src/cpu/kernels/topkv/list.h"

#include <array>
Contributor

Should this be vector? Also, we need to add cstdint.

void
configure(const ITensorInfo *predictions, const ITensorInfo *targets, ITensorInfo *output, const unsigned int k);

/** Static function to check if given info will lead to a valid configuration of @ref CPPTopKVKernel
Contributor

Can we just say "similar to @ref CpuTopKV::configure()" to reduce duplication? Have a look at arm_compute/runtime/experimental/operators/CpuActivation.h.

The same comment applies to every header.

ARM_COMPUTE_TRACE_EVENT(ARM_COMPUTE_PROF_CAT_CPU, ARM_COMPUTE_PROF_LVL_CPU, "CpuTopKV::run");
ARM_COMPUTE_ERROR_ON_MSG(tensors.empty(), "No inputs provided");

NEScheduler::get().schedule_op(_kernel.get(), 0, _kernel->window(), tensors);
Contributor

Instead of 0, we use Window::DimX.

*reinterpret_cast<const T *>(predictions(Coordinates{static_cast<int>(c), static_cast<int>(i)}));
const float v = static_cast<float>(vt);

// Mirror CPPTopKVKernel: (a - b > epsilon)
Contributor

What does this mean?
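
For context on the `(a - b > epsilon)` comparison in question, the sketch below contrasts it with a plain `a > b`. The helper name `greater_with_tolerance` and the choice of `std::numeric_limits<float>::epsilon()` as the default tolerance are assumptions for illustration, not taken from the PR. The practical effect is that a value counts as "greater" only when it exceeds the other by more than the tolerance, so near-ties do not push the target down the ranking.

```cpp
#include <limits>

// Hypothetical helper: `a` counts as greater than `b` only when the gap
// between them exceeds `epsilon`.
inline bool greater_with_tolerance(float a,
                                   float b,
                                   float epsilon = std::numeric_limits<float>::epsilon())
{
    return (a - b) > epsilon;
}
```

With `b = 1.0f` and `a = 1.0f + std::numeric_limits<float>::epsilon()`, `a > b` holds, but the gap equals the tolerance rather than exceeding it, so the tolerant comparison treats the two as tied.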

{

template <typename T>
SimpleTensor<uint8_t> topkv(SimpleTensor<T> &predictions, SimpleTensor<uint32_t> &targets, uint32_t k)
Contributor

I think we need to include:

  • SimpleTensor.h
  • TensorShape.h
  • cstdint (because of int types)
  • CoreTypes.h for DataType
  • Coordinates.h

Anything else? We don't need the big Types.h file.

#include "tests/IAccessor.h"
#include "tests/validation/Helpers.h"
#include "tests/validation/reference/TopKV.h"

Contributor

I think we need to include cstdint, type_traits, random and algorithm STL headers for the functions used here.
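
As an illustration of why each of those headers would be listed explicitly, here is a hypothetical test-data helper that touches all four. It is not code from this PR: the name `make_targets` and its parameters are invented for the sketch, and each include is annotated with the symbol that requires it.

```cpp
#include <algorithm>   // std::generate
#include <cstddef>     // std::size_t
#include <cstdint>     // uint32_t
#include <random>      // std::mt19937, std::uniform_int_distribution
#include <type_traits> // std::is_integral
#include <vector>

// Hypothetical helper: returns `count` class indices drawn from [0, num_classes).
template <typename T>
std::vector<T> make_targets(std::size_t count, T num_classes, uint32_t seed)
{
    static_assert(std::is_integral<T>::value, "targets must be an integer type");

    std::mt19937                     gen(seed);
    std::uniform_int_distribution<T> dist(0, num_classes - 1);

    std::vector<T> targets(count);
    std::generate(targets.begin(), targets.end(), [&] { return dist(gen); });
    return targets;
}
```

Relying on these symbols arriving transitively through another header tends to break when that header is later trimmed, which is the risk the comment points at.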

#include "tests/AssetsLibrary.h"
#include "tests/framework/Asserts.h"
#include "tests/framework/Fixture.h"
#include "tests/framework/ParametersLibrary.h"
Contributor

Do we need the following?

  • ParametersLibrary.h
  • IAccessor.h
  • tests/validation/Helpers.h

#define ACL_TESTS_VALIDATION_FIXTURES_TOPKVLAYERFIXTURE_H

#include "arm_compute/core/TensorShape.h"
#include "arm_compute/core/Types.h"
Contributor

Do we need anything particular from this file?
