There currently seem to be quite a few issues with StaticArrays on many different back-ends. In most cases this is inefficiency due to inappropriate allocations (sometimes quite severe), in other cases there are outright errors. This issue is to document the various problems. Note that very few, if any of these are actually issues with DifferentiationInterface.jl itself, but rather with the back-ends.
In what follows we will use
f1(x) = sum(x .* x)
f2(x) = x .* x
testarray() = @SVector ones(4)
x = testarray()
Enzyme
Improper Allocation in Gradient
◖◗ @btime gradient($f1, $(AutoEnzyme()), $x)
107.511 ns (9 allocations: 688 bytes)
4-element SVector{4, Float64} with indices SOneTo(4):
2.0
2.0
2.0
2.0
This is likely due to insufficient specialization in Enzyme.gradient for StaticArrays. I have confirmed that a raw Enzyme.autodiff is efficient and does not allocate. I'm attempting to address this, among other things in this PR.
Invalid Construction in Jacobian
◖◗ @btime jacobian($f2, $(AutoEnzyme()), $x)
ERROR: DimensionMismatch: No precise constructor for SVector{4, Float64} found. Length of input was 1.
Stacktrace:
[1] _no_precise_size(SA::Type, x::Tuple{Float64})
@ StaticArrays ~/.julia/packages/StaticArrays/MSJcA/src/convert.jl:169
[2] length_match_size(::Type{SVector{4, Float64}}, x::Tuple{Float64})
@ StaticArrays ~/.julia/packages/StaticArrays/MSJcA/src/convert.jl:138
[3] adapt_size(::Type{SVector{4, Float64}}, x::Tuple{Float64})
@ StaticArrays ~/.julia/packages/StaticArrays/MSJcA/src/convert.jl:107
[4] construct_type(::Type{SVector{4, Float64}}, x::Tuple{Float64})
@ StaticArrays ~/.julia/packages/StaticArrays/MSJcA/src/convert.jl:95
[5] Type
@ ~/.julia/packages/StaticArrays/MSJcA/src/convert.jl:175 [inlined]
[6] #3
@ ~/.julia/packages/Enzyme/OOd6p/ext/EnzymeStaticArraysExt.jl:16 [inlined]
[7] macro expansion
@ ./ntuple.jl:72 [inlined]
[8] ntuple
@ ./ntuple.jl:69 [inlined]
[9] onehot
@ ~/.julia/packages/Enzyme/OOd6p/ext/EnzymeStaticArraysExt.jl:14 [inlined]
[10] #98
@ ~/.julia/packages/Enzyme/OOd6p/src/Enzyme.jl:1149 [inlined]
[11] ntuple
@ ./ntuple.jl:48 [inlined]
[12] chunkedonehot
@ ~/.julia/packages/Enzyme/OOd6p/src/Enzyme.jl:1147 [inlined]
[13] prepare_jacobian(f::Function, backend::AutoEnzyme{Nothing, Nothing}, x::SVector{4, Float64})
@ DifferentiationInterfaceEnzymeExt ~/.julia/packages/DifferentiationInterface/cuZBh/ext/DifferentiationInterfaceEnzymeExt/forward_onearg.jl:131
[14] jacobian(f::typeof(f2), backend::AutoEnzyme{Nothing, Nothing}, x::SVector{4, Float64})
@ DifferentiationInterface ~/.julia/packages/DifferentiationInterface/cuZBh/src/fallbacks/no_extras.jl:9
[15] var"##core#509"(f2#506::typeof(f2), 507::AutoEnzyme{Nothing, Nothing}, x#508::SVector{4, Float64})
@ Main ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:561
[16] var"##sample#510"(::Tuple{typeof(f2), AutoEnzyme{Nothing, Nothing}, SVector{4, Float64}}, __params::BenchmarkTools.Parameters)
@ Main ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:570
[17] _lineartrial(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; maxevals::Int64, kwargs::@Kwargs{})
@ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:187
[18] _lineartrial(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters)
@ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:182
[19] #invokelatest#2
@ ./essentials.jl:892 [inlined]
[20] invokelatest
@ ./essentials.jl:889 [inlined]
[21] #lineartrial#46
@ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:51 [inlined]
[22] lineartrial
@ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:50 [inlined]
[23] tune!(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, ndone::Float64, verbose::Bool, pad::String, kwargs::@Kwargs{})
@ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:300
[24] tune!
@ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:289 [inlined]
[25] tune!(b::BenchmarkTools.Benchmark)
@ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:289
[26] top-level scope
@ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:666
This error occurs when an insufficiently narrow StaticArrays type is used as a constructor. Again, this calls for more specialization within Enzyme. This may be fixed by this PR.
ForwardDiff (solved by #414 !)
Improper Allocation In Gradient
◖◗ @btime gradient($f1, $(AutoForwardDiff()), $x)
247.977 ns (6 allocations: 608 bytes)
4-element SVector{4, Float64} with indices SOneTo(4):
2.0
2.0
2.0
2.0
This seems to be due to type instability in ForwardDiff.GradientConfig which that package mostly relies on the compiler eliding, but which does not get elided during its use in DifferentiationInterface.jl. In my opinion that flaw runs pretty deep in ForwardDiff.jl as it plays fast and loose with types which can only be inferred at runtime, but there is a patch that I believe would fix the this secific issue here.
Improper Allocation in Jacobian
◖◗ @btime jacobian($f2, $(AutoForwardDiff()), $x)
282.829 ns (6 allocations: 704 bytes)
4×4 SMatrix{4, 4, Float64, 16} with indices SOneTo(4)×SOneTo(4):
2.0 0.0 0.0 0.0
0.0 2.0 0.0 0.0
0.0 0.0 2.0 0.0
0.0 0.0 0.0 2.0
I think this is the same issue as the above.
FiniteDiff
I'm less familiar with the internals of this package, but it claims to be non-allocating and compatible with StaticArrays.
Gradient uses setindex!
◖◗ @btime gradient($f1, $(AutoFiniteDiff()), $x);
ERROR: setindex!(::SVector{4, Float64}, value, ::Int) is not defined.
Hint: Use `MArray` or `SizedArray` to create a mutable static array
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] setindex!(a::SVector{4, Float64}, value::Float64, i::Int64)
@ StaticArrays ~/.julia/packages/StaticArrays/MSJcA/src/indexing.jl:3
[3] macro expansion
@ ~/.julia/packages/StaticArrays/MSJcA/src/broadcast.jl:159 [inlined]
[4] _broadcast!
@ ~/.julia/packages/StaticArrays/MSJcA/src/broadcast.jl:143 [inlined]
[5] _copyto!
@ ~/.julia/packages/StaticArrays/MSJcA/src/broadcast.jl:70 [inlined]
[6] copyto!
@ ~/.julia/packages/StaticArrays/MSJcA/src/broadcast.jl:63 [inlined]
[7] materialize!
@ ./broadcast.jl:914 [inlined]
[8] materialize!
@ ./broadcast.jl:911 [inlined]
[9] finite_difference_gradient!(df::SVector{…}, f::typeof(f1), x::SVector{…}, cache::FiniteDiff.GradientCache{…}; relstep::Float64, absstep::Float64, dir::Bool)
@ FiniteDiff ~/.julia/packages/FiniteDiff/wm6iC/src/gradients.jl:241
[10] finite_difference_gradient(f::Function, x::SVector{…}, cache::FiniteDiff.GradientCache{…}; relstep::Float64, absstep::Float64, dir::Bool)
@ FiniteDiff ~/.julia/packages/FiniteDiff/wm6iC/src/gradients.jl:222
[11] finite_difference_gradient(f::Function, x::SVector{…}, cache::FiniteDiff.GradientCache{…})
@ FiniteDiff ~/.julia/packages/FiniteDiff/wm6iC/src/gradients.jl:209
[12] gradient(f::Function, ::AutoFiniteDiff{…}, x::SVector{…}, extras::DifferentiationInterfaceFiniteDiffExt.FiniteDiffGradientExtras{…})
@ DifferentiationInterfaceFiniteDiffExt ~/.julia/packages/DifferentiationInterface/cuZBh/ext/DifferentiationInterfaceFiniteDiffExt/onearg.jl:95
[13] gradient(f::typeof(f1), backend::AutoFiniteDiff{Val{:forward}, Val{:forward}, Val{:hcentral}}, x::SVector{4, Float64})
@ DifferentiationInterface ~/.julia/packages/DifferentiationInterface/cuZBh/src/fallbacks/no_extras.jl:9
[14] var"##core#600"(f1#597::typeof(f1), 598::AutoFiniteDiff{Val{…}, Val{…}, Val{…}}, x#599::SVector{4, Float64})
@ Main ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:561
[15] var"##sample#601"(::Tuple{typeof(f1), AutoFiniteDiff{…}, SVector{…}}, __params::BenchmarkTools.Parameters)
@ Main ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:570
[16] _lineartrial(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; maxevals::Int64, kwargs::@Kwargs{})
@ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:187
[17] _lineartrial(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters)
@ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:182
[18] #invokelatest#2
@ ./essentials.jl:892 [inlined]
[19] invokelatest
@ ./essentials.jl:889 [inlined]
[20] #lineartrial#46
@ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:51 [inlined]
[21] lineartrial
@ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:50 [inlined]
[22] tune!(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, ndone::Float64, verbose::Bool, pad::String, kwargs::@Kwargs{})
@ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:300
[23] tune!
@ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:289 [inlined]
[24] tune!(b::BenchmarkTools.Benchmark)
@ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:289
[25] top-level scope
@ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:666
Some type information was truncated. Use `show(err)` to see complete types.
This one might actually be a problem with DifferentiationInterface.jl itself because there are surely methods somewhere in FiniteDiff that don't rely on this.
Inappropriate allocations in jacobian
◖◗ @btime jacobian($f2, $(AutoFiniteDiff()), $x)
345.040 ns (44 allocations: 1.84 KiB)
4×4 SMatrix{4, 4, Float64, 16} with indices SOneTo(4)×SOneTo(4):
2.0 0.0 0.0 0.0
0.0 2.0 0.0 0.0
0.0 0.0 2.0 0.0
0.0 0.0 0.0 2.0
I'm less than completely confident this is indeed a bug, but it likely is as I don't really see why this would have to allocate.
TODO: Others?
There are definitely lots of similar issues with other backends, but I haven't documented them yet. However, many of those other back-ends give fewer guarantees about performance with StaticArrays, so there are likely only 4 or 5 backends (including those listed above) where performance quips are valid.
There currently seem to be quite a few issues with StaticArrays on many different back-ends. In most cases this is inefficiency due to inappropriate allocations (sometimes quite severe), in other cases there are outright errors. This issue is to document the various problems. Note that very few, if any of these are actually issues with DifferentiationInterface.jl itself, but rather with the back-ends.
In what follows we will use
Enzyme
Improper Allocation in Gradient
This is likely due to insufficient specialization in
Enzyme.gradientfor StaticArrays. I have confirmed that a rawEnzyme.autodiffis efficient and does not allocate. I'm attempting to address this, among other things in this PR.Invalid Construction in Jacobian
This error occurs when an insufficiently narrow StaticArrays type is used as a constructor. Again, this calls for more specialization within Enzyme. This may be fixed by this PR.
ForwardDiff (solved by #414 !)
Improper Allocation In Gradient
This seems to be due to type instability in
ForwardDiff.GradientConfigwhich that package mostly relies on the compiler eliding, but which does not get elided during its use in DifferentiationInterface.jl. In my opinion that flaw runs pretty deep in ForwardDiff.jl as it plays fast and loose with types which can only be inferred at runtime, but there is a patch that I believe would fix the this secific issue here.Improper Allocation in Jacobian
I think this is the same issue as the above.
FiniteDiff
I'm less familiar with the internals of this package, but it claims to be non-allocating and compatible with StaticArrays.
Gradient uses
setindex!This one might actually be a problem with DifferentiationInterface.jl itself because there are surely methods somewhere in FiniteDiff that don't rely on this.
Inappropriate allocations in jacobian
I'm less than completely confident this is indeed a bug, but it likely is as I don't really see why this would have to allocate.
TODO: Others?
There are definitely lots of similar issues with other backends, but I haven't documented them yet. However, many of those other back-ends give fewer guarantees about performance with StaticArrays, so there are likely only 4 or 5 backends (including those listed above) where performance quips are valid.