Faster strpos() string function for ASCII-only case#12401
Conversation
There was a problem hiding this comment.
Great 🚀 The implementation looks good to me
However, I'm not really sure why the normal string array can't be improved. 🤔
I think it's not executed and triggered some error, see benchmark StringArray cases only take several ns, and StringViewArray cases are at least several us
I guess it's because in micro bench, arguments are of different physical string type and break the function implementation (strpos(string_col, string_view_col))
With #12415 this kind of bug can be more throughly tested
Update: It's not a bug for SQL API, and this can only be triggered if we use raw expression API as in the benchmark
| { | ||
| let string_iter = ArrayIter::new(string_array); | ||
| let substring_iter = ArrayIter::new(substring_array); | ||
| let ascii_only = string_array.is_ascii() && substring_array.is_ascii(); |
There was a problem hiding this comment.
| let ascii_only = string_array.is_ascii() && substring_array.is_ascii(); | |
| let ascii_only = substring_array.is_ascii() && string_array.is_ascii() ; |
I think we can check substring first, since it's cheaper in common cases and we can then shortcircuit checking string_array in some case
There was a problem hiding this comment.
Nice suggestion! I did the benchmark again. The performance is improved!
group before after
----- ----- -------
strpos_StringArray_ascii_str_len_128 1.06 370.5±9.46ns ? ?/sec 1.00 348.6±11.38ns ? ?/sec
strpos_StringArray_ascii_str_len_32 1.07 372.5±10.23ns ? ?/sec 1.00 346.9±9.45ns ? ?/sec
strpos_StringArray_ascii_str_len_4096 1.08 378.4±12.05ns ? ?/sec 1.00 349.4±16.83ns ? ?/sec
strpos_StringArray_ascii_str_len_8 1.07 371.4±14.53ns ? ?/sec 1.00 346.1±28.44ns ? ?/sec
strpos_StringArray_utf8_str_len_128 1.06 377.2±18.04ns ? ?/sec 1.00 356.4±21.92ns ? ?/sec
strpos_StringArray_utf8_str_len_32 1.08 374.9±34.32ns ? ?/sec 1.00 345.8±11.05ns ? ?/sec
strpos_StringArray_utf8_str_len_4096 1.09 381.6±16.68ns ? ?/sec 1.00 351.4±23.14ns ? ?/sec
strpos_StringArray_utf8_str_len_8 1.09 372.9±20.83ns ? ?/sec 1.00 343.3±11.78ns ? ?/sec
strpos_StringViewArray_ascii_str_len_128 1.79 3.2±0.15ms ? ?/sec 1.00 1763.8±44.16µs ? ?/sec
strpos_StringViewArray_ascii_str_len_32 1.03 648.7±18.27µs ? ?/sec 1.00 628.1±21.69µs ? ?/sec
strpos_StringViewArray_ascii_str_len_4096 1.24 62.8±7.27ms ? ?/sec 1.00 50.5±2.05ms ? ?/sec
strpos_StringViewArray_ascii_str_len_8 1.00 280.2±10.44µs ? ?/sec 1.05 294.8±42.47µs ? ?/sec
strpos_StringViewArray_utf8_str_len_128 1.03 5.3±0.13ms ? ?/sec 1.00 5.1±0.12ms ? ?/sec
strpos_StringViewArray_utf8_str_len_32 1.01 1961.9±57.34µs ? ?/sec 1.00 1944.1±73.19µs ? ?/sec
strpos_StringViewArray_utf8_str_len_4096 1.03 147.4±9.24ms ? ?/sec 1.00 142.5±3.61ms ? ?/sec
strpos_StringViewArray_utf8_str_len_8 1.01 874.5±26.02µs ? ?/sec 1.00 863.4±40.68µs ? ?/sec
You're right. There are some errors here. I ran the benchmark in the It seems that Then, I realized it was my mistake 😢. When I generated the substring for the |
5aae0fb to
62f2307
Compare
| ] | ||
| } else { | ||
| let string_array: StringArray = output_string_vec.clone().into_iter().collect(); | ||
| let sub_string_array: StringArray = |
There was a problem hiding this comment.
I rebase to make all benchmark-related changes in the first commit. It was
let sub_string_array: StringViewArray = output_string_vec.clone().into_iter().collect();
After changing to StringArray, the benchmark works fine.
|
@2010YOUY01 Thank you for the review. I updated the benchmark result. After fixing the benchmark, we can find the StringArray case also be improved. |
|
Thanks @alamb @2010YOUY01 |

Which issue does this PR close?
Closes #12366.
Rationale for this change
For ASCII-only cases, I tried to use the slice window to find the position instead of the string pattern matching.
We can find that the ASCII-only
StringViewArraycase performs 50% faster than before.However, I'm not really sure why the normal string array can't be improved. 🤔What changes are included in this PR?
If both arguments are ASCII-only, we can perform this faster method.
The benchmark result:
Are these changes tested?
yes
Are there any user-facing changes?