
Commit 445251a

maartenbreddels authored and kszucs committed
ARROW-9991: [C++] Split kernels for strings/binary
Contains:
* `split_pattern` kernel with max_split and reverse option
* `ascii_split_whitespace` similar to Python's `bytes.split`
* `utf8_split_whitespace` similar to Python's `str.split`

It should be easy to add new split methods, e.g. a regex one in the future.

Closes #8271 from maartenbreddels/ARROW-9991

Authored-by: Maarten A. Breddels <maartenbreddels@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
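
For readers unfamiliar with the Python behavior the message refers to, here is a minimal standalone sketch of what `bytes.split()` does when called without a separator: runs of ASCII whitespace act as a single delimiter, and leading or trailing whitespace produces no empty fields. The names `IsAsciiSpaceSketch` and `AsciiSplitWhitespaceSketch` are hypothetical and are not part of this commit; the actual `ascii_split_whitespace` kernel operates on Arrow arrays and may differ in details.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the six ASCII whitespace bytes that Python's
// bytes.split() treats as separators.
bool IsAsciiSpaceSketch(char c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

// Hypothetical sketch of whitespace splitting on a single string.
// This is NOT the Arrow kernel, only an illustration of the semantics.
std::vector<std::string> AsciiSplitWhitespaceSketch(const std::string& input) {
  std::vector<std::string> parts;
  std::size_t i = 0;
  while (i < input.size()) {
    // Skip a run of whitespace (it counts as one separator).
    while (i < input.size() && IsAsciiSpaceSketch(input[i])) ++i;
    if (i == input.size()) break;  // trailing whitespace yields no field
    // Collect the next run of non-whitespace bytes.
    const std::size_t start = i;
    while (i < input.size() && !IsAsciiSpaceSketch(input[i])) ++i;
    parts.push_back(input.substr(start, i - start));
  }
  return parts;
}
```

Under these assumptions, `AsciiSplitWhitespaceSketch("  a  b\tc \n")` yields the three fields `a`, `b`, and `c`, and an empty or all-whitespace input yields an empty list.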
1 parent 0e13e28 commit 445251a

11 files changed

Lines changed: 714 additions & 0 deletions


cpp/src/arrow/compute/api_scalar.h

Lines changed: 19 additions & 0 deletions
@@ -49,6 +49,25 @@ struct ARROW_EXPORT MatchSubstringOptions : public FunctionOptions {
   std::string pattern;
 };
 
+struct ARROW_EXPORT SplitOptions : public FunctionOptions {
+  explicit SplitOptions(int64_t max_splits = -1, bool reverse = false)
+      : max_splits(max_splits), reverse(reverse) {}
+
+  /// Maximum number of splits allowed, or unlimited when -1
+  int64_t max_splits;
+  /// Start splitting from the end of the string (only relevant when max_splits != -1)
+  bool reverse;
+};
+
+struct ARROW_EXPORT SplitPatternOptions : public SplitOptions {
+  explicit SplitPatternOptions(std::string pattern, int64_t max_splits = -1,
+                               bool reverse = false)
+      : SplitOptions(max_splits, reverse), pattern(std::move(pattern)) {}
+
+  /// The exact substring to look for inside input values.
+  std::string pattern;
+};
+
 /// Options for IsIn and IndexIn functions
 struct ARROW_EXPORT SetLookupOptions : public FunctionOptions {
   explicit SetLookupOptions(Datum value_set, bool skip_nulls)
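
The doc comments on `max_splits` and `reverse` can be illustrated with a small standalone sketch that applies the same three parameters to a single `std::string`. It assumes the semantics follow Python's `str.split` / `str.rsplit`, as the commit message suggests. The function name `SplitPatternSketch` is hypothetical, the code does not use Arrow, and it is not the kernel added by this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical illustration of the SplitPatternOptions fields
// (pattern, max_splits, reverse) on one string. NOT Arrow's implementation.
std::vector<std::string> SplitPatternSketch(const std::string& input,
                                            const std::string& pattern,
                                            int64_t max_splits = -1,
                                            bool reverse = false) {
  assert(!pattern.empty());  // an empty pattern is not handled in this sketch
  std::vector<std::string> parts;
  int64_t splits = 0;
  if (!reverse) {
    std::size_t start = 0;  // beginning of the not-yet-split suffix
    while (max_splits < 0 || splits < max_splits) {
      const std::size_t pos = input.find(pattern, start);
      if (pos == std::string::npos) break;
      parts.push_back(input.substr(start, pos - start));
      start = pos + pattern.size();
      ++splits;
    }
    parts.push_back(input.substr(start));  // remainder, possibly unsplit
  } else {
    std::size_t end = input.size();  // exclusive end of the not-yet-split prefix
    while (max_splits < 0 || splits < max_splits) {
      if (end < pattern.size()) break;
      // A match must end at or before `end`, so it starts no later than this.
      const std::size_t pos = input.rfind(pattern, end - pattern.size());
      if (pos == std::string::npos) break;
      const std::size_t field_start = pos + pattern.size();
      parts.insert(parts.begin(), input.substr(field_start, end - field_start));
      end = pos;
      ++splits;
    }
    parts.insert(parts.begin(), input.substr(0, end));  // remainder
  }
  return parts;
}
```

With the input `"a,b,c,d"` and the pattern `","`, this sketch returns `a`, `b,c,d` for `max_splits = 1`, and `a,b,c`, `d` for `max_splits = 1` with `reverse = true`. With `max_splits = -1` both directions give the same four fields, which matches the note that `reverse` is only relevant when `max_splits != -1`.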
