
Easier Dataframe API for map #11546

@jayzhan211


The DataFrame API for map currently expects the keys and the values to each be wrapped in make_array, for example:

map(vec![make_array(vec![lit("a"), lit("b")]), make_array(vec![lit("1"), lit("2")])])

I think we could offer an easier version that does not need make_array:

map(vec![lit("a"), lit("b")], vec![lit("1"), lit("2")])

To achieve this, we may need to change the arguments of MapFunc from two arrays to a single Vec<Expr>, where the first half are the keys and the second half are the values.
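A minimal sketch of how the callee might recover the two halves from one flat argument list, assuming the keys-then-values layout proposed above. split_keys_values is a hypothetical helper, not part of DataFusion; it is written generically so it runs on its own, whereas inside MapFunc the element type would be the function's actual arguments.

```rust
/// Hypothetical helper (not a DataFusion API): split a flat argument list
/// laid out as [k1, ..., kn, v1, ..., vn] into (keys, values).
fn split_keys_values<T>(args: &[T]) -> Result<(&[T], &[T]), String> {
    // An odd number of arguments cannot be split into equal halves.
    if args.len() % 2 != 0 {
        return Err(format!(
            "map expects an even number of arguments, got {}",
            args.len()
        ));
    }
    // First half are the keys, second half are the values.
    Ok(args.split_at(args.len() / 2))
}

fn main() {
    let args = ["a", "b", "1", "2"];
    let (keys, values) = split_keys_values(&args).unwrap();
    assert_eq!(keys, &["a", "b"]);
    assert_eq!(values, &["1", "2"]);
    // An odd-length list is rejected instead of silently mis-pairing.
    assert!(split_keys_values(&["a", "b", "1"]).is_err());
}
```

Returning an error on odd lengths is one possible choice; the key/value pairing relies entirely on both halves having the same length.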

Originally posted by @jayzhan211 in #11452 (comment)

The DataFrame API is a set of helper functions for building Expr.

Most of these helpers are generated by macros when they share a similar pattern; the others are written as individual functions, like count_distinct:

pub fn count_distinct(expr: Expr) -> Expr {
    Expr::AggregateFunction(datafusion_expr::expr::AggregateFunction::new_udf(
        count_udaf(),
        vec![expr],
        true,
        None,
        None,
        None,
    ))
}
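To illustrate the macro pattern described above, here is a toy, self-contained sketch. Expr, col and the unary_fn! macro are simplified stand-ins invented for illustration, not DataFusion's actual definitions; the point is only that one macro invocation can stamp out several builder functions that share the same shape.

```rust
// Toy stand-in for DataFusion's Expr (assumption: heavily simplified).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    ScalarFunction { name: &'static str, args: Vec<Expr> },
}

fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

// Hypothetical macro: generates one builder function per listed name,
// each wrapping its single argument in a ScalarFunction node.
macro_rules! unary_fn {
    ($($fn_name:ident),* $(,)?) => {
        $(
            fn $fn_name(arg: Expr) -> Expr {
                Expr::ScalarFunction {
                    name: stringify!($fn_name),
                    args: vec![arg],
                }
            }
        )*
    };
}

unary_fn!(abs, sqrt);

fn main() {
    assert_eq!(
        abs(col("x")),
        Expr::ScalarFunction { name: "abs", args: vec![col("x")] }
    );
    assert_ne!(abs(col("x")), sqrt(col("x")));
}
```

A function like map, whose arguments do not fit such a one-size pattern, is a natural candidate for a hand-written helper instead.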

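The proposed map helper would look roughly like this:

pub fn map(keys: Vec<Expr>, values: Vec<Expr>) -> Expr {
    // Concatenate keys and values into one argument list:
    // the first half are the keys, the second half are the values.
    let args: Vec<Expr> = keys.into_iter().chain(values).collect();
    Expr::ScalarFunction(datafusion_expr::expr::ScalarFunction::new_udf(
        map_udf(),
        args,
    ))
}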
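The following self-contained sketch shows the resulting argument layout for the example call from the description. Expr, lit and this map are toy stand-ins invented for illustration (not DataFusion's types); they only model the keys-then-values concatenation proposed above.

```rust
// Toy stand-in for DataFusion's Expr (assumption: heavily simplified).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(String),
    ScalarFunction { name: &'static str, args: Vec<Expr> },
}

fn lit(s: &str) -> Expr {
    Expr::Literal(s.to_string())
}

// Sketch of the proposed builder: keys and values are concatenated into
// one flat argument list, keys first, so the callee can split it in half.
fn map(keys: Vec<Expr>, values: Vec<Expr>) -> Expr {
    // Panics on mismatched lengths to keep the sketch short.
    assert_eq!(keys.len(), values.len(), "map needs one value per key");
    let args: Vec<Expr> = keys.into_iter().chain(values).collect();
    Expr::ScalarFunction { name: "map", args }
}

fn main() {
    let e = map(vec![lit("a"), lit("b")], vec![lit("1"), lit("2")]);
    assert_eq!(
        e,
        Expr::ScalarFunction {
            name: "map",
            args: vec![lit("a"), lit("b"), lit("1"), lit("2")],
        }
    );
}
```

Checking the lengths in the builder catches mistakes early; a real implementation might instead return an error when the function is planned, since helpers like count_distinct return Expr rather than Result.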
