refactor: make dfschema wrap schemaref#9595
Conversation
|
Great job @haohuaijin, glad to see this near the finish line. |
|
I plan to merge this tomorrow unless anyone else would like time to review. Thanks again everyone |
|
I found it difficult to cleanup the tuple into DFField struct because there 4 kinds of combination I need to construct. I thought we only need DFFieldRef and DFField for owned. Normal case pub struct DFFieldRef<'a> {
/// Optional qualifier (usually a table or relation name)
qualifier: Option<&'a OwnedTableReference>,
/// Arrow field definition
field: &'a Field,
}Sometimes we need FieldRef to avoid the cost of clone for field. pub struct DFFieldRefWithArc<'a> {
/// Optional qualifier (usually a table or relation name)
qualifier: Option<&'a OwnedTableReference>,
/// Arrow field definition
field: &'a FieldRef,
}It is usually easier to deal with owned dffield sometimes without additional cost, because we can't avoid clone for pub struct DFField {
/// Optional qualifier (usually a table or relation name)
pub qualifier: Option<OwnedTableReference>,
/// Arrow field definition
pub field: FieldRef,
}And of course, we can have field not fieldRef, it seems better not to wrap it with Arc if possible. pub struct DFField {
/// Optional qualifier (usually a table or relation name)
pub qualifier: Option<OwnedTableReference>,
/// Arrow field definition
pub field: Field,
} |
|
I suggest we want until the imminent 37.0.0 release -- #9697 -- so we have additional time to test this change before releasing it to the larger community |
|
Since Arrow 37 is now basically released I think this PR is ready to merge. I took the liberty of merging up from main to resolve some conflicts, and once CI is complete I intend to merge this PR Thanks again @haohuaijin |
|
Great job getting this over the line @haohuaijin |
|
🚀 |
DFField was removed upstream. Ref: apache/datafusion#9595
The previous implementation relied on `DFField` which was removed upstream. Ref: apache/datafusion#9595
* chore: upgrade datafusion Deps Ref #690 * update concat and concat_ws to use datafusion_functions Moved in apache/datafusion#10089 * feat: upgrade functions.rs Upstream is continuing it's migration to UDFs. Ref apache/datafusion#10098 Ref apache/datafusion#10372 * fix ScalarUDF import * feat: remove deprecated suppors_filter_pushdown and impl supports_filters_pushdown Deprecated function removed in apache/datafusion#9923 * use `unnest_columns_with_options` instead of deprecated `unnest_column_with_option` * remove ScalarFunction wrappers These relied on upstream BuiltinScalarFunction, which are now removed. Ref apache/datafusion#10098 * update dataframe `test_describe` `null_count` was fixed upstream. Ref apache/datafusion#10260 * remove PyDFField and related methods DFField was removed upstream. Ref: apache/datafusion#9595 * bump `datafusion-python` package version to 38.0.0 * re-implement `PyExpr::column_name` The previous implementation relied on `DFField` which was removed upstream. Ref: apache/datafusion#9595
Which issue does this PR close?
Closes #4680
follow on #8905
Rationale for this change
see related issue #4680, #7944 and #8905
See also mailing list discussion: https://lists.apache.org/thread/79rpwd22bncdxbkwlq842rwr485kwwyh
What changes are included in this PR?
change the
DFSchemaand removeDFFieldstructfrom
to
and update the related API
Are these changes tested?
test with existing tests
Are there any user-facing changes?
New API for create DFSchema
pub fn from_unqualifed_fields(fields: Fields, metadata: HashMap<String, String>)->Result<Self>pub fn try_from_qualified_schema<'a>(qualifier: impl Into<TableReference<'a>>,schema: &Schema,)->Result<Self>pub fn new_with_metadata(qualified_fields: Vec<(Option<OwnedTableReference>, Arc<Field>)>,metadata: HashMap<String, String>) -> Result<Self>pub fn from_field_specific_qualified_schema<'a>(qualifiers: Vec<Option<impl Into<TableReference<'a>>>>, schema: &SchemaRef) -> Result<Self>New API to find qualified
Field, the old API only returnFieldpub fn qualified_field(&self, i: usize) -> (Option<&OwnedTableReference>, &Field): returns an immutable reference of a specificFieldand its qualifier by an offsetpub fn qualified_field_with_name(&self, qualifier: Option<&TableReference>, name: &str) -> Result<(Option<&OwnedTableReference>, &Field)>: find the qualified field with the given namepub fn qualified_fields_with_unqualified_name(&self, name: &str) -> Vec<(Option<&OwnedTableReference>, &Field)>: Find all fields that match the given name and return them with their qualifierpub fn qualified_field_with_unqualified_name(&self,name: &str) -> Result<(Option<&OwnedTableReference>, &Field)>: Find the qualified field with the given unqualified namepub fn qualified_field_from_column(&self,column: &Column) -> Result<(Option<&OwnedTableReference>, &Field)>: Find the field with the given qualified columnpub fn iter(&self) -> impl Iterator<Item = (Option<&OwnedTableReference>, &FieldRef)>: Iterate over the qualifiers and fields in the DFSchemaOther new API
pub fn columns(&self) -> Vec<Column>: return allColumns for the schemapub fn columns_with_unqualified_name(&self, name: &str) -> Vec<Column>: find all fields that match the given name and convert to column