feat: implement REST API database and table CRUD operations with DLF authentication #147

Open
discivigour wants to merge 23 commits into apache:main from discivigour:restApi-crud

@discivigour discivigour commented Mar 20, 2026

Purpose

Linked issue: #119

This PR implements complete REST API CRUD operations and DLF authentication for the Paimon Rust project.

Key Features:

  • Complete REST API database operations (list_databases, create_database, get_database, alter_database, drop_database)
  • Complete REST API table operations (list_tables, create_table, get_table, rename_table, drop_table)
  • DLF (Alibaba Cloud Data Lake Formation) authentication provider implementation

Brief change log

Core Feature Implementation

  1. REST API CRUD Operations

    • Implement complete database CRUD operations: list_databases, create_database, get_database, alter_database, drop_database
    • Implement complete table CRUD operations: list_tables, create_table, get_table, rename_table, drop_table
    • Unified API request builder and response handling
  2. DLF Authentication Support

    • Implement Alibaba Cloud DLF (Data Lake Formation) authentication provider
    • Complete Alibaba Cloud signature algorithm implementation
    • Support Token authentication method
    • Configurable authentication provider factory

Tests

# Run REST API related tests
cargo test --package paimon --test rest_api_test

API and Format

Documentation

umi added 4 commits March 18, 2026 19:28
…tion

- Add full REST API support for database and table operations
- Implement DLF (Alibaba Cloud Data Lake Formation) authentication provider
- Add bearer token authentication support
- Refactor mock server for better testing
- Add comprehensive test coverage for REST API operations
- Fix clippy warnings and apply code formatting

Database operations:
- list_databases, create_database, get_database, alter_database, drop_database

Table operations:
- list_tables, create_table, get_table, rename_table, drop_table

Authentication:
- DLF authentication with signature generation
- Bearer token authentication
- Configurable token provider factory
@discivigour discivigour marked this pull request as draft March 20, 2026 07:04
@discivigour discivigour changed the title from "feat: implement complete REST API CRUD operations with DLF authentication" to "feat: implement REST API database and table CRUD operations with DLF authentication" Mar 20, 2026
umi added 14 commits March 20, 2026 15:19
- Rename BearTokenAuthProvider to BearerTokenAuthProvider for correctness
- Update all references in auth module and API exports
- Fix typo in naming to match Bearer token authentication standard
@discivigour discivigour marked this pull request as ready for review March 23, 2026 07:21
Contributor

@luoyuxia luoyuxia left a comment

@discivigour I haven't reviewed all the files yet, but I've left some initial comments.

Makefile Outdated
docker compose -f dev/docker-compose.yaml down -v

# Code quality checks
check:
Contributor

Why add this in this PR? I'd prefer to keep the PR focused.

.licenserc.yaml Outdated
spdx-id: Apache-2.0
copyright-owner: Apache Software Foundation

paths-ignore:
Contributor

Why add this in this PR? I'd prefer to keep the PR focused.

.gitignore Outdated
.vscode
**/.DS_Store
dist/*
crates/paimon/examples/dlf_*.rs
Contributor

I think we can remove this, since we don't seem to have any dlf_*.rs files.

dev/check.sh Outdated
# specific language governing permissions and limitations
# under the License.

set -e
Contributor

Why add this in this PR? I'd prefer to keep the PR focused.

Contributor Author

OK, I will add the shell script in a separate PR.

Contributor

@luoyuxia luoyuxia left a comment

@discivigour Thanks for the PR. I left some comments. PTAL.


// List databases
println!("Listing databases...");
match api.list_databases().await {
Contributor

I'm concerned about introducing this example. Do users really care about using the API directly? Users only need to care about the catalog interface. So, two suggestions here:

  • Remove this example, and add one in a follow-up PR if we find it useful.
  • Keep this example, but use the catalog interface.

Contributor Author

Got it. I will remove it and add examples that use the catalog interface in the next PR.

access_key_id: impl Into<String>,
access_key_secret: impl Into<String>,
security_token: Option<String>,
expiration: Option<String>,
Contributor

nit:
change the order to

expiration_at_millis: Option<i64>,
expiration: Option<String>,

since expiration_at_millis is the preferred field.

Contributor Author

Got it

.get(CatalogOptions::DLF_TOKEN_ECS_METADATA_URL)
.cloned()
.unwrap_or_else(|| {
    "http://100.100.100.200/latest/meta-data/Ram/security-credentials/".to_string()
Contributor

Just double-checking: what is http://100.100.100.200? Is it correct to hardcode it as the default value?

Contributor Author

I have checked it. It's the default value for Aliyun ECS token auth.

/// - Load a new token if current token is None
/// - Refresh the token if it's about to expire (within TOKEN_EXPIRATION_SAFE_TIME_MILLIS)
fn get_token(&self) -> Result<DLFToken, String> {
    if let Some(ref loader) = self.token_loader {
Contributor

nit:

if let Some(loader) = &self.token_loader {
    let need_reload = {
        let token = self.token.borrow();

        match token.as_ref() {
            None => true,
            Some(token) => match token.expiration_at_millis {
                Some(expiration_at_millis) => {
                    let now = chrono::Utc::now().timestamp_millis();
                    expiration_at_millis - now < TOKEN_EXPIRATION_SAFE_TIME_MILLIS
                }
                None => false,
            },
        }
    };

    if need_reload {
        let new_token = loader.load_token()?;
        *self.token.borrow_mut() = Some(new_token);
    }
}

Change this to make it more idiomatic Rust.

Contributor Author

OK

/// If token_loader is configured, this method will:
/// - Load a new token if current token is None
/// - Refresh the token if it's about to expire (within TOKEN_EXPIRATION_SAFE_TIME_MILLIS)
fn get_token(&self) -> Result<DLFToken, String> {
Contributor

Rename to get_or_refresh_token?
Also, I'm wondering whether it can take &mut self so that we don't need RefCell<Option<DLFToken>>. RefCell feels a little hacky to me, considering it's not friendly to async blocks.

Contributor Author

Agreed. I will change it.
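
For illustration, here is a minimal sketch of what the `&mut self` variant discussed in this thread might look like. It is not the PR's code: `DlfToken`, `TokenLoader`, `FixedLoader`, `AuthProvider`, `now_millis`, and the 60-second value of the safety window are hypothetical stand-ins, and `std::time` is used in place of chrono so the snippet has no dependencies.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical safety window; the PR's actual constant value is not shown.
const TOKEN_EXPIRATION_SAFE_TIME_MILLIS: i64 = 60_000;

// Hypothetical, trimmed-down token type.
#[derive(Clone, Debug, PartialEq)]
struct DlfToken {
    access_key_id: String,
    expiration_at_millis: Option<i64>,
}

trait TokenLoader {
    fn load_token(&self) -> Result<DlfToken, String>;
}

// Demo loader that hands out tokens valid for `ttl_millis` from now.
struct FixedLoader {
    ttl_millis: i64,
}

impl TokenLoader for FixedLoader {
    fn load_token(&self) -> Result<DlfToken, String> {
        Ok(DlfToken {
            access_key_id: "demo-key".to_string(),
            expiration_at_millis: Some(now_millis() + self.ttl_millis),
        })
    }
}

struct AuthProvider {
    token: Option<DlfToken>, // plain Option, no RefCell needed
    token_loader: Option<Box<dyn TokenLoader>>,
}

fn now_millis() -> i64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as i64)
        .unwrap_or(0)
}

impl AuthProvider {
    // Loads a token if none is cached, or reloads it when it is close to expiry.
    fn get_or_refresh_token(&mut self) -> Result<DlfToken, String> {
        if let Some(loader) = &self.token_loader {
            let need_reload = match &self.token {
                None => true,
                Some(token) => match token.expiration_at_millis {
                    Some(at) => at - now_millis() < TOKEN_EXPIRATION_SAFE_TIME_MILLIS,
                    None => false,
                },
            };
            if need_reload {
                self.token = Some(loader.load_token()?);
            }
        }
        self.token
            .clone()
            .ok_or_else(|| "no token available".to_string())
    }
}

fn main() {
    let mut provider = AuthProvider {
        token: None,
        token_loader: Some(Box::new(FixedLoader { ttl_millis: 3_600_000 })),
    };
    let first = provider.get_or_refresh_token().unwrap();
    // The second call reuses the cached token because it is far from expiry.
    let second = provider.get_or_refresh_token().unwrap();
    assert_eq!(first, second);
    println!("token for {}", first.access_key_id);
}
```

The trade-off is that every caller now needs exclusive access to the provider, so a provider shared across tasks would likely need a lock around it instead.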

let token = match self.get_token() {
    Ok(t) => t,
    Err(e) => {
        eprintln!("Failed to get token: {}", e);
Contributor

Is it expected that we can just ignore this error and return base_header directly?
If so, use log::info instead of eprintln.

}

/// Perform HTTP GET request with retry logic (sync version).
fn get(&self, url: &str) -> Result<String, String> {
Contributor

We always use the Result defined in paimon; please use the one from crates/paimon/src/error.rs.
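
As a rough sketch of the pattern being asked for, the snippet below swaps a `Result<String, String>` return type for a crate-wide alias. The `Error` enum, its `Http` variant, and the `fetch` transport are hypothetical stand-ins; the real definitions live in crates/paimon/src/error.rs and are not reproduced here.

```rust
use std::fmt;

// Hypothetical stand-in for the crate error type.
#[derive(Debug)]
enum Error {
    Http { url: String, message: String },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Http { url, message } => write!(f, "HTTP request to {url} failed: {message}"),
        }
    }
}

impl std::error::Error for Error {}

// Crate-wide alias so signatures read `Result<T>` instead of `Result<T, String>`.
type Result<T, E = Error> = std::result::Result<T, E>;

// Pretend transport used only for this sketch: rejects URLs without an HTTP scheme.
fn fetch(url: &str) -> std::result::Result<String, String> {
    if url.starts_with("http://") || url.starts_with("https://") {
        Ok(format!("body of {url}"))
    } else {
        Err("unsupported URL scheme".to_string())
    }
}

// The string error is converted into the structured error at the boundary.
fn get(url: &str) -> Result<String> {
    fetch(url).map_err(|message| Error::Http {
        url: url.to_string(),
        message,
    })
}

fn main() {
    match get("ftp://example.invalid") {
        Ok(body) => println!("{body}"),
        Err(e) => println!("error: {e}"),
    }
}
```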

use crate::{catalog::Identifier, spec::Schema};

/// Base trait for REST requests.
pub trait RESTRequest {}
Contributor

Do we really need this trait currently?

    token_loader: Option<Arc<dyn DLFTokenLoader>>,
) -> Self {
    if token.is_none() && token_loader.is_none() {
        panic!("Either token or token_loader must be provided");
Contributor

Don't panic here; return an error instead.
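
One possible shape for a non-panicking constructor is sketched below. All names (`TokenLoader`, `StaticLoader`, `BuildError`, `AuthProvider::try_new`, `current_token`) are hypothetical and simplified; in the PR the error would presumably come from the crate's own error type rather than a local enum.

```rust
use std::sync::Arc;

trait TokenLoader {
    fn load_token(&self) -> String;
}

// Demo loader used only in this sketch.
struct StaticLoader;

impl TokenLoader for StaticLoader {
    fn load_token(&self) -> String {
        "loaded-token".to_string()
    }
}

#[derive(Debug, PartialEq)]
enum BuildError {
    MissingTokenSource,
}

struct AuthProvider {
    token: Option<String>,
    token_loader: Option<Arc<dyn TokenLoader>>,
}

impl AuthProvider {
    // Returns an error instead of panicking when neither source is provided.
    fn try_new(
        token: Option<String>,
        token_loader: Option<Arc<dyn TokenLoader>>,
    ) -> Result<Self, BuildError> {
        if token.is_none() && token_loader.is_none() {
            return Err(BuildError::MissingTokenSource);
        }
        Ok(Self { token, token_loader })
    }

    // Prefers the static token and falls back to the loader.
    fn current_token(&self) -> Option<String> {
        self.token
            .clone()
            .or_else(|| self.token_loader.as_ref().map(|l| l.load_token()))
    }
}

fn main() {
    let provider = AuthProvider::try_new(None, Some(Arc::new(StaticLoader)))
        .expect("a loader was provided");
    println!("{:?}", provider.current_token());
    // A misconfiguration is now a value the caller can handle.
    assert!(AuthProvider::try_new(None, None).is_err());
}
```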

"{}/{}/{}/{}",
self.base_path,
Self::DATABASES,
database_name,
Contributor

Shouldn't it call encode_string?

Contributor Author

You're right, I missed that.
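
To show why the encoding matters, here is a small dependency-free sketch of percent-encoding a database name before it is placed in a URL path. `encode_segment` and `database_path` are hypothetical helpers written for this illustration; the PR's own `encode_string` is not shown in this thread and may follow different rules (for example, how it treats spaces).

```rust
/// Percent-encodes one URL path segment, leaving only the RFC 3986
/// unreserved characters (ALPHA, DIGIT, '-', '.', '_', '~') untouched.
fn encode_segment(segment: &str) -> String {
    let mut out = String::with_capacity(segment.len());
    for byte in segment.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            // Everything else, including '/' and non-ASCII bytes, becomes %XX.
            _ => out.push_str(&format!("%{byte:02X}")),
        }
    }
    out
}

// Hypothetical path layout, used only for this sketch.
fn database_path(base_path: &str, database_name: &str) -> String {
    format!("{}/databases/{}", base_path, encode_segment(database_name))
}

fn main() {
    // Without encoding, the '/' in the name would add an extra path segment.
    println!("{}", database_path("/v1/demo", "sales/2026"));
}
```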

let host = host.split('/').next().unwrap_or("");
let host = host.split(':').next().unwrap_or("");

if host.starts_with("dlfnext") || host.contains("openapi") {

host.contains("openapi") should not be needed, I will check with Tu Ming.

Contributor Author

Got it.

    Ok(t) => t,
    Err(e) => {
        eprintln!("Failed to get token: {}", e);
        return base_header;
@XiaoHongbo-Hope XiaoHongbo-Hope Mar 24, 2026

I suggest keeping the same behavior as the Java implementation: throw an exception on failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I will change it.
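
A minimal sketch of the agreed behavior, where a token failure is surfaced to the caller instead of being printed and swallowed. `AuthError`, `auth_headers`, and the `x-demo-token` header name are hypothetical; the real DLF provider builds a signature rather than a single token header.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct AuthError(String);

// Fails the whole header generation when the token cannot be obtained,
// rather than logging the error and returning the unsigned base headers.
fn auth_headers(
    base: HashMap<String, String>,
    token: Result<String, String>,
) -> Result<HashMap<String, String>, AuthError> {
    let token = token.map_err(|e| AuthError(format!("failed to get token: {e}")))?;
    let mut headers = base;
    // Hypothetical header name, for illustration only.
    headers.insert("x-demo-token".to_string(), token);
    Ok(headers)
}

fn main() {
    let failed = auth_headers(HashMap::new(), Err("metadata service unreachable".to_string()));
    match failed {
        Ok(headers) => println!("{} header(s)", headers.len()),
        Err(e) => println!("request aborted: {e:?}"),
    }
}
```

With this shape, an unauthenticated request is never sent by accident; the caller decides whether to retry or give up.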
