
πŸš€ FlightStream - A Node.js-based Flight Server and Client Framework


⚠️ Alpha Release: This is currently in alpha. APIs may change between releases. This is not production-ready software. For production use, consider waiting for the stable release or pinning to a specific alpha version.

A comprehensive, high-performance Apache Arrow Flight streaming framework for Node.js that enables efficient, real-time data streaming across distributed systems. Built with a modular plugin architecture, FlightStream provides both server-side streaming capabilities and client-side data access patterns, making it ideal for modern data pipelines, analytics applications, and microservices architectures.

See It in Action: Streaming a 45MB CSV file with ~850K rows in < 4s

FlightStream Demo

Use Cases

  • Data Engineering: Stream CSV files to analytics engines (Apache Spark, DuckDB, Pandas)
  • API Modernization: Replace REST APIs with efficient columnar data transfer
  • Real-time Analytics: Power dashboards and BI tools with live data streams
  • Microservices: Enable high-performance data sharing between services
  • Multi-language Integration: Connect applications written in different programming languages

πŸš€ Features

πŸ—οΈ Plugin Architecture

Extensible adapter system for any data source - CSV, databases, cloud storage

⚑ High Performance

Efficient gRPC streaming with Apache Arrow's columnar data format

πŸ”§ Production Ready

Comprehensive error handling, monitoring hooks, and Docker support

πŸ‘₯ Multi-Language

Connect from Python, Java, C++, JavaScript using standard Arrow Flight clients

πŸ“Š Auto Schema Inference

Automatic Arrow schema detection from CSV files with type optimization

🌊 Streaming Support

Efficient streaming of large datasets with configurable batch sizes

πŸ’» Developer Friendly

Rich examples, comprehensive documentation, and easy setup

⚑ Quick Start

```bash
# Clone and install
git clone https://github.com/ggauravr/flightstream.git
cd flightstream
npm install

# Start the example server
npm start

# Test with the first dataset found in the data/ directory
npm test

# Test with a specific dataset
npm test <dataset>
```

The server automatically discovers CSV files in the data/ directory and serves them via Arrow Flight protocol.

Expected Output

Server Terminal (npm run dev):

FlightStream Server Running

Client Terminal (npm test):

FlightStream Client Streaming Data

That's it! The test client connects and displays the streamed data in real time. In this run, a CSV with ~41K rows is streamed to the client in 0.25s!

Client Terminal with a Specific Dataset (npm test MARC2020-County-01):

FlightStream Client Streaming Data

The test client connects and displays the dataset specified by the dataset ID in real time. In the example above, a CSV with ~800K rows is streamed to the client in under 4 seconds!

What just happened?

  • Flight Server: Started on localhost:8080 with CSV adapter
  • Sample Data: Automatically discovered from ./data/ directory
  • Test Client: Connected via gRPC and streamed Arrow data
  • Live Reload: Server restarts automatically when you modify code

πŸ“¦ Packages

The monorepo contains focused, reusable packages:

| Package | Version | Description |
|---------|---------|-------------|
| @flightstream/core-server | 1.0.0-alpha.7 | Core Flight server framework with gRPC support |
| @flightstream/core-client | 1.0.0-alpha.3 | Core Flight client framework with connection management |
| @flightstream/core-shared | 1.0.0-alpha.3 | Shared utilities and protocol helpers |
| @flightstream/adapters-csv | 1.0.0-alpha.5 | CSV file adapter with streaming and schema inference |
| @flightstream/utils-arrow | 1.0.0-alpha.5 | Advanced Arrow utilities and type system |

🎯 Use Cases

  • Data Lakes: Serve files efficiently from S3, GCS, Snowflake, or local storage
  • Analytics Pipelines: Stream data to Apache Spark, DuckDB, or custom analytics
  • Real-time ETL: High-performance data transformation and streaming
  • API Modernization: Replace REST APIs with efficient columnar data transfer for real-time analytics products
  • Multi-language Integration: Connect Python, Java, C++, and JavaScript applications

πŸ“š Documentation

  • Complete API documentation and examples
  • Core architecture diagrams and design patterns

πŸ”§ Examples

The project includes working examples:

  • Basic Server (examples/basic-server/): Complete CSV server implementation
  • Basic Client (examples/basic-client/): Client with connection management and streaming

🀝 Community

Contributions are welcome: bug fixes, enhancements, optimizations, docs, anything!

πŸ“„ License

This project is licensed under the MIT License.

πŸ™ Acknowledgments

  • Apache Arrow for the columnar data format
  • DuckDB for the embedded analytical database and the mind-blowing single-node performance
  • gRPC for the high-performance RPC framework
  • Apache Arrow Flight for the amazing message transfer protocol
