25 changes: 24 additions & 1 deletion deps/Makefile
@@ -4,6 +4,21 @@ PROXYSQL_PATH := $(shell while [ ! -f ./src/proxysql_global.cpp ]; do cd ..; don

include $(PROXYSQL_PATH)/include/makefiles_vars.mk

# Rust toolchain detection
RUSTC := $(shell which rustc 2>/dev/null)
CARGO := $(shell which cargo 2>/dev/null)
ifndef RUSTC
$(error "rustc not found. Please install Rust toolchain")
endif
ifndef CARGO
$(error "cargo not found. Please install Rust toolchain")
endif

# SQLite environment variables for sqlite-rembed build
export SQLITE3_INCLUDE_DIR=$(shell pwd)/sqlite3/sqlite3
export SQLITE3_LIB_DIR=$(shell pwd)/sqlite3/sqlite3
export SQLITE3_STATIC=1


# to compile libmariadb_client with support for valgrind enabled, run:
# export USEVALGRIND=1
@@ -250,7 +265,14 @@ sqlite3/sqlite3/vec.o: sqlite3/sqlite3/sqlite3.o
	cd sqlite3/sqlite3 && cp ../sqlite-vec-source/sqlite-vec.c . && cp ../sqlite-vec-source/sqlite-vec.h .
	cd sqlite3/sqlite3 && ${CC} ${MYCFLAGS} -fPIC -c -o vec.o sqlite-vec.c -DSQLITE_CORE -DSQLITE_VEC_STATIC -DSQLITE_ENABLE_MEMORY_MANAGEMENT -DSQLITE_ENABLE_JSON1 -DSQLITE_DLL=1

sqlite3: sqlite3/sqlite3/sqlite3.o sqlite3/sqlite3/vec.o
sqlite3/libsqlite_rembed.a: sqlite3/sqlite-rembed-0.0.1-alpha.9.tar.gz
	cd sqlite3 && rm -rf sqlite-rembed-*/ sqlite-rembed-source/ || true
	cd sqlite3 && tar -zxf sqlite-rembed-0.0.1-alpha.9.tar.gz
	mv sqlite3/sqlite-rembed-0.0.1-alpha.9 sqlite3/sqlite-rembed-source
	cd sqlite3/sqlite-rembed-source && SQLITE3_INCLUDE_DIR=$(SQLITE3_INCLUDE_DIR) SQLITE3_LIB_DIR=$(SQLITE3_LIB_DIR) SQLITE3_STATIC=1 $(CARGO) build --release --features=sqlite-loadable/static --lib
	cp sqlite3/sqlite-rembed-source/target/release/libsqlite_rembed.a sqlite3/libsqlite_rembed.a

sqlite3: sqlite3/sqlite3/sqlite3.o sqlite3/sqlite3/vec.o sqlite3/libsqlite_rembed.a


libconfig/libconfig/out/libconfig++.a:
@@ -342,6 +364,7 @@ cleanpart:
	cd mariadb-client-library && rm -rf mariadb-connector-c-*/ || true
	cd jemalloc && rm -rf jemalloc-*/ || true
	cd sqlite3 && rm -rf sqlite-amalgamation-*/ || true
	cd sqlite3 && rm -rf libsqlite_rembed.a sqlite-rembed-source/ sqlite-rembed-*/ || true
	cd postgresql && rm -rf postgresql-*/ || true
	cd postgresql && rm -rf postgres-*/ || true
.PHONY: cleanpart
Binary file added deps/sqlite3/sqlite-rembed-0.0.1-alpha.9.tar.gz
Binary file not shown.
245 changes: 245 additions & 0 deletions doc/SQLITE-REMBED-TEST-README.md
@@ -0,0 +1,245 @@
# sqlite-rembed Integration Test Suite

## Overview

This test suite validates the integration of `sqlite-rembed` (a Rust SQLite extension for generating text embeddings) into ProxySQL. It exercises the full pipeline: client registration, embedding generation, vector storage, and similarity search.

## Prerequisites

### System Requirements
- **ProxySQL** compiled with `sqlite-rembed` and `sqlite-vec` extensions
- **MySQL client** (`mysql` command line tool)
- **Bash** shell environment
- **Network access** to embedding API endpoint (or local Ollama/OpenAI API)

### ProxySQL Configuration
Ensure ProxySQL is running with SQLite3 server enabled:
```bash
# Run from the src directory of your ProxySQL checkout (the path below is the author's)
cd /home/rene/proxysql-vec/src
./proxysql --sqlite3-server
```

### Test Configuration
The test script uses default connection parameters:
- Host: `127.0.0.1`
- Port: `6030` (default SQLite3 server port)
- User: `root`
- Password: `root`

Modify these in the script if your configuration differs.

## Test Suite Structure

The test suite is organized into 9 phases, each testing specific components:

### Phase 1: Basic Connectivity and Function Verification
- ✅ ProxySQL connection
- ✅ Database listing
- ✅ `sqlite-vec` function availability
- ✅ `sqlite-rembed` function registration
- ✅ `temp.rembed_clients` virtual table existence

### Phase 2: Client Configuration
- ✅ Create embedding API client with `rembed_client_options()`
- ✅ Verify client registration in `temp.rembed_clients`
- ✅ Test `rembed_client_options` function

### Phase 3: Embedding Generation Tests
- ✅ Generate embeddings for short and long text
- ✅ Verify embedding data type (BLOB) and size (768 dimensions × 4 bytes = 3,072 bytes)
- ✅ Error handling for non-existent clients
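
The size check above follows from the storage format: each dimension is a 4-byte float, so a 768-dimension embedding should occupy 3,072 bytes. A minimal sketch of that check is shown below. The helper names are illustrative and are not taken from `sqlite-rembed-test.sh`, and the query assumes a client has already been registered under the name you pass in.

```bash
# Hypothetical helpers; names are illustrative, not from sqlite-rembed-test.sh.
VECTOR_DIMENSIONS=768
BYTES_PER_FLOAT=4

# Print the expected BLOB size in bytes for one embedding.
expected_embedding_bytes() {
    echo $((VECTOR_DIMENSIONS * BYTES_PER_FLOAT))
}

# Print a query that reports the SQLite type and byte length of an embedding.
# $1 = registered client name (assumed to exist), $2 = text without single quotes
embedding_size_query() {
    printf "SELECT typeof(e), length(e) FROM (SELECT rembed('%s', '%s') AS e);\n" "$1" "$2"
}

echo "expected bytes: $(expected_embedding_bytes)"
embedding_size_query "my-client" "hello world"
```

Piping the generated query to the `mysql` client should return `blob` and the expected byte count when the model really produces 768 dimensions.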

### Phase 4: Table Creation and Data Storage
- ✅ Create regular table for document storage
- ✅ Create virtual vector table using `vec0`
- ✅ Insert test documents with diverse content

### Phase 5: Embedding Generation and Storage
- ✅ Generate embeddings for all documents
- ✅ Store embeddings in vector table
- ✅ Verify embedding count matches document count
- ✅ Check embedding storage format

### Phase 6: Similarity Search Tests
- ✅ Exact self-match (document with itself, distance = 0.0)
- ✅ Similarity search with query text
- ✅ Verify result ordering by ascending distance
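
The ordering check can be done outside the database by inspecting the `distance` column of the result. The sketch below is one way to do that, assuming the distances arrive one per line on standard input; `is_sorted_ascending` is a hypothetical helper, not a function from the test script.

```bash
# Hypothetical helper: succeeds when the numbers on stdin (one per line)
# are in non-decreasing order, as KNN distances should be.
is_sorted_ascending() {
    awk 'NR > 1 && $1 + 0 < prev { bad = 1 } { prev = $1 + 0 } END { exit bad }'
}

if printf '0.0\n0.42\n0.87\n' | is_sorted_ascending; then
    echo "distances are ordered"
fi
```

In practice the input would come from something like `mysql -N -e "SELECT distance FROM ... LIMIT 3"`, where `-N` suppresses the column header.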

### Phase 7: Edge Cases and Error Handling
- ✅ Empty text input
- ✅ Very long text input
- ✅ SQL injection attempt safety

### Phase 8: Performance and Concurrency
- ✅ Sequential embedding generation timing
- ✅ Basic performance validation (< 10 seconds for 3 embeddings)
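
A timing check of this kind can be sketched with whole-second timestamps. The helper below is an assumption about how such a check might look, not the script's actual code; `timed_within_limit` and `MAX_SECONDS` are hypothetical names.

```bash
# Hypothetical helper; the 10-second budget mirrors the threshold listed above.
MAX_SECONDS=10

# Run a command and succeed only if it finished in under MAX_SECONDS.
# Resolution is whole seconds; the command's own exit status is ignored.
timed_within_limit() {
    start=$(date +%s)
    "$@"
    end=$(date +%s)
    elapsed=$((end - start))
    echo "elapsed: ${elapsed}s"
    [ "$elapsed" -lt "$MAX_SECONDS" ]
}

timed_within_limit true
```

Because the embedding calls go over the network, a threshold like this measures API latency as much as ProxySQL itself, so it is best treated as a smoke test.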

### Phase 9: Cleanup and Final Verification
- ✅ Clean up test tables
- ✅ Verify no test artifacts remain

## Usage

### Running the Full Test Suite
```bash
# Run from the doc directory of your ProxySQL checkout (the path below is the author's)
cd /home/rene/proxysql-vec/doc
./sqlite-rembed-test.sh
```

### Expected Output
The script provides color-coded output:
- 🟢 **Green**: Test passed
- 🔴 **Red**: Test failed
- 🔵 **Blue**: Information and headers
- 🟡 **Yellow**: Test being executed

### Exit Codes
- `0`: All tests passed
- `1`: One or more tests failed
- `2`: Connection issues or missing dependencies
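
A caller such as a CI wrapper can branch on these codes. The sketch below only restates the table above in shell form; `describe_exit_code` is a hypothetical helper name.

```bash
# Hypothetical helper: translate the documented exit codes into messages.
describe_exit_code() {
    case "$1" in
        0) echo "all tests passed" ;;
        1) echo "one or more tests failed" ;;
        2) echo "connection issue or missing dependency" ;;
        *) echo "unexpected exit code $1" ;;
    esac
}

describe_exit_code 2
```

A wrapper could run `./sqlite-rembed-test.sh`, capture `$?`, and pass it to this function to produce a readable summary.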

## Configuration

### Modifying Connection Parameters
Edit the following variables in `sqlite-rembed-test.sh`:
```bash
PROXYSQL_HOST="127.0.0.1"
PROXYSQL_PORT="6030"
MYSQL_USER="root"
MYSQL_PASS="root"
```
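
If editing the script for every environment is inconvenient, the same variables could take their values from the environment and fall back to the documented defaults. This is a sketch of a possible change, not the script's current behavior; `load_connection_defaults` is a hypothetical function.

```bash
# Hypothetical variant: keep the documented defaults, but let the
# environment override them when a variable is already set.
load_connection_defaults() {
    PROXYSQL_HOST="${PROXYSQL_HOST:-127.0.0.1}"
    PROXYSQL_PORT="${PROXYSQL_PORT:-6030}"
    MYSQL_USER="${MYSQL_USER:-root}"
    MYSQL_PASS="${MYSQL_PASS:-root}"
}

load_connection_defaults
echo "Testing against ${PROXYSQL_HOST}:${PROXYSQL_PORT} as ${MYSQL_USER}"
```

With this pattern in place, an invocation such as `PROXYSQL_PORT=7030 ./sqlite-rembed-test.sh` would target a different port without touching the file.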

### API Configuration
By default the test targets the OpenAI-compatible endpoint at `api.synthetic.new`. To use your own API, set the `API_KEY` environment variable or edit the variables below:
```bash
API_CLIENT_NAME="test-client-$(date +%s)"
API_FORMAT="openai"
API_URL="https://api.synthetic.new/openai/v1/embeddings"
API_KEY="${API_KEY:-YOUR_API_KEY}" # Uses environment variable or placeholder
API_MODEL="hf:nomic-ai/nomic-embed-text-v1.5"
VECTOR_DIMENSIONS=768
```

For other providers (Ollama, Cohere, Nomic), adjust the format and URL accordingly.
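
As an illustration of switching providers, the sketch below generates the registration statement for a local Ollama instance. It assumes the `rembed_client_options()` keys `format`, `url`, and `model`, and an Ollama embeddings URL of `http://localhost:11434/api/embeddings`; both should be checked against the `sqlite-rembed` version you build. `register_client_sql` is a hypothetical helper.

```bash
# Hypothetical helper: print SQL that registers an embedding client.
# $1 = client name, $2 = format, $3 = endpoint URL, $4 = model
# Arguments must not contain single quotes (no escaping is done here).
register_client_sql() {
    printf "INSERT INTO temp.rembed_clients(name, options) VALUES ('%s', rembed_client_options('format', '%s', 'url', '%s', 'model', '%s'));\n" \
        "$1" "$2" "$3" "$4"
}

# Assumed local Ollama endpoint and model; adjust for your setup.
register_client_sql "local-ollama" "ollama" \
    "http://localhost:11434/api/embeddings" "nomic-embed-text"
```

The printed statement can be piped to the `mysql` client connected to the SQLite3 server port.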

## Test Data

### Sample Documents
The test creates 4 sample documents:
1. **Machine Learning** - "Machine learning algorithms improve with more training data..."
2. **Database Systems** - "Database management systems efficiently store, retrieve..."
3. **Artificial Intelligence** - "AI enables computers to perform tasks typically..."
4. **Vector Databases** - "Vector databases enable similarity search for embeddings..."

### Query Texts
Test searches use:
- Self-match: Document 1 with itself
- Query: "data science and algorithms"

## Troubleshooting

### Common Issues

#### 1. Connection Failed
```
Error: Cannot connect to ProxySQL at 127.0.0.1:6030
```
**Solution**: Ensure ProxySQL is running with `--sqlite3-server` flag.

#### 2. Missing Functions
```
ERROR 1045 (28000): no such function: rembed
```
**Solution**: Verify `sqlite-rembed` was compiled and linked into ProxySQL binary.

#### 3. API Errors
```
Error from embedding API
```
**Solution**: Check network connectivity and API credentials.

#### 4. Vector Table Errors
```
ERROR 1045 (28000): A LIMIT or 'k = ?' constraint is required on vec0 knn queries.
```
**Solution**: Every `sqlite-vec` KNN query needs either a `LIMIT` clause or a `k = ?` constraint.
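
The two accepted forms of a KNN query can be sketched as follows. The helpers only build query text; the table, column, and client names passed in are placeholders, and `knn_query_limit` and `knn_query_k` are hypothetical names.

```bash
# Hypothetical helpers that print the two query shapes vec0 accepts.
# $1 = vec0 table, $2 = vector column, $3 = client name,
# $4 = query text (no single quotes), $5 = number of neighbours

# Variant 1: bound the result set with LIMIT.
knn_query_limit() {
    printf "SELECT rowid, distance FROM %s WHERE %s MATCH rembed('%s', '%s') ORDER BY distance LIMIT %d;\n" \
        "$1" "$2" "$3" "$4" "$5"
}

# Variant 2: bound the result set with a k = N constraint.
knn_query_k() {
    printf "SELECT rowid, distance FROM %s WHERE %s MATCH rembed('%s', '%s') AND k = %d ORDER BY distance;\n" \
        "$1" "$2" "$3" "$4" "$5"
}

knn_query_limit documents embedding my-client "data science and algorithms" 3
knn_query_k documents embedding my-client "data science and algorithms" 3
```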

### Debug Mode
For detailed debugging, run with trace:
```bash
bash -x ./sqlite-rembed-test.sh
```

## Integration with CI/CD

The test script can be integrated into CI/CD pipelines:

```yaml
# Example GitHub Actions workflow
name: sqlite-rembed Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build ProxySQL with sqlite-rembed
        run: |
          cd deps && make cleanpart && make sqlite3
          cd ../lib && make
          cd ../src && make
      - name: Start ProxySQL
        run: |
          cd src && ./proxysql --sqlite3-server &
          sleep 5
      - name: Run Integration Tests
        run: |
          cd doc && ./sqlite-rembed-test.sh
```
```
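
A fixed `sleep 5` can be too short on a slow runner and wasteful on a fast one. One alternative is to poll until ProxySQL accepts a connection. The sketch below is a generic retry loop under that assumption; `retry_until` is a hypothetical helper, not part of the test script.

```bash
# Hypothetical helper: retry a command until it succeeds.
# $1 = maximum attempts, $2 = delay in seconds between attempts,
# remaining arguments = the command to run
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

retry_until 3 0 true && echo "ready"
```

In the workflow, the startup step could call something like `retry_until 30 1 mysql -h 127.0.0.1 -P 6030 -u root -proot -e "SELECT 1"` in place of the sleep, using the connection defaults documented earlier.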

## Extending the Test Suite

### Adding New Tests
1. Add new test function following existing pattern
2. Update phase header and test count
3. Add to appropriate phase section
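
The script's own test-function pattern is not reproduced in this document, so the following is only a sketch of what a minimal pattern could look like. `run_test`, `TESTS_RUN`, and `TESTS_FAILED` are hypothetical names; match whatever the script actually uses.

```bash
# Hypothetical counters and runner; not taken from sqlite-rembed-test.sh.
TESTS_RUN=0
TESTS_FAILED=0

# $1 = description, remaining arguments = command whose exit status
# decides whether the test passes
run_test() {
    description=$1
    shift
    TESTS_RUN=$((TESTS_RUN + 1))
    if "$@" >/dev/null 2>&1; then
        echo "PASS: $description"
    else
        echo "FAIL: $description"
        TESTS_FAILED=$((TESTS_FAILED + 1))
    fi
}

run_test "trivial command succeeds" true
echo "run: $TESTS_RUN, failed: $TESTS_FAILED"
```

A new test then becomes a single `run_test` line whose command is a `mysql` invocation or a small checking function.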

### Testing Different Providers
Modify the API configuration block to test:
- **Ollama**: Use `format='ollama'` and local URL
- **Cohere**: Use `format='cohere'` and appropriate model
- **Nomic**: Use `format='nomic'` and Nomic API endpoint

### Performance Testing
Extend Phase 8 for:
- Concurrent embedding generation
- Batch processing tests
- Memory usage monitoring

## Results Interpretation

### Success Criteria
- All connectivity tests pass
- Embeddings generated with correct dimensions
- Vector search returns ordered results
- No test artifacts remain after cleanup

### Performance Benchmarks
- Embedding generation: < 3 seconds per request (network-dependent)
- Similarity search: < 100ms for small datasets
- Memory: Stable during sequential operations

## References

- [sqlite-rembed GitHub](https://github.com/asg017/sqlite-rembed)
- [sqlite-vec Documentation](./SQLite3-Server.md)
- [ProxySQL SQLite3 Server](./SQLite3-Server.md)
- [Integration Documentation](./sqlite-rembed-integration.md)

## License

This test suite is part of the ProxySQL project and follows the same licensing terms.

---
*Test Suite Version: 1.0*
41 changes: 38 additions & 3 deletions doc/SQLite3-Server.md
@@ -69,6 +69,39 @@ SELECT rowid, distance FROM vec_data
WHERE vector MATCH json('[0.1, 0.2, 0.3,...,0.128]');
```

### Embedding Generation (with sqlite-rembed)

```sql
-- Register an embedding API client
INSERT INTO temp.rembed_clients(name, options)
VALUES ('openai', rembed_client_options(
  'format', 'openai', 'model', 'text-embedding-3-small', 'key', 'your-api-key'));

-- Generate text embeddings
SELECT rembed('openai', 'Hello world') as embedding;

-- Complete AI pipeline: generate embedding and search
CREATE VIRTUAL TABLE documents USING vec0(embedding float[1536]);

INSERT INTO documents(rowid, embedding)
VALUES (1, rembed('openai', 'First document text'));

INSERT INTO documents(rowid, embedding)
VALUES (2, rembed('openai', 'Second document text'));

-- Search for similar documents
SELECT rowid, distance FROM documents
WHERE embedding MATCH rembed('openai', 'Search query')
ORDER BY distance
LIMIT 5;
```

#### Supported Embedding Providers
- **OpenAI**: `format='openai', model='text-embedding-3-small'`
- **Ollama** (local): `format='ollama', model='nomic-embed-text'`
- **Cohere**: `format='cohere', model='embed-english-v3.0'`
- **Nomic**: `format='nomic', model='nomic-embed-text-v1.5'`
- **Llamafile** (local): `format='llamafile'`

See [sqlite-rembed integration documentation](./sqlite-rembed-integration.md) for full details.

### Available Databases

```sql
@@ -87,9 +120,11 @@ SHOW DATABASES;

1. **Data Analysis**: Store and analyze temporary data
2. **Vector Search**: Perform similarity searches with sqlite-vec
3. **Testing**: Test SQLite features with MySQL clients
4. **Prototyping**: Quick data storage and retrieval
5. **Custom Applications**: Build applications using SQLite with MySQL tools
3. **Embedding Generation**: Create text embeddings with sqlite-rembed (OpenAI, Ollama, Cohere, etc.)
4. **AI Pipelines**: Complete RAG workflows: embedding generation → vector storage → similarity search
5. **Testing**: Test SQLite features with MySQL clients
6. **Prototyping**: Quick data storage and retrieval
7. **Custom Applications**: Build applications using SQLite with MySQL tools

## Limitations
