
Aman edits#92

Closed
amasick wants to merge 6 commits into VectifyAI:main from amasick:aman_edits

Conversation


@amasick amasick commented Jan 29, 2026

Added Gemini configuration.

unknown added 6 commits January 29, 2026 13:13
… detection to avoid false positives - Implemented recursive large node splitting - Added comprehensive RAG pipeline - Created test scripts for RedBook queries - Generated 349 nodes from 686-page document (vs original 13 nodes)
@saccharin98
Collaborator

Thanks for working on Gemini support — it's a valuable direction. However, this PR needs significant rework
before it can be merged.

Files that should not be in this PR:

Please remove all of the following — they appear to be personal notes/workflow files and test data that don't
belong in the repository:

  • FINAL_SUBMISSION_STATUS.md, SUBMISSION_READY.md, SETUP_COMPLETE.md
  • PUSH_AND_PR_GUIDE.md, QUICK_PUSH_REFERENCE.txt
  • DOCUMENTATION_INDEX.md, INDEX.md, INSTALLATION_CHECKLIST.md
  • RedBook.pdf, redbook_query.py

Architectural concern (the main issue):

The current approach copies the entire pageindex logic into a new standalone file and replaces OpenAI calls
with Gemini. This creates a parallel implementation that will drift out of sync whenever the core package is
updated.

The right approach is to add a provider abstraction inside the existing pageindex package — for example,
allowing users to pass in a custom LLM client or specify a provider — rather than duplicating hundreds of
lines of logic. Would you be open to refactoring in that direction?
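
One possible shape for such an abstraction is sketched below. It is only an illustration: `LLMClient`, `EchoClient`, and `summarize_node` are hypothetical names, not part of the existing pageindex package, and the stand-in client makes no network calls. The idea is that the core logic depends on a small interface, so an OpenAI-backed or Gemini-backed client can be passed in without duplicating the pipeline.

```python
from dataclasses import dataclass
from typing import Protocol


class LLMClient(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def complete(self, prompt: str) -> str:
        ...


@dataclass
class EchoClient:
    """Stand-in provider used only to exercise the interface offline.

    A real client would wrap a vendor SDK behind the same method.
    """

    prefix: str = "echo"

    def complete(self, prompt: str) -> str:
        return f"{self.prefix}: {prompt}"


def summarize_node(text: str, client: LLMClient) -> str:
    """Hypothetical core function: depends on the interface, not on a vendor."""
    return client.complete(f"Summarize: {text}")


if __name__ == "__main__":
    # Swapping providers means passing a different client object.
    print(summarize_node("Chapter 1 overview", EchoClient()))
```

With this layout, adding Gemini would mean writing one small client class rather than copying the indexing logic into a second file.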

Code issues in gemini_pageindex.py:

  1. google-generativeai==0.3.1 is severely outdated (current is 0.8.x). Models like gemini-2.0-flash don't
    exist in that version — this likely won't run at all. Please update and verify.
  2. count_tokens() hardcodes gpt-4o as the tokenizer model, so the counts will not reflect Gemini's
    tokenization. That is incorrect for a Gemini implementation.
  3. Default model is inconsistent: gemini_api() defaults to gemini-2.0-flash, but the CLI defaults to
    gemini-1.5-flash.
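
Items 2 and 3 could be addressed along the lines of the sketch below. It is a rough illustration under stated assumptions: `DEFAULT_MODEL`, `build_parser`, `TOKEN_COUNTERS`, and `approx_counter` are hypothetical names, the body of `gemini_api()` is a placeholder, and the whitespace-based counter is a stand-in for a real provider tokenizer. The point is that the default model is defined once and that token counting is selected by model rather than hardcoded.

```python
import argparse
from typing import Callable, Dict

# Single source of truth for the default; the specific value is an assumption.
DEFAULT_MODEL = "gemini-2.0-flash"


def gemini_api(prompt: str, model: str = DEFAULT_MODEL) -> str:
    """Placeholder body: a real version would call the Gemini SDK here."""
    return f"[{model}] {prompt}"


def build_parser() -> argparse.ArgumentParser:
    """CLI parser whose default comes from the same constant as gemini_api()."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default=DEFAULT_MODEL)
    return parser


TokenCounter = Callable[[str], int]


def approx_counter(text: str) -> int:
    """Crude whitespace-based estimate, NOT a real tokenizer."""
    return len(text.split())


# Maps a model-name prefix to the counter that should handle it.
# A real registry would plug in each provider's own tokenizer.
TOKEN_COUNTERS: Dict[str, TokenCounter] = {"gemini": approx_counter}


def count_tokens(text: str, model: str = DEFAULT_MODEL) -> int:
    """Pick a counter by model prefix instead of assuming one tokenizer."""
    for prefix, counter in TOKEN_COUNTERS.items():
        if model.startswith(prefix):
            return counter(text)
    raise ValueError(f"no token counter registered for model {model!r}")
```

Failing loudly for an unregistered model is a deliberate choice in this sketch: it avoids silently counting tokens with the wrong tokenizer.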

