
Multi-Agent AI System for Document Processing (Groq Llama3-8B Version)

This system accepts input in PDF, JSON, or Email (text) format, classifies its format and intent, and routes it to the appropriate specialized agent for processing. It maintains a shared context (memory) for traceability. This version is specifically configured to use the Groq API with the llama3-8b-8192 model.

System Overview

  1. Input:
    • Raw file (PDF, JSON, or a TXT file representing an email).
    • Raw text string (simulating email content).
    • Raw JSON string.
  2. Classifier Agent:
    • Determines the input format (PDF, JSON, Email/Text).
    • Extracts text content (for PDF/Text).
    • Uses the Groq Llama3-8B model to classify the intent (e.g., Invoice, RFQ, Complaint, Regulation) and urgency.
    • Logs format, intent, source, and other details to shared memory.
    • Routes the input (or extracted text/parsed JSON) to a specialized agent.
  3. JSON Agent:
    • Accepts structured JSON payloads.
    • If the classified intent is "Invoice" (or as configured), it validates against a target schema (JSON_TARGET_SCHEMA in config.py).
    • Extracts/reformats data based on the schema.
    • Flags anomalies or missing fields.
    • If the intent is not "Invoice" or doesn't match schema criteria, it processes the JSON as a generic payload.
    • Logs results to shared memory.
  4. Email Agent:
    • Accepts email content (text, including text extracted from PDFs).
    • Uses the Groq Llama3-8B model to extract key information (sender, refined subject, refined summary, urgency, action items, contact person/phone).
    • Formats the extracted information for CRM-style usage according to EMAIL_CRM_FORMAT_KEYS in config.py.
    • Logs results to shared memory.
  5. Shared Memory Module:
    • A lightweight in-memory store (Python dictionary).
    • Stores: conversation ID, source, format type, timestamp, classified intent, extracted values from agents, errors, and processing steps.
    • Accessible across all agents for context and traceability.
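
The flow above (detect the format, log to shared memory, route to an agent) can be sketched with the standard library alone. This is a minimal illustration, not the repository's code: `SHARED_MEMORY`, `detect_format`, `log_entry`, and `route` are hypothetical names, and the LLM-based intent classification is left out, so the intent is passed in by hand.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical in-memory store, keyed by conversation ID (assumption:
# the real module in memory/shared_memory.py may be organized differently).
SHARED_MEMORY = {}


def detect_format(raw):
    """Guess the input format from raw bytes or text."""
    if isinstance(raw, bytes):
        # PDF files start with the "%PDF-" magic bytes.
        if raw.startswith(b"%PDF-"):
            return "PDF"
        raw = raw.decode("utf-8", errors="replace")
    try:
        parsed = json.loads(raw)
        # Only objects and arrays count as structured payloads here.
        if isinstance(parsed, (dict, list)):
            return "JSON"
    except json.JSONDecodeError:
        pass
    # Anything else is treated as email-style text.
    return "Email"


def log_entry(source, fmt, intent):
    """Record one classification result and return its conversation ID."""
    conversation_id = str(uuid.uuid4())
    SHARED_MEMORY[conversation_id] = {
        "source": source,
        "format": fmt,
        "intent": intent,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "steps": ["classified"],
        "errors": [],
    }
    return conversation_id


def route(fmt):
    """Pick the agent: JSON goes to the JSON agent, text and PDF text to the email agent."""
    return "json_agent" if fmt == "JSON" else "email_agent"
```

For example, `route(detect_format('{"invoice_id": 1}'))` returns `"json_agent"`, while a plain-text complaint or text extracted from a PDF would go to `"email_agent"`.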

Tech Stack

  • Python 3.9+
  • Groq API with the llama3-8b-8192 model for all LLM tasks.
  • pdfplumber for PDF text extraction.
  • Standard Python json library.
  • In-memory Python dictionary for Shared Memory (memory/shared_memory.py).
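
As an illustration of how the JSON Agent's validation step could work with only the standard json library, here is a small sketch. The schema shape (field name mapped to an expected Python type) and the names `EXAMPLE_SCHEMA` and `validate_payload` are assumptions made for this example; the actual `JSON_TARGET_SCHEMA` in config.py may look different.

```python
import json

# Hypothetical stand-in for JSON_TARGET_SCHEMA (assumption: field -> expected type).
EXAMPLE_SCHEMA = {
    "invoice_id": str,
    "vendor": str,
    "total": (int, float),
}


def validate_payload(raw_json, schema):
    """Extract the fields named in the schema and flag missing or mistyped ones."""
    payload = json.loads(raw_json)
    extracted = {}
    anomalies = []
    for field, expected in schema.items():
        if field not in payload:
            anomalies.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            actual = type(payload[field]).__name__
            anomalies.append(f"wrong type for {field}: {actual}")
        else:
            extracted[field] = payload[field]
    return {"extracted": extracted, "anomalies": anomalies}
```

Returning the anomalies alongside the extracted values, rather than raising on the first problem, keeps the full result available for logging to shared memory.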

Folder Structure

├── agents/                 # Agent implementations (classifier, json, email)
│   ├── __init__.py
│   ├── base_agent.py
│   ├── classifier_agent.py
│   ├── json_agent.py
│   └── email_agent.py
├── memory/                 # Shared memory module
│   ├── __init__.py
│   └── shared_memory.py
├── utils/                  # Utility functions (LLM interaction, PDF parsing)
│   ├── __init__.py
│   ├── llm_utils.py
│   └── pdf_parser.py
├── sample_inputs/          # Example input files (JSON, TXT, and a placeholder for PDF)
│   ├── complaint_sample.txt
│   ├── invoice_sample.json
│   ├── regulation_sample.txt
│   └── rfq_sample_text_fallback.txt
├── main.py                 # Main script to run examples and orchestrate agents
├── config.py               # Configuration (API keys, model names, schemas)
├── requirements.txt        # Python dependencies
├── README.md               # This file
└── .env                    # For API keys (gitignored)

Setup

  1. Clone the repository (if you have one):

    git clone <your-repo-url>
    cd multi_agent_system

    (If you don't have a repo, just ensure all files are in the correct structure shown above.)

  2. Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # On Linux/macOS
    # venv\Scripts\activate    # On Windows
  3. Install dependencies:

    pip install -r requirements.txt
  4. Set up Groq API Key:

    • Sign up at GroqCloud to get an API key if you don't have one.
    • Create a file named .env in the root of the multi_agent_system directory.
    • Add your Groq API key to it:
      GROQ_API_KEY="your_groq_api_key_here"
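
For reference, here is one way the key could be read from .env at startup using only the standard library. This is a sketch under assumptions: `load_env_file` and `get_groq_api_key` are hypothetical helpers, and the project's config.py may instead rely on a package such as python-dotenv.

```python
import os


def load_env_file(path=".env"):
    """Parse KEY="value" lines from a .env file into os.environ."""
    loaded = {}
    if not os.path.exists(path):
        return loaded
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip('"').strip("'")
            # Variables already set in the environment take precedence.
            os.environ.setdefault(key, value)
            loaded[key] = value
    return loaded


def get_groq_api_key():
    """Return the Groq API key, or fail with a clear message if it is missing."""
    load_env_file()
    key = os.environ.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set; add it to .env")
    return key
```

Failing early with an explicit error makes a missing or misnamed key easier to diagnose than an authentication failure on the first API call.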
      

How to Run

Execute the main.py script from the root of the multi_agent_system directory:

python main.py
