A guide for testing and evaluating the IPCC RAG system
This guide helps you:
- Test the system with real climate science questions
- Evaluate answer quality and accuracy
- Check source tracking (paragraph IDs)
- Provide feedback for improvements
- Share findings with colleagues
# Navigate to the project directory
cd llmrag
# Start the web interface
streamlit run streamlit_app.py
- Open your browser to http://localhost:8501
- Select wg1/chapter02 (Changing State of the Climate System)
- Click "Load Chapter"
- Wait for the loading message to complete
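If the browser page does not come up, a quick script can confirm whether anything is answering at the address used in this guide. The following is a minimal sketch, not part of the project: the function name `app_is_up` and its `opener` parameter are hypothetical, and it only checks that an HTTP GET returns status 200, not that a chapter has loaded.

```python
import urllib.error
import urllib.request

APP_URL = "http://localhost:8501"  # address used in this guide


def app_is_up(url=APP_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if an HTTP GET to `url` answers with status 200.

    `opener` is a hypothetical hook so the check can be exercised
    without a running server; by default it is urllib's urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, HTTP error status, etc.
        return False
```

Calling `app_is_up()` after `streamlit run streamlit_app.py` should return `True` once the server is listening; `False` suggests the server has not started or is on a different port.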
Try these questions in order:
- "What are the main findings about temperature trends?"
  - Should get a detailed answer about global temperature changes
  - Check that paragraph IDs are shown
- "How much has the Earth warmed since pre-industrial times?"
  - Should mention specific temperature increases
  - Look for numerical values and confidence levels
- "What causes global warming?"
  - Should mention greenhouse gases, human activities
  - Check for scientific terminology
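The expectations under each starter question can be turned into a rough smoke check on an answer you have copied out of the interface. This is only a sketch: the keyword lists in `EXPECTED_TERMS` are illustrative assumptions drawn from the notes above, and a keyword match says nothing about scientific accuracy, which still needs a human reader.

```python
import re

# Illustrative expectations only, based on the notes under each starter question.
EXPECTED_TERMS = {
    "What are the main findings about temperature trends?": ["temperature"],
    "What causes global warming?": ["greenhouse gas", "human"],
}


def missing_terms(answer, terms):
    """Return the expected terms that do not occur in the answer (case-insensitive)."""
    lowered = answer.lower()
    return [term for term in terms if term.lower() not in lowered]


def has_number(answer):
    """Return True if the answer contains at least one digit."""
    return re.search(r"\d", answer) is not None


# Example with a made-up answer string (not real system output).
answer = "Greenhouse gases released by human activities trap heat."
print(missing_terms(answer, EXPECTED_TERMS["What causes global warming?"]))  # prints []
```

`has_number` is a crude stand-in for "look for numerical values"; it does not check units or confidence levels.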
- Web interface loads without errors
- Chapter selection works
- Questions generate responses
- Paragraph IDs are displayed
- No error messages appear
- Answers are coherent and readable
- Information is scientifically accurate
- Responses are relevant to questions
- No repetitive or nonsensical text
- Technical terms are used appropriately
- Paragraph IDs are shown with answers
- IDs correspond to actual content
- Multiple sources are cited when relevant
- Source information is helpful for verification
- First question takes <60 seconds
- Follow-up questions are faster
- System remains responsive
- No memory leaks or crashes
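The two timing items in the checklist (first question under 60 seconds, follow-ups faster) can be measured rather than estimated. The sketch below assumes a hypothetical `ask(question)` callable that sends one question and blocks until the answer arrives; this guide does not describe such a function, so you would need to supply your own wrapper or simply time the interface by hand.

```python
import time

FIRST_QUESTION_BUDGET_S = 60.0  # from the checklist: first question takes <60 seconds


def time_questions(ask, questions, clock=time.perf_counter):
    """Call ask(question) for each question; return a list of (question, seconds)."""
    timings = []
    for question in questions:
        start = clock()
        ask(question)  # hypothetical: blocks until the answer is returned
        timings.append((question, clock() - start))
    return timings


def check_timings(timings, first_budget=FIRST_QUESTION_BUDGET_S):
    """Return a list of problems found; an empty list means both checks passed."""
    if not timings:
        return ["no questions were timed"]
    problems = []
    first = timings[0][1]
    if first >= first_budget:
        problems.append(f"first question took {first:.1f}s (budget {first_budget:.0f}s)")
    for question, seconds in timings[1:]:
        if seconds >= first:
            problems.append(f"follow-up was not faster than the first question: {question!r}")
    return problems
```

Memory use and crashes are not covered by this sketch; those still need to be observed during a longer session.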
Questions to test:
- "What is climate change?"
- "What are greenhouse gases?"
- "How do scientists measure temperature?"
What to look for:
- Clear, accurate definitions
- Scientific terminology
- Proper source citations
Questions to test:
- "What is the current global temperature?"
- "How fast is the Arctic warming?"
- "What are the temperature projections for 2100?"
What to look for:
- Specific numerical values
- Units of measurement
- Confidence intervals
- Time periods mentioned
Questions to test:
- "How does climate change affect extreme weather?"
- "What is the relationship between CO2 and temperature?"
- "How do feedback loops work in climate systems?"
What to look for:
- Clear explanations of relationships
- Scientific mechanisms
- Multiple factors considered
Questions to test:
- "How certain are scientists about climate change?"
- "What are the uncertainties in climate projections?"
- "Which climate impacts are most uncertain?"
What to look for:
- Discussion of confidence levels
- Acknowledgment of uncertainties
- Nuanced language about certainty
- 5: Excellent - Clear, accurate, comprehensive
- 4: Good - Mostly accurate, some minor issues
- 3: Adequate - Generally correct, some confusion
- 2: Poor - Significant errors or confusion
- 1: Very Poor - Inaccurate or nonsensical
- 5: Excellent - Clear, relevant sources
- 4: Good - Sources provided, mostly relevant
- 3: Adequate - Some sources, not always helpful
- 2: Poor - Sources unclear or irrelevant
- 1: Very Poor - No sources or misleading sources
- 5: Excellent - Fast, reliable, smooth
- 4: Good - Generally fast, occasional delays
- 3: Adequate - Acceptable speed, some issues
- 2: Poor - Slow, frequent problems
- 1: Very Poor - Very slow or crashes
- "What is the global average temperature?"
- "How much has the Earth warmed?"
- "What are the temperature trends?"
- "How do scientists measure global temperature?"
- "What causes global warming?"
- "How do greenhouse gases work?"
- "What are the main sources of CO2?"
- "How do human activities affect climate?"
- "What are the impacts of climate change?"
- "How does climate change affect weather?"
- "What are the effects on ecosystems?"
- "How does climate change affect humans?"
- "What are the future climate projections?"
- "How will temperatures change by 2100?"
- "What are the different emission scenarios?"
- "How certain are these projections?"
- "How do scientists study climate change?"
- "What is the evidence for climate change?"
- "How confident are scientists?"
- "What are the uncertainties?"
- Check: Python is installed (python --version)
- Check: Dependencies are installed (pip install -r requirements.txt)
- Check: Port 8501 is available
- Check: You're in the correct directory
- Check: tests/ipcc/ folder exists
- Check: HTML files are present in chapter folders
- Check: Close other applications
- Check: At least 4GB RAM available
- Check: First run is always slower (downloading models)
- Check: Chapter is fully loaded
- Check: Question is clear and specific
- Check: Try rephrasing the question
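Several of the checks above can be run in one go before starting the app. The following is a rough sketch under stated assumptions: the function names are hypothetical, the file names `requirements.txt` and `streamlit_app.py` and the `tests/ipcc/` folder are taken from this guide, and the script is meant to be run from the project directory. It does not verify that dependencies are actually installed or how much RAM is free.

```python
import socket
import sys
from pathlib import Path


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def chapter_html_files(data_dir):
    """Return HTML files found anywhere under data_dir (empty list if it is missing)."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.html"))


def preflight(project_dir=".", port=8501):
    """Run the checks and return a dict of {check name: passed}."""
    project = Path(project_dir)
    data_dir = project / "tests" / "ipcc"
    return {
        "running under Python 3": sys.version_info.major == 3,
        "requirements.txt present": (project / "requirements.txt").is_file(),
        "streamlit_app.py present": (project / "streamlit_app.py").is_file(),
        f"port {port} free": port_is_free(port),
        "tests/ipcc/ exists": data_dir.is_dir(),
        "HTML files present": bool(chapter_html_files(data_dir)),
    }


if __name__ == "__main__":
    for name, passed in preflight().items():
        print(f"{'OK  ' if passed else 'FAIL'} {name}")
```

A failed "port 8501 free" check while the app is already running is expected; run the script before `streamlit run streamlit_app.py`.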
Date: _______________
Tester: _____________
System Version: _______
Chapter Tested: _______
Questions Asked: ______
Answer Quality Scores:
- Q1: ___/5
- Q2: ___/5
- Q3: ___/5
Source Tracking Scores:
- Q1: ___/5
- Q2: ___/5
- Q3: ___/5
Performance: ___/5
Issues Found:
- ________________
- ________________
Suggestions:
- ________________
- ________________
Overall Rating: ___/5
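If you test many questions, it may be easier to record the scores in a small structure and compute the per-category averages than to fill in the form by hand. This is a minimal sketch: the `FeedbackReport` class and its field names are hypothetical and simply mirror the form above. The overall rating is left to the tester's judgment rather than computed.

```python
from dataclasses import dataclass, field
from statistics import mean


def _check_score(value):
    """Reject scores outside the 1-5 scale used in this guide."""
    if not 1 <= value <= 5:
        raise ValueError(f"score must be between 1 and 5, got {value}")
    return value


@dataclass
class FeedbackReport:
    tester: str
    chapter: str
    performance: int = 3                                  # single 1-5 score
    answer_quality: list = field(default_factory=list)    # one 1-5 score per question
    source_tracking: list = field(default_factory=list)   # one 1-5 score per question

    def add_question(self, quality, sources):
        """Record the two per-question scores."""
        self.answer_quality.append(_check_score(quality))
        self.source_tracking.append(_check_score(sources))

    def summary(self):
        """Return average scores per category (None where nothing was recorded)."""
        return {
            "answer_quality": mean(self.answer_quality) if self.answer_quality else None,
            "source_tracking": mean(self.source_tracking) if self.source_tracking else None,
            "performance": self.performance,
        }
```

For example, after `report = FeedbackReport(tester="A. Tester", chapter="wg1/chapter02")` and one `report.add_question(quality, sources)` call per question, `report.summary()` gives the numbers to copy into the form.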
- Specific questions you tested
- Quality scores for each question
- Issues encountered (with screenshots if possible)
- Suggestions for improvement
- Overall assessment
- GitHub Issues: For bugs and technical problems
- GitHub Discussions: For general feedback and suggestions
- Email: For private or detailed feedback
I tested the system with wg1/chapter02 and found:
Good:
- Clear answers about temperature trends
- Good source tracking with paragraph IDs
- Fast response times after initial load
Issues:
- One answer about CO2 was confusing
- Paragraph ID "executive_summary_p15" wasn't very helpful
- System was slow on first question (45 seconds)
Suggestions:
- Add more context to paragraph IDs
- Improve answer clarity for technical topics
- Add loading progress indicator
Overall: 4/5 - Very useful tool with room for improvement
After testing:
- Share your findings with the development team
- Discuss with colleagues who might use the system
- Suggest additional test cases based on your expertise
- Consider contributing to the project if you're interested
Thank you for helping improve the IPCC RAG system! 🌍📚