
Remove 4 26 #15

Open
dalyw wants to merge 9 commits into main from remove-4-26

@dalyw (Collaborator) commented May 20, 2026

No description provided.

dalyw added 8 commits May 16, 2026 14:40
In unitprocess_json file:
Adding Denitrification Filter to UP list
Moving anaerobic filter out of fixed film category
Moving biosolids lagoon out of disposal category and adding Lagoon as secondary category
Cleaning up some alt_names

Deleting old LLM output files (named by facility name rather than Place ID)
Modified LLM prompt to further encourage structured output and adherence to ontology categories

Expanding San Jose example to include solids and disinfection

Renaming "truth" to "manual reading" throughout after figure_2
Renaming cwns_processes_by_facility to cwns_unit_processes_by_facility for consistency with LLM file

Updating README
@dalyw marked this pull request as ready for review May 29, 2026 20:19
@dalyw requested a review from fletchapin May 29, 2026 20:19
@fletchapin (Contributor) left a comment

Overall it looks great! I just had a minor comment about the date folder.

 sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
 
-DATE_FOLDER = "2026-5-15"
+DATE_FOLDER = "2026-5-25"
@fletchapin (Contributor):

Is the date important for publication? I'm wondering if we can just remove a level of nesting and publish this data directly in the output folder.

@dalyw (Collaborator, author):

The date only matters if we re-run the analysis every ~6-12 months as permits are updated, in which case it keeps separate versions of the results.

But we could keep these analysis scripts “flat” and save any date-specific versions in the Stanford Digital Repository output file. How does that sound?
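A minimal sketch of the "flat" layout proposed above, with date-specific snapshots kept opt-in rather than hard-coded through `DATE_FOLDER`. It assumes a single base directory called `output` (hypothetical; the repository's actual output path is not shown in this PR), and `output_path` is an illustrative helper, not existing code in this repository.

```python
from pathlib import Path
from typing import Optional

# Hypothetical base directory; the real output location is not shown in this PR.
OUTPUT_DIR = Path("output")


def output_path(filename: str, date_folder: Optional[str] = None,
                base: Path = OUTPUT_DIR) -> Path:
    """Return the path an analysis script should write `filename` to.

    date_folder=None gives the flat layout (base/filename); passing a date
    string such as "2026-5-25" reproduces the nested base/<date>/filename layout.
    """
    directory = base / date_folder if date_folder else base
    return directory / filename
```

With this, `output_path("unit_processes_by_facility.csv")` resolves to `output/unit_processes_by_facility.csv`, while a re-run months later could pass a date string to write a separate snapshot before archiving it.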

 from helpers.plotting import make_grouped_legend, save_and_close, set_thick_spines
 
-DATE_FOLDER = "2026-5-15"
+DATE_FOLDER = "2026-5-25"
@fletchapin (Contributor):

Same question about the date.

Adding final unit_processes_by_facility.csv file with both datasets

Fixing bug in step4 where "offsite" location wasn't being used

Re-running model comparison with final ontology and Place ID suffix on filenames
Updating step5_llm_extraction with higher token limits for gpt-5, and saving manifest/token CSV rows after every facility
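The last commit saves manifest/token CSV rows after every facility instead of once at the end of a run. A minimal sketch of that append-as-you-go pattern, assuming each facility's result is a flat dict with the same keys every time (the header is written only once). `append_row`, `process_facilities`, and the `extract` callable are hypothetical names, not the actual code in `step5_llm_extraction`.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, Iterable


def append_row(path: Path, row: Dict[str, object]) -> None:
    """Append one row to a CSV file, writing the header only if the file is new or empty."""
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def process_facilities(place_ids: Iterable[str],
                       extract: Callable[[str], Dict[str, object]],
                       manifest_path: Path) -> None:
    """Run `extract` per facility and persist each result immediately."""
    for place_id in place_ids:
        result = extract(place_id)  # hypothetical per-facility LLM call
        # Writing inside the loop means an interrupted run keeps every
        # facility finished so far instead of losing the whole batch.
        append_row(manifest_path, {"place_id": place_id, **result})
```

Because rows are appended, a resumed run can skip Place IDs already present in the manifest and continue where it stopped.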