Review experiment status & potential next steps

Review the fine-tuning experiments we've run, using the relevant notes in this order:
- [Blog outline finetuning experiment sections](https://coda.io/d/_dzujAXxemDw/3-part-blog-outline_su70LMop#_luzVjnZG)
- [Analysis](https://coda.io/d/Software-Control-Agents-Knowledge-Base_dzujAXxemDw/LoRA-experiment-Analysis_su2yTWFr#_luEBN8w8)
- [Raw experiment notes](https://coda.io/d/Software-Control-Agents-Knowledge-Base_dzujAXxemDw/Debug-LoRA-Configs_suTzXou8#_luF0AsiA) - These are the temporary notes I kept while running the experiments. Some of them are outdated, so treat the blog outline and the analysis as the higher-level source of truth.

Review papers:
- [SpatialLadder](https://www.alphaxiv.org/abs/2510.08531)
- GTA1 paper
- [An Efficient Training Pipeline For Graphical User Interface Reasoning Models](https://www.alphaxiv.org/abs/2511.08172)
- AutoGUI
- ZonUI
- Other relevant papers you find

Identify the next steps for improving 7B CUA models on our perturbation evaluation, especially on the spatial relational instructions. Directions I have already listed include improving the training data mix and using RL training methods instead of SFT with LoRA.

Investigate all possible directions breadth-first. For each one, gather evidence, justifications, and open uncertainties, and use them to prioritize the directions by risk and time required.

The follow-up to this ticket will investigate the top-priority direction in more depth and design minimal, rapid experiments to validate specific hypotheses (e.g., "doing X in this way will improve the 7B model on the spatial relational instructions in our perturbation eval").

Review experiment status & potential next steps #226

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Review experiment status & potential next steps #226

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions