Review the finetuning experiments we've done from the relevant notes in this order:
Review papers:
Identify the next steps for improving 7B CUA models on our perturbation evaluation, especially on the spatial relational instructions. I have listed some directions, such as improving the training data mix and using RL training methods instead of SFT with LoRA.
Investigate all possible directions breadth-first, and gather evidence, justifications, and uncertainties to prioritize these directions by risk and time required.
The next step after this ticket will be to investigate the top-priority direction in more depth and design minimal, rapid experiments to validate specific hypotheses (e.g., that doing this in this way will improve the 7B model on the spatial relational instructions in our perturbation eval).