Hi, thank you for your cool work! I've read data.md but still can't figure out how to build a dataset for training the vision-language model on videos. Could anyone kindly share an example of the training dataset format? Thanks!