Hello, great work!
Now I'm looking to build an automatic video QC system on top of your work, aiming to spot artifacts in image-to-video (I2V) generated videos. The main focus is on detecting artifacts in hand, face, and foot movements.
I have already annotated around 50,000 video-text pairs, where each video is an I2V-generated clip and the text is the corresponding artifact description.
However, I'm unsure how to turn these pairs into training data and how to fine-tune your model on them. Could you point me to some references or offer suggestions?
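For context, here is a minimal sketch of how I was thinking of serializing the pairs, assuming a LLaVA-style instruction-tuning JSONL (the field names and the `<video>` placeholder are just my guesses at what your pipeline expects):

```python
import json

# Hypothetical layout: each annotation row holds a video path and the
# free-text artifact description (hand / face / foot issues).
annotations = [
    {"video": "i2v_outputs/0001.mp4",
     "artifact": "left-hand fingers merge and flicker mid-clip"},
    # ... ~50k rows in total
]

PROMPT = ("Inspect this image-to-video generation and describe any "
          "artifacts in hand, face, or foot motion.")

with open("autoqc_train.jsonl", "w") as f:
    for row in annotations:
        # One instruction-tuning sample per pair: the video as input,
        # the annotated artifact description as the target response.
        sample = {
            "video": row["video"],
            "conversations": [
                {"from": "human", "value": f"<video>\n{PROMPT}"},
                {"from": "gpt", "value": row["artifact"]},
            ],
        }
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```

Does this look like a reasonable format, or does your training code expect something different?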
I'll follow this work closely and report back with my progress. Thank you!