Temporal-Modal Replay for Continual Audio-Conditioned 3D Motion Generation
BEAT2 (speech-conditioned full-body 3D gestures). The goal is to generate natural, coherent, and synchronized lip motion, facial expressions, and hand, leg, and full-body movement. TMR retains this ability even after 9 long sessions while keeping FGD low.
Vanilla - Session 4; Speaker 2 Scott
FGD: 2.129 (↓)
CSReL - Session 4; Speaker 2 Scott
FGD: 0.743 (↓)
GCR - Session 4; Speaker 2 Scott
FGD: 0.732 (↓)
TMR - Session 4; Speaker 2 Scott
FGD: 0.613 (↓)
Vanilla - Session 9; Speaker 2 Scott
FGD: 1.900 (↓)
TMR - Session 9; Speaker 2 Scott
FGD: 0.601 (↓)
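The FGD values above are distances between distributions of real and generated gesture features, where lower is better. FGD is typically computed as a Fréchet distance between Gaussian fits of latent motion features. The following is a minimal sketch of that computation, not the evaluation code used here: the function name `frechet_distance` is hypothetical, and the feature extractor (commonly a pretrained motion autoencoder) is assumed to have already produced the two feature arrays.

```python
import numpy as np


def frechet_distance(real_feats, gen_feats):
    """Frechet distance between Gaussian fits of two feature sets.

    Both inputs are (num_samples, feature_dim) arrays of latent features.
    How those features are extracted is an assumption outside this sketch.
    """
    real_feats = np.asarray(real_feats, dtype=np.float64)
    gen_feats = np.asarray(gen_feats, dtype=np.float64)

    # Fit a Gaussian (mean and covariance) to each feature set.
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(real_feats, rowvar=False))
    cov_g = np.atleast_2d(np.cov(gen_feats, rowvar=False))

    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g; clip tiny negative values from round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)


# Illustrative usage with synthetic features (not real motion data).
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))
shifted = real + 0.5  # same covariance, mean shifted by 0.5 in each of 4 dims
print(frechet_distance(real, real))     # close to 0
print(frechet_distance(real, shifted))  # close to 4 * 0.5**2 = 1.0
```

Identical feature sets give a distance near zero, and a pure mean shift contributes only the squared mean difference, which is why a lower score indicates generated motion whose feature statistics are closer to the ground truth.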
MOSPA (moving toward the sound source). The goal is to generate coherent 3D motion toward the sound source. TMR clearly captures this behavior after the second session for the Sensitive genre.