Temporal-Modal Replay for Continual Audio-Conditioned 3D Motion Generation

BEAT2 (speech-conditioned full-body 3D gestures)

Vanilla - Session 4; Speaker 2 (Scott)

FGD: 2.129 (↓)

CSReL - Session 4; Speaker 2 (Scott)

FGD: 0.743 (↓)

GCR - Session 4; Speaker 2 (Scott)

FGD: 0.732 (↓)

TMR - Session 4; Speaker 2 (Scott)

FGD: 0.613 (↓)

Vanilla - Session 9; Speaker 2 (Scott)

FGD: 1.900 (↓)

TMR - Session 9; Speaker 2 (Scott)

FGD: 0.601 (↓)

MOSPA (motion toward a sound source)

Vanilla - Session 2; Sensitive Genre

FID: 62.617 (↓)

CSReL - Session 2; Sensitive Genre

FID: 67.617 (↓)

GCR - Session 2; Sensitive Genre

FID: 40.144 (↓)

TMR - Session 2; Sensitive Genre

FID: 37.802 (↓)
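For reference, FGD (Fréchet Gesture Distance) and FID are conventionally computed as the Fréchet distance between two Gaussians fitted to feature embeddings of real and generated motion, which is why lower is better (the ↓ above). The standard formula is sketched below. The notation is assumed here: μ_r, Σ_r and μ_g, Σ_g denote the feature mean and covariance of real and generated samples. This page does not specify which feature encoders are used for BEAT2 or MOSPA.

```latex
\mathrm{FD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

The first term penalizes a shift in the average motion feature. The trace term penalizes mismatched spread and correlation, so a model can score poorly even when its mean pose statistics match the data.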