Where the gain comes from

Architecture Tokenized LEXIS beats Gaussian VAE
Guidance signal mask + IF guidance
beats
no guidance
Variant $\mathrm{CD}_{\mathrm{hum}}$ ↓ $\mathrm{CD}_{\mathrm{obj}}$ ↓
GaussFlow 13.4557.25
FlowPose 65.05
LEXIS-Flow baseline 9.6848.01
Unguided 9.6848.01
+ mask ($\mathcal{L}_{\mathrm{mask}}$) 9.4643.51
+ InterField ($\mathcal{L}_{\mathrm{IF}}$) 9.0541.01
$\mathcal{L}_{\mathrm{mask}}$ + $\mathcal{L}_{\mathrm{IF}}$ 8.85 35.01
−27% $\mathrm{CD}_{\mathrm{obj}}$ vs unguided