Why is this hard?

[Figure: input RGB photo]

Reconstruction from a single photo is highly ill-posed (e.g., depth ambiguity).
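Depth ambiguity can be shown with a standard pinhole camera. If every 3D point is scaled by the same factor about the camera center, the 2D image does not change. A single photo therefore cannot distinguish a small, near object from a large, far one. The sketch below uses Python with NumPy. The focal length, principal point, and point coordinates are made-up illustrative values, not parameters of CameraHMR or SAM3D.

```python
import numpy as np

def project(points_3d, f=1000.0, cx=320.0, cy=240.0):
    """Pinhole projection of Nx3 camera-frame points (metres) to Nx2 pixels.

    f, cx, cy are illustrative intrinsics (assumed, not from any real camera).
    """
    x, y, z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    u = f * x / z + cx
    v = f * y / z + cy
    return np.stack([u, v], axis=1)

# Hypothetical object: a few surface points roughly 2 m in front of the camera.
obj = np.array([[0.1, 0.2, 2.0],
                [-0.1, 0.0, 2.1],
                [0.0, -0.2, 1.9]])

# Scale the object about the camera center: twice as large AND twice as far.
k = 2.0
obj_far = k * obj

# Both configurations land on exactly the same pixels.
print(np.allclose(project(obj), project(obj_far)))  # True
```

Because x/z and y/z are invariant to scaling by k, the image constrains only the ratio of size to depth. Absolute depth and scale must come from other cues or priors.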

Expert HPS[1] & OPS[2]

Contact couples body & object.

With an HOI (human-object interaction) representation
[1] CameraHMR, Patel et al., 3DV'25
[2] SAM3D, SAM3D team, arXiv'25