PointOdyssey is a large-scale synthetic dataset, and data generation framework, for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to advance the state of the art by placing emphasis on long videos with naturalistic motion. Toward the goal of naturalism, we animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos. We create combinatorial diversity by randomizing character appearance, motion profiles, materials, lighting, 3D assets, and atmospheric effects. Our dataset currently includes 159 videos with an average length of 2,000 frames, and it has orders of magnitude more correspondence annotations than prior work. We show that existing methods can be trained from scratch on our dataset and then outperform their published variants. Finally, we introduce modifications to the PIPs point tracking method that greatly widen its temporal receptive field, improving its performance on PointOdyssey as well as on two real-world benchmarks. Our data and code are publicly available.
To make the data useful for multiple tasks, we provide multi-modal data: RGB, depth, instance segmentation, surface normals, camera intrinsics, camera extrinsics, and 2D and 3D point trajectories. For a subset of the scenes, we include multiple synchronized views.
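Because the release pairs 3D point trajectories with camera intrinsics and extrinsics, 2D trajectories can in principle be recovered from 3D ones with a standard pinhole projection. The sketch below illustrates that relationship. It assumes the extrinsics are a 4x4 (or 3x4) world-to-camera matrix `[R | t]` and the intrinsics are a 3x3 matrix `K`; these conventions, and the helper names `world_to_pixel` and `project_trajectory`, are hypothetical and should be checked against the dataset's actual file format before use.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Pixel = Tuple[float, float, float]  # (u, v, depth)


def world_to_pixel(
    point_world: Tuple[float, float, float],
    extrinsics: Matrix,  # ASSUMED world-to-camera transform [R | t], rows 0..2 used
    intrinsics: Matrix,  # ASSUMED 3x3 pinhole matrix K
) -> Optional[Pixel]:
    """Project one 3D world point to (u, v, depth); return None if behind the camera."""
    x, y, z = point_world
    # Rigid transform into camera coordinates: p_cam = R @ p_world + t
    cam = [
        extrinsics[r][0] * x + extrinsics[r][1] * y + extrinsics[r][2] * z + extrinsics[r][3]
        for r in range(3)
    ]
    depth = cam[2]
    if depth <= 0:
        return None  # point is at or behind the image plane
    # Apply K, then the perspective divide by depth
    u = sum(intrinsics[0][c] * cam[c] for c in range(3)) / depth
    v = sum(intrinsics[1][c] * cam[c] for c in range(3)) / depth
    return (u, v, depth)


def project_trajectory(
    traj_world: Sequence[Tuple[float, float, float]],
    extrinsics_per_frame: Sequence[Matrix],
    intrinsics: Matrix,
) -> List[Optional[Pixel]]:
    """Project a per-frame 3D trajectory using each frame's camera pose."""
    return [world_to_pixel(p, E, intrinsics) for p, E in zip(traj_world, extrinsics_per_frame)]
```

The returned depth is kept alongside `(u, v)` because, under the same assumptions, comparing it with the rendered depth map at that pixel is one way to flag occluded points; this sketch does not perform that check.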
We randomly generate physically realistic and semantically plausible scenes by sampling human and animal subjects, motion trajectories for the subjects and the camera, 3D physical assets, materials, environment maps for outdoor scenes, manually created environments for indoor scenes, and lighting and atmospheric effects. From these scenes we render videos, each paired with ground truth for the modalities listed above.
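The combinatorial diversity described above comes from sampling each scene factor independently. The following is a minimal sketch of that idea, not the project's generation code: the asset pools, the `SceneConfig` fields, and the numeric ranges are all hypothetical placeholders. Seeding a `random.Random` instance per scene makes each sampled configuration reproducible.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

# HYPOTHETICAL asset pools; the real dataset's asset lists are not shown here.
POOLS: Dict[str, List[str]] = {
    "subject": ["human_a", "human_b", "dog"],
    "material": ["wood", "metal", "fabric"],
    "environment": ["outdoor_hdri_1", "outdoor_hdri_2", "indoor_room"],
    "atmosphere": ["none", "fog", "smoke"],
}


@dataclass(frozen=True)
class SceneConfig:
    subject: str
    material: str
    environment: str
    atmosphere: str
    light_intensity: float  # hypothetical continuous factor
    num_assets: int  # hypothetical count of scattered 3D assets


def sample_scene(rng: random.Random) -> SceneConfig:
    """Draw every scene factor independently from its pool or range."""
    return SceneConfig(
        subject=rng.choice(POOLS["subject"]),
        material=rng.choice(POOLS["material"]),
        environment=rng.choice(POOLS["environment"]),
        atmosphere=rng.choice(POOLS["atmosphere"]),
        light_intensity=rng.uniform(0.5, 2.0),
        num_assets=rng.randint(5, 20),  # inclusive on both ends
    )


def num_discrete_combinations() -> int:
    """Product of pool sizes: how diversity grows multiplicatively with each factor."""
    total = 1
    for options in POOLS.values():
        total *= len(options)
    return total
```

Even with three options per factor, the four discrete factors alone yield 3^4 = 81 combinations, and each additional independent factor multiplies that count.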
We modify PIPs by greatly widening its 8-frame temporal window and incorporating a template-update mechanism. Experimental results show that our method achieves higher tracking accuracy than all existing methods, both on the PointOdyssey test set and on real-world benchmarks.