Say I have a video from a random camera somewhere in the world. The camera view is fixed, and we know neither its intrinsic nor its extrinsic parameters.<p>Is there a way to automatically place 3D objects in the scene? That is, to automatically find a ground plane and somehow calibrate it using reference objects.<p>Is this scalable? Say I have thousands of different videos, so any manual step is not feasible. Is there a way of using the video itself for this calibration problem?
Depth estimation, scene reconstruction, 3D reconstruction, image to 3D model, CCTV scene reconstruction: all good phrases to start searching from.<p>There's lots of research, mostly based on deep vision models / transformers as the algorithm.
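To make the ground-plane part a bit more concrete, here is a minimal sketch of one common approach, not something taken from a specific paper: once a depth model gives you 3D points in camera coordinates, you can fit the dominant plane with RANSAC. The function name `fit_ground_plane`, the tolerance, and the iteration count are all assumptions chosen for illustration, and the sketch assumes the ground is the largest planar surface in view, which will not hold for every scene.

```python
import numpy as np

def fit_ground_plane(points, n_iters=200, tol=0.05, seed=0):
    """RANSAC plane fit on an (N, 3) array of 3D points.

    Returns (normal, d, inlier_mask) for the plane normal . p + d = 0.
    Hypothetical helper: tol is in the same units as the points.
    """
    rng = np.random.default_rng(seed)
    best_count, best_mask = -1, None
    for _ in range(n_iters):
        # Three random points define a candidate plane.
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        length = np.linalg.norm(normal)
        if length < 1e-9:  # nearly collinear sample, skip it
            continue
        normal = normal / length
        d = -normal.dot(p0)
        # Count points within tol of the candidate plane.
        mask = np.abs(points @ normal + d) < tol
        count = int(mask.sum())
        if count > best_count:
            best_count, best_mask = count, mask
    if best_mask is None:
        raise ValueError("could not sample a non-degenerate plane")
    # Refine: least-squares plane through the inliers via SVD.
    inliers = points[best_mask]
    centroid = inliers.mean(axis=0)
    _, _, vt = np.linalg.svd(inliers - centroid)
    normal = vt[-1]  # direction of least variance
    d = -normal.dot(centroid)
    return normal, d, best_mask
```

Note that monocular depth is usually only known up to scale, so the fitted plane is too. That is where reference objects of roughly known size (people, cars) would come in, which this sketch does not cover.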