Yes, shooting with a locked-off camera is the simplest way.
You can then run an algorithm over all the frames that removes anything moving from each frame, leaving empty space behind. Each empty space can then be filled with information from other frames in which the same area is not empty.
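As a rough illustration of that two-step idea, here is a minimal sketch in plain Python. It assumes grayscale frames given as nested lists of pixel values and uses the per-pixel temporal median as the "static" reference, which is one common way to decide what counts as moving but not necessarily the algorithm any particular tool uses. The function names and the `threshold` value are hypothetical choices for illustration.

```python
from statistics import median


def mask_moving(frames, threshold=30):
    """Step 1: blank out (set to None) pixels that look like moving objects.

    Assumption: a pixel is 'moving' in a frame if it differs from the
    per-pixel median across all frames by more than `threshold`.
    """
    h, w = len(frames[0]), len(frames[0][0])
    # Per-pixel median over time approximates the static background,
    # provided the background is visible in most of the frames.
    ref = [[median(f[y][x] for f in frames) for x in range(w)] for y in range(h)]
    return [
        [[None if abs(f[y][x] - ref[y][x]) > threshold else f[y][x]
          for x in range(w)]
         for y in range(h)]
        for f in frames
    ]


def fill_background(masked):
    """Step 2: fill each pixel from the frames where it was not blanked out.

    Returns None for pixels that were blanked in every frame; those
    would need manual infill.
    """
    h, w = len(masked[0]), len(masked[0][0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [f[y][x] for f in masked if f[y][x] is not None]
            row.append(round(sum(vals) / len(vals)) if vals else None)
        out.append(row)
    return out


# Three 1x3 "frames"; someone walks through the middle pixel in frame 2.
frames = [[[10, 20, 30]], [[10, 200, 30]], [[12, 20, 30]]]
print(fill_background(mask_moving(frames)))
```

The median works as a reference only while each area is unobstructed in more than half of the frames; something that lingers longer than that will be treated as background, which is the same problem described next for objects standing still.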
Of course, anything standing still in the frame that you don't want in the result will still turn up in it. You would have to remove these objects manually and fill the gaps with information from other sources, or simply invent some plausible detail to infill them.
Another, far more elaborate, way is to take a lot of photographs (and I mean a lot) from different angles and run a 3D reconstruction algorithm over them. This is called "photogrammetry". These algorithms discard inconsistencies between the photographs (such as people walking through the shot), leaving behind a 3D model of whatever was static (i.e. behaved as a rigid body) in the scene. It can take a long time to generate a high-quality 3D model from the photographs, but the benefit is that you can then design and render your own tracking shot. Work in this area is still ongoing, but archaeologists and forensic investigators are already using it, and new techniques keep appearing. So while the results may not currently be that great (although certainly not bad), shooting in anticipation of better algorithms becoming available won't hurt.
And you can always give the photographs to a good 3D modeller to recreate the scene.