It looks like it was shot against a green screen with tracking markers. The markers were tracked, the background was built in 3D, and the result was then composited with the green-screen footage.
Regardless of how it was done in this particular video: if you want to pin something to an object in your shot, you need to track that object; if you want to embed, e.g., a 3D model into your shot, you need to solve the camera motion. See this help section on solving camera motion and that tutorial on tracking in perspective with Mocha:
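As a rough illustration of what "tracking" means when you pin something to an object, here is a toy sketch in plain Python. It is not how Blender or Mocha implement their trackers. It assumes grayscale frames stored as nested lists and follows one marker by brute-force template matching: it compares the patch around the marker in the first frame against every nearby position in each later frame and keeps the one with the smallest sum of squared differences (SSD). The pinned element then simply keeps a fixed offset from the tracked marker. All function names (`track_marker`, `pin`, etc.) are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # hypothetical grayscale frame, indexed as image[y][x]


def patch(img: Image, x: int, y: int, r: int) -> List[float]:
    """Flatten the (2r+1) x (2r+1) patch centred on (x, y)."""
    return [img[j][i] for j in range(y - r, y + r + 1) for i in range(x - r, x + r + 1)]


def ssd(a: List[float], b: List[float]) -> float:
    """Sum of squared differences between two equally sized patches."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def track_marker(frames: List[Image], start: Tuple[int, int],
                 r: int = 1, search: int = 2) -> List[Tuple[int, int]]:
    """Follow one marker through the frames by brute-force SSD template matching.

    The template is the patch around `start` in the first frame. In every later
    frame, each offset within +/- `search` pixels of the previous position is
    tested and the best-matching position is kept.
    """
    template = patch(frames[0], start[0], start[1], r)
    positions = [start]
    for img in frames[1:]:
        h, w = len(img), len(img[0])
        px, py = positions[-1]
        best = None  # (score, x, y) of the best candidate so far
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                x, y = px + dx, py + dy
                # Skip candidates whose patch would fall outside the frame.
                if r <= x < w - r and r <= y < h - r:
                    score = ssd(template, patch(img, x, y, r))
                    if best is None or score < best[0]:
                        best = (score, x, y)
        # If no candidate fits inside the frame, keep the previous position.
        positions.append((best[1], best[2]) if best else (px, py))
    return positions


def pin(positions: List[Tuple[int, int]], offset: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Place an overlay at a fixed 2D offset from the tracked marker in every frame."""
    ox, oy = offset
    return [(x + ox, y + oy) for x, y in positions]
```

For example, with three frames in which a single bright pixel moves from (3, 3) to (4, 3) to (5, 4), `track_marker` follows it and `pin(..., (2, -1))` moves the overlay along with it. This only handles 2D translation; embedding a 3D model additionally requires solving the camera motion from many such tracks, which is what the camera solver does.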