Probably in After Effects or an equivalent compositing program.
This is my updated attempt at making this intro for my videos.
Anyone have any suggestions on how to make it 'better'?
It's too much. The viewer's attention can't "fix" on any one thing.
The reason it doesn't look like the example is that the person most likely used After Effects or a similar program, just like Jim mentioned earlier. After Effects lets you put a camera in your comp. That's why, when they "pan", everything moves together: the camera is what's moving, not the individual objects. In Premiere, with the Basic 3D effect, you can only move each object on its own.
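To show why a camera move looks different from moving objects one at a time, here is a rough sketch using a simple pinhole-camera model. It is not how After Effects actually renders anything; the focal length, layer names, and depths are made-up values for illustration. When the camera slides sideways, each layer shifts on screen by an amount that depends on how far away it is, so near layers move more than far ones (parallax), and it all happens from a single move.

```python
from dataclasses import dataclass

# Hypothetical focal length in pixels; a made-up value, not an After Effects setting.
FOCAL = 1000.0


@dataclass
class Layer:
    name: str
    x: float  # horizontal position in the scene
    z: float  # distance in front of the camera (must be > 0)


def project_x(layer_x: float, layer_z: float, cam_x: float, focal: float = FOCAL) -> float:
    """Simple pinhole projection: screen x of a point seen by a camera at cam_x looking down +z."""
    return focal * (layer_x - cam_x) / layer_z


def screen_shift(layer: Layer, cam_move: float) -> float:
    """How far a layer moves on screen when the camera slides sideways by cam_move."""
    before = project_x(layer.x, layer.z, 0.0)
    after = project_x(layer.x, layer.z, cam_move)
    return after - before


if __name__ == "__main__":
    layers = [
        Layer("title", 0.0, 500.0),          # close to the camera
        Layer("shape", 200.0, 1000.0),
        Layer("background", -300.0, 2000.0),  # far from the camera
    ]
    # One camera move shifts every layer, but closer layers shift further.
    for layer in layers:
        print(layer.name, screen_shift(layer, 100.0))
```

Running it prints a shift of -200.0 for "title", -100.0 for "shape" and -50.0 for "background". If you instead nest everything and keyframe the nest sideways, every layer moves by the same amount, so you lose that depth effect.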
If you want it to truly look like your example, you're going to have to use After Effects. Premiere isn't made for that kind of thing. Here is a site that will help you do something similar to what you want. This specific tutorial shows you how to use a camera in After Effects with the "Sure Target" preset, although you can definitely do it without Sure Target; the preset just automates things for you a bit.
The only way I've ever been able to mimic an After Effects camera in Premiere is to nest a bunch of the "floating items", then keyframe them out at exactly the same time you're keyframing the new items in. Honestly, at that point it's much easier to just use After Effects. In After Effects you can do exactly what your example did simply by using a camera layer.