This is just how it works. 3D text layers respond to changes in perspective like any other layer, so their apparent size can vary from shot to shot. Not sure what else you expect, even more so since AE's tracker can't be calibrated. Even if it could, you'd still have to do the math and calculate a new size for every shot. Things being as they are, you just have to muddle through and gauge everything manually.
I go to my next shot to do the exact same thing, I follow the exact same procedure, but this time the font is a different size, even though the character settings are the same. Even though both fonts are 30px they show as drastically different sizes.
Text in Ae has three parameters that control its apparent size on screen (not taking into account effects or text animators): the font size in pixels, which is set in the Character panel; Scale; and the Z position when the layer is 3D.
If your text is 30px, you still have two variables that can change. When you add your text to a target in the 3D Camera Tracker, Ae sets the Scale and Z position on its own, based on the target size you set in the tracker and the camera information. This makes it pretty unpredictable what size, scale, or position in Z space you will end up with, so the text looks different on screen in each shot. Every track is different because every tracking point in 3D space is different and every camera move is different. With that many variables, you can't predict that the text will be the same size.
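To make the interaction of those three parameters concrete, here is a rough sketch in Python. It is not an Ae API or expression; it assumes a simple pinhole-camera model in which a layer facing the camera, at a distance equal to the camera's Zoom value, appears at 100% of its size. The function names and the numbers are hypothetical and only illustrate why the same 30px font can look very different between shots, and what Scale would compensate for it.

```python
def apparent_size_px(font_size_px, scale_percent, distance_px, zoom_px):
    """Approximate on-screen size of a 3D text layer facing the camera.

    Assumption: simple pinhole model, where a layer at distance == zoom
    is shown at 100%. Effects and text animators are ignored.
    """
    if distance_px <= 0:
        raise ValueError("layer must be in front of the camera")
    return font_size_px * (scale_percent / 100.0) * (zoom_px / distance_px)


def scale_to_match(target_px, font_size_px, distance_px, zoom_px):
    """Scale (in percent) needed for the text to appear target_px tall."""
    if distance_px <= 0:
        raise ValueError("layer must be in front of the camera")
    return 100.0 * target_px * distance_px / (font_size_px * zoom_px)


# Hypothetical values: same 30px font, same 100% Scale, same camera Zoom,
# but the tracker placed the layer at different distances in each shot.
shot_a = apparent_size_px(30, 100, distance_px=2000, zoom_px=2000)
shot_b = apparent_size_px(30, 100, distance_px=500, zoom_px=2000)

# Scale that would make shot B look like shot A.
fix_b = scale_to_match(shot_a, 30, distance_px=500, zoom_px=2000)
```

In practice the tracker's camera and layer distances differ per shot and are not calibrated to real units, so numbers like these are a starting point for matching by eye rather than a guaranteed result.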
You can somewhat control the relative size by increasing or decreasing the target size.
I'm left to eyeball the right size and stroke for each shot,
Probably. There are ways to make it easier on yourself, though: you can work with snapshots and compare each shot against a reference.
Maybe this will help. It's all about the size of the target.