You will probably need some expressions. The two tracks simply don't know about each other, and in both cases your coordinates are based on the "neutral" initial footage.
Yes, I was thinking of that as a possibility, but I don't know what expression would work.
I may use a 3D tracker like SynthEyes instead.
You should stabilize the shot of the iPad so there's no movement. Then track your finger and add the objects. Then reintroduce the motion to the shot and the track. It is simple if you break it down and you have planned the shot well enough.
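The stabilize, add, reintroduce idea above can be sketched in plain JavaScript, outside After Effects. This is only a rough model under stated assumptions: the iPad's motion on a given frame is treated as a simple 2D move with translation, rotation, and uniform scale (no perspective), and the names `applyMotion`, `removeMotion`, and `placeIcon` are made up for illustration. They are not After Effects or Mocha functions.

```javascript
// Hypothetical per-frame motion of the iPad in the shot:
// m = { tx, ty, rotation (radians), scale }. Illustrative only, not an AE or Mocha API.

// Reintroduce the motion: map a point from the stabilized shot back into the moving shot.
function applyMotion(m, p) {
  const c = Math.cos(m.rotation);
  const s = Math.sin(m.rotation);
  return [
    m.scale * (c * p[0] - s * p[1]) + m.tx,
    m.scale * (s * p[0] + c * p[1]) + m.ty,
  ];
}

// Stabilize: map a point from the moving shot into the stabilized shot
// (the exact inverse of applyMotion).
function removeMotion(m, p) {
  const c = Math.cos(m.rotation);
  const s = Math.sin(m.rotation);
  const x = (p[0] - m.tx) / m.scale;
  const y = (p[1] - m.ty) / m.scale;
  return [c * x + s * y, -s * x + c * y];
}

// Place an icon at a fixed offset from the finger, measured on the stabilized iPad,
// then put the shot's motion back so the icon sits on the moving screen.
function placeIcon(m, fingerInShot, offsetOnScreen) {
  const fingerStable = removeMotion(m, fingerInShot);
  const iconStable = [
    fingerStable[0] + offsetOnScreen[0],
    fingerStable[1] + offsetOnScreen[1],
  ];
  return applyMotion(m, iconStable);
}
```

Calling `placeIcon(m, finger, [10, 0])` once per frame, with that frame's motion and finger position, would keep the icon 10 units to the side of the finger in the iPad's own space rather than in comp space.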
If you post the shot on YouTube I'll send you a solution in a day or so.
I didn't think of stabilizing it, but I'd probably need Mocha Pro for a good stabilizer. I actually got it to work using Mocha AE, though. I duplicated the Fill layer that contains the iPad motion track, including the corner pin and all the position, scale, and rotation data. I then masked off the area that I want to turn into a drag-and-drop item, next to where the finger touches the screen. After creating a null in AE with the finger tracking data from Mocha AE, I used an expression (thisComp.layer("Finger Null").transform.position) on my Fill layer, simply by using the pickwhip to reference the position of the Finger Null. It's not hard once you know what to do; I haven't used expressions much myself, which probably made this harder to figure out. I should point out that the new drag-and-drop item that now tracks the finger doesn't yet follow the scaling inherent in the perspective, but it does follow the rotation nicely and really creates the illusion that the icon is following the finger across the surface of the iPad screen. There may be a workaround that uses something other than the mask and doesn't require a 3D tracker like SynthEyes. I might look into this a little more.
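One rough way to approach the missing scale, without a 3D tracker, is to estimate how much larger or smaller the iPad screen appears compared with a reference frame and scale the icon by that amount. The sketch below is plain JavaScript, not an After Effects expression, and it is an approximation rather than a true perspective solution: it gives one uniform scale for the whole screen, not a scale that varies across it. The function names `quadArea` and `apparentScale` are hypothetical, and the corners are assumed to be the four corner-pin points listed in order around the quad (for example upper left, upper right, lower right, lower left).

```javascript
// Area of a quad from its corners, listed in order around the perimeter
// (shoelace formula). Each corner is [x, y] in comp pixels.
function quadArea(corners) {
  let sum = 0;
  for (let i = 0; i < corners.length; i++) {
    const [x1, y1] = corners[i];
    const [x2, y2] = corners[(i + 1) % corners.length];
    sum += x1 * y2 - x2 * y1;
  }
  return Math.abs(sum) / 2;
}

// Approximate uniform scale of the screen on the current frame relative to a
// reference frame: the square root of the area ratio. This is an assumption
// made for illustration, not how Mocha or After Effects computes scale.
function apparentScale(refCorners, currentCorners) {
  return Math.sqrt(quadArea(currentCorners) / quadArea(refCorners));
}
```

Multiplying the icon's scale by `apparentScale(...)` on each frame would make it grow and shrink with the screen as a whole, which may be close enough when the iPad tilts only slightly.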
Here's the final test comp: