As I mentioned earlier, it's unfortunately just a concept, as I'm not a developer.
What I had in mind for creating tutorials is pretty similar to what the Tutorial Builder is at the moment.
I wanted the system just to record the user's input. The next step would be to recognize what that input actually does in the application. From this information, an automated description gets generated. Because all of this happens automatically, it has to be reviewed later.
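Since this is only a concept, here's a rough sketch (in Python, with entirely made-up names) of how that recording pipeline could look: each input event, together with whatever effect the app reports, becomes a draft step that's flagged for later review.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One recorded tutorial step (all names here are hypothetical)."""
    action: str          # raw input event, e.g. "click:new_layer_button"
    description: str     # auto-generated text, to be reviewed by a human
    reviewed: bool = False

class TutorialRecorder:
    """Records user input and turns each event into a draft step."""
    def __init__(self):
        self.steps = []

    def on_input(self, event, effect):
        # 'effect' stands in for whatever the application reports the
        # event did; the description is generated automatically and
        # starts out unreviewed.
        self.steps.append(Step(action=event,
                               description=f"Now {effect}.",
                               reviewed=False))

recorder = TutorialRecorder()
recorder.on_input("click:new_layer_button", "create a new layer")
recorder.on_input("drag:opacity_slider", "lower the layer opacity")
```

A real version would of course need hooks into the application's event system rather than strings, but the record-then-review flow is the same.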
Playback is meant to work much like Growl: it's just a window that's placed dynamically (near the place where the next action has to be performed) and shows what to do next. Since the system records the user's input during playback as well, it notices when the current step has been fulfilled, then moves the window and shows the next step.
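The playback idea could be sketched like this (again purely hypothetical names): the player holds the list of expected actions, the hint window shows the current one, and incoming input is compared against it to decide when to advance.

```python
class TutorialPlayer:
    """Growl-style playback sketch: show a hint for the next expected
    action and advance once the user performs it. In a real system the
    hint window would also be repositioned near the relevant control."""
    def __init__(self, steps):
        self.steps = steps   # expected actions, in recorded order
        self.index = 0

    @property
    def finished(self):
        return self.index >= len(self.steps)

    def current_hint(self):
        # What the floating window would display for the current step.
        return None if self.finished else f"Next: {self.steps[self.index]}"

    def on_user_input(self, event):
        # Input keeps being recorded during playback, so we notice when
        # the expected action happens and move on to the next step.
        if not self.finished and event == self.steps[self.index]:
            self.index += 1

player = TutorialPlayer(["click:new_layer_button", "drag:opacity_slider"])
player.on_user_input("click:new_layer_button")
```

Matching "the user did the right thing" is obviously the hard part in practice; exact string comparison is just a stand-in here.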
Entering a tutorial at a certain step would work similarly to actions in Photoshop: the tutorial is replayed from zero up to the point where the user wants to enter.
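That "replay from zero" idea is simple enough to sketch too (hypothetical `apply` callback standing in for actually performing an action in the app):

```python
def enter_at_step(steps, target_index, apply):
    # Replay every recorded action from the start up to (but not
    # including) the step the user wants to enter at, so the document
    # ends up in exactly the state that step expects.
    for action in steps[:target_index]:
        apply(action)

performed = []
enter_at_step(["open_document", "new_layer", "paint_stroke"], 2,
              performed.append)
# 'performed' now holds the first two actions; the tutorial would
# then resume normal playback at step index 2.
```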
About the APIs: no idea how that would work ^^ I'm a communication designer ;)
I hope this answers some of your questions :)
Thanks for your interest :)