Create your artwork, animate its Z position or its scale, pre-compose it, then apply the Echo effect to the pre-comp and adjust the parameters. Making the layer 3D and animating its Z position will give you the best look.
I suggest typing "Echo effect" into the Search Help field and checking out the community resources for details on how to use the effect. It takes a bit of fiddling to get the look you want, and the adjustments for this kind of effect are very small.
Pre-composing isn't just a way of grouping items; it structures the AE render pipeline. A layer's effects are rendered before its own transforms, so Echo applied directly to a layer whose position or scale is animated only sees the untransformed, static image. Effects like Echo need the rendered frames because they accumulate pixel buffers from several points in time, and pre-composing your animation provides exactly that. It's basically the effect asking, "What does the rendered, flattened result look like on frame X? Can I have it, please?!" rather than requesting the layer's native transform values or the like.
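To make the "give me the rendered frame X" idea concrete, here is a minimal Python sketch of an Add-style echo. It is a simplified model, not Adobe's implementation: frames are assumed to be flattened grayscale rows of floats, and the names `echo_add`, `render_moving_dot` and `echo_frames` are hypothetical. Each echo k is weighted by `starting_intensity * decay**k`, following the idea that Decay is the ratio between successive echoes. In AE, Echo Time is given in seconds and negative values sample earlier frames; here it is simplified to a positive number of frames looking backward.

```python
from typing import Callable, List

# Hypothetical model: a "frame" is a flattened grayscale row, values in 0.0..1.0.
Frame = List[float]


def echo_add(render: Callable[[int], Frame], frame: int, echo_frames: int,
             num_echoes: int, starting_intensity: float, decay: float) -> Frame:
    """Simplified Add-style echo (an assumption, not AE's actual code).

    Only asks `render` for flattened frames, never for transform values,
    which is the role the pre-comp plays for the real Echo effect.
    """
    out = [0.0] * len(render(frame))
    for k in range(num_echoes + 1):
        src = frame - k * echo_frames  # look k echoes back in time
        if src < 0:
            continue  # no rendered frame exists before the start
        weight = starting_intensity * decay ** k
        for i, value in enumerate(render(src)):
            out[i] += weight * value
    # Add operator: clip accumulated values to the valid range.
    return [min(1.0, v) for v in out]


def render_moving_dot(frame: int) -> Frame:
    """Hypothetical 'pre-comp': a bright pixel moving one pixel per frame."""
    row = [0.0] * 5
    if 0 <= frame < len(row):
        row[frame] = 1.0
    return row


result = echo_add(render_moving_dot, frame=3, echo_frames=1,
                  num_echoes=2, starting_intensity=1.0, decay=0.5)
print(result)  # [0.0, 0.25, 0.5, 1.0, 0.0]
```

The dot leaves a fading trail behind it because `echo_add` combines several rendered frames. If the effect could only see the untransformed source (a single static image), every echo would land on the same pixels and no trail would appear.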