I just completed my first CUDA-accelerated video filter using the Premiere Pro SDK for CS6 and CC7. If suitable CUDA hardware is available, it uses it; if not, it falls back to a multithreaded software implementation. Both work extremely well. However, the CUDA implementation could be a lot faster if the source and destination memory buffers were "pinned" memory. Since they are not, I must copy the source and destination memory to a pinned buffer, and then asynchronously copy that to CUDA device memory and back. The overhead of copying from the source/destination memory to pinned memory is significant. Without the staging copy to pinned memory, the CUDA path on my laptop is fast enough to process 130 fps for 1920 by 1080 HD video. With the extra copy, I only get about 45 fps.
If I implement the exact same filter in DirectShow, the source and destination buffer pools are always pinned, and the filter runs much faster than it does in Premiere Pro. I noticed that the new GPU filter example uses an AE interface, but it does allow access to pinned memory. However, I have not mastered the AE interface, and I am reluctant to give up 13 years of learning curve on the Premiere SDK.
Is there any good reason why the source and destination buffer pool in the Premiere Pro SDK is not pinned memory?
Hey Gene, pinned memory only applies to host memory. The CUDA memory that you are given through the GPU suite is device resident so pinning does not apply and you can perform GPU computation directly from it without transfer.
That is my whole point. To copy from PC host memory to CUDA device memory asynchronously, the host memory must be pinned. Hence, the source and destination memory should be pinned. Otherwise, I must copy the source memory to a pinned buffer I have allocated on the PC, copy that asynchronously to CUDA device memory, process it on the CUDA device, asynchronously copy the result back to the pinned buffer, and then copy it to the destination memory.
If you copy synchronously, it is slow as Christmas! Therefore, you must copy the memory asynchronously, or you should not use CUDA and GPU acceleration.
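For reference, the staged path I am describing looks roughly like this. This is a sketch, not my production code; `MyKernel`, `ProcessFrame`, and the buffer handling are placeholder names, and error checking is omitted:

```cuda
#include <cstring>
#include <cuda_runtime.h>

// Placeholder kernel (inverts pixels), standing in for the real filter.
__global__ void MyKernel(unsigned char *buf, size_t n)
{
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = 255 - buf[i];
}

// The staged path: two extra host-side memcpy()s are needed because the
// Premiere Pro source/destination buffers are pageable, not pinned.
void ProcessFrame(const unsigned char *src,  // pageable PPro source buffer
                  unsigned char *dst,        // pageable PPro dest buffer
                  size_t frameBytes, cudaStream_t stream)
{
    static unsigned char *hPinned = 0;
    static unsigned char *dBuf = 0;
    if (!hPinned) {
        cudaHostAlloc((void **)&hPinned, frameBytes, cudaHostAllocDefault);
        cudaMalloc((void **)&dBuf, frameBytes);
    }

    memcpy(hPinned, src, frameBytes);        // extra copy #1

    // cudaMemcpyAsync only overlaps with pinned host memory; with a
    // pageable host pointer it degrades to a synchronous copy.
    cudaMemcpyAsync(dBuf, hPinned, frameBytes, cudaMemcpyHostToDevice, stream);
    MyKernel<<<(unsigned)((frameBytes + 255) / 256), 256, 0, stream>>>(dBuf, frameBytes);
    cudaMemcpyAsync(hPinned, dBuf, frameBytes, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    memcpy(dst, hPinned, frameBytes);        // extra copy #2
}
```

If the source and destination buffers were pinned, both memcpy() calls and the staging buffer would disappear entirely.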
My question still stands. Why is the source and destination memory on the PC used by Premiere Pro not pinned memory?
Gene A. Grindstaff
Executive Manager, SG&I
T: 1.256.730.6983 M: 1.256.566.5376 F: 1.256.730.8046
19 Interpro Road
Madison, AL 35758 USA
LinkedIn<http://www.linkedin.com/groups?gid=127267&trk=myg_ugrp_ovr> | Facebook<http://www.facebook.com/intergraph> | Twitter<http://twitter.com/intergraph>
That is my whole point. The memory on the PC that Premiere Pro uses for source and destination buffers should be pinned memory. To get any decent performance copying memory from the PC to the CUDA device, you must copy asynchronously (i.e., multiple copies in flight at the same time). However, asynchronous copies can only occur from pinned memory. Thus, I am forced to copy the source/destination memory to pinned memory before I copy it to/from the CUDA device memory.
So my question still stands.
The source and destination memory are the buffers pointed to by the "theData" handle for any video filter. The source buffer contains the video raster data that is input to the filter, and the destination buffer is where the modified raster data is returned to Premiere Pro.
I do not want to use the AE GPU interface that is used in the new GPU filter. I have used the Premiere Pro SDK for 13 years. I have too much invested to learn a new interface, along with all of the hidden tricks I would need to know to make it work. Besides, the new interface is not available in previous versions of Premiere Pro, which I must support.
I noticed that the interface in the new AE GPU filter uses pinned memory for the input and output buffers. So my question remains: if the AE GPU interface uses pinned memory, why doesn't the standard Premiere Pro interface? It would be much more efficient and much faster.
I have gone through the example, and it looks interesting. I am not very familiar with the AE SDK, but the new interface appears to be much like it. It reminds me of what a friend of mine says: "I can write FORTRAN in any language." While it is a little like Premiere's SDK, it reeks of AE. I will eventually learn it, I guess, but I have a thousand questions.
A lot of our filters work on the whole frame but do not require that the video be progressive. I eventually figured out how to do that in Premiere's SDK after several years of pain and suffering. AE, of course, lets you address the video data as either a field or a frame. I also added the ability to handle areas of interest (regions), which Premiere does not support but AE does. Is any of that supported in the new interface?
The ProcAmp sample in the SDK demonstrates GPU-accelerated rendering in PPro. You can think of it as two effects folded into one: A CPU rendering path, and a GPU rendering path. The CPU rendering path and parameter handling is implemented using the AE API, whereas the GPU rendering path uses the new GPU extensions.
As Steve points out, the GPU extensions can be fitted to an AE effect, or a PPro-style effect. So if you want to retrofit all your PPro-style effects to add GPU rendering, that is supported. You'll want to look at SDK_ProcAmp_GPU.cpp to see how the GPU rendering is implemented.
It has been a while since we last exchanged messages.
I understand what you and Steve are saying. However, I am not sure why I would use the new GPU interface. Using the old Premiere SDK that I have worked with for 13 years, I implemented a filter with a multithreaded path fast enough to run in real time, and it also detects when a CUDA card is present and runs a different code path that fully uses the CUDA hardware. The CUDA version is fast enough to do 213 frames a second for 1920 by 1080 video. The filter is not trivial; it does some very complex calculations.
The only reason that I see to use the new interface is the following:
1. Premiere knows that I can run in real-time.
2. It uses an interface that has less overhead than the Premiere SDK.
3. I can get to some of the advanced AE API that does not exist in the Premiere SDK.
On the other hand, I have several reasons why I would not code according to the new GPU interface. They are as follows:
1. I have to learn a whole new interface, and it will take time to learn all of the hidden tricks.
2. I would need to completely rewrite from scratch all 20 of my filters, which will take a long time and cost a lot of money.
3. I get little benefit from converting to the new interface, since I already have the performance that I need.
So what did I miss? Is there some advantage that I missed because of my ignorance of the new interface?
The new interface is rather minimal; it is just an additional entry point for GPU rendering. Whatever API you choose to build on top of (either the AE effect API or the PPro filter API) will still be used for things like parameter definitions and UI.