6 Replies Latest reply on Mar 4, 2013 2:19 PM by ECBowen

    Interested by performance issue ?  Read this !  If you can explain, you're a master Jedi !

    goodastruff Level 1

      This is the question we will try to answer...

      What is the hardware bottleneck of Adobe Premiere Pro CS6?

       

      I used PPBM5 as a benchmark testing template.

      All the data and logs were collected using performance counters.

       

      First of all, let me describe my computer...

       

      Operating System

      Microsoft Windows 8 Pro 64-bit

      CPU

      Intel Xeon E5 2687W @ 3.10GHz

      Sandy Bridge-EP/EX 32nm Technology

      RAM

      Corsair Dominator Platinum 64.0 GB DDR3

      Motherboard

      EVGA Corporation Classified SR-X

      Graphics

      PNY Nvidia Quadro 6000

      EVGA Nvidia GTX 680   // Yes, I created bench stats for both cards

      Hard Drives

      16.0GB Romex RAMDISK (RAID)

      556GB LSI MegaRAID 9260-8i SATA3 6Gb/s, 5 disks, with FastPath chip installed (RAID 0)

      I have other RAID arrays installed, but they are not relevant to this post...

      PSU

      Corsair 1000 Watts

       

       

      After many days of testing, I want to share my results with the community and comment on them.

       

      CPU Introduction

      I tested my CPU and pushed it to maximum speed to understand where the limit is and whether I can reach it, and I logged all the results precisely in a graph (see picture 1).

       

      1. Intro: I tested my Xeon E5-2687W (8 cores with Hyper-Threading, 16 threads) to find out whether programs can use all of it.  I used Prime95 to get the result.  // I know this seems ordinary, but you will understand soon...
      2. The result: Yes, I can get 100% of my CPU with 1 program using 20 threads in parallel.  The CPU gives everything it can!
      3. Comment: I plotted 3 I/O counters (CPU, disk, RAM) on the graph of my computer during the test...

      Benchmark CPU Usage.png

      (picture 1)

       

      Disk Introduction

      I tested my disk and pushed it to maximum speed to understand where the limit is, and I logged all the results precisely in a graph (see picture 2).

       

      1. Intro: I tested my 556GB RAID 0 (LSI MegaRAID 9260-8i SATA3 6Gb/s, 5 disks, with FastPath chip installed) to find out whether I can reach maximum disk usage (0% idle time)
      2. The result: As you can see in picture 2, yes, I can get the maximum out of my drive at ~1.2 Gb/sec steady read/write!
      3. Comment: I plotted 3 I/O counters (CPU, disk, RAM) on the graph of my computer during the test to see the impact of transferring many GB of data over ~10 sec...

      Benchmark Disk Usage.png

      (picture 2)

       

      Now I know my limits!  It's time to dig deeper into the subject!

       

      PPBM5 (H.264) Result

      I rendered the sequence (H.264) using Adobe Media Encoder.

       

      1. The result :
        1. My CPU is not used at 100%; it hovers around 50%
        2. My disk is totally idle!
        3. All processes are idle except Adobe Media Encoder
        4. The transfer rate looks like a wave (up and down), probably caused by the cycle (encode time... write... encode time... write...)  // It's OK, ~5Mb/sec transfer rate!
        5. CPU power management gives the CPU 100% of its clock during the encoding process (it's OK, the clock is stable during the process).
        6. RAM: more than enough!  39 GB of RAM free after the test!  // Excellent
        7. ~65 threads opened by Adobe Media Encoder (good, threads are a sign that the program is trying to use many cores!)
        8. GPU load on the card also looks like a wave (up and down): ~40% GPU usage during the encoding process.
        9. GPU RAM usage reaches 1.2 GB (no problem for the GTX 680, and no problem for the Quadro 6000 with its 6 GB of RAM!)
      2. Comment/Question: CPU is free (50%), disks are free (99%), GPU is free (60%), RAM is free (62%); my computer is not pushed to its limit during the encoding process.  Why?  Is there some time delay in the encoding process?
      3. Other: Quadro 6000 & GTX 680 give the same result!
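
      One way to read the idle figures above is as a quick saturation check. The sketch below is only an illustration: the percentages are the ones reported for the H.264 test, while the 95% threshold and the function name `find_saturated` are assumptions made for the example.

```python
# Free (idle) percentages reported above for the PPBM5 H.264 test.
free_percent = {"CPU": 50, "Disk": 99, "GPU": 60, "RAM": 62}

# Assumed threshold: a resource counts as saturated at >= 95% load.
SATURATION_THRESHOLD = 95


def find_saturated(free, threshold=SATURATION_THRESHOLD):
    """Return (saturated resource names, busiest resource name, its load %)."""
    load = {name: 100 - idle for name, idle in free.items()}
    saturated = [name for name, used in load.items() if used >= threshold]
    busiest = max(load, key=load.get)
    return saturated, busiest, load[busiest]


saturated, busiest, busiest_load = find_saturated(free_percent)
if saturated:
    print("Saturated:", ", ".join(saturated))
else:
    print(f"Nothing saturated; busiest is {busiest} at {busiest_load}% load")
```

      With these figures no resource crosses the threshold, which is the puzzle of this post. Fed the MPEG-DVD figures instead (GPU at maximum load), the same check would flag the GPU.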

      Benchmark PPBM5 Media Encoder H.264.png

      (picture 3)

       

      PPBM5 (Disk Test) Result (RAID LSI)

      I rendered the sequence (Disk Test) using Adobe Media Encoder on my RAID 0 LSI disk.

       

      1. The result :
        1. My CPU is not used at 100%
        2. My disk load goes up and down in waves, but stays far, far from the limit!
        3. All processes are idle except Adobe Media Encoder
        4. The transfer rate goes up and down in waves, probably caused by the cycle (buffering time... write... buffering time... write...)  // It's OK, ~375Mb/sec peak transfer rate!  Easy!
        5. CPU power management gives the CPU 100% of its clock during the encoding process (it's OK, the clock is stable during the process).
        6. RAM: more than enough!  40.5 GB of RAM free after the test!  // Excellent
        7. ~48 threads opened by Adobe Media Encoder (good, threads are a sign that the program is trying to use many cores!)
        8. GPU load on the card = 0 (this kind of encoding does not involve the GPU)
        9. GPU RAM usage is 400Mb (not used for encoding)
      2. Comment/Question: CPU is free (65%), disks are free (60%), GPU is free (100%), RAM is free (63%); my computer is not pushed to its limit during the encoding process.  Why?  Is there some time delay in the encoding process?

      Benchmark PPBM5 Media Encoder Disk Test.png

      (picture 4)

       

      PPBM5 (Disk Test) Result (Direct in RAMDrive)

      I rendered the same sequence (Disk Test) using Adobe Media Encoder directly in my RamDrive

      1. Comment/Question: Look at the transfer rate in picture 5.  It's exactly the same speed as with my RAID 0 LSI controller.  Impossible!  In the same picture, look at the transfer rate I can reach with the RAM drive (> 3.0 Gb/sec steady), and I don't go under 30% of disk usage.  CPU is idle (70%), disk is idle (100%), GPU is idle (100%) and RAM is free (63%).  // This kind of result leaves me REALLY confused.  It smells like a bug and a big problem with hardware and I/O usage in CS6!

      Benchmark PPBM5 Media Encoder Disk Test On RamDrive.jpg

      (picture 5)

       

      PPBM5 (MPEG-DVD) Result

      I rendered the sequence (MPEG-DVD) using Adobe Media Encoder.

       

      1. The result :
        1. My CPU is not used at 100%
        2. My disk is totally idle!
        3. All processes are idle except Adobe Media Encoder
        4. The transfer rate goes up and down in waves, probably caused by the cycle (encoding time... write... encoding time... write...)  // It's OK, ~2Mb/sec transfer rate!  A real joke!
        5. CPU power management gives the CPU 100% of its clock during the encoding process (it's OK, the clock is stable during the process).
        6. RAM: more than enough!  40 GB of RAM free after the test!  // Excellent
        7. ~80 threads opened by Adobe Media Encoder (a lot of threads, but that's OK in multi-threaded apps!)
        8. GPU load on the card = 100 (this uses the maximum of my GPU)
        9. GPU RAM usage reaches 1Gb
      2. Comment/Question: CPU is free (70%), disks are free (98%), GPU is loaded (MAX), RAM is free (63%); my computer is pushed to its limit during the encoding process for the GPU only.  So for this kind of encoding, the speed is limited by the slowest component (the video card's GPU).
      3. Other: The Quadro 6000 is slower than the GTX 680 for this kind of encoding (~20 s slower than the GTX).

      Benchmark PPBM5 Media Encoder MPEG-DVD.png

      (picture 6)

       

      Encoding single clip FULL HD AVCHD to H.264 Result (Premiere Pro CS6)

      You can see the result in the picture.

       

      1. Comment/Question: CPU is free (55%), disks are free (99%), GPU is free (90%), RAM is free (65%); my computer is not pushed to its limit during the encoding process.  Why?  Adobe Premiere seems to have some bug with thread management.  My hardware is idle!  I understand AVCHD can be very difficult to decode, but where is the time being lost?  My computer is willing, but the software is not!

      Benchmark Premiere Pro AVCHD in H.264.png

      (picture 7)

       

      Render composition using 3D Raytracer in After Effects CS6

      You can see the result in the picture.

       

      1. Comment: The GPU seems to be the bottleneck when using After Effects.  CPU is free (99%), disks are free (98%), memory is free (60%), and it depends on the settings and the type of project.
      2. Other: Quadro 6000 & GTX 680 give the same rendering time for the composition.

      Benchmark After Effects 3d Raytracer.png

      (picture 8)

       

       

      Conclusion

      There is nothing you can do (I think) with CS6 to get better performance at the moment.  The GTX 680 is the best consumer-grade card and the Quadro 6000 is the best professional card.  Both cards give really similar results (I will probably return my GTX 680 since I don't really get any better performance from it).  I did not use a Tesla card with my Quadro, but currently neither Premiere Pro nor After Effects uses multiple GPUs.  I tried to use both cards together (GTX & Quadro), but After Effects gives priority to the slower card (in this case, the GTX 680).

       

      Premiere Pro, I'm speechless!  Premiere Pro is not able to get the maximum performance out of my computer.  Not just 10% or 20% sits idle, but about 60% on average.  I'm a programmer; multi-threaded apps are difficult to manage, and I can sympathize with Adobe's programmers.  But if anybody has comments about this post, tricks, or any kind of solution, please reply.  It seems to be a bug...

       

      Thank you.

        • 1. Re: Interested by performance issue ?  Read this !  If you can explain, you're a master Jedi !
          Harm Millaard Level 7

          Patrick,

           

          I can't explain everything, but let me give you some background as I understand it.

           

          The first issue is that CS6 has a far less efficient internal buffering or caching system than CS5/5.5. That is why the MPEG encoding in CS6 is roughly 2-3 times slower than the same test with CS5. There is some 'under-the-hood' processing going on that causes this significant performance loss.

           

          The second issue is that AME does not handle regular memory and inter-process memory very well. I have described this here: Latest News

           

          As to your test results, there are some other noteworthy things to mention. 3D Ray tracing in AE is not very good in using all CUDA cores. In fact it is lousy, it only uses very few cores and the threading is pretty bad and does not use the video card's capabilities effectively. Whether that is a driver issue with nVidia or an Adobe issue, I don't know, but whichever way you turn it, the end result is disappointing.

           

          The overhead AME carries in our tests is something we are looking into and the next test will only use direct export and no longer the AME queue, to avoid some of the problems you saw. That entails other problems for us, since we lose the capability to check encoding logs, but a solution is in the works.

           

          You see very low GPU usage during the H.264 test, since there are only very few accelerated parts in the timeline, in contrast to the MPEG2-DVD test, where there is rescaling going on and that is CUDA accelerated. The disk I/O test suffers from the problems mentioned above and is the reason that my own Disk I/O results are only 33 seconds with the current test, but when I extend the duration of that timeline to 3 hours, the direct export method gives me 22 seconds, although the amount of data to be written, 37,092 MB has increased threefold. An effective write speed of 1,686 MB/s.
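
          The write-speed figure above can be checked with a few lines of arithmetic. This is only a sketch: the 37,092 MB, 22 s and 33 s values come from the paragraph above, while treating "threefold" as exactly 3x for the current test is an assumption.

```python
# Figures quoted above for the direct export of the extended timeline.
data_written_mb = 37_092
export_seconds = 22

write_speed = data_written_mb / export_seconds
print(f"Effective write speed: {write_speed:,.0f} MB/s")

# Assumption: "threefold" is exact, so the current test writes one third
# of that data in the 33 seconds quoted above.
current_test_mb = data_written_mb / 3
current_test_seconds = 33
current_speed = current_test_mb / current_test_seconds
print(f"Current test (AME queue): {current_speed:,.0f} MB/s")
```

          Under that assumption the current test works out to roughly 375 MB/s, against about 1,686 MB/s for the direct export.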

           

          There are a number of performance issues with CS6 that Adobe is aware of, but whether they can be solved and in what time, I haven't the faintest idea.

           

          Just my $ 0.02

          • 3. Re: Interested by performance issue ?  Read this !  If you can explain, you're a master Jedi !
            goodastruff Level 1

            Thanks, I would appreciate it if Adobe's programmers could comment on this post.  For After Effects, as you can see in the stat charts (see picture below), the GTX 680 GPU load is at 100% during rendering.  This tells me two things.  If the monitor tells the truth, then the card, with all its CUDA cores, is used at maximum capacity.  If the monitor (GPU-Z) doesn't report the correct load value, then I agree that the card may not be used at maximum capacity.

            Benchmark After Effects 3d Raytracer.png

             

            I would be interested in hearing from people who get better results for this kind of benchmark with the same test setup.  Has anybody tested with a Quadro + Tesla card?

             

            Thanks for your comments.

            • 4. Re: Interested by performance issue ?  Read this !  If you can explain, you're a master Jedi !
              ECBowen Most Valuable Participant

              I have stated many times that once the cores/threads meet or exceed the codec's requirement to decode/encode optimally, then GHz is far more important than threading. What you are running into is the diminishing returns when you exceed that requirement, based on the number of frames PP/AME is caching at one time for encoding. This has been lowered/optimized since CS5 to allow lower-end systems to handle Adobe far better than they used to when CS5 was first released. The downside is that the frame caching optimization does not scale with the resources you have available. So simply put, if you exceed what you require to encode optimally, then the rest of the resources will remain idle. This is also why we have seen some performance issues with certain codecs and higher-end systems. Adobe's player is not scaling up to use the resources required to play back the material in real time, and instead uses 10% and gives stuttered playback. I have seen this with the Dual Xeons and AVCHD. However, try loading a Red 4K+ timeline and then you will see the MPE engine really kick into gear, and the performance is pushed to the ceiling of the available resources immediately. As with all new technologies, some things just require time and testing to find the optimal performance. This is one of them and I expect it to be resolved at some point.
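
              A toy model may help picture this. It is not Adobe's actual scheduler: the function `expected_utilization`, the idea of one thread per cached frame, and the frame and thread counts are all assumptions chosen for illustration.

```python
def expected_utilization(frames_in_flight: int, hardware_threads: int) -> float:
    """Fraction of hardware threads kept busy if each cached frame
    occupies one thread at a time (a deliberately crude assumption)."""
    if frames_in_flight < 0 or hardware_threads <= 0:
        raise ValueError("frames must be >= 0 and threads must be > 0")
    busy_threads = min(frames_in_flight, hardware_threads)
    return busy_threads / hardware_threads


# Hypothetical cache size of 8 frames; thread counts are examples only.
for threads in (8, 16, 32):
    share = expected_utilization(frames_in_flight=8, hardware_threads=threads)
    print(f"{threads:>2} threads: {share:.0%} busy")
```

              In this model, adding threads beyond the number of frames in flight only lowers the busy share, which is the kind of idle headroom described in the first post.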


              Eric

              ADK

              • 5. Re: Interested by performance issue ?  Read this !  If you can explain, you're a master Jedi !
                goodastruff Level 1

                Thanks for the answer. 

                 

                It sounds logical to me that if codec constraints prevent efficient use of multi-threading (CPU/GPU + CUDA), a higher clock will help in this situation.  Now, I have a PNY Nvidia Quadro 6000 and a GeForce GTX 680 with completely different GPU clocks.  The GTX is far ahead of my Quadro 6000 in clock speed, and the price of my Quadro 6000 is far above that of my GTX.  Do you suggest I keep my Quadro or my GTX?

                 

                                    Quadro 6000       GTX 680
                GPU Clock           574 MHz           1059 MHz / 1124 MHz with boost
                Memory Clock        747 MHz           1552 MHz
                Memory Bandwidth    143.4 GB/s        198.7 GB/s
                Texture Fillrate    32.1 GTexel/s     135.6 GTexel/s
                CUDA Cores          448               1536
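
                To put the gap in numbers, the ratios between the two cards can be computed directly from the table above. This is just a sketch: the values are copied from the table, the dictionary layout is an assumption for the example, and the GTX 680 clock uses the base value rather than the boost clock.

```python
# Specifications as listed in the table above: (Quadro 6000, GTX 680).
# The GTX 680 GPU clock uses the base value, not the boost clock.
specs = {
    "GPU clock (MHz)": (574, 1059),
    "Memory clock (MHz)": (747, 1552),
    "Memory bandwidth (GB/s)": (143.4, 198.7),
    "Texture fillrate (GTexel/s)": (32.1, 135.6),
    "CUDA cores": (448, 1536),
}

ratios = {name: gtx / quadro for name, (quadro, gtx) in specs.items()}
for name, ratio in ratios.items():
    print(f"{name:<28} GTX 680 / Quadro 6000 = {ratio:.2f}x")
```

                Every ratio comes out above 1, yet the benchmark times earlier in the thread differ by only a few seconds, which suggests the raw specifications are not what limits these encodes.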

                 

                The Quadro's 6 GB of RAM isn't used in video production (all the facts/tests show that maximum memory usage is about ~1 GB of RAM under heavy encoding).

                 

                I'm really confused about what to do at the moment...  I know Adobe & Nvidia recommend Quadro for video production (maybe there is some marketing behind the scenes), but the facts show only a slight difference between the Quadro and the GTX, with the GTX winning by a few seconds in rendering.  I don't want to sell a $3,000 card that will be activated in CS6.5 or CS7.  I also understand that the Quadro supports ECC memory, but for now I'm more interested in performance and reducing my waiting time than in stability and a 100% error-free mode.  What are the pros and cons of that decision?

                 

                In summary, to get better performance at the moment, do we have to wait for a product upgrade?

                Thanks for your feedback; it's really appreciated, and I hope this will help people make a good choice for now.

                • 6. Re: Interested by performance issue ?  Read this !  If you can explain, you're a master Jedi !
                  ECBowen Most Valuable Participant

                  Right now, only the Quadro K5000 has the current 600 series GPUs. I would not even consider a Quadro card other than that until the Quadro cards are all updated. The only reason to get the Quadro right now is 10-bit color preview with Adobe. Adobe only supports 10-bit color output via OpenGL, and that requires the Quadro since the GeForce cards only output 10-bit color via DirectX. So the better card listed above would be the GTX 680, provided you do not require a 10-bit color workflow. If you do, then the consideration would be the Quadro K5000 or an I/O card such as Blackmagic/AJA/Matrox.

                   

                  Eric

                  ADK