13 Replies Latest reply: Jul 30, 2010 11:09 AM by Bernd Paradies

    Are AS3 timers make use of multi processor cores?

    bkamenov Community Member

      Can anyone tell me whether a Timer in AS3 forces Flash Player to create a new thread internally, and so makes use of another core on a multicore processor?

       

      To Bernd: I am asking because in onFrame I could then just blit a frame that was prepared in a timer handler. I mean to create a timer firing every 1 ms, call the emulator code there, and only show the generated pixels in the ENTER_FRAME handler. That way, in theory, the emulator would use a whole CPU core in the best case. This will most probably not reach the desired performance anyway, but it is still an improvement. Still, as I mentioned in my earlier posts, Adobe should think about speeding up the AVM. It is still generally 4-5 times slower than Java when using Alchemy. Man, there must be a way to speed up the AVM, right?

       

      For those interested in what I am implementing, look at:

      Sega emulated games in flash

       

      If moderators think that this link is in some way an advertisement and violates the rules of this forum, please feel free to remove it. I will not be offended at all.

        • 1. Re: Are AS3 timers make use of multi processor cores?
          davr64 Community Member

          GameLure looks completely illegal to me. Not only have you pirated the games, but you are trying to make a profit from it. (Feel free to prove me wrong and show me that you have an actual agreement with Sega, Activision, etc to post their games online, but somehow I highly doubt it)

           

          But to answer your technical question: no, the AVM is completely single-threaded. Timers, events, functions, etc. all run on a single thread. The most you can use from ActionScript is a single core. If you can find some way to use pixel shaders, those are in fact multithreaded, but the type of code you can place in them is pretty restricted.
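One way to check the single-thread claim yourself is the minimal sketch below. The class name and the 100 ms busy-wait are made up for illustration. A Timer handler deliberately burns time, and an ENTER_FRAME handler logs the gap between frames. If timers ran on their own thread, the gaps would stay near the nominal frame interval; on a single-threaded player they should grow by roughly the time spent in the timer handler.

```actionscript
package {
    import flash.display.Sprite;
    import flash.events.Event;
    import flash.events.TimerEvent;
    import flash.utils.Timer;
    import flash.utils.getTimer;

    // Hypothetical demo class; use it as the document class of a test SWF.
    public class SingleThreadDemo extends Sprite {
        private var timer:Timer = new Timer(1); // ask for a tick every 1 ms
        private var lastFrame:int = 0;

        public function SingleThreadDemo() {
            lastFrame = getTimer();
            timer.addEventListener(TimerEvent.TIMER, onTimer);
            timer.start();
            addEventListener(Event.ENTER_FRAME, onEnterFrame);
        }

        // Simulates heavy "emulator" work by busy-waiting for 100 ms.
        private function onTimer(event:TimerEvent):void {
            var start:int = getTimer();
            while (getTimer() - start < 100) {
                // spin
            }
        }

        // Logs how long it took to get from one frame to the next.
        private function onEnterFrame(event:Event):void {
            var now:int = getTimer();
            trace("ms since last frame:", now - lastFrame);
            lastFrame = now;
        }
    }
}
```

Removing the busy-wait from onTimer and comparing the traced gaps gives a baseline for the same SWF.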

          • 2. Re: Are AS3 timers make use of multi processor cores?
            bkamenov Community Member

            GameLure looks completely illegal to me. Not only have you pirated the games, but you are trying to make a profit from it.

            As far as I know, those games have long been abandonware. Moreover, there are many sites doing the same, hosted right in the US. I do not think they could exist there for longer than a few months if they were illegal.

             

            If I am wrong, Sega may contact me and I will remove the content if it really harms their profit and rights.

            • 3. Re: Are AS3 timers make use of multi processor cores?
              davr64 Community Member

              "Abandonware" has no legal meaning. Everything produced since videogames were invented are still covered by copyright (personally I think copyright terms are waaay too long, but that's another story). Sega still sells lots of things based on the Sonic games, including compilations of old games. So even if "Abandonware" was a real thing, it would be hard to argue, as compared to something taken from a company that no longer exists for example. Also these companies often don't waste time going after the smaller infringers, since there can be so many of them. Same issue with TV/music piracy, there's tons of sites out there, often operating for a long time, they spend their money on lawyers only going after the biggest players. So maybe you'll be fine, but don't kid yourself about what you're doing.

              • 4. Re: Are AS3 timers make use of multi processor cores?
                bkamenov Community Member

                You may be right...

                 

                Maybe I need to discuss this with Sega. But as I said, you can download or play any 8-bit game on thousands of US sites. If this were a problem for anyone, they surely would try to limit it somehow. I have never heard of any website offering Master System games being accused of being illegal, or even being gently asked to remove its content. But to clear up this question once and for all, I'll contact Sega.

                • 5. Re: Are AS3 timers make use of multi processor cores?
                  Max Lord 1234 Community Member

                  Ignoring the legality question, I do not think a Timer is a good idea. Since almost all the work is rendering, it makes sense to be synchronized with the frame rate.

                   

                  Perhaps if you have other crunching to do in the background, some kind of green-threading system would allow that to happen without killing the rendering event handlers. But that is the only case I can see for threading the work.

                  • 6. Re: Are AS3 timers make use of multi processor cores?
                    bkamenov Community Member

                    It is not a smart idea, but it would be if FP were multithreaded. Sync would be done with the audio; I have already implemented this type of sync in native code. Still, it was real fun playing with Alchemy. At first, I tried Adobe Flash CSsomething, but I was not able to do anything with this software: it was too complex to learn in a few hours. So Alchemy was the best shot. Although Alchemy is not about developing apps with Flash, I'd use it exactly for that. A very good tool; even MS does not have such a cool thing.

                    • 7. Re: Are AS3 timers make use of multi processor cores?
                      Bernd Paradies Adobe Employee

                      Hello Boris,

                       

                      Alchemy is slower than Java and C but much faster than "regular" ActionScript. For more details see this post:

                      http://forums.adobe.com/thread/662605?tstart=0

                       

                      In regards to increasing the frame rate and getting more calls to your onFrameEnter() I suspect that  it is not the AVM engine that is the bottleneck.

                      I am proposing the following experiment: implement a Sprite with onFrameEnter that measures fps but does nothing else. Then set the framerate to 2000, compile, and run your SWF. My guess is that you get an actual frame rate between 300 and 800 fps. That fps would roughly mark the technical limits of the AVM engine.
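A minimal sketch of such a probe, assuming nothing beyond the standard display and timing APIs. The class name FpsProbe is made up, and the requested frame rate is left to however you normally set it (compiler option or stage.frameRate). It counts ENTER_FRAME events and reports roughly once per second.

```actionscript
package {
    import flash.display.Sprite;
    import flash.events.Event;
    import flash.utils.getTimer;

    // Hypothetical probe: does nothing per frame except count it.
    public class FpsProbe extends Sprite {
        private var frames:uint = 0;
        private var windowStart:int = 0;

        public function FpsProbe() {
            windowStart = getTimer();
            addEventListener(Event.ENTER_FRAME, onEnterFrame);
        }

        private function onEnterFrame(event:Event):void {
            frames++;
            var elapsed:int = getTimer() - windowStart;
            if (elapsed >= 1000) {
                // frames per second over the window that just ended
                trace("measured fps:", (frames * 1000 / elapsed).toFixed(1));
                frames = 0;
                windowStart += elapsed;
            }
        }
    }
}
```

Comparing the number this prints with the number the real game reaches shows how much of the frame budget the emulator itself consumes.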

                       

                      If your game shows lower frame rates it is most likely that your games need to be optimized further. But I am sure there are also limits to optimizing your code. On the other hand: do you really need all those onFrameEnter calls? Another observation that I would like to share is that it is very important to reach 24 fps but higher frame rates won't increase the quality of your game. QuakeFlash actually limits the fps to 24. You can see that if you type in TIMEDEMO DEMO1 in the QuakeFlash console. I am getting 35 fps. There must be something in human perception that starts accepting moving pictures at around 24 fps as reality. At 15 fps you are looking *at* the game on the screen, at 24 fps you get sucked *into* the game. I believe regular movies also only have 24 fps.

                      The last time I checked your game showed between 20-24 fps. If that's the case then you are good in my opinion.

                       

                      Best wishes,

                       

                      - Bernd

                      • 8. Re: Are AS3 timers make use of multi processor cores?
                        bkamenov Community Member

                        Hello Bernd,

                         

                        the game is an emulated ROM image from a real machine. In order to make it work, the code must behave like the real machine on which this game normally runs. FPS does not really have the same meaning as it normally does. Depending on the game's display mode (PAL or NTSC), the emulator must process a given number X of cycles in 50 or 60 steps per second, respectively. Each step fills a matrix with pixel lines (that is, generates a frame) and creates 1/50 or 1/60 of a second of sound. If you leave it running at 24 fps, you get only 24/50 or 24/60 of the sound for each second, and what you hear is the choppy sound you have already heard. So, to have a properly working emulation, you must run all 50 or 60 steps every second. One optimization I made is to avoid generating the frame (the image part) on each step. Now the emulator creates an image on every second step, resulting in an image frame rate of 25 or 30 fps. But this gave only an extra 3 fps for the whole thing, because the code is relatively fast. Anyway, I'll wait for new versions of Alchemy and Flash Player. Maybe in a few years it will be fast enough. I have tried the native code on my Windows Mobile phone and there it runs smoothly, as it should. The generated SWF does not run under Linux for some reason, but I do not know why and I do not really care. Just for fun I have implemented a Mozilla plugin, and now I can emulate the Sega Mega Drive in Firefox at full speed, but this is only for fun and to learn more about Mozilla's plugin system. For purposes other than "fun", Flash would be the best option.
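The step scheme described above can be sketched as a small bookkeeping class. This is only an illustration: the class name StepPlan and its members are made up, and the real emulator does this inside its C code. It captures the two numbers that matter: how many audio samples each step owes, and which steps also render an image.

```actionscript
package {
    // Hypothetical helper: decides what each emulation step must produce.
    public class StepPlan {
        public var stepsPerSecond:uint;  // 50 for PAL, 60 for NTSC
        public var renderEvery:uint;     // 2 = draw the image on every second step
        private var stepIndex:uint = 0;

        public function StepPlan(isPal:Boolean, renderEvery:uint = 2) {
            this.stepsPerSecond = isPal ? 50 : 60;
            this.renderEvery = renderEvery;
        }

        // Stereo samples one step must generate at the given output rate,
        // e.g. 44100 / 60 = 735 for NTSC and 44100 / 50 = 882 for PAL.
        public function samplesPerStep(sampleRate:uint = 44100):uint {
            return uint(sampleRate / stepsPerSecond);
        }

        // Advances one step; returns true when this step should also
        // render video. Audio is owed on every step regardless.
        public function nextStepRenders():Boolean {
            var render:Boolean = (stepIndex % renderEvery) == 0;
            stepIndex++;
            return render;
        }
    }
}
```

With renderEvery set to 2, an NTSC game still runs 60 steps per second for audio while only 30 of them draw an image, which matches the 25 or 30 fps image rate mentioned above.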

                         

                        Boris

                        • 9. Re: Are AS3 timers make use of multi processor cores?
                          Bernd Paradies Adobe Employee

                          Hello Boris,

                           

                          thanks for taking the time to explain why your project needs 60 fps. If I understand you correctly, those 60 fps are necessary to maintain the full audio sample rate. You said your emulator collects sound samples at the frame rate, and the reduced sampling rate of 24/60 results in "choppy sound". Are there any other reasons why 60 fps are necessary? The video seems smooth.

                           

                          That "choppy sound" was exactly what I was hearing when you sent me the source code of your project. But did you notice that I "solved" (read: "hacked around") the choppy sound problem even at those bad sampling rates? First off, I am not arguing with you about whether you need 60fps, or not. You convinced me that you do need 60fps. I still want to help you solve your problem (it might take a while until you get a FlashPlayer that delivers the performance you need).

                           

                          But maybe it is a good time to step back for a moment and share some of the results of your and my performance improvements to your project first. (Please correct me if my numbers are incorrect, or if you disagree with my statements):

                           

                          1) Embedding the resources instead of using the URLLoader.

                          Your version uses URLLoader in order to load game resources. Embedding the resources instead does not increase the performance. But I find it more elegant and easier to use. Here is how I did it:
                          [Embed(source="RESOURCES.BIN", mimeType="application/octet-stream")]
                          private var EmbeddedBIN:Class;
                          
                          ...
                          
                          const rom : ByteArray = new EmbeddedBIN;
                          

                           

                          2) Sharing ByteArrays between C code and AS code.

                           

                          I noticed that your code copied a lot of bytes between video and audio memory buffers on the C side into a ByteArray that you needed in order to call AS functions. I suggested using a technique for sharing ByteArrays between C code and AS code, which I will explain in a separate post.

                          The results of this performance optimization were mildly disappointing: the frame rate only notched up by 1-2 fps.

                           

                           

                          3) Optimized switch/case for op table calls

                           

                          Your C code used a big function table that allows you to map op codes to functions. I wrote a script that converted that function table to a huge switch/case statement that is equivalent to your function table. This performance optimization was a winner. You got an improvement of 30% in performance. I believe the frame rate suddenly jumped to 25fps, which means that you roughly gained 6fps. I talked with Scott (Petersen, the inventor of Alchemy) and he said that function calls in general and function tables are expensive. This may be a weakness within the Alchemy glue code, or in ActionScript. You can work around that weakness by replacing function calls and function tables with switch/case statements.
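The original code here is C compiled through Alchemy. To keep the examples in this thread in one language, the sketch below shows only the shape of the transformation in ActionScript, using three made-up opcodes and a made-up class name. It illustrates the two dispatch styles and says nothing about the actual speed-up.

```actionscript
package {
    // Hypothetical three-opcode CPU, only to contrast the two dispatch styles.
    public class Dispatch {
        public var acc:int = 0;
        private var table:Array;

        public function Dispatch() {
            // "Function table": opcode value is the index of its handler.
            table = [opNop, opInc, opDec];
        }

        private function opNop():void {}
        private function opInc():void { acc++; }
        private function opDec():void { acc--; }

        // Before: one indirect call through the table per opcode.
        public function execViaTable(opcode:uint):void {
            table[opcode]();
        }

        // After: branch directly, with the handler bodies inlined.
        public function execViaSwitch(opcode:uint):void {
            switch (opcode) {
                case 0: break;          // nop
                case 1: acc++; break;   // inc
                case 2: acc--; break;   // dec
            }
        }
    }
}
```

Both methods leave acc in the same state for the same opcode stream; the switch version simply avoids the call through the table.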

                           

                           

                          4) Using inline assembler.

                           

                          I replaced the MemUser class with an inline assembler version  as I proposed in this post:

                          http://forums.adobe.com/thread/660099?tstart=0

                          The results were disappointing, there was no noticeable performance gain.

                           

                           

                          Now, let me return to my choppy sound hack I mentioned earlier. This is where we enter my "not so perfect world"...

                          In order to play custom sound you usually create a sound object and add an EventListener for SampleDataEvent.SAMPLE_DATA:

                           

                          _sound = new Sound();
                          _sound.addEventListener( SampleDataEvent.SAMPLE_DATA, sampleDataHandler );
                          

                           

                          The Flash Player then calls your sampleDataHandler function to retrieve audio samples. The frequency of those requests does not necessarily match the frequency at which onFrameEnter is called. Unfortunately your architecture only gets "tickled" by onFrameEnter, which is currently only being called at 25 fps. This becomes your bottleneck, because no matter how often the Flash Player asks for more samples, the amount will always be limited by the frame rate. In this architecture, if the frame rate is too low, you always end up with the Flash Player asking for more samples than you have.

                           

                          This is bad news. But can't we cheat a little bit and assume that the "sample holes" can be filled by using sample neighbors on the time line? In other words, can't we just stretch the samples? Well, this is what I came up with:

                           

                          private function sampleDataHandler(event:SampleDataEvent):void
                          {
                               if( audioBuffer.length > 0 )
                               {
                                    var L : Number;
                                    var R : Number;
                          
                                    //The sound channel is requesting more samples. If it ever runs out then a sound complete message will occur.               
                                    const audioBufferSize : uint = _swc.sega_audioBufferSize();
                                              
                                    /*     minSamples, see http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/events/SampleDataEvent.html
                                         Provide between 2048 and 8192 samples to the data property of the SampleDataEvent object. 
                                         For best performance, provide as many samples as possible. The fewer samples you provide, 
                                         the more likely it is that clicks and pops will occur during playback. This behavior can 
                                         differ on various platforms and can occur in various situations - for example, when 
                                         resizing the browser. You might write code that works on one platform when you provide 
                                         only 2048 samples, but that same code might not work as well when run on a different platform. 
                                         If you require the lowest latency possible, consider making the amount of data user-selectable.                         
                                    */
                                    const minSamples : uint = 2048; 
                                              
                                    /*     At the maximum sample rate of 44100 Hz, one 60 Hz step yields only 735 stereo samples:
                                         snd.buffer_size = (rate / vdp_rate) = 44100 / 60 = 735.
                                         floats = snd.buffer_size * channels = 735 * 2 = 1470.

                                         So we need to stretch the samples until we have at least 2048 stereo samples.
                                         Assuming sega_audioBufferSize() returns snd.buffer_size (735):
                                         stretch = Math.ceil(2048 / 735) = 3.
                                         snd.buffer_size * stretch = 735 * 3 = 2205 stereo samples (4410 floats).

                                         Bingo: 2205 > 2048 !
                                    */
                                    const stretch : uint = Math.ceil(minSamples / audioBufferSize);
                                              
                                    audioBuffer.position = 0;
                                    if( stretch == 1 )
                                    {
                                         event.data.writeBytes( audioBuffer );
                                    }
                                    else
                                    {
                                         for( var i : uint = 0; i < audioBufferSize; ++i )
                                         {
                                              L = audioBuffer.readFloat();
                                              R = audioBuffer.readFloat();
                                              for( var k : uint = 0; k < stretch; ++k )
                                              {
                                                   event.data.writeFloat(L);
                                                   event.data.writeFloat(R);
                                              }
                                         }
                                    }
                                    audioBuffer.position = 0;
                               }
                          }
                          

                           

                          After using that method the sound was not choppy anymore! Even though I did hear a few crackling bits here and there the sound quality improved significantly.

                           

                          Please consider this implementation as a workaround until Adobe delivers a FlashPlayer that is 3 times faster :-)

                           

                          Best wishes,

                           

                          - Bernd

                          • 10. Re: Are AS3 timers make use of multi processor cores?
                            bkamenov Community Member

                            Wooow, this was a huge post. Thank you for your time. Well, I also have something to add to your post. Your solution to the sound issue is good only if you are experiencing rare and very short sound pauses. Otherwise the oscillation will be significantly changed. In my case the sound was as if it came slowly from the "other side".

                             

                            My sound streamer is a port of the one I implemented for Windows using waveForm (the simple native stereo sound API on Windows). The idea behind it is very simple:

                             

                            I have two arrays in my class:

                             

                            var audioFree:Array = new Array();
                            var audioReady:Array = new Array();

                             

                            audioFree contains a predefined count of ByteArrays with a predefined length. For example 6 ByteArrays each allocated for 735*2*sizeof(float) bytes - which comes from: (44100 Hz / 60 fps) * 2 channels stereo * 4 which is float size.

                            audioReady contains the audio samples which the game/generator or whatever generates.

                             

                            I have another array var fakeSamples:ByteArray = new ByteArray(); which is later allocated with 2048*2*4 bytes and represents silence. 2048 is the minimum number of samples ActionScript needs in order not to stop the audio. fakeSamples is used to feed the audio with silence when not enough audio data is held in audioReady. This results in audio pauses when audio is not generated at the desired speed (which is the problem I have with one of my emulators); otherwise it works like a charm and does not lead to the buffer underrun that may happen in Bernd's code, which forces the user to restart the audio, and that is slow (at least I can notice it).

                             

                            Well, below is the code with comments to show how the whole thing works:

                             

                            private var channel:SoundChannel = null;
                            private var snd:Sound = new Sound();

                            private var fakeSamples:ByteArray = new ByteArray();
                            private var audioFree:Array = new Array();
                            private var audioReady:Array = new Array();
                            private var audioChunkSize:uint = 0;
                            private var audioBytesWritten:uint = 0;

                            //Any number you wish, but it must be picked so that AUDIO_CHUNKS*sizeof_each_chunk >=
                            //2*2048*2*4, which is merely twice the minimal audio size ActionScript requires on a single SAMPLE_DATA event
                            private var AUDIO_CHUNKS:uint = 6;

                            private function startAudio() : void
                            {
                                 //Change this to your needs: bytes per chunk = samples per frame * 2 channels * 4 bytes per float
                                 audioChunkSize = (44100 / stage.frameRate) * 2 * 4;

                                 //Create silence (a new ByteArray is zero-filled when its length is set)
                                 fakeSamples.length = 2048*2*4;

                                 //Start playback; Flash will now call feedAudio whenever it needs samples
                                 snd.addEventListener(SampleDataEvent.SAMPLE_DATA, feedAudio);
                                 channel = snd.play();

                                 //prepare your free array
                                 for(var i:uint=0;i<AUDIO_CHUNKS;i++)
                                 {
                                      var aba:ByteArray = new ByteArray();
                                      aba.length = audioChunkSize;
                                      audioFree.push(aba);
                                 }
                            }

                            private function feedAudio(event:SampleDataEvent):void
                            {
                                  audioBytesWritten = 0; //Holds how many bytes we have written to Flash

                                  while(audioReady.length > 0)
                                  {
                                       //Get and remove a prepared audio chunk from the array's start
                                       var audioChunkPtr:ByteArray = audioReady.shift();

                                       //Update the bytes written
                                       audioBytesWritten += audioChunkSize;

                                       //Write the samples
                                       event.data.writeBytes(audioChunkPtr);

                                       //Our chunk has been written so it may go back in the free array list
                                       audioFree.push(audioChunkPtr);

                                       //Did Flash get enough samples?
                                       if(audioBytesWritten > 2048*2*4)
                                           break; //Good
                                  }

                                  //Did we write less than 2048 samples?
                                  if(audioBytesWritten < 2048*2*4)
                                      event.data.writeBytes(fakeSamples, audioBytesWritten); //Ok, then fill the missing gap with silence
                            }

                            //Your game function or whatever
                            private function onFrame(event:Event):void
                            {
                                 if(audioFree.length > 0)
                                 {
                                       var audioChunk:ByteArray = audioFree.shift(); //Get and remove a free chunk from the free array's start

                                       fill_chunk_with_audio_samples(audioChunk); //It is your stuff here

                                       audioReady.push(audioChunk); //Push it in the ready list

                                       ....
                                 }

                                 if(!channel)
                                      startAudio();
                            }

                             

                            Well that's it about audio.

                             

                            Indeed, I have inspected how compiled ActionScript is stored and executed. Basically, it stores arrays of classes and function descriptions. To call a function it uses the CALLSTATIC opcode and gives the function's index in the array of functions. Well, here is where the problem with pointers to functions comes in. A function address as we know it in C does not correspond to the function index. To emulate this, Alchemy creates, for each C function, a class derived from an abstract class with one method, exec(). exec() in the derived class implements the function's code. Then an instance of each class is created. All these instances are stored in the array of objects of the compiled ActionScript. Each C function address is then an index into this array, and that is how the function is called. OK, I agree this explanation is too basic, because some calculation is involved in deciding which index should be called. But the point is that Alchemy could be optimized to call the functions directly without using a wrapper abstract class, because this is slooow.
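To make the description above concrete, here is a toy model of it. It is emphatically not Alchemy's real generated code: the class name, method names, and the use of plain objects instead of derived classes are all assumptions made for brevity. The point is only that a "function pointer" becomes an array index, and every call through it costs a lookup plus a dynamic exec() call.

```actionscript
package {
    // Hypothetical model of the scheme described above. Each "C function"
    // is wrapped in an object with exec(), and its "address" is simply the
    // index of that wrapper in an array.
    public class FunctionPointerModel {
        private var functions:Array = [];

        // Registers a wrapper and returns its index, which plays the role
        // of the C function address.
        public function register(wrapper:Object):uint {
            functions.push(wrapper);
            return functions.length - 1;
        }

        // Calling through a pointer: array lookup, then a dynamic exec() call.
        public function callIndirect(address:uint, arg:int):int {
            return functions[address].exec(arg);
        }
    }
}
```

For example, registering `{ exec: function(x:int):int { return x * 2; } }` and calling callIndirect with the returned address and 21 yields 42. A direct call or a switch/case skips both the lookup and the wrapper object.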

                             

                            Using a switch/case with 65535 cases is faster than calling each function from an array of function pointers of the same size! Surprisingly, I have discovered that using the huge switch/case in native code with Visual Studio gives almost the same speed as Alchemy. So, I think that if Alchemy revises and improves its calling conventions, it will be a great success regarding speed.

                             

                            Bernd, I already use all the optimizations you and Doom code offer and my app is already on steroids but I am still far away from desired speed. I'll wait for better times. Till then I'll finish my Mozilla/IE Sega plugin for Windows.

                             

                            Regards

                            Boris

                            • 11. Re: Are AS3 timers make use of multi processor cores?
                              Bernd Paradies Adobe Employee

                              Hello Boris,

                               

                              your points are well taken in regards to optimizing Alchemy's function call performance!

                              It seems like we have run out of ideas on how to further increase the performance of your games. Future FlashPlayer versions will hopefully increase the performance to levels that will make playing your games more enjoyable.

                               

                              I don't know about you but I learned a few new things along the way.

                               

                              Take care,

                               

                              - Bernd

                              • 12. Re: Are AS3 timers make use of multi processor cores?
                                bkamenov Community Member

                                I have learned things as well, but most important was the great fun I had. Now I am extending the fun by writing a Mozilla plugin which mirrors the look and behaviour of the Flash version of the Mega Drive emu. It is now 99.9% ready and I'll probably post it today (I have small issues with sending cookies to the server from my plugin). Then, I'll make a plugin for IE as well. In the HTML I have something like "if(Windows && Firefox && SegaMegaDrive) then UseOwnPlugin else UseFlash"

                                 

                                Porting to UNIX is for me an impossible task. I do not understand UNIX at all.

                                • 13. Re: Are AS3 timers make use of multi processor cores?
                                  Bernd Paradies Adobe Employee

                                  Hello,

                                   

                                  on 7/15 I wrote that I would explain sharing ByteArrays between C code and AS code in a separate thread.

                                  It took a while, but I just posted a recipe in this post:

                                   

                                  AS3 ByteArray -> C# array -> calculations -> AS3 ByteArray |  How To ?

                                  http://forums.adobe.com/thread/689742?tstart=0

                                   

                                  Best wishes,

                                   

                                  - Bernd