2 Replies Latest reply on Aug 19, 2011 1:46 AM by c0d3r1c

    Documentation on Flex/Flash SPEEX Audio Data Packets

    bbspeterlee

      Hi all,

       

      I'm implementing a very simple audio-only RTMP server.  The commercial servers like FMS, Red5 or WowzaMedia are too complicated for me.

       

      I'm quite frustrated with Adobe's online documentation: I searched Adobe's site and googled a lot, but could not find any documentation about the Flex/Flash SPEEX audio data packets.

       

      I have my client code like this:

       

      // get the default mic
      var mic:Microphone = Microphone.getMicrophone();

      // best quality (picks up all sounds, no transmission interruptions)
      mic.setSilenceLevel(0);

      // Using SPEEX codec with quality of 5
      mic.codec = SoundCodec.SPEEX;
      mic.encodeQuality = 5; // Required bit rate: 16.8 kbits/s

      // Rate is automatically set to 16 kHz if SPEEX codec is set
      //mic.rate = 16;

      mic.framesPerPacket = 1;

      // Attach the mic to the NetStream
      ns.attachAudio(mic);

      ns.publish("SpeexAudioData", "record");

      Then on the server, I keep receiving audio packets with size of either 43 bytes or 11 bytes (no other sizes found yet).

      My questions are:

      1. Why do I get packets of only 43 bytes or 11 bytes (is this from the SPEEX encoding)?
      2. Is a 43-byte packet 1 header byte + 42 data bytes? (I think so.)
      3. Where is the metadata about the audio packets when they arrive at the server, such as framesPerPacket, sizeOfEachEncodedFrame and encodeQuality?
        (I understand that a SPEEX frame does not have to end on an octet boundary, and I know I could derive sizeOfEachEncodedFrame from mic.encodeQuality, but the 11-byte packets completely break my logic.)
      4. What is the 11-byte packet (1 header byte + 10 data bytes)? What do the 10 bytes contain?
      5. How should I process or convert the SPEEX data to raw audio so that my server-side app can use it? My current implementation:
        • I pick up all 43-byte packets (and drop all 11-byte packets);
        • Skip the first byte;
        • Decode the remaining 42 bytes using the Speex library.
      6. How should I convert the raw audio back to SPEEX audio data? (If there were no 11-byte packets I would know how, but with these unknown 11-byte packets I don't.)
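
The filtering described in question 5 could be sketched in Python roughly as follows. This is only an illustration of the approach in the post, not a recommended decoder: the function name and constants are hypothetical, the one-byte header is the poster's assumption, and the actual Speex decoding step is left out because the thread does not say which library binding is used.

```python
from typing import Iterable, Iterator

HEADER_LEN = 1        # assumed: one leading header byte per audio packet
FULL_FRAME_LEN = 42   # frame payload size observed at encodeQuality = 5


def full_speex_frames(packets: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the 42-byte payload of each 43-byte packet and skip all others."""
    for packet in packets:
        if len(packet) == HEADER_LEN + FULL_FRAME_LEN:
            # Drop the header byte; the remainder is what would go to the decoder.
            yield packet[HEADER_LEN:]
```

Note that dropping the 11-byte packets this way also drops 20 ms of the timeline for each one, so the decoded audio would come out shorter than the recording.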

       

      P.S.: A more complicated case: if I set mic.framesPerPacket = 2, the server receives packets of 85, 21 or 53 bytes, and I understand that:

      1. 85 = 1 + 42 + 42
      2. 21 = 1 + 10 + 10
      3. 53 = 1 + 10 + 42 = 1 + 42 + 10

      But again, the 10-byte frames (11 bytes with the header byte) totally confuse me.  So what on earth are these 10 bytes?
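
The size arithmetic for framesPerPacket = 2 can be checked with a small Python sketch. It assumes one header byte per packet and only the two frame sizes observed in this post (42 and 10 bytes); the function name is hypothetical.

```python
from itertools import product
from typing import List, Tuple

HEADER_LEN = 1           # assumed: one header byte per packet
FRAME_SIZES = (42, 10)   # frame payload sizes observed at encodeQuality = 5


def possible_layouts(packet_len: int, frames_per_packet: int) -> List[Tuple[int, ...]]:
    """Return every ordered combination of frame sizes that adds up to the packet."""
    payload_len = packet_len - HEADER_LEN
    return [
        combo
        for combo in product(FRAME_SIZES, repeat=frames_per_packet)
        if sum(combo) == payload_len
    ]
```

For a 53-byte packet this returns both (42, 10) and (10, 42), so the packet length alone does not say which frame comes first; that would have to be read from the frame data itself.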

       

      Thanks.

        • 1. Re: Documentation on Flex/Flash SPEEX Audio Data Packets
          bbspeterlee Level 1

          Hi,

           

          I guess I figured it out myself.

           

          In Adobe's documentation for the Microphone.codec property:

           

          Speex includes voice activity detection (VAD) and automatically reduces bandwidth when no voice is detected. When using the Speex codec, Adobe recommends that you set the silence level to 0. To set the silence level, use the Microphone.setSilenceLevel() method.

           

          That means the size of a frame can differ when VAD detects no voice (and bandwidth is reduced), even if framesPerPacket is set to 1 and encodeQuality is set to 5.  I tried decoding the 10-byte frame, and found that 79 bits (9 bytes and 7 bits) are used for this special frame.

           

          I also observed that this special frame is always 10 bytes, even when encodeQuality is set to 0 or 10.
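
The byte counts above follow from simple arithmetic, sketched here in Python. The only fact assumed beyond this thread is that Speex encodes one frame per 20 ms of audio; the function names are hypothetical.

```python
import math

FRAME_MS = 20  # Speex encodes one frame per 20 ms of audio


def frame_bytes_from_bits(bits: int) -> int:
    """Round a frame's bit count up to whole bytes."""
    return math.ceil(bits / 8)


def frame_bytes_from_kbps(kbps: float) -> int:
    """Frame size in bytes for a given bit rate: kbit/s * ms = bits per frame."""
    bits = round(kbps * FRAME_MS)
    return frame_bytes_from_bits(bits)
```

With these, the 79-bit frame comes out as frame_bytes_from_bits(79) = 10 bytes, and the 16.8 kbits/s quoted for encodeQuality = 5 gives 336 bits, i.e. the 42-byte frame.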

           

          Hope it helps.

          • 2. Re: Documentation on Flex/Flash SPEEX Audio Data Packets
            c0d3r1c

            Hello,

            The Speex codec embedded in Flash Player uses VAD (Voice Activity Detection) and CBR (Constant Bit-Rate), so when no voice activity is detected, the codec sends a few bytes at a constant rate to pad the stream and report that there is no voice. I guess the ten bytes are this padding report.

            Hoping this is helpful.

             

            P.S.: I am writing a utility in C that extracts Speex from OggS and wraps it in an FLV container, and I had forgotten the AudioTagHeader byte in the tag body. You reminded me of it when you wrote about the bytes sent to the server.
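
For reference, the AudioTagHeader byte mentioned above can be unpacked as in the following Python sketch. The bit layout (4-bit SoundFormat, 2-bit SoundRate, 1-bit SoundSize, 1-bit SoundType, with SoundFormat 11 meaning Speex) is taken from the FLV file format specification; the class and function names are hypothetical, and it is an assumption that the single "head byte" seen on the RTMP server is this same byte.

```python
from typing import NamedTuple

SOUND_FORMAT_SPEEX = 11  # SoundFormat value for Speex in the FLV specification


class AudioTagHeader(NamedTuple):
    sound_format: int  # upper 4 bits: codec identifier
    sound_rate: int    # next 2 bits: sampling rate index
    sound_size: int    # next 1 bit: 0 = 8-bit samples, 1 = 16-bit samples
    sound_type: int    # lowest bit: 0 = mono, 1 = stereo


def parse_audio_tag_header(first_byte: int) -> AudioTagHeader:
    """Split the first byte of an audio tag body into its four fields."""
    return AudioTagHeader(
        sound_format=(first_byte >> 4) & 0x0F,
        sound_rate=(first_byte >> 2) & 0x03,
        sound_size=(first_byte >> 1) & 0x01,
        sound_type=first_byte & 0x01,
    )
```

For example, a header byte of 0xB2 (a value chosen for illustration) parses to sound_format 11, sound_rate 0, sound_size 1 and sound_type 0. The header carries only these four fields, so values such as framesPerPacket or encodeQuality are not in it.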