9 Replies Latest reply on Apr 23, 2010 7:08 AM by xxael

    Connection Inconsistencies

    RGBEFFECTS

      Hello,

       

      First, I want to say KUDOS to all you guys who have put time into making Stratus a reality for all of us web guys. I have come to love both the idea of Stratus & the functionality it brings to online applications.

       

      I've been working on a new video chat application & I have noticed quite a few inconsistencies in my application. I have a back-end that keeps up with the status of each user & it matches up the people in the open pool.

       

      There are times when I can have 10+ users testing out the chat & everything is working perfectly, and other times where nobody can connect to each other. Just to be clear, none of my test users have any problem joining Stratus and getting their unique ID, but at certain times nobody can connect to anyone else.

       

      I have since created a procedure that clears out users that cannot be connected to (in hopes of speeding up connections with good Stratus IDs) & requires them to join back into the chat. What I have noticed is a wave pattern that affects every user. When everything is going well, we can have 10+ users all connecting fine. All of a sudden, these users can't connect to each other and drop out of the pool like flies. Once they come back in, they get another life, but it's questionable how long that life lasts. The time span between good & bad is not consistent, but it gets consistently worse after 5pm (CST).

       

       

      I apologize for not getting to my questions quicker, but here they are:

       

      1. When a user can't connect to another user (connection times out) what could be the reasons?

       

      2. How much time should we allocate for the connection to be made between users?

       

      3. Is there another way to test if a user's Stratus ID is still valid (connectable)?

       

      4. Is it possible that Stratus is clearing out IDs? If so, how often should I renew someone's Stratus ID?

       

       

      Any other input, feedback or advice would be greatly appreciated.

      Thank you,

      RGB

        • 1. Re: Connection Inconsistencies
          Michael Thornburgh Adobe Employee

          if your connection to Stratus is still up, your peerID *should* still be reachable.  Stratus doesn't time out connections, and it shouldn't be necessary to "refresh" your connection.

           

          we are still experiencing trouble with the front-end network equipment at our data center, and it is possible, i suppose, that the behavior of the stateful firewall for longer-lived UDP sessions could be less than optimal.  that's a guess, though; i don't know that for certain.  our data center managers are still working toward taking that piece of equipment out of the network path for Stratus.

           

          i'll have to run some tests to try to replicate your problem.  can you give me a hint as to how long until i can expect trouble?  minutes, tens of minutes, hours, days?

           

          oh, here's something else to think about that just occurred to me: if you aren't closing your P2P NetStreams, it may be that they are sticking around, and you're running into the maxPeerConnections limit, which would keep a client from accepting new connections after it was exceeded.  i don't know offhand if simply losing all references to a connected and running P2P NetStream will cause it to be implicitly closed and garbage collected.  and even if having no references to a NetStream in your ActionScript does cause it to be closed and collected, that would only happen when GC is triggered, which might be very infrequent depending on the specific allocation behavior of your code.
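
The effect of dangling streams on a connection cap can be illustrated with a toy model. This is a minimal sketch in Python, not ActionScript and not the Flash Player implementation: `PeerSlots` and `simulate` are hypothetical names, and the only assumptions are the ones stated in the thread (a fixed cap on peer connections, with new connections refused once the cap is reached).

```python
class PeerSlots:
    """Toy model of a client with a fixed cap on simultaneous peer connections."""

    def __init__(self, max_peer_connections=8):
        self.max_peer_connections = max_peer_connections
        self.open_streams = set()

    def accept(self, peer_id):
        # Refuse the new peer once every slot is occupied.
        if len(self.open_streams) >= self.max_peer_connections:
            return False
        self.open_streams.add(peer_id)
        return True

    def close(self, peer_id):
        # Explicitly closing a stream frees its slot.
        self.open_streams.discard(peer_id)


def simulate(chats, close_after_chat, max_peer_connections=8):
    """Run `chats` one-at-a-time chats and count refused connections."""
    client = PeerSlots(max_peer_connections)
    rejected = 0
    for n in range(chats):
        peer_id = f"peer-{n}"
        if client.accept(peer_id):
            if close_after_chat:
                client.close(peer_id)
        else:
            rejected += 1
    return rejected


if __name__ == "__main__":
    print("leaky client rejected:", simulate(20, close_after_chat=False))
    print("tidy client rejected:", simulate(20, close_after_chat=True))
```

With a cap of 8 and 20 sequential chats, the client that never closes its streams refuses 12 connections, while the client that closes each stream refuses none. In the model, a user who works fine for the first few chats and then stops accepting connections is the signature of leaked streams.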

          • 2. Re: Connection Inconsistencies
            RGBEFFECTS Level 1

            Michael,

             

            Thanks for the solid feedback. I understand the growing pains Stratus may be having & look forward to a service that you guys are confident with.

             

            As far as when I experience the problem, the problem is intermittent, but I can say that it is much more obvious after 5:30pm (CST). It gets progressively worse later into the night (9:00-10:00pm CST). During the day, connections are much more reliable & the application is "almost" perfect (with the occasional connection time out).

             

            Your advice on making sure I'm closing my P2P NetStreams has been noted & I am currently going into my code to make sure this is done.

             

            Thanks again for all of your advice & I look forward to creating a much more solid chat application,

            RGB

            • 3. Re: Connection Inconsistencies
              Michael Thornburgh Adobe Employee

              as a data point, i left a connection open to Stratus all night.  this morning i was able to connect to that peerID using Stratus.

               

              the most likely culprit is dangling NetStreams using up your maxPeerConnections.  you might also take a look at the unconnectedPeerStreams to see if there are NetStreams hanging out there, and, as a test, try setting maxPeerConnections super high to see if that stops, or delays, your problem.  if setting maxPeerConnections super high stops or delays your problem, then the culprit is somewhere in your code.

              • 4. Re: Connection Inconsistencies
                RGBEFFECTS Level 1

                I went into the application & verified the peer streams. I made sure everything was being closed off after a stream & I verified it by tracing the stream count before creating new streams for the next communication. I left the maxPeerConnections at 8 since I was able to verify that I was cleaning up the streams.

                 

                It's not that any user's Stratus ID went bad. It's just that periodically Stratus doesn't connect two users. Originally, we were deleting users that another user could not connect to (timed out). Now, we are allowing these users to stick around & are noticing an interesting pattern.

                 

                If a user tries to connect to another user & it times out, the user that failed to connect is still able to connect to other users. And in certain situations, the two users that failed to connect are able to connect later on. We have also tried reconnecting users when they fail to connect. We had the connection retry up to 12 times in a row & every single attempt failed.

                 

                Outside of connections timing out, I am also getting periodic NetStream.Play.Failed issues. Initially I was throwing the user back into the pool to find someone else, but I'd like to find a better way if I can stop this from happening.

                 

                Short of adding traces to my whole chat application, I have added traces to every place I can think of. Is there a way to get a better description of "why" Stratus isn't connecting two users?

                 

                I appreciate your time, experience & advice,

                RGB

                • 5. Re: Connection Inconsistencies
                  Michael Thornburgh Adobe Employee

                  are all of your users in a lab environment with known networking configurations/limitations?  or are these all "real world" users out in the wild wild Internet?

                   

                  not all combinations of NATs are compatible with P2P communications.  and perhaps some users' networks or ISPs have NATs or firewalls with behavior that changes under load, which might explain the time-of-day sensitivity.  please see this posting for a detailed description of the kinds of P2P vs NAT problems that exist:

                   

                    http://forums.adobe.com/message/1064983#1064983

                   

                  note: i've had my test program running now for nearly 24 hours, and have tried connecting to it by peerID numerous times throughout the day, and Stratus is still connecting us.

                   

                  can you replicate your problem with simulated users in your lab?  especially when there is no NAT or firewall between the computers?  especially even when the simulated users are on the same computer?

                  • 6. Re: Connection Inconsistencies
                    RGBEFFECTS Level 1

                    I have done some more thorough tests using specific computers in the office. I created a controlled test to uncover the possible P2P issues that you discussed. I actually noticed some consistencies in the responses I was getting from each computer. Below are my findings:


                    --------------------------------------

                     

                    CPU 1 (MAC Laptop):
                    - Could request a connection with Self
                    - Could accept a request from Self

                    - Could request a connection with CPU 2
                    - Could accept a request from CPU 2

                    - Could Not request a connection with CPU 3 (bounce)
                    - Could accept a request from CPU 3

                    --------------------------------------

                    CPU 2 (PC Tower):
                    - Could request a connection with Self
                    - Could accept a request from Self

                    - Could request a connection with CPU 1
                    - Could accept a request from CPU 1

                    - Could Not request a connection with CPU 3 (bounce)
                    - Could accept a request from CPU 3

                    --------------------------------------

                    CPU 3 (PC Laptop):
                    - Could request a connection with Self
                    - Could accept a request from Self

                    - Could request a connection with CPU 2
                    - Could Not accept a request from CPU 2 (bounce)

                    - Could request a connection with CPU 1
                    - Could Not accept a request from CPU 1 (bounce)

                     


                    You might come to believe that no one could start a connection with CPU 3, but there were actually two unknown users who dropped in during testing & they were able to create a connection with CPU 3.

                     

                    In the end, this leads me to believe that each user needs to have their own “request bounced list” of users that they don’t try to connect to if previous attempts to connect failed. This should still leave the door open for a user on the list to request a connection with this user. (CPU 1 & CPU 2 could not request a conversation with CPU 3, but CPU 3 was able to start a conversation with both CPU 1 & CPU 2.)

                    Thoughts?
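
The “request bounced list” idea could be sketched roughly as follows. This is a minimal Python illustration, not code from the application in question: `BouncedList`, its method names, and the `retry_after_seconds` expiry are all hypothetical. The expiry is an added assumption, motivated by the earlier observation that two users who failed to connect were sometimes able to connect later on.

```python
class BouncedList:
    """Per-user record of peers that outgoing connection requests bounced off.

    Only outgoing requests are filtered. Incoming requests are never checked
    against this list, so a peer on it can still start a conversation.
    """

    def __init__(self, retry_after_seconds=300.0):
        # Assumption: entries expire so a peer is eventually retried.
        self.retry_after_seconds = retry_after_seconds
        self._bounced_at = {}  # peer_id -> time of last failed attempt

    def record_failure(self, peer_id, now):
        self._bounced_at[peer_id] = now

    def record_success(self, peer_id):
        # Any successful connection with the peer clears the entry.
        self._bounced_at.pop(peer_id, None)

    def may_initiate_to(self, peer_id, now):
        bounced = self._bounced_at.get(peer_id)
        if bounced is None:
            return True
        if now - bounced >= self.retry_after_seconds:
            del self._bounced_at[peer_id]
            return True
        return False

    def pick_candidate(self, pool, now):
        """Return the first peer in `pool` worth requesting, or None."""
        for peer_id in pool:
            if self.may_initiate_to(peer_id, now):
                return peer_id
        return None
```

For example, after `record_failure("cpu3", now=0)`, a call to `pick_candidate(["cpu3", "cpu2"], now=10)` skips `"cpu3"` and returns `"cpu2"`, while nothing stops CPU 3 from sending its own request. Timestamps are passed in explicitly to keep the sketch easy to test; a real client would use its own clock.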

                    • 7. Re: Connection Inconsistencies
                      Michael Thornburgh Adobe Employee

                      based on the information given (and no description of how your lab network is laid out), i believe the following is the most likely explanation:

                       

                        1) your lab network goes through at least one NAT to reach Stratus and the outside Internet

                       

                        2) CPU1 and CPU2 are on the same LAN segment, or at least their local interface addresses are mutually routable in your NAT domain

                       

                        3) CPU3 (the laptop) is on wireless

                       

                        4) the wireless base station does its own NAT

                       

                        5) the outermost NAT for deduction #1 does not do hairpinning

                       

                      double-NAT with no hairpinning is a known "some P2P connections will fail" scenario.

                       

                      CPU1 and CPU2 can't initiate to CPU3 because a) CPU3's local interface address is translated and not directly reachable by CPU1&2 and b) CPU3's observed address is the outside-#1-NAT address, as are CPU1&2's, so the UDP-hole-punching intro packets that Stratus sends are trying to create paths for traffic that would need to hairpin, but that's not supported by #1-NAT.

                       

                      CPU3 can initiate to CPU1&2 because their local interface addresses, while in a translation domain, are in the same translation domain as the wireless base station's wired LAN, and so are directly reachable.

                       

                      there are even more complex explanations possible if CPU3 is multi-homed (specifically, if it has wireless and wired connections).

                       

                      note: once CPU3 has connected to CPU1 or CPU2, for at least a few minutes it should be possible for the connected-to CPU to connect to CPU3 (assuming the NetConnections on both sides stay running).  the underlying RTMFP session between them stays up for a few minutes even when idle.
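
The reasoning above can be checked against the test results with a small reachability model. This is a simplified Python sketch, not a description of how RTMFP or Stratus decides anything: `Host`, `nat_chain`, and `can_initiate` are hypothetical names, the NAT labels are made up, and the model assumes hole punching always succeeds between hosts behind different outermost NATs, which is not true of every NAT combination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    nat_chain: tuple  # NAT labels between host and Internet, outermost first


def local_address_reachable(initiator, target):
    # The target's local address is routable for any host inside the same
    # translation domain, including hosts nested behind further NATs.
    chain = target.nat_chain
    return initiator.nat_chain[:len(chain)] == chain


def can_initiate(initiator, target, hairpinning_nats=frozenset()):
    if local_address_reachable(initiator, target):
        return True
    # Otherwise rely on hole punching between the observed (outside) addresses.
    outer_i = initiator.nat_chain[0] if initiator.nat_chain else None
    outer_t = target.nat_chain[0]  # non-empty, or the check above would pass
    if outer_i == outer_t:
        # Both hosts appear at the same outside address: traffic must hairpin.
        return outer_t in hairpinning_nats
    return True  # simplifying assumption: different NATs always punch through


cpu1 = Host("CPU1", ("office",))
cpu2 = Host("CPU2", ("office",))
cpu3 = Host("CPU3", ("office", "wifi"))     # laptop behind the wireless NAT
outsider = Host("outsider", ("home",))      # a user elsewhere on the Internet

if __name__ == "__main__":
    for a, b in [(cpu1, cpu2), (cpu1, cpu3), (cpu2, cpu3),
                 (cpu3, cpu1), (cpu3, cpu2), (outsider, cpu3)]:
        print(f"{a.name} -> {b.name}: {can_initiate(a, b)}")
```

Under these assumptions the model reproduces the reported matrix: CPU1 and CPU2 reach each other, neither can initiate to CPU3, CPU3 can initiate to both, and an outside user can still reach CPU3. Passing `hairpinning_nats={"office"}` makes the two failing directions succeed, which matches the claim that the missing hairpinning on the outermost NAT is the deciding factor.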

                      • 8. Re: Connection Inconsistencies
                        RGBEFFECTS Level 1

                        You are 100% accurate about the lab setup. CPU1 & CPU2 are on a main network while CPU3 (laptop) is on a secondary wireless network. Everything you said makes logical sense & I appreciate all the help.

                         

                        As far as my approach to solve this issue, what are your thoughts?

                        ------------------

                        Each user needs to have their own “request bounced list” of users that they don’t try to connect to if previous attempts to connect failed. This should still leave the door open for a user on the list to request a connection with this user. (CPU 1 & CPU 2 could not request a conversation with CPU 3, but CPU 3 was able to start a conversation with both CPU 1 & CPU 2.)

                        ------------------

                         

                        Do you have any alternative solutions to maximize user experience?

                         

                        Thanks for everything,

                        RGB

                        • 9. Re: Connection Inconsistencies
                          xxael Level 1

                          Hi there,

                           

                          Is there another solution for this problem? I need it very urgently. I have this problem too.

                           

                          Locally, everything works fine.

                          But live, there are a few users who sometimes don't get a connection. In other cases they call someone and never get an answer. I worked around it with a timeout, but this is not a final solution for me, because in some cases one timeout follows another and connecting takes too long.

                           

                          Please help me with this problem...

                          What is the cause of this problem? And what is the solution?

                           

                          Thanks a lot

                           

                          greetz

                           

                          xxael