Object Replication is the most abstract of the RTMFP Groups functions we've introduced.
the "objects" can be any AMF-serializable object, such as Number, String, ByteArray, Array, and so on. what they are and what they represent are entirely up to you (the developer). we recommend that for best results, whatever they are, each one should serialize up to somewhere between 16KB and 64KB.
each object has an index number (determined by you, the developer). for best results, the indices should be assigned contiguously. all RTMFP cares about is what index numbers each node has, and what index numbers each node wants. when a node indicates that it has an index number, it's promising that if a neighbor wants that index number, it can get it from the node that has it.
the theory is that an object for each assigned index number exists on at least one node in the group (that is, for each assigned index number, at least one node has it). for each assigned index number, each node starts out having or wanting it. a node that wants an index number that a neighbor has will request that index number of the having neighbor. the node that has it will receive that request as a NetStatusEvent, then send or deny the requested object. if the request is satisfied, the wanting node will get the object that was sent in a NetStatusEvent. if the wanting node decides the received object is acceptable, it keeps that object (in whatever way is appropriate -- say, in an Array), and then adds that object's index number to its own have set. eventually, every node will have all the index numbers (and therefore objects) and not want any.
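The have/want exchange described above can be modelled in a few lines. This is only a toy simulation in Python, not the RTMFP API: the `Node` class, its methods, and `replicate()` are made-up names, requests are plain function calls instead of NetStatusEvents, and every received object is treated as acceptable.

```python
class Node:
    """Toy model of one group member; not the RTMFP API."""

    def __init__(self, name, objects=None, want=()):
        self.name = name
        self.store = dict(objects or {})    # index -> object we hold
        self.have = set(self.store)         # indices we advertise as "have"
        self.want = set(want) - self.have   # indices we still want
        self.neighbors = []

    def handle_request(self, index):
        """A neighbor asked for an index: send the object, or deny with None."""
        return self.store.get(index) if index in self.have else None

    def accept(self, index, obj):
        """Application-level check; here any object that was sent is fine."""
        return obj is not None

    def step(self):
        """Request each wanted index from the first neighbor that has it."""
        for index in sorted(self.want):
            for peer in self.neighbors:
                if index in peer.have:
                    obj = peer.handle_request(index)
                    if self.accept(index, obj):
                        self.store[index] = obj
                        self.want.discard(index)
                        self.have.add(index)  # now we can serve it too
                    break


def replicate(nodes, max_rounds=100):
    """Run rounds until nobody wants anything; return the rounds used."""
    for rounds in range(max_rounds):
        if not any(n.want for n in nodes):
            return rounds
        for n in nodes:
            n.step()
    raise RuntimeError("did not converge")


# One seed holds indices 0..3; "b" is only connected to "a", not to the seed.
seed = Node("seed", objects={i: "object %d" % i for i in range(4)})
a = Node("a", want=range(4))
b = Node("b", want=range(4))
seed.neighbors, a.neighbors, b.neighbors = [a], [seed, b], [a]
rounds_used = replicate([seed, a, b])
```

Note that "b" never talks to the seed: it gets everything from "a" once "a" has added the indices to its own have set, which is the point of the scheme.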
Mike, thanks for the reply. It helps a great deal in understanding. Is there any additional documentation you can point me to? I see the ActionScript reference and I realize it is early beta, but without some additional information I am not sure how to use these powerful capabilities. Lastly, any chance you have an example you could share utilizing this aspect of RTMFP?
we don't currently have any examples or extra documentation. however, i'd like to highlight the APIs involved in object replication and recommend you read the descriptions of each property/method/event code.
property replicationStrategy (see also enumeration class NetGroupReplicationStrategy)
property info (see also class NetGroupInfo to see the stats for object replication)
addHaveObjects() (this is how you specify what you have)
addWantObjects() (this is how you specify what you want)
writeRequestedObject() (this is where you send an object you have to a neighbor that wants it)
NetGroup.Replication.Fetch.Result (this is where you get an object that you wanted)
NetGroup.Replication.Request (this is where you receive a request from a neighbor for an object you have)
the NetStatusEvent code table lists the properties of the info object that are of interest for the different cases listed above.
I am still working on this but find it tough sledding. The main issue I have is I am not sure how to create a contiguous index for the whole group when some peers may be working offline. If I try to make the index specific to an endpoint I get duplicates within the group (each endpoint has its own independent index), or if I use an endpoint identifier as part of the index I have a discovery problem, in that I am unsure how other endpoints discover the existence of objects (much less request them).
After reviewing the classes you suggested I am still unsure how to request the list of objects that are available to an endpoint, so that a determination of whether they are wanted or not can be made. I know how to express "this is what I have" and "this is what I want". What's missing is "what else is out there", and it is needed so I can figure out if I want it or not.
I am blown away at the simple elegance, flexibility, and power of the protocol and semantics for expressing p2p communications and group formation. I would love to see data synchronization added with the same level of abstraction and expressiveness.
there are a number of possible models for using Object Replication, plus i'm sure plenty we haven't even thought of yet. here are just a couple examples:
1) a file is broken into N 16KB blocks. each block is given an index number from 1 -> N. you create another object (a "manifest" object) that describes how many blocks there are, perhaps a hash of the contents of each block, perhaps a suggested name for the reassembled file, maybe a description, etc. you put this manifest object at index 0. at the beginning, there's one peer (the "seed" peer) that has the whole file (and therefore all its indices) and the manifest at index 0. every other peer starts out wanting index 0. once each peer gets/has index 0, it can tell the rest of the index range to want. as each block is received, perhaps you check its hash, and if it doesn't match the manifest, you discard and re-want it. eventually each peer will have all the blocks, and could reassemble and save the file.
1a) for added fun, you hash the manifest object and make the hash part of the group name (and check the hash when you receive index 0). assuming you use a strong hash (like SHA2-256, ActionScript implementation available in the Flex SDK: mx.utils.SHA256) it becomes cryptographically infeasible to distribute a corrupted file.
2) every peer starts out wanting *all* the indices (0 -> 9007199254740992). you have a central authority (FMS? a web service?) hand out sequential sequence numbers to individual peers whenever that peer wants to distribute a new object (for example: some kind of incremental update to a shared state?). the central authority ensures the index numbers are unique and sequential (additional benefit: everyone applying the updates to the shared state will see consistent results). the peer that has the update or whatever unwants that index number and then has it, and eventually all the peers will get it. the central authority/server had to hand out the index number, but the actual data distribution happens P2P.
case #1 probably works best with the "rarest first" replication strategy. case #2 probably works best with "lowest first".
for case #1a there, the step i left out that makes it really interesting is that all you have to communicate out-of-band to all the peers who want this file is the hash of the manifest object. the peer would use that (along with your application-specific information) to construct the GroupSpecifier and groupspec with which to join the just-for-this-file group, get (and verify) the actual manifest from a peer, and then get (and verify) all of the blocks.
I understand how to use index 0 as a pointer to an "application level control object", including facilitating the encryption of the corresponding payload content. What is not clear is how to make use of the required indexing scheme without a central semaphore-like mechanism handing out index numbers. This seems to almost negate the benefit of the pure p2p capabilities. I am struggling thru several failed attempts at implementing without a central authority, but none are very promising at this point. Any suggestions?
Object Replication was designed for use cases where there can be a reasonable assignment of contiguous indices, such as file replication or where a central authority assigns sequential indices.
there are decentralized schemes you could try. there's no *requirement* that the indices be contiguous or sequential. you could use a 53 bit hash (example: truncated SHA256) of the content to assign the indices. however, once there was any appreciable number of objects in the system, it would get horribly bogged down, as the low-level protocol and implementation are designed around the "reasonable number of contiguous regions" case.
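For what it's worth, the truncated-hash index mentioned above could look something like this (Python used for illustration; `content_index` is a made-up helper name). 53 bits is the limit because an ActionScript Number is a double, which represents integers exactly only up to 2^53. The caveat above still applies: indices produced this way are scattered rather than contiguous.

```python
import hashlib

INDEX_BITS = 53  # integers are exact in a double only up to 2**53


def content_index(content):
    """Derive an index in [0, 2**53) from the SHA-256 of the content."""
    digest = hashlib.sha256(content).digest()
    # The first 7 bytes give 56 bits; shift right to keep the top 53.
    return int.from_bytes(digest[:7], "big") >> (56 - INDEX_BITS)
```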
you could use Directed Routing to have the system elect a central index number arbiter in the mesh (perhaps "whoever is nearest to 0"). that wouldn't be very scalable, though, and it would be difficult to actually get right.
without knowing what you're actually trying to build, it sounds like posting (NetGroup.post()) might actually be a better fit -- if every participant may have something to say from time to time. posting is best-effort (unlike object replication which is fully reliable), although the effort is "quite good". the temporal semantics of posting may or may not be acceptable to what you're trying to build (postings are ephemeral and best effort rather than permanent and reliable).
if you need the reliable and permanent semantics of object replication in a situation where different nodes could suddenly have a new block of information for the "next index number", then the easiest solution is to just use a central arbiter/authority. it may not be as cool as a "pure p2p" solution, but it has the benefits of actually being possible and yielding the correct results. you still get a p2p win, since the overhead of obtaining the next index number is relatively small (perhaps a single HTTP query from the node generating the new information), but the information can then be shared in the p2p mesh without ever using server resources no matter how many members are in the mesh or how large the block of information is.
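A sketch of the central arbiter arrangement, again in Python and again only a model: `IndexArbiter` stands in for whatever server or web service hands out the numbers, and `Peer` is a made-up class that tracks want/have state the way case #2 describes (want everything, then un-want and have the index you publish).

```python
import itertools
import threading


class IndexArbiter:
    """Stand-in for the central authority: hands out unique,
    sequential index numbers and nothing else."""

    def __init__(self, first_index=0):
        self._counter = itertools.count(first_index)
        self._lock = threading.Lock()

    def next_index(self):
        with self._lock:
            return next(self._counter)


class Peer:
    """Toy peer for case #2: wants every index except the ones it publishes."""

    def __init__(self, arbiter):
        self.arbiter = arbiter
        self.store = {}          # index -> object
        self.have = set()
        self.not_wanted = set()  # everything outside this set is wanted

    def wants(self, index):
        return index not in self.not_wanted

    def publish(self, obj):
        """Ask the arbiter for the next index, then un-want and have it."""
        index = self.arbiter.next_index()  # the only central round trip
        self.not_wanted.add(index)
        self.store[index] = obj
        self.have.add(index)
        return index
```

The arbiter never sees the objects themselves, only the request for a number; the data would still travel through the mesh.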
That makes me feel better, as I was beginning to think I was missing something. I am actually just experimenting with building a distributed status reporting system. The feature I was working on was enabling the submitter to compose the status report while disconnected (such as on an airplane) and/or when connected to a local LAN (as in a conference or war-room setting) with no internet connectivity.
Given what I have been trying, it seems like what is needed is two additional methods on the NetGroup class.
- AddHaveItem - takes as a parameter a GUIDlist. (The GUID is defined by the application, so in my example the GUID may be a logical key like "lastname + yy/mm/dd" or some much more elaborate GUID creation scheme.)
- AddWantItem - takes as a parameter a GUIDlist or the string "ALL" indicating everything
with the response behavior being similar to the AddWantObjects and AddHaveObjects. This may provide a means of accessing the shared objects without depending on a central semaphore. I suspect you have the internal information necessary to map the GUID to the index already in the internals of the group.
A couple of more unrelated questions:
1) While debugging I tend to use the same GroupSpecifier over and over. What I notice is that unless I disconnect cleanly, the next time I connect the previous peer is still represented in the group. So this morning I started plugging away. I ran my application and saw there were 10 connections to my group. I know this is residual from last night because I am only running on my LAN (using netConnection.connect("rtmfp:") ) and only have one machine with one instance of the app running. Where is this information coming from? Where is it stored? When will it "timeout"? Is there any way to access these internal structures and information?
2) It seems it would be very useful to provide access to current group members. In my example app mentioned above I want to show the current user any other peers who are online and connected to the group. The only way I see to do this is to manually build the list by listening for "NetGroup.Neighbor.Connect", which I believe only gets me the endpoints I am directly connected with. In my case I want the user to see a list of everyone they can communicate with (regardless of whether connected directly or indirectly). More assumptions on my part, but this must be available in the internals since NetGroup.post() is able to deliver to all current users.
Again, thanks for your assistance on this new, potentially revolutionary functionality. While all the other vendors are betting big on the cloud, Adobe is innovating on the opposite end of the spectrum, where in my opinion most users are most naturally comfortable.
(pls look for private message)
re "addHaveItem/addWantItem" -- we'll think about it. it sounds exactly the same to me as "permanent posting".
if you don't close your NetConnection, but just forget about it in a running application, then it will continue living until it is garbage collected. perhaps you are seeing those. if you're actually getting NetGroup.Neighbor.Connect events, they can *only* come from actually alive neighbors that you are discovering on your LAN with ip multicast (i assume, since you're connecting to "rtmfp:").
individual group members don't have access to "all of the group members". each one knows about a subset, but the subset each one knows about is chosen to be maximally useful to generating the desired topology of direct neighbors which is also a (O(log n)) subset of all of the members, but with a few interesting properties. the most important property is the direct neighbors are chosen so that there's a path through the mesh between every pair of members (that is, it is fully connected, but not fully meshed). it has other properties which aid in message routing and low-latency message distribution (posting/multicast).
there is no scalable way for every node to see every other node in the mesh.
My question is about these four functions: addHave, removeHave, addWant, removeWant. Say I want an object by calling "addWant". When I get it, my status switches from "want" to "have". It seems very natural that the system should update my status automatically, and maybe call "addHave" on my behalf, so that other peers know that this object has a new owner.
I don't mind calling "removeWant" and "addHave" myself, I just want to confirm how the system actually works. I don't want my system to malfunction, but neither do I want to send in redundant messages.
say you want the object at index 1 by calling NetGroup.addWantObjects(1, 1). hopefully at some point you'll get the object from a peer via a NetGroup.Replication.Fetch.Result NetStatusEvent. the system will automatically un-want that index, but will not automatically addHaveObjects() it. the "meaning" column for NetGroup.Replication.Fetch.Result describes the "un-want" behavior.
the reason for not automatically addHaveObjects()ing it is that the object may not be valid according to your application (perhaps its checksum is bad), or perhaps you need to retrieve several objects before you can determine that they are all correct, at which point you would addHaveObjects() the several.
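The split described above (automatic un-want, manual have) can be sketched like this. It is a Python model only: `ReplicaState` and `on_fetch_result` are invented names standing in for your NetStatusEvent handler, and the CRC32 table is an assumed stand-in for whatever validity check your application uses.

```python
import zlib


class ReplicaState:
    """Toy model of one peer's want/have bookkeeping with validation."""

    def __init__(self, expected_crc):
        self.expected_crc = dict(expected_crc)  # index -> expected CRC32
        self.want = set(self.expected_crc)
        self.have = set()
        self.store = {}

    def on_fetch_result(self, index, obj):
        """Called when an object arrives for a wanted index."""
        # Modelled after the described behavior: delivery un-wants the index.
        self.want.discard(index)
        if index not in self.expected_crc:
            return False              # unknown index: ignore it
        if zlib.crc32(obj) == self.expected_crc[index]:
            self.store[index] = obj
            self.have.add(index)      # the application's explicit "add have"
            return True
        self.want.add(index)          # bad object: explicitly want it again
        return False
```

If the check fails, the index goes back into the want set and nothing is advertised, so a corrupted object is never offered to neighbors.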
ah thanks! this makes perfect sense. Just another question: if a careless
programmer creates some inconsistency, say calling addWantObjects(1, 10) and
addHaveObjects(5, 15) at the same time, then this will create some confusion
over objects (5, 10). How would the system handle this?
there's no confusion. you're allowed to want and have the same indices. "wanting" them means you want your neighbors to send you the objects if they "have" them. "having" them means you will satisfy the "wants" of your neighbors.
RTMFP doesn't impose any semantic meaning on what the objects at each index are or represent. for most straightforward applications of Object Replication, it doesn't make sense to both want and have the same indices, but there's nothing stopping you from doing that.
I see. Glad to get this straightened out. I guess with most straightforward
applications, this will cause wasteful downloads but no damages otherwise.
And a careful programmer should be able to avoid this.