I have a question about scalability.
Correct me if I'm wrong, but my understanding is that for every fragment request from a player to an edge (say nginx, i.e. not running the Apache module), the edge must pass the request to the origin, which must then look in the index (.f4x), extract the corresponding fragment data, and send it back to the edge (which can cache it), which in turn sends it back to the player.
I'm just wondering: if you have a CDN with 100 edges, and 100 customers hit each edge, and the requested fragments are not cached, does this mean 10,000 requests to the origin, which has to do this index parsing and fragment extraction for each one?
Because it seems to me that's what would happen, and I would not expect such an architecture to be scalable at all.
1. Is my understanding right?
2. Does it actually matter for scaling in practice?
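For concreteness, the edge-side caching described above can be sketched as nginx configuration. This is a minimal, illustrative sketch only; the hostname, cache path, and cache sizes are placeholders, not values from this thread:

```nginx
# Hypothetical edge config: cache fragment responses from the HDS origin
# so repeat requests never reach it. All names/paths are placeholders.
proxy_cache_path /var/cache/nginx/hds levels=1:2 keys_zone=hds:10m
                 max_size=10g inactive=24h;

server {
    listen 80;

    location / {
        proxy_pass http://origin.example.com;
        proxy_cache hds;
        proxy_cache_key $uri;
        proxy_cache_valid 200 24h;   # packaged fragments don't change
    }
}
```

With something like this in place, only the first request for a given fragment incurs the origin's index-parsing/extraction work; subsequent requests are served from the edge cache.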
Anyone interested in this, today I did some basic benchmarking, details here: http://rdkls.blogspot.com/2011/12/benchmarking-adobe-hds.html
The Apache module fell over at 500 concurrent requests, maybe fewer, and by 1000 concurrent requests availability was only 58% (plain nginx, i.e. no module, was still at 100%), which seems pretty bad and not well suited to scaling to me?
Ahhh benchmarks. So what exactly is going on here?
1. Firstly take note of how apache is configured:
MaxClients sets a limit on the total number of server processes, or simultaneously connected clients, that can run at one time. The main purpose of this directive is to keep a runaway Apache HTTP Server from crashing the operating system. For busy servers this value should be set to a high value. The server's default is set to 150 regardless of the MPM in use. However, it is not recommended that the value for MaxClients exceeds 256 when using the prefork MPM.
This means that if the response time is slow (because a lot of work is being performed), you probably won't see more than 150 concurrent requests before failure. The HDS module is doing a lot of work.
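For illustration, raising the prefork limits looks something like the following. The values are examples only, not recommendations from this thread; note that on the 2.2-era prefork MPM, ServerLimit must be raised alongside MaxClients or the higher MaxClients is silently capped:

```apache
# Example prefork MPM tuning (Apache 2.2-era directives).
# Values are illustrative; tune against your own hardware and workload.
<IfModule mpm_prefork_module>
    StartServers          8
    MinSpareServers       5
    MaxSpareServers      20
    ServerLimit         512
    MaxClients          512
    MaxRequestsPerChild 4000
</IfModule>
```

Keep in mind that with prefork, each client costs a full process, so MaxClients is ultimately bounded by RAM, especially with a module doing heavy per-request work.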
2. How big is the content being accessed? Smaller files on disk are easier to parse and index; larger files mean more indexes, more work, and more time. That said, 7000 different files (of any size) on disk will thrash the operating system's disk cache and cause a lot of swapping.
3. I hope the requests were made over loopback so that the network wasn't involved; otherwise the network was your bottleneck. (I have never seen nginx, or Apache for that matter, handle fewer than 20 req/sec.)
There are some other items, but those are generally the big ones you should re-evaluate.
I wonder how big companies using HDS scale it. Perhaps they just have monstrous Apache instances, dunno.
Well, it depends on what you are trying to achieve.
a. The edge-to-origin mapping can be federated by content (i.e. 500 pieces of content per origin instead of 7000, which gives better cache coherency/locality).
b. If performance really matters, then pre-package the content and prime the edges.
c. Defer the scaling to experts (i.e. big CDNs). This is what those Content Distribution Networks do; I'm guessing other "big companies" still don't have the infrastructure to scale well, and will lean on other companies' expertise in that area.
PS. I haven't checked out siege yet. Thanks for the info.
Thanks techeye, some valid points.
I think the poor performance across the board in txn/sec was due to the large number of concurrent requests (100, 500 and 1000 are quite high) and the size of each response (3.2 MB).
On the data used - for the without-HDS scenarios i.e. serving the fragment straight - I extracted that fragment via the f4f module first, and saved to the server's html dir.
For through-HDS scenarios, I requested e.g. Seg1-Frag1, which was the same fragment, but extracted on the fly by HDS. Granted, the f4f segment was 15 MB and contained 5 fragments - something I should change for a totally fair comparison.
The 7000 copies are just duplicates of these files, in separate subdirectories; this is simply to simulate 7000 potential unique streams/content.
1. Tuning Apache - yes, this is needed for accurate Apache vs. nginx comparisons, but mainly I'm after the overhead of the HDS module.
Still, I did adjust MaxClients, and found 256 to be around optimal.
2. I should have mentioned - the content was a 3.2 MB fragment.
3. Initial runs were made from my local box over a gigabit connection to a box physically connected to the same switch as me.
But again, network issues shouldn't matter much, since I'm really after a relative comparison of the HDS module vs. without it.
Later tests (see below) were done locally though, with similar trends.
a. Makes sense - you'd have more origins to spread the load.
b. By pre-packaging the content, you mean pre-extracting fragments? I don't think priming the edges is feasible for a large library (which is what I was aiming to simulate with the 7000 copies).
c. Yes, though CDN architecture is another topic - this is just about the performance of an origin running the HDS module.
For the later tests, I:
- set MaxClients to 256 (I tried 256, 512 and 1024; 256 was the best for this machine)
- reduced the number of concurrent requests to just 20
- ran the test 10 times per batch (instead of once)
- ran 5 batches, for each setup, averaging results
Summary of results:
- In all cases 100% availability (as you would hope)
- txn/sec for HDS Module, Straight Apache, and Straight Nginx were, in order: 14.6, 15.5, 18.7
My conclusion is that the module still adds overhead (as you would expect) - not huge (ca. 5.8% in this lightweight scenario, but ca. 17% at concurrency 100), but still considerable, and something I would think an architect would count as a risk at scale.
Dropping the module and running straight nginx (e.g. something you could do with your suggestion of pre-extraction, such as the extractor John Crosby from RealEyes referred to - but which is not available - on http://www.thekuroko.com/) would seem to give an improvement in the vicinity of 28% (at low concurrency 20) to 41% (at concurrency 100).
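For reference, the low-concurrency percentages above follow directly from the txn/sec figures quoted earlier (14.6 for the HDS module, 15.5 for straight Apache, 18.7 for straight nginx); a quick check:

```python
# Quick check of the concurrency-20 percentages quoted above, using the
# txn/sec figures from this thread.
hds, apache, nginx = 14.6, 15.5, 18.7

module_overhead = (apache - hds) / apache  # slowdown vs. Apache without the module
nginx_gain = (nginx - hds) / hds           # speedup of plain nginx vs. the module

print(f"module overhead:   {module_overhead:.1%}")   # ~5.8%
print(f"nginx improvement: {nginx_gain:.1%}")        # ~28.1%
```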
Hi Nick (or anyone else who may know the answer) -- do you know of any tool that we can use to pre-extract the fragments? We'd like to be able to use any HTTP server and not rely on the Adobe HTTP Origin module. Part of the reason is for scalability, but the main reason is so we can simplify by serving up purely static content (much like HLS does). I contacted John Crosby about a command line tool to pre-extract, but have not gotten a response. It would be great if Adobe would provide such a tool.
John Crosby from RealEyes makes reference to one he wrote a while back here:
however it seems to be unavailable at the moment.
Possibly I'll write one later, it should be pretty straightforward, will see (will post back here if I do).
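As a starting point, a pre-extractor could be sketched roughly as below. This is a hypothetical sketch, not a tested tool: it assumes the .f4f file is a flat sequence of ISO base-media-style boxes (4-byte big-endian size, 4-byte type) and that each fragment is a moof box followed by its mdat box. It ignores 64-bit and to-end box sizes, and a real tool would also consult the .f4x index.

```python
# Hypothetical .f4f fragment pre-extractor sketch (assumptions above).
import struct

def iter_boxes(data):
    """Yield (type, raw_bytes_including_header) for each top-level box."""
    offset = 0
    while offset + 8 <= len(data):
        size, btype = struct.unpack_from(">I4s", data, offset)
        if size < 8:  # size 0 (to end) / 1 (64-bit) not handled in this sketch
            break
        yield btype.decode("ascii"), data[offset:offset + size]
        offset += size

def extract_fragments(data):
    """Pair each moof with the mdat that follows it; return fragment blobs."""
    fragments, pending_moof = [], None
    for btype, raw in iter_boxes(data):
        if btype == "moof":
            pending_moof = raw
        elif btype == "mdat" and pending_moof is not None:
            fragments.append(pending_moof + raw)  # one serveable fragment
            pending_moof = None
    return fragments
```

Each entry in the returned list could then be written out under an HDS-style name (e.g. ending in Seg1-FragN) and served as plain static content.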
Follow-up - my benchmarking was seriously flawed, in that the URLs I fed to siege contained nonexistent fragments.
This is because I generated them as a contiguous range, but the packager doesn't necessarily use a contiguous sequence of numbers for fragments.
FYI The correct way to get the list is something like:
f4fpackager --inspect-fragments --input-file infile.f4f | grep -v discontinuity | grep "fragment = "
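To turn that listing into a siege URL file, something like the sketch below could work. Note the assumptions: the "fragment = N" line format is inferred from the grep above, the discontinuity filtering mirrors the grep -v, and the base URL and Seg1 prefix are placeholders for your own setup.

```python
# Hypothetical helper: parse `f4fpackager --inspect-fragments` output into
# HDS fragment URLs that actually exist, for use as a siege URL file.
# Line format and URL shape are assumptions, not verified against the tool.
import re

def fragment_urls(inspect_output, base):
    """Return one URL per non-discontinuity 'fragment = N' line."""
    urls = []
    for line in inspect_output.splitlines():
        if "discontinuity" in line:   # mirrors `grep -v discontinuity`
            continue
        m = re.search(r"fragment\s*=\s*(\d+)", line)
        if m:
            urls.append(f"{base}Seg1-Frag{m.group(1)}")
    return urls
```

Writing the result to a file (one URL per line) gives you something siege's -f option can consume, without requesting fragments that were never packaged.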