This is correct: there is no particular limit on repository size. By default, files larger than 4096 bytes are not saved in the Repository Storage (typically the TAR file) but as plain files on the file system by the Data Store (see the Data Store documentation). The Persistence Manager then stores only a content hash in the Repository Storage, so if the same file appears at multiple locations in the repository, it is saved only once on the file system. So really, the limiting factor is mainly the disk drive.
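The deduplication idea can be illustrated with a minimal sketch: binaries are stored once under their content hash, and the repository keeps only that hash. The `DataStore` class, the use of SHA-256, and the threshold constant are illustrative assumptions; the real CRX Data Store implementation differs in detail.

```python
import hashlib
import os
import tempfile

# Illustrative threshold: binaries smaller than this would stay in the
# Repository Storage rather than the Data Store (per the description above).
MIN_RECORD_LENGTH = 4096

class DataStore:
    """Hypothetical content-addressed store: one file per unique binary."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def add(self, data: bytes) -> str:
        """Store the binary keyed by its hash; identical content is written once."""
        key = hashlib.sha256(data).hexdigest()
        path = os.path.join(self.root, key)
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.write(data)
        return key

    def get(self, key: str) -> bytes:
        with open(os.path.join(self.root, key), "rb") as f:
            return f.read()

# Two repository locations referencing the same binary share one disk file.
store = DataStore(tempfile.mkdtemp())
k1 = store.add(b"x" * 10000)
k2 = store.add(b"x" * 10000)
assert k1 == k2                          # same content, same key
assert len(os.listdir(store.root)) == 1  # stored only once on disk
```

This is why duplicated binaries cost almost nothing on disk: only the hash reference is repeated in the repository.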
Here are a few additional things that might be useful when working with large data sets:
- The Persistence Manager works incrementally, which gives good performance, but after many content modifications some obsolete content remains stored. If this takes up too much disk space, you should consider TAR optimization and Data Store garbage collection.
- Making a package of some content duplicates that content into the package, so it roughly doubles the amount of disk space needed (and package files also end up in the Data Store).
- If you upload a lot of content to your instance, make sure your workflows are optimized to do only what is needed (basically, each workflow rendering an asset has to load that asset into memory). To reduce the data kept in your repository, you could also remove the original asset once its renditions have been generated.
- You should avoid content structures with more than a thousand nodes on the same level of the hierarchy (as you already know).
- If you have a lot of content that doesn't need to be searchable, configure your search indexing accordingly.
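To make the garbage collection point above concrete, here is a minimal mark-and-sweep sketch: the mark phase gathers every hash still referenced from the repository, and the sweep phase deletes Data Store files whose hash is unreferenced. The function name and the assumption that each Data Store file is named after its content hash are illustrative; the real CRX garbage collector works differently in detail.

```python
import os
import tempfile

def collect_garbage(datastore_root, referenced_hashes):
    """Sweep phase: delete every file whose name (content hash) is unreferenced."""
    removed = []
    for name in os.listdir(datastore_root):
        if name not in referenced_hashes:
            os.remove(os.path.join(datastore_root, name))
            removed.append(name)
    return removed

# Usage: three binaries on disk, but only two hashes ("aaa", "ccc") are still
# referenced anywhere in the repository after content modifications.
root = tempfile.mkdtemp()
for name in ("aaa", "bbb", "ccc"):
    with open(os.path.join(root, name), "w") as f:
        f.write("binary content")

removed = collect_garbage(root, {"aaa", "ccc"})
assert removed == ["bbb"]
assert sorted(os.listdir(root)) == ["aaa", "ccc"]
```

The expensive part in practice is the mark phase, which has to traverse the repository to find all referenced hashes; this is why Data Store garbage collection is a maintenance task rather than something that runs on every change.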
Finally, when working with very large data sets, it is always a good idea to set up a test environment with realistic content and test the different scenarios: feeding new data into the system, authoring, web access, replication, backup, and so on. If you hit any limits, we will work with you to identify and optimize the bottlenecks.