Thanks, though from the slides and description (I haven't had a chance to check the video yet) it seems to be related to upgrades rather than deployments. Is that right? Is deployment a supported use case for crx2oak, or is it only for upgrades?
Martin Fowler's blue-green deployment post simply forgot a single item: what happens when blue is under constant change by its users while you prepare green?
This is the problem with AEM: your blue publish instances are under constant change by authoring users. And don't forget that you also have your single source of truth, the authoring instance. There you cannot apply this pattern at all, unless you accept a planned downtime for the duration of the deployment.
My conclusion: You cannot apply the classical blue-green approach.
I normally do deployments with a planned service downtime on authoring, but none on publish:
- initiate service downtime on author
- deploy author
- remove the 1st half of the publish instances from the load balancer, so that the 2nd half serves all requests
- deploy 1st half of the publish instances
- switch the load balancer, so the 1st half now serves all requests
- deploy 2nd half
- bring all back online
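To make the sequence concrete, here is a dry-run sketch of that rolling deployment as a script. It only echoes each step; the host names and the `lb-ctl` load-balancer commands are hypothetical placeholders for whatever your infrastructure actually provides:

```shell
#!/bin/sh
# Dry-run sketch of the rolling publish deployment described above.
# Host names and the lb-ctl commands are hypothetical placeholders.
set -eu

FIRST_HALF="publish1 publish2"
SECOND_HALF="publish3 publish4"

deploy() {
  for host in $1; do
    echo "deploy release to $host"   # e.g. trigger your CI/package install
  done
}

echo "begin planned downtime on author; deploy author"
echo "lb-ctl disable $FIRST_HALF"    # 2nd half now serves all requests
deploy "$FIRST_HALF"
echo "lb-ctl enable $FIRST_HALF"
echo "lb-ctl disable $SECOND_HALF"   # 1st half now serves all requests
deploy "$SECOND_HALF"
echo "lb-ctl enable $SECOND_HALF"    # everything back online
```

The point is simply that at every moment at least half of the publish farm is in rotation, so end users see no downtime.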
This is a modified version of the blue-green approach: I don't have a standby instance which just changes roles with the production instance. But I have enough redundancy in the frontend that I am able to perform the deployment without downtime.
What I meant by blue-green in this case was to have a 'Blue' and a 'Green' author instance, one being the live Production instance and the other acting as Pre-Prod. For example:
- Deploy new release to non-live servers (green), both author and publish.
- Enforce a content freeze on live author (blue).
- Run a synchronisation of content between live & non-live servers.
- Put the green servers live, and set blue to non-live.
- Lift the content freeze for authors.
In theory this setup is quite feasible, even with a changing author, if we can quickly sync the latest changes from Blue back to Green; a minimal content freeze could be tolerable, especially outside of normal business hours.
I'm still not clear whether this is a supported use case of the tooling, though?
The approach sounds good, but the problem is indeed step 3. Unless you know a method which can achieve this really quickly (2 minutes at most) in 99% of all cases, I doubt that this is doable.
Fast synchronization between AEM instances (especially if one instance is weeks behind) is hard; especially problematic is versioning, because you cannot create the version nodes directly via the JCR API (you have to use the versioning API for that).
So yes, in theory it's possible. But I haven't seen it implemented yet :-)
Yeah, that's the trouble alright
We could keep them closer than weeks apart by pulling content back from Production (e.g. by using nightly disk-level backups of Prod to do a restore over Pre-Prod), but I agree that the trouble is getting over that last hurdle!
kautuksahni, you've marked this answer as 'resolved', so could you clarify whether running crx2oak against a running instance is a supported use of the tool, or whether it can only be run against offline instances?
I still find these docs confusing, to be honest… I think you're referring to the section under the heading "Content repository migration and upgrade", where step #1 says "First, stop the instance if it is running."?
These seem to be explicitly steps for upgrading, though, rather than just content migration; e.g. step #6 says "Start AEM to bring up the instance for the inplace upgrade."
The child page in the left-hand nav, "Using the CRX2Oak Migration Tool", calls out three use cases:
The tool can be used for:
- Migrating from older CQ 5 versions to AEM 6
- Copying data between multiple Oak repositories
- Converting data between different Oak MicroKernel implementations.
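For reference, my understanding of a standalone invocation for that second use case is roughly the following. This is a dry-run sketch that only echoes the command; both repository paths are placeholders, and you should check `--help` on your version of the jar before relying on any of it:

```shell
#!/bin/sh
# Dry-run sketch of a standalone crx2oak invocation for copying one Oak
# repository into another. Both repository paths are hypothetical.
set -eu

SRC=/opt/aem-blue/crx-quickstart/repository
DST=/opt/aem-green/crx-quickstart/repository

# Standalone mode only touches the repository; any further AEM
# reconfiguration has to be done manually. With default settings only
# the Node Store is migrated and the old binary storage is re-used.
CMD="java -Xmx4096m -jar crx2oak.jar $SRC $DST"
echo "$CMD"
```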
I'm just looking at the second use case (copying data between multiple Oak repositories) rather than a full upgrade between versions.
Also, it sounds like these two use cases even use different modes within the tool; from further down the page:
CRX2Oak is called during AEM upgrades in a fashion in which the user can specify a predefined migration profile that automates the reconfiguration of persistence modes. This is called the quickstart mode.
It can also be run separately in case it requires more customization. However, note that in this mode changes are made only to the repository and any additional reconfiguration of AEM needs to be performed manually. This is called the standalone mode.
Another thing to note is that with the default settings in standalone mode, only the Node Store will be migrated and the new repository will re-use the old binary storage.
So it's the "standalone" mode that I'm interested in rather than the "quickstart" mode.
Finally on that page, under "Parameters: Migration options", it mentions:
- --early-shutdown: Shuts down the source JCR2 repository after nodes are copied and before the commit hooks are applied
This would seem to suggest that it is run against an online instance (or at least that the instance is brought online by the tool as part of the migration?).
Also, oak-upgrade (which I presume shares a lot of internals, as the imagery in the documentation is identical) mentions an at least partially online upgrade for blob storage.
Apologies if I didn't have enough detail in the first post and my question wasn't clear, but (as often seems to be the case with AEM's internal tools) the docs are a little incomplete and at times contradictory.
Looking into oak-upgrade more, I may just use this tool directly, as it does explicitly mention that it can be used for incremental upgrade:
If an existing repository is passed as the destination, then only a diff between source and destination will be migrated. It allows to migrate the content in a few iterations. For instance, following case is possible:
- migrate a large repository a week before go-live
- run the migration again every night (only the recent changes are copied)
- run the migration one final time before go-live
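That schedule would look something like the dry-run sketch below. It only echoes the command; the segmentstore paths are placeholders, and the key property (per the docs quoted above) is that re-running oak-upgrade against an existing destination copies only the diff:

```shell
#!/bin/sh
# Dry-run sketch of the incremental oak-upgrade schedule quoted above.
# Repository paths are hypothetical placeholders.
set -eu

SRC=/opt/aem-blue/crx-quickstart/repository/segmentstore
DST=/opt/aem-green/crx-quickstart/repository/segmentstore
SYNC="java -jar oak-upgrade.jar $SRC $DST"

echo "$SYNC"   # 1. bulk copy, a week before go-live
echo "$SYNC"   # 2. nightly re-run: only changes since the last run are copied
echo "$SYNC"   # 3. final run just before go-live
```

Because each re-run is a diff, the final pre-go-live run should be small, which is exactly what makes a short content freeze plausible.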
I'll experiment with it, but this, coupled with shared blob storage (e.g. S3), may provide a quick enough synchronisation during the deploy itself for what I'm after.