9 Replies Latest reply on May 22, 2013 9:09 PM by Jörg Hoh

    Indexes in CQ5.4 and reading data from CRX

    bhssss

      Hi Jorg,

       

      We have a problem somewhat related to the one mentioned above. We suspect that our problem lies in the indexes and the way CQ5.4 uses them to read data from the repository.

       

      1) Does CQ allow reading different branches of a tree in parallel?

       

      2) Our application reads different branches of the tree using multiple threads (102), and we observe that this takes 30% longer than reading the whole tree with a single thread.

      3) When we profile the application, it indicates that our threads run mutually exclusively. Heap usage is around 40% of total memory, and CPU utilization averages around 17%, oscillating between 10% and 40% and never exceeding 40%. From the application's perspective, we simply kick off all threads using an executor service and continue processing as each thread returns.

      4) We will run this application between 15 and 20 times a day. We observe that the first run of the day consistently takes 50% longer than subsequent runs. We can reproduce this by restarting the whole CQ application.

       

      5) Now the problem gets more interesting: we changed the settings in repository.xml to keep the index in memory. For this we need to restart our application twice: once without the index in memory (at midnight) so that all index merging completes, and then again in the morning (8 am) with the index in memory. The whole CQ application takes 30 minutes to start without the index in memory and 7 minutes with it. But with the index in memory, our application takes almost twice as long to complete (4.5 hours) as without it (2.5 hours).

       

      Note: all the details above come from testing without the XIV SSD cache, which we have only just introduced.

       

      6) We have newly introduced a 6 TB XIV SSD cache from IBM, which sits on top of our SAN disks. This is a persistent cache, and 80-90% of all requests are served from it. With the index in memory, our application takes 4.5 hours to complete; without the index in memory, it takes 1.3 hours.
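A minimal Java sketch of the threading pattern described in items 2 and 3 (one task per branch, submitted to an executor service, with results processed as each task returns) could look like the block below. `TreeNode` and `ParallelBranchReader` are hypothetical names, and `TreeNode` is an in-memory stand-in rather than the JCR API, so this only illustrates the structure of the code and says nothing about how CRX behaves under concurrent reads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical in-memory stand-in for a repository node (not javax.jcr.Node).
final class TreeNode {
    final String name;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) { this.name = name; }

    TreeNode add(TreeNode child) {
        children.add(child);
        return this;
    }
}

final class ParallelBranchReader {
    // Counts a node and all of its descendants.
    static long countNodes(TreeNode node) {
        long count = 1;
        for (TreeNode child : node.children) {
            count += countNodes(child);
        }
        return count;
    }

    // Baseline: read one branch after another on the calling thread.
    static long readSequentially(List<TreeNode> branches) {
        long total = 0;
        for (TreeNode branch : branches) {
            total += countNodes(branch);
        }
        return total;
    }

    // One thread per branch; results are consumed in completion order.
    static long readOneThreadPerBranch(List<TreeNode> branches) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, branches.size()));
        try {
            CompletionService<Long> completion = new ExecutorCompletionService<>(pool);
            for (TreeNode branch : branches) {
                completion.submit(() -> countNodes(branch));
            }
            long total = 0;
            for (int i = 0; i < branches.size(); i++) {
                total += completion.take().get(); // blocks until the next task finishes
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading branches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a branch read failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Timing `readSequentially` against `readOneThreadPerBranch` on real repository data would be one way to reproduce the comparison from item 2; the sketch itself cannot show the 30% slowdown, since an in-memory tree has none of CRX's shared caches or disk I/O.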

       

      We are now out of ideas and suspect that the way CQ5.4 manages indexes is the problem.

       

      Any help will be greatly appreciated.

       

      Thanks,

      Harry.

        • 1. Re: Indexes in CQ5.4 and reading data from CRX
          bhssss

          Please provide links to any documentation on how CQ uses index files; that would help greatly.

          • 2. Re: Indexes in CQ5.4 and reading data from CRX
            bhssss

            Our testing indicates that reading different branches of a tree using different threads is slower than reading one branch after another.

            • 3. Re: Indexes in CQ5.4 and reading data from CRX
              Sham HC

              Hi Harry,

               

              I am not sure of the overall context here. Please file a Daycare ticket and attach the following to it:

                

              *   Profiling data from the built-in profiler: http://dev.day.com/content/kb/home/Crx/Troubleshooting/AnalyzeUsingBuiltInProfiler.html

              *   Log files

              *   List of installed hotfixes.

              *   Output of https://helpx.adobe.com/crx/kb/AnalyzePersistenceProblems.html

              *   Your startup script.

               

              Thanks,

              Sham

              • 4. Re: Indexes in CQ5.4 and reading data from CRX
                bhssss

                Hi Sham,

                 

                1) Does CQ allow reading different branches of a tree in parallel?

                 

                2) Please provide links to any documentation on how CQ uses index files; that would help greatly.

                 

                Best Regards,

                Harry.

                • 5. Re: Indexes in CQ5.4 and reading data from CRX
                  Jörg Hoh Adobe Employee

                  Harry,

                   

                  I did not fully understand your application, but as you describe it, the index-in-memory feature greatly influences the runtime of your application, sometimes even for the worse.

                   

                  Let me clarify a few things:

                  * Queries can be efficient or inefficient, depending on the query itself, and it is best if you can avoid queries altogether. So you should at least review your application and check whether you can replace queries with tree traversals. Traversing the tree does not use the index at all, so it could improve performance.

                  * Depending on its size, the index can consume a lot of Java heap. Instead of managing a 2 GB heap, you might end up with a 20 GB heap, which is of course much harder to tune and in that case has much longer garbage collection times. Have you checked your garbage collection with the index-in-memory setting turned on?

                  * You will always benefit from a fast SSD, but you can already avoid many performance problems when you have enough free RAM for disk buffering.

                  • 6. Re: Indexes in CQ5.4 and reading data from CRX
                    bhssss

                    Hi Jorg/Sham,

                     

                    Does CQ allow reading different branches of a tree in parallel?

                     

                    Could you share some ideas on when the index in memory makes the runtime worse?

                     

                    We will run this application between 15 and 20 times a day. We observe that the first run of the day consistently takes 50% longer than subsequent runs, and we can reproduce this by restarting the whole CQ application. Do you have any ideas why this happens?

                     

                    Please provide links to any documentation on how CQ uses index files; that would help greatly.

                     

                    • There are no queries in our application; everything is a tree traversal. I understand from your statements that tree traversal does not read indexes, and hence performance should be the same with index in memory ON or OFF.
                    • Our indexes are 2 GB in size and we allocate 9 GB to the entire application. Profiling with YourKit, we observed that memory consumption varies between 4 GB and 6 GB. During a run we see about 1,500 minor and approximately 10 major garbage collections (GC), with a total GC time of 2 minutes over a 2-hour run.
                    • Our entire CQ application has 9 GB of memory, and over its complete life cycle it never exceeds 6 GB, so it has enough free RAM.
                    • 7. Re: Indexes in CQ5.4 and reading data from CRX
                      Jörg Hoh Adobe Employee

                      Hi Harry,

                       

                      My bad, I misread your "index" as the "Lucene index", and therefore put "index in memory" into the wrong context.

                       

                      So "index in memory" actually refers to the TarIndex, and loading this index into heap memory can speed up TarPM operations, mostly resolving nodes (operations like getNode(), listChildren() and so forth). So if you do heavy tree traversals, you should see a benefit from it.
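To illustrate the kind of tree traversal that relies on node resolution rather than on queries, here is a minimal sketch of a depth-first walk that uses an explicit stack instead of recursion, so a deep branch cannot overflow the thread's call stack. `RepoNode` and `TreeTraversal` are hypothetical names for an in-memory stand-in; against a real repository the child lookup would go through calls such as `javax.jcr.Node.getNodes()` or Sling's `Resource.listChildren()`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical in-memory stand-in for a repository node.
final class RepoNode {
    final String name;
    final List<RepoNode> children = new ArrayList<>();

    RepoNode(String name) { this.name = name; }

    RepoNode add(RepoNode child) {
        children.add(child);
        return this;
    }
}

final class TreeTraversal {
    // Pre-order depth-first traversal using an explicit stack instead of recursion.
    static List<String> preOrder(RepoNode root) {
        List<String> visited = new ArrayList<>();
        Deque<RepoNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            RepoNode current = stack.pop();
            visited.add(current.name);
            // Push children in reverse so the first child is popped (visited) first.
            for (int i = current.children.size() - 1; i >= 0; i--) {
                stack.push(current.children.get(i));
            }
        }
        return visited;
    }
}
```

For a `content` node whose children are `a` (itself with a child `c`) and `b`, `preOrder` returns `[content, a, c, b]`.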

                       

                      (I think we can exclude queries and Lucene from the analysis of your scenario, as you don't use any.)

                       

                      If you are heavily traversing the tree, you might want to increase the CRX bundleCacheSize (default: 8 megabytes) to a reasonable value; in your case I would recommend starting with 512 megabytes, perhaps increasing it up to 2 gigabytes. You will find some usage statistics in crx-quickstart/logs/crx/error.log.

                       

                      How large is your tarPM? How many nodes do you have in your repository? When you provide this information, we can give better recommendations. Have you already started your tuning cycle?

                       

                      And in any case I would suggest raising a Daycare ticket.

                       

                      cheers,

                      Jörg

                      • 8. Re: Indexes in CQ5.4 and reading data from CRX
                        bhssss

                        Hi Jorg,

                         

                        As no one has answered the question of whether CQ/CRX allows reading different branches of a tree in parallel, can I assume that it does not?

                         

                        We thought the same and pushed the index into memory, but the results are worse: our application runs for longer. Do you have any thoughts on why an application would run longer with the index in memory?

                         

                        Our CRX bundleCacheSize is 512 MB. What factors should we consider when choosing an optimal bundleCacheSize?

                         

                        Our tarPM is 50 GB and the index files are 2 GB. We will be reading roughly a million nodes (most of them, about 800,000, are under a single branch; the rest are in other branches under content).

                         

                        Thank You,

                        Harry.

                        • 9. Re: Indexes in CQ5.4 and reading data from CRX
                          Jörg Hoh Adobe Employee

                          Of course you can read different branches in parallel. But depending on the number of threads and how much you read, you might run into a cache-thrashing scenario, where your threads compete against each other and the cache is then of little benefit.
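One hedged way to act on the cache-thrashing point is to keep one task per branch but cap the number of worker threads, for example at `Runtime.getRuntime().availableProcessors()` instead of 102. The sketch below assumes a hypothetical in-memory `TreeNode` stand-in (not the JCR API) and a hypothetical `BoundedBranchReader` class; whether a smaller pool actually helps on CRX would have to be measured, since it only limits how many threads compete for the cache at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical in-memory stand-in for a repository node (not javax.jcr.Node).
final class TreeNode {
    final String name;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) { this.name = name; }

    TreeNode add(TreeNode child) {
        children.add(child);
        return this;
    }
}

final class BoundedBranchReader {
    // Counts a node and all of its descendants.
    static long countNodes(TreeNode node) {
        long count = 1;
        for (TreeNode child : node.children) {
            count += countNodes(child);
        }
        return count;
    }

    // Submits one task per branch, but runs at most maxThreads of them concurrently.
    static long read(List<TreeNode> branches, int maxThreads) {
        int threads = Math.max(1, Math.min(maxThreads, branches.size()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (TreeNode branch : branches) {
                tasks.add(() -> countNodes(branch));
            }
            long total = 0;
            // invokeAll blocks until every task has finished.
            for (Future<Long> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading branches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a branch read failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as `BoundedBranchReader.read(branches, Runtime.getRuntime().availableProcessors())` queues the remaining branches until a worker thread is free; `invokeAll` returns only after every task has completed.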

                           

                          Iterating through a million nodes might not be the most performant activity. What is your application doing? What does your algorithm look like?