7 Replies Latest reply: May 11, 2012 6:53 AM by simon135 RSS

    Query result pagination performance

    simon135 Community Member

      Hi

       

      I have CQ5.4 code (extract below) which uses QueryBuilder to create a query.  The result set is quite large (~2000 nodes) and ordered. I set the hits per page and the start page as I display the results using a pager.

       

      Performance is good for the first page of results, but performance degrades quite significantly as the page start value is increased towards the end of the result set.  I find this strange as all nodes must always be accessed as the result set is sorted.

       

      Does anyone have suggestions as to how I might resolve this performance issue?

       

      Thanks

      Simon

       

       

       

           QueryBuilder queryBuilder = resource.getResourceResolver().adaptTo(QueryBuilder.class);

       

           Session session = resource.getResourceResolver().adaptTo(Session.class);

           Query query = queryBuilder.createQuery(PredicateGroup.create(map), session);

           if (query != null) {

               int hitLimit = (pageMaximum>0) ? pageMaximum : limit;

               if (limit>0 && hitLimit>limit)

                   hitLimit = limit;

               query.setHitsPerPage(hitLimit);

               query.setStart(pageStart);

               SearchResult result = query.getResult();

               totalMatches = result.getTotalMatches();

               actLimit = (limit>0) ? Math.min(limit, (int)totalMatches) : (int)totalMatches;

           }

        • 1. Re: Query result pagination performance
          aklimets Adobe Employee

          No idea why, do you have more information, like the actual query?

          • 2. Re: Query result pagination performance
            simon135 Community Member

            Sure, I can provide lots more information.  I've tried many different styles of query, but found the behaviour to always be the same.  The log extracts below show an example query and how increasing the offset using query.setStart() impacts the time taken by query.getResult().

             

            I modified the code a little for some extra debug so it's clear where the time is being spent, snippet below.

             

            I guess my questions really is, has anyone else used query.setStart() to set the offset for paging, did they see similar increases in response times and did they find a resolution ?

             

            Does anyone else agree that it's strange that all the hard work of searching and sorting is fast, and, as expected, takes the same time irrespective of the offset, but returning the results takes longer when the offset is increased ?

             

            Regards

            Simon

             

                 Query query = queryBuilder.createQuery(PredicateGroup.create(map), session);

             

                 if (query != null) {
                    int hitLimit = (pageMaximum>0) ? pageMaximum : limit;
                    if (limit>0 && hitLimit>limit)
                       hitLimit = limit;

               

                    query.setHitsPerPage(hitLimit);
                    query.setStart(pageStart);


                     log.debug("Time to prepare : " + ((new Date()).getTime()-start) + "ms");

             

                     SearchResult result = query.getResult();
                     
                     log.debug("Prep and getResult : " + ((new Date()).getTime()-start) + "ms");

             

             

            Example 1:  Offset 0, query takes 328 ms, results returned in 328 ms

             

            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=0.html HTTP/1.1] com.xxx.wcm.components.List Time to prepare : 0ms
            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=0.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl executing query (URL):
            group.0_property=jcr%3acontent%2fpathTags&group.0_property.0_value=testing&group.0_propert y.1_value=portal&group.0_property.2_value=de&group.0_property.and=true&group.p.or=true&ord erby=%40jcr%3acontent%2fcq%3alastModified&orderby.index=true&orderby.sort=desc&p.limit=25& p.offset=0
            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=0.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl executing query (predicate tree):
            ROOT=group: limit=25, offset=0[
                {group=group: or=true[
                    {0_property=property: 1_value=portal, 2_value=de, property=jcr:content/pathTags, 0_value=testing, and=true}
                ]}
                {orderby=orderby: index=true, sort=desc, orderby=@jcr:content/cq:lastModified}
            ]
            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=0.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl xpath query: //*[((jcr:content/@pathTags = 'portal' and jcr:content/@pathTags = 'de' and jcr:content/@pathTags = 'testing'))] order by jcr:content/@cq:lastModified descending
            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=0.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl xpath query took 328 ms
            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=0.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl >> xpath query returned 2643 results
            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=0.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl entire query execution took 328 ms
            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=0.html HTTP/1.1] com.xxx.wcm.components.List Prep and getResult : 328ms

             

             

            Example 2:  Offset 50, query takes 312ms, results returned in 890ms

             

            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=50.html HTTP/1.1] com.xxx.wcm.components.List Time to prepare : 0ms

            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=50.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl executing query (URL):

            group.0_property=jcr%3acontent%2fpathTags&group.0_property.0_value=testing&group.0_propert y.1_value=portal&group.0_property.2_value=de&group.0_property.and=true&group.p.or=true&ord erby=%40jcr%3acontent%2fcq%3alastModified&orderby.index=true&orderby.sort=desc&p.limit=25& p.offset=50

            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=50.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl executing query (predicate tree):

            ROOT=group: limit=25, offset=50[

                {group=group: or=true[

                    {0_property=property: 1_value=portal, 2_value=de, property=jcr:content/pathTags, 0_value=testing, and=true}

                ]}

                {orderby=orderby: index=true, sort=desc, orderby=@jcr:content/cq:lastModified}

            ]

            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=50.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl xpath query: //*[((jcr:content/@pathTags = 'portal' and jcr:content/@pathTags = 'de' and jcr:content/@pathTags = 'testing'))] order by jcr:content/@cq:lastModified descending

            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=50.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl xpath query took 312 ms

            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=50.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl >> xpath query returned 2643 results

            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=50.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl entire query execution took 312 ms

            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=50.html HTTP/1.1] com.xxx.wcm.components.List Prep and getResult : 890ms

             

            Example 3: Offset 2625, query takes 359ms, results returned in 10250ms

             

            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=2625.html HTTP/1.1] com.xxx.wcm.components.List Time to prepare : 0ms
            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=2625.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl executing query (URL):
            group.0_property=jcr%3acontent%2fpathTags&group.0_property.0_value=testing&group.0_propert y.1_value=portal&group.0_property.2_value=de&group.0_property.and=true&group.p.or=true&ord erby=%40jcr%3acontent%2fcq%3alastModified&orderby.index=true&orderby.sort=desc&p.limit=25& p.offset=2625
            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=2625.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl executing query (predicate tree):
            ROOT=group: limit=25, offset=2625[
                {group=group: or=true[
                    {0_property=property: 1_value=portal, 2_value=de, property=jcr:content/pathTags, 0_value=testing, and=true}
                ]}
                {orderby=orderby: index=true, sort=desc, orderby=@jcr:content/cq:lastModified}
            ]
            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=2625.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl xpath query: //*[((jcr:content/@pathTags = 'portal' and jcr:content/@pathTags = 'de' and jcr:content/@pathTags = 'testing'))] order by jcr:content/@cq:lastModified descending
            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=2625.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl xpath query took 359 ms
            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=2625.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl >> xpath query returned 2643 results
            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=2625.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl entire query execution took 359 ms
            GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start=2625.html HTTP/1.1] com.xxx.wcm.components.List Prep and getResult : 10250ms

            • 3. Re: Query result pagination performance
              aklimets Adobe Employee

              In all cases, the querybuilder query itself takes around 350 ms, which is good, but the entire result building takes longer, the higher the offset is. Based on this, it seems to me that the internal call to RowIterator.skip() on the JCR query api takes that long (it skips offset number of rows), as that's the only thing depending on offset (number of hits and hence building the query builder result list is stable).

               

              I would investigate what happens if you run this query in xpath directly and measure accordingly:

               

              import javax.jcr.*;

              import javax.jcr.query.*;

              //...

              String xpath = "//*[((jcr:content/@pathTags = 'portal' and jcr:content/@pathTags = 'de' and jcr:content/@pathTags = 'testing'))] order by jcr:content/@cq:lastModified descending"

              QueryManager qm = session.getWorkspace().getQueryManager();

              QueryResult result = qm.createQuery(xpath, javax.jcr.query.Query.XPATH).execute();

              RowIterator rowIter = rawQueryResult.getRows();

              // until now should be the same execution time (~350 ms)

               

              // measure this

              rowIter.skip(2625); // offset

               

              // and maybe this

              List<Node> nodes = new ArrayList<Node>();

              // limit

              for (long i =0; i < 25 && rows.hasNext(); i++) {

                  Row row = rows.nextRow();

                  nodes.add(row.getNode());

                  log.debug("found: {}", row.getNode().getPath());

              }

               

              • 4. Re: Query result pagination performance
                simon135 Community Member

                Hi, thanks for the useful input.

                 

                I'd previously tried using XPATH and jcrQuery and seen similar results to when I used QueryBuilder, but your version of the code throws up an interesting variation.

                 

                Previously my code used the setOffset method on the query, rather than skipping the row iterator, e.g:

                 

                ....

                Query jcrQuery = qm.createQuery(xpath, javax.jcr.query.Query.XPATH);

                jcrQuery.setOffset(2625);

                QueryResult result = jcrQuery.execute();

                ....

                 

                 

                With this version of the code, the execute() method took about 10 seconds to return, so the performance was the same as with QueryBuilder.

                 

                However, with your version of the code, excution of the query and getting the row iterator takes about 350ms as predicted, and the call to skip takes just under 5 seconds.  So, still a mysteriously long time, but over all the list is built in half the time of my previous attempts.

                • 5. Re: Query result pagination performance
                  simon135 Community Member

                  Hi

                   

                  I've finally found a solution.

                   

                   

                  I've reworked the search so that it returns the jcr:content nodes which match the property pathTags and ordered by cq:lastModified. Rather than returning the Page/Asset nodes which match the property jcr:content/pathTags and ordered by jcr:content/cq:lastModified. I then work with the parent of the nodes returned by the search.

                   

                  I do in using QueryBuilder, but the xpath would be:

                   

                   

                  String xpath = "/jcr:root/content//*[((@pathTags = 'portal' and @pathTags = 'de' and @pathTags = 'testing'))] order by @cq:lastModified descending"

                  The search results are now returned in less than 100ms (that's 100x faster if I can do the maths!), even with an offset of 2625.  I'm happy with that.

                  So, the lesson seems to be, searching based on the properties of a child node is really bad for performance and should be avoided.  Which seems to make sense when I think about it.  Shame the Adobe/Day documentation is littered with examples that do this.

                  • 6. Re: Query result pagination performance
                    Jeff Brown Community Member

                    So adding a path to the XPATH was faster for you than the "//*" in the log files you posted earlier?  That is opposite of what I have seen myself, where removing the path showed improvement for large result sets.  Reasoning is explained in Question #1 from a Jackrabbit committer in this old post (but is still referenced on the Jackrabbit Wiki so I assume still relevant) http://markmail.org/message/uew5xeyuzdb7v6bv

                    • 7. Re: Query result pagination performance
                      simon135 Community Member

                      Jeff, good observation.

                       

                      Now is probably the time to rewind a little.

                       

                      The reason I am searching using the pathTags properties is to avoid using path in the query. 

                       

                      We were told, although I never actually measured it, that using paths is slow, particularly when there are multiple paths - which I need to support.  Therefore we implemented an event listener which adds the appropriate properties to the pages and assets that we want to appear in the search results.  This allows us to search for items under a 'path', without using a path in the query.

                       

                      However, during development I  managed to tag a few things I shouldn't have done, so my search was returning items in the result which shouldn't have been there.  Hence I added the /jcr:root/content path to the query to filter out the things I didn't want.  That was still in the XPATH I posted earlier. Ooops!

                       

                      I have now tidied up my data and retested.  I've found no discernible difference between when the /jcr:root/content path is included or excluded from the query.  Perhaps things are different when the path is to something deeper in the hierarchy, but that would need more detailed testing.

                       

                      I'd like to reiterate though, that adding a path to the query was not the reason why the query is now much faster than the one I originally posted.  The change that made the difference was the change in the way I compare properties.  So, from this :

                       

                      (jcr:content/@pathTags = 'portal' and jcr:content/@pathTags = 'de' and jcr:content/@pathTags = 'testing') 

                       

                      to this :

                       

                      (@pathTags = 'portal' and @pathTags = 'de' and @pathTags = 'testing')

                       

                      The result set is then the jcr:content node rather than the page/asset node.  But it's easy enough to retrieve the parent node I'm actually interested in when I process the results.