Hi
I have CQ5.4 code (extract below) which uses QueryBuilder to create a query. The result set is quite large (~2000 nodes) and ordered. I set the hits per page and the start page as I display the results using a pager.
Performance is good for the first page of results, but performance degrades quite significantly as the page start value is increased towards the end of the result set. I find this strange as all nodes must always be accessed as the result set is sorted.
Does anyone have suggestions as to how I might resolve this performance issue?
Thanks
Simon
Session session = resource.getResourceResolver().adaptTo(Session.class);
Query query = queryBuilder.createQuery(PredicateGroup.create(map), session);
if (query != null) {
int hitLimit = (pageMaximum>0) ? pageMaximum : limit;
if (limit>0 && hitLimit>limit)
hitLimit = limit;
query.setHitsPerPage(hitLimit);
query.setStart(pageStart);
SearchResult result = query.getResult();
totalMatches = result.getTotalMatches();
actLimit = (limit>0) ? Math.min(limit, (int)totalMatches) : (int)totalMatches;
}
Sure, I can provide lots more information. I've tried many different styles of query, but found the behaviour to always be the same. The log extracts below show an example query and how increasing the offset using query.setStart() impacts the time taken by query.getResult().
I modified the code a little for some extra debug so it's clear where the time is being spent, snippet below.
I guess my questions really is, has anyone else used query.setStart() to set the offset for paging, did they see similar increases in response times and did they find a resolution ?
Does anyone else agree that it's strange that all the hard work of searching and sorting is fast, and, as expected, takes the same time irrespective of the offset, but returning the results takes longer when the offset is increased ?
Regards
Simon
Query query = queryBuilder.createQuery(PredicateGroup.create(map), session);
if (query != null) {
int hitLimit = (pageMaximum>0) ? pageMaximum : limit;
if (limit>0 && hitLimit>limit)
hitLimit = limit;
query.setHitsPerPage(hitLimit);
query.setStart(pageStart);
log.debug("Time to prepare : " + ((new Date()).getTime()-start) + "ms");
SearchResult result = query.getResult();
log.debug("Prep and getResult : " + ((new Date()).getTime()-start) + "ms");
Example 1: Offset 0, query takes 328 ms, results returned in 328 ms
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =0.html HTTP/1.1] com.xxx.wcm.components.List Time to prepare : 0ms
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =0.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl executing query (URL):
group.0_property=jcr%3acontent%2fpathTags&group.0_property.0_value=tes ting&group.0_property.1_value=portal&group.0_property.2_value=de&group .0_property.and=true&group.p.or=true&orderby=%40jcr%3acontent%2fcq%3al astModified&orderby.index=true&orderby.sort=desc&p.limit=25&p.offset=0
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =0.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl executing query (predicate tree):
ROOT=group: limit=25, offset=0[
{group=group: or=true[
{0_property=property: 1_value=portal, 2_value=de, property=jcr:content/pathTags, 0_value=testing, and=true}
]}
{orderby=orderby: index=true, sort=desc, orderby=@jcr:content/cq:lastModified}
]
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =0.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl xpath query: //*[((jcr:content/@pathTags = 'portal' and jcr:content/@pathTags = 'de' and jcr:content/@pathTags = 'testing'))] order by jcr:content/@cq:lastModified descending
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =0.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl xpath query took 328 ms
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =0.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl >> xpath query returned 2643 results
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =0.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl entire query execution took 328 ms
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =0.html HTTP/1.1] com.xxx.wcm.components.List Prep and getResult : 328ms
Example 2: Offset 50, query takes 312ms, results returned in 890ms
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =50.html HTTP/1.1] com.xxx.wcm.components.List Time to prepare : 0ms
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =50.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl executing query (URL):
group.0_property=jcr%3acontent%2fpathTags&group.0_property.0_value=tes ting&group.0_property.1_value=portal&group.0_property.2_value=de&group .0_property.and=true&group.p.or=true&orderby=%40jcr%3acontent%2fcq%3al astModified&orderby.index=true&orderby.sort=desc&p.limit=25&p.offset=5 0
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =50.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl executing query (predicate tree):
ROOT=group: limit=25, offset=50[
{group=group: or=true[
{0_property=property: 1_value=portal, 2_value=de, property=jcr:content/pathTags, 0_value=testing, and=true}
]}
{orderby=orderby: index=true, sort=desc, orderby=@jcr:content/cq:lastModified}
]
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =50.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl xpath query: //*[((jcr:content/@pathTags = 'portal' and jcr:content/@pathTags = 'de' and jcr:content/@pathTags = 'testing'))] order by jcr:content/@cq:lastModified descending
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =50.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl xpath query took 312 ms
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =50.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl >> xpath query returned 2643 results
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =50.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl entire query execution took 312 ms
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =50.html HTTP/1.1] com.xxx.wcm.components.List Prep and getResult : 890ms
Example 3: Offset 2625, query takes 359ms, results returned in 10250ms
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =2625.html HTTP/1.1] com.xxx.wcm.components.List Time to prepare : 0ms
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =2625.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl executing query (URL):
group.0_property=jcr%3acontent%2fpathTags&group.0_property.0_value=tes ting&group.0_property.1_value=portal&group.0_property.2_value=de&group .0_property.and=true&group.p.or=true&orderby=%40jcr%3acontent%2fcq%3al astModified&orderby.index=true&orderby.sort=desc&p.limit=25&p.offset=2 625
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =2625.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl executing query (predicate tree):
ROOT=group: limit=25, offset=2625[
{group=group: or=true[
{0_property=property: 1_value=portal, 2_value=de, property=jcr:content/pathTags, 0_value=testing, and=true}
]}
{orderby=orderby: index=true, sort=desc, orderby=@jcr:content/cq:lastModified}
]
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =2625.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl xpath query: //*[((jcr:content/@pathTags = 'portal' and jcr:content/@pathTags = 'de' and jcr:content/@pathTags = 'testing'))] order by jcr:content/@cq:lastModified descending
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =2625.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl xpath query took 359 ms
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =2625.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl >> xpath query returned 2643 results
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =2625.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl entire query execution took 359 ms
GET /cm/content/testing/portal/de/simon/article2.contentparsys_list_start =2625.html HTTP/1.1] com.xxx.wcm.components.List Prep and getResult : 10250ms
In all cases, the querybuilder query itself takes around 350 ms, which is good, but the entire result building takes longer, the higher the offset is. Based on this, it seems to me that the internal call to RowIterator.skip() on the JCR query api takes that long (it skips offset number of rows), as that's the only thing depending on offset (number of hits and hence building the query builder result list is stable).
I would investigate what happens if you run this query in xpath directly and measure accordingly:
import javax.jcr.*;
import javax.jcr.query.*;
//...
String xpath = "//*[((jcr:content/@pathTags = 'portal' and jcr:content/@pathTags = 'de' and jcr:content/@pathTags = 'testing'))] order by jcr:content/@cq:lastModified descending"
QueryManager qm = session.getWorkspace().getQueryManager();
QueryResult result = qm.createQuery(xpath, javax.jcr.query.Query.XPATH).execute();
RowIterator rowIter = rawQueryResult.getRows();
// until now should be the same execution time (~350 ms)
// measure this
rowIter.skip(2625); // offset
// and maybe this
List<Node> nodes = new ArrayList<Node>();
// limit
for (long i =0; i < 25 && rows.hasNext(); i++) {
Row row = rows.nextRow();
nodes.add(row.getNode());
log.debug("found: {}", row.getNode().getPath());
}
Hi, thanks for the useful input.
I'd previously tried using XPATH and jcrQuery and seen similar results to when I used QueryBuilder, but your version of the code throws up an interesting variation.
Previously my code used the setOffset method on the query, rather than skipping the row iterator, e.g:
.... Query jcrQuery = qm.createQuery(xpath, javax.jcr.query.Query.XPATH); jcrQuery.setOffset(2625); QueryResult result = jcrQuery.execute(); .... With this version of the code, the execute() method took about 10 seconds to return, so the performance was the same as with QueryBuilder. However, with your version of the code, excution of the query and getting the row iterator takes about 350ms as predicted, and the call to skip takes just under 5 seconds. So, still a mysteriously long time, but over all the list is built in half the time of my previous attempts.
Hi
I've finally found a solution.
I've reworked the search so that it returns the jcr:content nodes which match the property pathTags and ordered by cq:lastModified. Rather than returning the Page/Asset nodes which match the property jcr:content/pathTags and ordered by jcr:content/cq:lastModified. I then work with the parent of the nodes returned by the search. I do in using QueryBuilder, but the xpath would be:
String xpath = "/jcr:root/content//*[((@pathTags = 'portal' and @pathTags = 'de' and @pathTags = 'testing'))] order by @cq:lastModified descending"
The search results are now returned in less than 100ms (that's 100x faster if I can do the maths!), even with an offset of 2625. I'm happy with that.
So, the lesson seems to be, searching based on the properties of a child node is really bad for performance and should be avoided. Which seems to make sense when I think about it. Shame the Adobe/Day documentation is littered with examples that do this.
So adding a path to the XPATH was faster for you than the "//*" in the log files you posted earlier? That is opposite of what I have seen myself, where removing the path showed improvement for large result sets. Reasoning is explained in Question #1 from a Jackrabbit committer in this old post (but is still referenced on the Jackrabbit Wiki so I assume still relevant) http://markmail.org/message/uew5xeyuzdb7v6bv
Jeff, good observation.
Now is probably the time to rewind a little.
The reason I am searching using the pathTags properties is to avoid using path in the query.
We were told, although I never actually measured it, that using paths is slow, particularly when there are multiple paths - which I need to support. Therefore we implemented an event listener which adds the appropriate properties to the pages and assets that we want to appear in the search results. This allows us to search for items under a 'path', without using a path in the query.
However, during development I managed to tag a few things I shouldn't have done, so my search was returning items in the result which shouldn't have been there. Hence I added the /jcr:root/content path to the query to filter out the things I didn't want. That was still in the XPATH I posted earlier. Ooops!
I have now tidied up my data and retested. I've found no discernible difference between when the /jcr:root/content path is included or excluded from the query. Perhaps things are different when the path is to something deeper in the hierarchy, but that would need more detailed testing.
I'd like to reiterate though, that adding a path to the query was not the reason why the query is now much faster than the one I originally posted. The change that made the difference was the change in the way I compare properties. So, from this :
(jcr:content/@pathTags = 'portal' and jcr:content/@pathTags = 'de' and jcr:content/@pathTags = 'testing')
to this :
(@pathTags = 'portal' and @pathTags = 'de' and @pathTags = 'testing')
The result set is then the jcr:content node rather than the page/asset node. But it's easy enough to retrieve the parent node I'm actually interested in when I process the results.
North America
Europe, Middle East and Africa
Asia Pacific