2 Replies Latest reply on Oct 13, 2013 3:39 PM by johnrellis

    Creating large collections via the SDK

    johnrellis Most Valuable Participant

      I'm having significant performance problems creating large collections via the SDK -- it's much slower than creating collections in the user interface.  For example, using the SDK to create a collection of 15K photos is 8 times slower than via the user interface (218 versus 27 seconds)!

       

      Does anyone have relevant experience making large collections?  Am I missing something?

       

      Here's what I've learned so far. Using a fresh test catalog of 25K photos,  I first measured the simplest approach to creating a collection:

       

          catalog:withWriteAccessDo (no timeout params)
              catalog:createCollection
              collection:addPhotos 
      

       

      This is fine for collections with fewer than 1K photos, but at 2K photos and above it slows down dramatically:

       

      Table1.PNG

      Graph1.PNG

       

      It takes almost two minutes to make a collection of 8K photos and ten minutes for a collection of 25K photos!

       

      Next, I tried adding photos in chunks of, say, 128 photos, one chunk per transaction:

       

          
          catalog:withWriteAccessDo (no timeout params)
              catalog:createCollection
      
          for each chunk of 128 photos
              catalog:withWriteAccessDo (no timeout params)
                  collection:addPhotos (128 photos)
      

      I measured various chunk sizes from 1 to 2048, and there's not much difference in total time with sizes between 64 and 1024.
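
      For anyone who wants to try the chunked approach, here is a minimal Lua sketch of it, not my actual measurement script. The helper names chunked and addPhotosInChunks are made up for illustration. It assumes catalog is an LrCatalog and collection is an LrCollection that was already created in an earlier transaction, and that the caller is running inside an async task, which as far as I know withWriteAccessDo requires. No timeout params are passed, as in the pseudocode above.

```lua
-- Split an array-like table into consecutive slices of at most `size` items.
-- (Hypothetical helper, not part of the SDK.)
local function chunked(items, size)
    assert(size >= 1, "chunk size must be at least 1")
    local chunks = {}
    for first = 1, #items, size do
        local chunk = {}
        for i = first, math.min(first + size - 1, #items) do
            chunk[#chunk + 1] = items[i]
        end
        chunks[#chunks + 1] = chunk
    end
    return chunks
end

-- Add `photos` to `collection`, one write transaction per chunk.
-- Assumption: `catalog` behaves like an LrCatalog and `collection` like an
-- LrCollection; in Lightroom this should be called from an async task.
-- Returns the number of transactions used.
local function addPhotosInChunks(catalog, collection, photos, chunkSize)
    local chunks = chunked(photos, chunkSize or 128)
    for index, chunk in ipairs(chunks) do
        catalog:withWriteAccessDo("Add photos (chunk " .. index .. ")", function()
            collection:addPhotos(chunk)
        end)
    end
    return #chunks
end
```

      With 300 photos and the default chunk size of 128, this issues three transactions of 128, 128 and 44 photos.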

       

      But even with chunking, the SDK is much slower than the UI at creating collections: 

       

      Table2.PNG

       

      In general, the larger the collection, the slower it is to create in the SDK - creating a collection of 10K photos is 5 times slower, and creating a collection of 15K photos is 8 times slower!  Here's a plot showing the ratio of the SDK time to the UI time versus the size of the collection:

       

      Graph2.PNG

       

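      It's very suspicious that the ratio of SDK time to UI time grows roughly linearly with the size of the collection.  If the UI's time is itself roughly linear in the collection size, that means the SDK's time grows roughly as the square of the size, which suggests the SDK is using an inappropriate n-squared algorithm compared to the UI.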
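
      As a rough sanity check of that reasoning, here is a small Lua sketch that uses only the figures quoted in this post. The model is an assumption, not something I've confirmed about LR's internals: if UI time is a*n and SDK time is b*n^2, then the ratio is (b/a)*n, a straight line through the origin.

```lua
-- Figures quoted in this post for a 15K-photo collection (seconds).
local sdkSeconds, uiSeconds = 218, 27
local ratio15k = sdkSeconds / uiSeconds

-- Hypothetical model: UI time = a*n, SDK time = b*n^2, so ratio = (b/a)*n.
-- Calibrate on the 10K point (SDK about 5 times slower), then predict 15K.
local ratio10k = 5
local predicted15k = ratio10k * 15000 / 10000

print(string.format("measured 15K ratio:  %.2f", ratio15k))
print(string.format("predicted 15K ratio: %.2f", predicted15k))
```

      The model predicts a ratio of 7.5 at 15K photos against the measured ratio of about 8, so the figures are at least roughly consistent with quadratic growth on the SDK side.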

       

      I wonder if the difference between the UI and SDK methods is how the SDK handles the undo stack?  I'd guess the underlying SQL operations on the catalog are identical and not the cause of the difference, but who knows.

       

      These measurements were all done on LR 5.2 on Windows 7 64-bit, with 8 GB of memory and a 7200 RPM disk.  LR 4.4 behaves very similarly.

       

      PS: I've been exploring the use of collections to represent search results in Any Filter.  Given LR's bias towards collections instead of filters, many users feel more comfortable accessing the search results via collections. And collections would make it easy for a user to "go back" to a previous set of search results.

        • 1. Re: Creating large collections via the SDK
          areohbee Level 5

          One thing to consider: when you add photos to a collection via SDK, you are also giving work to Lr native to do in the background (i.o.w. it's not finished when you return from with-do), and so as you go SDK is competing more with Lr native (Lr native may be becoming increasingly back-logged). Dunno if that has any bearing, but perhaps worth consideration...

           

          Lr performance is a big giant mystery to me. I mean, just exporting, I can watch Lr performance slow and slow and slow (when exporting large numbers of photos), even when there appears to be no memory leak. Ditto for other ops. So although I can't explain your observation (i.e. the decreased performance adding lotsa photos to collection), (nor any of my similar observations), it doesn't surprise me either.

           

          Rob

          • 2. Re: Creating large collections via the SDK
            johnrellis Most Valuable Participant

            "when you add photos to a collection via SDK, you are also giving work to Lr native to do in the background (i.o.w. it's not finished when you return from with-do), and so as you go SDK is competing more with Lr native (Lr native may be becoming increasingly back-logged). Dunno if that has any bearing, but perhaps worth consideration..."

            It did indeed have a bearing.  I noticed that background work too, so my measurement script had a 1-minute delay between the creation of collections, and I verified that this was long enough for LR to finish its background work (judging by CPU and disk I/O activity).

             

            Thanks.