Skip navigation
ACS LLC
Currently Being Moderated

Retrieving random record set from SQL and also applying an ORDER

Dec 16, 2011 3:31 PM

I need a query that is capable of obtaining X amount of random records from a SQL2008 database, but I also want those results to then order by a numberic field in the same table, camp_priority

 

I have

 

SELECT TOP #DisplayAmount#,camp_uid,camp_title,camp_text,camp_image_type
FROM campaigns
WHERE camp_uid IN

etc..etc

 

then at the bottom I have

 

ORDER BY newid(),camp_priority

 

The problem seems to be that I'm getting random records, but it's not ordering by the camp_priority

 

Anybody have any ideas?

 

Thanks

 

Mark

 
Replies
  • Currently Being Moderated
    Dec 16, 2011 3:54 PM   in reply to ACS LLC

    I have an idea.  Take the newid() out of the order by clause.

     

    Mind you, that only solves the order by problem.  I don't share your opinion that you are getting random records.  You'll have to give that one some more thought.

     
    |
    Mark as:
  • Currently Being Moderated
    Dec 17, 2011 2:47 AM   in reply to ACS LLC

    I'd wanna check the performance of it before running with it as a solution, but this [sort of thing] should work:

     

    SELECT *

    FROM (

         -- your actual query here

         ORDER BY NEWID()

    ) subq

    ORDER BY subq.columnYouWantToOrderBy

     

    --

    Adam

     
    |
    Mark as:
  • Currently Being Moderated
    Dec 17, 2011 4:36 PM   in reply to ACS LLC

    Your first subquery doesn't have a SELECT statement.  That'll be one source of errors.

     

    As for weighting the significance of your records before returning them "randomly", you could order them on the sum or product of two components:  a weighting and a random part.  A very simple (perhaps overly simple) would be to weight the records between 1-5, and then add a random component between 1-5 (or 1-10, whatever... you'll need to decide the significance of each).

     

    I can't comment on whether newid() is slow or not... have you got a citation for that (out of interest)?

     

    --

    Adam

     
    |
    Mark as:
  • Currently Being Moderated
    Dec 18, 2011 1:53 AM   in reply to ACS LLC

    ACS LLC wrote:

     

    It looks like I must have the syntax wrong, it's throwing an error.

     

    I've posted the query lower down.

    Are you perhaps aiming for something like this?

     

    <CFQUERY name="GetCamp" DATASOURCE="#datasource#">

    SELECT *

    FROM

        (    SELECT TOP #DisplayAmount# camp_uid,camp_title,camp_text,camp_image_type

            FROM campaigns

            WHERE camp_uid IN

                (SELECT camptarget_camp_uid FROM CampTarget

                 WHERE camptarget_country = '#session.usercountry#')

            AND camp_UID IN

                (SELECT category_camp_UID FROM category_list

                 WHERE category_cat_uid IN

                        (SELECT aff_cat_uid FROM aff_cat

                         WHERE aff_cat_aff_sub_uid = #val(session.affsub)#))

            AND camp_status = 1

            AND camp_stack = 1

            <CFIF #ListLen(session.servedcamps)# GT "0">

            AND camp_uid NOT IN (#session.ServedCamps#)

            </CFIF>

            ORDER BY newid()

        ) subq

    ORDER BY subq.camp_priority

    </CFQUERY>

     
    |
    Mark as:
  • Currently Being Moderated
    Dec 19, 2011 9:40 AM   in reply to ACS LLC

    > Notice how it goes from 4ms or less, to 329ms.. that seems like a big increase just to add in a NOT IN section

    > The test dbase only has 6 records in it, and camp_UID is the primary key. I even tried to add an extra index but it made no difference

    > What do you think? Normal behaviour?

     

    I think databases usually come with profiling tools for investigating this sort of thing...

     

    --

    Adam

     
    |
    Mark as:
  • Currently Being Moderated
    Dec 19, 2011 10:44 AM   in reply to ACS LLC

     

    Not sure, wouldn't know where to start with that

     

    Surely the answer to that is the same as with any situation where there's a question of "how do I..."

     

    Type this into your browser:

     

    http://www.google.com

     

    After that, type in a string in the input box that reflects whatyou want to know.  Press "Search"...

     

    [etc]

     

    --

    Adam

     
    |
    Mark as:
  • Currently Being Moderated
    Dec 19, 2011 10:48 AM   in reply to ACS LLC

    ACS LLC wrote:

     

    I did that, and it lead me back to the Adobe Forum ;-)

     

    You googled for how to do database profiling on whatever DB platform you're using, and the best link was back to these forums?

     

    [cough]bullsh!t

     

    Go and do some research.

     

    --

    Adam

     
    |
    Mark as:
  • Currently Being Moderated
    Dec 19, 2011 5:58 PM   in reply to ACS LLC

    "not in " really slows down queries.  A very unintuitive alternative is to do something like this:

     

    select etc

    from etc

    where somefield in

    (select somefield

    from etc

    except

    select somefield

    from etc

    where somefield in (that list or subquery)

    )

     

    If your db does not support the keyword except, it might support the keyword minus.  If you are using mssql, it supports both but they behave differently so test early and test often.

     
    |
    Mark as:
  • Currently Being Moderated
    Dec 20, 2011 6:01 AM   in reply to ACS LLC

    Both sample queries are slow, but it does show the difference.  It's a count of patients who visited yesterday for the first time this year.

     

    This query takes 43 seconds

     

    select count(distinct hsc_number)

    from patient join visit_fact using (hsc_number)

    where registration_date = '20111219'

    and hsc_number not in

    (select distinct hsc_number

    from visit_fact where registration_date between '20110101' and '20111218'

    );

     

    

    This one takes 9 seconds

    select count(distinct hsc_number)

    from patient join visit_fact using (hsc_number)

    where registration_date = '20111219'

    and hsc_number in

    (select distinct hsc_number from patient join visit_fact using (hsc_number)

    where registration_date = '20111219'

    except

    select distinct hsc_number

    from visit_fact where registration_date between '20110101' and '20111218'

    );

    They give the same answer.

     
    |
    Mark as:
  • Currently Being Moderated
    Feb 16, 2012 11:52 PM   in reply to ACS LLC

    ACS LLC wrote:

     

     

    I have records that have the priority of

    3

    5

    6

    6

    6

    8

     

    By getting the top 5, random, using priority, I would have expected records with priority

    3

    5

    6

    6

    6

     

    and 8 would not be present.

     

    However I just ran it and I get

    3

    5

    6

    6

    8

     

    That's not the desired result

     

    From what you've explained, I think your query is behaving as expected.  If you ask someone to choose 5 numbers at random out of the set {3,5,6,6,6,8}, and to sort the result, then there will 4 possible answers:

     

    {3,5,6,6,6}

    {3,5,6,6,8}

    {3,6,6,6,8}

    {5,6,6,6,8}

     

    Therefore, what you obtained was just one possibility. If you run the query frequently enough, it will produce each of the 4 possibilities.

     
    |
    Mark as:

More Like This

  • Retrieving data ...

Bookmarked By (0)

Answers + Points = Status

  • 10 points awarded for Correct Answers
  • 5 points awarded for Helpful Answers
  • 10,000+ points
  • 1,001-10,000 points
  • 501-1,000 points
  • 5-500 points