• Global community
    • Language:
      • Deutsch
      • English
      • Español
      • Français
      • Português
  • 日本語コミュニティ
    Dedicated community for Japanese speakers
  • 한국 커뮤니티
    Dedicated community for Korean speakers
Exit
0

Retrieving random record set from SQL and also applying an ORDER

Enthusiast ,
Dec 16, 2011 Dec 16, 2011

Copy link to clipboard

Copied

I need a query that is capable of obtaining X amount of random records from a SQL2008 database, but I also want those results to then order by a numberic field in the same table, camp_priority

I have

SELECT TOP #DisplayAmount#,camp_uid,camp_title,camp_text,camp_image_type
FROM campaigns
WHERE camp_uid IN

etc..etc

then at the bottom I have

ORDER BY newid(),camp_priority

The problem seems to be that I'm getting random records, but it's not ordering by the camp_priority

Anybody have any ideas?

Thanks

Mark

TOPICS
Database access

Views

4.9K

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines

correct answers 1 Correct answer

Enthusiast , Dec 18, 2011 Dec 18, 2011

Hey Adam,

I misread your original post, now I see how it was supposed to work, just posted below... got it working ... works great. hopefully it's not killing the CPU as this query will be use a LOT.. I'll just get a bigger CPU 😉

With regards to the NEWID() performance issue, I don't have a specific link, I was just hunting around different forums etc and found a few posts that had talked about performance issues, and one that had a command I'd never seen that took a percentage of records with a

...

Votes

Translate

Translate
LEGEND ,
Dec 16, 2011 Dec 16, 2011

Copy link to clipboard

Copied

I have an idea.  Take the newid() out of the order by clause.

Mind you, that only solves the order by problem.  I don't share your opinion that you are getting random records.  You'll have to give that one some more thought.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Dec 17, 2011 Dec 17, 2011

Copy link to clipboard

Copied

I'd wanna check the performance of it before running with it as a solution, but this [sort of thing] should work:

SELECT *

FROM (

     -- your actual query here

     ORDER BY NEWID()

) subq

ORDER BY subq.columnYouWantToOrderBy

--

Adam

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Enthusiast ,
Dec 17, 2011 Dec 17, 2011

Copy link to clipboard

Copied

It looks like I must have the syntax wrong, it's throwing an error.

I've posted the query lower down.

I had to pause for some thought after posting this thread, because the logic that I'm trying to apply to this is probably not going to serve it's purpose. On one hand I want to draw random records from the campaigns table, let's say that's 6 random records that meet the required criteria of correct country, category etc. On the other hand I also want to use my camp_priority to push campaigns up the list.

The issue is - Let's suppose I have 100 campaigns in there with mixed priority levels from 1 to 10, 1 being the highest, and I randomly pick 6 of them and then order them by priority, it won't be doing much justice to the priority set up because it could technically randomly pick out 6 x priority=10. and some 9's... If I have it so it picks 6 random ones of the highest priority then that also causes an issue because you'd never get to see the lower priority campaigns.

I think the best way to handle this is to set all camp_priority to 5 so they are all level, and then anything we want to push up we then set to a higher level 1 to 4, so if we just set 2 campaigns to priority level 1 and 2 then these two would take up the first two slots due to their higher priority and the next 4 slots would be random as they are all same priority, after yapping on so much about this, I think your suggestion would work, if I could get it to function

btw I read that NEWID() has rather slow performance, but finding another solution looked way to complicated for me.

<CFQUERY name="GetCamp" DATASOURCE="#datasource#">

    SELECT TOP #DisplayAmount# camp_uid,camp_title,camp_text,camp_image_type

    FROM (campaigns

    WHERE camp_uid IN

    <!--- MATCH COUNTRY --->

(SELECT camptarget_camp_uid FROM CampTarget

WHERE camptarget_country = '#session.usercountry#')

    AND camp_UID IN

    <!--- ALL CATEGORIES THAT THE CAMPAIGN IS IN --->

(SELECT category_camp_UID FROM category_list

WHERE category_cat_uid IN

    <!--- BASED ON CATEGORIES REQUIRED FOR THIS TRACKING ACCOUNT UID --->

(SELECT aff_cat_uid FROM aff_cat

WHERE aff_cat_aff_sub_uid = #val(session.affsub)#)

)

AND camp_status = 1 AND camp_stack = 1

<CFIF #ListLen(session.servedcamps)# GT "0">

AND camp_uid NOT IN (#session.ServedCamps#)

</CFIF>

ORDER BY newid()

)

subq ORDER BY subq.camp_priority

</CFQUERY>

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Dec 17, 2011 Dec 17, 2011

Copy link to clipboard

Copied

Your first subquery doesn't have a SELECT statement.  That'll be one source of errors.

As for weighting the significance of your records before returning them "randomly", you could order them on the sum or product of two components:  a weighting and a random part.  A very simple (perhaps overly simple) would be to weight the records between 1-5, and then add a random component between 1-5 (or 1-10, whatever... you'll need to decide the significance of each).

I can't comment on whether newid() is slow or not... have you got a citation for that (out of interest)?

--

Adam

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Enthusiast ,
Dec 18, 2011 Dec 18, 2011

Copy link to clipboard

Copied

Hey Adam,

I misread your original post, now I see how it was supposed to work, just posted below... got it working ... works great. hopefully it's not killing the CPU as this query will be use a LOT.. I'll just get a bigger CPU 😉

With regards to the NEWID() performance issue, I don't have a specific link, I was just hunting around different forums etc and found a few posts that had talked about performance issues, and one that had a command I'd never seen that took a percentage of records with a lot of perfomance test graphs showing it was 10 times slow than the other approach.

I'm just going to have to see how it goes

Thanks for the help

Mark

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert ,
Dec 18, 2011 Dec 18, 2011

Copy link to clipboard

Copied

ACS LLC wrote:

It looks like I must have the syntax wrong, it's throwing an error.

I've posted the query lower down.

Are you perhaps aiming for something like this?

<CFQUERY name="GetCamp" DATASOURCE="#datasource#">

SELECT *

FROM

    (    SELECT TOP #DisplayAmount# camp_uid,camp_title,camp_text,camp_image_type

        FROM campaigns

        WHERE camp_uid IN

            (SELECT camptarget_camp_uid FROM CampTarget

             WHERE camptarget_country = '#session.usercountry#')

        AND camp_UID IN

            (SELECT category_camp_UID FROM category_list

             WHERE category_cat_uid IN

                    (SELECT aff_cat_uid FROM aff_cat

                     WHERE aff_cat_aff_sub_uid = #val(session.affsub)#))

        AND camp_status = 1

        AND camp_stack = 1

        <CFIF #ListLen(session.servedcamps)# GT "0">

        AND camp_uid NOT IN (#session.ServedCamps#)

        </CFIF>

        ORDER BY newid()

    ) subq

ORDER BY subq.camp_priority

</CFQUERY>

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Enthusiast ,
Dec 18, 2011 Dec 18, 2011

Copy link to clipboard

Copied

Thanks for the feedback.. it works.. after a little modfication .... the only thing I found that was that I had to add camp_priority to the query otherwise it would crash

I've given it a good test and it appears to do the job.

not sure if it's chewing up the CPU though.

Thanks

Mark

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Enthusiast ,
Dec 19, 2011 Dec 19, 2011

Copy link to clipboard

Copied

Although this is working, I am seeing something a little odd with regards to performance:

<CFQUERY name="GetCamp" DATASOURCE="#datasource#">

SELECT * FROM

(

    SELECT TOP #DisplayAmount#

camp_uid,camp_title,camp_text,camp_image_type,camp_priority

    FROM campaigns

    WHERE camp_uid IN

    <!--- MATCH COUNTRY --->

(SELECT camptarget_camp_uid FROM CampTarget

WHERE camptarget_country = '#session.usercountry#')

    AND camp_UID IN

    <!--- ALL CATEGORIES THAT THE CAMPAIGN IS IN --->

(SELECT category_camp_UID FROM category_list

WHERE category_cat_uid IN

    <!--- BASED ON CATEGORIES REQUIRED FOR THIS TRACKING ACCOUNT UID --->

(SELECT aff_cat_uid FROM aff_cat

WHERE aff_cat_aff_sub_uid = #val(session.affsub)#)

)

AND camp_status = 1 AND camp_stack = 1

<CFIF #ListLen(session.servedcamps)# GT "0">

AND camp_uid NOT IN (#session.ServedCamps#)

</CFIF>

ORDER BY newid()

)

subq ORDER BY subq.camp_priority

</CFQUERY>

The first time I run the query the session.served.camps is empty, and it executes in 4ms or less

CACHEDfalse
EXECUTIONTIME4
SQLSELECT * FROM  (  SELECT TOP 3 camp_uid,camp_title,camp_text,camp_image_type,camp_priority FROM campaigns WHERE camp_uid IN    (SELECT camptarget_camp_uid FROM CampTarget WHERE camptarget_country = 'us')  AND camp_UID IN    (SELECT category_camp_UID FROM category_list WHERE category_cat_uid IN (SELECT aff_cat_uid FROM aff_cat WHERE aff_cat_aff_sub_uid = 2)  )  AND camp_status = 1 AND camp_stack = 1    ORDER BY newid()  )  subq ORDER BY subq.camp_priority

Now when I reload it so that session.served.camps has a value, I get this

EXECUTIONTIME329
SQLSELECT * FROM  (  SELECT TOP 3 camp_uid,camp_title,camp_text,camp_image_type,camp_priority FROM campaigns WHERE camp_uid IN    (SELECT camptarget_camp_uid FROM CampTarget WHERE camptarget_country = 'us')  AND camp_UID IN    (SELECT category_camp_UID FROM category_list WHERE category_cat_uid IN (SELECT aff_cat_uid FROM aff_cat WHERE aff_cat_aff_sub_uid = 2)  )  AND camp_status = 1 AND camp_stack = 1   AND camp_uid NOT IN (4,3,7,5) ORDER BY newid()  )  subq ORDER BY subq.camp_priority

Notice how it goes from 4ms or less, to 329ms.. that seems like a big increase just to add in a NOT IN section

The test dbase only has 6 records in it, and camp_UID is the primary key. I even tried to add an extra index but it made no difference

What do you think? Normal behaviour?

Thanks

Mark

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Dec 19, 2011 Dec 19, 2011

Copy link to clipboard

Copied

> Notice how it goes from 4ms or less, to 329ms.. that seems like a big increase just to add in a NOT IN section

> The test dbase only has 6 records in it, and camp_UID is the primary key. I even tried to add an extra index but it made no difference

> What do you think? Normal behaviour?

I think databases usually come with profiling tools for investigating this sort of thing...

--

Adam

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Enthusiast ,
Dec 19, 2011 Dec 19, 2011

Copy link to clipboard

Copied

Not sure, wouldn't know where to start with that

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Dec 19, 2011 Dec 19, 2011

Copy link to clipboard

Copied

Not sure, wouldn't know where to start with that

Surely the answer to that is the same as with any situation where there's a question of "how do I..."

Type this into your browser:

http://www.google.com

After that, type in a string in the input box that reflects whatyou want to know.  Press "Search"...

[etc]

--

Adam

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Enthusiast ,
Dec 19, 2011 Dec 19, 2011

Copy link to clipboard

Copied

I did that, and it lead me back to the Adobe Forum 😉

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Dec 19, 2011 Dec 19, 2011

Copy link to clipboard

Copied

ACS LLC wrote:

I did that, and it lead me back to the Adobe Forum 😉

You googled for how to do database profiling on whatever DB platform you're using, and the best link was back to these forums?

[cough]bullsh!t

Go and do some research.

--

Adam

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Enthusiast ,
Dec 19, 2011 Dec 19, 2011

Copy link to clipboard

Copied

That might have been a joke 😉

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Enthusiast ,
Dec 19, 2011 Dec 19, 2011

Copy link to clipboard

Copied

ahhh. found the performance tools (never used it before), but when I try to open it, I get a fatal error from SQL

Might have to reinstall.. urgh

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Dec 19, 2011 Dec 19, 2011

Copy link to clipboard

Copied

"not in " really slows down queries.  A very unintuitive alternative is to do something like this:

select etc

from etc

where somefield in

(select somefield

from etc

except

select somefield

from etc

where somefield in (that list or subquery)

)

If your db does not support the keyword except, it might support the keyword minus.  If you are using mssql, it supports both but they behave differently so test early and test often.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Enthusiast ,
Dec 19, 2011 Dec 19, 2011

Copy link to clipboard

Copied

Thanks Dan. I had a feeling it was NOT IN... it's insane just how much of an effect it has on a query, I never knew, just stumbled across this issue

I'm using MS SQL 2008, so I guess both the commands are available to me, I've never come across or used EXCEPT or MINUS, will try to research that one.

I will definately test...test....test.. last thing I need is for this to serve the wrong data out!

From your experience have you found that the EXCEPT is approach that you suggest is much faster than NOT IN?

Thanks

Mark

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Dec 20, 2011 Dec 20, 2011

Copy link to clipboard

Copied

Both sample queries are slow, but it does show the difference.  It's a count of patients who visited yesterday for the first time this year.

This query takes 43 seconds

select count(distinct hsc_number)

from patient join visit_fact using (hsc_number)

where registration_date = '20111219'

and hsc_number not in

(select distinct hsc_number

from visit_fact where registration_date between '20110101' and '20111218'

);

This one takes 9 seconds

select count(distinct hsc_number)

from patient join visit_fact using (hsc_number)

where registration_date = '20111219'

and hsc_number in

(select distinct hsc_number from patient join visit_fact using (hsc_number)

where registration_date = '20111219'

except

select distinct hsc_number

from visit_fact where registration_date between '20110101' and '20111218'

);

They give the same answer.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Enthusiast ,
Dec 20, 2011 Dec 20, 2011

Copy link to clipboard

Copied

I presume thats ms not seconds

either way, that's one heck of a difference!!!

I'm going to have to go back and do a lot of changing and testing! I had no idea that NOT IN was such a slow process

Thanks

Mark

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Enthusiast ,
Feb 16, 2012 Feb 16, 2012

Copy link to clipboard

Copied

I thought I had this down and working, but recently discovered it was not actually working as designed.

This is what I have for the SQL

<CFQUERY name="GetCamp" DATASOURCE="#datasource#">

SELECT * FROM

(

SELECT TOP 5

camp_uid,camp_title,camp_text,camp_image_type,camp_posting_url,camp_

priority

    FROM campaigns

    WHERE

camp_status = 1

AND camp_type = 2

ORDER BY newid()

)

subq ORDER BY subq.camp_priority

</CFQUERY>

The idea being that I can find the top 5 random records putting priority first.

I have records that have the priority of

3

5

6

6

6

8

By getting the top 5, random, using priority, I would have expected records with priority

3

5

6

6

6

and 8 would not be present.

However I just ran it and I get

3

5

6

6

8

That's not the desired result

I am guessing the random component is somehow overriding the priority

Anybody have any ideas?

Thanks

Mark

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert ,
Feb 16, 2012 Feb 16, 2012

Copy link to clipboard

Copied

ACS LLC wrote:

I have records that have the priority of

3

5

6

6

6

8

By getting the top 5, random, using priority, I would have expected records with priority

3

5

6

6

6

and 8 would not be present.

However I just ran it and I get

3

5

6

6

8

That's not the desired result

From what you've explained, I think your query is behaving as expected.  If you ask someone to choose 5 numbers at random out of the set {3,5,6,6,6,8}, and to sort the result, then there will 4 possible answers:

{3,5,6,6,6}

{3,5,6,6,8}

{3,6,6,6,8}

{5,6,6,6,8}

Therefore, what you obtained was just one possibility. If you run the query frequently enough, it will produce each of the 4 possibilities.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Enthusiast ,
Feb 17, 2012 Feb 17, 2012

Copy link to clipboard

Copied

LATEST

got it. So what I am actually doing is ordering the random list, which is not what I wanted to do, what I need to do is take the top 5 ordered based on priority, randomly draw should there be records with the same priority and then order them

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Resources
Documentation