Retrieving random record set from SQL and also applying an ORDER

Report · Dec 16, 2011

I need a query that is capable of obtaining X amount of random records from a SQL2008 database, but I also want those results to then order by a numberic field in the same table, camp_priority

I have

	SELECT TOP #DisplayAmount#,camp_uid,camp_title,camp_text,camp_image_type
	FROM campaigns
	WHERE camp_uid IN

etc..etc

then at the bottom I have

ORDER BY newid(),camp_priority

The problem seems to be that I'm getting random records, but it's not ordering by the camp_priority

Anybody have any ideas?

Thanks

Mark

Report · Dec 16, 2011

I have an idea. Take the newid() out of the order by clause.

Mind you, that only solves the order by problem. I don't share your opinion that you are getting random records. You'll have to give that one some more thought.

Report · Dec 17, 2011

I'd wanna check the performance of it before running with it as a solution, but this [sort of thing] should work:

SELECT *

FROM (

-- your actual query here

ORDER BY NEWID()

) subq

ORDER BY subq.columnYouWantToOrderBy

--

Adam

Report · Dec 17, 2011

It looks like I must have the syntax wrong, it's throwing an error.

I've posted the query lower down.

I had to pause for some thought after posting this thread, because the logic that I'm trying to apply to this is probably not going to serve it's purpose. On one hand I want to draw random records from the campaigns table, let's say that's 6 random records that meet the required criteria of correct country, category etc. On the other hand I also want to use my camp_priority to push campaigns up the list.

The issue is - Let's suppose I have 100 campaigns in there with mixed priority levels from 1 to 10, 1 being the highest, and I randomly pick 6 of them and then order them by priority, it won't be doing much justice to the priority set up because it could technically randomly pick out 6 x priority=10. and some 9's... If I have it so it picks 6 random ones of the highest priority then that also causes an issue because you'd never get to see the lower priority campaigns.

I think the best way to handle this is to set all camp_priority to 5 so they are all level, and then anything we want to push up we then set to a higher level 1 to 4, so if we just set 2 campaigns to priority level 1 and 2 then these two would take up the first two slots due to their higher priority and the next 4 slots would be random as they are all same priority, after yapping on so much about this, I think your suggestion would work, if I could get it to function

btw I read that NEWID() has rather slow performance, but finding another solution looked way to complicated for me.

SELECT TOP #DisplayAmount# camp_uid,camp_title,camp_text,camp_image_type

FROM (campaigns

WHERE camp_uid IN

(SELECT camptarget_camp_uid FROM CampTarget

WHERE camptarget_country = '#session.usercountry#')

AND camp_UID IN

(SELECT category_camp_UID FROM category_list

WHERE category_cat_uid IN

(SELECT aff_cat_uid FROM aff_cat

WHERE aff_cat_aff_sub_uid = #val(session.affsub)#)

)

AND camp_status = 1 AND camp_stack = 1

AND camp_uid NOT IN (#session.ServedCamps#)

</CFIF>

ORDER BY newid()

)

subq ORDER BY subq.camp_priority

</CFQUERY>

Report · Dec 17, 2011

Your first subquery doesn't have a SELECT statement. That'll be one source of errors.

As for weighting the significance of your records before returning them "randomly", you could order them on the sum or product of two components: a weighting and a random part. A very simple (perhaps overly simple) would be to weight the records between 1-5, and then add a random component between 1-5 (or 1-10, whatever... you'll need to decide the significance of each).

I can't comment on whether newid() is slow or not... have you got a citation for that (out of interest)?

--

Adam

Report · Dec 18, 2011

Hey Adam,

I misread your original post, now I see how it was supposed to work, just posted below... got it working ... works great. hopefully it's not killing the CPU as this query will be use a LOT.. I'll just get a bigger CPU 😉

With regards to the NEWID() performance issue, I don't have a specific link, I was just hunting around different forums etc and found a few posts that had talked about performance issues, and one that had a command I'd never seen that took a percentage of records with a lot of perfomance test graphs showing it was 10 times slow than the other approach.

I'm just going to have to see how it goes

Thanks for the help

Mark

Report · Dec 18, 2011

ACS LLC wrote:
It looks like I must have the syntax wrong, it's throwing an error.
I've posted the query lower down.

Are you perhaps aiming for something like this?

SELECT *

FROM

( SELECT TOP #DisplayAmount# camp_uid,camp_title,camp_text,camp_image_type

FROM campaigns

WHERE camp_uid IN

(SELECT camptarget_camp_uid FROM CampTarget

WHERE camptarget_country = '#session.usercountry#')

AND camp_UID IN

(SELECT category_camp_UID FROM category_list

WHERE category_cat_uid IN

(SELECT aff_cat_uid FROM aff_cat

WHERE aff_cat_aff_sub_uid = #val(session.affsub)#))

AND camp_status = 1

AND camp_stack = 1

AND camp_uid NOT IN (#session.ServedCamps#)

</CFIF>

ORDER BY newid()

) subq

ORDER BY subq.camp_priority

</CFQUERY>

Report · Dec 18, 2011

Thanks for the feedback.. it works.. after a little modfication .... the only thing I found that was that I had to add camp_priority to the query otherwise it would crash

I've given it a good test and it appears to do the job.

not sure if it's chewing up the CPU though.

Thanks

Mark

Report · Dec 19, 2011

Although this is working, I am seeing something a little odd with regards to performance:

SELECT * FROM

(

SELECT TOP #DisplayAmount#

camp_uid,camp_title,camp_text,camp_image_type,camp_priority

FROM campaigns

WHERE camp_uid IN

(SELECT camptarget_camp_uid FROM CampTarget

WHERE camptarget_country = '#session.usercountry#')

AND camp_UID IN

(SELECT category_camp_UID FROM category_list

WHERE category_cat_uid IN

(SELECT aff_cat_uid FROM aff_cat

WHERE aff_cat_aff_sub_uid = #val(session.affsub)#)

)

AND camp_status = 1 AND camp_stack = 1

AND camp_uid NOT IN (#session.ServedCamps#)

</CFIF>

ORDER BY newid()

)

subq ORDER BY subq.camp_priority

</CFQUERY>

The first time I run the query the session.served.camps is empty, and it executes in 4ms or less

CACHED	false
EXECUTIONTIME	4
SQL	SELECT * FROM ( SELECT TOP 3 camp_uid,camp_title,camp_text,camp_image_type,camp_priority FROM campaigns WHERE camp_uid IN (SELECT camptarget_camp_uid FROM CampTarget WHERE camptarget_country = 'us') AND camp_UID IN (SELECT category_camp_UID FROM category_list WHERE category_cat_uid IN (SELECT aff_cat_uid FROM aff_cat WHERE aff_cat_aff_sub_uid = 2) ) AND camp_status = 1 AND camp_stack = 1 ORDER BY newid() ) subq ORDER BY subq.camp_priority

Now when I reload it so that session.served.camps has a value, I get this

EXECUTIONTIME

329

SQL

SELECT * FROM ( SELECT TOP 3 camp_uid,camp_title,camp_text,camp_image_type,camp_priority FROM campaigns WHERE camp_uid IN (SELECT camptarget_camp_uid FROM CampTarget WHERE camptarget_country = 'us') AND camp_UID IN (SELECT category_camp_UID FROM category_list WHERE category_cat_uid IN (SELECT aff_cat_uid FROM aff_cat WHERE aff_cat_aff_sub_uid = 2) ) AND camp_status = 1 AND camp_stack = 1 AND camp_uid NOT IN (4,3,7,5) ORDER BY newid() ) subq ORDER BY subq.camp_priority

Notice how it goes from 4ms or less, to 329ms.. that seems like a big increase just to add in a NOT IN section

The test dbase only has 6 records in it, and camp_UID is the primary key. I even tried to add an extra index but it made no difference

What do you think? Normal behaviour?

Thanks

Mark

Report · Dec 19, 2011

> Notice how it goes from 4ms or less, to 329ms.. that seems like a big increase just to add in a NOT IN section

> The test dbase only has 6 records in it, and camp_UID is the primary key. I even tried to add an extra index but it made no difference

> What do you think? Normal behaviour?

I think databases usually come with profiling tools for investigating this sort of thing...

--

Adam

Report · Dec 19, 2011

Not sure, wouldn't know where to start with that

Report · Dec 19, 2011

Not sure, wouldn't know where to start with that

Surely the answer to that is the same as with any situation where there's a question of "how do I..."

Type this into your browser:

http://www.google.com

After that, type in a string in the input box that reflects whatyou want to know. Press "Search"...

[etc]

--

Adam

Report · Dec 19, 2011

I did that, and it lead me back to the Adobe Forum 😉

Report · Dec 19, 2011

ACS LLC wrote:
I did that, and it lead me back to the Adobe Forum 😉

You googled for how to do database profiling on whatever DB platform you're using, and the best link was back to these forums?

[cough]bullsh!t

Go and do some research.

--

Adam

Report · Dec 19, 2011

That might have been a joke 😉

Report · Dec 19, 2011

ahhh. found the performance tools (never used it before), but when I try to open it, I get a fatal error from SQL

Might have to reinstall.. urgh

Report · Dec 19, 2011

"not in " really slows down queries. A very unintuitive alternative is to do something like this:

select etc

from etc

where somefield in

(select somefield

from etc

except

select somefield

from etc

where somefield in (that list or subquery)

)

If your db does not support the keyword except, it might support the keyword minus. If you are using mssql, it supports both but they behave differently so test early and test often.

Report · Dec 19, 2011

Thanks Dan. I had a feeling it was NOT IN... it's insane just how much of an effect it has on a query, I never knew, just stumbled across this issue

I'm using MS SQL 2008, so I guess both the commands are available to me, I've never come across or used EXCEPT or MINUS, will try to research that one.

I will definately test...test....test.. last thing I need is for this to serve the wrong data out!

From your experience have you found that the EXCEPT is approach that you suggest is much faster than NOT IN?

Thanks

Mark

Report · Dec 20, 2011

Both sample queries are slow, but it does show the difference. It's a count of patients who visited yesterday for the first time this year.

This query takes 43 seconds

select count(distinct hsc_number)

from patient join visit_fact using (hsc_number)

where registration_date = '20111219'

and hsc_number not in

(select distinct hsc_number

from visit_fact where registration_date between '20110101' and '20111218'

);

This one takes 9 seconds

select count(distinct hsc_number)

from patient join visit_fact using (hsc_number)

where registration_date = '20111219'

and hsc_number in

(select distinct hsc_number from patient join visit_fact using (hsc_number)

where registration_date = '20111219'

except

select distinct hsc_number

from visit_fact where registration_date between '20110101' and '20111218'

);

They give the same answer.

Report · Dec 20, 2011

I presume thats ms not seconds

either way, that's one heck of a difference!!!

I'm going to have to go back and do a lot of changing and testing! I had no idea that NOT IN was such a slow process

Thanks

Mark

Report · Feb 16, 2012

I thought I had this down and working, but recently discovered it was not actually working as designed.

This is what I have for the SQL

SELECT * FROM

(

SELECT TOP 5

camp_uid,camp_title,camp_text,camp_image_type,camp_posting_url,camp_

priority

FROM campaigns

WHERE

camp_status = 1

AND camp_type = 2

ORDER BY newid()

)

subq ORDER BY subq.camp_priority

</CFQUERY>

The idea being that I can find the top 5 random records putting priority first.

I have records that have the priority of

3

5

6

8

By getting the top 5, random, using priority, I would have expected records with priority

3

5

6

and 8 would not be present.

However I just ran it and I get

3

5

6

8

That's not the desired result

I am guessing the random component is somehow overriding the priority

Anybody have any ideas?

Thanks

Mark

Report · Feb 16, 2012

ACS LLC wrote:
I have records that have the priority of
3
5
6
6
6
8
By getting the top 5, random, using priority, I would have expected records with priority
3
5
6
6
6
and 8 would not be present.
However I just ran it and I get
3
5
6
6
8
That's not the desired result

From what you've explained, I think your query is behaving as expected. If you ask someone to choose 5 numbers at random out of the set {3,5,6,6,6,8}, and to sort the result, then there will 4 possible answers:

{3,5,6,6,6}

{3,5,6,6,8}

{3,6,6,6,8}

{5,6,6,6,8}

Therefore, what you obtained was just one possibility. If you run the query frequently enough, it will produce each of the 4 possibilities.

Report · Feb 17, 2012

got it. So what I am actually doing is ordering the random list, which is not what I wanted to do, what I need to do is take the top 5 ordered based on priority, randomly draw should there be records with the same priority and then order them

Adobe Community

Retrieving random record set from SQL and also applying an ORDER

1 Correct answer