Copy link to clipboard
Copied
I need a query that is capable of obtaining X amount of random records from a SQL2008 database, but I also want those results to then order by a numberic field in the same table, camp_priority
I have
SELECT TOP #DisplayAmount#,camp_uid,camp_title,camp_text,camp_image_type | |
FROM campaigns | |
WHERE camp_uid IN |
etc..etc
then at the bottom I have
ORDER BY newid(),camp_priority
The problem seems to be that I'm getting random records, but it's not ordering by the camp_priority
Anybody have any ideas?
Thanks
Mark
Hey Adam,
I misread your original post, now I see how it was supposed to work, just posted below... got it working ... works great. hopefully it's not killing the CPU as this query will be use a LOT.. I'll just get a bigger CPU 😉
With regards to the NEWID() performance issue, I don't have a specific link, I was just hunting around different forums etc and found a few posts that had talked about performance issues, and one that had a command I'd never seen that took a percentage of records with a
...Copy link to clipboard
Copied
I have an idea. Take the newid() out of the order by clause.
Mind you, that only solves the order by problem. I don't share your opinion that you are getting random records. You'll have to give that one some more thought.
Copy link to clipboard
Copied
I'd wanna check the performance of it before running with it as a solution, but this [sort of thing] should work:
SELECT *
FROM (
-- your actual query here
ORDER BY NEWID()
) subq
ORDER BY subq.columnYouWantToOrderBy
--
Adam
Copy link to clipboard
Copied
It looks like I must have the syntax wrong, it's throwing an error.
I've posted the query lower down.
I had to pause for some thought after posting this thread, because the logic that I'm trying to apply to this is probably not going to serve it's purpose. On one hand I want to draw random records from the campaigns table, let's say that's 6 random records that meet the required criteria of correct country, category etc. On the other hand I also want to use my camp_priority to push campaigns up the list.
The issue is - Let's suppose I have 100 campaigns in there with mixed priority levels from 1 to 10, 1 being the highest, and I randomly pick 6 of them and then order them by priority, it won't be doing much justice to the priority set up because it could technically randomly pick out 6 x priority=10. and some 9's... If I have it so it picks 6 random ones of the highest priority then that also causes an issue because you'd never get to see the lower priority campaigns.
I think the best way to handle this is to set all camp_priority to 5 so they are all level, and then anything we want to push up we then set to a higher level 1 to 4, so if we just set 2 campaigns to priority level 1 and 2 then these two would take up the first two slots due to their higher priority and the next 4 slots would be random as they are all same priority, after yapping on so much about this, I think your suggestion would work, if I could get it to function
btw I read that NEWID() has rather slow performance, but finding another solution looked way to complicated for me.
<CFQUERY name="GetCamp" DATASOURCE="#datasource#">
SELECT TOP #DisplayAmount# camp_uid,camp_title,camp_text,camp_image_type
FROM (campaigns
WHERE camp_uid IN
<!--- MATCH COUNTRY --->
(SELECT camptarget_camp_uid FROM CampTarget
WHERE camptarget_country = '#session.usercountry#')
AND camp_UID IN
<!--- ALL CATEGORIES THAT THE CAMPAIGN IS IN --->
(SELECT category_camp_UID FROM category_list
WHERE category_cat_uid IN
<!--- BASED ON CATEGORIES REQUIRED FOR THIS TRACKING ACCOUNT UID --->
(SELECT aff_cat_uid FROM aff_cat
WHERE aff_cat_aff_sub_uid = #val(session.affsub)#)
)
AND camp_status = 1 AND camp_stack = 1
<CFIF #ListLen(session.servedcamps)# GT "0">
AND camp_uid NOT IN (#session.ServedCamps#)
</CFIF>
ORDER BY newid()
)
subq ORDER BY subq.camp_priority
</CFQUERY>
Copy link to clipboard
Copied
Your first subquery doesn't have a SELECT statement. That'll be one source of errors.
As for weighting the significance of your records before returning them "randomly", you could order them on the sum or product of two components: a weighting and a random part. A very simple (perhaps overly simple) would be to weight the records between 1-5, and then add a random component between 1-5 (or 1-10, whatever... you'll need to decide the significance of each).
I can't comment on whether newid() is slow or not... have you got a citation for that (out of interest)?
--
Adam
Copy link to clipboard
Copied
Hey Adam,
I misread your original post, now I see how it was supposed to work, just posted below... got it working ... works great. hopefully it's not killing the CPU as this query will be use a LOT.. I'll just get a bigger CPU 😉
With regards to the NEWID() performance issue, I don't have a specific link, I was just hunting around different forums etc and found a few posts that had talked about performance issues, and one that had a command I'd never seen that took a percentage of records with a lot of perfomance test graphs showing it was 10 times slow than the other approach.
I'm just going to have to see how it goes
Thanks for the help
Mark
Copy link to clipboard
Copied
ACS LLC wrote:
It looks like I must have the syntax wrong, it's throwing an error.
I've posted the query lower down.
Are you perhaps aiming for something like this?
<CFQUERY name="GetCamp" DATASOURCE="#datasource#">
SELECT *
FROM
( SELECT TOP #DisplayAmount# camp_uid,camp_title,camp_text,camp_image_type
FROM campaigns
WHERE camp_uid IN
(SELECT camptarget_camp_uid FROM CampTarget
WHERE camptarget_country = '#session.usercountry#')
AND camp_UID IN
(SELECT category_camp_UID FROM category_list
WHERE category_cat_uid IN
(SELECT aff_cat_uid FROM aff_cat
WHERE aff_cat_aff_sub_uid = #val(session.affsub)#))
AND camp_status = 1
AND camp_stack = 1
<CFIF #ListLen(session.servedcamps)# GT "0">
AND camp_uid NOT IN (#session.ServedCamps#)
</CFIF>
ORDER BY newid()
) subq
ORDER BY subq.camp_priority
</CFQUERY>
Copy link to clipboard
Copied
Thanks for the feedback.. it works.. after a little modfication .... the only thing I found that was that I had to add camp_priority to the query otherwise it would crash
I've given it a good test and it appears to do the job.
not sure if it's chewing up the CPU though.
Thanks
Mark
Copy link to clipboard
Copied
Although this is working, I am seeing something a little odd with regards to performance:
<CFQUERY name="GetCamp" DATASOURCE="#datasource#">
SELECT * FROM
(
SELECT TOP #DisplayAmount#
camp_uid,camp_title,camp_text,camp_image_type,camp_priority
FROM campaigns
WHERE camp_uid IN
<!--- MATCH COUNTRY --->
(SELECT camptarget_camp_uid FROM CampTarget
WHERE camptarget_country = '#session.usercountry#')
AND camp_UID IN
<!--- ALL CATEGORIES THAT THE CAMPAIGN IS IN --->
(SELECT category_camp_UID FROM category_list
WHERE category_cat_uid IN
<!--- BASED ON CATEGORIES REQUIRED FOR THIS TRACKING ACCOUNT UID --->
(SELECT aff_cat_uid FROM aff_cat
WHERE aff_cat_aff_sub_uid = #val(session.affsub)#)
)
AND camp_status = 1 AND camp_stack = 1
<CFIF #ListLen(session.servedcamps)# GT "0">
AND camp_uid NOT IN (#session.ServedCamps#)
</CFIF>
ORDER BY newid()
)
subq ORDER BY subq.camp_priority
</CFQUERY>
The first time I run the query the session.served.camps is empty, and it executes in 4ms or less
CACHED | false |
EXECUTIONTIME | 4 |
SQL | SELECT * FROM ( SELECT TOP 3 camp_uid,camp_title,camp_text,camp_image_type,camp_priority FROM campaigns WHERE camp_uid IN (SELECT camptarget_camp_uid FROM CampTarget WHERE camptarget_country = 'us') AND camp_UID IN (SELECT category_camp_UID FROM category_list WHERE category_cat_uid IN (SELECT aff_cat_uid FROM aff_cat WHERE aff_cat_aff_sub_uid = 2) ) AND camp_status = 1 AND camp_stack = 1 ORDER BY newid() ) subq ORDER BY subq.camp_priority |
Now when I reload it so that session.served.camps has a value, I get this
EXECUTIONTIME | 329 |
SQL | SELECT * FROM ( SELECT TOP 3 camp_uid,camp_title,camp_text,camp_image_type,camp_priority FROM campaigns WHERE camp_uid IN (SELECT camptarget_camp_uid FROM CampTarget WHERE camptarget_country = 'us') AND camp_UID IN (SELECT category_camp_UID FROM category_list WHERE category_cat_uid IN (SELECT aff_cat_uid FROM aff_cat WHERE aff_cat_aff_sub_uid = 2) ) AND camp_status = 1 AND camp_stack = 1 AND camp_uid NOT IN (4,3,7,5) ORDER BY newid() ) subq ORDER BY subq.camp_priority |
Notice how it goes from 4ms or less, to 329ms.. that seems like a big increase just to add in a NOT IN section
The test dbase only has 6 records in it, and camp_UID is the primary key. I even tried to add an extra index but it made no difference
What do you think? Normal behaviour?
Thanks
Mark
Copy link to clipboard
Copied
> Notice how it goes from 4ms or less, to 329ms.. that seems like a big increase just to add in a NOT IN section
> The test dbase only has 6 records in it, and camp_UID is the primary key. I even tried to add an extra index but it made no difference
> What do you think? Normal behaviour?
I think databases usually come with profiling tools for investigating this sort of thing...
--
Adam
Copy link to clipboard
Copied
Not sure, wouldn't know where to start with that
Copy link to clipboard
Copied
Not sure, wouldn't know where to start with that
Surely the answer to that is the same as with any situation where there's a question of "how do I..."
Type this into your browser:
After that, type in a string in the input box that reflects whatyou want to know. Press "Search"...
[etc]
--
Adam
Copy link to clipboard
Copied
I did that, and it lead me back to the Adobe Forum 😉
Copy link to clipboard
Copied
ACS LLC wrote:
I did that, and it lead me back to the Adobe Forum 😉
You googled for how to do database profiling on whatever DB platform you're using, and the best link was back to these forums?
[cough]bullsh!t
Go and do some research.
--
Adam
Copy link to clipboard
Copied
That might have been a joke 😉
Copy link to clipboard
Copied
ahhh. found the performance tools (never used it before), but when I try to open it, I get a fatal error from SQL
Might have to reinstall.. urgh
Copy link to clipboard
Copied
"not in " really slows down queries. A very unintuitive alternative is to do something like this:
select etc
from etc
where somefield in
(select somefield
from etc
except
select somefield
from etc
where somefield in (that list or subquery)
)
If your db does not support the keyword except, it might support the keyword minus. If you are using mssql, it supports both but they behave differently so test early and test often.
Copy link to clipboard
Copied
Thanks Dan. I had a feeling it was NOT IN... it's insane just how much of an effect it has on a query, I never knew, just stumbled across this issue
I'm using MS SQL 2008, so I guess both the commands are available to me, I've never come across or used EXCEPT or MINUS, will try to research that one.
I will definately test...test....test.. last thing I need is for this to serve the wrong data out!
From your experience have you found that the EXCEPT is approach that you suggest is much faster than NOT IN?
Thanks
Mark
Copy link to clipboard
Copied
Both sample queries are slow, but it does show the difference. It's a count of patients who visited yesterday for the first time this year.
This query takes 43 seconds
select count(distinct hsc_number)
from patient join visit_fact using (hsc_number)
where registration_date = '20111219'
and hsc_number not in
(select distinct hsc_number
from visit_fact where registration_date between '20110101' and '20111218'
);
This one takes 9 seconds
select count(distinct hsc_number)
from patient join visit_fact using (hsc_number)
where registration_date = '20111219'
and hsc_number in
(select distinct hsc_number from patient join visit_fact using (hsc_number)
where registration_date = '20111219'
except
select distinct hsc_number
from visit_fact where registration_date between '20110101' and '20111218'
);
They give the same answer.
Copy link to clipboard
Copied
I presume thats ms not seconds
either way, that's one heck of a difference!!!
I'm going to have to go back and do a lot of changing and testing! I had no idea that NOT IN was such a slow process
Thanks
Mark
Copy link to clipboard
Copied
I thought I had this down and working, but recently discovered it was not actually working as designed.
This is what I have for the SQL
<CFQUERY name="GetCamp" DATASOURCE="#datasource#">
SELECT * FROM
(
SELECT TOP 5
camp_uid,camp_title,camp_text,camp_image_type,camp_posting_url,camp_
priority
FROM campaigns
WHERE
camp_status = 1
AND camp_type = 2
ORDER BY newid()
)
subq ORDER BY subq.camp_priority
</CFQUERY>
The idea being that I can find the top 5 random records putting priority first.
I have records that have the priority of
3
5
6
6
6
8
By getting the top 5, random, using priority, I would have expected records with priority
3
5
6
6
6
and 8 would not be present.
However I just ran it and I get
3
5
6
6
8
That's not the desired result
I am guessing the random component is somehow overriding the priority
Anybody have any ideas?
Thanks
Mark
Copy link to clipboard
Copied
ACS LLC wrote:
I have records that have the priority of
3
5
6
6
6
8
By getting the top 5, random, using priority, I would have expected records with priority
3
5
6
6
6
and 8 would not be present.
However I just ran it and I get
3
5
6
6
8
That's not the desired result
From what you've explained, I think your query is behaving as expected. If you ask someone to choose 5 numbers at random out of the set {3,5,6,6,6,8}, and to sort the result, then there will 4 possible answers:
{3,5,6,6,6}
{3,5,6,6,8}
{3,6,6,6,8}
{5,6,6,6,8}
Therefore, what you obtained was just one possibility. If you run the query frequently enough, it will produce each of the 4 possibilities.
Copy link to clipboard
Copied
got it. So what I am actually doing is ordering the random list, which is not what I wanted to do, what I need to do is take the top 5 ordered based on priority, randomly draw should there be records with the same priority and then order them