• Global community
    • Language:
      • Deutsch
      • English
      • Español
      • Français
      • Português
  • 日本語コミュニティ
    Dedicated community for Japanese speakers
  • 한국 커뮤니티
    Dedicated community for Korean speakers
Exit
0

How do I select single entry based on mutiple values?

Community Beginner ,
Jul 15, 2011 Jul 15, 2011

Copy link to clipboard

Copied

Okay – here’s one for the guys smarter than me – IE: everyone.  An interesting query.

(CF8, MySQL5)

I’m building an internal stock library for a large firm.  Easy job – and its always the easy ones that knock you, ain’t it? 

Basically, its very very simple, I have three tables.

Table 1 is a list of images (ImageTbl) – this has a lot of data on it, but the only important thing from ImageTbl is the unique identifier, RowID.  It looks something like this

RowidPhoto
1Gold.jpg
2iron.jpg
3train.jpg
4foo.jpg
5bar.jpg

…Etc. for hundreds of images.

Table 2 is a list of features (FeatureTbl) that can be applied to any image, its literally just a RowID and a Feature, IE

RowidFeature
1Mining
2Coal
3Rail Transport
4Iron Ore
5Road Transport
6Refined Product
7Gold Ore

etc. for about 50-odd features.   Every image that is uploaded to the system is then given AT LEAST one of these categories, but it can be as many as 9 or 10. 

This is where the fun starts….  So, maybe I’ve been given the wrong advice, but I’ve been told not to store strings, but to “normalise” the database, which means I’ve got a third table which has an image number and a feature number: (ImageFeaturesTbl)

ImageNo FeatureNo
11
13
15
21
27
35
etc100's of records

(This is how I was told to do this, I can probably change the system if I really need to)

Now they want to have a “search” function.  Gimme all images that feature “coal”.  Got the features list as a checklist on a form, check the feature you want, hit submit – voila! Easy – run a query on ImageFeaturesTbl  and Bob is your auntie.

Select imageNo from ImageFeaturesTbl where feature=#FORM.FeatureNo#

Works a treat!  Sits out all the images that  has the "Coal" featreNo.  But now, what if there are more than 2 categories? What if they want all images that has “Coal” or “gold” on it – that one I also figured out, the “OR” query.  I can do a loop and search for images on ImageFeaturesTbl that has either one or the other. So this is not the problem either.

The fun really starts with the “and” query – select images that has both “coal” and “rail transport”… and this is where I get stuck.  I’ve literally worked till 4:00 AM this morning – and it’s a weekend; and up again at 7 trying to solve this puzzle.  I’m miserable!

Basically I’m wanting to get from ImageFeaturesTbl a list of image numbers that contains ALL the selected features. It could be one feature (always at least one) but it could be Nine or 10.  But it has to have ALL the featurees.  So the system has to give me all the images that shows coal being transported on rail in summer at the harbour, for instance. 

The database and tables exist, but how the hell do I write that select query….

Thanks - I'll email the winning answer a crate of beer. 

TOPICS
Advanced techniques

Views

1.6K

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Jul 16, 2011 Jul 16, 2011

Copy link to clipboard

Copied

Well I was at a beer festival last night, so I'm perhaps not in the best position to offer perfect advice, but the promise of virtual beer is simply too appealing to pass up.

What you could do (and I'm pretty sure MySQL supports this) is an EXISTS clause. Something like this:

SELECT

  I.Rowid,

  I.Img

FROM

  ImageTbl I

WHERE EXISTS (

  SELECT 1

  FROM

    ImageFeaturesTbl IF

  WHERE

    IF.ImageNo = I.RowId

  AND

    IF.FeatureNo = <cfqueryparam cfsqltpe="cf_sql_numeric" value="#SearchFeature#" /> )

That'll pull you back any rows for your primary example there you want one feature. For multiple features, you can simply do a CFLOOP with as many AND EXISTS () clauses in there for the other feature IDs you want.

That way you will pull back only rows matching all the clauses you wanted.

Did that make sense? My vision's still a little fuzzy.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Explorer ,
Jul 16, 2011 Jul 16, 2011

Copy link to clipboard

Copied

I'm not a MySQL person, but can't you simply join the tables and use an IN predicate?

Something like:

select distinct photo

from ImageTbl

         inner join ImageFeatureTbl on ImageFeatureTbl.ImageNo = ImageTbl.RowID

where Feature in ('#form.FeatureNo#')

You would want to make sure you have indexes setup to properly handle the join as well

as do a bit of work to make sure your FeatureNo list variable is setup properly (you probably

don't want to use the form variable directly).

Good Luck

--

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Jul 16, 2011 Jul 16, 2011

Copy link to clipboard

Copied

I'm not a MySQL person, but can't you simply join the tables and use an IN predicate?

Something like:

select distinct photo

from ImageTbl

         inner join ImageFeatureTbl on ImageFeatureTbl.ImageNo = ImageTbl.RowID

where Feature in ('#form.FeatureNo#')

That would get photos which had feature x OR feature y. What the OP is after is photos that have feature x AND feature y.

I'm racking my brains as to whether I think think of something better than Owain's beer-soaked approach, but haven't come up with anything yet.  Other than going "mmmmm.... beer..."

--

Adam

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Jul 16, 2011 Jul 16, 2011

Copy link to clipboard

Copied

select somefields

from table1 t1 join table3 t3 on t1.rowID = t3.rowId

where FeatureNo in (select rowId from table2 where lower (feature) like '%coal%')

and FeatureNo in (select rowId from table2 where lower (feature) like '%gold%')

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Jul 18, 2011 Jul 18, 2011

Copy link to clipboard

Copied

select somefields

from table1 t1 join table3 t3 on t1.rowID = t3.rowId

where FeatureNo in (select rowId from table2 where lower (feature) like '%coal%')

and FeatureNo in (select rowId from table2 where lower (feature) like '%gold%')

I think the OP implied they were having some kind of structured search, so there's no need to search the text fields, you can just use the criteria primary keys.

Secondly I'd always go for an EXISTS clause over an IN, as the latter has to evaluate the entire subquery and return its dataset before it can evaluate the outer query. The whole point of an EXISTS is that as soon as it finds a value, it exits the clause and doesn't do any further processing.

You'd need to do some testing, but I'd be willing to get an IN would be much less efficient.

This is my sober reply.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Jul 18, 2011 Jul 18, 2011

Copy link to clipboard

Copied

"Exists" and "in" are logically different.  Basing your choice on efficiency could get you the wrong answer faster.

Our different interpretations of the OP are neither here nor there.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Jul 18, 2011 Jul 18, 2011

Copy link to clipboard

Copied

I retract my earlier post about "in" and "exists" being logically different.  I also followed Owain's invitation to test the two approaches for speed.  Here are two logically equivalent queries using redbrick software.

select registration_number

from admit_fact af

where date = yesterday

and admission

and registration_number

in (select registration_number from visit_fact vf

where registration_date = yesterday);

** STATISTICS ** (500) Time = 00:00:00.09 cp time, 00:00:00.11 time, Logical IO count=928
** INFORMATION ** (256) 24 rows returned.

select registration_number

from admit_fact af

where date = yesterday

and admission

and exists

(select 1 from visit_fact vf

where registration_date = yesterday

and vf.registration_number = af.registration_number);

** STATISTICS ** (500) Time = 00:00:00.28 cp time, 00:00:00.29 time, Logical IO count=6112
** INFORMATION ** (256) 24 rows returned.

Using "in" is faster.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Jul 18, 2011 Jul 18, 2011

Copy link to clipboard

Copied

Simply cannot argue with those figures! Just goes to show the importance of testing your queries before putting them into production, something I do more and more these days.

Out of interest, what platform was that on? And was that on a completely non-indexed table?

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Jul 18, 2011 Jul 18, 2011

Copy link to clipboard

Copied

Simply cannot argue with those figures!

Of course you can.  For one thing, it flies in the face of logic.  As you said: EXISTS resolves as soon as it matches, whereas IN needs to keep going to build the entire recordset.  So - all things being equal - it's going to be faster.  It's doing less work.

I would question Dan's testing.

How many rows were resolved by the IN subquery?  I wager "not many".  As the number of results the subquery returns decreases, the difference between the two will diminish.

How many test runs did Dan make?  Looks like "one each".  That's no test.  Who knows what else the CPU was up to at that moment?  The testing should be done over a number of iterations.  With each iteration being parameterised differently, so the DB is not just spewing a cached result out as fast as it can.

Was each test result based on the first run of that SQL?  Or the nth?  Because if it's the first, then the query compile time is going to have a fair impact on the results.

Are the appropriate relationships and indexes in place?

I don't see anything here that represents a reasonable test of the hypothesis, and the conclusions - like I said - fly in the face of common sense, so I question the experiment.

So, yeah, one can argue with those figures.  Easily.

--

Adam

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Jul 18, 2011 Jul 18, 2011

Copy link to clipboard

Copied

In the first example, the subquery returned 24 rows.  Here it is again where the subquery returns 4.1 million rows:

select registration_number

from admit_fact af

where date = yesterday

and admission

and registration_number

in (select registration_number from visit_fact vf

);

** STATISTICS ** (500) Time = 00:00:51.06 cp time, 00:00:56.07 time, Logical IO count=136900
** INFORMATION ** (256) 24 rows returned.

select registration_number

from admit_fact af

where date = yesterday

and admission

and exists

(select 1 from visit_fact vf

where vf.registration_number = af.registration_number

);

** STATISTICS ** (500) Time = 00:00:00.34 cp time, 00:00:00.43 time, Logical IO count=6380
** INFORMATION ** (256) 24 rows returned.

In this case "exists" is faster.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Jul 18, 2011 Jul 18, 2011

Copy link to clipboard

Copied

Well as I say it demonstrates the need for testing if nothing else! I've done some testing with my own data here, unfortunately (?) Oracle's cost-based-optimiser is pretty clever and often notices you doing something stupid and changes the way it'll execute anyway so there's very little difference showing up between the two for me.

I don't have any MySQL databases to try with.

Still, passed some time

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Jul 18, 2011 Jul 18, 2011

Copy link to clipboard

Copied

As stated, the database is redbrick (now an IBM product).  Three indexed fields were used, admit_fact.date, admit_fact.event_code (not visible in the query), and visit_fact.registration_date.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Beginner ,
Jul 19, 2011 Jul 19, 2011

Copy link to clipboard

Copied

Shoo guys, you really have got me thinking here…  the joins do not work – but I’m not that familiar with the joins… Adam Cameron is right, I want an AND not an OR – the or I’ve got sorted.

So far, Owain North’s 1st answer makes the most sense, but I can’t get it to work - something about incorrect syntax. 

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'IF WHERE ImageNo = I.RowId AND IF.featureno = 3 )’

Owain – help, please…

Owain is also correct that I need a Rowid, not a text search. 

The problem I have with the “like” query is that it picks up 5, 15 and 25 when I only want 5. 

This thing is going to turn my remaining hairs grey….  I do however appreciate all your help, and this just showed me I know a lot less about SQL queries than I thought I did…

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Valorous Hero ,
Jul 19, 2011 Jul 19, 2011

Copy link to clipboard

Copied

syntax to use near 'IF WHERE ImageNo = I.RowId AND IF.featureno = 3 )’

I have barely skimmed the thread. But if that is not a typo, your problem is almost certainly using IF as a table alias (usually a key word).  Try using an abbreviation that is less likely to be a reserved word.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Jul 19, 2011 Jul 19, 2011

Copy link to clipboard

Copied

As pointed out, it looks like you're using IF as a table alias, which is a reserved word. Seeing as you have a table called ImageFeatures, I'd hazard a guess that's exactly what you're doing

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Beginner ,
Jul 20, 2011 Jul 20, 2011

Copy link to clipboard

Copied

LATEST

I have found a solution.  Its not elegant, its not pretty, but it fulfils the most important criteria: happy client.

  1. This is what I did:
  2. Have the search criteria (index numbers) as a list.
  3. Run a query looking for all images with the FIRST criteria, output result set as list.
  4. Run a loop from SECOND criteria to “n” containing a second query looking for images in the list and output values to new list.

Ta-dah!

So I have two queries, one to just get an initial result set, and one looped.

Yes its ugly, but it works!

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Resources
Documentation