16 Replies Latest reply on Jul 19, 2011 11:54 PM by Owain North

    How do I select single entry based on mutiple values?

    Jellyhead

      Okay – here’s one for the guys smarter than me – IE: everyone.  An interesting query.

      (CF8, MySQL5)

      I’m building an internal stock library for a large firm.  Easy job – and its always the easy ones that knock you, ain’t it? 

      Basically, its very very simple, I have three tables.

      Table 1 is a list of images (ImageTbl) – this has a lot of data on it, but the only important thing from ImageTbl is the unique identifier, RowID.  It looks something like this

      RowidPhoto
      1Gold.jpg
      2iron.jpg
      3train.jpg
      4foo.jpg
      5bar.jpg

      …Etc. for hundreds of images.

      Table 2 is a list of features (FeatureTbl) that can be applied to any image, its literally just a RowID and a Feature, IE

      RowidFeature
      1Mining
      2Coal
      3Rail Transport
      4Iron Ore
      5Road Transport
      6Refined Product
      7Gold Ore

      etc. for about 50-odd features.   Every image that is uploaded to the system is then given AT LEAST one of these categories, but it can be as many as 9 or 10. 

      This is where the fun starts….  So, maybe I’ve been given the wrong advice, but I’ve been told not to store strings, but to “normalise” the database, which means I’ve got a third table which has an image number and a feature number: (ImageFeaturesTbl)

      ImageNo FeatureNo
      11
      13
      15
      21
      27
      35
      etc100's of records

      (This is how I was told to do this, I can probably change the system if I really need to)

      Now they want to have a “search” function.  Gimme all images that feature “coal”.  Got the features list as a checklist on a form, check the feature you want, hit submit – voila! Easy – run a query on ImageFeaturesTbl  and Bob is your auntie.

      Select imageNo from ImageFeaturesTbl where feature=#FORM.FeatureNo#

      Works a treat!  Sits out all the images that  has the "Coal" featreNo.  But now, what if there are more than 2 categories? What if they want all images that has “Coal” or “gold” on it – that one I also figured out, the “OR” query.  I can do a loop and search for images on ImageFeaturesTbl that has either one or the other. So this is not the problem either.

      The fun really starts with the “and” query – select images that has both “coal” and “rail transport”… and this is where I get stuck.  I’ve literally worked till 4:00 AM this morning – and it’s a weekend; and up again at 7 trying to solve this puzzle.  I’m miserable!

      Basically I’m wanting to get from ImageFeaturesTbl a list of image numbers that contains ALL the selected features. It could be one feature (always at least one) but it could be Nine or 10.  But it has to have ALL the featurees.  So the system has to give me all the images that shows coal being transported on rail in summer at the harbour, for instance. 

      The database and tables exist, but how the hell do I write that select query….

      Thanks - I'll email the winning answer a crate of beer. 

        • 1. Re: How do I select single entry based on mutiple values?
          Owain North Level 4

          Well I was at a beer festival last night, so I'm perhaps not in the best position to offer perfect advice, but the promise of virtual beer is simply too appealing to pass up.

           

          What you could do (and I'm pretty sure MySQL supports this) is an EXISTS clause. Something like this:

           

          SELECT

            I.Rowid,

            I.Img

          FROM

            ImageTbl I

          WHERE EXISTS (

            SELECT 1

            FROM

              ImageFeaturesTbl IF

            WHERE

              IF.ImageNo = I.RowId

            AND

              IF.FeatureNo = <cfqueryparam cfsqltpe="cf_sql_numeric" value="#SearchFeature#" /> )

           

          That'll pull you back any rows for your primary example there you want one feature. For multiple features, you can simply do a CFLOOP with as many AND EXISTS () clauses in there for the other feature IDs you want.

           

          That way you will pull back only rows matching all the clauses you wanted.

           

          Did that make sense? My vision's still a little fuzzy.

          • 2. Re: How do I select single entry based on mutiple values?
            editcorp Level 1

            I'm not a MySQL person, but can't you simply join the tables and use an IN predicate?

            Something like:

             

            select distinct photo

            from ImageTbl

                     inner join ImageFeatureTbl on ImageFeatureTbl.ImageNo = ImageTbl.RowID

            where Feature in ('#form.FeatureNo#')

             

            You would want to make sure you have indexes setup to properly handle the join as well

            as do a bit of work to make sure your FeatureNo list variable is setup properly (you probably

            don't want to use the form variable directly).

             

            Good Luck

            --

            • 3. Re: How do I select single entry based on mutiple values?
              Adam Cameron. Level 5

              I'm not a MySQL person, but can't you simply join the tables and use an IN predicate?

              Something like:

               

              select distinct photo

              from ImageTbl

                       inner join ImageFeatureTbl on ImageFeatureTbl.ImageNo = ImageTbl.RowID

              where Feature in ('#form.FeatureNo#')

               

              That would get photos which had feature x OR feature y. What the OP is after is photos that have feature x AND feature y.

               

              I'm racking my brains as to whether I think think of something better than Owain's beer-soaked approach, but haven't come up with anything yet.  Other than going "mmmmm.... beer..."

               

              --

              Adam

              • 4. Re: How do I select single entry based on mutiple values?
                Dan Bracuk Level 5

                select somefields

                from table1 t1 join table3 t3 on t1.rowID = t3.rowId

                where FeatureNo in (select rowId from table2 where lower (feature) like '%coal%')

                and FeatureNo in (select rowId from table2 where lower (feature) like '%gold%')

                • 5. Re: How do I select single entry based on mutiple values?
                  Owain North Level 4

                  select somefields

                  from table1 t1 join table3 t3 on t1.rowID = t3.rowId

                  where FeatureNo in (select rowId from table2 where lower (feature) like '%coal%')

                  and FeatureNo in (select rowId from table2 where lower (feature) like '%gold%')

                   

                  I think the OP implied they were having some kind of structured search, so there's no need to search the text fields, you can just use the criteria primary keys.

                   

                  Secondly I'd always go for an EXISTS clause over an IN, as the latter has to evaluate the entire subquery and return its dataset before it can evaluate the outer query. The whole point of an EXISTS is that as soon as it finds a value, it exits the clause and doesn't do any further processing.

                   

                  You'd need to do some testing, but I'd be willing to get an IN would be much less efficient.

                   

                  This is my sober reply.

                  • 6. Re: How do I select single entry based on mutiple values?
                    Dan Bracuk Level 5

                    "Exists" and "in" are logically different.  Basing your choice on efficiency could get you the wrong answer faster.

                     

                    Our different interpretations of the OP are neither here nor there.

                    • 7. Re: How do I select single entry based on mutiple values?
                      Dan Bracuk Level 5

                      I retract my earlier post about "in" and "exists" being logically different.  I also followed Owain's invitation to test the two approaches for speed.  Here are two logically equivalent queries using redbrick software.

                       

                      select registration_number

                      from admit_fact af

                      where date = yesterday

                      and admission

                      and registration_number

                      in (select registration_number from visit_fact vf

                      where registration_date = yesterday);

                      ** STATISTICS ** (500) Time = 00:00:00.09 cp time, 00:00:00.11 time, Logical IO count=928
                      ** INFORMATION ** (256) 24 rows returned.

                       

                      select registration_number

                      from admit_fact af

                      where date = yesterday

                      and admission

                      and exists

                      (select 1 from visit_fact vf

                      where registration_date = yesterday

                      and vf.registration_number = af.registration_number);

                      ** STATISTICS ** (500) Time = 00:00:00.28 cp time, 00:00:00.29 time, Logical IO count=6112
                      ** INFORMATION ** (256) 24 rows returned.

                       

                      Using "in" is faster.

                      • 8. Re: How do I select single entry based on mutiple values?
                        Owain North Level 4

                        Simply cannot argue with those figures! Just goes to show the importance of testing your queries before putting them into production, something I do more and more these days.

                         

                        Out of interest, what platform was that on? And was that on a completely non-indexed table?

                        • 9. Re: How do I select single entry based on mutiple values?
                          Adam Cameron. Level 5

                          Simply cannot argue with those figures!

                           

                          Of course you can.  For one thing, it flies in the face of logic.  As you said: EXISTS resolves as soon as it matches, whereas IN needs to keep going to build the entire recordset.  So - all things being equal - it's going to be faster.  It's doing less work.

                           

                          I would question Dan's testing.

                           

                          How many rows were resolved by the IN subquery?  I wager "not many".  As the number of results the subquery returns decreases, the difference between the two will diminish.

                           

                          How many test runs did Dan make?  Looks like "one each".  That's no test.  Who knows what else the CPU was up to at that moment?  The testing should be done over a number of iterations.  With each iteration being parameterised differently, so the DB is not just spewing a cached result out as fast as it can.

                           

                          Was each test result based on the first run of that SQL?  Or the nth?  Because if it's the first, then the query compile time is going to have a fair impact on the results.

                           

                          Are the appropriate relationships and indexes in place?

                           

                          I don't see anything here that represents a reasonable test of the hypothesis, and the conclusions - like I said - fly in the face of common sense, so I question the experiment.

                           

                          So, yeah, one can argue with those figures.  Easily.

                           

                          --

                          Adam

                          • 10. Re: How do I select single entry based on mutiple values?
                            Dan Bracuk Level 5

                            As stated, the database is redbrick (now an IBM product).  Three indexed fields were used, admit_fact.date, admit_fact.event_code (not visible in the query), and visit_fact.registration_date.

                            • 11. Re: How do I select single entry based on mutiple values?
                              Dan Bracuk Level 5

                              In the first example, the subquery returned 24 rows.  Here it is again where the subquery returns 4.1 million rows:

                               

                              select registration_number

                              from admit_fact af

                              where date = yesterday

                              and admission

                              and registration_number

                              in (select registration_number from visit_fact vf

                              );

                              ** STATISTICS ** (500) Time = 00:00:51.06 cp time, 00:00:56.07 time, Logical IO count=136900
                              ** INFORMATION ** (256) 24 rows returned.

                              select registration_number

                              from admit_fact af

                              where date = yesterday

                              and admission

                              and exists

                              (select 1 from visit_fact vf

                              where vf.registration_number = af.registration_number

                              );

                              ** STATISTICS ** (500) Time = 00:00:00.34 cp time, 00:00:00.43 time, Logical IO count=6380
                              ** INFORMATION ** (256) 24 rows returned.

                               

                              In this case "exists" is faster.

                              • 12. Re: How do I select single entry based on mutiple values?
                                Owain North Level 4

                                Well as I say it demonstrates the need for testing if nothing else! I've done some testing with my own data here, unfortunately (?) Oracle's cost-based-optimiser is pretty clever and often notices you doing something stupid and changes the way it'll execute anyway so there's very little difference showing up between the two for me.

                                 

                                I don't have any MySQL databases to try with.

                                 

                                Still, passed some time

                                • 13. Re: How do I select single entry based on mutiple values?
                                  Jellyhead Level 1

                                  Shoo guys, you really have got me thinking here…  the joins do not work – but I’m not that familiar with the joins… Adam Cameron is right, I want an AND not an OR – the or I’ve got sorted.

                                   

                                  So far, Owain North’s 1st answer makes the most sense, but I can’t get it to work - something about incorrect syntax. 

                                  You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'IF WHERE ImageNo = I.RowId AND IF.featureno = 3 )’

                                  Owain – help, please…

                                   

                                  Owain is also correct that I need a Rowid, not a text search. 

                                   

                                  The problem I have with the “like” query is that it picks up 5, 15 and 25 when I only want 5. 

                                   

                                  This thing is going to turn my remaining hairs grey….  I do however appreciate all your help, and this just showed me I know a lot less about SQL queries than I thought I did…

                                  • 14. Re: How do I select single entry based on mutiple values?
                                    -==cfSearching==- Level 4
                                    syntax to use near 'IF WHERE ImageNo = I.RowId AND IF.featureno = 3 )’

                                     

                                    I have barely skimmed the thread. But if that is not a typo, your problem is almost certainly using IF as a table alias (usually a key word).  Try using an abbreviation that is less likely to be a reserved word.

                                    • 15. Re: How do I select single entry based on mutiple values?
                                      Owain North Level 4

                                      As pointed out, it looks like you're using IF as a table alias, which is a reserved word. Seeing as you have a table called ImageFeatures, I'd hazard a guess that's exactly what you're doing

                                      • 16. Re: How do I select single entry based on mutiple values?
                                        Jellyhead Level 1

                                        I have found a solution.  Its not elegant, its not pretty, but it fulfils the most important criteria: happy client.

                                        1. This is what I did:
                                        2. Have the search criteria (index numbers) as a list.
                                        3. Run a query looking for all images with the FIRST criteria, output result set as list.
                                        4. Run a loop from SECOND criteria to “n” containing a second query looking for images in the list and output values to new list.

                                        Ta-dah!

                                        So I have two queries, one to just get an initial result set, and one looped.

                                        Yes its ugly, but it works!