8 Replies Latest reply on Jul 14, 2011 1:07 PM by insuractive

    comparing different data source tables

    wam4 Level 1

      I need to find records in one table that are not in another but the other table is in a different dsn. I've tried this so many ways, as a query of a query, referencing an array, as a loop and I'm still not getting the correct results.  I can't do a join that I'm aware of as these are different datasources. The best bet I thought would be a query of a query or a loop.  This is a backup table so the column names are the same. I've tried using #quotedvaluelist( )# in the query as well  but there are over 4,000 records and it's timing out.

       

      For my loop attempt the code looked like this but it ignores the id that is not in the other table. I was hoping the cfif statement would pull out those that did not match but it seems to skip these.  Any suggestions would be appreciated. Thanks Wendy

       

      <cfquery datasource="#dsn#" name="q1">

                select ID from table1

                 WHERE  (STATUS = 'A' or
            STATUS = 'L' or
            STATUS = 'P' or
            STATUS = 'S')
           ORDER BY convert(numeric,ID)

       

      </cfquery>

       

      <cfloop query="q1">

           <cfquery datasource="#dsn2#" name="q2">

                select ID, STATUS
              from table2      

           WHERE ID = '#trim(q1ID)#'

            </cfquery>
          <cfif #q2.recordcount# eq "0">
        <cfoutput>#q2.ID#  #q2.STATUS#</cfoutput>
          </cfif>

       

      </cfloop>

        • 1. Re: comparing different data source tables
          Adam Cameron. Level 5

          Just because they're different DSNs as far as CF's concerned doesn't mean the DBs quite possibly still can be configured to talk to each other directly.

           

          If poss, always do your data manipulation on the DB, not via CF.

           

          --

          Adam

          • 2. Re: comparing different data source tables
            wam4 Level 1

            I don't have access to the other dsn. We pull a nightly dump of that and I'm not allowed to make changes to their database. So I need to

            come up with a query that finds where our table might be off from there's. We have some old data that needs to be inactivated whereas they deleted their records. I need a query that will locate our records that are not in their table.

            • 3. Re: comparing different data source tables
              Adam Cameron. Level 5

              I'm confused.


              You say you don't have access to the "other" data - fair enough - but you also say you get a dump of it each day.  So you do have access to that, yes?  So I'm guessing this dump is a DB back-up which you're restoring, given you're accessing it with CFQUERY?

               

              Can you clarify what the DB infrastructure is here?  And what you do and do not have access to, and what access you have (eg: root or read-only, etc).

               

              And how much data are we talking?  You say you're having problems with 4000+ records, but that's just a WHERE IN limitation I'm guessing, and not related to how many records you actually need to work with.  That's not a great approach anyhow.

               

              You might look at how to bulk load data, and rather than trying to filter on long lists of values, bulk load the key values into a table, and then just do a join that way.  But you're not giving us enough info to go to really answer this.

               

              --

              Adam

              • 4. Re: comparing different data source tables
                wam4 Level 1

                I have a table called "people" on one server (not my server but I can query records from it). I have another table on another server (my server) in a different datasource (which is our back up table) called "people" as well.  I have some records that are in my table which need to be inactivated. But these records are not in the other site's server. Someone must have deleted them a while back.

                 

                I can run a query to pull all the active records in the other site's "people" table. I do not have the ability to delete, update or insert, anything in to this table just query.

                 

                My first query works fine. I need to be able to find the records that are active in my table but are missing in their table.

                 

                For instance table1 on their server contains Bob, Thom, Jill. Table1 on my server contains Bob, Nancy, Thom, Jill. Without manually reviewing 5000 records in each table, I need a query that finds "Nancy" in my table so I can deactivate that record in my table.  Does this make more sense?  Thanks for your help.

                • 5. Re: comparing different data source tables
                  Adam Cameron. Level 5

                  I have a table called "people" on one server (not my server but I can query records from it).

                   

                  And you cannot make this connection via your local DB?  It needs to be done via CF?  Have you actually investigated whether you're able to get read only access to this other DB straight from your own DB?  If "they" will let you connect via JDBC, they should be OK to make a DB to DB connection?  Even if it's a different DB platform, it'll generally be possible to get them to talk to one another.

                   

                  For instance table1 on their server contains Bob, Thom, Jill. Table1 on my server contains Bob, Nancy, Thom, Jill. Without manually reviewing 5000 records in each table, I need a query that finds "Nancy" in my table so I can deactivate that record in my table.  Does this make more sense?  Thanks for your help.

                   

                  So it's about 5000 records you need to process then?

                   

                  Can you bulk-upload the IDs to the DB?  Most DBs will allow the bulk uploading of a text file representing data.  Then you could load that into a temp table and then use a join to that in our UPDATE query.  I probably wouldn't bother with that with so few as 5000 records though.  But IANADBO, so that's not a particularly well-informed position to take.  My yardstick is just that it's some horsing around that is probably more complicated than it would need to be.

                   

                  You should be able to do that lot with a coupla UPDATE whatever SET whatever WHERE ID IN (listOfParams).  It's ugly but it'll work.  However given you mention quotedValueList() earlier, it sounds like you're hard-coding your IDs in the SQL statement.  Don't hard-code dynamic values.  Use parameters.

                   

                    What DB system(s) are you using, btw?  This is generally fairly important info when asking DB questions ;-)

                   

                  I'm surprised with only 5000-odd records a simple loop over your first query and an UPDATE query for each ID in the first query doesn't just work.

                   

                  --

                  Adam

                  • 6. Re: comparing different data source tables
                    Dan Bracuk Level 5

                    Everything Adam said about doing this at the db level is correct.  However, if that option is simply not available, there is a more efficient way of doing it than the one in your OP.

                     

                    First, instead of looping through the first query, just run q2 once.  The gist of what you want to do is:

                     

                    select id, status

                    from table 2

                    where id not in (the list from q1)

                     

                    However, 4000 records is probably too long a list.   Go to cflib.org and find a function called ListSplit.  It converts your long list into an array of short lists.  Then your query would be something like this

                     

                    select id, status

                    from table 2

                    where 1=2

                    <cfloop through your array>

                    or id not in (the list in each array element)

                    </cfloop>

                     

                    Another way, which might be quicker, would be something like this.

                     

                    <cfloop through your array>

                    select id, status

                    from table 2

                    where id not in (the list from each array element)

                    </cfloop>

                     

                    Then run a union Q of Q.  This is counterintuitive, because you are making multiple trips to the db, but it really might be quicker.  Try both methods.

                    • 7. Re: comparing different data source tables
                      Darwan Leonardo Sitepu

                      If you want to search or compare the data from the query q1 that is not contained in thequery q2, you can use this

                       

                      <cfoutput>

                      <cfquery datasource="#dsn#" name="q1">

                                select ID from table1

                                 WHERE  (STATUS = 'A' or

                            STATUS = 'L' or

                            STATUS = 'P' or

                            STATUS = 'S')

                           ORDER BY convert(numeric,ID)

                      </cfquery>

                      <cfquery datasource="#dsn2#" name="q2">

                        select ID, STATUS

                      from table2     

                      WHERE NOT ID IN (#QuotedValueList(q1.ID,",")#)

                      </cfquery>

                      <cfdump var="#q2#">

                      </cfoutput>

                      • 8. Re: comparing different data source tables
                        insuractive Level 3

                        Depending on the size of your target data table, certain SQL DB's hit a wall when trying to process long lists of text in a WHERE IN statement.  As long as you control YOUR database sever, you should be able to connect to the remote server to perform your selects from your database.  After all, all the same connection details that you used to set up your database in CF can be used to set up a remote connection to the database on your sql server.

                         

                        If that doesn't work for some reason, what about something like this:

                         

                        1) Select * from remote table and drop it into a temp table on your DB (or may a not-so-temp table if this is something that you're going to do a lot - you may want to have indexes in place)

                        2) Write a stored proc on your DB that looks for duplicates and does the heavy lifting

                        3) Drop your temp table

                        4) Done