7 Replies Latest reply on May 8, 2007 10:13 AM by coastercam

    How can I achieve this with less code or faster?

    silstorm
      Hi Folks,

      I have 2 database tables, 'products' and 'products_staging'. Every day, the 'products_staging' table is updated with a fresh batch of products (which are imported from xml files in a seperate process using Navicat software). I need to somehow compare this database with the 'products' database to look for specific changes - i.e changes in price, deleted products, or new products. As some of the fields in the 'products' database are manually altered, such as the product name, I cannot do a straight sync with them. Therefore, I use the only field that will remain unchanged as the comparison field.

      The somewhat crude method I have come up with is to loop through the 'products_staging' table, looking for that specific field against the 'products' table. If it finds it, it then checks for a change in the price field (i.e has the price of that product changed). If there is a change, the 'products' table is updated. Likewise, if it's not found, it's treated as a new product and inserted as such and so on. At the end of the loop, the record is deleted from the 'products_staging' table - therefore at the end of the process, the table is empty ready for the next import the following day.

      At the moment, the database handles around 300,000 rows - but at some point I'm likely to be dealing with many more than that. It's working as it is, but taking a considerable time to do so - around 4 hours to go through the lot.

      Any suggestions on a better way of accomplishing this? My existing crude code is pasted below (the use of maxrows and the redirect is to prevent time-outs - parse2.cfm is virtually identical to this);

        • 1. Re: How can I achieve this with less code or faster?
          cf_dev2 Level 1
          silstorm,

          I only skimmed your code, so I may be wrong. But couldn't you do this with a few sql statements? This is untested and obviously simplified but something like this (MS SQL syntax)

          • 2. Re: How can I achieve this with less code or faster?
            Level 7
            Firstly, if possible, get the DBs to talk to each other without involving
            CF. That will just be slowing things down. CF's for generating HTML
            pages, and whilst it CAN interact with databases, it's not the best tool
            for the job.

            If that's not possible, use CF as a bridge between them, but only to bulk
            insert all the staging data into a temporary table (either literally a
            temporary table, or a permanent one used for holding data temporarily ;-)
            on the production DB, and then use a DB procedure to process all the data,
            again using only the DB server to do so.

            Lastly, if you MUST use CF to do the data manipulation, then minimise your
            hits to the DB. Instead of doing single hits to the production DB in your
            prodconnect query, get ALL the data you're likely to need for updates in
            one hit, then use CF to generate a payload for a bulk update; similarly any
            missing data from production can be inferred from that the difference
            between the staging and update data, and that can be used as your source of
            payload for a bulk insert.

            As a rule of thumb, if I find myself needing to perform single-row queries
            within a loop of another query, I start thinking I'm doing something wrong.

            Oh, yeah: when dealing with large amounts of queries like that, ALWAYS use
            <cfqueryparam> tags instead of static SQL strings. Every non-parameterised
            query you pass to the DB engine will need to be compiled first, and then
            it's also cached. Which slows you down, and eats memory on the DB server.

            This will leave the problem that you're not going to be generating your
            report as you go. However I imagine that's a "nice to have" compared to
            ensuring the thing actually works. You can generate your report
            separately, after the data processing has been done. I'm fairly certain
            you would be able to coerce a log out of the DB server, somehow, anyway.

            --
            Adam
            • 3. Re: How can I achieve this with less code or faster?
              cf_dev2 Level 1
              >Firstly, if possible, get the DBs to talk to each other without involving
              >CF. That will just be slowing things down. CF's for generating HTML
              >pages, and whilst it CAN interact with databases, it's not the best tool
              >for the job.

              I was focused on eliminating the looping first. But I agree completely.

              The poster mentioned that the data was imported daily. Both of the cfquery's appear to use the same datasource. So I assumed (perhaps falsely) that the the tables were either in the same database or that databases could already communicate with each other.

              >You can generate your report separately, after the data processing has
              > been done. I'm fairly certain you would be able to coerce a log out of the
              > DB server, somehow, anyway.

              Agreed. As for the rest of your comments...

              "Yeah. What he said" ;-)

              • 4. Re: How can I achieve this with less code or faster?
                silstorm Level 1
                Thanks to both of you for your replies. Naturally, if I can take CF out of the equation it would be the best route, so I will experiment with direct database queries (new ground for me!)

                It's almost 1am here now and much beer has been consumed, so I will look into what you have both said tomorrow and update accordingly. I wanted to thank you both for your input before I headed off for the night.
                • 5. Re: How can I achieve this with less code or faster?
                  BKBK Adobe Community Professional & MVP
                  much beer has been consumed
                  Tut tut tut. Doesn't fire the Coldfusion neurons as well as coffee.



                  • 6. Re: How can I achieve this with less code or faster?
                    Level 7
                    > much beer has been consumed
                    > Tut tut tut. Doesn't fire the Coldfusion neurons as well as coffee.

                    ;-)

                    I'm not sure that @ 1am on a Saturday night / Sunday morning the CF neurons
                    are *supposed* to be firing! Beer sounds like a much more appropriate
                    notion.

                    --
                    Adam
                    • 7. Re: How can I achieve this with less code or faster?
                      coastercam
                      silstorm

                      I would take CF out of the process completely. Below is some TRANSACT_SQL that can be run on the database to do this job.
                      It will look for change in description, change in price, Added records to product table that are new to staging table, deleted records from the products table that are missing from the staging table, and clear the staging table.

                      It requires a table to archive the deleted products rows.
                      Also, the productID cannot be changed (Primary Key)

                      You should later add error trapping with COMMIT TRANSACTION on no errors.

                      Hope this helps. MUCH FASTER!!

                      CREATE PROCEDURE [dbo].[testStaging] AS

                      --PRODUCT_STAGE DESC CHANGE
                      UPDATE product
                      set .product.ProdDesc =product_stage.ProdDesc
                      from product inner join product_stage on product.prodID = product_stage.prodID
                      where product.prodDesc <> product_stage.prodDesc

                      --PRODUCT_STAGE PRICE CHANGE
                      UPDATE product
                      set product.ProdPrice = product_stage.ProdPrice
                      from product inner join .product_stage on product.prodID = product_stage.prodID
                      where product.ProdPrice <> product_stage.ProdPrice

                      --NEW RECORDS IN PRODUCT_STAGE
                      INSERT into product
                      select ps.prodID, ps.ProdDesc, ps.prodPrice
                      from product_Stage ps where ps.prodID not IN(select prodID from Product)



                      --MISSING RECORDS IN PRODUCT_STAGE
                      --MOVE DELETE RECORD TO ARCHIVE TABLE
                      INSERT into product_delArc
                      select prodID, ProdDesc, prodPrice, USER, GETDATE()
                      from product where prodID not IN(select prodID from Product_Stage)

                      --DELETE RECORD FROM PRODUCT TABLE
                      DELETE
                      from product where prodID not IN(select prodID from Product_Stage)


                      --CLEAR PRODUCT_STAGE
                      delete product_stage from
                      product_stage ps inner join product p on p.prodID = ps.prodID
                      GO