4 Replies Latest reply on Feb 16, 2018 11:21 AM by seasonedweb

    Importing GEDCOM DB Files to SQL Server

    seasonedweb Level 1



      I successfully imported 2500+ records (individuals) from the Ancestry.com site in gedcom format to SQL server using "GEDMagic".


      Unfortunately, the individual/spouse connection is not imported.


      However, they were imported into SQL Server in PK pairs- John Smith is PK ID 200 & his spouse, Mary is PK ID 201.


      Each pair is one PK number + or -


      I am trying to avoid physically entering 2500 corresponding spouse ID's- I only have so much time left on earth.


      My question is- has anyone had experience dealing with importing GEDCOM db files to SQL, and/or at least advise a strategy to select the individual and his/her spouse easily with a query?


      I tried


      <cfquery name="get_ID" datasource="#Request.BaseDSN#">

        SELECT ged_ID

        FROM ged_main



      <CFSET plusone =#get_ID.ged_ID# + 1>

        <CFSET minusone =#get_ID.ged_ID# - 1>



      <cfquery name="find_spouse" datasource="#Request.BaseDSN#">

      SELECT *

      FROM ged_main

      WHERE ged_ID = #plusone#

      OR ged_ID = #minusone#


      ORDER BY ged_lastname ASC</cfquery>


      No luck with this-


      Thanks in advance



        • 1. Re: Importing GEDCOM DB Files to SQL Server
          seasonedweb Level 1

          Sorry- forgot to add each individual record has the PK ID and the spouse_ID columns.




          John Smith ID 200, Spouse ID 201


          Mary (Jones) Smith ID 201 Spouse ID 200

          • 2. Re: Importing GEDCOM DB Files to SQL Server
            charlie@carehart.org Adobe Community Professional & MVP

            N.B., here's good news. I’m pretty sure I can both explain why this “didn’t work” and how to fix it.


            While I know nothing about the gedcom data, there are a couple of things just not right about your approach (which have nothing to do with the specifics of the particular data, really).


            And I’d also like to show you an even better way to retrieve the records, which once it makes sense should make it a LOT easier for you to do the kind of manipulation of this data that I suspect you will want. :-)


            Let’s take things one step at a time.


            1) When you say “no luck with this”, do you mean you got an error? Or that it “worked” but you did not see the result you expect? It would help to be more clear when asking here (because people might presume incorrectly and answer with something that does not apply to you. Even I am risking that.)


            2) But I will assume there was NO error, and it may be that you are just not seeing any result. That could be because you are not doing anything with the result of the second query. So how would you know then if it worked or not?


            But perhaps you really did do something to output the "result", and chose not to show us. It would be helpful for us, though, if nothing else to see what you intended to output from the queries, especially for those of us who don’t know that data.


            And if you did have something outputting the results of the second query, you may not like the result.


            3) Well, for one thing I suspect you would see only one record (if you did output the results), and perhaps THAT'S why you feel things did not work.


            The problem there is that while your first query does get all the records, your code after that (the cfset’s and the cfquery) are going to execute only ONCE (and so those CFSET values will get only the LAST record of the first query result).


            It would seem you meant to do that pair of CFSETs and the second cfquery as an operation to be performed logically on EACH result from the first query. Is that right? If so, then to do that you would want to wrap all that (the CFSETs and the 2nd CFQUERY) in a CFOUTPUT QUERY=”get_id” that would loop over the first result (or technically, if you were not outputting anything, you could do CFLOOP QUERY=”get_id”, but you will be outputting something, I’m sure).


            And that may be your main issue, in which case you may be tempted to stop. But if you have time, do read on.


            4) Because really, whenever you do a "cfquery in a loop over results from some previous query", that’s nearly always a clue that there’s likely a better solution, in terms of doing the work better in SQL (and in the database), by doing a SQL "JOIN" instead.


            What does that mean? Well, before showing it for YOUR case (of a single table), it might make more sense to talk about a more traditional setup of 2 tables.


            Let's say you were for instance taking the productid from a products table and wanted to find all the orders for that in the orders table, where it too had a productid. (As you are indicating yourself, the productid in products would be its “pk”, its primary key, and the productid in orders would be called a “foreign key”.) The point is that the query would want to find any orders in the orders table where the productid was the same as a productid under consideration in the products table. Let’s say there was an orderdate in orders you wanted to see.


            And for that, one could do:


            select p.productid, p.productname, o.orderdate

            from products p inner join orders o

            on p.productid=o.productid

            order by p.productname


            And out would come a list of each product by name and any orderdates for that. (If a product had multiple orders, the productname would be listed repeatedly on each line with the different orderdates. There’s a feature in CFML to group such output, in this case by productname, using a GROUP=”productname” on the CFOUTPUT QUERY tag, and then to use a nested CFOUTPUT (no attributes) within that loop to then get the "order" data for each. But that goes beyond the scope of this reply for you. You can look it up for more info.)


            Before moving on, notice the use of “p” as an alias for products in front of the column names, and as created on the “from products p”, which could be written in most DBs also as “products AS p”.  I did the same for orders (“o”), and the reason this is needed is that since the productid is in both tables, the DB needs to know “which” you need.


            5) The problem in your case is there is no second table, so you may have even known about the above but didn’t see how to use it for your single table. But what you want here is sometimes called a self-join, as we are joining the table to itself. What we’re joining then is the different records which again share some common key/id. You said that the PK is ged_id, and your second note said that if someone has a spouse then that spouse’s spouse_id holds the ged_id value of their spouse.


            As such, we just need to join the one table on these two keys. Before showing that, let’s say also that you would then want to output the first and last name of each person in the table and also those of their spouse (they could have different names, potentially). Again, knowing only what I know of the gedcom data from your note here, I’m guessing something like this would work:


            select p.ged_firstname, p.ged_lastname, s.ged_firstname, s.ged_lastname

            from ged_main p inner join ged_main s

            on p.ged_id=s.spouse_id

            order by p.get_lastname


            Note that it’s still an inner join (not a “self join” syntax, per se, though perhaps some DBs have one).


            And note especially how the same ged_main table is referenced, but first as a table alias “p” (for “person”, conceptually) and then as a table “s” (for “spouse”). Again, yes it’s the same table, but looked at in two ways. And thus the columns obtained are logically “from each table” (or technically, from the different records for each person, joined when they do have a spouse.)


            Hope that makes sense. You could then just output those columns, and see each person and their spouse. (What if they don't have a spouse? They won't show up in the list at all. More on that in a moment.)


            6) Before trying this on your own, you may want to consider limiting the rows in some way, so you don’t create a huge query and output (especially if you make a mistake and somehow create what’s called a cartesian product).


            There are many ways to limit the result of a query. Perhaps the easiest to start with would be adding a WHERE clause to limit what you get, such as to get only people whose last names start with “a”:


            where p.ged_lastname like ‘a%’


            Or you may know you that in some DBs can use a TOP keyword, to get only the first “x” results, like:


            select top 10 p.ged_firstname, p.ged_lastname, s.ged_firstname, s.ged_lastname

            from ged_main p inner join ged_main s

            on p.ged_id=s.spouse_id

            order by p.get_lastname


            And CF also has a maxrows attribute for both the cfquery and cfoutput query tags, each with a slightly different purpose though they achieve basically the same result.


            7) Finally, I hinted above how this code will ONLY find people who DO have spouses, because it’s an INNER join (and if you don’t say INNER, that’s what most DBs presume to do).


            If you really wanted to find all people WHETHER THEY HAVE A SPOUSE OR NOT, then you’d need to change to using an OUTER JOIN instead, as in:


            select p.ged_firstname, p.ged_lastname, s.ged_firstname, s.ged_lastname

            from ged_main p LEFT join ged_main s

            on p.ged_id=s.spouse_id

            order by p.get_lastname


            The difference is that this query will return ALL the people in the table (technically, the “persons” table, which is on the left side of the join), regardless of whether they have a matching record in the “spouse” table (meaning, whether their ged_id appears in someone else’s spouse_id field, in the table on the "right" side of the join). Pretty cool, eh?


            Let us know how you get on. And sorry, I know perhaps all you “needed” was my point 3 above. But I offer this as a “teach a person to fish” moment.


            8) One last thing: I could have easily made a typo in the above (or you could do so while putting it into use). If there are errors, or if the results are not quite what you expect, please do read again what I wrote, and see if you may not perhaps detect the mistake on your own. Again, we here don’t know your data, so we may be challenged to help understand why something may still be “wrong”.



            • 3. Re: Importing GEDCOM DB Files to SQL Server
              seasonedweb Level 1



              Thanks for taking the time to suggest the ways to accomplish this.

              Here are all the details concerning my imported GEDCOM file.


              Each record was auto-incremented on import & "ged_ID" is the primary key.

              I gave each record a ged_spouse_ID .

              All individuals have spouses.

              All female spouses are in the DB are under their maiden names.


              John Smith ged_ID =  200, ged_spouse_ID =  201


              Mary Jones  ged_ID = 201, ged_spouse_ID = 200


              I have solved the issue of performing a query that results in output showing the individual and the ged_ID of the individualss before and after - only to make it easier for me to find couples, as that is how the records were arranged in the GEDCOM import.


              I will eventually need to hand input all spouse ID's- the GEDCOM file did not have that connection.


              I have temporarily entered zeros in all null ged_spouse_ID columns. ged_spouse_ID is type integer.


              So to display a query result that matches spouses,  I need to do a self join as you suggested.


              Wouldn't I need to include  p.ged_ID, s.ged_ID, p.ged_souse_ID, s.ged_spouse_ID in SELECT for this to work?


              You suggested:


              select p.ged_firstname, p.ged_lastname, s.ged_firstname, s.ged_lastname

              from ged_main p inner join ged_main s

              on p.ged_id=s.ged_spouse_ID


              Thanks again for you time & assistance.



              • 4. Re: Importing GEDCOM DB Files to SQL Server
                seasonedweb Level 1



                One of the query outputs that require the spousal match is the "details" page which displays information on both parties,

                here is my query:


                <cfquery name="detail_individual" datasource="#Request.BaseDSN#">






















                FROM ged_main p

                INNER JOIN ged_main s


                ON p.ged_id=s.ged_spouse_ID


                WHERE p.ged_ID = #URL.ged_ID#


                </cfquery> ?


                Action page shows 0 results using the above.