Parse flat data file into a nested structure.
ilssac Feb 4, 2011 4:56 PM
This has been driving me crazy all day long.
I have a flat data file that I would like to parse into a nested data structure.
Small sample of the data:
0 HEAD
1 SOUR FTW
2 VERS Family Tree Maker (16.0.350)
2 NAME Family Tree Maker for Windows
2 CORP MyFamily.com, Inc.
3 ADDR 360 W 4800 N
4 CONT Provo, UT 84604
3 PHON (801) 705-7000
0 TRLR
If anybody recognizes this, yes, that is a small piece of a GEDCOM file. That is what I am trying to parse. For anybody who is unfamiliar with this data format: the first number on each row is the level of that piece of data. Level 0 rows are the root elements of a data segment. Level 1 rows relate to the closest preceding level 0 row. Level 2 rows relate to the closest preceding level 1 row. And so on.
Here is an example of the desired output, nesting each element under its related parent:
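To make the row layout concrete, here is a minimal sketch of splitting a single row into its level, key, and data. The function name splitLine and the returned keys (lvl, tag, data) are only illustrative names, not anything defined by GEDCOM or ColdFusion, and the sketch assumes rows shaped like the sample above (level, key, then optional data, separated by single spaces).

```cfml
<cffunction name="splitLine" returntype="struct" output="false">
    <cfargument name="line" type="string" required="yes">
    <cfscript>
        // Hypothetical helper: assumes "level key [data]" separated by spaces.
        var parts = structNew();
        var trimmed = trim(arguments.line);
        // Everything after the level number: "KEY" or "KEY some data".
        var rest = listRest(trimmed, " ");

        parts["lvl"] = val(listFirst(trimmed, " "));
        parts["tag"] = listFirst(rest, " ");
        // Empty string when the row carries no data, e.g. "0 HEAD".
        parts["data"] = listRest(rest, " ");
        return parts;
    </cfscript>
</cffunction>

<cfdump var="#splitLine('2 CORP MyFamily.com, Inc.')#">
```

For the row "2 CORP MyFamily.com, Inc." this should give lvl 2, tag CORP, and data "MyFamily.com, Inc.".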
<cfset foobar = {
    HEAD = {lvl=0,
        SOUR = {lvl=1,data="FTW",
            VERS = {lvl=2,data="Family Tree Maker (16.0.350)"},
            NAME = {lvl=2,data="Family Tree Maker for Windows"},
            CORP = {lvl=2,data="MyFamily.com, Inc.",
                ADDR = {lvl=3,data="360 W 4800 N",
                    CONT = {lvl=4,data="Provo, UT 84604"}},
                PHON = {lvl=3,data="(801) 705-7000"}}}},
    TRLR = {lvl=0}
}>
<cfdump var="#foobar#">
I think I am looking at some kind of recursive function to properly nest this data, but I just cannot figure out how to write it.
I have this basic function that will output each row of data as a separate structure key:
<cffunction name="parseFile">
    <cfargument name="file" required="yes">
    <cfargument name="line" required="no" type="string" default="">
    <cfscript>
        var returnStruct = structNew();
        var subStruct = structNew();
        var cur_line = "";
        var next_line = "";
        var line_lvl = "";
        var line_key = "";
        var loop = true;

        // Start from the line passed in, or read the first line of the file.
        if (len(trim(arguments.line)) EQ 0) {
            cur_line = fileReadLine(arguments.file);
        }
        else
        {
            cur_line = arguments.line;
        }

        do {
            // Read one line ahead so the level of the next row can be inspected.
            if (not fileIsEOF(arguments.file)) {
                next_line = fileReadLine(arguments.file);
            }
            else
            {
                next_line = "-1";
                loop = false;
            }

            // First token is the level, second token is the key;
            // whatever remains in cur_line is the row's data.
            line_lvl = listFirst(cur_line, ' ');
            cur_line = listRest(cur_line, ' ');
            line_key = listFirst(cur_line, ' ');
            cur_line = listRest(cur_line, ' ');

            returnStruct[line_key] = structNew();
            returnStruct[line_key]["level"] = line_lvl;

            cur_line = next_line;
        } while (loop);

        return returnStruct;
    </cfscript>
</cffunction>
<cfscript>
    gedcom_file = FileOpen(getDirectoryFromPath(getCurrentTemplatePath()) & "Ian Skinner.GED","read");
    /*gedcom_data = {individuals = structNew(),
                     families = structNew(),
                     sources = structNew(),
                     notes = structNew()};*/
    gedcom_data = parseFile(gedcom_file);
</cfscript>
<cfdump var="#gedcom_data#" label="Final Output">
I have tried numerous ways to call this function recursively in order to nest the elements. None of them has produced the expected output shown in the hand-coded example above. What has got me closest is to recursively call the parseFile() function near the end of the do/while loop if the level of the next line is greater than the current line level:
    if (listFirst(next_line,' ') GT line_lvl) {
        parseFile(arguments.file,next_line);
    }
This works fairly well as long as the next line level is the same as or higher than the previous line level. But once the next line level is lower, the recursive call does not fall back to the proper parent level; the current function call just keeps looping over the rest of the file data. Everything I have tried in order to give the recursive calls a proper exit when the next line belongs to an earlier parent row has just mangled the data horribly.
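One possible way around the fall-back problem, sketched here rather than offered as a definitive answer, is to drop the recursion and remember the most recent row seen at each level. A row at level n is then attached to the remembered row at level n - 1, so a drop from level 4 back to level 3 simply picks up the right parent again. The function name nestLines and the array name parents are made up for this sketch. It assumes the rows have already been read into an array, that levels never jump by more than one, and that keys are unique among siblings (a repeated key under the same parent would overwrite the earlier one).

```cfml
<cffunction name="nestLines" returntype="struct" output="false">
    <cfargument name="lines" type="array" required="yes">
    <cfscript>
        var root = structNew();
        // parents[n + 1] holds the most recent node seen at level n.
        var parents = arrayNew(1);
        var i = 0;
        var line = "";
        var lvl = 0;
        var rest = "";
        var tag = "";
        var node = "";
        var parent = "";

        for (i = 1; i LTE arrayLen(arguments.lines); i = i + 1) {
            line = trim(arguments.lines[i]);
            if (len(line) GT 0) {
                lvl = val(listFirst(line, " "));
                rest = listRest(line, " ");
                tag = listFirst(rest, " ");

                node = structNew();
                node["lvl"] = lvl;
                // Only add a data key when the row has something after the key.
                if (listLen(rest, " ") GT 1) {
                    node["data"] = listRest(rest, " ");
                }

                // Level 0 rows hang off the root; every other row hangs off
                // the latest row one level up (assumes that row exists).
                if (lvl EQ 0) {
                    parent = root;
                }
                else
                {
                    parent = parents[lvl];
                }
                parent[tag] = node;

                // Remember this row as the current parent for level lvl + 1 rows.
                parents[lvl + 1] = node;
            }
        }
        return root;
    </cfscript>
</cffunction>
```

With the file handle from the code above, the rows could be collected and nested along these lines:

```cfml
<cfscript>
    lines = arrayNew(1);
    while (not fileIsEOF(gedcom_file)) {
        arrayAppend(lines, fileReadLine(gedcom_file));
    }
    fileClose(gedcom_file);
    gedcom_data = nestLines(lines);
</cfscript>
<cfdump var="#gedcom_data#" label="Nested Output">
```

Because structures are assigned by reference, adding a key to parent changes the node already stored in the result, which is what lets a flat loop build the nested output.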


