2 Replies Latest reply on Dec 30, 2008 7:07 PM by *gsb*

    SQLite/Flex Nightmare 3.2 Petabytes

      First things first, I haven't slept for 36 hours, brain is shutting down, so this may sound incoherent. I apologize, please go easy.

      Say I have an aerial imagery collection that is about 3.2 petabytes in total. SQLite can handle this because I'm only storing about 300,000 values, and I already tested it. The raw data is disorganized.

      I'll skip the details and go to the problem. I have a database and need to read three things into it: latitude, longitude, and image location. Latitude and longitude are stored in a metadata.txt file.

      Example metadata location:
      \metadata\country\metadata_by_city_name\Oct1508\flightLine_#\metadata.txt

      The actual images are stored in a completely different folder that is similar in nature, but doesn't match the metadata folder.

      Example Image Location:
      \Raw_images\country\image_by_city_name\10-15-2008\flight_line#\1001.jpg
      \Raw_images\country\image_by_city_name\10-15-2008\flight_line#\1002.jpg

      The problem is that the image folder and the metadata folders don't match.

      Normally that wouldn't be a problem, because I could just convert one date format into the other with some string manipulation. But there is no standard naming convention for the folders. In one place the date folder is named Oct152008, while in another it is formatted as oct_152008 or 1015_2008.

      I need to read in the metadata.txt files in each folder to the database but have no idea how to do this due to the current state of things...

      I'm thinking the only way to do this is to hand jam it. Someone would have to make a spreadsheet with two columns. On the left the image path and on the right, the metadata path. Then I could write code reading in one path, and pointing to the other.
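For what it's worth, the "read one path, point to the other" step could look roughly like the sketch below. It is written in Python for brevity rather than the C or C++ mentioned in this post. The table name `images`, its columns, and the `read_lat_lon` callback are all assumptions made for illustration; the layout of metadata.txt isn't described here, so the parsing is left to that hypothetical callback.

```python
import sqlite3

def load_chart(conn, chart_rows, read_lat_lon):
    """Insert one row per (image_path, metadata_path) pair from the chart.

    read_lat_lon is a hypothetical callback: given a metadata path and an
    image path it returns (latitude, longitude). The real metadata.txt
    format is unknown, so its parsing is not attempted here.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "latitude REAL, longitude REAL, image_path TEXT)"
    )
    with conn:  # commit once at the end, roll back if anything raises
        for image_path, metadata_path in chart_rows:
            lat, lon = read_lat_lon(metadata_path, image_path)
            conn.execute(
                "INSERT INTO images (latitude, longitude, image_path) "
                "VALUES (?, ?, ?)",
                (lat, lon, image_path),
            )
```

The chart rows can come from a two-column spreadsheet exported as CSV and read with `csv.reader`, image path on the left and metadata path on the right.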

      I would do this using C++ or C. Is this really the only way I can do this? This would be months of hand jamming that I'm not looking forward to.

      Any ideas?
        • 1. Re: SQLite/Flex Nightmare 3.2 Petabytes
          EWN-CMI
          Could you not standardize the names by writing a little filename parser that iterates through the existing file system and stores the standardized version in a new file system? If the date folder syntax is the only area of concern, then there can only be a limited number of formats: the ones you have listed in your post, plus 921_08. You could even write a little search tool that picks a metadata folder and then tries each date combination in the images folders. It may consume a great deal of CPU time searching or parsing, but I believe you could save a month or two and carpal tunnel. Good luck.
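A minimal sketch of such a date-folder parser, in Python, might look like this. It covers only the formats mentioned in this thread (Oct1508, Oct152008, oct_152008, 10-15-2008, 1015_2008, 921_08) and rests on assumptions that would need checking against the real folders: the order is always month, day, year; the day always has two digits; and a two-digit year means 20xx. The function name `normalize_date_folder` is made up for the example.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

def normalize_date_folder(name):
    """Return a datetime.date for a date-like folder name, or None."""
    # Drop separators and case so 'oct_152008' and 'Oct152008' look the same.
    s = re.sub(r"[^a-z0-9]", "", name.lower())

    # Month abbreviation, two-digit day, then a two- or four-digit year.
    m = re.fullmatch(r"([a-z]{3})(\d{2})(\d{2}|\d{4})", s)
    if m and m.group(1) in MONTHS:
        month, day, year = MONTHS[m.group(1)], int(m.group(2)), int(m.group(3))
    else:
        # One- or two-digit month, two-digit day, two- or four-digit year.
        m = re.fullmatch(r"(\d{1,2})(\d{2})(\d{2}|\d{4})", s)
        if not m:
            return None
        month, day, year = (int(g) for g in m.groups())

    if year < 100:
        year += 2000  # assumption: two-digit years are 20xx
    try:
        return date(year, month, day)
    except ValueError:
        return None  # e.g. month 13 or day 32
```

With this, `normalize_date_folder("Oct1508")`, `normalize_date_folder("10-15-2008")` and `normalize_date_folder("1015_2008")` all give `date(2008, 10, 15)`, and `normalize_date_folder("921_08")` gives `date(2008, 9, 21)`. Anything that returns None is a folder to look at by hand.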
          • 2. Re: SQLite/Flex Nightmare 3.2 Petabytes
            *gsb* Level 1
            I agree with the above by EWN-CMI.

            Me, I might try PHP at least for a starter to test and refine a search tool.
            If the file exists like... and make the one-to-one chart without renaming the original files. ...IMHO
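The one-to-one chart could be sketched along these lines, shown in Python rather than PHP. Nothing is renamed; the two folder lists are only read. The key functions `meta_key` and `image_key` are hypothetical: each should reduce a folder path to whatever the two trees have in common (for example city, normalized date, and flight line number), and how to do that depends on the real naming, which is an assumption here.

```python
import csv
from collections import defaultdict

def build_chart(metadata_dirs, image_dirs, meta_key, image_key):
    """Pair each metadata folder with the single image folder sharing its key.

    Returns (matched, unresolved): matched is a list of
    (image_dir, metadata_dir) pairs; unresolved lists metadata folders
    with zero or several candidate image folders, left for manual review.
    """
    index = defaultdict(list)
    for image_dir in image_dirs:
        index[image_key(image_dir)].append(image_dir)

    matched, unresolved = [], []
    for metadata_dir in metadata_dirs:
        candidates = index.get(meta_key(metadata_dir), [])
        if len(candidates) == 1:
            matched.append((candidates[0], metadata_dir))
        else:
            unresolved.append(metadata_dir)
    return matched, unresolved

def write_chart(fileobj, matched):
    """Write the chart as CSV: image path on the left, metadata path on the right."""
    writer = csv.writer(fileobj)
    writer.writerow(["image_path", "metadata_path"])
    writer.writerows(matched)
```

Only the unresolved folders would then need to be matched by hand, which should be far fewer than the whole collection if the key functions are reasonable.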