6 Replies Latest reply on May 4, 2009 1:17 PM by Panoholic

    DNG and long term data integrity

    greg_d128

      I am using DNG as my archival format. I plan to keep my photos alive for a long time (certainly for as long as I am alive). I have a pretty good backup/redundancy setup: a mirrored disk, copied to a third disk once a month, which is in turn copied to LTO-3 tape, and I keep the last three months' worth of tapes. What concerns me is silent data corruption.

       

      I am a system administrator in my day job, so I am somewhat knowledgeable about data corruption and redundancy. According to research at CERN (presentation: http://cern.ch/Peter.Kelemen/talk/2007/kelemen-2007-C5-Silent_Corruptions.pdf), current drives do not notify us of every kind of data corruption. If we are trying to preserve a large archive for 20, 30, perhaps 50 years, we need a strategy for dealing with data corruption.

       

      The best solution I can see is to extend the DNG standard with an optional ability to recover from errors (possibly at several strength levels). For an archival format, I see that as a must-have.
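As a rough illustration of what optional error recovery could look like, here is a sketch of a RAID-5-style scheme. None of this is part of the DNG specification: the block size, the per-block CRC32 values, and the names `protect` and `repair` are all hypothetical. The CRC32 of each block locates a damaged block, and a single XOR parity block lets that one block be rebuilt. It assumes the damage is bit flips, not truncation, so the file length is unchanged.

```python
import zlib
from functools import reduce

BLOCK_SIZE = 4096  # hypothetical block size


def _xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def _split(data: bytes, size: int) -> list[bytes]:
    """Split data into fixed-size blocks, zero-padding the last one."""
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    if blocks and len(blocks[-1]) < size:
        blocks[-1] = blocks[-1].ljust(size, b"\0")
    return blocks


def protect(data: bytes, size: int = BLOCK_SIZE) -> tuple[list[int], bytes]:
    """Return per-block CRC32 values and a single XOR parity block."""
    blocks = _split(data, size)
    crcs = [zlib.crc32(b) for b in blocks]
    parity = reduce(_xor, blocks, bytes(size))
    return crcs, parity


def repair(data: bytes, crcs: list[int], parity: bytes,
           size: int = BLOCK_SIZE) -> bytes:
    """Rebuild at most one damaged block; raise if more are damaged."""
    blocks = _split(data, size)
    bad = [i for i, b in enumerate(blocks) if zlib.crc32(b) != crcs[i]]
    if not bad:
        return data
    if len(bad) > 1:
        raise ValueError(f"{len(bad)} damaged blocks; one parity block can fix only one")
    # XOR of the parity with every intact block yields the missing block.
    i = bad[0]
    intact = [b for j, b in enumerate(blocks) if j != i]
    blocks[i] = reduce(_xor, intact, parity)
    return b"".join(blocks)[:len(data)]  # drop the zero padding
```

A "stronger" level could store one parity block per group of N blocks instead of one per file, or switch to a Reed-Solomon code that can repair several damaged blocks at once.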

       

      Greg

        • 1. Re: DNG and long term data integrity
          Ramón G Castañeda Level 4

          Greg,

           

          I see no problem with your perspective as long as you keep the original raw, PSD and TIFF files.

           

          Quite frankly, I'm not concerned about my ability to open my raw (supported by Photoshop and ACR), PSD and TIFF files during my lifetime. If anything, I wonder whether, in the long run, only Adobe applications will support DNG files to the bitter end. I'm guessing it will all depend on whether Nikon and Canon follow the example of Pentax and others in supporting in-camera generated DNG files.

           

          On a side note, I found that "silent corruption" is attributable to IT administrators in nearly all cases.  < ducking and running >

          • 2. Re: DNG and long term data integrity
            greg_d128 Level 1

            Thanks for a very prompt reply.

             

            I generally keep PSD and DNG files and get rid of the raw files. Not every raw file has a PSD file associated with it (only the ones I've edited, published, etc.), so I may well have to start embedding the original raw files inside my DNGs. Still, an error checking and correcting solution would be much better. When you have 2 supposedly identical files, and they're different - which one is the "correct" one? Can you detect that without human intervention?

            As for silent corruption being attributable to IT admins... well... in the large, highly visible cases it's definitely true. We all have our bad days, and sometimes that means spending the next 3 days recovering all of a user's data from tape (speaking from experience) ;-)
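One common answer to "which copy is correct?" is to record a checksum while the file is known to be good or, failing that, to keep at least three copies and take a majority vote. A minimal sketch, assuming SHA-256 digests; `sha256_of` and `pick_good_copy` are hypothetical helper names:

```python
import hashlib
from collections import Counter
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def pick_good_copy(digests: dict[str, str],
                   known_digest: Optional[str] = None) -> Optional[str]:
    """Given {copy_name: digest}, return the name of a trustworthy copy.

    Prefers a digest recorded when the file was known to be good; otherwise
    requires a strict majority. Returns None when a human has to decide.
    """
    if known_digest is not None:
        return next((n for n, d in digests.items() if d == known_digest), None)
    if not digests:
        return None
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if 2 * votes <= len(digests):
        return None  # e.g. two copies that disagree: no way to break the tie
    return next(n for n, d in digests.items() if d == winner)
```

With only two copies and no recorded digest, the function returns None, which is the "needs human intervention" case; a third copy or a stored checksum removes the ambiguity.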

             

            Reading DNG's is a much easier problem. On the root of the drive I've got the DNG specification document in a number of formats. DNG is actually very close to TIFF, and I know I can build a parser for it in less than a day (I have done so as part of my schoolwork). Add a few days to handle compression, etc., and I could get the data out in a week or so. Because the format is publicly documented, it will easily outlive Adobe. PSD's a lot more problematic. I haven't decided how I'm going to deal with those yet (Re-save as a new file for each upgrade? What if that corrupts it? Should I keep the older formats?)
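To illustrate how close DNG is to TIFF, here is a minimal sketch that reads the first IFD (image file directory) of a DNG/TIFF byte buffer using only `struct`. It assumes a well-formed classic (non-BigTIFF) file, reads only IFD0 (DNG files often keep the raw image itself in a SubIFD), and does no decompression; `read_ifd0` is a hypothetical name. Tag 50706 is `DNGVersion`.

```python
import struct

# Byte size of each TIFF 6.0 field type (BYTE, ASCII, SHORT, LONG, RATIONAL, ...).
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1, 7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}
DNG_VERSION_TAG = 50706


def read_ifd0(buf: bytes) -> dict[int, tuple[int, int, bytes]]:
    """Return {tag: (field_type, count, raw_value_bytes)} for the first IFD."""
    if buf[:2] == b"II":
        e = "<"  # little-endian ("Intel")
    elif buf[:2] == b"MM":
        e = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF/DNG file")
    magic, ifd_offset = struct.unpack(e + "HI", buf[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    (count,) = struct.unpack(e + "H", buf[ifd_offset:ifd_offset + 2])
    entries = {}
    for i in range(count):
        pos = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, ftype, n = struct.unpack(e + "HHI", buf[pos:pos + 8])
        if ftype not in TYPE_SIZES:
            continue  # unknown type: skip rather than guess its size
        size = TYPE_SIZES[ftype] * n
        if size <= 4:
            value = buf[pos + 8:pos + 8 + size]  # small values are stored inline
        else:
            (offset,) = struct.unpack(e + "I", buf[pos + 8:pos + 12])
            value = buf[offset:offset + size]
        entries[tag] = (ftype, n, value)
    return entries
```

For a DNG, `read_ifd0(data).get(DNG_VERSION_TAG)` should return the four version bytes (for example 1, 2, 0, 0 for DNG 1.2); a plain TIFF will not have that tag.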

             

            I found a much easier-to-read article on this topic: http://blogs.zdnet.com/storage/?p=191

            • 3. Re: DNG and long term data integrity
              Ramón G Castañeda Level 4

              Thanks for the link to that article.

               

              Glad I've stayed away from WD drives.

              • 5. Re: DNG and long term data integrity
                xbytor2 Level 4

                The email gateway seems to be randomly dropping replies. It's like this Jive software is deliberately trying to get me to stop posting.

                 

                Dropped message:

                 

                greg_d128 wrote:
                > Still, an error checking and correcting solution would be much better. When you have 2 supposedly identical files, and they're different - which one is the "correct" one? Can you detect that without human intervention?  
                I will be looking to file system support for some help here. ZFS/RAID-Z will help with bit rot, non-identical copies, etc., but regardless of what is used, archives will still require active management and periodic verification scans. Simply burning your stuff to tape or disc or whatever is not enough.
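A periodic verification scan can be as simple as recording a digest for every file once and re-checking the archive against that manifest on a schedule. A minimal sketch, assuming SHA-256 and a manifest kept outside the archive root; `build_manifest` and `verify` are hypothetical names, and this complements rather than replaces file-system scrubbing such as ZFS's:

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {p.relative_to(root).as_posix(): file_digest(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the archive with a previously saved manifest."""
    current = build_manifest(root)
    return {
        # Files that vanished since the manifest was written.
        "missing": sorted(k for k in manifest if k not in current),
        # Files whose contents no longer match: for an archive of
        # originals this is the silent-corruption case.
        "changed": sorted(k for k in manifest
                          if k in current and current[k] != manifest[k]),
        # Files added since; they still need to be added to the manifest.
        "new": sorted(k for k in current if k not in manifest),
    }
```

The manifest can be saved with `json.dump` alongside each backup; anything reported as "changed" is restored from whichever copy still matches its recorded digest.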

                Adding an additional level of error correction/checksums at the DNG level wouldn't hurt and may be appropriate when the file system and storage media are less than ideal. Graceful degradation would be nice. And so would a pony.

                > Reading DNG's is a much easier problem. On the root of the drive I've got the DNG specification document in a number of formats.
                I've heard grumblings that the DNG spec is a bit vague in some areas. Having not attempted to implement a parser for it, I can't speak to the matter directly, but it does raise concerns. I'll feel more comfortable when a proper 'standard' is in place and agreed upon by something larger than an Adobe-centric community. Two independent FOSS implementations that meet a complete set of validation requirements would be a bare minimum. Specs from a single vendor, while perhaps a good start, are not sufficient for the long haul.

                > PSD's a lot more problematic.
                You have a gift for understatement.

                > I haven't decided how I'm going to deal with those yet (Re-save as a new file for each upgrade? What if that corrupts it? Should I keep the older formats?)
                I'm torn here, too. Two solutions I see are:

                1) Keep all versions of PS available. VM technology makes this somewhat manageable. It is safe to say that there will be a way to run WinXP/SP2 in a VM on whatever computer you happen to own 20 years from now. This may conflict with some Adobe licensing fine print somewhere, but the alternative is that you potentially no longer have access to your own data.
                BTW, this is a major problem with MS Office documents right now.

                2) Whenever a new major rev of PS comes out, save a new copy for that rev and retain all of the old copies.



                Paranoia is a good thing when it comes to archives.

                -X

                • 6. Re: DNG and long term data integrity
                  Panoholic Level 2

                  The DNG specification version 1.2 introduced a tag named RawImageDigest, which holds an MD5 digest of the raw image data. It is not suitable for error correction, only for error detection.
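A digest like this tells you that something changed, not where or how to fix it. A minimal sketch of the check, assuming you have already extracted the 16-byte RawImageDigest value and the exact raw-image bytes the DNG specification says to hash (that extraction is not shown, and `raw_digest_matches` is a hypothetical name):

```python
import hashlib


def raw_digest_matches(raw_image_bytes: bytes, stored_digest: bytes) -> bool:
    """True if the MD5 of the raw image data equals the stored 16-byte digest."""
    return hashlib.md5(raw_image_bytes).digest() == stored_digest
```

A mismatch flags the file for restoration from another copy; because a single flipped bit produces a completely different digest, the MD5 says nothing about which bytes were damaged.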

                   

                  Gabor