I cannot think of any way that the Areca controller could corrupt a file unless your system is not fast enough to transfer the data. This would especially be true with raw files or a series of high resolution photos.
- Have you measured the read/write rates? RAID 5 is lousy for large files.
- Where are your Adobe "Media Cache Files"?
- Have you turned off compression?
- How full are your drives?
When was the last time you ran a parity verification on that array? Do you still have the original raws backed up elsewhere?
Thanks for the quick reply. The corruption does not happen immediately. It happens over time. I cannot say exactly how long, but older files seem to be more affected.
To answer your questions:
1. This is actually a RAID 3 array, chosen to address the large-file/speed issue (per Harm Millaard's PPBM6/7 recommendations; it's why I got the Areca card to begin with). Here's the ATTO benchmark.
As you can see, the burst speeds vary, but I would safely say ~900 MB/s read and ~500 MB/s write, so I doubt speed is the issue. By the way, on the secondary system I converted the RAID 3 to RAID 5, but the corruption still continues.
2. I keep the Media Cache files on a separate drive (SSD array). I also keep the Lightroom catalogs/previews on that SSD array. RAWs show as corrupt both in Lightroom and when opened directly in Photoshop.
3. Compression was never enabled.
4. 25% Free Space
Thanks for your help!
Thanks for the quick response.
This problem is chronic and has been going on for several years. I have checked the parity in the past, but it did not help. It has been a while since I've run the check, but I can do it again.
The controller has an option to "Check Volume Set" and I select all available options.
- Scrub Bad Block If Bad Block Is Found, Assume Parity Data Is Good.
- Re-compute Parity If Parity Error Is Found, Assume Data Is Good.
Yes, I keep backups, and I have restored some files, but the corruption keeps recurring. Sometimes the backup itself is of an already-corrupt file, and it's difficult to tell how far back I need to go to find a good copy.
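One way to catch this kind of silent, gradual corruption earlier (and to know exactly which backup generation is still good) is to keep a checksum manifest of the raw files and re-verify it periodically. A minimal sketch, assuming GNU coreutils on the workstation; the path and `.CR2` extension are just placeholders for your actual photo volume and raw format:

```shell
# Build a baseline manifest of SHA-256 hashes for every raw file.
# (/mnt/raidvol/photos and *.CR2 are illustrative placeholders.)
find /mnt/raidvol/photos -type f -name '*.CR2' -print0 |
  xargs -0 sha256sum > ~/photo-manifest.sha256

# Later (e.g. monthly), re-verify the volume against the manifest.
# Any file whose hash no longer matches is reported, so you learn
# when corruption first appeared instead of discovering it in Lightroom.
sha256sum --check --quiet ~/photo-manifest.sha256
```

If the monthly check passes, that month's backup set is known-good, which removes the guesswork about how far back to restore from.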
I've also tried updating to the latest firmware and drivers for the Areca card, and for all other system components, just to be sure.
I used to be an enterprise Systems Administrator for years dealing specifically with storage arrays and I've never come across anything like this. I'd really rather not spend a lot of money moving to a new solution, but it just seems to be getting worse.
Thanks very much for your help.
Well, that sounds like bad blocks for sure, and progressive ones at that, which means you likely have a bad drive. Also, those are desktop drives instead of enterprise drives, which means they have a higher chance of developing bad blocks, and they lack the error-recovery timeout (TLER) feature that enterprise drives have. This array should not be in a parity RAID until you have enterprise drives. Even a RAID 10 block-mirrored array would be pushing it with this many drives, but it would likely help here. Either way, the RAID needs to be redone and fully initialized again. I would highly suggest you replace the drives with enterprise drives now, since you should have them anyway. Otherwise, change to RAID 10 or continue to deal with these block errors.
That's what I was afraid of. I was hoping there was another solution.
Thanks for the support. Greatly appreciated.
I do have another theory, but before I begin, I'll say Eric Bowen from ADK is an absolute uber-tech and he knows a LOT more than I ever will.
What you are seeing is absolutely bizarre!
Areca cards are awesome, but their support is not that great. You might want to pose this same question on the Areca Owners Forum on hardforum.com. Some really experienced Areca users frequent that forum and may recognize this particular failure mode.
Your firmware is pretty old; I updated all of my Arecas (1880- and 1882-series cards) to v1.52 a long time ago, and I would suggest you do the same. Areca does not disclose all the fixes in a given firmware release, and in general newer is better.
I read a long, long time ago about some subtle incompatibilities between certain drives and Areca controller cards, specifically involving SATA NCQ support. My number one guess, and my hope for you, is that you have NCQ turned on and that turning it off may resolve this crazy issue.
Finally, I don't agree with Eric that you must use enterprise drives in Areca parity RAIDs. I've been violating that rule for too many years with WD and Hitachi drives to believe it's your only problem here. Before I ran Arecas with non-enterprise drives, I did a lot of web research on users building 48-drive arrays from non-enterprise drives and Areca controllers (for them, enterprise drives were far more expensive with that many drives in play). Personally, I've had probably a dozen Areca cards and run around 60 various drives in them over the years, and I've never, ever lost any data. My use has been 100% RAID 5 (and some RAID 0), and while I have had varied hardware failures, none cost me a file: five bad drives (three enterprise, two non-enterprise); a bad RAM stick (Areca replaced it under warranty; I never saw any file errors, but the Areca error log showed about six events where memory ECC had to kick in to prevent a loss); and a bad Areca CPU (it totally cratered, but when they sent me an RMA card the drives picked right up, again with no data loss whatsoever).
Yes, enterprise drives are recommended, but Areca's newer firmware BIOS settings even allow you to disable TLER completely, which can result in REALLY slow array performance but NO data loss under any circumstance.
Best of luck! What you are dealing with sounds VERY frustrating!
Tony Leps wrote:
4. 25% Free Space
Tony, there is one other problem that could contribute to your gradual degradation. Your 75%-full drives mean that your real-world read and write performance is not what your ATTO test shows. Here, as an example, is an Areca HD Tune test of my 1880 with two Seagate ST2000DM001 drives in RAID 0; notice the fall-off in performance as the disk fills.
Disabling TLER does not mean no data loss ever. It means the drives won't time out while trying to repair a bad block, or while pulling the data from a bad block to move it to a known-good location. TLER simply prevents a drive from taking too long on those repairs, and thus prevents the controller from prematurely marking it out as a bad drive when it stops responding during repair operations. By the way, if you don't run parity verifications regularly, disabling TLER just increases the likelihood of volume corruption and eventual collapse.

I know many people who have used desktop drives for parity RAIDs as well. I can also tell you that many of them lost their entire arrays to corruption and volume collapse, and many had to rebuild their RAIDs constantly because drives were marked as bad for taking too long in a repair operation. Drive manufacturers ship drives with up to 1% bad blocks; that is the reality of the industry. If you don't write zeroes to those drives before putting them in RAIDs, the array has to discover those blocks as they are used, so this will occur at some point regardless.

Enterprise drives are not that much more expensive than desktop drives, so the question becomes what your data and time are worth. That answers whether you're willing to risk desktop drives in a parity RAID or not. RAID 0 has never been a question with desktop drives; only parity RAIDs and larger block-level mirrored redundant arrays.
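The pre-use zero/surface pass mentioned above can be done from a Linux host before the drives ever join the array. A sketch, assuming `/dev/sdX` stands in for each target drive; both commands are destructive and erase the entire drive, so double-check the device name first:

```shell
# Destructive surface scan: badblocks writes test patterns across the
# whole device and reads them back, forcing the drive to remap weak
# sectors NOW instead of mid-rebuild. Run per drive, before array creation.
# /dev/sdX is a placeholder -- verify the device name before running.
badblocks -wsv -b 4096 /dev/sdX

# Simpler alternative: a single zero-fill pass (also destructive),
# which likewise makes the drive touch and remap every sector.
dd if=/dev/zero of=/dev/sdX bs=1M status=progress
```

After either pass, check the drive's SMART reallocated-sector count; a drive that grew a lot of remaps during the scan is a poor candidate for a parity array.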