
Why make it simple when you can use FreeNAS? (Extra)




@Sokkalf it is precisely how data passes through memory on its way to and from the disk that makes the difference between ECC and non-ECC:

 

 

What happens when non-ECC RAM goes bad in a system with a file system that doesn't do its own parity and checksumming?

So pretend a file is loaded into RAM and then saved to an NTFS/ext4/UFS/whatever-file-system-you-want-that-doesn't-do-its-own-parity/checksumming.

The file is 8 bits long: 00110011.

Since our file got stored in the bad RAM location, it's been corrupted. Now it's 01110011. Then that is forwarded to the disk subsystem and subsequently stored on the disk, trashed. If this were file system metadata you might have bigger problems too. So now, regardless of your destination, a hard disk, an SSD, or a redundant array, the file will be saved wrong. No big deal, except for that file. It might make the file unable to be opened in your favorite office suite, or it might cause a momentary glitch in the image of your streaming video.
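A tiny sketch of what that single flipped bit looks like, just to make the arithmetic concrete (plain illustrative Python, not file system code; the bit position matches the example above):

```python
# Toy illustration of the single-bit flip described above.
original = 0b00110011          # the 8-bit "file" as written by the application
bad_bit  = 1 << 6              # the RAM cell that flips: second bit from the left

in_ram = original ^ bad_bit    # what actually sits in memory after the flip
print(f"original : {original:08b}")   # 00110011
print(f"in RAM   : {in_ram:08b}")     # 01110011  <- this is what gets written to disk
```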


What happens when non-ECC RAM goes bad in a ZFS system?

So pretend that our same 8-bit file is stored in RAM wrong.

Same file as above and same length: 00110011.

The file is loaded into RAM, and since it's going to be stored on your ultra-safe zpool, it goes to ZFS to have parity and checksums calculated. So your RAIDZ2 zpool gets the file. But here's the tricky part: the file is already corrupted, thanks to your bad RAM location.

ZFS gets 01110011 and is told to safely store that in the pool. So it calculates parity data and checksums for the corrupted data, and that is what gets saved to disk. Ok, not much worse than above, since the file is trashed either way. But your parity and checksums aren't going to help you, because they were computed after the data was corrupted.
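To make that point concrete, here is a minimal sketch (ZFS actually uses fletcher4 or SHA-256 block checksums; SHA-256 is used here purely as a stand-in) showing that a checksum computed from already-corrupted data will happily verify that same corrupted data later:

```python
import hashlib

corrupted_block = bytes([0b01110011])                       # the block as ZFS received it
stored_checksum = hashlib.sha256(corrupted_block).digest()  # checksum of the *corrupted* data

# Later, a clean read of that block verifies fine: the checksum can only tell you
# that the disk returned what was written, not that what was written was correct.
read_back = bytes([0b01110011])
print(hashlib.sha256(read_back).digest() == stored_checksum)   # True
```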

But now things get messy. What happens when you read that data back? Let's pretend that the bad RAM location has moved relative to where the file is loaded into RAM. It's off by 3 bits, which moves it to position 5. So now I read back the data:

Read from the disk: 01110011.

 

But what gets stored in RAM? 01111011.

Ok, no problem. ZFS will verify it against its checksum. Oops, the check fails, since we have a new corrupted bit. Remember, the checksum was calculated after the corruption from the first memory error occurred. So now the parity data is used to "repair" the bad data, and the data is "fixed" in RAM. Now it's supposed to be corrected to 01110011, right? But since we have that bad 5th position, it's still bad! It's corrected for potentially one clock cycle, but thanks to our bad RAM location it's corrupted again immediately. So we really didn't "fix" anything; 01111011 is still in RAM. Now, since ZFS has detected corrupted data from a disk, it's going to write the "fix" back to the drive. Except it's actually corrupting the data even more, because the repair didn't repair anything. So as you can see, things only get worse as time goes on.
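That read-back/repair loop can be sketched as a toy simulation (hypothetical illustrative code, not ZFS internals; it assumes a single bit stuck at 1 at position 5, as in the example):

```python
STUCK_BIT = 1 << 3   # position 5 from the left in an 8-bit value, stuck at 1

def through_bad_ram(value: int) -> int:
    """Model a byte passing through the bad RAM cell: the stuck bit is forced on."""
    return value | STUCK_BIT

def checksum(value: int) -> int:
    """Stand-in for the on-disk block checksum (any deterministic function will do)."""
    return value ^ 0xA5

on_disk     = 0b01110011                 # already corrupted once on the write path
disk_cksum  = checksum(on_disk)          # checksum was computed from that corrupted data

read_in_ram = through_bad_ram(on_disk)   # 01110011 -> 01111011: a *new* error appears
print(f"read back    : {read_in_ram:08b}")
print("checksum ok? ", checksum(read_in_ram) == disk_cksum)   # False -> ZFS tries to repair

# The "repair" reconstructs the block that matches the stored checksum (01110011),
# but the result lands in the same bad RAM cell, so the stuck bit comes right back:
repaired = through_bad_ram(on_disk)
print(f"after repair : {repaired:08b}")   # still 01111011, nothing was actually fixed
```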

Now let's think about your backups.

If you use rsync, then rsync is going to back up the file in its corrupted form. But what if the file was correct on disk and corrupted later? Well, thanks to rsync, you've now synced the corrupted copy over your backups! Great job on those backups! Yes, that last sentence was sarcasm.

What about ZFS replication? Surely that's better, right? Well, sort of. Thanks to those regular snapshots, your server will happily replicate the corruption to your backup server. And let's not forget the added risk of corruption during replication: when the ZFS checksums are calculated for the stream being piped over SSH, those might be corrupted too!

But we're really smart. We also do religious zpool scrubs. Well, guess what happens when you scrub the pool. As that stuck memory location is continually read from and written to, ZFS will attempt to "fix" corrupted data that it thinks came from your hard disk and write that data back. But it is actually reading good data from your drive, corrupting it in RAM, "fixing" it in RAM (which doesn't fix it, as I've shown above), and then writing the "fixed" data to your disk. So you are literally trashing your entire pool while trying to scrub it. Good job on being proactive with your zpool; you've trashed all of your data because you couldn't be bothered to buy appropriate server hardware. Yes, that last sentence was sarcasm again.
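The scrub scenario is the same mechanism applied to every block in the pool. A toy loop, again purely illustrative and assuming the same stuck bit as above, shows how a scrub ends up rewriting healthy blocks with corrupted "fixes":

```python
STUCK_BIT = 1 << 3

def through_bad_ram(value: int) -> int:
    return value | STUCK_BIT             # the bad cell forces this bit to 1

pool = [0b00110011, 0b11000001, 0b01010100]   # three healthy blocks on disk

for i, block in enumerate(pool):
    in_ram = through_bad_ram(block)      # good data is corrupted on the way in
    if in_ram != block:                  # stand-in for "checksum mismatch"
        # The scrub "repairs" the block, but the repaired copy passes through
        # the same bad cell before being written back out:
        pool[i] = through_bad_ram(block)

print([f"{b:08b}" for b in pool])        # every block now has the stuck bit set
```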

So in conclusion:

1. All that stuff about ZFS self-healing, that goes down the drain because you didn't use ECC RAM.
2. Your backups quite possibly will be trashed because of bad RAM. Based on forum users over the last 18 months, you've got almost no chance your backups will be safe by the time you realize your RAM is bad.
3. Scrubs are the best thing you can do for ZFS, but they can also be your worst enemy if you use bad RAM.
4. The parity data, checksums, and actual data all need to match. If they don't, repairs start taking place. And what do you do when you need to replace a disk, and your parity data and actual data don't match because of corruption? You lose your data. So was ZFS really that much more reliable than hardware RAID, given that you went with non-ECC RAM?

To protect your data from loss with ZFS, here's what you need to know:

1. Use ECC RAM. Period. If you don't like that answer, sorry. It's just a fundamental truth.
2. ZFS uses parity, checksums, mirrors, and the copies parameter to protect your data in various ways. Checksums prove that the data on disk isn't corrupted; parity/mirrors/copies correct those errors (see the XOR parity sketch after this list). As long as you have enough parity/mirrors/copies to fix any error that ever occurs, your data is safe (again, assuming you are using ECC RAM). So running RAIDZ1 is very dangerous, because once one disk fails you have no protection left. During the long (and strenuous) task of resilvering your pool, you run a very high risk of encountering errors on the remaining disks. Any such error is detected, but not corrected. Let's hope the error isn't in your file system metadata, where corruption could be fatal for your entire pool. In fact, about 90% of users who lose their data had a RAIDZ1 and suffered 2 disk failures.
3. If you run out of parity/mirrors and your pool is unmountable, you are in deep trouble. There are no recovery tools for ZFS, and quotes from data recovery specialists start in the 5-digit range. All those tools you've used in the past to recover desktop file systems don't work with ZFS. ZFS is nothing like any of those file systems, and recovery tools typically just find billions of 4k blocks of data that look like fragments of files. Good luck trying to reconstruct your files from that. Clearly it would be cheaper (and more reliable) to just make a backup, even if you have to build a second FreeNAS server. And don't forget: if ZFS is corrupted just badly enough by bad RAM to be unmountable, then even if your files are mostly intact, you'll have to consider that 5-digit price tag too.
4. Usually, when RAM goes bad you will lose more than a single memory location. The corruption is typically a breakdown of the insulation between locations, so adjacent locations start getting trashed too. This only creates more multi-bit errors.
5. ZFS is designed to repair corruption, but it isn't designed to handle corruption that it can't correct. That's why there's no fsck/chkdsk for ZFS. So once ZFS's file structure is corrupted and you can't repair it because you have no redundancy, you are probably going to lose the pool (and the system will probably kernel panic).
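As referenced in point 2, here is a minimal sketch of how single-parity reconstruction works, and why there is nothing left to correct with once one disk is already gone (plain XOR parity as an illustration; RAIDZ's real on-disk layout is more involved):

```python
from functools import reduce

# Three data "disks" plus one parity "disk" (single parity, RAIDZ1-style).
data   = [0b00110011, 0b11000001, 0b01010100]
parity = reduce(lambda a, b: a ^ b, data)       # XOR of all data blocks

# One disk fails: its contents can be rebuilt from the survivors plus parity.
lost_index = 1
survivors  = [d for i, d in enumerate(data) if i != lost_index]
rebuilt    = reduce(lambda a, b: a ^ b, survivors + [parity])
print(f"rebuilt : {rebuilt:08b}")               # 11000001, recovered

# But during that rebuild there is no redundancy left: if another block read
# returns bad data, the checksum can flag it, yet there is nothing to rebuild
# it from. The error is detected but cannot be corrected.
```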

So now that you're convinced that ECC really is that important, you can build a system with ECC relatively cheaply...

Edited by Lindsay

This doesn't apply to all off-the-shelf NAS boxes, but using FreeNAS/NAS4Free together with Plex Media Server is brilliant. On-the-fly transcoding to mobile devices, plus a very WAF-friendly app for Samsung smart TVs. And if you've watched half a movie on one platform, e.g. the TV, you can pick up from the same spot seconds later, with no extra effort, on any other device/PC/TV.

 

Getting it all in place has been a bit of a hurdle, though...


If you lose 2-3 disks in a vdev, it probably means you're not paying very close attention.

But RAID is by no means a proper backup.

Not necessarily. There are many reasons disks fail, including physical stress, manufacturing defects, and general wear. It is generally recommended to always have at least 2 disks of parity, since additional disks often fail right after the first one because of the stress of resilvering. This is a common experience with all types of RAID.

 

With a bit of bad luck, a total of three disks fail before the first one has been replaced and resilvered, especially now that 3-4 TB disks are the norm.

 

Regarding backup: RAID/redundancy is not backup. Snapshots are not backup either, even though they cover some of the same use cases ("oh no, I deleted a file I shouldn't have!"). You don't have a backup until your data is stored on separate media, e.g. another server, a hard drive, a DVD/BD disc, tape... whatever. It is also recommended to store backups on at least two different types of media (e.g. HDD and tape), but for home users it should be more than enough to use an HDD that is disconnected once the backup has been taken, combined with burning critical files to optical media every now and then.

Edited by Savagedlight

I use Ubuntu Server and ZFS myself (see zfsonlinux.org). I've been running this for a couple of years without problems, even though it is still somewhat at the beta stage. It runs on a machine with an AMD Phenom II X6 1055T and 16 GB of ECC RAM. Currently I have a 5x2TB RAIDZ setup, and I plan to switch to RAIDZ2 eventually (which means copying all the data over).

 

I have tried FreeNAS before, but got a bit frustrated that the web interface and the zpool/zfs commands didn't always play well together. It has been a long time since I tried it, though; it may well have improved a lot by now.

  • 1 month later...


The assumption here is that the RAM continuously flips a bit at one particular spot on the module. How often does that actually happen, and how quickly does everything go to hell regardless of file system if that is the case? Bit errors will keep piling up if you have a fault like that, ZFS or not.

 

AtW


We'll see if I can be bothered to give it a proper test. I have a faulty 2 GB DDR2 RAM module (Memtest reports around 900,000 errors after one pass of tests on that module alone), so we'll see whether FreeNAS just falls over, or whether we manage to corrupt the pool.

That's the plan, at least.

