So I'm just getting around to checking the logs on my backup server, and it says I have a permanently damaged file that can't be repaired.
How is this even possible on a raidz2 volume where every member shows zero problems and no dead drives? Isn't that the whole point of raidz2, that if one (er, two) drives have a problem the data is still recoverable? How can I figure out why this happened and why it was unrecoverable, and most importantly, how do I prevent it in the future?
It’s only my backup server and the original file is still A-OK, but I’m really concerned here!
zpool status -v:
3-2-1-backup@BackupServer:~$ sudo zpool status -v
pool: data_pool3
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: scrub repaired 0B in 06:59:59 with 1 errors on Sun Nov 12 07:24:00 2023
config:
NAME                        STATE     READ WRITE CKSUM
data_pool3                  ONLINE       0     0     0
  raidz2-0                  ONLINE       0     0     0
    wwn-0x5000ccaxxxxxxxx1  ONLINE       0     0     0
    wwn-0x5000ccaxxxxxxxx2  ONLINE       0     0     0
    wwn-0x5000ccaxxxxxxxx3  ONLINE       0     0     0
    wwn-0x5000ccaxxxxxxxx4  ONLINE       0     0     0
    wwn-0x5000ccaxxxxxxxx5  ONLINE       0     0     0
    wwn-0x5000ccaxxxxxxxx6  ONLINE       0     0     0
    wwn-0x5000ccaxxxxxxxx7  ONLINE       0     0     0
    wwn-0x5000ccaxxxxxxxx8  ONLINE       0     0     0
errors: Permanent errors have been detected in the following files:
data_pool3/(redacted)/(redacted)@backup_script:/Documentaries/(redacted)
It’s definitely not the recent ZFS bug that others have mentioned here, for the simple reason that corruption from that bug can’t be identified like this: the filesystem remains fully consistent as far as ZFS can tell.
Is this relevant, perhaps? https://discourse.practicalzfs.com/t/recurring-permanent-errors-in-healthy-zpool/919/5
Does it have ECC memory?
This is my backup server, so no. Primary does.
That might be the culprit
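For what it’s worth, it’s easy to confirm whether ECC is actually present and active on a given box with dmidecode (assuming the board and BIOS report it correctly):

sudo dmidecode -t memory | grep -i 'error correction'
# "None" means no ECC; "Single-bit ECC" or "Multi-bit ECC" means it is enabled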
What version of ZFS are you running? Are you using native ZFS encryption?
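For reference, something like this would answer both questions (pool name taken from the status output above):

zfs version                          # userland and kernel module versions
zfs get -r encryption data_pool3     # shows whether any dataset in the pool uses native encryption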
Run two scrubs and see if the problem goes away. Has to be at least two.
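As I understand it, the reason it takes two is that ZFS keeps the error list from the previous scrub alongside the current one, so an entry only drops off once it has been absent from two consecutive scrubs (or after a scrub plus zpool clear). A rough sketch with the pool name from the post:

sudo zpool scrub data_pool3
sudo zpool status -v data_pool3      # check once the first scrub completes
sudo zpool scrub data_pool3
sudo zpool status -v data_pool3      # stale entries should clear if nothing is still corrupt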
Well, two steps forward, one step back. The scrub I ran yesterday at least turned up some errors, but I’m having trouble identifying what the actual problem is. I think I’ll sleep on it and form a new plan in the morning.
Controller failure? RAM failure? dmesg shows absolutely nothing, no panics, nothing at all, so I’m not thinking it’s RAM. Hmmmm… maybe I’ll run memtest after I get some sleep.
3-2-1-backup@BackupServer:~$ sudo zpool status -vx
pool: data_pool3
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: scrub repaired 40K in 07:07:07 with 4 errors on Tue Nov 28 22:39:33 2023
config:
NAME                 STATE     READ WRITE CKSUM
data_pool3           ONLINE       0     0     0
  raidz2-0           ONLINE       0     0     0
    wwn-0x5000ccax1  ONLINE       0     0     8
    wwn-0x5000ccax2  ONLINE       0     0    10
    wwn-0x5000ccax3  ONLINE       0     0     8
    wwn-0x5000ccax4  ONLINE       0     0     8
    wwn-0x5000ccax5  ONLINE       0     0     8
    wwn-0x5000ccax6  ONLINE       0     0     8
    wwn-0x5000ccax7  ONLINE       0     0     8
    wwn-0x5000ccax8  ONLINE       0     0     8
errors: Permanent errors have been detected in the following files:
data_pool3/(redacted)/downloads@backup_script-2023-11-28-0901:/(redacted).mkv
data_pool3/(redacted)@backup_script-2023-11-28-2001:/ISOs/Ubuntu/23.10/ubuntu-23.10.1-desktop-amd64.iso
data_pool3/(redacted)@backup_script-2023-11-07-0901:/(redacted).mkv
Hey wow, even though my problem is getting worse (maybe), an actual honest-to-god ISO showed up in the problem file list!
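Before the memtest, it might be worth capturing what ZFS and the drives have already recorded, since the checksum counters are now non-zero on every member. A sketch (the by-id path is just the first redacted member from the status output, and memtester has to be installed separately):

sudo zpool events -v | less                         # detailed per-error records: vdev, object, checksum info
sudo smartctl -a /dev/disk/by-id/wwn-0x5000ccax1    # repeat per disk; watch for CRC errors and pending/reallocated sectors
sudo memtester 2048M 3                              # userspace RAM test; a bootable MemTest86+ pass is still more thorough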
There’s a bad ZFS corruption bug that applies to certain pools created with ZFS 2.1.x and later.
Maybe you’re hitting it?
NOTE: It only occurs under extremely specific conditions, so there’s no need to panic…
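If you want to rule that in or out, checking which OpenZFS version the box is running and how the pool was originally created should narrow it down (pool name from the thread):

zfs version                                     # running userland and kernel module versions
sudo zpool history data_pool3 | head -n 5       # shows the original 'zpool create' and any later upgrades
sudo zpool get all data_pool3 | grep feature@   # which feature flags are enabled or active on the pool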