traches

joined 2 years ago
[–] traches@sh.itjust.works 11 points 3 days ago

Try it out. Order coffee with it

[–] traches@sh.itjust.works 1 points 4 days ago (3 children)

My options look like this:

https://allegro.pl/kategoria/nosniki-blu-ray-257291?m-disc=tak

Exchange rate is 3.76 PLN to 1 USD, which is actually the best I’ve seen in years

[–] traches@sh.itjust.works 1 points 5 days ago

I only looked at how ZFS tracks checksums because of your suggestion! Hashing 2TB will take a minute, would be nice to avoid.

Nushell is neat, I’m using it as my login shell. Good for this kind of data-wrangling but also a pre-1.0 moving target.

[–] traches@sh.itjust.works 15 points 5 days ago (1 children)

Tailscale deserves it, bitcoin absolutely does not

[–] traches@sh.itjust.works 2 points 5 days ago (6 children)

Where I live (not the US) I’m seeing closer to $240 per TB for M-disc. My whole archive is just a bit over 2TB, though I’m also including exported jpgs in case I can’t get a working copy of darktable that can render my edits. It’s set to save xmp sidecars on edit so I don’t bother with backing up the database.

I mostly wanted a tool to divide up the images into disk-sized chunks, and to automatically track changes to existing files, such as sidecar edits or new photos. I’m now seeing I can do both of those and still get files directly on the disk, so that’s what I’ll be doing.

I’d be careful about using SSDs for long-term, offline storage; I hear they can lose data if left unpowered for a long time. IMO metadata is small enough to just save a new copy when it changes.

[–] traches@sh.itjust.works 1 points 5 days ago (2 children)

I’ve been thinking through how I’d write this. With so many files it’s probably worth using sqlite, and then I can match them up by joining on the hash. Deletions and new files can be found with different join conditions. I found a tool called ‘hashdeep’ that can checksum everything, though for incremental runs I’ll probably skip hashing if the size, times, and filename haven’t changed. I’m thinking nushell for the plumbing? It runs everywhere, though they have breaking changes frequently. Maybe rust?

ZFS checksums are done at the block level, and after compression and encryption. I don’t think they’re meant for this purpose.
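
Roughly what I’m picturing, totally untested, with made-up paths, and assuming hashdeep really does emit size,hash,filename lines under a few “%%%%”/“##” header lines:

    # 1. manifest of the current state (relative paths, sha256 only)
    cd /mnt/photos
    hashdeep -r -l -c sha256 . > /backups/manifest-2025-07.txt

    # 2. strip hashdeep's header lines so sqlite can import plain CSV
    grep -v '^[%#]' /backups/manifest-2025-06.txt > /tmp/prev.csv
    grep -v '^[%#]' /backups/manifest-2025-07.txt > /tmp/curr.csv

    # 3. load both manifests and join them on the hash
    sqlite3 /backups/catalog.db < /tmp/diff.sql

where /tmp/diff.sql would hold the schema plus the two joins:

    DROP TABLE IF EXISTS prev; DROP TABLE IF EXISTS curr;
    CREATE TABLE prev (size INTEGER, hash TEXT, path TEXT);
    CREATE TABLE curr (size INTEGER, hash TEXT, path TEXT);
    .mode csv
    .import /tmp/prev.csv prev
    .import /tmp/curr.csv curr
    -- new or changed files: present now, hash not seen before
    SELECT c.path FROM curr c LEFT JOIN prev p ON c.hash = p.hash WHERE p.hash IS NULL;
    -- deleted (or changed) files: hash existed before, gone now
    SELECT p.path FROM prev p LEFT JOIN curr c ON p.hash = c.hash WHERE c.hash IS NULL;

A changed file would show up as both “new” (new hash) and “deleted” (old hash), and a renamed file shows up as neither, which is about what I want.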

[–] traches@sh.itjust.works 42 points 6 days ago* (last edited 5 days ago) (5 children)

Aww, man, I’m conflicted here. On one hand, I’ve enjoyed their work for years and they seem like good dudes who deserve to eat. On the other, they’re AI enthusiast crypto-bros and that’s just fucking exhausting. I deal with enough of that bullshit at work

Edit: rephrase for clarity

[–] traches@sh.itjust.works 2 points 6 days ago

humans are neat

[–] traches@sh.itjust.works 50 points 6 days ago (4 children)

I know Lemmy is social media for people with a favorite Linux distro, so I’m preaching to the choir here, but it is truly wonderful how much software is free as in speech. It’s like the only thing I love about being a millennial

[–] traches@sh.itjust.works 1 points 1 week ago

Yeah, you're probably right. I already bought all the stuff, though. This project is halfway vibes-based; something about spinning rust just feels fragile, you know?

I'm definitely moving away from the complex archive split & merge solution. fpart can make lists of files that add up to a given size, and fd can find files modified since a given date. Little bit of plumbing and I've got incremental backups that show up as plain files & folders on a disk.

[–] traches@sh.itjust.works 3 points 1 week ago* (last edited 1 week ago) (4 children)

Ohhh boy, after so many people suggested I just put plain files directly on the disks, I went back and rethought some things. I think I'm landing on a solution that does everything and doesn't require me to manually manage all these files:

  • fd (and any number of other programs) can produce lists of files that have been modified since a given date.
  • fpart can produce lists of files that add up to a given size.
  • xorrisofs can accept lists of files to add to an ISO.

So if I fd a list of new files (or don't for the first backup), pipe them into fpart to chunk them up, and then pass these lists into xorrisofs to create ISOs, I've solved almost every problem.
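
Roughly like this, though I'd want to double-check fpart's and xorrisofs's exact options before trusting any of it:

    cd /mnt/photos

    # 1. files changed since the last backup (skip this filter for the first full run)
    fd --type f --changed-after '2025-06-01 00:00:00' > /tmp/changed.txt

    # 2. split the list into ~45 GB chunks, leaving headroom on 50 GB discs
    #    (writes /tmp/chunk.0, /tmp/chunk.1, ... depending on version)
    fpart -s 45000000000 -i /tmp/changed.txt -o /tmp/chunk

    # 3. one ISO per chunk; the sed turns each path into an iso_path=disk_path
    #    graft point so the folder structure is preserved on the disc
    #    (filenames containing '=' or '\' would need escaping)
    for list in /tmp/chunk.*; do
        n="${list##*.}"
        sed 's|.*|&=&|' "$list" > "/tmp/graft.$n"
        xorrisofs -r -J -joliet-long -graft-points \
            -V "PHOTOS_$n" -path-list "/tmp/graft.$n" \
            -o "/tmp/photos-$n.iso"
    done

Burning would then just be growisofs -Z /dev/sr0=photos-0.iso (or whatever tool the drive likes), one ISO per disc.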

  • The disks have plain files and folders on them, no special software is needed to read them. My wife could connect a drive, pop the disk in, and the photos would be right there organized by folder.
  • Incremental updates can be accomplished by keeping track of whenever the last backup was.
  • The fpart lists are also a greppable index; I can use them to find particular files easily.
  • Corruption only affects that particular file, not the whole archive.
  • A full restore can be accomplished with rsync or other basic tools.

Downsides:

  • Change detection is naive. Just mtime. Good enough?
  • Renames will still produce new copies. Solution: don't rename files. They're organized well enough, stop messing with it.
  • Deletions will be disregarded. I could solve this with some sort of indexing scheme, but I don't think I care enough to bother.
  • There isn't much rhyme or reason to how fpart splits up files. The first backup will be a bit chaotic. I don't think I really care.
  • If I rsync -a some files with mtimes older than the last backup into the dataset, they won't get slurped up in the next run. Can be solved by checking that all files appear in the existing fpart indices, or by just not doing that.

Honestly those downsides look quite tolerable given the benefits. Is there some software that will produce and track a checksum database?

Off to do some testing to make sure these things work like I think they do!


I'm working on a project to back up my family photos from TrueNAS to Blu-ray disks. I have other, more traditional backups based on restic and zfs send/receive, but I don't like the fact that I could delete every copy using only the mouse and keyboard from my main PC. I want something that can't be ransomwared and that I can't screw up once created.

The dataset is currently about 2TB, and we're adding about 200GB per year. It's a lot of disks, but manageably so. I've purchased good quality 50GB blank disks and a burner, as well as a nice box and some silica gel packs to keep them cool, dark, dry, and generally protected. I'll be making one big initial backup, and then I'll run incremental backups ~monthly to capture new photos and edits to existing ones, at which time I'll also spot-check a disk or two for read errors using DVDisaster. I'm hoping to get 10 years out of this arrangement, though longer is of course better.
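
The DVDisaster side would look something like this, if I'm remembering its flags right (untested; device and filenames are placeholders):

    # before burning: create a separate error-correction file per ISO
    # (RS01 method; the .ecc files can live on the NAS or on a later disc;
    #  the RS02 method can instead embed the ECC data in the ISO itself)
    dvdisaster -i photos-0.iso -e photos-0.ecc -mRS01 -c

    # monthly spot check: read-scan a disc and report unreadable sectors
    dvdisaster -d /dev/sr0 -s

    # if a disc goes bad: image whatever is still readable, then repair it
    # with the matching .ecc file
    dvdisaster -d /dev/sr0 -r -i rescued.iso
    dvdisaster -i rescued.iso -e photos-0.ecc -f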

I've got most of the pieces worked out, but the last big question I need to answer is which software I will actually use to create the archive files. I've narrowed it down to two options: dar and bog-standard GNU tar. Both can create multipart, incremental backups, which is the core capability I need.

Dar Advantages (that I care about):

  • This is exactly what it's designed to do.
  • It can detect and tolerate data corruption. (I'll be adding ECC data to the disks using DVDisaster, but defense in depth is nice.)
  • More robust file change detection; it appears to be hash-based?
  • It allows me to create a database I can use to locate and restore individual files without searching through many disks.

Dar disadvantages:

  • It appears to be a pretty obscure, generally inactive project. The documentation looks straight out of the early 2000s and the site doesn't even use HTTPS. I worry it will go offline, or that I'll run into some weird bug that ruins the show.
  • Doesn't detect renames. Will back up a whole new copy. (Problematic if I get to reorganizing)
  • I can't find a maintained GUI project for it, and my wife ain't about to learn a CLI. Would be nice if I'm not the only person in the world who could get photos off of these disks.
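
For concreteness, the dar workflow I'm imagining is roughly this (untested, flags from memory, paths made up):

    # full backup, split into ~45 GB slices: photos-full.1.dar, photos-full.2.dar, ...
    dar -c /staging/photos-full -R /mnt/photos -s 45G

    # next month: only what changed, using the full archive as the reference
    dar -c /staging/photos-incr1 -R /mnt/photos -s 45G -A /staging/photos-full

    # restore a single file into /restore (dar prompts for the slice it needs)
    dar -x /staging/photos-full -R /restore -g 2019/some-photo.raw

dar_manager is the piece that would build the "which disc has this file" database mentioned above.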

Tar Advantages (that I care about):

  • Battle-tested, reliable, and not going anywhere.
  • It's already installed on every single Linux and Mac PC, and it's trivial to put on a Windows PC.
  • Correctly detects renames, does not create new copies.
  • There are maintained GUIs available; non-nerds may be able to access it.

Tar disadvantages:

  • I don't see an easy way to locate individual files, beyond grepping through snar metadata files (that aren't really meant for that).
  • The file change detection logic makes me nervous: it appears to be based on modification time and inode numbers. The photos are in a ZFS dataset on TrueNAS, mounted on my local machine via SMB. I don't even know what an inode number is; how can I be sure they won't change somehow? Am I stuck with this exact NAS setup until I'm ready to make a whole new base backup? This many Blu-rays aren't cheap, and burning them will take a while; I don't want to do it unnecessarily.
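
And the tar equivalent, again untested (--no-check-device at least stops device-number changes from forcing re-backups, though it does nothing for inodes):

    # level 0: full backup; the .snar snapshot file records file state
    tar --create --no-check-device \
        --listed-incremental=/backups/photos.snar \
        --file=/staging/photos-full.tar -C /mnt/photos .

    # later runs with the same .snar pick up only new/changed files
    # (keep a copy of the level-0 snar if you ever want to redo an incremental)
    tar --create --no-check-device \
        --listed-incremental=/backups/photos.snar \
        --file=/staging/photos-incr1.tar -C /mnt/photos .

    # split into disc-sized pieces for burning; reassemble later with cat
    # (tar's own --multi-volume/-L could do this natively, too)
    split -b 45G /staging/photos-incr1.tar /staging/photos-incr1.tar.part-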

I'm genuinely conflicted, but I'm leaning towards dar. Does anyone else have any experience with this sort of thing? Is there another option I'm missing? Any input is greatly appreciated!
