cybervegan

joined 1 year ago
[–] cybervegan@lemmy.world 5 points 4 days ago (2 children)

Mandatory where?

[–] cybervegan@lemmy.world 2 points 6 days ago

Yeah it was wild, but I suspect few orgs do things that way any more.

[–] cybervegan@lemmy.world 3 points 1 week ago

Salmon reversing aid.

[–] cybervegan@lemmy.world 6 points 1 week ago (2 children)

You mean in the context of high availability?

tl;dr: It's to test if the cluster fail-over configuration is working properly.

This was before things like Kubernetes or Terraform existed, so it had to be done by the operating system itself. The simplest HA cluster is made of two nodes, one "active", the other "passive". The active node does all the work, and the passive node just keeps its data synchronised with the active node. I used to use DRBD for this, which is a system for copying writes on the active node over a network link to the passive node.

That only gives you a "second, up-to-date copy", which is not that useful on its own - you also need a way to automatically switch over to the passive node if the active one "dies". For that I used to use "heartbeat", which simply passes packets back and forth between the two cluster members, ping-pong style. If the passive node notices that the active node hasn't sent its scheduled packet for, say, 10 seconds, it cuts off the current active node (kills it) and promotes itself to the active role, thus preserving the service.

Killing the "other node" is necessary to stop data corruption or user requests going to a node that can't actually service them, and is called STONITH - Shoot The Other Node In The Head. STONITH can involve an electronically controlled switch, which literally cuts off power to the "other" node; or it can isolate the node on the network by shutting down its ports on the switch; or, in a VM setup, it can send a notification to the hypervisor to kill the VM.
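A rough sketch of that timeout logic, in Python rather than anything heartbeat itself ships. `PassiveNode`, `on_heartbeat`, `tick` and the log entries are all made-up names for illustration, and the 10-second timeout is just the "say, 10 seconds" figure from above. Time is passed in explicitly so it runs without any waiting or real network traffic.

```python
from dataclasses import dataclass, field


@dataclass
class PassiveNode:
    """Hypothetical model of the passive node watching its peer's heartbeat."""

    timeout: float = 10.0        # seconds of silence tolerated before fail-over
    last_heartbeat: float = 0.0  # when the last packet from the active node arrived
    role: str = "passive"
    log: list = field(default_factory=list)

    def on_heartbeat(self, now: float) -> None:
        # The active node's scheduled packet arrived: remember when.
        self.last_heartbeat = now

    def tick(self, now: float) -> None:
        # Called periodically. If the peer has been silent for too long,
        # fence it first (STONITH), then promote this node.
        if self.role == "passive" and now - self.last_heartbeat > self.timeout:
            self.log.append("stonith")  # stand-in for power switch / port shutdown / hypervisor call
            self.role = "active"
            self.log.append("promote")  # stand-in for bringing up services


node = PassiveNode()
node.on_heartbeat(now=0.0)
node.tick(now=5.0)   # peer heard 5s ago: stay passive
node.tick(now=12.0)  # 12s of silence: fence the peer, then take over
print(node.role, node.log)  # active ['stonith', 'promote']
```

The ordering is the point of the sketch: the peer is fenced before the promotion, so there is never a moment where both nodes think they are active.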

The reason you need to be able to kill the kernel on the active node is that when you manually shut down the active node, it automatically informs the passive node that it's going down. That is known as an "orderly fail-over", so you're not actually testing whether the heartbeat fail-over works, you're just testing an orderly fail-over. Killing the active node's kernel tests that the passive node is properly configured to take over during a catastrophic failure of the active node. You can watch the heartbeat status go from "up" to "down", and then see the passive node decide to take over, promote itself, bring up its services, and begin processing requests.

To make sure it's all working, you need to test orderly fail-overs first, from both nodes, then test disorderly fail-overs both ways, by using the kernel gun on the active node.
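That ordering can be written down as a tiny test plan. This is only a sketch: `failover_test_plan` and the node names are invented for illustration, and it just lists the runs in order rather than driving a real cluster.

```python
def failover_test_plan(nodes=("node-a", "node-b")):
    """Return (kind, active_node) pairs in the order the tests should be run.

    All orderly fail-overs come first, once with each node as the active one,
    followed by the disorderly (kernel reset) fail-overs in both directions.
    """
    plan = []
    for kind in ("orderly", "disorderly"):
        for active in nodes:
            plan.append((kind, active))
    return plan


for kind, active in failover_test_plan():
    print(f"{kind} fail-over with {active} as the active node")
```

Between runs you would let the failed node rejoin and resynchronise before swapping roles, otherwise the next test starts from a broken cluster.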

Things moved on from Heartbeat-based HA clusters to multi-node clusters managed by Corosync and other software, enabling other strategies to be employed. These were eventually supplanted by "orchestration" systems like Kubernetes, and proprietary Virtual Cloud systems that move this functionality to the platform rather than the operating system.

[–] cybervegan@lemmy.world 8 points 1 week ago* (last edited 1 week ago) (4 children)

Nah man. "kill" doesn't shut the system down quickly. This is the "instant death" way - the kernel reset gun - no shutdown scripts, no disk sync, just reset to BIOS boot sequence, instantly:

As root:

echo 1 > /proc/sys/kernel/sysrq

echo b > /proc/sysrq-trigger

If you swap the "b" in the second command for "o", it powers the machine off immediately instead of rebooting (where the hardware supports power-off), again with no shutdown scripts and no disk sync.

I used to use this trick all the time to test high availability server clusters.
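If you wanted to script it as part of a test run, a minimal Python wrapper might look like the sketch below. The function name `sysrq` and its `proc_root` and `dry_run` parameters are my own invention, not part of any tool. It defaults to a dry run that only reports what it would write, because a real run as root with "b" resets the box on the spot.

```python
from pathlib import Path


def sysrq(key: str, proc_root: str = "/proc", dry_run: bool = True):
    """Enable SysRq and fire one trigger key, e.g. "b" for an immediate reboot.

    Returns the (path, value) writes in order. With dry_run=True (the default)
    nothing is written, so it is safe to call on a machine you care about.
    """
    if len(key) != 1:
        raise ValueError("SysRq trigger must be a single character")
    writes = [
        (Path(proc_root) / "sys/kernel/sysrq", "1"),  # same as: echo 1 > /proc/sys/kernel/sysrq
        (Path(proc_root) / "sysrq-trigger", key),     # same as: echo b > /proc/sysrq-trigger
    ]
    if not dry_run:
        for path, value in writes:
            path.write_text(value)  # needs root; the second write never returns for "b"
    return [(path.as_posix(), value) for path, value in writes]


for path, value in sysrq("b"):
    print(f"would write {value!r} to {path}")
```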

[–] cybervegan@lemmy.world 1 points 1 week ago

Sounds great, look forward to seeing that. After using it a bit more, another thing occurred to me - there's no way to open arbitrary files. I don't use MarkDown for "just notes" or "just one thing", I keep markdown files all over the place. I had set the repository directory to be that of my blog posts during first run, but then I can't open things in my notes directory or documents folder, and I can't see anywhere in the settings dialogue to change it. Am I missing something?

[–] cybervegan@lemmy.world 1 points 1 week ago (1 children)

Yeah "but not as annoying" lol. No idea what you mean about jeeps: I'm in the UK, and not a car enthusiast either.

[–] cybervegan@lemmy.world 13 points 1 week ago (2 children)

Seems quite good - I've tried a LOT of MarkDown editors over the years, but until quite recently, I'd stuck with Zettlr for a long time. I've recently reinstalled my laptop, which made me look for alternatives to some software, and I've been playing round with MarkText for the last few days, which seems nice.

HelixNotes is definitely good - if I had to drop MarkText, I think I could get on well with it. I like that they have a Debian repository, so I can keep it updated with the usual system update software. I downloaded the AppImage as a quick test, but it didn't work because it was compiled against an old version of glibc.

The only thing I don't like so far is the format toolbar is at the bottom of the editor screen, and I haven't found a way to move it.

[–] cybervegan@lemmy.world -1 points 1 week ago

You're quite right, Ozone is actually O~3~, I got that wrong. I should have looked it up, but I didn't, hence the error. I'm so sorry I misled you - can you forgive me? Ozone is actually very interesting - did you know there is a layer of the upper atmosphere known as The Ozone Layer, and that it has a hole in it? Also, Ozone is sometimes produced by chemical reactions and electrical arcs - it has a distinctive, Ozoney smell. As you also made mistakes, I think we are now even - have you ever considered taking up a career as a Large Language Model?

[–] cybervegan@lemmy.world 5 points 1 week ago (3 children)

What about Microsoft Bob? Doesn't that count as their first attempt?

[–] cybervegan@lemmy.world 9 points 1 week ago

Just wait until LLMs are used to design most of them - they will be distinctively average.

[–] cybervegan@lemmy.world 6 points 1 week ago

I usually do that too, but this time round I've really not had the spoons. Had the "Red Letter" yesterday - "we're sending in the heavies". I still CBA - send 'em. We don't have a telly and we don't watch any telly via the web - we HATE TV. I don't even watch any U-tubers regularly. Been this way for >20 years.

 

To be honest, I've seen commercial 7' racks in data centres and computer rooms that were worse than the worst ones here!

I was once tasked with rejigging 3 racks in a remote computer room. The racks were arranged in an "L" pattern due to the constraints of the room. None of the doors - front or back - could close because of cables running between servers and switches. Some cables actually ran diagonally across the L shape. A lot of cables were jammed between the mounting rails, and 3 metre cables were used where a 50cm one would have done, or 2 metre ones where 3 metres or more was needed. Almost nothing was labelled, and where it was, it was wrong. The cable colour coding scheme was ignored, and nothing was recorded. There were servers racked on a slant - TWO nuts off on one side - and even mounted back-to-front. Others were literally sat directly on top of other kit, not bolted in at all. RAID arrays for critical servers were mounted in adjacent racks, with the cables running around the opened rear rack door, and there were a number of suspicious, unmarked servers of odd brands hooked into the main switch that nobody could identify. One turned out to be an abandoned Nagios server, but one was never identified, and nothing broke and nobody screamed when I turned it off.

Just about all the horrible things you have seen or heard about were in that room. It took weeks to sort it out.
