Hello everyone,

I recently returned to managing a Kubernetes cluster in my homelab, using Ansible on RHEL-based distros. Since I hadn’t touched the installation stages in quite a long time, I started looking for tutorials covering everything from the base installation to CNI configuration, MetalLB setup and metrics-server installation.

In every single tutorial, I have seen major issues that made me pull my hair out:

  • First and worst: most tutorials either assume the firewall is disabled or tell you to deactivate it. Just. No. I know deactivating it makes everything much easier, and many issues disappear as soon as you run systemctl stop firewalld. But if you want to teach correctly, you shouldn’t recommend something that would get you fired on the spot.

  • CNI installations are straightforward, but the guides miss important troubleshooting information. Things like putting the flannel interfaces in the internal zone or adding some direct forwarding rules to firewalld can be necessary, but again, everyone and their mother has their firewall off, so nobody talks about it.

  • In MetalLB, the ConfigMap used by the speakers is not created automatically by the official manifest. You can’t miss it, since the speakers flat-out refuse to start and the logs say exactly why. Yet I have never seen a single tutorial mention it.

  • Also in MetalLB: if the controller is on a worker node, its webhooks are not accessible and you cannot configure the load balancer. It’s rare-ish and easy to fix, but again, I have never seen it mentioned.

  • While Flannel, MetalLB, Weave, … clearly state which ports you need to open for their solutions, tutorials never do (firewall? Anyone?).

  • The metrics server has some… particularities (like needing to modify its startup arguments or its dnsPolicy). Those are easy to find in the GitHub issues because they come up so often, but I can never find a tutorial that mentions this extra configuration.

  • Various basic things, like the fact that CoreDNS and the master node need a worker node plus a CNI before they become ready. Or how to verify that your ingress/CNI/MetalLB deployment is working correctly. If you are familiar with Kubernetes, these are not too hard to figure out, but when most of your audience is new to it, a tutorial should at least share a simple nginx manifest to test that everything works.
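For the firewall and port points above, here is a minimal Python sketch that only prints the firewall-cmd commands for a node instead of running them. The Kubernetes port numbers follow the upstream “Ports and Protocols” page; 8472/udp assumes Flannel’s VXLAN backend on its default port, and 7946 tcp/udp is MetalLB’s memberlist port. The zone names, the flannel.1/cni0 interface names and the function name are assumptions, so check them against your own versions and `ip link` before applying anything.

```python
"""Print firewall-cmd commands for a Kubernetes node (dry run, nothing is executed)."""

# Control-plane ports from the Kubernetes "Ports and Protocols" docs.
CONTROL_PLANE_PORTS = [
    "6443/tcp",       # kube-apiserver
    "2379-2380/tcp",  # etcd server client API
    "10250/tcp",      # kubelet API
    "10257/tcp",      # kube-controller-manager
    "10259/tcp",      # kube-scheduler
]

WORKER_PORTS = [
    "10250/tcp",        # kubelet API
    "10256/tcp",        # kube-proxy
    "30000-32767/tcp",  # NodePort Services
]

# Add-on ports (assumption: Flannel VXLAN backend, default MetalLB speakers).
ADDON_PORTS = [
    "8472/udp",  # Flannel VXLAN
    "7946/tcp",  # MetalLB memberlist
    "7946/udp",  # MetalLB memberlist
]

# Assumed Flannel interface names; verify with `ip link` on your nodes.
CNI_INTERFACES = ["flannel.1", "cni0"]


def firewall_commands(role: str, public_zone: str = "public",
                      internal_zone: str = "internal") -> list[str]:
    """Return the firewall-cmd invocations for a node with the given role."""
    if role not in ("control-plane", "worker"):
        raise ValueError(f"unknown role: {role!r}")
    ports = CONTROL_PLANE_PORTS if role == "control-plane" else WORKER_PORTS
    # Open only the ports this role and the add-ons actually need.
    cmds = [f"firewall-cmd --permanent --zone={public_zone} --add-port={p}"
            for p in ports + ADDON_PORTS]
    # Move the CNI interfaces out of the public zone (the internal zone, as above).
    cmds += [f"firewall-cmd --permanent --zone={internal_zone} --add-interface={i}"
             for i in CNI_INTERFACES]
    # Permanent rules only take effect after a reload.
    cmds.append("firewall-cmd --reload")
    return cmds


if __name__ == "__main__":
    for cmd in firewall_commands("worker"):
        print(cmd)
```

On an Ansible-managed cluster, the `ansible.posix.firewalld` module (with its `port`, `interface`, `zone`, `permanent` and `state` options) is probably the more natural way to apply the same list.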
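For the “is it actually working?” question, here is a sketch of the kind of throwaway nginx test I mean: a Python script that prints a Deployment plus a LoadBalancer Service as a JSON List, which kubectl accepts just like YAML. The names, the nginx:stable image and the replica count are arbitrary choices, not anything a tutorial prescribes.

```python
import json


def smoke_test_manifest(name: str = "nginx-smoke", replicas: int = 2) -> dict:
    """Return a kubectl-applicable List with an nginx Deployment and a LoadBalancer Service."""
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": "nginx",
                    "image": "nginx:stable",
                    "ports": [{"containerPort": 80}],
                }]},
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # LoadBalancer so that MetalLB has to hand out an address.
            "type": "LoadBalancer",
            "selector": labels,
            "ports": [{"port": 80, "targetPort": 80}],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


if __name__ == "__main__":
    print(json.dumps(smoke_test_manifest(), indent=2))
```

Something like `python smoke.py > smoke.json && kubectl apply -f smoke.json`, then `kubectl get svc nginx-smoke` should show an EXTERNAL-IP from your MetalLB pool that answers to curl. If it stays `<pending>`, MetalLB is the suspect; if the pods are Running but the address does not answer, the CNI and firewall rules are the first things to check. Clean up with `kubectl delete -f smoke.json`.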
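For the metrics-server particularities, a sketch that builds a JSON patch for the Deployment from the official manifest. It assumes metrics-server is the first container in the pod spec and already has an `args` list; `--kubelet-insecure-tls` is the flag most often suggested in the GitHub issues, and the hostNetwork/dnsPolicy pair is optional. The function name is hypothetical.

```python
import json

# The flag most often suggested in the metrics-server GitHub issues for
# homelab clusters whose kubelets use self-signed serving certificates.
EXTRA_ARGS = ("--kubelet-insecure-tls",)


def metrics_server_patch(extra_args: tuple[str, ...] = EXTRA_ARGS,
                         host_network: bool = False) -> list[dict]:
    """Build an RFC 6902 JSON patch for the metrics-server Deployment.

    Assumes the metrics-server container is containers[0] and already has
    an `args` list, as in the official components.yaml.
    """
    base = "/spec/template/spec"
    # "/-" appends each flag to the end of the existing args array.
    ops = [{"op": "add", "path": f"{base}/containers/0/args/-", "value": arg}
           for arg in extra_args]
    if host_network:
        # With hostNetwork, ClusterFirstWithHostNet keeps cluster DNS resolution.
        ops.append({"op": "add", "path": f"{base}/hostNetwork", "value": True})
        ops.append({"op": "add", "path": f"{base}/dnsPolicy",
                    "value": "ClusterFirstWithHostNet"})
    return ops


if __name__ == "__main__":
    print(json.dumps(metrics_server_patch()))
```

It could be applied with something like `kubectl -n kube-system patch deployment metrics-server --type=json -p "$(python patch.py)"`. Note that `--kubelet-insecure-tls` skips verification of the kubelet serving certificates; the cleaner fix is properly signed kubelet serving certs (for example `serverTLSBootstrap: true` in the kubelet configuration, then approving the CSRs), which fits the spirit of this rant better.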

This is mainly a rant, because it is crazy to see that a tutorial which is supposed to explain the documentation, only faster, turns out to be utterly useless. Of course you won’t get any forwarding issues between interfaces if your device is an open bar.

And most of them are like this.

So, to everyone else who has tried following tutorials to set up their cluster: what was your experience with them? Were they also useless, or did you find a gem that didn’t simply copy-paste the documentation and take screenshots of a working cluster without ever testing its own guide?

  • yara@feddit.de · 10 months ago

    Most tutorials I read nowadays include some part where the author suddenly writes: “This setting is terrible and should never be enabled in a production environment, but since this is just a demo I’ll use it anyway!” So in the end I usually have to stick to the official documentation + forum posts… I really don’t understand (money and clicks…) how someone can be proud of their tutorials when they aren’t even remotely production-ready…

  • ramble81@lemm.ee · 10 months ago

    Sweet, I was looking for something like this. Please let me know when you write a tutorial addressing these points so I can follow it. Link it here!

  • notfromhere@lemmy.one · 10 months ago

    You bring up a lot of great points. I disabled the firewall on my bare-metal cluster nodes and didn’t give it another thought. I had to go digging to figure out how to encrypt secrets, and an NFS StorageClass isn’t great security-wise either. Not to mention the lack of isolation for privileged containers; I found Kata Containers to be a good solution for that. Then there’s WireGuard between workers, which I’m not sure I got working correctly because I can’t figure out how to properly test it.
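    For the secrets part, encryption at rest is configured with an EncryptionConfiguration file passed to kube-apiserver through `--encryption-provider-config`. Below is a minimal Python sketch that generates one with a fresh random 32-byte key for the `secretbox` provider. It prints JSON, which should be accepted since JSON is valid YAML; the key name is arbitrary, and where you store the file (and how you mount it into the API server static pod on kubeadm) is up to you.

```python
import base64
import json
import os


def encryption_config(key_name: str = "key1") -> str:
    """Return an EncryptionConfiguration for Secrets with a fresh secretbox key.

    The output is JSON, which is also valid YAML.
    """
    # secretbox expects a base64-encoded 32-byte key.
    secret = base64.b64encode(os.urandom(32)).decode("ascii")
    config = {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [{
            "resources": ["secrets"],
            "providers": [
                # The first provider is used to encrypt new writes.
                {"secretbox": {"keys": [{"name": key_name, "secret": secret}]}},
                # identity keeps existing plaintext Secrets readable.
                {"identity": {}},
            ],
        }],
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(encryption_config())
```

    Existing secrets are only encrypted when they are next written, so the Kubernetes docs suggest rewriting them afterwards with `kubectl get secrets --all-namespaces -o json | kubectl replace -f -`. Keep a copy of the key somewhere safe: losing it means losing the secrets.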

  • bigredgiraffe@lemmy.world · 10 months ago

    I have run into this a lot. I’ve always found that most tutorials set things up in isolation and never talk about integration points or how to build a whole solution.

    On the MetalLB ConfigMap point, that’s another issue I have run into. In the earlier days of MetalLB it was configured differently and the ConfigMap was created automatically, but that has since changed. It took me a while to figure out when that happened, since their docs aren’t explicit about it, if I remember correctly. Annoying either way.
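    For anyone hitting this now: as far as I know, MetalLB 0.13 and later replaced the ConfigMap with custom resources. Here is a minimal Python sketch that prints an IPAddressPool and a matching L2Advertisement as a JSON List for `kubectl apply -f`; the pool name and the address range are placeholders for your own network.

```python
import json


def metallb_l2_config(address_range: str, pool: str = "homelab-pool",
                      namespace: str = "metallb-system") -> dict:
    """Return an IPAddressPool plus an L2Advertisement that announces it."""
    pool_obj = {
        "apiVersion": "metallb.io/v1beta1",
        "kind": "IPAddressPool",
        "metadata": {"name": pool, "namespace": namespace},
        # A range like "192.168.1.240-192.168.1.250" or a CIDR.
        "spec": {"addresses": [address_range]},
    }
    advert = {
        "apiVersion": "metallb.io/v1beta1",
        "kind": "L2Advertisement",
        "metadata": {"name": f"{pool}-l2", "namespace": namespace},
        # Without an advertisement, addresses get assigned but never announced.
        "spec": {"ipAddressPools": [pool]},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [pool_obj, advert]}


if __name__ == "__main__":
    # Placeholder range: pick addresses outside your DHCP scope.
    print(json.dumps(metallb_l2_config("192.168.1.240-192.168.1.250"), indent=2))
```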

    I think the reason most tutorials turn off the firewall is that in a well-configured cloud environment like AWS, the host firewall is redundant thanks to security groups, and unfortunately that is the environment everyone targets with their tutorials. They never explain that, not even with a simple “disable this if you have other mitigating controls in place” or something similar.

    I have also wondered whether we have finally reached the era where the majority of content creators and consumers have never touched an on-prem network and don’t even look at things through that lens anymore. Another good example is configuring MetalLB on a host with multiple interfaces that don’t all have the same networks available (you know, like using dedicated interfaces for storage, as you should). For a long time it just wasn’t possible: MetalLB would announce all networks on all interfaces, which made it basically non-functional, heh. Whatever the reason is, you are not alone in being annoyed :D
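    Newer MetalLB releases let an L2Advertisement restrict announcements to specific interfaces through an `interfaces` list, which should cover the dedicated-storage-NIC case. A sketch, assuming a pool named `homelab-pool` already exists and that `eth0` is the interface facing the client network (both names hypothetical).

```python
import json


def l2_advertisement(pool: str, interfaces: list[str],
                     namespace: str = "metallb-system") -> dict:
    """Build an L2Advertisement that only announces `pool` on `interfaces`."""
    if not interfaces:
        raise ValueError("pass at least one interface to restrict announcements")
    return {
        "apiVersion": "metallb.io/v1beta1",
        "kind": "L2Advertisement",
        "metadata": {"name": f"{pool}-l2", "namespace": namespace},
        # Only the listed interfaces answer ARP/NDP for this pool's addresses.
        "spec": {"ipAddressPools": [pool], "interfaces": list(interfaces)},
    }


if __name__ == "__main__":
    # Hypothetical names: announce on the client-facing NIC, never the storage NIC.
    print(json.dumps(l2_advertisement("homelab-pool", ["eth0"]), indent=2))
```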

    Anyway, these are great points. I have been pondering writing up a larger set of tutorials about my setup, since these days it looks more like a small enterprise. I should get on that, hah.

  • Aa!@lemmy.world · 10 months ago

    A couple of main points:

    • You are reading tutorials to help you get it up and running. Most of the time these are designed to walk you through setting things up on a fresh node, and most often just VMs on an isolated (trusted) network. When you are providing a guide to just get someone up and running, the first thing to do is establish a known baseline configuration to start from.
    • Kubernetes is a complex distributed application, and as such, the audience is generally expected to be relatively experienced. In other words, the assumption is that if you don’t know how to configure your firewall, you aren’t the one going through this tutorial.

    Still, I feel your pain. With these technologies, most of the people who have done the work are engineers, and we stink at writing documentation. I’m sure you’re familiar with it: we automate the solutions to the issues we encounter, and then those tools or automatic configurations never make it to the end user.

    And I’m probably biased, but don’t use a video guide for this sort of thing. It’s just the wrong medium for a technical tutorial.