You have no doubt noticed that federation is breaking again. I am painfully aware of it. The issue is with the Symfony queue runner that processes incoming messages from other instances. Occasionally, the server receives a message that causes the queue runner to die, and I have to manually remove the offending message from RabbitMQ. The message does not appear to be malicious; rather, something malformed in an otherwise legitimate-looking post causes the queue to die. I am working with the mbin team to track down what it is about these messages that causes the problem, but sadly, until there is a fix, this is going to keep happening.
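
For readers wondering why a single bad message can stall everything, here is a minimal sketch of the general "poison message" problem and one common mitigation: setting the failing message aside instead of letting the consumer die. This is not mbin's or Symfony Messenger's actual code. The in-memory queue, the `handle_activity` and `drain` functions, and the sample payloads are all hypothetical, and treating "malformed" as broken JSON or a missing field is an assumption, since the real cause has not been identified yet.

```python
import json
from collections import deque


def handle_activity(raw: str) -> str:
    """Hypothetical handler: parse an incoming message and return its type."""
    activity = json.loads(raw)   # raises ValueError on malformed JSON
    return activity["type"]      # raises KeyError if the field is missing


def drain(queue: deque, dead_letters: list) -> list:
    """Process every message; quarantine failures instead of stopping."""
    processed = []
    while queue:
        raw = queue.popleft()
        try:
            processed.append(handle_activity(raw))
        except (ValueError, KeyError, TypeError) as exc:
            # Set the message aside with the reason, then keep going.
            dead_letters.append((raw, repr(exc)))
    return processed


# Hypothetical sample payloads, not real federation traffic.
incoming = deque([
    '{"type": "Create", "id": 1}',
    '{"type": "Like", "id": 2',       # truncated JSON
    '{"id": 3}',                      # valid JSON, missing "type"
    '{"type": "Announce", "id": 4}',
])
dead = []
done = drain(incoming, dead)
print(done)       # ['Create', 'Announce']
print(len(dead))  # 2
```

The quarantined messages stay available for inspection, which is roughly what manually pulling the offending message out of the queue achieves by hand.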

  • Nougat@fedia.io · 6 upvotes · 9 months ago

    Growing pains are to be expected. You’re probably aware that some people (myself included) are shifting here [from|in addition to] kbin.social; that extra load probably doesn’t help.

    • jerry@fedia.io [OP] [M] · 1 upvote · 9 months ago

      Ah - that is what we’re here for. I know kbin has had a cloud of uncertainty around it. Did something recently happen on kbin.social?

      • Nougat@fedia.io · 3 upvotes · 9 months ago

        Ernest made a post today, yes, but kbin.social has reached a point which demands a next level of administration (from both technical and non-technical perspectives). While I want that project to thrive, there is writing on the wall which unfortunately cannot be ignored.

  • jerry@fedia.io [OP] [M] · 4 upvotes · 9 months ago

    The server is busily processing the 1,200,000 messages that queued up over the past 20 hours. It has died 3 times in the past few minutes, so I’m not optimistic about how long this will take.

  • jerry@fedia.io [OP] [M] · 4 upvotes · 9 months ago

    The good news is that I think I figured out where the problematic messages are coming from. Now I have to figure out what it is about them.

      • jerry@fedia.io [OP] [M] · 1 upvote · 9 months ago

        It took 3 days to process the backlog, but it’s caught up now and I’ve not seen any recurrence of the prior problem.