• xmunk@sh.itjust.works
    7 months ago

    For bonus points, this failure is in a cron job that sends out recently queued messages. It runs once every ten minutes. Last weekend we had 12 failures: four were isolated, two came as a back-to-back pair, and six were in a single continuous run.

    Please note that this server is unused by our business, so no messages ever get queued naturally. Every day we sync the live production server to this server at about 9 PM. If an employee queued a message shortly before the snapshot was taken, the snapshot may contain a number of unsent messages, and those will all be sent by the first cron job after the sync.

    It is a wonderfully awful problem that has me wanting to pull out my luscious locks.

      • xmunk@sh.itjust.works
        7 months ago

        Yup, luck is appreciated and I’m trying to get more eyes on it, but unfortunately I’m a senior dev with the second-highest seniority at the company, so I feel guilty dragging others into it.

        • Sacreblew
          7 months ago

          Add lots of logging to triangulate when it fails and what values the variables hold at that moment.
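To make the logging suggestion concrete, here is a minimal sketch of a wrapper that records the same context on every run, so failing runs can be compared against good ones. The thread never says what language or framework the cron job uses, so everything here is an assumption: Python, the logger name `queue_cron`, and the helpers `snapshot` and `run_logged` are all hypothetical.

```python
import logging
import os
import socket
import time
from datetime import datetime, timezone

# Hypothetical logger name; in a real cron job, attach a file handler to it.
log = logging.getLogger("queue_cron")


def snapshot(**extra):
    """Collect facts worth comparing between good and bad runs."""
    info = {
        "utc": datetime.now(timezone.utc).isoformat(),
        "host": socket.gethostname(),
        "pid": os.getpid(),
    }
    # Caller-supplied values, e.g. how many messages were queued.
    info.update(extra)
    return info


def run_logged(job, **extra):
    """Run job() and log its context; on failure log the traceback, then re-raise."""
    started = time.monotonic()
    log.info("start %s", snapshot(**extra))
    try:
        result = job()
    except Exception:
        elapsed = time.monotonic() - started
        # log.exception records the message at ERROR level plus the traceback.
        log.exception("FAILED after %.3fs %s", elapsed, snapshot(**extra))
        raise
    log.info("ok after %.3fs result=%r", time.monotonic() - started, result)
    return result
```

Usage would look something like `run_logged(send_queued_messages, queued=len(pending))`, where `send_queued_messages` and `pending` stand in for whatever the real job uses. Logging the successful runs too is deliberate: with a job that fires every ten minutes, the timestamps of the good runs on either side of a failure are often as informative as the failure itself.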