• howrar
    6 months ago

    This article got me curious about how these 1-bit models work, so I read up on them a bit.

    https://arxiv.org/html/2402.11295v3

    The model parameters aren’t completely converted to 1-bit. Each weight matrix is decomposed into a sign matrix (the 1-bit part) and two full-precision vectors, whose outer product gives a rank-1 approximation of the matrix’s magnitudes. So if I understand correctly, everything else still functions the same way as in a regular transformer: input vectors, intermediate values, and outputs are all full precision and have no problem going through nonlinearities.
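To make that concrete for myself, here is a rough NumPy sketch of the idea, not the paper's actual code. The function names (`sign_rank1_decompose`, `one_bit_linear`) are made up for illustration, and I'm assuming the two vectors come from the top singular pair of the element-wise absolute value of the weight matrix, which is my reading of the paper. It also skips training entirely and only shows the decomposition and the forward pass.

```python
import numpy as np

def sign_rank1_decompose(W):
    """Split W (out_dim x in_dim) into a +/-1 sign matrix and two vectors.

    Assumption: the vectors are taken from the top singular pair of |W|,
    so that W is approximated by S * outer(a, b).
    """
    S = np.where(W >= 0, 1.0, -1.0)            # the 1-bit part
    U, s, Vt = np.linalg.svd(np.abs(W), full_matrices=False)
    # The top singular vectors of a non-negative matrix can be chosen
    # non-negative; abs() removes the arbitrary sign flip from the SVD.
    a = np.abs(U[:, 0]) * np.sqrt(s[0])        # one value per output row
    b = np.abs(Vt[0, :]) * np.sqrt(s[0])       # one value per input column
    return S, a, b

def one_bit_linear(x, S, a, b):
    """Full-precision input and output; only S is stored in 1 bit."""
    # Scale the input by b, multiply by the sign matrix, scale the output by a.
    return ((x * b) @ S.T) * a

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))
x = rng.standard_normal(6)

S, a, b = sign_rank1_decompose(W)
W_hat = S * np.outer(a, b)                     # dense reconstruction
y = one_bit_linear(x, S, a, b)
```

Here `y` matches `x @ W_hat.T`, so the layer behaves like an ordinary linear layer with an approximated weight matrix, and whatever comes out can go through a nonlinearity as usual.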