Jevons’ paradox says that efficiency gains will increase use. Microsoft is buying a nuclear power plant for AI shit. If they can train a trillion-parameter model for one-thousandth the cost… they will instead train a quadrillion-parameter model.
Or I guess if they’re smart they’ll train a trillion-parameter model for longer. Or iterate like crazy, once training takes hours instead of months.
BitNet (https://www.microsoft.com/en-us/research/publication/bitnet-scaling-1-bit-transformers-for-large-language-models/) uses 1 bit per weight instead of 8 or 16, yay performance gainz
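The gist of 1-bit weights: collapse each weight to its sign (±1, one bit of storage) and keep a single per-matrix scale. A toy sketch of that idea, assuming a NumPy-style absmean scaling; this is an illustration, not Microsoft's actual BitNet implementation:

```python
import numpy as np

def binarize(W):
    # Per-matrix scale: mean absolute value of the weights.
    beta = np.abs(W).mean()
    # Each weight collapses to its sign -> representable in 1 bit.
    Wq = np.sign(W)
    return Wq, beta

def matmul_1bit(x, Wq, beta):
    # The binary matmul is just adds/subtracts; rescale once at the end.
    return beta * (x @ Wq)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
x = rng.normal(size=(1, 4))

Wq, beta = binarize(W)
y = matmul_1bit(x, Wq, beta)
```

The payoff is that `Wq` needs 1 bit per entry instead of 16, and the inner product degenerates to additions and subtractions, which is where the performance gains come from.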
So will the return of the one-bit flag conclude the adventures of resource usage in computers?