This is quite worrying for OpenAI: token prices have been plummeting thanks to Meta, so it's going to have to keep cutting its own prices while capex stays flat. Whatever Sam says in interviews, just assume the opposite and the whole picture comes together.<p>It's almost a mathematical certainty that people who invested in OpenAI will need to reincarnate in multiple universes to ever see that money again, but no bother: many are probably NVIDIA stockholders, which evens out the damage.
I don't need exact results. FP8 quantization is almost lossless, and even 6-bit quantization is usually acceptable. Can this technique be combined with quantization?
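(For context, here's a toy sketch of why low-bit quantization loses so little accuracy. This is plain symmetric per-tensor rounding, not the FP8 format or any particular paper's scheme, and the sizes are made up for illustration:)

```python
import random

random.seed(0)
# Stand-in for a weight tensor: 10k roughly Gaussian values.
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

def quantize_dequantize(xs, bits=8):
    """Round each value to a signed `bits`-bit grid, then map it back."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits
    scale = max(abs(x) for x in xs) / qmax
    return [round(x / scale) * scale for x in xs]

recovered = quantize_dequantize(weights, bits=8)
max_abs_err = max(abs(a - b) for a, b in zip(weights, recovered))
# Worst-case round-trip error is half a quantization step (scale / 2),
# which is tiny relative to the weights themselves.
print(f"max abs error: {max_abs_err:.5f}")
```

At 6 bits the grid is 4x coarser, so the error grows accordingly, which matches the "usually acceptable" experience in the comment.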
Other than portability and privacy, are there any benefits to running a local model on a 4090, versus running the same model on demand on a cloud service with the same or a more powerful card?
Is it just me, or is this paper basically missing all technical information?<p>I get that there's proprietary technology involved, but if so, can we please not put this on arXiv and pretend it's a scientific contribution?