1 million token context window. Multimodal inputs. This must be costing Google a fortune to offer free inference with a window of this size. ~$200M USD in training costs alone, by recent estimates from Stanford (if memory serves).<p>I'd recommend throwing some thick documentation at it. Images must be uploaded separately. If you use the full window, expect lengthy inference times. I've been highly impressed so far. It greatly expands the capability for my daily use cases. They say they've stretched it to 10M tokens in research.