A new AI model called Step 3.7 Flash has entered the scene, packing 198 billion parameters and the ability to reason across text, images, and video. The model is optimized for NVIDIA GPUs, a clear sign that performance and hardware compatibility were key design goals.
Why 198 Billion Parameters Matters
Parameter count is a rough measure of a model's capacity. At 198 billion, Step 3.7 Flash sits among the largest open-weights models available. That scale lets it handle complex reasoning tasks, but it also demands serious compute: the weights alone occupy hundreds of gigabytes at 16-bit precision, so the model has to be spread across several GPUs. The optimization for NVIDIA GPUs suggests the developers targeted high-throughput inference and training on widely available hardware.
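To put the compute demand in concrete terms, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need. Only the 198 billion figure comes from the reported specs. The numeric precisions, the 80 GB per-GPU capacity, and the helper names are illustrative assumptions, not details released for Step 3.7 Flash, and the estimate ignores activations, caches, and runtime overhead.

```python
import math

# Parameter count reported for Step 3.7 Flash.
PARAMS = 198e9

# Assumed storage formats (hypothetical; the release does not say which are used).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "fp8/int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def min_gpus(weight_gb: float, gpu_memory_gb: float = 80.0) -> int:
    """Lower bound on GPUs needed just to fit the weights.

    The 80 GB default is an assumed per-GPU capacity, not a stated requirement.
    """
    return math.ceil(weight_gb / gpu_memory_gb)


def summarize(num_params: float = PARAMS) -> dict:
    """Map each assumed precision to (weight memory in GB, minimum GPU count)."""
    result = {}
    for name, nbytes in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(num_params, nbytes)
        result[name] = (gb, min_gpus(gb))
    return result


if __name__ == "__main__":
    for name, (gb, gpus) in summarize().items():
        print(f"{name:>10}: {gb:7.1f} GB of weights, at least {gpus} x 80 GB GPUs")
```

Under these assumptions, 16-bit weights come to about 396 GB, which needs at least five 80 GB GPUs before any working memory is counted. Real deployments would need more headroom than this lower bound.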
Multimodal Capabilities: Text, Image, and Video
Step 3.7 Flash isn't just a language model. It's multimodal, meaning it can process and connect information from text, images, and video in a single pass. That opens up applications like visual question answering, video summarization, and mixed-media content analysis. Rather than routing each format through a separate pipeline, the model reasons across all three natively.
Optimized for NVIDIA GPUs
The model's optimization for NVIDIA GPUs points to practical deployment. Many AI teams rely on NVIDIA's CUDA ecosystem and Tensor Core hardware. Step 3.7 Flash likely exploits those features, for example through the reduced-precision arithmetic that Tensor Cores accelerate, for faster inference and lower memory use. That could make it more accessible to researchers and companies already running NVIDIA hardware.
Who built Step 3.7 Flash and when it will be available aren't clear from the details released so far. But the specs alone (198B parameters, full multimodal reasoning, GPU tuning) signal a serious contender in the large-model race.




