Nvidia's Nemotron-Cascade 2 is a 30B MoE model that activates only 3B parameters at inference time, yet achieved gold ...
We recommend using a virtual environment, since the code relies on some deprecated features of its dependencies. This codebase primarily relies on JAX and the equinox library for computation and ...