A recent paper compares two ways of getting AI models to "think" through hard problems. One is the classic chain-of-thought approach, where the model writes out its reasoning steps in words. The other is latent thought, where the model does the extra thinking internally, in its hidden states, without spitting out tokens. The authors pair a rigorous theoretical analysis with experiments, and there are a couple of interesting takeaways.
If a problem can be split into independent pieces that get combined later, like evaluating a big math expression, checking whether two nodes in a graph are connected, or computing edit distance, latent thought can process all the pieces at the same depth in one shot. Chain-of-thought, on the other hand, has to walk through every single operation one at a time, which takes many more steps.
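To make that gap concrete, here's a toy Python sketch (my illustration, not code from the paper): combining n values with an associative operation takes about log2(n) rounds if you pair elements up each round, versus n - 1 steps if you fold them one at a time, which is the same shape as the latent-vs-CoT iteration gap.

```python
import operator
from functools import reduce

def parallel_rounds(values, op):
    """Combine values pairwise; each while-iteration is one 'round'
    in which every pair could be processed simultaneously."""
    rounds = 0
    while len(values) > 1:
        # Pair up neighbors; an odd leftover element just carries over.
        paired = [op(values[i], values[i + 1])
                  for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            paired.append(values[-1])
        values = paired
        rounds += 1
    return values[0], rounds

nums = list(range(1, 1025))            # 1024 values
total, rounds = parallel_rounds(nums, operator.add)
print(total, rounds)                   # 524800 in 10 rounds (log2(1024))
print(reduce(operator.add, nums))      # same answer, but 1023 sequential steps
```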
The paper proves this by connecting these reasoning styles to circuit complexity classes. Essentially, latent thought with a small number of loops can simulate deep parallel circuits, while chain-of-thought with the same small number of steps can't. So for highly parallel tasks, you'd rather have the model think silently in its embeddings than write a long chain of words.
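For a feel of what "simulating a deep parallel circuit in a few loops" buys you, take graph connectivity, one of the paper's example tasks. Repeated boolean squaring of the adjacency matrix doubles the covered path length each round, so roughly log2(n) shallow, highly parallel rounds decide reachability, where a step-by-step walk needs as many steps as the path is long. A toy sketch (my illustration, not the paper's construction):

```python
import numpy as np

def reachability(adj: np.ndarray):
    """Reachability via repeated boolean squaring: each squaring doubles
    the path length covered, so ~log2(n) rounds settle connectivity."""
    n = adj.shape[0]
    reach = adj | np.eye(n, dtype=bool)          # paths of length <= 1
    rounds = 0
    while True:
        # Boolean matrix product: OR over all intermediate vertices.
        nxt = (reach.astype(int) @ reach.astype(int)) > 0
        if (nxt == reach).all():
            return reach, rounds
        reach, rounds = nxt, rounds + 1

# Path graph 0-1-2-3-4-5: diameter 5, but resolved in ~log2 squarings,
# where a sequential walk would need 5 hops.
n = 6
adj = np.zeros((n, n), dtype=bool)
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = True
reach, rounds = reachability(adj)
print(bool(reach[0, 5]), rounds)                 # True 3
```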
The flip side is that chain-of-thought can do something latent thought can't: it can use stochastic decoding. Every time the model writes a new token, it makes a random choice based on the output probabilities. That lets chain-of-thought run randomized algorithms that approximate hard counting problems, like estimating how many assignments satisfy a DNF logic formula or sampling random graph colorings. Latent thought's internal steps are deterministic, so it can't inject that kind of randomness. The paper proves that, under standard complexity assumptions, there are approximate counting and sampling tasks where chain-of-thought has a provable advantage.
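To see what stochastic decoding buys you, here's the textbook randomized algorithm for approximate DNF counting, the Karp-Luby estimator, which is exactly the kind of procedure a token-sampling model can emulate and a deterministic latent loop can't. A compact sketch (my rendering of the classic algorithm, not code from the paper):

```python
import random

def karp_luby(clauses, n_vars, samples=20000):
    """Estimate the number of assignments satisfying a DNF formula.
    Each clause is a dict {var_index: required_bool}. Sample a
    (clause, assignment) pair from the union 'with multiplicity',
    then count it only if the clause is the first one satisfied."""
    # Assignments satisfying clause i: 2^(number of free variables).
    sizes = [2 ** (n_vars - len(c)) for c in clauses]
    coverage = sum(sizes)   # counts overlapping assignments multiple times
    hits = 0
    for _ in range(samples):
        # Pick a clause proportionally to how many assignments satisfy it.
        i = random.choices(range(len(clauses)), weights=sizes)[0]
        # Sample uniformly among assignments satisfying clause i.
        a = [random.random() < 0.5 for _ in range(n_vars)]
        for var, val in clauses[i].items():
            a[var] = val
        # Count the pair only if i is the FIRST satisfied clause, so each
        # satisfying assignment is counted exactly once across the union.
        first = next(j for j, c in enumerate(clauses)
                     if all(a[v] == b for v, b in c.items()))
        hits += (first == i)
    return coverage * hits / samples

# (x0 AND x1) OR (NOT x0 AND x2) over 3 variables: 4 satisfying assignments.
clauses = [{0: True, 1: True}, {0: False, 2: True}]
print(karp_luby(clauses, 3))  # ~4.0
```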
They tested on algorithmic tasks: word problems in group theory, graph connectivity, arithmetic expression evaluation, and edit distance. Latent thought reached high accuracy with far fewer iterations than chain-of-thought on all of them. On connectivity, for example, a looped transformer with 2 loops hit 80% accuracy, while CoT needed many more steps to catch up. On the approximate counting and sampling tasks, though, chain-of-thought could estimate values and generate samples close to the true distribution, while latent thought couldn't match it because it lacks the stochastic component.
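On the sampling side, the standard randomized tool is a Markov chain. Glauber dynamics for graph colorings repeatedly recolors a random vertex with a color its neighbors aren't using; with enough colors relative to the maximum degree, it is known to converge to the uniform distribution over proper colorings. Each recoloring step is a token-like random draw, which is precisely what deterministic latent updates lack. A toy sketch (my illustration, not the paper's setup):

```python
import random

def glauber_coloring(adj, q, steps=10000):
    """Sample a proper q-coloring of a graph via Glauber dynamics.
    adj: adjacency list {vertex: [neighbors]}. Each step makes one
    random choice: pick a vertex, recolor it uniformly among the
    colors no neighbor currently uses."""
    verts = list(adj)
    # Greedy initial proper coloring (always possible when q > max degree).
    color = {}
    for v in verts:
        used = {color[u] for u in adj[v] if u in color}
        color[v] = min(c for c in range(q) if c not in used)
    for _ in range(steps):
        v = random.choice(verts)
        banned = {color[u] for u in adj[v]}
        color[v] = random.choice([c for c in range(q) if c not in banned])
    return color

# Triangle graph with q = 5 colors (q > 2 * max degree, so the chain mixes fast).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(glauber_coloring(adj, 5))
```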
So the core takeaway is that there's no universal best approach. If your problem is parallelizable, latent thought is dramatically faster in terms of reasoning iterations. If your problem needs randomized approximation or sampling, chain-of-thought is the way to go.