gen‑ai.news
New AI model called "Count Anything" does exactly what it says, and that's harder than it sounds

Counting objects in images sounds straightforward, but it has long been one of the more stubborn problems in computer vision. Earlier systems were typically trained for narrow domains - counting people in a crowd, for instance, or cells in a microscopy slide - and performed poorly when asked to generalize beyond those specific contexts. "Count Anything" is designed to break that constraint by accepting an open-ended text prompt as its only guidance, allowing a single model to handle a broad range of counting tasks without retraining or fine-tuning for each category.

According to the researchers, the model achieves roughly half the error rate of previous general-purpose counting systems on standard benchmarks. That is a meaningful improvement, since counting accuracy tends to degrade quickly as scenes become more complex or the target objects vary in size, occlusion, and appearance. The text-prompt approach means a user can simply describe what they want counted - "white blood cells," "people waiting in line," "cars in a parking lot" - and the model attempts to locate and tally each instance accordingly.
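To make "half the error rate" concrete: the article does not name the metric, but counting results are commonly summarized as mean absolute error (MAE) and root mean squared error (RMSE) between predicted and true counts. The sketch below assumes those two metrics and uses made-up counts for four hypothetical images. None of the numbers come from the "Count Anything" evaluation.

```python
import math


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def root_mean_squared_error(predicted, actual):
    """Like MAE, but squaring penalizes large misses more heavily."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Illustrative values only (assumption): true object counts for four images,
# plus the counts produced by a hypothetical baseline and a hypothetical
# improved model.
true_counts = [12, 40, 7, 150]
baseline_counts = [16, 32, 9, 170]
improved_counts = [14, 36, 8, 160]

baseline_mae = mean_absolute_error(baseline_counts, true_counts)
improved_mae = mean_absolute_error(improved_counts, true_counts)
baseline_rmse = root_mean_squared_error(baseline_counts, true_counts)
improved_rmse = root_mean_squared_error(improved_counts, true_counts)

print(f"baseline MAE={baseline_mae}, RMSE={baseline_rmse}")
print(f"improved MAE={improved_mae}, RMSE={improved_rmse}")
print(f"error ratio (improved / baseline): {improved_mae / baseline_mae}")
```

In this toy data the improved model misses by half as much on every image, so both metrics are halved. On a real benchmark the two metrics can move differently, because RMSE is dominated by the few images with the largest misses, typically the densest scenes.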

The underlying approach likely draws on the growing body of work that combines vision encoders with language models, allowing visual understanding to be steered by natural language descriptions rather than fixed category labels. This kind of open-vocabulary design is increasingly common in object detection and segmentation, and applying it to counting is a logical extension. The challenge is that counting demands not just identifying that something is present, but precisely localizing and distinguishing every individual instance - a harder requirement than simple classification or detection.
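The difference between detecting that something is present and counting every instance can be shown with a toy example. The sketch below is not the "Count Anything" method, whose internals the article does not describe. It assumes a hypothetical upstream model has already produced candidate boxes with a text-match score, then counts by dropping low-score candidates and merging near-duplicate boxes with a greedy overlap check (the standard non-maximum suppression idea). All box coordinates, scores, and thresholds are invented for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_instances(candidates, score_threshold=0.5, iou_threshold=0.5):
    """Count distinct objects among (box, score) candidates.

    Candidates scoring below score_threshold are discarded. The rest are
    visited from highest to lowest score, and a box is kept only if it does
    not overlap an already-kept box by iou_threshold or more.
    """
    confident = sorted(
        (c for c in candidates if c[1] >= score_threshold),
        key=lambda c: c[1],
        reverse=True,
    )
    kept = []
    for box, _score in confident:
        if all(iou(box, other) < iou_threshold for other in kept):
            kept.append(box)
    return len(kept)


# Hypothetical candidates for one text prompt (assumption, not model output).
candidates = [
    ((0, 0, 10, 10), 0.9),    # a clear instance
    ((1, 1, 11, 11), 0.8),    # near-duplicate of the first box
    ((20, 20, 30, 30), 0.7),  # a second, separate instance
    ((40, 40, 50, 50), 0.2),  # weak match, below the score threshold
]

print(count_instances(candidates))
```

A detector only has to get one of the first two boxes right to report that the object is present, whereas a counter has to decide whether they are one object or two. In this sketch that decision rests entirely on `iou_threshold`: raising it to 0.9 makes the same candidates count as three objects instead of two.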

Despite the progress, the model has clear limits. Very dense configurations - tightly packed crowds or overlapping cells - still produce higher error rates, which is consistent with the difficulty of separating individual instances that heavily occlude one another. Ambiguous or abstract text prompts also cause problems, since the model must interpret what the user means before it can begin counting. These limitations suggest that "Count Anything" is a solid step forward for general-purpose visual counting rather than a finished solution, and the field will likely see continued iteration as training data and architecture choices improve.
