Last month, Vana mainnet launched, marking a major milestone for the first open protocol for AI data sovereignty. On Vana, we’ve seen dozens of DataDAOs get started, enabling users to pool and monetize data ranging from Twitter activity and sleep metrics to web analytics and genetic information.
With Vana, users can reclaim sovereignty over their data, permissioning it with any EVM-compatible wallet, just like their funds. By joining a DataDAO, individuals write their data to Vana under the control of their wallet, then delegate access to the DataDAO, making it available for AI model training while keeping control. These collectives aggregate valuable training data (which is only useful at sufficient user scale, since a single data point isn’t enough to train an AI model on) and ensure users reclaim their economic stake in the AI economy.
Over the holidays, I've been posting on X about my wishlist for DataDAOs in 2025. Here's a roundup of some of my favorites:
Health 
There's a big opportunity for health-focused DataDAOs spanning medical records, DEXA scans, fitness tracking (Strava, Oura rings), nutrition, and beyond. While these valuable datasets are currently siloed behind regulatory constraints and platform walls, DataDAOs could unlock their potential for AI training.
With sufficient scale, these datasets could power models that predict health risks and improve preventative care. The impact could extend beyond individual optimization to breakthrough applications like understanding rare diseases and predicting treatment responses. DataDAOs could start by focusing on specific data types, like medical imaging, while collaborating with other DataDAOs that aggregate complementary data like sleep patterns, all connected through users' wallets.
I've been interested in this space for a while and built a Jupyter notebook to analyze my own sleep and exercise data. I often ran into the limitation of not having enough data to correct for trends unrelated to changes in my behavior. Access to a much larger dataset would lead to much better insights at both the individual and community level.
Research
OpenAI's o3 showed what's possible with specialized data from mathematicians, with data collected from 60 mathematicians. Now imagine scaling this across fields and giving PhD students ownership of the AI they help create. Someone could build a DataDAO from PhD research contributions, giving models access to deeper, more niche datasets from which to derive training data.
In STEM fields like physics, advanced math, engineering, biology, and chemistry, where new research is published regularly, having the most up-to-date training data is a clear advantage. While scientific papers are often stuck behind paywalls, researchers could contribute not just their published work but also their drafts, notes, and research in progress through DataDAOs, unlocking important knowledge that rarely makes it into formal publications. Much like human learning benefits from seeing both successes and failures, training AI on both successful research and unsuccessful attempts could lead to more robust and nuanced models.
Self-Driving Cars
DePIN and DataDAOs are a natural pair: individuals are ideally positioned to collect relevant data about the physical world. One big area where we could see a DataDAO form is self-driving car data.
Tesla has a huge training data advantage, with 5M+ cars collecting data for self-driving AI, which makes it hard for other companies to catch up. Tesla owners, along with other electric car drivers, could form a DataDAO to pool video recordings as training data to sell to others. I think the most important parts of the dataset to incentivize are the edge cases that self-driving cars still need to learn from: driving near construction sites, driving in bad weather, and other unusual situations.
Home Mapping
Another DePIN use case where DataDAOs are well positioned is home mapping. iRobot's Roomba vacuums sit on a huge amount of floor plan and usage data, which is why Amazon tried to buy iRobot for $1.4B (the deal was ultimately abandoned under regulatory pressure). With a home-mapping DataDAO, users could monetize this data for furniture design and robotics training while preserving their privacy. This extends beyond floor plans: smart home devices like Ring doorbells capture valuable data about how we interact with our spaces. As home robots evolve to handle more household tasks, they'll need diverse training data showing how people navigate and use their homes, and a DataDAO could let users securely monetize that data on their own terms.
Screen Recordings
Many people already use screen recording tools like Rewind AI or meeting recording tools like Read AI. This recording data could power a DataDAO that teaches AI agents how to do knowledge work independently, with contributors rewarded for their submissions.
Language and Translation
Language and audio DataDAOs represent a major opportunity in 2025. Users could contribute text conversations from their phones that aren't available on the public internet, giving AI models authentic examples of casual language. The opportunity extends to voice translation as well. Current models typically convert speech to text, translate the text, then convert it back to speech, a process that can lose important nuances. With enough training data from a voice-focused DataDAO, we could build models that translate directly from speech to speech, preserving tone, emotion, and natural speaking patterns. This would enable more accurate and natural-sounding translations.
Build on Vana
Interested in building one of these DataDAOs, or want to start something that isn’t on this list? Get in touch to learn more about joining the Vana Foundation’s accelerator, and check out the Vana docs. If you have any questions about a specific DataDAO, feel free to DM me on X. We’re excited to see what you launch in 2025!