Research & Publications
Explore our contributions to the field of AI research through academic papers, conference presentations, and technical reports

M3DR: Towards Universal Multilingual Multimodal Document Retrieval
Adithya S Kolavi, Vyoman Jain
arXiv 2025
M3DR introduces a comprehensive framework for multilingual multimodal document retrieval, achieving a 152% improvement over baselines on cross-lingual retrieval. We release NetraEmbed and ColNetraEmbed, two 4B-parameter models supporting 22 languages with state-of-the-art performance across diverse script families, including Latin, Devanagari, Dravidian, and CJK.
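To picture the retrieval setup, here is a minimal sketch of bi-encoder document retrieval scored by cosine similarity over precomputed embeddings. The random stand-in vectors and the 1024-dimensional size are illustrative assumptions, not the released NetraEmbed interface; a real pipeline would encode queries and document page images with the same embedding model.

```python
import numpy as np

def cosine_scores(query_emb: np.ndarray, doc_embs: np.ndarray) -> np.ndarray:
    # Normalize both sides so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    return d @ q

# Stand-in embeddings for illustration only; a real system would embed the
# query text and the document page images with the same multilingual encoder.
rng = np.random.default_rng(0)
doc_embs = rng.normal(size=(1000, 1024))  # 1000 document pages, 1024-dim
query_emb = rng.normal(size=1024)         # one query, in any supported language
top5 = np.argsort(-cosine_scores(query_emb, doc_embs))[:5]
print("top-5 page indices:", top5)
```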

Nayana: A Foundation for Document-Centric Vision-Language Models via Multi-Task, Multimodal, and Multilingual Data Synthesis
Adithya S Kolavi, Samarth P, Vyoman Jain
ICCV 2025 Workshops | Computer Vision for Document Analysis and Classification (CV4DC)
Nayana presents a comprehensive, synthetically generated dataset of 3 million document images with hierarchical annotations for training document-centric vision-language models. The dataset spans 22 languages and enables multi-task learning across layout detection, OCR, document retrieval, and more, providing a foundation for universal document understanding.
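As a way to picture hierarchical page annotations of this kind, the sketch below shows one hypothetical record layout. The field names and structure are assumptions for illustration, not the dataset's published schema.

```python
from dataclasses import dataclass, field

@dataclass
class Region:
    bbox: tuple[float, float, float, float]  # x0, y0, x1, y1 in page coordinates
    label: str                               # e.g. "title", "paragraph", "table"
    text: str | None = None                  # OCR transcription, if the region is textual
    children: list["Region"] = field(default_factory=list)  # nested sub-regions

@dataclass
class Page:
    image_path: str        # rendered document image
    language: str          # one of the covered languages
    regions: list[Region]  # layout tree shared across the different tasks
```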

Nayana OCR: A Scalable Framework for Document OCR in Low-Resource Languages
Adithya Kolavi, Samarth P, Vyoman Jain
NAACL 2025 | Language Models for Underserved Communities (LM4UC)
We introduce Nayana, a scalable framework for adapting Vision-Language Models to low-resource languages using synthetic data generation and parameter-efficient fine-tuning. Using LoRA, we demonstrate effective adaptation across 10 Indic languages (Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu) without requiring extensive manually annotated datasets.
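As a rough illustration of the parameter-efficient recipe, the sketch below attaches LoRA adapters to a vision-language model with the Hugging Face peft library. The base checkpoint, rank, and target modules are assumptions for illustration, not the exact Nayana training configuration.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForVision2Seq

# Assumed base VLM; any vision-language checkpoint with attention
# projection layers would work the same way.
base = AutoModelForVision2Seq.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

config = LoraConfig(
    r=16,                                  # low-rank dimension of the adapter matrices
    lora_alpha=32,                         # scaling factor applied to the adapter output
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt (assumed)
    lora_dropout=0.05,
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the small adapter weights are trained
```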

ViViD: Vision-Language Model for Unified Visual Understanding of Documents
Adithya S Kolavi
CVPR 2025 | Emergent Visual Abilities and Limits of Foundation Models (EVAL-FoMo 2025)
A vision-language model specifically optimized for document understanding tasks, capable of processing diverse document formats with high accuracy.

Nayana: A Unified Foundation Model for Multilingual, Multimodal, and Multitask Intelligence
Adithya S Kolavi, Samarth P, Vyoman Jain
LlamaCon 2025 | Llama Impact Grant 2024 Winner
Winner of the 2024 Llama Impact Grant from Meta, this paper presents a foundation model architecture designed for multilingual, multimodal, and multitask applications.

CAPTAIN: Continuous Automated Planning Through Autonomous Internet Navigation
Adithya S Kolavi
AAAI 2025 | Large Language Models for Planning (LM4Plan)
A novel framework for autonomous web navigation and task planning using large language models to perform complex multi-step operations.
Opportunities
Join our team or support our research
Join our Team
Application Process
Please fill out the form below to express interest in our open positions. We will review your application and get back to you within 2-3 weeks.
Support Our Research
If you like our research and would like to sponsor our projects and open-source initiatives, please get in touch. Your sponsorship helps us continue developing innovative solutions and advancing the field of AI.
- Support cutting-edge AI research
- Contribute to open-source development
- Help make AI accessible to everyone