Academic Contributions

Research & Publications

Explore our contributions to the field of AI research through academic papers, conference presentations, and technical reports.


Multilingual AI Β· Document Retrieval Β· Multimodal AI Β· Vision-Language Models Β· 2025

M3DR: Towards Universal Multilingual Multimodal Document Retrieval

Adithya S Kolavi, Vyoman Jain

arXiv 2025

M3DR introduces a comprehensive framework for multilingual multimodal document retrieval, achieving a 152% improvement over baselines on cross-lingual retrieval. We release NetraEmbed and ColNetraEmbed, two 4B-parameter models supporting 22 languages with state-of-the-art performance across diverse script families including Latin, Devanagari, Dravidian, CJK, and more.

Dataset Β· Vision-Language Models Β· Document Understanding Β· Multilingual AI Β· 2025

Nayana: A Foundation for Document-Centric Vision-Language Models via Multi-Task, Multimodal, and Multilingual Data Synthesis

Adithya S Kolavi, Samarth P, Vyoman Jain

ICCV 2025 Workshops | Computer Vision for Document Analysis and Classification (CV4DC)

Nayana presents a comprehensive synthetically generated dataset of 3 million document images with hierarchical annotations for training document-centric vision-language models. The dataset spans 22 languages and enables multi-task learning across layout detection, OCR, document retrieval, and more, providing a foundation for universal document understanding.

OCR Β· Low-Resource Languages Β· Document Processing Β· Indic Languages Β· 2025

Nayana OCR: A Scalable Framework for Document OCR in Low-Resource Languages

Adithya Kolavi, Samarth P, Vyoman Jain

NAACL 2025 | Language Models for Underserved Communities (LM4UC)

We introduce Nayana, a scalable framework for adapting Vision-Language Models to low-resource languages using synthetic data generation and parameter-efficient fine-tuning. Using LoRA, we demonstrate effective adaptation across 10 Indic languages (Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu) without requiring extensive manually annotated datasets.

Vision-Language Models Β· Document Understanding Β· Multimodal AI Β· Foundation Models Β· 2025

ViViD - Vision Language model for Unified Visual Understanding of Documents

Adithya S Kolavi

CVPR 2025 | Emergent Visual Abilities and Limits of Foundation Models (EVAL-FoMo 2025)

A vision-language model specifically optimized for document understanding tasks, capable of processing diverse document formats with high accuracy.

Coming Soon
Foundation Models Β· Multilingual Β· Multimodal Β· Multitask Learning Β· 2025

Nayana - A Unified Foundation Model for Multilingual, Multimodal, and Multitask Intelligence

Adithya S Kolavi, Samarth P, Vyoman Jain

LlamaCon 2025 | Llama Impact Grant 2024 Winner

Winner of the 2024 Llama Impact Grant from Meta, this paper presents a foundation model architecture designed for multilingual and multimodal applications.

Coming Soon
Automated Planning Β· Web Navigation Β· LLM Applications Β· Autonomous Systems Β· 2024

CAPTAIN: Continuous Automated Planning Through Autonomous Internet Navigation

Adithya S Kolavi

AAAI 2025 | Large Language Models for Planning (LM4Plan)

A novel framework for autonomous web navigation and task planning using large language models to perform complex multi-step operations.

Opportunities

Join our team or support our research

Join our Team

Application Process

Please fill out the form below to express interest in our open positions. We will review your application and get back to you within 2–3 weeks.

Support Our Research

If you like our research and would like to sponsor our projects and open-source initiatives, please get in touch. Your sponsorship will help us continue developing innovative solutions and advancing the field of AI.

  • βœ“ Support cutting-edge AI research
  • βœ“ Contribute to open-source development
  • βœ“ Help make AI accessible to everyone

CognitiveLab

Transforming Enterprises with AI Solutions at Scale

Β© 2025 CognitiveLab. All rights reserved.