Apollo Project Bringing the Doctor to You: Medical AI in Your Language

The Breakdown

Imagine a world where you can access vital health information in your native language, regardless of where you live. A new project called Apollo is making this vision a reality by creating medical large language models (LLMs) that can understand and respond to queries in six of the world's most spoken languages: English, Chinese, Hindi, Spanish, French, and Arabic.

Apollo: An Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People

Despite the vast repository of global medical knowledge predominantly being in English, local languages are crucial for delivering tailored healthcare services, particularly in areas with limited medical resources. To extend the reach of medical AI advancements to a broader population, we aim to develop medical LLMs across the six most widely spoken languages, encompassing a global population of 6.1 billion. This effort culminates in the creation of the ApolloCorpora multilingual medical dataset and the XMedBench benchmark. In the multilingual medical benchmark, the released Apollo models, at various relatively-small sizes (i.e., 0.5B, 1.8B, 2B, 6B, and 7B), achieve the best performance among models of equivalent size. Especially, Apollo-7B is the state-of-the-art multilingual medical LLMs up to 70B. Additionally, these lite models could be used to improve the multi-lingual medical capabilities of larger models without fine-tuning in a proxy-tuning fashion. We will open-source training corpora, code, model weights and evaluation benchmark.

arXiv.org · Xidong Wang

This is a "game-changer" (I'm allowed to use the term this time, right?) for people in underserved communities who may not have access to English-language medical resources or qualified medical professionals. With Apollo, they can get answers to their health questions, learn about different treatments, and even translate medical documents into their own language.

Here's a breakdown of what Apollo brings to the table:

Multilingual Magic: Apollo speaks your language! By incorporating medical data from various sources in six different languages, Apollo can understand and respond to queries in a way that is culturally relevant and linguistically accurate.
Democratizing Healthcare: One of the biggest hurdles to good healthcare is access to information. Apollo breaks down this barrier by making medical knowledge more accessible, regardless of location or language spoken.
Small But Mighty: Apollo's medical LLMs are lightweight (0.5B to 7B parameters), meaning they can be used even where computing resources are limited. A model this small could in principle run on local hardware rather than a remote server, which makes Apollo a good fit for remote areas where internet connectivity might be an issue.
Privacy First: Training large language models on sensitive medical data can be a privacy concern. Apollo uses a clever technique called "proxy tuning" to sidestep this. A small model is trained on the medical data, and at inference time the change that training made to the small model's predictions is used to steer a larger, more general LLM on medical tasks, with no fine-tuning of the larger model. This way, the larger model never has to be trained on the sensitive medical data directly.
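
The proxy-tuning idea in the last point can be sketched as simple arithmetic on next-token scores (logits). This is a minimal illustration, not Apollo's actual code: it assumes the commonly described formulation in which the large model's logits are shifted by the difference between a tuned and an untuned small model. The vocabulary, the logit values, and the function names are all made up for the example.

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def proxy_tuned_logits(large_base, small_tuned, small_base):
    """Shift the large model's logits by what tuning changed in the small model."""
    return [lb + (st - sb) for lb, st, sb in zip(large_base, small_tuned, small_base)]

# Toy next-token candidates with invented logits (assumption: illustrative only).
vocab = ["aspirin", "banana", "insulin"]
large_base = [1.0, 2.0, 1.5]    # large general model, never medically tuned
small_base = [0.5, 1.0, 0.5]    # small model before medical tuning
small_tuned = [0.5, -1.0, 2.5]  # the same small model after medical tuning

steered = proxy_tuned_logits(large_base, small_tuned, small_base)
probs = softmax(steered)
best_token = vocab[probs.index(max(probs))]
print(best_token)
```

On its own, the large model in this toy setup would pick "banana". After the shift it picks "insulin", even though its weights were never touched.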

Paper Summary

Medical knowledge in local languages is crucial for delivering healthcare services, but most existing medical LLMs are in English or Chinese.
This work introduces Apollo, a series of multilingual medical LLMs that supports English, Chinese, Hindi, Spanish, French, and Arabic.
Apollo is trained on a newly created dataset named ApolloCorpora, which includes medical information from various sources like books, clinical guidelines, and online forums.
The researchers propose a new domain adaptation method to train Apollo models efficiently.
Benchmarking results show that Apollo models achieve the best performance among models of equivalent size.
Apollo models are lightweight, which makes them suitable for deployment in areas with limited resources.
The release of Apollo contributes to democratizing medical AI by making it more accessible to a broader population.

Powering Progress: The ApolloCorpora and XMedBench

The Apollo project rests on two foundational pillars: the ApolloCorpora and the XMedBench.

The ApolloCorpora is a comprehensive multilingual medical dataset that serves as the training ground for the Apollo LLMs. This dataset meticulously curates high-quality medical texts across different languages, encompassing medical books, papers, and even doctor-patient dialogues. By incorporating such a rich tapestry of resources, the ApolloCorpora ensures that the LLMs are not only conversant in medical terminology but also attuned to the nuances of each language's medical discourse.

The XMedBench, on the other hand, functions as a rigorous testing ground for the Apollo LLMs. This benchmark utilizes multiple-choice questions to assess the models' grasp of medical concepts, reasoning abilities, and ability to draw inferences across languages. The XMedBench's role is paramount in establishing the efficacy of the Apollo models and ensuring they are up to snuff for real-world medical applications.
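
Scoring a multiple-choice benchmark like this usually comes down to per-language accuracy. The sketch below shows one way that could look. The record layout, the language codes, and the choice to average across languages are assumptions for illustration, not XMedBench's actual format or scoring script.

```python
from collections import defaultdict

def accuracy_by_language(records):
    """Return {language: fraction of questions answered correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["lang"]] += 1
        if rec["predicted"] == rec["answer"]:
            correct[rec["lang"]] += 1
    return {lang: correct[lang] / total[lang] for lang in total}

# Hypothetical records: language, gold option, and the option the model chose.
records = [
    {"lang": "en", "answer": "B", "predicted": "B"},
    {"lang": "en", "answer": "A", "predicted": "C"},
    {"lang": "hi", "answer": "D", "predicted": "D"},
    {"lang": "ar", "answer": "A", "predicted": "A"},
    {"lang": "ar", "answer": "C", "predicted": "C"},
]

scores = accuracy_by_language(records)
# Averaging per-language scores keeps a large English set from drowning out the rest.
macro_average = sum(scores.values()) / len(scores)
print(scores, macro_average)
```

Reporting each language separately, rather than a single pooled number, makes it visible when a model does well in English but poorly elsewhere.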

Shining a Light: The Accomplishments of Apollo

The Apollo project has yielded remarkable results. The Apollo LLM series, ranging from 0.5B to 7B parameters, achieves the best XMedBench performance among models of equivalent size. The Apollo-7B model, in particular, stands out as the state-of-the-art multilingual medical LLM among models of up to 70 billion parameters.

These breakthroughs demonstrate the immense potential of lightweight models like Apollo. Their ability to deliver exceptional performance paves the way for the integration of advanced medical AI directly into healthcare systems, particularly in regions with limited resources.
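
To get a feel for what "lightweight" means in hardware terms, here is a back-of-the-envelope estimate of how much memory the weights alone need at each released size. The model sizes come from the paper abstract; the function name and the numeric formats chosen are assumptions for illustration, and the estimate ignores activations, caches, and runtime overhead.

```python
# Bytes needed to store one parameter in a few common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(params_billion, dtype="fp16"):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    # (params_billion * 1e9 params) * bytes / 1e9 bytes-per-GB simplifies to this.
    return params_billion * BYTES_PER_PARAM[dtype]

# Apollo sizes from the abstract, plus a 70B model for comparison.
for size in [0.5, 1.8, 2, 6, 7, 70]:
    print(f"{size}B parameters -> about {weight_memory_gb(size):g} GB in fp16")
```

By this rough measure a 7B model needs about 14 GB for its weights in fp16, against about 140 GB for a 70B model, which is the gap that makes the smaller models practical on modest hardware.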

A Glimpse into the Future: The Far-Reaching Impact of Apollo

The Apollo project's significance extends far beyond its impressive performance benchmarks. By making medical knowledge more accessible in various languages, Apollo serves as a powerful tool for democratizing medical AI. This democratization has the potential to significantly improve the quality of care and patient outcomes on a global scale, especially in under-resourced areas.

Looking ahead, the project opens doors for exciting future research. Optimizing data sampling techniques, refining proxy-tuning methods, and exploring the fusion of different language models are just a few possibilities. The open-sourcing of the ApolloCorpora and Apollo models further fuels collaboration and innovation within the global research community, paving the way for a future where healthcare is not only accessible but also equitable for all.

The Apollo project is a beacon, illuminating a path towards a future where language is no longer a barrier to vital medical knowledge. As we continue to explore and improve multilingual medical AI, we inch closer to a world where everyone has the opportunity to be informed, empowered, and ultimately, healthy.