Tibetan Chatbot
To help people learn a new language through a chat interface, let's use Apertus to build a RAG system grounded in a dataset of linguistic references.
The aim is to develop a user-friendly chat interface that facilitates learning the Tibetan language. By leveraging Apertus, the Swiss LLM, and building a Retrieval-Augmented Generation (RAG) system, we have created an interactive tool that enhances language acquisition through engaging conversations.

Screenshot of our prototype UI, running a chat interface with a question and answer in Tibetan.
Challenge
Current practices in language learning often rely on static resources and lack interactive elements. Foundation models in this area typically focus on translation and basic grammar, leaving a gap in dynamic, conversational learning experiences. The state of the art uses large language models for text generation, but integrating them with language-specific datasets and a user-friendly interface remains an open challenge.
Activities in this project may include:
- Gather and preprocess Tibetan language datasets, ensuring they are clean and structured for model training.
- Train the Swiss LLM on the Tibetan dataset and integrate it with a RAG system to enhance conversational capabilities.
- Develop a prototype of the chat interface, focusing on user experience and interaction design.
- Conduct user testing to gather feedback and iterate on the prototype, ensuring it meets the learning needs of users.
- Document the process and outcomes, and create a presentation for the hackathon.
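The preprocessing and retrieval steps listed above could be prototyped roughly as follows. This is a minimal sketch, not the project's actual pipeline: it assumes a tiny in-memory list of reference passages (the `PASSAGES` entries are illustrative placeholders), splits Tibetan text into syllables on the tsheg mark, and ranks passages by bag-of-syllables cosine similarity instead of a learned embedding model. The function names are hypothetical, and the resulting prompt would still need to be passed to Apertus through whatever inference API the team chooses.

```python
import math
import unicodedata
from collections import Counter

TSHEG = "\u0f0b"  # Tibetan syllable delimiter (tsheg)
SHAD = "\u0f0d"   # Tibetan clause/sentence mark (shad)

# Illustrative placeholder passages (assumption), standing in for a real corpus.
PASSAGES = [
    "བཀྲ་ཤིས་བདེ་ལེགས། : a common greeting",
    "ཐུགས་རྗེ་ཆེ། : thank you",
    "ཆུ། : water",
]


def syllables(text: str) -> list[str]:
    """Normalize to NFC and split text into syllable-level tokens."""
    text = unicodedata.normalize("NFC", text)
    # Treat shad, spaces and newlines as boundaries as well.
    for mark in (SHAD, "\n", " "):
        text = text.replace(mark, TSHEG)
    return [s for s in text.split(TSHEG) if s]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = Counter(syllables(query))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(syllables(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble a retrieval-augmented prompt for the language model."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages))
    return (
        "Use these reference passages to answer.\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("བཀྲ་ཤིས", PASSAGES))
```

Splitting on the tsheg is a deliberate simplification: Tibetan does not separate words with spaces, so syllables are the easiest unit to extract without a dedicated word segmenter. A real system would likely replace the similarity function with multilingual embeddings and a vector index.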
Resources
- Datasets: Tibetan language corpora, including textbooks, dictionaries, and conversational scripts.
- AI Models: Swiss LLM for language generation and RAG for enhancing conversational accuracy.
- Infrastructure: Cloud-based computing resources for model training and deployment.
- Tools: Programming languages (Python), frameworks (TensorFlow, PyTorch), and design tools (Figma, Adobe XD).
Team
I am knowledgeable in Tibetan language structure and existing learning apps, and am looking to build a team that includes:
- People experienced in handling and preprocessing language datasets.
- People interested in training and integrating large language models for education.
- A designer who could help us prototype an intuitive and engaging user interface.
- Someone to help manage the team and ensure smooth collaboration.
Outputs and Outcomes
This project will promote open science by making the chatbot interface and underlying datasets publicly available. It will also foster responsible AI practices by ensuring fairness and inclusivity in language learning. The outcomes will catalyze a larger project focused on enhancing language learning experiences through AI, aligning with the goals of the Swiss AI Initiative.
Geographic Relevance
The Tibetan community in Switzerland began forming in the early 1960s and is now the largest Tibetan diaspora group in Europe [Wikipedia]. The project aligns with the goals of the Swiss AI Initiative by leveraging Swiss AI models and promoting language learning. It has strategic importance for Swiss society by providing a tool for cultural and linguistic preservation. The impact extends to Europe and the world, offering an approach to language learning that can be adapted for other languages and cultures.
Ethics and Regulatory Compliance
Ethical considerations include ensuring the chatbot provides accurate and respectful language learning experiences. Compliance with legal and regulatory guidelines will be maintained by adhering to data privacy laws and ensuring the chatbot does not perpetuate biases or misinformation.
🅰️ℹ️ Generated with the help of MISTRAL24B

🖼️ Part of a manuscript copy of a Vijaya in Tibetan - Public Domain
Screencast - Demo