Swiss Localised LLM
It is easy to spot an AI-generated text: just look for the Eszett (ß). Germans love it; the Swiss won't have it. This is just one example of how Swiss localisation in LLMs is not a given. Word frequencies also favour High German, because most LLMs are trained on a lot of German text. Is this actually an issue? How can we untrain, retrain, or train LLMs to actually "speak" Swiss German?
Design tools, methods, or datasets that enable AI to accurately “speak” Swiss German, both in text and potentially in voice, without losing general German comprehension. Possible approaches include:
- Detect and quantify High German bias in AI-generated text (e.g., spelling, word frequency, syntax patterns).
- Develop fine-tuning or retraining methods to adapt LLM outputs to Swiss German orthography and vocabulary.
- Build Swiss German–specific datasets (text, transcripts, or parallel corpora) while respecting privacy and copyright.
- Create evaluation metrics for measuring Swiss German accuracy and cultural authenticity in AI.
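As a starting point for the first approach, the following is a minimal sketch of how High German bias might be quantified in a piece of generated text. It counts tokens containing ß and compares how often a Germany-standard word appears versus its Swiss counterpart. The function name `high_german_bias` and the tiny `VARIANT_PAIRS` list are assumptions made for illustration; a real project would need a curated lexicon and would have to handle inflected forms.

```python
import re
from collections import Counter

# Illustrative, hand-picked pairs: Germany-standard form -> Swiss form.
# ASSUMPTION: this tiny list is only a placeholder for a curated lexicon.
VARIANT_PAIRS = {
    "fahrrad": "velo",
    "parken": "parkieren",
    "grillen": "grillieren",
    "gehsteig": "trottoir",
    "fahrkarte": "billett",
}

# Matches runs of Unicode letters (including umlauts and ß).
TOKEN_RE = re.compile(r"[^\W\d_]+")


def high_german_bias(text: str) -> dict:
    """Return simple counts that hint at High German bias in `text`."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    counts = Counter(tokens)

    # Spelling signal: any token containing an Eszett.
    eszett_tokens = sum(1 for t in tokens if "ß" in t)

    # Vocabulary signal: exact matches only, no lemmatisation.
    german_hits = sum(counts[g] for g in VARIANT_PAIRS)
    swiss_hits = sum(counts[s] for s in VARIANT_PAIRS.values())
    total_variant = german_hits + swiss_hits

    return {
        "tokens": len(tokens),
        "eszett_tokens": eszett_tokens,
        "eszett_rate": eszett_tokens / len(tokens) if tokens else 0.0,
        "german_variant_hits": german_hits,
        "swiss_variant_hits": swiss_hits,
        # None when no listed variant occurs, so "no evidence" stays distinct from 0.
        "german_variant_share": (
            german_hits / total_variant if total_variant else None
        ),
    }


if __name__ == "__main__":
    sample = "Ich fahre mit dem Fahrrad auf der Straße. Das Velo ist neu."
    print(high_german_bias(sample))
```

Running the same function over outputs from a model before and after fine-tuning would give a rough, comparable number for the spelling and vocabulary signals. Syntax patterns are not covered by this sketch.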
Deliverables
- Working prototype (e.g., web tool, LLM fine-tuning script, Swiss German spell-checker).
- An evaluation of trade-offs (e.g., risk of overfitting to dialect vs. preserving general German fluency).
- Documentation explaining your approach and datasets.
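To make the spell-checker deliverable more concrete, here is a minimal sketch that covers only the one rule named in the introduction: Swiss spelling writes "ss" where German spelling writes "ß". The names `check_text` and `fix_text` are hypothetical, and vocabulary or dialect spelling is deliberately out of scope.

```python
import re
from typing import List, Tuple

# A whole word (run of Unicode letters) that contains at least one Eszett.
ESZETT_WORD_RE = re.compile(r"[^\W\d_]*[ßẞ][^\W\d_]*")


def to_swiss_spelling(text: str) -> str:
    """Replace ß with ss (and capital ẞ with SS)."""
    return text.replace("ß", "ss").replace("ẞ", "SS")


def check_text(text: str) -> List[Tuple[int, str, str]]:
    """Return (offset, original word, suggested word) for each flagged word."""
    return [
        (m.start(), m.group(), to_swiss_spelling(m.group()))
        for m in ESZETT_WORD_RE.finditer(text)
    ]


def fix_text(text: str) -> str:
    """Apply the ß -> ss rule to the whole text."""
    return to_swiss_spelling(text)


if __name__ == "__main__":
    sample = "Die Straße ist groß."
    for offset, word, suggestion in check_text(sample):
        print(f"{offset}: {word} -> {suggestion}")
    print(fix_text(sample))
```

A rule this simple is best treated as a baseline: it can serve as a post-processing step on model output and as a sanity check when evaluating whether a fine-tuned model has learned the convention on its own.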
See also:
Previous
Hackathon Bern