Evaluating Open-Source Large Language Models for Technical Telecom Question Answering

Mon, 08 Dec 2025 · Arina Caraus, Alessio Buscemi, Sumit Kumar, Ion Turcanu
Abstract
Large Language Models (LLMs) have shown remarkable capabilities across various fields. However, their performance in technical domains such as telecommunications remains underexplored. This paper evaluates two open-source LLMs, Gemma 3 27B and DeepSeek R1 32B, on factual and reasoning-based questions derived from advanced wireless communications material. We construct a benchmark of 105 question–answer pairs and assess performance using lexical metrics, semantic similarity, and LLM-as-a-judge scoring. We also analyze consistency, judgment reliability, and hallucination through source attribution and score variance. Results show that Gemma excels in semantic fidelity and LLM-rated correctness, while DeepSeek demonstrates slightly higher lexical consistency. Additional findings highlight current limitations in telecom applications and the need for domain-adapted models to support trustworthy Artificial Intelligence (AI) assistants in engineering.
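As a rough illustration of two of the measurements the abstract mentions, the sketch below computes a lexical overlap score between a model answer and a reference answer, and the variance of scores across repeated runs of the same question. The abstract does not name the specific lexical metrics or the variance procedure used in the paper, so token-overlap F1 and population variance are assumptions chosen for illustration, and the function names `token_f1` and `score_variance` are hypothetical.

```python
from collections import Counter
from statistics import pvariance


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model answer and a reference answer.

    Assumption: whitespace tokenization and lowercasing stand in for
    whatever preprocessing the paper actually applies.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def score_variance(scores: list[float]) -> float:
    """Population variance of scores from repeated runs on one question.

    Assumption: a lower variance is read as higher consistency.
    """
    return pvariance(scores)
```

Under these assumptions, an answer identical to its reference scores 1.0, an answer sharing no tokens scores 0.0, and a model that receives the same score on every repeated run has a variance of 0.0.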
Publication
IEEE Global Communications Conference Workshops (GLOBECOM Workshops 2025)