Evaluating Open-Source Large Language Models for Technical Telecom Question Answering

Mon, 08 Dec 2025

Arina Caraus

Alessio Buscemi

Sumit Kumar

Ion Turcanu

Abstract

Large Language Models (LLMs) have shown remarkable capabilities across various fields. However, their performance in technical domains such as telecommunications remains underexplored. This paper evaluates two open-source LLMs, Gemma 3 27B and DeepSeek R1 32B, on factual and reasoning-based questions derived from advanced wireless communications material. We construct a benchmark of 105 question–answer pairs and assess performance using lexical metrics, semantic similarity, and LLM-as-a-judge scoring. We also analyze consistency, judgment reliability, and hallucination through source attribution and score variance. Results show that Gemma excels in semantic fidelity and LLM-rated correctness, while DeepSeek demonstrates slightly higher lexical consistency. Additional findings highlight current limitations in telecom applications and the need for domain-adapted models to support trustworthy Artificial Intelligence (AI) assistants in engineering.
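To make the lexical-metric and score-variance ideas concrete, here is a minimal sketch in Python. The abstract does not name the lexical metrics the paper uses, so token-overlap F1 is an assumption chosen for illustration. The function names `token_f1` and `consistency` and the sample strings are hypothetical, not taken from the paper.

```python
from collections import Counter
from statistics import mean, pvariance


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model answer and a reference answer.

    Assumption: whitespace tokenization and lowercasing; the paper's
    actual lexical metrics may differ.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def consistency(answers: list[str], reference: str) -> tuple[float, float]:
    """Mean and population variance of token F1 over repeated answers
    to the same question; lower variance means more consistent output."""
    scores = [token_f1(answer, reference) for answer in answers]
    return mean(scores), pvariance(scores)


if __name__ == "__main__":
    # Hypothetical reference and repeated model answers for one question.
    reference = "OFDM divides the channel into orthogonal subcarriers"
    answers = [
        "OFDM divides the channel into orthogonal subcarriers",
        "OFDM splits the band into many orthogonal subcarriers",
    ]
    avg, var = consistency(answers, reference)
    print(f"mean F1 = {avg:.3f}, variance = {var:.4f}")
```

A lexical score of this kind rewards shared wording only, which is one reason an evaluation like the one described also includes semantic similarity and LLM-as-a-judge scoring.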

Type

Conference paper

Publication

IEEE Global Communications Conference Workshops (GLOBECOM Workshops 2025)