Abstract. The steady growth of unstructured text in aircraft maintenance systems creates demand for automated analysis methods. Traditional defect categorization by standard codes is often too coarse to reveal the root causes of failures, because routine operations and critical malfunctions are frequently grouped under a single system category. The subject of this study is the semantic structure of textual fault descriptions for the exterior lighting system (ATA 33-40) across a fleet of aircraft of a single model. The objective is to develop and validate a method that automatically identifies latent operational patterns and failure modes without labeled data. The method is based on probabilistic topic modeling with the Latent Dirichlet Allocation (LDA) algorithm. To improve model quality, a specialized text preprocessing procedure was implemented, including expansion of industry-specific abbreviations and removal of contextual noise. The optimal model configuration was selected through quantitative analysis of the topic coherence metric (Cv) and an assessment of the semantic stability of the topics. Experimental results show that a six-topic model provides the highest interpretability. Analysis of the resulting clusters made it possible to identify the zones where design-related defects occur and to classify failures by manifestation type. Latent subgroups corresponding to electrical circuit failures and to mechanical damage of structural components were identified automatically. The proposed approach turns unstructured maintenance personnel records into detailed diagnostic information, creating opportunities to improve maintenance programs and to move toward predictive reliability management of specific aircraft subsystems.
Keywords: aircraft, maintenance, exterior lighting, textual descriptions, natural language processing, topic modeling, Latent Dirichlet Allocation.
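As an illustration of the preprocessing step named in the abstract (abbreviation expansion and noise removal), the following is a minimal Python sketch, not the procedure used in the study. The abbreviation map, the noise-term list, the function name `preprocess`, and the sample record are all hypothetical, since the abstract does not give the actual dictionaries.

```python
import re

# Hypothetical abbreviation map; the study's actual dictionary is not
# given in the abstract.
ABBREVIATIONS = {
    "lh": "left hand",
    "rh": "right hand",
    "lt": "light",
    "ldg": "landing",
    "nav": "navigation",
    "inop": "inoperative",
    "repl": "replaced",
}

# Hypothetical "contextual noise": procedural boilerplate that carries
# no information about the failure itself.
NOISE_TERMS = {"iaw", "amm", "ref", "task", "per", "the", "a", "and"}

# Keep alphabetic tokens only, so task numbers and punctuation are dropped.
TOKEN_RE = re.compile(r"[a-z]+")


def preprocess(record: str) -> list[str]:
    """Lowercase a fault record, drop noise terms, expand abbreviations."""
    tokens = TOKEN_RE.findall(record.lower())
    cleaned = []
    for token in tokens:
        if token in NOISE_TERMS:
            continue
        # An expansion may be more than one word, so split it into tokens.
        cleaned.extend(ABBREVIATIONS.get(token, token).split())
    return cleaned


if __name__ == "__main__":
    # Invented sample record, used only to exercise the function.
    print(preprocess("LH LDG LT INOP. REPL IAW AMM 33-42-01"))
```

Token lists produced this way could then be passed to any LDA implementation as the bag-of-words input. Expanding abbreviations before modeling is intended to make variants such as "LT" and "light" count as the same term.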