Non-uniform Language Detection in Technical Writing
Abstract
Technical writing in professional settings, such as user-manual authoring, requires uniform language. Non-uniform language detection is a novel task that aims to ensure the consistency of technical writing by detecting sentences in a document that are intended to have the same meaning within a similar context but use different words or writing styles. This thesis proposes an approach that applies text similarity algorithms at the lexical, syntactic, semantic and pragmatic levels and integrates the resulting metrics with a machine learning classifier. We tested our method on smartphone user manuals and compared its performance against state-of-the-art methods in related areas. The experiments demonstrate that our approach is the most efficient solution to date.
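The abstract does not name the individual similarity metrics or the classifier, so the following is only a minimal sketch of the general idea: similarity scores for a sentence pair become features, and a trained classifier decides whether the pair is "same meaning, different wording." It assumes three lexical-level features (token Jaccard overlap, a character-level ratio from Python's `difflib.SequenceMatcher`, and a token-length ratio) and a hand-rolled logistic regression. The syntactic, semantic and pragmatic levels are omitted. The function names `features`, `train` and `is_non_uniform` are hypothetical, and the training pairs are invented toy examples, not data from the thesis.

```python
import math
import re
from difflib import SequenceMatcher


def tokenize(sentence):
    # Lowercase word tokens; hyphenated words such as "wi-fi" stay whole.
    return re.findall(r"[a-z0-9-]+", sentence.lower())


def features(a, b):
    """Hypothetical lexical-level similarity features for a sentence pair."""
    ta, tb = tokenize(a), tokenize(b)
    sa, sb = set(ta), set(tb)
    union = sa | sb
    jaccard = len(sa & sb) / len(union) if union else 1.0
    # Character-level similarity in [0, 1] from the standard library.
    char_sim = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    longest = max(len(ta), len(tb))
    length_ratio = min(len(ta), len(tb)) / longest if longest else 1.0
    return [jaccard, char_sim, length_ratio]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(pairs, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    X = [features(a, b) for a, b in pairs]
    w, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(X, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias)
            grad = p - y  # derivative of the log loss w.r.t. the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            bias -= lr * grad
    return w, bias


def is_non_uniform(a, b, model):
    """Flag a pair that looks like the same instruction worded differently."""
    if tokenize(a) == tokenize(b):
        return False  # identical wording is already uniform
    w, bias = model
    z = sum(wi * xi for wi, xi in zip(w, features(a, b))) + bias
    return sigmoid(z) >= 0.5


# Invented toy training pairs: 1 = same meaning, different wording; 0 = unrelated.
TRAIN = [
    (("Tap the Settings icon to open settings.", "Touch the Settings icon to open settings."), 1),
    (("Press and hold the power key.", "Press and hold the power button."), 1),
    (("Swipe down to open the notification panel.", "Drag down to open the notification panel."), 1),
    (("Tap the Settings icon to open settings.", "Charge the battery before first use."), 0),
    (("Press and hold the power key.", "Insert the SIM card into the tray."), 0),
    (("Swipe down to open the notification panel.", "Remove the back cover carefully."), 0),
]
model = train([p for p, _ in TRAIN], [y for _, y in TRAIN])

print(is_non_uniform("Tap the Wi-Fi icon to turn on Wi-Fi.",
                     "Touch the Wi-Fi icon to turn on Wi-Fi.", model))
```

A linear classifier is used here only because it is the simplest way to learn how much each similarity score should count; in a fuller version, each additional linguistic level would contribute further feature columns to the same vector.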