This note is just a “bookmark” for myself, especially on information extraction. I took it from the book “Information Extraction: Algorithms and Prospects in a Retrieval Context” by Marie-Francine Moens, pp. 70–71.
A classification scheme describes the semantic distinctions that we want to assign to the information units and to the semantic relations between these units. The scheme can take the form of a flat list, for instance when we define a list of named entity classes to be identified in a corpus. Alternatively, the scheme can have its own internal structure. It might represent the labels that can be assigned to entities or processes (the entity classes), the attribute labels of the entity classes, the subclasses, and the semantic relations that might hold between instances of the classes, yielding a real semantic network. In addition, the scheme should preferably also integrate the constraints on the allowable combinations and dependencies of the semantic labels.
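To make the contrast between a flat list and a structured scheme concrete, here is a minimal Python sketch. The labels (PERSON, COMPANY, ...), the WORKS_FOR relation, and the `Scheme`, `EntityClass` and `RelationType` types are hypothetical illustrations of my own, not taken from Moens's book. The structured scheme records subclasses, attribute labels, and relation types whose argument classes act as constraints on which label combinations are allowed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Flat scheme: just a list of named entity classes (hypothetical label set).
FLAT_SCHEME = ["PERSON", "ORGANIZATION", "LOCATION", "DATE"]


@dataclass
class EntityClass:
    name: str
    parent: Optional[str] = None                         # subclass link
    attributes: set[str] = field(default_factory=set)    # attribute labels


@dataclass
class RelationType:
    name: str
    domain_cls: str  # required class of the first argument
    range_cls: str   # required class of the second argument


@dataclass
class Scheme:
    classes: dict[str, EntityClass]
    relations: dict[str, RelationType]

    def is_a(self, cls: str, ancestor: str) -> bool:
        """True if cls equals ancestor or is (transitively) a subclass of it."""
        current: Optional[str] = cls
        while current is not None:
            if current == ancestor:
                return True
            current = self.classes[current].parent
        return False

    def allows(self, relation: str, arg1_cls: str, arg2_cls: str) -> bool:
        """Check one relation instance against the scheme's constraints."""
        rel = self.relations.get(relation)
        if rel is None:
            return False  # relation not defined in the scheme
        return self.is_a(arg1_cls, rel.domain_cls) and self.is_a(arg2_cls, rel.range_cls)


# Hypothetical structured scheme: a small semantic network with one relation.
SCHEME = Scheme(
    classes={
        "ENTITY": EntityClass("ENTITY"),
        "PERSON": EntityClass("PERSON", "ENTITY", {"name", "birth_date"}),
        "ORGANIZATION": EntityClass("ORGANIZATION", "ENTITY", {"name"}),
        "COMPANY": EntityClass("COMPANY", "ORGANIZATION", {"ticker"}),
    },
    relations={"WORKS_FOR": RelationType("WORKS_FOR", "PERSON", "ORGANIZATION")},
)

print(SCHEME.allows("WORKS_FOR", "PERSON", "COMPANY"))  # True
print(SCHEME.allows("WORKS_FOR", "COMPANY", "PERSON"))  # False
```

A PERSON may WORK_FOR a COMPANY because COMPANY is a subclass of ORGANIZATION, while the reversed combination is rejected. A flat list cannot express this kind of constraint at all.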
Semantic labels range from generic labels to domain-specific labels. One can define all kinds of semantic labels to assign to information found in a text, information that is useful in subsequent processing tasks such as information retrieval, text summarization, and data mining. Their definition often relies on existing taxonomies that are drafted based on linguistic or cognitive theories or on natural relationships that exist between entities. In the case of a domain-specific framework of semantic concepts and their relations, we often use the term ontology.
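One practical consequence of labels ranging from generic to domain-specific is that a taxonomy lets you map a fine-grained label up to whatever level a downstream task needs. The sketch below assumes a small hypothetical biomedical taxonomy (child label to parent label); the labels and the `ancestors`/`generalize` helpers are my own illustration, not from the book.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical taxonomy: child label -> parent label.
# Domain-specific labels sit at the leaves, generic labels near the root.
TAXONOMY = {
    "ENZYME": "PROTEIN",
    "PROTEIN": "GENE_PRODUCT",
    "GENE_PRODUCT": "PHYSICAL_ENTITY",
    "DRUG": "CHEMICAL",
    "CHEMICAL": "PHYSICAL_ENTITY",
}


def ancestors(label: str) -> list[str]:
    """Walk up the taxonomy from label to the root (assumes no cycles)."""
    chain = []
    while label in TAXONOMY:
        label = TAXONOMY[label]
        chain.append(label)
    return chain


def generalize(label: str, allowed: set[str]) -> Optional[str]:
    """Return the most specific label in `allowed` that subsumes `label`."""
    for candidate in [label] + ancestors(label):
        if candidate in allowed:
            return candidate
    return None  # no allowed label covers this one


print(generalize("ENZYME", {"PROTEIN", "CHEMICAL"}))  # PROTEIN
print(generalize("DRUG", {"PHYSICAL_ENTITY"}))        # PHYSICAL_ENTITY
```

A retrieval system that only indexes generic classes could, for example, use `generalize` to collapse extracted domain labels into its own coarser label set.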