Theory, Model, Method

The collection of authentic primary data: Corpus Linguistics

The systematic collection of actual linguistic data from texts and from conversations in everyday or institutional contexts aims to build a highly reliable empirical basis of authentic data, that is, data that was actually used (and thus actually perceived). The result is either a set of data samples or a complete linguistic corpus.

Statements about the linguistic properties of forms, or about the relations between them, are made by examining the whole corpus or sample. That is, a hypothesis is supported by the forms that actually occur in the corpus, not by a linguist's judgment of those forms. Reliance on intuition is replaced by exclusive dependence on authentic data. One problem may be that merely possible (non-authentic) forms and ill-formed (ungrammatical) forms now drop out of the linguist's view: if authenticity is the only criterion of validity for constructing hypotheses, such data count as irrelevant, or are even ruled out, in theory construction. The approach also presupposes that purely inductive methods are possible, that is, that one can proceed without any prior knowledge.

Therefore, recent approaches combine the construction of a corpus with other methods, such as elicited intuitions and observations of any kind.

The most promising trend in corpus linguistics focuses on using electronic media to construct corpora and analyse language. As a result, the term corpus linguistics is often narrowed down to the use of electronic corpora.
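As a rough illustration of what analysing an electronic corpus can involve, the following Python sketch builds a tiny in-memory corpus and produces a keyword-in-context (KWIC) concordance and a frequency count for a search form. The three sample sentences and the function names `tokenize` and `kwic` are hypothetical, chosen only for illustration; real electronic corpora are far larger and are usually annotated.

```python
import re
from collections import Counter

# Hypothetical toy corpus: in practice these would be collected texts or transcripts.
CORPUS = [
    "I have been waiting for the bus since noon.",
    "She has been working here for years.",
    "They been gone a long time.",
]

def tokenize(text):
    """Lower-case the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(corpus, keyword, width=3):
    """Return (left context, keyword, right context) for every occurrence of keyword."""
    hits = []
    for sentence in corpus:
        tokens = tokenize(sentence)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, token, right))
    return hits

# Frequency of every token across the whole corpus.
freq = Counter(tok for s in CORPUS for tok in tokenize(s))

# Print an aligned concordance for the form "been", then its total frequency.
for left, kw, right in kwic(CORPUS, "been"):
    print(f"{left:>20} [{kw}] {right}")
print(freq["been"])
```

Note that a form which is possible but simply not attested in the corpus returns no hits at all, so in a purely corpus-based analysis it leaves no trace.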

