Theory, Model, Method

Corpus Linguistics

  • Definition
    Corpus Linguistics is the computer-aided description of language on the basis of text corpora. It was formerly known as linguistic data processing, because the corpora were compiled with the help of computers.

  • Features
    • Main objects of study: concordances, thesaurus research, automatic content analysis and speech classification, collocations, etc.
    • An exclusively quantitative approach; essentially a method whose scientific-theoretical premises remain unclear: purely data-oriented (more data = better data!), corpus-focused, based on observation.
    • Data are no longer isolated from their context.
    • Hypotheses are possible only if they are based on data. "Impossible" data are missing: Corpus Linguistics does away with, for example, the ungrammatical sentences used in Generative Linguistics.
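
Two of the objects of study named above, concordances and collocations, can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names (tokenize, concordance, collocates), the crude tokenizer, the window sizes, and the three-sentence sample text are all invented here and do not come from any particular corpus tool.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (deliberately crude)."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, keyword, window=2):
    """Return keyword-in-context lines: up to `window` tokens on each side of a hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def collocates(tokens, keyword, window=1):
    """Count the words that occur within `window` tokens of the keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


# Hypothetical mini-corpus, invented for this example.
sample = (
    "The strong tea was served hot. "
    "She prefers strong tea to weak coffee. "
    "Strong winds delayed the tea shipment."
)
tokens = tokenize(sample)

for line in concordance(tokens, "tea"):
    print(line)

print(collocates(tokens, "tea").most_common(1))
```

Raw co-occurrence counts like these are only a first step. On a real corpus, frequent function words would dominate the counts, so an association measure would normally be applied on top of them; that step is left out of this sketch.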

Corpus Linguistics practices a concrete descriptivism that is compatible with different theoretical directions (and thus differs from the approach adopted by American Structuralism).

It is a means of describing languages rather than an autonomous approach: it has no descriptive language of its own, such as syntactic or semantic categories or principles.

Corpus Linguistics is close to Computational Linguistics: the computer serves as a useful tool for data collection and research.
(Difference: Computational Linguistics is the scientific discipline concerned with the computer processing of language: parsing, data representation, artificial intelligence, machine translation, etc.)

  • Text Corpora
    • Survey of English Usage
    • Lancaster-Oslo-Bergen Corpus
    • Brown Corpus
    • British National Corpus

  • Method