Estimation of English and non-English Language Use on the WWW

06/23/2000
by Gregory Grefenstette, et al.

The World Wide Web has grown so big, in such an anarchic fashion, that it is difficult to describe. One of the evident intrinsic characteristics of the World Wide Web is its multilinguality. Here, we present a technique for estimating the size of a language-specific corpus given the frequency of commonly occurring words in the corpus. We apply this technique to estimating the number of words available through Web browsers for given languages. Comparing data from 1996 to data from 1999 and 2000, we calculate the growth of a number of European languages on the Web. As expected, non-English languages are growing at a faster pace than English, though the position of English is still dominant.
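The abstract does not spell out the estimator, so the following is only a minimal sketch of one plausible reading: if a common word accounts for a known fraction of running text in a reference corpus, then the number of times it is observed in an unknown corpus, divided by that fraction, gives a rough size for the unknown corpus. The function names, the use of a median to combine per-word estimates, and every number in the example are assumptions made for illustration and are not taken from the paper.

```python
from statistics import median


def estimate_corpus_size(observed_counts, reference_freqs):
    """Estimate the total number of words in a corpus.

    observed_counts: word -> number of occurrences seen in the unknown corpus
    reference_freqs: word -> relative frequency (0..1) in a reference corpus

    Each word yields one estimate (count / relative frequency); the median
    is used here as an assumed way to damp outliers between words.
    """
    estimates = []
    for word, count in observed_counts.items():
        freq = reference_freqs.get(word)
        if freq is None or freq <= 0:
            continue  # skip words with no usable reference frequency
        estimates.append(count / freq)
    if not estimates:
        raise ValueError("no word has a usable reference frequency")
    return median(estimates)


def growth_factor(size_then, size_now):
    """Ratio of a later size estimate to an earlier one."""
    if size_then <= 0:
        raise ValueError("earlier size must be positive")
    return size_now / size_then


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the arithmetic.
    reference_freqs = {"the": 0.05, "and": 0.025, "of": 0.03}
    observed_counts = {"the": 5_000_000, "and": 2_600_000, "of": 2_700_000}
    size = estimate_corpus_size(observed_counts, reference_freqs)
    print(f"estimated corpus size: {size:.0f} words")
```

Applying the same estimate to counts gathered at two different dates and passing the results to `growth_factor` would give a per-language growth ratio of the kind the abstract compares across years.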


