
Roland Schäfer & Felix Bildhauer, Web Corpus Construction (Synthesis Lectures on Human Language Technologies 22). Morgan & Claypool, 2013. Pp. xv + 129.

Published online by Cambridge University Press: 19 November 2014

Mats Wirén
Affiliation: Department of Linguistics, Stockholm University, SE 106 91 Stockholm, Sweden.

Type: Book Review
Copyright © Nordic Association of Linguistics 2014


References


Alpert, Jesse & Hajaj, Nissan. 2008. We knew the web was big... http://googleblog.blogspot.se/2008/07/we-knew-web-was-big.html
Baroni, Marco, Bernardini, Silvia, Ferraresi, Adriano & Zanchetta, Eros. 2009. The WaCky Wide Web: A collection of very large linguistically processed web-crawled corpora. Language Resources & Evaluation 43, 209–226.
Biemann, Chris, Bildhauer, Felix, Evert, Stefan, Goldhahn, Dirk, Quasthoff, Uwe, Schäfer, Roland, Simon, Johannes, Swiezinski, Leonard & Zesch, Torsten. 2013. Scalable construction of high-quality web corpora. Journal for Language Technology and Computational Linguistics 28 (2), 23–59.
Fletcher, William H. 2013. Corpus analysis of the World Wide Web. In Chapelle, Carol A. (ed.), The Encyclopedia of Applied Linguistics, vol. 3, 1339–1347. Oxford: Wiley-Blackwell.
Hundt, Marianne, Nesselhauf, Nadja & Biewer, Carolin (eds.). 2007. Corpus Linguistics and the Web. Amsterdam: Rodopi.
Kilgarriff, Adam. 2001. Comparing corpora. International Journal of Corpus Linguistics 6 (1), 97–133.
Kilgarriff, Adam. 2007. Googleology is bad science. Computational Linguistics 33 (1), 147–151.
Kilgarriff, Adam & Grefenstette, Gregory. 2003. Introduction to the special issue on the web as corpus. Computational Linguistics 29 (3), 333–347.
Loftsson, Hrafn & Östling, Robert. 2013. Tagging a morphologically complex language using an Averaged Perceptron Tagger: The case of Icelandic. 19th Nordic Conference of Computational Linguistics (NODALIDA), 105–119. Linköping: Linköping University Electronic Press.
Nivre, Joakim, Hall, Johan, Nilsson, Jens, Chanev, Atanas, Eryigit, Gülsen, Kübler, Sandra, Marinov, Svetoslav & Marsi, Erwin. 2007. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering 13, 95–135.
Östling, Robert. 2013. Stagger: An open-source part of speech tagger for Swedish. Northern European Journal of Language Technology (NEJLT) 3, 1–18.
Renouf, Antoinette, Kehoe, Andrew & Banerjee, Jayeeta. 2007. WebCorp: An integrated system for web text search. In Hundt et al. (eds.), 2007, 47–67.
Schäfer, Roland & Bildhauer, Felix. 2012. Building large corpora from the web using a New Efficient Tool Chain. The Eighth International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, 486–493.
Suchomel, Vít & Pomikálek, Jan. 2012. Efficient web crawling for large text corpora. The Seventh Web as Corpus Workshop (WAC), Lyon, France, 39–43.