Natural Language Processing for Historical Texts (Synthesis Lectures on Human Language Technologies)

By Michael Piotrowski

More and more historical texts are becoming available in digital form. Digitization of paper documents is motivated by the aim of preserving cultural heritage and making it more accessible, both to laypeople and scholars. As digital images cannot be searched for text, digitization projects increasingly strive to create digital text, which can be searched and otherwise automatically processed, in addition to facsimiles. Indeed, the emerging field of digital humanities relies heavily on the availability of digital text for its studies.

Together with the increasing availability of historical texts in digital form, there is a growing interest in applying natural language processing (NLP) methods and tools to historical texts. However, the specific linguistic properties of historical texts (the lack of standardized orthography, in particular) pose special challenges for NLP.

This book aims to give an introduction to NLP for historical texts and an overview of the state of the art in this field. The book starts with an overview of methods for the acquisition of historical texts (scanning and OCR), discusses text encoding and annotation schemes, and presents examples of corpora of historical texts in a variety of languages. The book then discusses specific methods, such as creating part-of-speech taggers for historical languages or handling spelling variation. A final chapter analyzes the relationship between NLP and the digital humanities.

Certain recently emerging textual genres, such as SMS, social media, and chat messages, or newsgroup and forum postings, share a number of properties with historical texts, for example, nonstandard orthography and grammar, and profuse use of abbreviations. The methods and techniques required for the effective processing of historical texts are thus also of interest for research in other domains.


Best historical books

Death Echo

New York Times bestselling author Elizabeth Lowell cuts a new edge in suspense with a thrilling tale of passion and international intrigue. Emma Cross abandoned the blood, guilt, and tribal wars of CIA life for the elite security consulting firm St. Kilda's. Now she's tracking the yacht Blackbird, believed to be carrying a lethal cargo that could destroy a major American city.

Methodological and Historical Essays in the Natural and Social Sciences

Modern philosophy of science has turned out to be a Pandora's box. Once opened, the puzzling monsters appeared: not only was the neat structure of classical physics radically changed, but a variety of broader questions were let loose, bearing on the nature of scientific inquiry and of human knowledge in general.

A Companion to the Era of Andrew Jackson

A Companion to the Era of Andrew Jackson offers a wealth of new insights on the era of Andrew Jackson. This collection of essays by leading scholars and historians considers various aspects of the life, times, and legacy of the seventh president of the United States. It provides an overview of Andrew Jackson's life and legacy, grounded in the latest scholarship and including original research spread across a number of thematic areas. It features 30 essays contributed by leading scholars and historians. It synthesizes the most up-to-date scholarship on the political, economic, social, and cultural aspects of the Age of Andrew Jackson.

Extra info for Natural Language Processing for Historical Texts (Synthesis Lectures on Human Language Technologies)

Sample text

But digitization of historical texts is of interest to us not only because it provides NLP with the necessary “raw material”; there are two further aspects. First, as the conversion from one medium to another is not lossless and is bound to introduce errors, the digitization process and the quality of its results have a direct impact on subsequent natural language processing. Second, NLP resources and tools may also play a role during the acquisition and preparation of historical texts, in particular in OCR and OCR post-processing.

FineReader has a built-in model for modern (monotonic) Greek, which can be used in conjunction with the user-trained model; it was found that using the built-in model increases the recognition accuracy for unaccented characters, but decreases the accuracy for characters with diacritics. Thus, FineReader was evaluated both with and without the built-in model. OCR output was then post-processed with two scripts. [...] (e.g., when a Greek omicron is replaced by the nearly identical Latin letter o) or spaces followed by punctuation marks.
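The excerpt does not show the two post-processing scripts, so the following is only a minimal sketch of the kind of cleanup it describes, not the original code. It assumes a small, hypothetical lookalike table (`LATIN_TO_GREEK`) and two hypothetical helper names (`fix_mixed_script`, `postprocess`): Latin lookalikes are replaced only inside tokens that already contain Greek letters, and whitespace in front of punctuation marks is removed.

```python
import re

# Hypothetical lookalike table (an assumption, not from the book):
# Latin letters mapped to visually near-identical Greek letters.
LATIN_TO_GREEK = str.maketrans({
    "o": "\u03bf",  # Latin small o   -> Greek small omicron
    "O": "\u039f",  # Latin capital O -> Greek capital omicron
    "A": "\u0391",  # Latin capital A -> Greek capital alpha
    "E": "\u0395",  # Latin capital E -> Greek capital epsilon
    "v": "\u03bd",  # Latin small v   -> Greek small nu
})


def is_greek(ch):
    """True for characters in the Greek or Greek Extended Unicode blocks."""
    return "\u0370" <= ch <= "\u03ff" or "\u1f00" <= ch <= "\u1fff"


def fix_mixed_script(token):
    """Replace Latin lookalikes, but only in tokens that contain Greek letters."""
    if any(is_greek(ch) for ch in token):
        return token.translate(LATIN_TO_GREEK)
    return token


def postprocess(line):
    """Apply both cleanup steps to one line of OCR output."""
    # Split on whitespace but keep the whitespace runs, so spacing is preserved.
    parts = re.split(r"(\s+)", line)
    fixed = "".join(fix_mixed_script(part) for part in parts)
    # Drop whitespace that directly precedes a punctuation mark.
    return re.sub(r"\s+([.,;:!?])", r"\1", fixed)
```

Restricting the substitution to tokens that already contain Greek letters is a design choice of this sketch: it leaves genuinely Latin-script words (for example, quoted foreign names) untouched. A real script would likely need a fuller lookalike table and handling for Greek-specific punctuation.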

[...] 2009b, p. 198). [...], 2010, p. 64). [...] through from the reverse page. This illustrates that there exists no “one-size-fits-all” OCR solution for historical documents. Based on the Australian National Library’s experiences with the digitization of historical newspapers, Holley (2009a) discusses 13 approaches for improving OCR results. [...] (e.g., the use of grayscale vs. bilevel files for OCR or the deskewing of images, so that text lines are as horizontal as possible). Points 10 to 13 concern OCR and NLP technology; Holley considers using more than one OCR system and voting to pick the best results, using special dictionaries in the OCR process, manual correction of OCR-produced text, and using language modeling during or after OCR processing.
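The idea of running several OCR systems and voting on the result can be illustrated with a small sketch. This is not Holley's procedure, only a simplified word-level majority vote under a strong assumption: the outputs of all systems are already aligned token by token (same number of whitespace-separated tokens). Real OCR outputs usually differ in segmentation and need an alignment step first. The function name `vote` is hypothetical.

```python
from collections import Counter


def vote(outputs):
    """Pick, position by position, the token most OCR systems agree on.

    Assumes every output has the same number of whitespace-separated tokens.
    On a tie, the token from the earliest-listed system wins, because
    Counter.most_common keeps first-seen order among equal counts.
    """
    token_lists = [text.split() for text in outputs]
    if len({len(tokens) for tokens in token_lists}) != 1:
        raise ValueError("all OCR outputs must have the same number of tokens")

    result = []
    for candidates in zip(*token_lists):
        winner, _count = Counter(candidates).most_common(1)[0]
        result.append(winner)
    return " ".join(result)
```

For example, `vote(["the quick brovvn fox", "tbe quick brown fox", "the quick brown fax"])` recovers a clean line because each error occurs in only one of the three outputs. Listing the most trusted system first makes the tie-breaking rule favor it.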
