Basic Text Analysis (Grundläggande textanalys)
Credits: 7.5 higher education credits (hp)
Syllabus: 5LN443
Teachers: Marie Dubremetz, Eva Pettersson
News
- Marie: Deadline for assignment 1: Thursday 23 April. (2015-01-15)
- Marie: Choose which article you want to present before 23 April (we will discuss it at the lab). (2015-04-13)
- Marie: Deadline for assignment 2: Thursday 7 May. (2015-04-13)
- Marie: Update: FAQ for Lab 1 and slides for Morphological analysis + HMM (2015-04-22)
- Marie: Update: Errata on Lab 2 (2015-04-29)
- Eva: Update: Lab 3 (2015-04-30)
- Marie: Lab 1 and Lab 2: send your corrections by 10 June; more information is in the Management slides. (2015-05-19)
Schedule
Session | Date | Time | Room | Content | Literature
---|---|---|---|---|---
F1 | 1/4 | 10-12 | Turing | Introduction (MD) |
F2 | 1/4 | 13-15 | Turing | Text segmentation (MD) | Mikheev
F3 | 7/4 | 10-12 | 7-0015 | Morphological analysis (MD) | J&M 3
L4 | 9/4 | 10-12 | Chomsky | Lab 1 |
F5 | 14/4 | 10-12 | Turing | N-gram models (MD) | J&M 4
F6 | 16/4 | 10-12 | Turing | Part-of-speech tagging (MD) | J&M 5
F7 | 21/4 | 10-12 | Turing | Markov models (MD) | J&M 6
L8 | 23/4 | 10-12 | Chomsky | Lab 2 |
F9 | 5/5 | 10-12 | Turing | Language checking 1 (EP) | DB&M
F10 | 7/5 | 10-12 | Turing | Language checking 2 (EP) | Knutsson, Birn
L11 | 7/5 | 14-16 | 9-1070 | Lab 3 |
F12 | 19/5 | 10-12 | Turing | Text analysis with XML + Management (MD) | Myer
S13 | 21/5 | 10-12 | Turing | CANCELLED |
S15 | 28/5 | 10-12 | Turing | Presentations |
S16 | 28/5 | 13-15 | Turing | Presentations |
Note: Chomsky = 9-2043, Turing = 9-2042
Content
The course covers methods for basic text analysis up to the word level, including tokenization, sentence segmentation, morphological analysis, and part-of-speech tagging. The course also covers language checking, with an emphasis on spell checking.
Examination
The course is examined through assignments with both practical and theoretical tasks, plus an oral and a written article summary. A grade of Pass (G) requires a pass on all assignments. A grade of Pass with Distinction (VG) requires Pass with Distinction on at least two assignments and on the written summary.
Assignments
Article summary
The summary task consists of summarizing a scientific article orally in 5-10 minutes and in writing in 1-2 pages. The article is chosen from the suggestions listed under Literature below. The written summary is to be sent as a PDF to eva.pettersson@lingfil.uu.se no later than 5 June.
Schedule for oral presentations:
- Thursday 21/5 10-12: To be announced
- Thursday 28/5 10-12: To be announced
- Thursday 28/5 13-15: To be announced
Questions to consider in both the oral and the written summary:
- Which language technology problem does the article address?
- Which method is used to tackle this problem?
- How does the method relate to earlier work in the area?
- How do the authors show that the method works?
- What did you learn from reading the article?
- Would you recommend others to read the article or use the method?
Literature
Common literature:
- Birn = Juhani Birn. 2000. Detecting Grammar Errors with Lingsoft's Swedish Grammar Checker. Proceedings of the Twelfth Nordic Conference in Computational Linguistics (NoDaLiDa), 28-40.
- DB&M = Markus Dickinson, Chris Brew and Detmar Meurers. 2013. Language and Computers. Wiley-Blackwell. Chapter 2. [website]
- J&M = Daniel Jurafsky and James H. Martin. 2009. Speech and Language Processing. Second Edition. Pearson Prentice-Hall. [website]
- Knutsson = Ola Knutsson. 2001. Automatisk språkgranskning av svensk text. Licentiate thesis, KTH. Chapter 2: Utgångspunkter och angreppssätt för automatisk språkgranskning. Chapter 3: Granskas regelspråk.
- Mikheev = Andrei Mikheev. 2003. Text Segmentation. The Oxford Handbook of Computational Linguistics, 201-218.
- Myer = Tom Myer. 2005. A Really, Really, Really Good Introduction to XML. Chapter 1. Excerpt from No Nonsense XML Web Development with PHP.
- [Erik F] Antti Arppe. 1999. Developing a Grammar Checker for Swedish. In NODALIDA '99: Proceedings from the 12th Nordic Conference on Computational Linguistics.
- [Shawnm] Kenneth R. Beesley. 1996. Arabic Finite-State Morphological Analysis and Generation. In Proceedings of the 16th International Conference on Computational Linguistics, 89-94.
- [Madeleine] Thorsten Brants. 2000. TnT - A Statistical Part-of-Speech Tagger. In Proceedings of the Sixth Conference on Applied Natural Language Processing, 224-231.
- [Erik P] Eric Brill. 1992. A Simple Rule-Based Part-of-Speech Tagger. In Proceedings of the Third Conference on Applied Computational Linguistics, 112-116.
- [] Stanley F. Chen and Joshua Goodman. 1996. An Empirical Study of Smoothing Techniques for Language Modeling. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, 310-318.
- [Ulrika] Rickard Domeij, Ola Knutsson, Johan Carlberger and Viggo Kann. 2000. Granska - an efficient hybrid system for Swedish grammar checking. In Proceedings of the 12th Nordic Conference on Computational Linguistics.
- [Sofia] Gregory Grefenstette and Pasi Tapanainen. 1994. What is a word, What is a sentence? Problems of Tokenization. In Proceedings of the 3rd Conference on Computational Lexicography and Text Research (COMPLEX '94).
- [Filip] Nizar Habash and Owen Rambow. 2005. Arabic Tokenization, Part-of-Speech Tagging and Morphological Disambiguation in One Fell Swoop. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, 573-580.
- [Mahin] Péter Halácsy, András Kornai and Csaba Oravecz. 2007. HunPos - An Open Source Trigram Tagger. In Proceedings of the ACL 2007 Demo and Poster Session, 209-212.
- [] Fred Karlsson. 1990. Constraint Grammar as a Framework for Parsing Running Text. In Proceedings of the 13th International Conference on Computational Linguistics, 168-173.
- [Jennie] Hwee Tou Ng and Jin Kiat Low. 2004. Chinese Part-of-Speech Tagging: One-at-a-Time or All-at-Once? Word-Based or Character-Based? In Proceedings of EMNLP 2004, 277-284.
- [] Patrizia Paggio. 2000. Spelling and Grammar Correction for Danish in SCARRIE. In Proceedings of the 6th Applied Natural Language Processing Conference, 255-261.
- [] Hasim Sak, Tunga Güngör and Murat Saraclar. 2009. A Stochastic Finite-State Morphological Parser for Turkish. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, 273-276.
- [] Helmut Schmid. 1994. Probabilistic Part-of-Speech Tagging Using Decision Trees. In Proceedings of the International Conference on New Methods in Language Processing.
- [Lisa] Mark Stevenson and Robert Gaizauskas. 2000. Experiments on Sentence Boundary Detection. In Proceedings of the 6th Applied Natural Language Processing Conference, 84-89.
- [Max] Heli Uibo. 2005. Finite-State Morphology of Estonian: Two-Levelness Extended. In Proceedings of Recent Advances in Natural Language Processing, 580-584.
- [Asa] Yue Zhang and Stephen Clark. 2008. Joint Word Segmentation and POS Tagging Using a Single Perceptron. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, 888-896.