Ling 681: Statistical Methods in Natural Language Processing (Fall 2006)

by Rob Malouf last modified 2006-11-14 10:38
This course offers an introduction to statistical methods in computational linguistics. Through a combination of lectures, demonstrations, and hands-on exercises, it will give students the skills necessary for constructing statistical natural language processing applications and for evaluating their results. Topics to be covered include: basic probability and information theory, statistics for corpus analysis and hypothesis testing, Markov chains and sequence models, probabilistic context-free grammars, stochastic attribute-value grammars, and machine learning algorithms.
Available resources
Course type: Lecture
Instructor: Rob Malouf
Time: MWF 13:00–13:50
Location: BA 412

Requirements

The final grade will be based on homework assignments (20%), a take-home midterm exam (30%), and a final project (50%).

Throughout the term, there will be occasional homework assignments to practice the techniques learned in class. Working in groups is encouraged, but please include the names of all collaborators on the assignment.

The final project for this course will be a group project to design, implement, document, and evaluate an NLP application based on the statistical methods covered in the course.

Readings

The required textbooks for this course are:

Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press. http://nlp.stanford.edu/fsnlp/
and
Ian H. Witten and Eibe Frank. 2005. Data Mining: Practical Machine Learning Tools and Techniques. Second edition. Elsevier. http://www.cs.waikato.ac.nz/~ml/weka/book.html

They are for sale in the campus bookstore and from online retailers such as Amazon. Updates and corrections to the first book can be downloaded from the authors' websites.

Additional readings will be made available in class or via the "Resources" section of the course web page.

Lab

For homework assignments and final projects, we will be using the computational linguistics lab, part of the Social Sciences Research Lab in the basement of the Professional Services and Fine Arts building. Information about how to use the lab will be made available before the first assignment.

Schedule

  • Week 1–3 Introduction
    Background · Mathematical background · Probability · Information Theory
  • Week 4–6 Statistics
    Descriptive statistics · Hypothesis testing · Corpus statistics
  • Week 7–8 Context-free grammars
    Probabilistic context-free grammars · Inside-Outside algorithm · Treebank grammars · Dependency-based models
  • Week 9–10 Attribute-value grammars
    Unification · Maximum entropy · Parameter estimation · Parsing
  • Week 11–12 Machine learning
    Word sense disambiguation · Machine learning algorithms · Evaluation
  • Week 13–14 Text classification
    Clustering · Classification · Advanced algorithms · Rainbow · Weka

Prerequisites

Ling 571, 581 or permission of instructor.
