DTS305TC Natural Language Processing
Coursework 2 (Individual Assessment)
Due: 5:00 pm China Standard Time (UTC+8, Beijing) on Friday, 19 December 2025
Weighting: 60%
Maximum score: 100 marks (100% individual report)
Assessed learning outcomes:
C. Implement deep learning models and evaluate them based on performance metrics.
D. Develop skills of using NLP models and techniques in real-world applications.
Overview
Document classification is a core NLP task that automatically assigns written documents to a predefined set of classes or categories. It is essential for managing the large volumes of text generated every day in domains such as news, legal documents, medical records, and online content. Key aspects of document classification include text representation, feature extraction, model selection, deep learning approaches, and performance evaluation.
The task also faces challenges such as imbalanced datasets, the nuances of human language (including sarcasm and context), and domain-specific vocabulary and terminology.
Tasks
You are required to use the lecture slides and Internet resources to study document classification in detail, and to use the Python programming language to complete one document classification report.
1. Background Knowledge (10 Marks)
Write the following content as text in the report.
(1) Please provide 3 real-life application scenarios that require document classification methods. (6 Marks)
(2) Please analyze why document classification methods, rather than other natural language processing methods (information retrieval, document clustering), are the most suitable for these 3 application scenarios. (4 Marks)
2. Algorithm Design (20 Marks)
Write the following content as text in the report.
(1) Describe the basic process of two document classification systems. (5 Marks/system x 2 = 10 Marks)
(2) Provide pseudocode for the algorithms (one machine learning and one deep learning) used in the document classification systems described in 2(1). (5 Marks/algorithm x 2 = 10 Marks)
3. System Implementation (40 Marks)
Use Python to implement the system described in Section 2 with the following functions:
(1) Main function: control the startup and flow of the entire document classification system. (5 Marks)
(2) User input function: allow continuous user input of text for classification via the console. (5 Marks)
(3) Database input function: read a local text corpus from a document folder. (5 Marks)
(4) Text preprocessing function: preprocess the loaded documents and split them into 80% training samples and 20% validation samples. (5 Marks)
(5) Classification algorithm 1: train Model 1 on the training samples using classification algorithm 1. (5 Marks)
(6) Classification algorithm 2: train Model 2 on the training samples using classification algorithm 2. (5 Marks)
(7) Classification algorithm performance: output the performance metrics of Model 1 and Model 2 on the validation samples. (5 Marks)
(8) Output function: output the classification results of Model 1 and Model 2 for user input. (5 Marks)
4. Results Analysis (20 Marks)
Test your system and record the results; write the following content as text in the report.
(1) Test the developed document classification system using ten new text examples that you have labeled yourself. (5 Marks)
(2) Use recall to analyze the results of the two classification algorithms (Algorithm 1 and Algorithm 2). (5 Marks)
(3) Use precision to analyze the results of the two classification algorithms (Algorithm 1 and Algorithm 2). (5 Marks)
(4) Use the F1 score to analyze the results of the two classification algorithms (Algorithm 1 and Algorithm 2). (5 Marks)
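For reference, the metrics named in 4(2) to 4(4) have the standard per-class definitions below. The notation is an assumption of this note, not part of the brief: TP_c, FP_c and FN_c are the true positives, false positives and false negatives for class c, and C is the set of classes. With more than two classes, per-class values are commonly combined by macro-averaging (shown for F1). Other averaging schemes, such as micro- or weighted averaging, also exist, so state whichever one you report.

```latex
\mathrm{Precision}_c = \frac{TP_c}{TP_c + FP_c}, \qquad
\mathrm{Recall}_c = \frac{TP_c}{TP_c + FN_c}, \qquad
F1_c = \frac{2\,\mathrm{Precision}_c\,\mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c}

\text{Macro-}F1 = \frac{1}{|C|}\sum_{c \in C} F1_c
```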
5. Conclusion (10 Marks)
Write the following content as text in the report.
(1) Describe how your designed and implemented document classification system addresses the three application scenarios in Section 1. (5 Marks)
(2) Report quality, including report format, code quality, and references. (5 Marks)
Submission
You must submit the following files:
• A PDF file named Student_ID.pdf containing your report, with a cover page showing your student ID and name.
• A ZIP file named Student_ID.zip containing your program code and output files (e.g., the dataset, DCS.py, precision.csv, recall.csv, F1.csv).