Date: 2025-12-04 08:06

Assessment 1 — MATH38161 (Due on December 1, 2pm)

Total Marks: 30

November 17, 2025

Description of the Dataset and Reading the Data File

The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is frequently used to train machine learning algorithms. See https://en.wikipedia.org/wiki/MNIST_database for details. Here we have an extract of 3000 handwritten samples of the digits 5, 6 and 7, each a 28×28 grayscale image (784 pixels in total). This gives 3000 observations, each of dimension 784. The goal is to analyze this data set using various unsupervised learning algorithms.

The data is given as a 3000 × 785 data matrix where the first column corresponds to the observation label (i.e., which digit) and columns 2 through 785 represent pixel values. The data is available on CANVAS as the file digit.txt. The following code can be used to read the data in R:

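# Read the comma-separated file into a matrix
read_mat <- as.matrix(read.table("digit.txt", sep = ","))
dim(read_mat)
## [1] 3000 785

# Columns 2 to 785 hold the 784 pixel values
image_mat <- matrix(0, 3000, 784)
image_mat[, 1:784] <- read_mat[, 2:785]

# Column 1 holds the digit label
label <- read_mat[, 1]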
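Before any analysis it can help to confirm that the file was split into labels and pixels as intended. The sketch below is only an illustration: it writes a small synthetic stand-in for digit.txt (6 rows of random values, an assumption made so that the example runs without the real file) to a temporary file, reads it with the same pattern, and runs a few basic checks.

```r
# Build a tiny stand-in for digit.txt: 6 rows, label + 784 pixel columns (synthetic values).
set.seed(42)
fake <- cbind(rep(c(5, 6, 7), each = 2),
              matrix(sample(0:255, 6 * 784, replace = TRUE), nrow = 6))
tmp <- tempfile(fileext = ".txt")
write.table(fake, tmp, sep = ",", row.names = FALSE, col.names = FALSE)

# Same reading pattern as above; the split is written here with negative indexing.
read_mat  <- as.matrix(read.table(tmp, sep = ","))
label     <- read_mat[, 1]        # first column: digit label
image_mat <- read_mat[, -1]       # remaining 784 columns: pixel values

# Basic checks that are worth running on the real file as well.
stopifnot(ncol(image_mat) == 784, all(label %in% c(5, 6, 7)))
table(label)                      # number of observations per digit
range(image_mat)                  # range of the pixel values
```

With the real file, replacing `tmp` by "digit.txt" should give the same structure with 3000 rows.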

The code below converts a length-784 vector of pixel values into a 28×28 image and displays it. As an example, we show one observation for each of the digits 5, 6 and 7.

# One example observation per digit
obs_5 <- image_mat[100, ]
obs_6 <- image_mat[1100, ]
obs_7 <- image_mat[2100, ]

# Reshape each vector to 28 x 28 and reverse the column order for display
image(1:28, 1:28, matrix(obs_5, nrow = 28)[, 28:1],
      col = gray(seq(0, 1, 0.05)), xlab = "", ylab = "")

image(1:28, 1:28, matrix(obs_6, nrow = 28)[, 28:1],
      col = gray(seq(0, 1, 0.05)), xlab = "", ylab = "")

image(1:28, 1:28, matrix(obs_7, nrow = 28)[, 28:1],
      col = gray(seq(0, 1, 0.05)), xlab = "", ylab = "")
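Since the same plotting call is repeated for every observation, it may be convenient to wrap it in a function. The sketch below is one possible way to do this. The names `to_image_matrix` and `plot_digit` are hypothetical helpers introduced here and are not part of the assignment.

```r
# Reshape a length-784 pixel vector into the 28 x 28 matrix passed to image().
# Same transformation as in the calls above: fill by column, then reverse the columns.
to_image_matrix <- function(pixels) {
  stopifnot(length(pixels) == 784)
  matrix(pixels, nrow = 28)[, 28:1]
}

# Hypothetical helper: draw one digit in gray scale.
# Extra arguments (for example main = "5") are passed on to image().
plot_digit <- function(pixels, ...) {
  image(1:28, 1:28, to_image_matrix(pixels),
        col = gray(seq(0, 1, 0.05)), xlab = "", ylab = "", ...)
}
```

For example, after `par(mfrow = c(1, 3))`, the calls `plot_digit(image_mat[100, ])`, `plot_digit(image_mat[1100, ])` and `plot_digit(image_mat[2100, ])` would show the three example digits side by side.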

Tasks

The goal is to analyze the dataset by reducing its dimension and performing clustering. To this end, please complete the following.

1) Perform a PCA and use the first 20 principal components to reduce the dimension from 784 to 20. Plot principal components 1 and 2 with the data labels, and produce the PCA scree plot. 6 marks

2)

i) Based on the reduced-dimension projection, perform K-means clustering and various hierarchical clustering methods, and comment on their misclassifications. Then discuss and conclude which method performs best and why. 10 marks

ii) Plot the hierarchical clustering dendrogram, and plot the clusters and the misclassifications for the best method. 2 marks

3)

i) For the eigendecomposition $\Sigma = U \Lambda U^{T}$, where $\Sigma$ is the covariance matrix of the $n \times d$ data matrix $X$, the principal component transformation is given by $T = XU$. First show that the inverse transformation is given by $X = T U^{T}$. 1 mark

ii) Using the reduced dimension, we can form a reduced-order reconstruction of $X$ as $\hat{X} = T_K U_K^{T}$, where $T_K$ and $U_K$ are the matrices based on the first $K$ principal components. Hence we can store a reduced-order approximation of the data matrix in less memory, without storing the whole data matrix. Find and store the reduced-order approximations of the data matrix based on 1, 2, 5 and 10 principal components. 2 marks

iii) Select one observation at random for each of the digits 5, 6 and 7. Then plot the reduced-order approximated images based on 1, 2, 5 and 10 principal components, and discuss the accuracy of the approximation. 4 marks
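As a rough illustration of how the steps in tasks 1 to 3 fit together, the sketch below runs the pipeline on synthetic data rather than on digit.txt. It is a sketch under stated assumptions, not a model solution: the data sizes, the helper names `misclassified` and `reconstruct`, the choice of Ward linkage, and the rule of counting misclassifications against each cluster's majority label are all choices made for this example. Note that `prcomp` centres the columns before computing the components, so the column means are added back after forming $T_K U_K^{T}$.

```r
# SYNTHETIC data: three noisy groups standing in for the digits 5, 6 and 7.
set.seed(1)
n_per <- 50; d <- 30                                 # hypothetical sizes, not 1000 x 784
centers <- matrix(rnorm(3 * d, sd = 4), nrow = 3)    # one mean vector per group
label <- rep(c(5, 6, 7), each = n_per)
X <- centers[rep(1:3, each = n_per), ] + matrix(rnorm(3 * n_per * d), ncol = d)

# Task 1: PCA on centred data; keep the first 20 score columns.
pca <- prcomp(X, center = TRUE, scale. = FALSE)
scores20 <- pca$x[, 1:20]
var_explained <- pca$sdev^2 / sum(pca$sdev^2)        # proportions shown in a scree plot

# Task 2: K-means and one hierarchical method on the reduced data.
km <- kmeans(scores20, centers = 3, nstart = 20)
hc <- hclust(dist(scores20), method = "ward.D2")     # assumed linkage; others can be compared
hc_cluster <- cutree(hc, k = 3)

# Count points that do not carry the majority label of their cluster.
misclassified <- function(cluster, label) {
  tab <- table(cluster, label)
  sum(tab) - sum(apply(tab, 1, max))
}
err_km <- misclassified(km$cluster, label)
err_hc <- misclassified(hc_cluster, label)

# Task 3: rank-K reconstruction T_K U_K^T, with the column means added back.
reconstruct <- function(pca, K) {
  X_hat <- pca$x[, 1:K, drop = FALSE] %*% t(pca$rotation[, 1:K, drop = FALSE])
  sweep(X_hat, 2, pca$center, "+")
}

# Root mean squared reconstruction error for K = 1, 2, 5, 10.
rmse <- sapply(c(1, 2, 5, 10), function(K) sqrt(mean((X - reconstruct(pca, K))^2)))
```

On the real data, `X` would be `image_mat` and `label` the first column of the file. The calls `screeplot(pca)`, `plot(pca$x[, 1:2], col = factor(label))` and `plot(hc)` would then give a scree plot, a labelled plot of the first two components and a dendrogram, and a row of `reconstruct(pca, K)` can be displayed with the same `image()` call used earlier.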

Instructions for the Full Report (5 marks for writing and style)

Write a combined report covering all the parts. Begin with a short introduction to the problems, the methods used, and a brief outline of the techniques. Give each task its own section, with subsections for the sub-parts where appropriate, and add a conclusion at the end. The report should be in PDF format with embedded code.

Submission and Due Date

The report is due by December 1, 2 pm (strict deadline). The submission link will be available later this week.



