
Date: 2020-03-30 09:20

CS2034: Data Analytics Project: Building a sentiment classifier

Winter 2020

Out of 80 (tentatively 18% of the final grade)

In this project, you will select one of the two available datasets. You will then train a machine learning classifier to make predictions based on the features you develop in VBA.

There are two datasets available: 1) Yelp reviews (project_yelp.txt) and 2) IMDB movie reviews (project_imdb.txt).

Citation: [1]

Both of these datasets are in the format <Review><TAB><Class>, where <Class> = 0 for negative sentiment and 1 for positive sentiment.
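Before importing the data into Excel, it can help to confirm that every line really has this shape. The following is a minimal Python sketch of that check, not part of the required VBA work; the function name `parse_line` and the sample review are hypothetical, and it assumes the review text itself contains no TAB, so the label is taken from whatever follows the last TAB on the line.

```python
def parse_line(line: str) -> tuple[str, int]:
    """Split one '<Review><TAB><Class>' line into (review, label)."""
    # Drop the line ending, then split on the LAST tab so the label is the final field
    review, sep, label = line.rstrip("\r\n").rpartition("\t")
    if not sep:
        raise ValueError(f"no TAB separator in line: {line!r}")
    value = int(label)
    # The class must be 0 (negative) or 1 (positive)
    if value not in (0, 1):
        raise ValueError(f"class must be 0 or 1, got {value}")
    return review, value


# Hypothetical sample line, not taken from either dataset
print(parse_line("The food was great.\t1\n"))  # ('The food was great.', 1)
```

A line that fails either check raises `ValueError`, which makes malformed rows easy to spot before they reach the workbook.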

You can select ONE of these datasets to engineer features for, using Excel and VBA. It is possible that your code will work well on both.

Submission Requirements: Submit an XLSM file named <your_last_name>_project.xlsm and an accuracy.txt file containing the copied LinearSVMBinary output from Visual Studio (on Windows) or from the custom Mac software.

Project Requirements:

0. Import the CSV data into a macro-enabled Excel workbook. Give the first column the heading “REVIEW” and the second column, which holds the class labels, the heading “SENTIMENT CLASS” (0 marks).

1. Develop 12 VBA features, implemented as Subs, to process the text (4 * 12 = 48 marks).

(0 – Poor, 1, 2 – Marginal Quality, 3 – Acceptable Quality, 4 – Good Quality)

You will write 12 features, implemented as VBA Subs, that process the text in this data so it can be fed into a machine learning classifier.

Each feature will have its own column. The values for each feature MUST be numeric only.

Marks will be assigned based on code quality, originality, and expected performance (i.e., the feature's potential to improve the classifier's accuracy). The features should be reasonably distinct from each other.

IMPORTANT: Not all of the features need to be complex. The TA will take into consideration the balance of complexity across your overall project. For instance, a high-scoring project might have 4 simple features, 4 of moderate complexity, and 4 original, complex ones that demonstrate your creativity and ability to code in VBA.

IMPORTANT # 2: If you cannot come up with 12 features, the Instructor will assist you with ideas.

IMPORTANT # 3: You may write more than 12 features if you want, but you must clearly label the ones you want marked.

2. A Sub Main that calls all of your features on the data (6 marks).

3. Sufficient code comments throughout the Module, including a comment header with separate lines for your name, the course code, “Winter 2020”, and the Instructor's name. The Subs and the Module that contains your features should be well named (8 marks, subjective).

4. Good overall code organization (6 marks, subjective).

5. Good accuracy scores (up to 12 marks, plus a potential 3 bonus marks, for a maximum of 15 marks).

The Instructor will release baseline scores for the classification task on this data. Scores below the baseline will receive fewer than 6 marks, a score equal to the baseline will receive 6 marks, and scores above the baseline will receive more than 6 marks.
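The structure asked for in requirements 1 and 2 (one routine per feature, each producing a single numeric value per review, plus a main routine that runs them all) can be sketched in Python to show the shape of the data flow. This is an illustration only, under stated assumptions: the three features, the tiny word lists, and every function name are hypothetical examples rather than suggested or required features, and the graded work must still be written as VBA Subs.

```python
import re

# Hypothetical word lists, for illustration only
POSITIVE_WORDS = {"good", "great", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "awful", "hate", "terrible"}


def tokenize(review: str) -> list[str]:
    """Lower-case the review and split it into alphabetic words."""
    return re.findall(r"[a-z']+", review.lower())


def word_count(review: str) -> int:
    """Simple feature: number of words in the review."""
    return len(tokenize(review))


def exclamation_count(review: str) -> int:
    """Simple feature: number of '!' characters in the review."""
    return review.count("!")


def polarity_score(review: str) -> int:
    """Moderate feature: positive-word hits minus negative-word hits."""
    words = tokenize(review)
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    return pos - neg


# Each entry produces one numeric column, in this order
FEATURES = [word_count, exclamation_count, polarity_score]


def main(reviews: list[str]) -> list[list[int]]:
    """Analogue of Sub Main: apply every feature to every review."""
    return [[feature(review) for feature in FEATURES] for review in reviews]


rows = main(["Great food, love it!", "Awful. Bad service."])
print(rows)  # [[4, 1, 2], [3, 0, -2]]
```

Each inner list corresponds to one worksheet row and each position to one feature column; in a VBA version, each feature would presumably be its own Sub writing into its own column, with Sub Main calling them in turn.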

Training your ML Classifier (Mac):

The software to do this, including instructions, will be posted on OWL in a separate file in the project assignment dropbox.

Training your ML Classifier (Windows Only):

1. Download Visual Studio Community 2019 for Windows (left) from: https://visualstudio.microsoft.com/

2. In the Installer setup program, select only Desktop development.

3. Register for the COMMUNITY edition of the software (it is free for educational use). You may be able to use your UWO login.

4. Download and install the ML.NET Model Builder (you can OPEN the downloaded file and it will set everything up for you): https://marketplace.visualstudio.com/items?itemName=MLNET.07

5. Export your XLSM feature data into a new CSV file. You should be able to just copy+paste it.

6. Select New -> Project -> Console Application

7. Press “Create”.

8. Right-click “ConsoleApp1” below “Solution ConsoleApp1”, hover over “Add”, then go to “Machine Learning” and click “Custom Scenario” (NOT “Sentiment Analysis”).

9. Select your CSV file as the input data. Make sure the column to predict is the SENTIMENT CLASS label (0 – negative sentiment or 1 – positive sentiment).

10. Train for 10 seconds under “binary-classification”

11. Then click “Evaluate” to get the accuracy score. Look up the score for LinearSVMBinary in the Output window (if it is hidden, click View -> Output).
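Step 5 expects a CSV whose columns are your numeric features plus the SENTIMENT CLASS label that step 9 selects as the column to predict. Copying from Excel is the intended route; if you ever need to produce that file programmatically instead, a minimal Python sketch using the standard `csv` module follows. The function name `write_feature_csv` and the column names `F1`, `F2`, `F3` are hypothetical, and the sketch assumes the raw REVIEW text column is left out so that only numeric features and the label reach the classifier.

```python
import csv
import io
from typing import Sequence, TextIO


def write_feature_csv(
    out: TextIO,
    feature_rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    feature_names: Sequence[str],
) -> None:
    """Write a header, then one row per review: feature values followed by the label."""
    if len(feature_rows) != len(labels):
        raise ValueError("need exactly one label per feature row")
    writer = csv.writer(out)
    # Header row: hypothetical feature names, then the label column to predict
    writer.writerow(list(feature_names) + ["SENTIMENT CLASS"])
    for features, label in zip(feature_rows, labels):
        writer.writerow(list(features) + [label])


# Demo: write to an in-memory buffer instead of a real file
buffer = io.StringIO()
write_feature_csv(buffer, [[4, 1, 2], [3, 0, -2]], [1, 0], ["F1", "F2", "F3"])
print(buffer.getvalue())
```

To write to disk instead, open the file with `open("features.csv", "w", newline="")` and pass the file object as `out`; the `csv` documentation recommends `newline=""` so that the writer controls the line endings itself.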

Dataset References

[1] “From group to individual labels using deep features,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, doi: 10.1145/2783258.2783380.

