Date: 2019-10-05 10:51

Project 1 (20 Points Total)

Text Mining Twitter Data in R (using “tidytext”)


This is a two-week project spanning Weeks 2 and 3.

All parts are due at the end of Week 3.


Purpose

In this project you will use Twitter data with the tidytext package in R to explore and analyze tweets. The goal is to dig deep into Twitter data to learn more about a topic or event.


Assignment Due Date and Time

Parts 1 and 2 are both due in Week 3, Sunday at 11:59 p.m. ET.

You will need to install R and obtain Twitter data in order to complete this project.

Part 1 (10 Points)


Twitter represents a fundamentally new instrument for making social measurements. Millions of people voluntarily express opinions on almost any topic. This data source is incredibly valuable for both research and business.


For example, here are some interesting applications of Twitter data analysis:


Twitter Study Tracks When We Are :) (twitter data shows biological rhythms)

https://www.nytimes.com/2011/09/30/science/30twitter.html


Twitter mood predicts the stock market

https://arxiv.org/pdf/1010.3003&embedded=true


Thunderstorm Fest (plot a map of locations where thunder was mentioned in context of a storm in Summer 2012).

https://cliffmass.blogspot.com/2012/07/thunderstorm-fest.html


Researchers from Northeastern University and Harvard University study the characteristics and dynamics of Twitter to learn how it can be used to analyze moods at a national scale.

http://www.ccs.neu.edu/home/amislove/twittermood/


Analyzing Tweets with R and tidytext (Trump and Obama tweet analysis)

https://medium.com/the-artificial-impostor/analyzing-tweets-with-r-92ff2ef990c6

Your Task


Come up with your own Twitter analysis idea. Find something to compare on a theme of your choice. Decide what data you want to use and what you are looking to find in it. You can use your own data or data from strangers, and the theme can be generic or specific, but it must be something you are interested in learning about. See the examples above for some ideas.


Write a 1-2 paragraph description of the analysis you will perform. Title this section, “Description.”


After you have performed the analysis in Part 2 (below), provide a 2-3 paragraph description of your conclusion and results. Title this section “Conclusion.” In this section, tell me what you discovered from the data. What did the data tell you? Was it what you expected or predicted? Did you learn anything interesting? What are your concluding thoughts on this analysis?


Save both sections together in a document labeled, “analysis.doc.”

Part 2 (10 Points)


Perform the analysis in R using tidytext. Your Twitter data analysis must include all of the following steps, which are outlined in Chapter 7:


  • Word Frequency Analysis
  • Comparison of Word Usage
  • Changes in Word Use Analysis
  • Favorites and Retweets Analysis
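As a rough illustration of the first two steps, here is a minimal sketch. It is not the textbook's code: the `tweets` table, the `person` labels, and the tweet texts are hypothetical toy data, the default word tokenizer is used rather than any tweet-specific one, and the add-one smoothing in the log ratio is an assumption made so that words used by only one account do not produce a division by zero.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data standing in for real tweets from two accounts.
tweets <- tibble(
  person = c("A", "A", "B", "B"),
  text = c(
    "Thunder and a big storm tonight",
    "The storm knocked out power, what a storm",
    "Coffee first and then more coffee",
    "Coffee and a quiet morning"
  )
)

# One row per word; unnest_tokens() lowercases and strips punctuation.
tidy_tweets <- tweets %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# Word frequency: counts and within-person proportions.
frequency <- tidy_tweets %>%
  count(person, word, sort = TRUE) %>%
  group_by(person) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()

# Comparison of word usage: smoothed log ratio of A's use to B's use.
counts <- tidy_tweets %>% count(person, word)
a <- counts %>% filter(person == "A") %>% select(word, n_a = n)
b <- counts %>% filter(person == "B") %>% select(word, n_b = n)

ratios <- full_join(a, b, by = "word") %>%
  mutate(
    n_a = coalesce(n_a, 0L),            # words the other account never used
    n_b = coalesce(n_b, 0L),
    p_a = (n_a + 1) / (sum(n_a) + 1),   # add-one smoothing (an assumption)
    p_b = (n_b + 1) / (sum(n_b) + 1),
    logratio = log(p_a / p_b)
  ) %>%
  arrange(desc(logratio))

print(frequency)
print(ratios)
```

A positive `logratio` marks a word that account A uses relatively more often, and a negative value marks a word that account B favors. With real data you would load your own tweets into `tweets` in place of the toy table.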


Textbook 2, Chapter 7, will guide you through the steps. Save your R source code for the above steps.
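For the favorites and retweets step, one possible shape of the analysis is sketched below. Everything in it is illustrative: the `id`, `person`, and `retweets` columns and their values are hypothetical, and taking the median retweet count per word is just one reasonable summary, not necessarily the one your data calls for.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data: one row per tweet with a made-up retweet count.
tweets <- tibble(
  id = 1:4,
  person = c("A", "A", "B", "B"),
  text = c(
    "Thunder and a big storm tonight",
    "The storm knocked out power, what a storm",
    "Coffee first and then more coffee",
    "Coffee and a quiet morning"
  ),
  retweets = c(10, 2, 5, 1)
)

# Total retweets per person, for context.
totals <- tweets %>%
  group_by(person) %>%
  summarise(total_rts = sum(retweets), .groups = "drop")

# One row per word, with stop words removed.
tidy_tweets <- tweets %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# Count each word once per tweet so a repeated word does not weigh
# a single tweet's retweets more than once, then summarise per word.
word_by_rts <- tidy_tweets %>%
  distinct(id, person, word, retweets) %>%
  group_by(person, word) %>%
  summarise(
    median_rts = median(retweets),
    uses = n(),
    .groups = "drop"
  ) %>%
  arrange(desc(median_rts))

print(totals)
print(word_by_rts)
```

The same pattern applies to favorites by swapping in a favorites column. With real data, words that appear in only a handful of tweets give unstable medians, so filtering on `uses` before ranking is usually worthwhile.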


Submission Instructions


Upload your Part 1 document (“analysis.doc”) and your Part 2 R source code files to the assignment submission area.


Grading Criteria

The assignment is worth 20 points total, broken out as follows:

Criteria levels: Novice, Needs Improvement, Proficient, Excellent

Part 1 Analysis (10 points)

  • Novice (0-5 points): An inappropriate topic was selected that did not make sense, did not require any analysis, or was not capable of being analyzed with the dataset.
  • Needs Improvement (6-7 points): A good level of analysis was reported; however, there were areas where significant details and observations were missed.
  • Proficient (8 points): The responses to all questions were reasonably correct; however, some of the reasoning contained unrealistic analysis or results.
  • Excellent (10 points): An appropriate topic was selected. The responses to the questions adequately analyzed and described the data as observed in the analysis. The data showed interesting results that appeared to be appropriate given the analysis performed.

Part 2 Programming (10 points)

  • Novice (0-5 points): No working source code was created to address the proposed problem.
  • Needs Improvement (6-7 points): The source code that was created did not properly address the content of the questions, although some of it may have worked to produce the correct results.
  • Proficient (8 points): A majority of the answers were implemented properly, and the source code contained appropriate but not efficient solutions to address most of the questions.
  • Excellent (10 points): All questions were implemented using efficient and correct R source code syntax. The functions were written properly, and they addressed the questions and provided an adequate response in all cases. The correct libraries were used.

Total

  • Novice (0-10 points): 0-60% (F - D)
  • Needs Improvement (12-14 points): 70% (C)
  • Proficient (16 points): 80% (B)
  • Excellent (20 points): 100% (A)

