{"id":451,"date":"2018-07-24T14:16:51","date_gmt":"2018-07-24T13:16:51","guid":{"rendered":"http:\/\/www.igfasouza.com\/blog\/?p=451"},"modified":"2021-04-27T10:51:40","modified_gmt":"2021-04-27T09:51:40","slug":"check-tweets-spelling","status":"publish","type":"post","link":"http:\/\/www.igfasouza.com\/blog\/check-tweets-spelling\/","title":{"rendered":"Check Tweets Spelling"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/trumptweets-1024x683.jpg\" alt=\"\" width=\"625\" height=\"417\" class=\"alignnone size-large wp-image-916\" srcset=\"http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/trumptweets-1024x683.jpg 1024w, http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/trumptweets-300x200.jpg 300w, http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/trumptweets-768x512.jpg 768w, http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/trumptweets-624x416.jpg 624w, http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/trumptweets.jpg 1231w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/p>\n<p><b>Alright Boyo?<\/b><\/p>\n<p>Donald Trump has been forced to correct his tweet boasting about his writing ability after it was filled with spelling mistakes.<br \/>\n<a href=\"https:\/\/www.nytimes.com\/aponline\/2018\/07\/03\/us\/politics\/ap-us-trump-tweets.html\" rel=\"noopener\" target=\"_blank\">Here<\/a><\/p>\n<p>The US president posted on the social media website to defend his writing style and criticise the &#8220;Fake News&#8221; media for searching for mistakes in his tweets. 
<\/p>\n<p>The tweet itself had a few errors: instead of &#8220;pore over&#8221; Mr Trump wrote &#8220;pour over&#8221; and instead of &#8220;bestselling&#8221; he wrote &#8220;best selling&#8221;.<\/p>\n<p>There are also question marks over how many books the former businessman has actually written.<\/p>\n<p><a href=\"http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/Untitled-drawing-1.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/Untitled-drawing-1-300x270.jpg\" alt=\"\" width=\"300\" height=\"270\" class=\"alignnone size-medium wp-image-456\" srcset=\"http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/Untitled-drawing-1-300x270.jpg 300w, http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/Untitled-drawing-1.jpg 384w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>The tweet received a number of mocking responses before it was deleted and reposted with the &#8220;pour over&#8221; error corrected. <\/p>\n<p>With this in mind, and because I already have several projects using his tweets, I came up with the idea to analyze all of Donald Trump&#8217;s tweets and check them for spelling mistakes.<\/p>\n<p>I work at a data analytics company, so I decided to ask for suggestions about the idea.<br \/>\nAfter talking with colleagues here, I decided to write a blog post.<\/p>\n<p>A big thanks to my colleagues who helped me with this analysis:<br \/>\n<a href=\"http:\/\/linkedin.com\/in\/brian-sullivan-0b62648\" rel=\"noopener\" target=\"_blank\">Brian Sullivan<\/a> and <a href=\"http:\/\/linkedin.com\/in\/aishwaryamundalik\/\" rel=\"noopener\" target=\"_blank\">Aishwarya Mundalik<\/a><\/p>\n<p>I was already collecting all of his tweets, even on the fly. 
You can check in my <a href=\"https:\/\/github.com\/igfasouza\/All-Tweets\" rel=\"noopener\" target=\"_blank\">GitHub<\/a> a Python script to get all Tweets from a user account.<\/p>\n<p>Just a small change in the code to save a csv file with tweet_id and word columns:<\/p>\n<div class=\"codecolorer-container python blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/>3<br \/>4<br \/>5<br \/>6<br \/>7<br \/>8<br \/>9<br \/><\/div><\/td><td><div class=\"python codecolorer\">&nbsp; <span class=\"kw1\">with<\/span> <span class=\"kw2\">open<\/span><span class=\"br0\">&#40;<\/span><span class=\"st0\">'%s_tweets.csv'<\/span> % screen_name<span class=\"sy0\">,<\/span> <span class=\"st0\">'wb'<\/span><span class=\"br0\">&#41;<\/span> <span class=\"kw1\">as<\/span> f:<br \/>\n&nbsp; &nbsp; writer <span class=\"sy0\">=<\/span> <span class=\"kw3\">csv<\/span>.<span class=\"me1\">writer<\/span><span class=\"br0\">&#40;<\/span>f<span class=\"sy0\">,<\/span> delimiter<span class=\"sy0\">=<\/span><span class=\"st0\">'|'<\/span><span class=\"br0\">&#41;<\/span><br \/>\n&nbsp; &nbsp; writer.<span class=\"me1\">writerow<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#91;<\/span><span class=\"st0\">&quot;id&quot;<\/span><span class=\"sy0\">,<\/span><span class=\"st0\">&quot;words&quot;<\/span><span class=\"br0\">&#93;<\/span><span class=\"br0\">&#41;<\/span><br \/>\n&nbsp; &nbsp; <span class=\"kw1\">for<\/span> items <span class=\"kw1\">in<\/span> outtweets:<br \/>\n&nbsp; &nbsp; &nbsp; index <span class=\"sy0\">=<\/span> items<span class=\"br0\">&#91;<\/span><span class=\"nu0\">0<\/span><span class=\"br0\">&#93;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; out<span class=\"sy0\">=<\/span>items<span class=\"br0\">&#91;<\/span><span class=\"nu0\">1<\/span><span class=\"br0\">&#93;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; word_array<span class=\"sy0\">=<\/span>out.<span 
class=\"me1\">split<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; <span class=\"kw1\">for<\/span> word <span class=\"kw1\">in<\/span> word_array:<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; writer.<span class=\"me1\">writerow<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#40;<\/span>index <span class=\"sy0\">,<\/span> word<span class=\"br0\">&#41;<\/span><span class=\"br0\">&#41;<\/span><\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p>Next, the R code.<\/p>\n<p>It uses the <a href=\"https:\/\/github.com\/ropensci\/hunspell\" rel=\"noopener\" target=\"_blank\">Hunspell<\/a> R package to check the words.<\/p>\n<p>First I checked Donald Trump\u2019s tweets, and then I decided to compare them against some others.<br \/>\nI chose Leo Varadkar and Fintan O&#8217;Toole because I\u2019m in Ireland, and I chose J.K. Rowling because she is a writer and, according to the news, she was one of the people who made a lot of jokes about the case.<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/>3<br \/>4<br \/>5<br \/>6<br \/>7<br \/>8<br \/>9<br \/>10<br \/><\/div><\/td><td><div class=\"bash codecolorer\">Output -----<span class=\"sy0\">&gt;<\/span> <br \/>\n<br \/>\n<span class=\"sy0\">&gt;<\/span> mean<span class=\"br0\">&#40;<\/span>TT_Final<span class=\"re1\">$correct<\/span><span class=\"sy0\">\/<\/span>TT_Final<span class=\"re1\">$n<\/span><span class=\"br0\">&#41;<\/span> <span class=\"br0\">&#40;<\/span>Trump Tweets<span class=\"br0\">&#41;<\/span><br \/>\n<span class=\"br0\">&#91;<\/span><span class=\"nu0\">1<\/span><span class=\"br0\">&#93;<\/span> <span class=\"nu0\">0.9645933<\/span><br \/>\n<span class=\"sy0\">&gt;<\/span> mean<span class=\"br0\">&#40;<\/span>LT_Final<span class=\"re1\">$correct<\/span><span class=\"sy0\">\/<\/span>LT_Final<span 
class=\"re1\">$n<\/span><span class=\"br0\">&#41;<\/span> &nbsp;<span class=\"br0\">&#40;<\/span>Leo Tweets<span class=\"br0\">&#41;<\/span><br \/>\n<span class=\"br0\">&#91;<\/span><span class=\"nu0\">1<\/span><span class=\"br0\">&#93;<\/span> <span class=\"nu0\">0.9187365<\/span><br \/>\n<span class=\"sy0\">&gt;<\/span> mean<span class=\"br0\">&#40;<\/span>RT_Final<span class=\"re1\">$correct<\/span><span class=\"sy0\">\/<\/span>RT_Final<span class=\"re1\">$n<\/span><span class=\"br0\">&#41;<\/span> &nbsp;<span class=\"br0\">&#40;<\/span>J.K. Rowling Tweets<span class=\"br0\">&#41;<\/span><br \/>\n<span class=\"br0\">&#91;<\/span><span class=\"nu0\">1<\/span><span class=\"br0\">&#93;<\/span> <span class=\"nu0\">0.9338411<\/span><br \/>\n<span class=\"sy0\">&gt;<\/span> mean<span class=\"br0\">&#40;<\/span>FT_Final<span class=\"re1\">$correct<\/span><span class=\"sy0\">\/<\/span>FT_Final<span class=\"re1\">$n<\/span><span class=\"br0\">&#41;<\/span> &nbsp; <span class=\"br0\">&#40;<\/span>Fotoole Tweets<span class=\"br0\">&#41;<\/span><br \/>\n<span class=\"br0\">&#91;<\/span><span class=\"nu0\">1<\/span><span class=\"br0\">&#93;<\/span> <span class=\"nu0\">0.9212394<\/span><\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p>The result was impressive: Donald Trump has the best value. This means that, on average, a larger share of the words in his tweets passes the spell check than for the others. 
The API only reports whether each word is spelled correctly.<br \/>\nUnfortunately, it checks words only, not grammar or syntax, and for some values, such as single characters, the result is \u2018true\u2019 even when that may not make sense.<\/p>\n<p>Here is a sample result. The tokens &#8220;isn&#8221; and &#8220;t&#8221; appear separately, which suggests that a contraction such as &#8220;isn&#8217;t&#8221; gets split and partly counted as an error.<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/>3<br \/>4<br \/>5<br \/>6<br \/>7<br \/>8<br \/><\/div><\/td><td><div class=\"bash codecolorer\">ID &nbsp; WORD &nbsp; RESULT<br \/>\n<span class=\"nu0\">1017190186269184000<\/span> &nbsp; but &nbsp; TRUE<br \/>\n<span class=\"nu0\">1017190186269184000<\/span> &nbsp; it &nbsp; &nbsp;TRUE<br \/>\n<span class=\"nu0\">1017190186269184000<\/span> &nbsp; isn &nbsp; FALSE<br \/>\n<span class=\"nu0\">1017190186269184000<\/span> &nbsp; t &nbsp; &nbsp;TRUE<br \/>\n<span class=\"nu0\">1017190186269184000<\/span> &nbsp; nearly &nbsp; &nbsp;TRUE<br \/>\n<span class=\"nu0\">1017190186269184000<\/span> &nbsp; U &nbsp; &nbsp;TRUE<br \/>\n<span class=\"nu0\">1017190186269184000<\/span> &nbsp; S &nbsp; &nbsp;TRUE<\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p>This is just a basic analysis and the results are completely dependent on the API.<br \/>\nIt would be really nice to do a grammar or syntax analysis as well.<\/p>\n<p>The website politwoops.eu follows some politicians on Twitter and shows a list of deleted tweets from each one. 
I managed to get the last 60 deleted tweets from Donald Trump and the result is:<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/><\/div><\/td><td><div class=\"bash codecolorer\"><span class=\"sy0\">&gt;<\/span> mean<span class=\"br0\">&#40;<\/span>DD_Final<span class=\"re1\">$correct<\/span><span class=\"sy0\">\/<\/span>DD_Final<span class=\"re1\">$n<\/span><span class=\"br0\">&#41;<\/span> &nbsp; <span class=\"br0\">&#40;<\/span>Deleted Donald Trump tweets<span class=\"br0\">&#41;<\/span><br \/>\n<span class=\"br0\">&#91;<\/span><span class=\"nu0\">1<\/span><span class=\"br0\">&#93;<\/span> <span class=\"nu0\">0.958324<\/span><\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><a href=\"http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/Untitled-drawing.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/Untitled-drawing-300x200.jpg\" alt=\"\" width=\"300\" height=\"200\" class=\"alignnone size-medium wp-image-457\" srcset=\"http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/Untitled-drawing-300x200.jpg 300w, http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/Untitled-drawing-624x416.jpg 624w, http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/Untitled-drawing.jpg 683w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>The score for the deleted tweets is slightly lower (0.9583 against 0.9646), which is consistent with him deleting tweets that contain mistakes and posting them again.<\/p>\n<p>I only analysed the last 3000 tweets for each account.<\/p>\n<p>For fun, I had a look at my own tweets as well:<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/><\/div><\/td><td><div class=\"bash 
codecolorer\"><span class=\"sy0\">&gt;<\/span> mean<span class=\"br0\">&#40;<\/span>IG_Final<span class=\"re1\">$correct<\/span><span class=\"sy0\">\/<\/span>IG_Final<span class=\"re1\">$n<\/span><span class=\"br0\">&#41;<\/span><br \/>\n<span class=\"br0\">&#91;<\/span><span class=\"nu0\">1<\/span><span class=\"br0\">&#93;<\/span> <span class=\"nu0\">0.6915032<\/span><\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p>And here is my defence &#8230; hehehe. Apparently, IT words are not counted as correct.<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/>3<br \/>4<br \/>5<br \/>6<br \/>7<br \/>8<br \/>9<br \/>10<br \/>11<br \/><\/div><\/td><td><div class=\"bash codecolorer\">ID &nbsp; WORD &nbsp; RESULT<br \/>\n<span class=\"nu0\">956884676173553664<\/span> &nbsp; GDG &nbsp; FALSE<br \/>\n<span class=\"nu0\">956884676173553664<\/span> &nbsp; Hackathon &nbsp; FALSE<br \/>\n<span class=\"nu0\">950817131503013888<\/span> &nbsp; Hacktoberfest &nbsp; FALSE<br \/>\n<span class=\"nu0\">947550776913735680<\/span> &nbsp; Flume &nbsp; FALSE<br \/>\n<span class=\"nu0\">947550776913735680<\/span> &nbsp; Spark &nbsp; FALSE<br \/>\n<span class=\"nu0\">943640519502106624<\/span> &nbsp; <span class=\"kw2\">sudo<\/span> &nbsp; FALSE<br \/>\n<span class=\"nu0\">943640519502106624<\/span> &nbsp; init &nbsp; FALSE<br \/>\n<span class=\"nu0\">943640519502106624<\/span> &nbsp; Brazil2018 &nbsp; FALSE<br \/>\n<span class=\"nu0\">936178221153910784<\/span> &nbsp; O<span class=\"st_h\">'Reilly'<\/span>s &nbsp; FALSE<br \/>\n<span class=\"nu0\">936178221153910784<\/span> &nbsp; Hadoop &nbsp; FALSE<\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><a href=\"http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/Untitled-drawing-2.jpg\"><img loading=\"lazy\" decoding=\"async\" 
src=\"http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/Untitled-drawing-2-300x185.jpg\" alt=\"\" width=\"300\" height=\"185\" class=\"alignnone size-medium wp-image-458\" srcset=\"http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/Untitled-drawing-2-300x185.jpg 300w, http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/07\/Untitled-drawing-2.jpg 419w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>I put everything on my <a href=\"https:\/\/github.com\/igfasouza\/check_tweets_spelling\" target=\"_blank\" rel=\"noopener\">GitHub<\/a>, so you can get the code and play with it. Just change the Twitter account and check for yourself.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Alright Boyo? Donald Trump has been forced to correct his tweet boasting about his writing ability after it was filled with spelling mistakes. Here The US president posted on the social media website to defend his writing style and criticise&hellip; <a href=\"http:\/\/www.igfasouza.com\/blog\/check-tweets-spelling\/\" class=\"more-link\">Continue Reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":456,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[30],"tags":[18,31],"class_list":["post-451","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python","tag-python","tag-twitter"],"_links":{"self":[{"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/posts\/451","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/comments?post=451"}],"version-history":[{"count":10,"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/posts\/451\/revisions"}],"predecessor-version":[{"id":1195,"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/posts\/451\/revisions\/1195"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/media\/456"}],"wp:attachment":[{"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/media?parent=451"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/categories?post=451"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/tags?post=451"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}