These days “The Sinquefield Cup” in Saint Louis and the “Julius Baer Generation Cup” are in the eye of the chess world. So this text is for people who want to know how Twitter users feel about Magnus Carlsen and Hans Moke Niemann. But how do we do a sentiment analysis? This is where machine learning comes to help us. In short, sentiment analysis is a technique that detects the underlying sentiment in a piece of text. It is the process of classifying text as positive, negative, or neutral. Machine learning techniques are used to evaluate a piece of text and determine the sentiment behind it.
To scrape a large number of tweets we used the sntwitter library, which makes our life easier. In this analysis I use some Python libraries such as TextBlob, another interesting library that helps us score a text on subjectivity and polarity depending on the sentiments it expresses. In addition, we built a word cloud to display the most relevant and most repeated words in each dataset. The script scraped 5000 tweets containing the keywords @HansMokeNiemann and @MagnusCarlsen, respectively. But what are the most relevant words in each dataset? Let’s take a look.
@MagnusCarlsen Dataset
@HansMokeNiemann-Dataset
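A word cloud sizes each word by how often it appears in the text. As a rough sketch of that counting step, assuming the tweets are already loaded as a list of strings: the function name `top_words`, the short stop-word list, and the sample tweets below are all made up for illustration and are not the scraped data or the original script.

```python
import re
from collections import Counter

# Hypothetical stop-word list; word cloud libraries ship much longer ones.
STOPWORDS = {"the", "a", "is", "in", "and", "to", "of", "at", "on"}

def top_words(tweets, n=5):
    """Return the n most frequent words across tweets, ignoring case, links and stop words."""
    counts = Counter()
    for text in tweets:
        text = re.sub(r"https?://\S+", " ", text.lower())  # drop links
        words = re.findall(r"[a-z']+", text)               # keep alphabetic tokens only
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

# Made-up sample tweets, not taken from the scraped datasets.
sample = [
    "Magnus withdrew from the tournament",
    "Is Hans cheating at chess? https://example.com",
    "Magnus and Hans play chess in the tournament",
]
print(top_words(sample, 4))
```

Run on each dataset separately, a count like this gives the same ranking that the clouds show visually.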
As we can see, the words in the Hans word cloud are: Magnus, Han, Cheating, Chess, tournament and Hikaru. On the other hand, in the Magnus cloud the words are pretty much the same: Magnus, Han, Cheating, Chess, tournament and Play. Now we have a clear context for what people are talking about. But what is the tendency within that context? Are users saying good or bad things about both chess grandmasters? How can we know? The best approach is to measure polarity and subjectivity. Polarity ranges from -1 to 1, where -1 is a negative position, 0 a neutral position and 1 a positive position; in TextBlob, subjectivity ranges from 0 (objective) to 1 (subjective). Let’s take a look:
@HansMokeNiemann-Plot
@MagnusCarlsen Plot
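The scoring behind plots like these can be sketched as follows. TextBlob exposes both scores through the `sentiment` property of a `TextBlob` object. The helper names and the 0.05 neutrality threshold are assumptions for illustration, not values from the original script, and the scores passed to `summarize` are made up.

```python
def score_tweet(text):
    """Return (polarity, subjectivity) for one tweet using TextBlob."""
    from textblob import TextBlob  # imported lazily; needs `pip install textblob`
    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity

def label_polarity(polarity, threshold=0.05):
    """Map a polarity score in [-1, 1] to a label; the threshold is an assumed cutoff."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

def summarize(polarities):
    """Count how many polarity scores fall under each label."""
    summary = {"negative": 0, "neutral": 0, "positive": 0}
    for p in polarities:
        summary[label_polarity(p)] += 1
    return summary

# Made-up polarity scores standing in for score_tweet() results.
print(summarize([-0.6, 0.0, 0.02, 0.4, 0.8]))
```

Comparing the three counts for each dataset is a quick way to see which one leans more toward neutral.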
At a quick glance we can say that the plots are pretty much the same: both graphs are populated in the center, with a tendency toward the top right. But let's look deeper, with the help of the correlation coefficient, which measures the strength and direction of a linear relationship and so marks how strong or weak the tendency is. We know one fact: in both cases the correlation is weak and leans toward neutral. But which case is stronger? Look again at @MagnusCarlsen Plot and compare the dark blue dots in the center of the graph with the dark blue dots in the center of @HansMokeNiemann-Plot. Yes, we can see a difference: the neutral position is stronger in the Hans dataset than in the Magnus dataset. But can we confirm this more clearly? Yes, with the next histograms.
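For readers who want a number rather than a visual impression, here is a minimal sketch of the Pearson correlation coefficient, assuming the plots show polarity against subjectivity. The function name `pearson_r` and the sample values are made up for illustration; they are not the scores from the datasets.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up (polarity, subjectivity) values, not the scraped scores.
polarity = [-0.5, 0.0, 0.0, 0.2, 0.5, 0.8]
subjectivity = [0.6, 0.0, 0.1, 0.3, 0.5, 0.9]
print(round(pearson_r(polarity, subjectivity), 3))
```

A result near 0 means a weak linear relationship, and values near 1 or -1 mean a strong one, so computing it for each dataset shows which tendency is stronger.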
@MagnusCarlsen-histogram
@HansMokeNiemann-histogram
The Magnus histogram shows a normal-looking distribution, something we expect in statistics. The Hans histogram, on the other hand, shows a solid neutral and positive position. Here is a detail: compared with a normal distribution, the Magnus Carlsen histogram comes closer to it than the Hans Niemann one. For some people this could be suspicious. We will use Benford's Law to make sure that all of this is legitimate.
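As a sketch of what such a check involves: Benford's Law says that in many naturally occurring sets of positive numbers spanning several orders of magnitude, the leading digit d appears with probability log10(1 + 1/d). The text does not say which quantity the law will be applied to, so the function names and the integer counts below are made-up placeholders.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(n):
    """First digit of a positive integer."""
    return int(str(n)[0])

def observed_shares(values):
    """Observed share of each leading digit among the positive integers in values."""
    digits = [leading_digit(v) for v in values if v > 0]
    if not digits:
        raise ValueError("need at least one positive value")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

# Made-up positive counts, not data from the tweets.
values = [12, 150, 1900, 23, 310, 47, 5, 88, 1024, 130]
expected = benford_expected()
observed = observed_shares(values)
for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
```

Large gaps between the expected and observed columns on a big sample would be the signal worth a closer look; a sample as small as this one says nothing either way.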