#100DaysOfCode in Python Transcripts
Chapter: Days 58-60: Twitter data analysis with Python
Lecture: Build a Twitter wordcloud
0:00 Alright, our last part,
0:02 building a Wordcloud.
0:04 I left this in the notebook, although we already prepped
0:07 and have all the requirements installed.
0:10 Another way to do it is to run
0:13 pip install inside the notebook, prefixed with an exclamation mark,
0:17 but make sure your virtual environment is activated
0:21 so you don't install it into your global environment.
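Before running `!pip install wordcloud` in a cell, you can check from Python whether the kernel is actually running inside a virtual environment. This is a minimal sketch, assuming a standard `venv` or `virtualenv` setup; the helper name `in_virtualenv` is hypothetical. IPython also provides a `%pip` magic that installs into the environment of the running kernel.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter runs inside a virtual environment (hypothetical helper)."""
    # With the stdlib venv module (PEP 405), sys.prefix points at the venv
    # while sys.base_prefix still points at the base Python install.
    if sys.prefix != getattr(sys, "base_prefix", sys.prefix):
        return True
    # Older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix")


print("Inside a virtual environment:", in_virtualenv())
```

If this prints `False`, activate your environment (or restart the notebook from it) before installing anything.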
0:24 But we already have wordcloud installed,
0:25 so let's move on and get all the tweets,
0:28 but this time filtering out the retweets, as I mentioned.
0:31 And as the code is pretty similar to the last video,
0:33 I'm just going to copy-paste it here.
0:36 It loops over the tweets and excludes the ones
0:39 that start with "RT" or with the at sign.
0:42 So, retweets and mentions.
0:46 And that should give us a clean list for the Wordcloud.
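The filtering step described above can be sketched like this. It's a minimal sketch, assuming each tweet is a plain string; in the lesson the tweets come from the Twitter API, so you'd apply the same check to each tweet's text. The function name `filter_tweets` and the sample data are hypothetical.

```python
def filter_tweets(tweets):
    """Drop retweets (starting with 'RT') and mentions (starting with '@')."""
    # str.startswith accepts a tuple of prefixes, so one call covers both cases
    return [text for text in tweets if not text.startswith(("RT", "@"))]


# Hypothetical sample tweets
sample = [
    "RT @someone: great post",
    "@friend thanks!",
    "New #Python challenge is up",
    "Learning Django today",
]

clean = filter_tweets(sample)
# Join everything into one string, which is what the word cloud gets fed
all_tweets_text = " ".join(clean)
print(all_tweets_text)
```

Joining into a single string at the end matches how the text is passed to the word cloud later on.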
0:49 Now, here's the wordcloud module.
0:52 It's a little Wordcloud generator in Python
0:54 and you can just feed it a bunch of text
0:57 and it comes up with this nice output.
1:00 You can put a mask on it to get the words
1:03 in the shape you want.
1:05 And I'm going to use that to put the words
1:08 in the shape of our PyBites logo.
1:10 So let's make the Wordcloud.
1:12 I'm going to type it out because it's a bit of code,
1:15 and then I'll come back and explain it line by line.
1:31 And this takes a while.
1:33 It's doing a lot of processing in the background.
1:35 So let's wait for it to come back.
1:38 We got a Wordcloud object.
1:40 Let me quickly highlight what happened.
1:42 First, we made a PyBites mask by calling Image.open
1:46 on a PyBites logo I have in my directory.
1:49 Image is from the Pillow library.
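wordcloud's `mask` parameter takes a NumPy array: pure white (255) pixels are treated as masked out, and words are only drawn on the rest. This is a minimal sketch, assuming Pillow and NumPy are installed (both are wordcloud dependencies). Instead of opening a logo file, it draws a hypothetical stand-in circle in memory so it runs anywhere. With a real file you'd write `np.array(Image.open(path))`, where `path` is whatever your logo is called.

```python
import numpy as np
from PIL import Image, ImageDraw

# Grayscale image, all white: white means "masked out" for wordcloud
img = Image.new("L", (300, 300), 255)

# Hypothetical shape: a black circle marks the area where words may be drawn
draw = ImageDraw.Draw(img)
draw.ellipse([30, 30, 270, 270], fill=0)

# wordcloud expects the mask as a NumPy array
mask = np.array(img)
print(mask.shape, mask.dtype)
```

A logo with a white background works the same way: the white surroundings are masked out and the words fill the logo's shape.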
1:52 Then we make a set of stop words,
1:55 starting from STOPWORDS, which we imported at the beginning
1:58 from the wordcloud module.
2:00 Through some trial and error,
2:03 I found I had to add co and https,
2:07 because those were common tokens.
2:09 They're false positives, because they come from
2:11 Twitter links (Twitter shortens URLs to https://t.co/...).
2:13 We don't want them to distort
2:16 the word frequencies of our tweets, so we add them
2:18 to the stop words.
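The `co` and `https` false positives come from how link text gets tokenized. This is a minimal standard-library sketch, assuming the regular expression `\w[\w']+` approximates wordcloud's default tokenizer (it keeps tokens of two or more characters). The `t` in `t.co` is too short to count, but `https` and `co` survive, so they look like frequent words until they're added to the stop words. `BASE_STOPWORDS` and `word_counts` are hypothetical stand-ins for wordcloud's `STOPWORDS` and its internal counting; in the lesson's setup the equivalent is roughly `stopwords = set(STOPWORDS)` followed by `stopwords.update(["co", "https"])`.

```python
import re
from collections import Counter

# Assumption: approximates wordcloud's default token pattern (2+ characters)
TOKEN_RE = re.compile(r"\w[\w']+")

# Hypothetical small base set; in the lesson this role is played by wordcloud.STOPWORDS
BASE_STOPWORDS = {"and", "the", "this", "to"}


def word_counts(text, stopwords):
    """Count lowercased tokens, skipping any that are stop words."""
    tokens = (tok.lower() for tok in TOKEN_RE.findall(text))
    return Counter(tok for tok in tokens if tok not in stopwords)


# Hypothetical tweet text containing shortened Twitter links
text = "New #Python post https://t.co/abc123 and a Django tip https://t.co/xyz789"

before = word_counts(text, BASE_STOPWORDS)
after = word_counts(text, BASE_STOPWORDS | {"co", "https"})

print("before:", before.most_common(3))
print("after: ", after.most_common(3))
```

Every link contributes one `https` and one `co`, so on a real timeline they'd quickly outrank the words you actually care about.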
2:19 Then we make the Wordcloud object.
2:22 We give it a white background and set max words to 2000;
2:25 you'd have to experiment with your own data set
2:27 to find the best value here.
2:29 We pass in the mask and the stop words.
2:32 Then we generate the Wordcloud, passing in the string
2:35 of all the Tweets we defined earlier.
2:38 Next up, I want to show the Wordcloud in the browser.
2:42 And we're going to use a little bit
2:43 of matplotlib to do that.
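The display step is usually just `imshow`, turning the axes off, and showing the figure. This is a minimal sketch, assuming matplotlib and NumPy are installed; `show_image` is a hypothetical helper. `plt.imshow` also accepts a `WordCloud` object directly, since it can be converted to an array. A random RGB array stands in for the word cloud so the sketch runs on its own, and the `Agg` backend line is only there so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def show_image(image, figsize=(10, 10)):
    """Draw an image-like object (e.g. a WordCloud or NumPy array) without axes."""
    fig = plt.figure(figsize=figsize)
    plt.imshow(image, interpolation="bilinear")  # smooth the rendered image
    plt.axis("off")  # hide ticks and frame
    return fig


# Stand-in for the WordCloud object: a random RGB image
fake_cloud = np.random.randint(0, 256, size=(100, 100, 3), dtype=np.uint8)
fig = show_image(fake_cloud)
# In a notebook, call plt.show() (or let the cell render the figure)
```

Passing your actual `WordCloud` object instead of `fake_cloud` gives you the logo-shaped cloud.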
2:51 This might take a bit as well.
2:58 That looks better.
3:00 And look at that!
3:02 We got the Wordcloud in the form of our PyBites logo.
3:05 By the way, this is our logo and mask,
3:09 so you see the similar shape.
3:12 And look at that.
3:13 I mean, what's cool about this is that you really
3:16 see what we're all about:
3:17 100 Days Of Code, Python, Code Challenge,
3:20 API, Django, PacktPub, Twitter.
3:25 Those really stand out,
3:27 and Flask, of course. So, very cool.
3:29 And it's nice that you can just import the module:
3:32 three lines of code to create the object
3:34 and four lines of code to make the image, basically,
3:38 and you're set.
3:39 I mean, it's pretty impressive.
3:41 That's a wrap of this lesson.
3:42 I hope you liked it and got a taste of how
3:45 to get data from the Twitter API
3:47 and do a bit of analysis on that data.