#100DaysOfCode in Python Transcripts
Chapter: Days 73-75: Automate tasks with Selenium
Lecture: Demo 1: access my Packt ebook collection

0:00 All right, let the fun start.
0:03 Let's look at a more practical example.
0:05 You're probably familiar with packtpub.com.
0:07 They have a daily free eBook that I've been collecting
0:10 over the last months, and I've got a bunch of eBooks in my account,
0:15 but my account obviously is behind a login.
0:17 So let's write a script to log in to Packt.
0:21 Reach out to my account details.
0:23 Go to my eBooks, this link here
0:25 and make a list of all the books it finds here.
0:28 Then, we retrieve the book titles and URLs,
0:31 so, let's get coding.
0:33 First of all, I don't want to store any password
0:37 or login in my script,
0:39 so we need to load them from the environment.
0:42 One way you can do that in Python is with import os,
0:45 os.environ.get, and let's say
0:50 we call them packt_user
0:54 and packt_password.
0:56 We store them in user and password.
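The step described above can be sketched roughly as follows. This is a minimal sketch, not the code from the video: the helper name load_credentials is made up for illustration, and the upper-case spelling of the variable names is an assumption, since the transcript only gives them as spoken.

```python
import os


def load_credentials(env=os.environ):
    """Return (user, password) read from environment variables.

    The names PACKT_USER and PACKT_PASSWORD are assumed; use whatever
    names you actually exported in your shell.
    """
    user = env.get("PACKT_USER")
    password = env.get("PACKT_PASSWORD")
    if not user or not password:
        # Fail early with a clear message instead of sending None to a form.
        raise RuntimeError("Set PACKT_USER and PACKT_PASSWORD first")
    return user, password
```

Passing the mapping in as a parameter keeps the function easy to try out with a plain dict, without touching the real environment.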
1:02 And you see, I already set them in the environment.
1:05 I will show you how to do that next,
1:07 so let's go back to the terminal
1:08 and make sure you have your virtual environment deactivated.
1:11 And go into venv/bin/activate.
1:16 And go to the end and do an export
1:20 of packt_user
1:25 and export packt_password.
1:31 And if you want to follow along, make those the values
1:34 of your login, save that.
1:37 Activate the virtual environment again,
1:39 and I'm using this alias, and now, you should have them
1:43 in your environment variables.
1:52 And it means they will be accessible to your script.
1:56 All right with the user and password set,
1:57 let's log in to the site.
1:59 So this is the login site
2:02 and let's initialize a driver.
2:10 And let's get the page,
2:14 then on the page, let's find
2:17 the actual login form which we can do with
2:21 find_element_by_id.
2:23 And first I looked at the page source
2:24 to see how the user and password fields are named.
2:27 The user field is named edit-name,
2:33 and you want to send the keys, basically sending data
2:37 into that form input field: user.
2:40 Here we do the same for password,
2:44 and the password field is named pass,
2:47 and here we want to send it our password,
2:51 and importantly we want to make sure we hit enter
2:53 after that last value. Now when I run this, Selenium
2:58 opens the browser and goes to the login page,
3:02 and there's my email and my password,
3:05 and it hits enter.
3:08 Look at that, it logged into my account.
3:11 How cool is that?
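The login step could look roughly like the sketch below. Several things here are assumptions: the function name login is made up, the login URL is left as a parameter because the transcript does not give it, and the field ids "edit-name" and "edit-pass" are guesses reconstructed from what is said in the video, so check them against the page source. The video uses find_element_by_id, which was removed in newer Selenium 4 releases; the sketch uses the current find_element(By.ID, ...) form instead.

```python
def login(login_url, user, password,
          user_field_id="edit-name", pass_field_id="edit-pass"):
    """Open Chrome, submit the login form, and return the driver.

    The default field ids are assumptions; verify them in the page source.
    """
    # Imported here so this sketch can be loaded on a machine that has
    # neither Selenium nor a browser driver installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    driver = webdriver.Chrome()
    driver.get(login_url)
    driver.find_element(By.ID, user_field_id).send_keys(user)
    # Appending Keys.RETURN presses Enter after the password,
    # which submits the form.
    driver.find_element(By.ID, pass_field_id).send_keys(password + Keys.RETURN)
    return driver
```

Returning the driver lets the following steps keep working in the same logged-in browser session.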
3:15 Now that we're logged in, let's move on
3:18 to find my eBooks.
3:20 As we saw there is a link on the page, My eBooks,
3:24 so we just need to find that link and click it.
3:31 Before running that cell let me show you
3:33 where we are now and what that page looks like after clicking.
3:37 Now, we are in account details.
3:41 Click the cell.
3:46 Now we're in my eBooks. How cool is that?
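Finding the link and clicking it might be sketched like this. The helper name open_my_ebooks is hypothetical, and the exact link text "My eBooks" is taken from the transcript, so it may differ in case or wording on the real page.

```python
def open_my_ebooks(driver, link_text="My eBooks"):
    """Click the link whose visible text matches link_text exactly."""
    # Imported here so the sketch loads without Selenium installed.
    from selenium.webdriver.common.by import By

    # By.LINK_TEXT matches an <a> element by its full visible text;
    # By.PARTIAL_LINK_TEXT is the looser alternative.
    driver.find_element(By.LINK_TEXT, link_text).click()
```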
3:48 I'm navigating this site through Selenium.
3:52 Let's move on and extract the books.
3:59 I'm going to use find_elements again,
4:02 but now by class names because I saw
4:06 that the books are in a class product-line
4:14 and that's in elements.
4:19 Right, couple of Selenium web elements, cool.
4:24 I can write a dictionary comprehension to
4:27 actually extract the nid, N-I-D,
4:32 kind of the identifier Packt is using, and the title.
4:35 I'm going to store that in books.
4:41 I'm using the get_attribute,
4:45 nid as key,
4:49 and
4:53 title as value.
4:56 for e in elements.
5:02 Look at that, all the books in my account.
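The dictionary comprehension described above can be sketched as follows. In a real run, elements would come from something like driver.find_elements(By.CLASS_NAME, "product-line"). To keep the example runnable without a browser, it uses a small stand-in class, FakeElement, and made-up nid and title values; none of these come from the video.

```python
def books_by_nid(elements):
    """Map each element's nid attribute to its title attribute."""
    return {e.get_attribute("nid"): e.get_attribute("title") for e in elements}


class FakeElement:
    """Stand-in for a Selenium WebElement, used only for this demo."""

    def __init__(self, **attrs):
        self._attrs = attrs

    def get_attribute(self, name):
        return self._attrs.get(name)


# Made-up sample data, not real books from the account.
sample = [
    FakeElement(nid="101", title="Example Python Book"),
    FakeElement(nid="102", title="Example Data Book"),
]
books = books_by_nid(sample)
```

Because the function only relies on get_attribute, it works the same way with real web elements and with the stand-ins.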
5:05 Good, I think we're done now, so let's close the driver,
5:10 and that actually closes the browser.
5:12 All right, and boom.
5:14 You cannot see it, but that closed the Chrome browser
5:17 I had open.
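One way to make sure the browser is always shut down, even when a step in between fails, is a small context manager. This is an optional sketch and not something shown in the video; the name browser_session is made up. It calls driver.quit(), which ends the whole WebDriver session, whereas driver.close() only closes the current window.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(driver):
    """Yield the driver and always quit it afterwards, even on errors."""
    try:
        yield driver
    finally:
        # quit() ends the session and closes every window it opened.
        driver.quit()
```

It would be used as: with browser_session(driver) as d: followed by the scraping steps in the indented block.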
5:18 Now that we have the data in a structure,
5:21 I can just write a little bit of code to get the book.
5:24 And to keep the focus on Selenium,
5:26 I'm just going to copy that code in.
5:29 We have the download URL, which I extracted from the HTML.
5:32 We have that id and the format of the book we want.
5:36 Possible formats are PDF, EPUB, and MOBI.
5:39 We write a function called get_books,
5:41 which greps my books for a string, checks
5:45 if the book format is correct, and then just loops
5:48 through the titles.
5:49 It does a regular expression match on the title
5:52 and gives me the title and URL.
5:54 The next step would then be to actually download
5:57 the book to my desktop, but that's out of the scope
5:59 of this lesson.
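A function along the lines described above might look like this sketch. It is not the code pasted in the video. The URL template is a placeholder on example.com because the real download URL is not given in the transcript, and passing books in as a parameter is a choice made here for easier testing.

```python
import re

# Hypothetical template; the real download URL was taken from the site's HTML.
DOWNLOAD_URL = "https://example.com/ebook_download/{nid}/{fmt}"
BOOK_FORMATS = ("pdf", "epub", "mobi")


def get_books(books, pattern, fmt):
    """Return (title, url) pairs for titles matching the regex pattern.

    books maps nid -> title; fmt is checked case-insensitively.
    """
    fmt = fmt.lower()
    if fmt not in BOOK_FORMATS:
        raise ValueError(f"format must be one of {', '.join(BOOK_FORMATS)}")
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        (title, DOWNLOAD_URL.format(nid=nid, fmt=fmt))
        for nid, title in books.items()
        if regex.search(title)
    ]
```

For example, get_books(books, "python.*data", "MOBI") would return the matching titles with their MOBI links, since both the pattern and the format are handled case-insensitively.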
6:01 Let's try it out.
6:06 Since it's just a regular expression, I can do
6:10 regex-like searches.
6:11 I want all the MOBI files for Python data books.
6:15 Nice.
6:18 I want the books for machine learning
6:24 and I want the format of PDF.
6:27 It should also work in uppercase.
6:29 There you go.
6:31 A little useful script.
6:32 I won't spend too much time on it here
6:34 because I really want to focus on Selenium,
6:36 but the point is that once Selenium has loaded your data
6:39 into a structure, or you've dumped it to a
6:42 database table or whatever, it's easy to write
6:45 a function to work with that data.