In this post I am going to share the basics of setting up and running a simple but working Selenium WebDriver example. The hypothetical requirement this example tries to accomplish is the following:
Print out all addresses on rightmove.co.uk for 3 bedroom properties for sale in Reading under £175K
The requirement looks simple. Let's get started.
Set up a Java project using Maven
Assuming you have Maven installed on your system, run the following command to set up a generic Java project.
mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=com.clearqa.app -DartifactId=rightmove-scraper
Update pom.xml to:
- Include selenium-firefox-driver and selenium-support dependencies.
- Generate an executable jar file. This example pom.xml assumes that your main class is named com.clearqa.app.PropertyAddressFinder.
<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.clearqa.app</groupId>
  <artifactId>rightmove-scraper</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>rightmove-scraper</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.0</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <manifestEntries>
                    <Main-Class>com.clearqa.app.PropertyAddressFinder</Main-Class>
                  </manifestEntries>
                </transformer>
              </transformers>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <dependency>
      <groupId>org.seleniumhq.selenium</groupId>
      <artifactId>selenium-firefox-driver</artifactId>
      <version>2.31.0</version>
    </dependency>
    <dependency>
      <groupId>org.seleniumhq.selenium</groupId>
      <artifactId>selenium-support</artifactId>
      <version>2.31.0</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>
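Once the project is built, a quick way to confirm that the shade plugin's ManifestResourceTransformer did its job is to read the Main-Class entry back out of the jar's manifest. The sketch below does this with the JDK's java.util.jar API only. The class name ManifestCheck is hypothetical, and the default jar path assumes the artifactId and version from the pom.xml above.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper: prints the Main-Class recorded in a jar's manifest.
public class ManifestCheck {

    /** Returns the Main-Class attribute of the jar's manifest, or null if there is none. */
    static String mainClassOf(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            Manifest manifest = jarFile.getManifest(); // null when META-INF/MANIFEST.MF is missing
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the jar produced by "mvn package" for this project.
        String path = args.length > 0 ? args[0] : "target/rightmove-scraper-1.0-SNAPSHOT.jar";
        String mainClass = mainClassOf(new File(path));
        System.out.println(mainClass == null
                ? "No Main-Class entry in " + path
                : "Main-Class: " + mainClass);
    }
}
```

Run against the built jar, it should print the fully qualified class name configured under manifestEntries. If the entry is missing, java -jar will refuse to start the jar with a "no main manifest attribute" error.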
Selenium
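Import this Maven project into your favourite IDE and add the PropertyAddressFinder class. The Java code below should be self-explanatory: it uses Selenium's FirefoxDriver to open the website, search for properties, print the address of each result, and follow the "next" link until there are no more result pages.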
package com.clearqa.app;

import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.Select;

public class PropertyAddressFinder
{
    public static void main( String[] args )
    {
        System.out.println( "RightMove Scraper begins..." );
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("http://www.rightmove.co.uk");

            // Enter the search term and click "buy"
            WebElement search = driver.findElement(By.id("searchLocation"));
            search.clear();
            search.sendKeys("Reading, Berkshire");
            driver.findElement(By.id("buy")).click();

            // Set up additional filters: maximum price 175,000 and at least 3 bedrooms
            new Select(driver.findElement(By.id("maxPrice"))).selectByVisibleText("175,000");
            new Select(driver.findElement(By.id("minBedrooms"))).selectByVisibleText("3");
            driver.findElement(By.id("submit")).click();

            boolean nextPage = true;
            while (nextPage) {
                // Print out the addresses on the current results page
                List<WebElement> addresses = driver.findElements(By.className("displayaddress"));
                for (WebElement a : addresses) {
                    System.out.println(a.getText());
                }
                // findElements returns an empty list (instead of throwing) when there is no "next" link
                List<WebElement> nextLinks = driver.findElements(By.linkText("next"));
                if (nextLinks.isEmpty()) {
                    nextPage = false;
                } else {
                    // Go to the next page
                    nextLinks.get(0).click();
                }
            }
        } finally {
            // quit() closes all windows and ends the session, even if an exception was thrown above
            driver.quit();
        }
    }
}
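The while loop above mixes page navigation with printing, which makes it hard to test without a live browser. One way to separate the two, offered as a hedged sketch rather than the author's code, is to hide each results page behind a small interface and keep the "follow next until there is none" loop in plain Java. ResultPage, collectAll and fakePages are hypothetical names; in a real scraper addresses() would presumably wrap driver.findElements(By.className("displayaddress")) and next() would click the "next" link, whereas here an in-memory fake stands in for the browser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class PaginationSketch {

    /** Hypothetical abstraction over one page of search results. */
    interface ResultPage {
        List<String> addresses();     // addresses shown on this page
        Optional<ResultPage> next();  // the page behind the "next" link, if any
    }

    /** Collects addresses from the first page and every following page, in order. */
    static List<String> collectAll(ResultPage first) {
        List<String> all = new ArrayList<>();
        Optional<ResultPage> current = Optional.of(first);
        while (current.isPresent()) {
            ResultPage page = current.get();
            all.addAll(page.addresses());
            current = page.next(); // empty on the last page, which ends the loop
        }
        return all;
    }

    /** Hypothetical in-memory pages, used to exercise collectAll without a browser. */
    static ResultPage fakePages(List<List<String>> pages, int index) {
        return new ResultPage() {
            public List<String> addresses() {
                return pages.get(index);
            }

            public Optional<ResultPage> next() {
                return index + 1 < pages.size()
                        ? Optional.of(fakePages(pages, index + 1))
                        : Optional.empty();
            }
        };
    }

    public static void main(String[] args) {
        // Sample data only: two fake result pages.
        List<List<String>> pages = Arrays.asList(
                Arrays.asList("Western Road, Reading, Berkshire", "Shirley Avenue, Reading"),
                Arrays.asList("Caversham"));
        collectAll(fakePages(pages, 0)).forEach(System.out::println);
    }
}
```

Returning an Optional for the next page replaces the boolean loop flag: the loop ends naturally when a page reports no successor, and the same collectAll can be pointed at either the fake pages or a browser-backed implementation.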
Run the Project
Now that we have the bits and pieces in place, we can let Maven do the magic and build an executable jar file (see the build section of pom.xml above for how the maven-shade-plugin achieves this). Running the jar from the command line opens a Firefox browser and prints the addresses of whatever properties it finds.
$ mvn clean compile package
$ java -jar target/rightmove-scraper-1.0-SNAPSHOT.jar
RightMove Scraper begins...
Western Road, Reading, Berkshire
Shirley Avenue, Reading
Salisbury Road, Reading, RG30
Salisbury Road, Reading, Berkshire
Southcote Parade, Reading, Berkshire
Loddon Bridge Road, Woodley, Reading
Coniston Drive, Tilehurst, Reading
Catherine Street, Reading, Berkshire
Lincoln Road, Reading, RG2
Belmont Road, Reading, RG30
Ambrook Road Reading
Little Johns Lane, Reading, RG30
Salcombe Road, Reading
Tilehurst, Reading, Berkshire
Westerham Walk, Reading, Berkshire
Crockhamwell Road, Woodley, Reading, Wokingham, RG5
Caversham
Conclusion
This post demonstrated a very basic use of Selenium WebDriver to automate web-based tasks that cannot otherwise be achieved through well-known APIs.