Introduction
I was struggling with the Apache Commons email validator. This little baby was not working according to the specs they gave me. After a while I gave up on it and decided, against my own rules, to build it myself. What I mean by against my own rules is: don't build it if it is out there in a proper library. But in this case I made an exception.
The best way to solve this issue was to use a regular expression. Not my favorite, but mighty handy when you need one.
Solution:
The scary part:
For those who have no experience with regular expressions (R.E.), this might look like cursing in a comic :).
"^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]+)$";
Let's analyze this: first we look at the structure, and later on we spill some information about the details.
The first thing you might notice in the structure are blocks like [_A-Za-z0-9-\\+]+
The A-Z, a-z and 0-9 mean that the block matches a capital letter (A-Z), a lowercase letter (a-z) or a digit (0-9).
If the notation is [_A-Za-z0-9], it matches exactly one character.
If you add a +, it means: keep matching until no more characters fit (one or more).
If you add {2} or {2,5} instead of a +, it means exactly 2 characters, or from 2 to 5 characters.
In the case of the email validator you also need a . (dot) and exactly one @. As you can see, the @ is not followed by any of the quantifiers I described, so it has to occur exactly once.
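To make the quantifiers a bit more concrete, here is a minimal sketch. The class name QuantifierDemo and the helper whole are made up for this illustration; only java.util.regex.Pattern comes from the JDK.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the validator itself.
final class QuantifierDemo {
    // Exactly one character out of the class.
    static final Pattern ONE = Pattern.compile("[_A-Za-z0-9]");
    // One or more characters out of the class.
    static final Pattern ONE_OR_MORE = Pattern.compile("[_A-Za-z0-9]+");
    // Exactly two characters.
    static final Pattern EXACTLY_TWO = Pattern.compile("[_A-Za-z0-9]{2}");
    // Between two and five characters.
    static final Pattern TWO_TO_FIVE = Pattern.compile("[_A-Za-z0-9]{2,5}");

    // True when the whole input matches the pattern.
    static boolean whole(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(whole(ONE, "a"));            // true
        System.out.println(whole(ONE, "ab"));           // false
        System.out.println(whole(ONE_OR_MORE, "abc_1")); // true
        System.out.println(whole(EXACTLY_TWO, "abc"));  // false
        System.out.println(whole(TWO_TO_FIVE, "abcde")); // true
    }
}
```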
Some of the details:
^ The caret before the brackets means "starts with". So in this case the string must start with one or more underscores, letters, digits, hyphens or plus signs.
The ^ inside the brackets means "not", like [^A-Z]: everything but capitals.
The * is almost the same as +, but it means the preceding part occurs 0 or more times, where + means 1 or more times (iteration).
The $ checks that the end of the line follows.
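The details above can be tried out with a few tiny patterns. This is only a sketch; the class AnchorDemo and its helper found are names I made up for the illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for ^, [^...], * versus + and $.
final class AnchorDemo {
    // True when the regex is found somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // ^ outside brackets: the input has to start with "abc".
        System.out.println(found("^abc", "abcdef")); // true
        System.out.println(found("^abc", "xabc"));   // false
        // $ : the input has to end with "abc".
        System.out.println(found("abc$", "xabc"));   // true
        // ^ inside brackets: any character that is NOT a capital.
        System.out.println(found("[^A-Z]", "ABC"));  // false
        System.out.println(found("[^A-Z]", "ABc"));  // true
        // * allows zero occurrences of b, + needs at least one.
        System.out.println(Pattern.matches("ab*", "a")); // true
        System.out.println(Pattern.matches("ab+", "a")); // false
    }
}
```

This is also why the email pattern uses (\\.[_A-Za-z0-9-]+)* for the extra dot-separated parts: they are allowed to be absent.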
Conclusion:
It still looks like cursing in a comic, or something that gives you a headache. But it is the way forward if you want to check a string against a pattern. This whole exercise is useless without execution, so you still need to use the Pattern and Matcher classes (examples are all over the internet).
Or use the String.matches method; that will do too.
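A minimal sketch of both ways to execute the expression, using the pattern from this post. The class name EmailValidator and the method names are my own choice for this example; Pattern, Matcher and String.matches are standard JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the regular expression from this post.
final class EmailValidator {
    static final String EMAIL_REGEX =
        "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]+)$";

    // Compile once and reuse; compiling on every call is wasteful.
    private static final Pattern EMAIL_PATTERN = Pattern.compile(EMAIL_REGEX);

    // Variant 1: Pattern and Matcher.
    static boolean isValid(String email) {
        if (email == null) {
            return false;
        }
        Matcher matcher = EMAIL_PATTERN.matcher(email);
        return matcher.matches();
    }

    // Variant 2: String.matches, which compiles the pattern on each call.
    static boolean isValidSimple(String email) {
        return email != null && email.matches(EMAIL_REGEX);
    }

    public static void main(String[] args) {
        System.out.println(isValid("john.doe@example.com"));       // true
        System.out.println(isValid("john@example"));               // false
        System.out.println(isValidSimple("a+b@sub.example.org"));  // true
    }
}
```

The Pattern variant is the better fit when you validate many addresses, because the expression is compiled only once.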
Have fun!
Tuesday 24 June 2014
Monday 9 June 2014
Scraping HTML with Selenium Server
Introduction:
In this case I tried to create a server that is capable of getting the values from an HTML page. I describe the first part that is needed here.
The first part is of course to get the HTML from the URL. You can do this with Selenium Server, a Java library that lets you run Selenium inside the Java virtual machine. This is actually not the hardest part. The hardest part, which I am currently working on, is getting crappy HTML a bit more XML compliant.
The example:
import com.thoughtworks.selenium.DefaultSelenium;
import org.openqa.selenium.server.SeleniumServer;

public class Scraper {
    private static final String BROWSER = "*firefox";
    private static final String SERVER = "localhost";
    private static final int PORT = 4444;
    private static final String SLASHES = "//";
    private static final String SLASH = "/";
    private static final String BODY_END = "</body>";
    private DefaultSelenium selenium;
    private String url;

    public Scraper(String url) {
        this.url = url;
    }

    public void startScraper() {
        // This is to separate the server from the page where you want to start
        int startPosition = url.indexOf(SLASHES) + SLASHES.length();
        // this turns http://google.nl/search into google.nl/search
        String tmpUrl = url.substring(startPosition);
        int endPosition = tmpUrl.indexOf(SLASH);
        if (endPosition < 0) {
            // no path in the url, e.g. http://google.nl
            endPosition = tmpUrl.length();
        }
        // this turns http://google.nl/search into http://google.nl
        String baseUrl = url.substring(0, endPosition + startPosition);
        // this turns http://google.nl/search into /search
        String pageUrl = url.substring(endPosition + startPosition);
        if (pageUrl.isEmpty()) {
            pageUrl = SLASH;
        }
        // instantiate selenium with a port, your browser and the baseUrl
        selenium = new DefaultSelenium(SERVER, PORT, BROWSER, baseUrl);
        try {
            // instantiate and start the selenium server
            SeleniumServer server = new SeleniumServer();
            server.start();
            selenium.start();
            // open the page you like, /search in this example
            selenium.open(pageUrl);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // This method is the first part for xml compliance.
    // In my case the data I want is always in the body.
    public String createHTmlOutPutStream() {
        String htmlSource = selenium.getHtmlSource();
        int start = htmlSource.indexOf("<body");
        int stop = htmlSource.indexOf(BODY_END);
        if (start < 0 || stop < start) {
            // no complete body found, return the source untouched
            return htmlSource;
        }
        return htmlSource.substring(start, stop + BODY_END.length());
    }

    // This method fills a text field if you know its name
    public void populateTextField(String fieldName, String text) {
        selenium.type(fieldName, text);
    }

    // This method clicks the button for you
    public void clickButton(String buttonName) {
        selenium.click(buttonName);
    }

    // This method runs through every option in a dropdown list.
    public void chooseDropDownAll(String dropDownName) {
        // use the parameter, not the literal string "dropDownName"
        String[] options = selenium.getSelectOptions(dropDownName);
        for (String option : options) {
            selenium.select(dropDownName, option);
        }
    }

    // This method only picks one option in the dropdown list.
    public void chooseDropDownSpecific(String dropDownName, String optionName) {
        selenium.select(dropDownName, optionName);
    }
}
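As a side note, the manual substring work in startScraper could also be left to java.net.URI. This is an alternative sketch, not what the Scraper above does; the class UrlParts and its split method are names I made up, and it assumes the URL has a scheme and a host.

```java
import java.net.URI;

// Hypothetical helper that splits a URL into the base and the page to open.
final class UrlParts {
    final String baseUrl;
    final String pageUrl;

    private UrlParts(String baseUrl, String pageUrl) {
        this.baseUrl = baseUrl;
        this.pageUrl = pageUrl;
    }

    static UrlParts split(String url) {
        URI uri = URI.create(url);
        // e.g. http://google.nl
        String base = uri.getScheme() + "://" + uri.getAuthority();
        // e.g. /search, or "/" when the URL has no path at all
        String path = uri.getRawPath();
        String page = (path == null || path.isEmpty()) ? "/" : path;
        // keep the query string, if there is one
        if (uri.getRawQuery() != null) {
            page = page + "?" + uri.getRawQuery();
        }
        return new UrlParts(base, page);
    }

    public static void main(String[] args) {
        UrlParts parts = split("http://google.nl/search");
        System.out.println(parts.baseUrl); // http://google.nl
        System.out.println(parts.pageUrl); // /search
    }
}
```

The two values map onto the baseUrl you pass to DefaultSelenium and the pageUrl you pass to selenium.open.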
Conclusion:
With these two libraries:
<dependency>
  <groupId>org.seleniumhq.selenium</groupId>
  <artifactId>selenium-firefox-driver</artifactId>
  <version>2.42.0</version>
</dependency>
<dependency>
  <groupId>org.seleniumhq.selenium</groupId>
  <artifactId>selenium-server</artifactId>
  <version>2.42.0</version>
</dependency>
you will be able to do this trick. This is only a small set of the possibilities to discover.
You should try it. The next thing I want to do is write a test that does the login and the first search for me on the project I am working on right now, because I like programming and not logging in :).
Have fun!