How to simulate a web form?
With the great Java API HttpClient, it is possible to simulate a user's navigation on the Internet. We can:
- retrieve a page from a web link,
- search in this page (for instance for a link, an email address, …),
- and, most importantly, simulate a form submission (with the GET or POST method). It then becomes possible to retrieve information that sits behind a form, such as pages that require a login.
HttpClient is not a browser, but it is a great component for interacting with web pages and simulating web navigation programmatically.
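The "search in this page" step can be sketched in plain Java once the page body is available as a string. This is only a minimal illustration: LinkExtractor and extractLinks are hypothetical names, not part of HttpClient, and a regular expression is a quick scan rather than a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of HttpClient): collects href values from a page body.
class LinkExtractor {

    // Matches href="..." or href='...', ignoring case. The back-reference \1
    // makes sure the closing quote is the same character as the opening one.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<String>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(2)); // group 2 is the text between the quotes
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample markup invented for the illustration.
        String page = "<a href=\"/wp-login.php\">Log in</a> "
                + "<A HREF='mailto:someone@example.com'>Mail</A>";
        for (String link : extractLinks(page)) {
            System.out.println(link);
        }
    }
}
```

In a real program the string would come from the response body of the page you retrieved, and a proper HTML parser would be more robust for anything beyond a quick scan.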
I recommend that you read this great article: Client HTTP Programming Primer (ForAbsoluteBeginners). It gives a lot of good basic information about the web, an explanation of the difference between HttpClient and a browser (in its diagram, HttpClient is shown in black and the rest of the browser in blue) and a section on how to log in through a login form.
The article is only text, however, and gives no real example. As it says itself: “So this document is all bla-bla, and you will have to work out the details - all the details - yourself. Such is life.”
That is why I put together the simple form login example below. It connects to my WordPress admin page (version 2.6.3) and retrieves the themes page.
To get HttpClient and the code below working, you must first download these two JARs:
and add them to your Java classpath.
package com.gerardnico.httpclient;

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.commons.httpclient.Cookie;
import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.cookie.CookiePolicy;
import org.apache.commons.httpclient.cookie.CookieSpec;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.methods.PostMethod;

public class FormLoginDemo {

    static final String LOGON_SITE = "gerardnico.com";
    static final int LOGON_PORT = 80;

    public FormLoginDemo() {
        super();
    }

    public static void main(String[] args) throws Exception {

        HttpClient client = new HttpClient();
        client.getHostConfiguration().setHost(LOGON_SITE, LOGON_PORT, "http");
        client.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY);
        // 'developer.java.sun.com' has cookie compliance problems
        // Their session cookie's domain attribute is in violation of the RFC2109
        // We have to resort to using compatibility cookie policy

        GetMethod authget = new GetMethod("/wp-login.php");

        client.executeMethod(authget);
        System.out.println("Login form get: " + authget.getStatusLine().toString());
        // release any connection resources used by the method
        authget.releaseConnection();

        // See if we got any cookies
        CookieSpec cookiespec = CookiePolicy.getDefaultSpec();
        Cookie[] initcookies = cookiespec.match(
            LOGON_SITE, LOGON_PORT, "/", false, client.getState().getCookies());
        System.out.println("Initial set of cookies:");
        if (initcookies.length == 0) {
            System.out.println("None");
        } else {
            for (int i = 0; i < initcookies.length; i++) {
                System.out.println("- " + initcookies[i].toString());
            }
        }

        PostMethod authpost = new PostMethod("/wp-login.php");
        // Prepare login parameters
        NameValuePair submit = new NameValuePair("wp-submit", "Log In");
        NameValuePair url = new NameValuePair("redirect_to", "http://gerardnico.com/wp-admin/themes.php");
        NameValuePair userid = new NameValuePair("log", "User_Login"); // <- You have to change this.
        NameValuePair password = new NameValuePair("pwd", "User_Pwd"); // <- and that.
        NameValuePair rememberme = new NameValuePair("rememberme", "forever");
        NameValuePair cookie = new NameValuePair("testcookie", "1");

        authpost.setRequestBody(
            new NameValuePair[] {submit, url, userid, password, rememberme, cookie});

        client.executeMethod(authpost);
        System.out.println("Login form post: " + authpost.getStatusLine().toString());
        // release any connection resources used by the method
        authpost.releaseConnection();

        // See if we got any cookies
        // The only way of telling whether logon succeeded is
        // by finding a session cookie
        Cookie[] logoncookies = cookiespec.match(
            LOGON_SITE, LOGON_PORT, "/", false, client.getState().getCookies());
        System.out.println("Logon cookies:");
        if (logoncookies.length == 0) {
            System.out.println("None");
        } else {
            for (int i = 0; i < logoncookies.length; i++) {
                System.out.println("- " + logoncookies[i].toString());
            }
        }

        // Usually a successful form-based login results in a redirect to
        // another url
        int statuscode = authpost.getStatusCode();
        if ((statuscode == HttpStatus.SC_MOVED_TEMPORARILY) ||
            (statuscode == HttpStatus.SC_MOVED_PERMANENTLY) ||
            (statuscode == HttpStatus.SC_SEE_OTHER) ||
            (statuscode == HttpStatus.SC_TEMPORARY_REDIRECT)) {
            Header header = authpost.getResponseHeader("location");
            if (header != null) {
                String newuri = header.getValue();
                if ((newuri == null) || (newuri.equals(""))) {
                    newuri = "/";
                }
                System.out.println("Redirect target: " + newuri);
                GetMethod redirect = new GetMethod(newuri);

                client.executeMethod(redirect);
                System.out.println("Redirect: " + redirect.getStatusLine().toString());

                // Print the body of the page we were redirected to, line by line
                BufferedReader br = new BufferedReader(
                    new InputStreamReader(redirect.getResponseBodyAsStream()));
                String readLine;
                while (((readLine = br.readLine()) != null)) {
                    System.out.println(" 1 - " + readLine);
                }

                // release any connection resources used by the method
                redirect.releaseConnection();
            } else {
                System.out.println("Invalid redirect");
                System.exit(1);
            }
        }
    }
}
Great, isn't it? Well, I spent a lot of time trying to work this code out before I realized that the HttpClient download archive contains very good examples in the directory “commons-httpclient-3.1\src\examples”. The code above comes from the FormLoginDemo.java file.
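To see what the login POST in the example above actually carries, here is a minimal sketch of how name/value pairs are typically serialized into an application/x-www-form-urlencoded request body. It uses only the JDK. FormBodyEncoder is a hypothetical name, and the UTF-8 charset is an assumption: the charset HttpClient really uses depends on its configuration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of HttpClient): builds a form-urlencoded body.
class FormBodyEncoder {

    // Joins the fields as name=value pairs separated by '&', URL-encoding
    // each name and value. UTF-8 is an assumption made for this sketch.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Same field names as the login example; the values are placeholders.
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("wp-submit", "Log In");
        fields.put("redirect_to", "http://gerardnico.com/wp-admin/themes.php");
        fields.put("log", "User_Login");
        fields.put("pwd", "User_Pwd");
        System.out.println(encode(fields));
    }
}
```

Comparing this string with the request shown in the wire log is a quick way to check that every field of the real form (including hidden ones) is being sent.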
Support
Cookie rejected
Nov 17, 2008 11:47:15 AM org.apache.commons.httpclient.HttpMethodBase processCookieHeaders
WARNING: Cookie rejected: "$Version=0; wordpress_55ddaa0a24a40c041e4b5cb342cec90a=Nico%7C1228128437%7C679c8500a0a977d85955d370cd2e32f5; $Path=/wp-content/plugins". Illegal path attribute "/wp-content/plugins". Path of origin: "/wp-login.php"
Nov 17, 2008 11:47:15 AM org.apache.commons.httpclient.HttpMethodBase processCookieHeaders
WARNING: Cookie rejected: "$Version=0; wordpress_55ddaa0a24a40c041e4b5cb342cec90a=Nico%7C1228128437%7C679c8500a0a977d85955d370cd2e32f5; $Path=/wp-admin". Illegal path attribute "/wp-admin". Path of origin: "/wp-login.php"
HttpClient rejects these cookies because RFC 2109 says so, but many browsers accept a cookie with such a path. You may be able to work around this issue with the IGNORE_COOKIES policy. What I did was download the source and comment out this check in the CookieSpecBase.java file:
// another security check... we musn't allow the server to give us a
// cookie that doesn't match this path
// if (!path.startsWith(cookie.getPath())) {
//     throw new MalformedCookieException(
//         "Illegal path attribute \"" + cookie.getPath()
//         + "\". Path of origin: \"" + path + "\"");
// }
See this very good thread for more explanation.
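To make the rejection concrete, the path check quoted above can be reproduced in a few lines of plain Java. This is only a sketch of that single startsWith test, not the full cookie validation done by HttpClient, and CookiePathCheck is a hypothetical name.

```java
// Hypothetical helper (not part of HttpClient): reproduces only the
// startsWith test from the CookieSpecBase snippet quoted above.
class CookiePathCheck {

    // A cookie passes this check when the path of the request that received it
    // starts with the cookie's own Path attribute.
    static boolean isPathAccepted(String requestPath, String cookiePath) {
        return requestPath.startsWith(cookiePath);
    }

    public static void main(String[] args) {
        String origin = "/wp-login.php"; // path of origin from the warnings above

        // The two Path attributes from the warnings, plus "/" for comparison.
        String[] cookiePaths = {"/wp-content/plugins", "/wp-admin", "/"};
        for (String cookiePath : cookiePaths) {
            System.out.println(cookiePath + " accepted: "
                    + isPathAccepted(origin, cookiePath));
        }
    }
}
```

With the path of origin "/wp-login.php", both "/wp-content/plugins" and "/wp-admin" fail the test, which matches the two warnings, while a cookie with the path "/" would pass.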
Log
To enable the header log, just add these lines at the start of your program.
System.setProperty("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.SimpleLog");
System.setProperty("org.apache.commons.logging.simplelog.showdatetime", "true");
System.setProperty("org.apache.commons.logging.simplelog.log.httpclient.wire.header", "debug");
System.setProperty("org.apache.commons.logging.simplelog.log.org.apache.commons.httpclient", "debug");
HttpClient in OC4J
Oracle Application Server and OC4J 10g (10.1.3) provide the HTTPClient Java package as a complete HTTP client library. It currently implements most of the relevant parts of the HTTP/1.0 and HTTP/1.1 protocols, including the request methods HEAD, GET, POST and PUT, and automatic handling of authorization, redirection requests, and cookies. Furthermore the included Codecs class contains coders and decoders for the base64, quoted-printable, URL-encoding, chunked and the multipart/form-data encodings.
This how-to illustrates a few basic features of the HTTPClient package with different JSPs, like the GET method and cookies.
Examples
- There are also very good examples in the directory “commons-httpclient-3.1\src\examples” of the HttpClient download archive.
