1. Overview
In this tutorial, we’ll take a look at how to add proxy support to Jsoup.
2. Common Reasons To Use a Proxy
There are two main reasons we might want to use a proxy with Jsoup.
2.1. Usage Behind an Organization Proxy
It’s common for organizations to route Internet access through a proxy. If we use Jsoup to fetch an external URL from such a network without configuring the proxy, the connection attempt typically fails with an exception:
java.net.SocketTimeoutException: connect timed out
When we see this error, we need to set a proxy for Jsoup before trying to access any URL outside of the network.
2.2. Preventing IP Blocking
Another common reason to use a proxy with Jsoup is to prevent websites from blocking IP addresses.
In other words, using a proxy (or multiple rotating proxies) allows us to parse HTML more reliably, reducing the chance that our code stops working due to a block or ban of our IP address.
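To make the idea of rotating proxies concrete, here is a minimal sketch of a round-robin selector built only on java.net classes. The ProxyRotator class and its http helper are hypothetical names introduced for illustration; they are not part of Jsoup, and the addresses used with them are placeholders.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Jsoup: hands out proxies in round-robin order.
final class ProxyRotator {
    private final List<Proxy> proxies;
    private final AtomicInteger cursor = new AtomicInteger();

    ProxyRotator(List<Proxy> proxies) {
        if (proxies.isEmpty()) {
            throw new IllegalArgumentException("at least one proxy is required");
        }
        // Defensive copy so later changes to the caller's list have no effect.
        this.proxies = List.copyOf(proxies);
    }

    // Returns the next proxy, wrapping around at the end of the list.
    Proxy next() {
        int index = Math.floorMod(cursor.getAndIncrement(), proxies.size());
        return proxies.get(index);
    }

    // Convenience factory for an HTTP proxy at the given host and port.
    static Proxy http(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }
}
```

Each value returned by next() is a plain java.net.Proxy, so it can be handed to the proxy(java.net.Proxy) method covered in section 5. A real setup would probably also drop proxies that keep failing, which this sketch does not attempt.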
3. Setup
When using Maven, we need to add the Jsoup dependency to our pom.xml:
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.17.2</version>
</dependency>
In Gradle, we have to declare our dependency in build.gradle:
implementation 'org.jsoup:jsoup:1.17.2'
4. Adding Proxy Support Through Host and Port Properties
Adding proxy support to Jsoup is straightforward. We call the proxy(String, int) method while building the Connection object:
Jsoup.connect("https://spring.io/blog")
  .proxy("127.0.0.1", 1080)
  .get();
Here we set the HTTP proxy to use for this request, with the first argument representing the proxy hostname and the second the proxy port.
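In practice, the host and port are rarely hard-coded. As a sketch, assuming the proxy is supplied as a single "host:port" string (for example from a configuration file), a small parser can split it into the two arguments that proxy(String, int) expects. ProxyAddress is a hypothetical helper, not a Jsoup class, and it does not handle bracketed IPv6 literals.

```java
// Hypothetical helper, not part of Jsoup: splits a "host:port" setting into its parts.
final class ProxyAddress {
    final String host;
    final int port;

    private ProxyAddress(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Parses values such as "127.0.0.1:1080"; throws IllegalArgumentException on bad input.
    static ProxyAddress parse(String value) {
        int colon = value.lastIndexOf(':');
        if (colon <= 0 || colon == value.length() - 1) {
            throw new IllegalArgumentException("expected host:port but got: " + value);
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        int port = Integer.parseInt(value.substring(colon + 1));
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return new ProxyAddress(value.substring(0, colon), port);
    }
}
```

With a parsed value named address, the call from the snippet above becomes .proxy(address.host, address.port).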
5. Adding Proxy Support Through Proxy Object
Alternatively, we can describe the proxy with the java.net.Proxy class and pass it to the proxy(java.net.Proxy) method of the Connection object:
import java.net.InetSocketAddress;
import java.net.Proxy;

Proxy proxy = new Proxy(Proxy.Type.HTTP,
  new InetSocketAddress("127.0.0.1", 1080));
Jsoup.connect("https://spring.io/blog")
  .proxy(proxy)
  .get();
This method takes a Proxy object, which combines a proxy type (typically Proxy.Type.HTTP) with an InetSocketAddress that wraps the proxy’s hostname and port.
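To illustrate what a Proxy object carries, the following sketch reads the type and address back out of it, which can be handy when logging which proxy a request is about to use. ProxyInspection and describe are hypothetical names chosen for this example. Besides HTTP, java.net.Proxy.Type also defines SOCKS and DIRECT, and the sketch assumes that any non-direct proxy was built from an InetSocketAddress, as in the snippet above.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Hypothetical helper, not part of Jsoup: renders a Proxy in a readable form.
final class ProxyInspection {
    // Formats a Proxy as "TYPE host:port", or "DIRECT" when no proxy is used.
    static String describe(Proxy proxy) {
        if (proxy.type() == Proxy.Type.DIRECT) {
            return "DIRECT";
        }
        // Assumption: the proxy was created with an InetSocketAddress.
        InetSocketAddress address = (InetSocketAddress) proxy.address();
        return proxy.type() + " " + address.getHostString() + ":" + address.getPort();
    }
}
```

Calling describe on the proxy object from the snippet above yields a string made of the type name followed by the host and port.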
6. Conclusion
In this tutorial, we’ve explored two different ways of adding proxy support to Jsoup.
First, we learned how to do it with the Jsoup method that takes the host and port properties. Second, we learned how to achieve the same result using a Proxy object as a parameter.
As always, the code samples are available over on GitHub.