jsoup is a Java library that makes it easy to work with real-world HTML and XML. It offers an easy-to-use API for URL fetching, data parsing, extraction, and manipulation using DOM API methods, CSS, ...
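As a rough illustration of the parsing, extraction, and manipulation mentioned above, here is a minimal sketch. It assumes the jsoup library is on the classpath and parses an inline HTML string rather than fetching a URL, so it runs without network access. The class name `JsoupSketch`, the helpers `extractLinks` and `markNoFollow`, and the sample markup are hypothetical names chosen for this example; only `Jsoup.parse`, `select`, `attr`, `title`, and `body().html()` come from jsoup itself.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class JsoupSketch {

    // Hypothetical helper: returns the href value of every link in the markup.
    static List<String> extractLinks(String html) {
        Document doc = Jsoup.parse(html);          // parse the string into a DOM tree
        Elements links = doc.select("a[href]");    // CSS selector: anchors that have an href
        List<String> hrefs = new ArrayList<>();
        for (Element link : links) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    // Hypothetical helper: sets rel="nofollow" on every link and returns the body markup.
    static String markNoFollow(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("a[href]").attr("rel", "nofollow"); // applies to all matched elements
        return doc.body().html();
    }

    public static void main(String[] args) {
        // Sample markup invented for this sketch.
        String html = "<html><head><title>Demo</title></head>"
                + "<body><p>See <a href=\"https://example.com/\">example</a>.</p></body></html>";

        System.out.println(Jsoup.parse(html).title());
        System.out.println(extractLinks(html));
        System.out.println(markNoFollow(html));
    }
}
```

To work against a live page instead, the same selector calls should apply to a `Document` obtained with `Jsoup.connect(url).get()`, which performs the fetch and can throw an `IOException`.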
HTML is easy to get started with, but hard to get right. There are several hundred element kinds, element attributes, and deeply nested hierarchies - with some relationships even being conditional on ...