
Syntax

ELEMENTS_BY_SELECTOR_QUERY(<string [containing HTML elements]>;<selector query>)

Description

Returns a list of all elements in the given HTML string that match the selector query.

For more information on selector queries, see the jsoup documentation at jsoup.org.

Example

Download the example file: HTML_File_Example.html

Given the following excerpt from the HTML file:

<table border="1" rules="groups">
	<thead>
		<tr>
			<th>Association 1</th>
			<th>Association 2</th>
			<th>Association 3</th>
		</tr>
	</thead>
	<tfoot>
		<tr>
			<td><i>affected:<br>4 Million People</i></td>
			<td><i>affected:<br>2 Million People</i></td>
			<td><i>affected:<br>1 Million People</i></td>
		</tr>
	</tfoot>
	<tbody>
		<tr>
			<td>New York</td>
			<td>San Francisco</td>
			<td>Atlanta</td>
		</tr>
		<tr>
			<td>Bread</td>
			<td>Biscuits</td>
			<td>Rolls</td>
		</tr>
		<tr>
			<td>Sandwich</td>
			<td>Soup</td>
			<td>Salad</td>
		</tr>
	</tbody>
</table>

The goal is to extract only the table data that is located in the table body. Of the selector combinators described in the jsoup documentation, one that fits this case is:

ancestor child: child elements that descend from ancestor

In this case, the ancestor is the table body (tbody) and the descendant is the table data (td), which gives the query:

tbody td

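The result is a list of the table data <td> elements located inside the table body <tbody>. The <td> elements in the table footer <tfoot> are not matched, because they do not descend from <tbody>.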

[<td>New York</td>, <td>San Francisco</td>, <td>Atlanta</td>, <td>Bread</td>, <td>Biscuits</td>, <td>Rolls</td>, <td>Sandwich</td>, <td>Soup</td>, <td>Salad</td>]
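To make the "ancestor child" rule concrete, the following is a minimal Python sketch that reproduces the selection logic using only the standard library. It is not how ELEMENTS_BY_SELECTOR_QUERY or jsoup is implemented. The names DescendantSelector and select_text are hypothetical, the HTML is a shortened version of the example table, and the sketch returns the text of each matched element rather than the element itself.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class DescendantSelector(HTMLParser):
    """Collect the text of every <child> element nested inside an <ancestor>."""

    def __init__(self, ancestor, child):
        super().__init__()
        self.ancestor = ancestor
        self.child = child
        self.open_tags = []   # stack of currently open tags
        self.current = None   # text pieces of the match being read, if any
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # A match starts when the child tag opens while the ancestor is open.
        if tag == self.child and self.ancestor in self.open_tags:
            self.current = []
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag == self.child and self.current is not None:
            self.matches.append("".join(self.current).strip())
            self.current = None
        if tag in self.open_tags:
            # Pop up to and including the tag that is being closed.
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if self.current is not None:
            self.current.append(data)


def select_text(html, ancestor, child):
    """Return the text of all <child> elements that descend from <ancestor>."""
    parser = DescendantSelector(ancestor, child)
    parser.feed(html)
    parser.close()
    return parser.matches


# Shortened version of the example table (two columns, two body rows).
HTML = """
<table border="1" rules="groups">
  <thead><tr><th>Association 1</th><th>Association 2</th></tr></thead>
  <tfoot><tr>
    <td><i>affected:<br>4 Million People</i></td>
    <td><i>affected:<br>2 Million People</i></td>
  </tr></tfoot>
  <tbody>
    <tr><td>New York</td><td>San Francisco</td></tr>
    <tr><td>Bread</td><td>Biscuits</td></tr>
  </tbody>
</table>
"""

body_cells = select_text(HTML, "tbody", "td")
print(body_cells)
```

Running the sketch with "tbody" and "td" yields only the body cells. Passing "table" and "td" instead would also include the footer cells, which is why the narrower ancestor is used in the query.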


