In this section, we'll work through some challenging XPath problems using a complex HTML structure. These exercises will help you apply the concepts we've discussed and improve your XPath skills.
8.1 The HTML Structure
First, let's look at the HTML we'll be working with. Save this as advanced-xpath-practice.html and open it in your browser:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Advanced XPath Practice</title>
<style>
body { font-family: Arial, sans-serif; line-height: 1.6; padding: 20px; }
table { border-collapse: collapse; width: 100%; margin-bottom: 20px; }
th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
th { background-color: #f2f2f2; }
.highlight { background-color: #ffffd0; }
.level-1 { border: 1px solid #ddd; padding: 10px; margin-bottom: 10px; }
.level-2 { border: 1px solid #bbb; padding: 8px; margin: 5px 0; }
.level-3 { border: 1px solid #999; padding: 6px; margin: 3px 0; }
.target { font-weight: bold; color: blue; }
.exclude { opacity: 0.5; }
.event { border: 1px solid #ddd; padding: 10px; margin-bottom: 10px; }
.special { font-style: italic; }
</style>
</head>
<body>
<div id="current-date" data-value="2023-09-15">Current Date: September 15, 2023</div>
<h2>Complex Table</h2>
<table id="data">
<thead>
<tr>
<th>Status</th>
<th>Name</th>
<th>Department</th>
<th>Salary</th>
<th>Start Date</th>
</tr>
</thead>
<tbody>
<tr>
<td>Active</td>
<td>John Doe</td>
<td>IT</td>
<td>$75,000</td>
<td>2021-03-15</td>
</tr>
<tr class="highlight">
<td>Inactive</td>
<td>Jane Smith</td>
<td>HR</td>
<td>$65,000</td>
<td>2019-07-22</td>
</tr>
<tr>
<td>Active</td>
<td>Bob Johnson</td>
<td>Sales</td>
<td>$80,000</td>
<td>2022-11-30</td>
</tr>
<tr class="highlight">
<td>Active</td>
<td>Alice Brown</td>
<td>Marketing</td>
<td>$70,000</td>
<td>2023-01-10</td>
</tr>
<tr>
<td>Inactive</td>
<td>Charlie Wilson</td>
<td>Finance</td>
<td>$90,000</td>
<td>2018-05-03</td>
</tr>
</tbody>
</table>
<h2>Nested Structure</h2>
<div class="level-1">
<h3>Level 1 - A</h3>
<div class="level-2">
<h4>Level 2 - A1</h4>
<div class="level-3">
<h5>Level 3 - A1a</h5>
<span class="target">Target Content 1</span>
</div>
<div class="level-3 exclude">
<h5>Level 3 - A1b</h5>
<span class="target">Excluded Target Content</span>
</div>
<div class="level-3">
<h5>Level 3 - A1c</h5>
<span class="target">Target Content 2</span>
</div>
</div>
</div>
<div class="level-1">
<h3>Level 1 - B</h3>
<div class="level-2">
<h4>Level 2 - B1</h4>
<div class="level-3">
<h5>Level 3 - B1a</h5>
<span class="target">Target Content 3</span>
</div>
<div class="level-3">
<h5>Level 3 - B1b</h5>
<span class="target">Target Content 4</span>
</div>
</div>
</div>
<h2>Dynamic Content</h2>
<div id="event-list">
<div class="event" data-date="2023-07-15">
<h3>Conference A</h3>
<p class="special">Special Event</p>
<span>Date: July 15, 2023</span>
</div>
<div class="event" data-date="2023-08-22">
<h3>Workshop B</h3>
<span>Date: August 22, 2023</span>
</div>
<div class="event" data-date="2023-10-05">
<h3>Seminar C</h3>
<p class="special">Special Event</p>
<span>Date: October 5, 2023</span>
</div>
<div class="event" data-date="2023-11-18">
<h3>Webinar D</h3>
<span>Date: November 18, 2023</span>
</div>
<div class="event" data-date="2024-01-20">
<h3>Symposium E</h3>
<p class="special">Special Event</p>
<span>Date: January 20, 2024</span>
</div>
</div>
</body>
</html>
8.2 XPath Challenges
Now, let's tackle some challenging XPath problems using this HTML structure. Try to solve these on your own before looking at the solutions.
Challenge 1: Complex Table Navigation
Find all 'Active' employees who started after June 2022 and have a salary greater than $70,000.
Challenge 2: Nested Structure Navigation
Select the second 'target' span that is not inside an 'exclude' div. (In this document, that span sits in the third 'level-3' div.)
Challenge 3: Dynamic Content Filtering
Find all special events that occur after the current date (2023-09-15 in this example) and before the year 2024.
Challenge 4: Attribute Manipulation
Find all events where the month in the data-date attribute is an odd number.
Challenge 5: Complex Conditional Selection
Select all table rows where the employee is either:
Active with a salary above $75,000, or
Inactive with a start date before 2020
8.3 Solutions and Explanations
Solution 1: Complex Table Navigation
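//table[@id='data']//tr[td[1][text()='Active'] and
number(translate(td[5], '-', '')) > 20220630 and
number(translate(td[4], '$,', '')) > 70000]
Explanation:
Starts with the table having id 'data'
Selects rows where:
First column (status) is 'Active'
Start date (5th column), with its hyphens removed and read as a number, is greater than 20220630, i.e. July 1, 2022 or later
Salary (4th column), with '$' and ',' removed, is greater than 70,000
Note: testing the year (>= 2022) and the month (> 06) separately would be wrong, because it rejects dates such as 2023-01-10, which fall after June 2022 but have a month of 01. Comparing the whole date as one number avoids this.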
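To sanity-check which rows Challenge 1 should return, the conditions can be reproduced outside the browser. The following is a minimal Python sketch, not part of the original exercise: `xp_translate`, `xp_substring`, and `matches_challenge_1` are hypothetical helper names that only approximate the XPath 1.0 functions `translate()` and `substring()`, and `ROWS` is copied by hand from the sample table.

```python
# Rough Python stand-ins for two XPath 1.0 string functions (hypothetical helpers).

def xp_translate(text, src, dst):
    """Like XPath translate(): map src[i] to dst[i]; delete chars with no counterpart."""
    table = {}
    for i, ch in enumerate(src):
        if ord(ch) in table:
            continue  # XPath uses the first occurrence of a repeated character
        table[ord(ch)] = dst[i] if i < len(dst) else None
    return text.translate(table)


def xp_substring(text, start, length):
    """Like XPath substring() for positive integer arguments (positions are 1-based)."""
    return text[start - 1:start - 1 + length]


# (name, status, salary, start date), copied from the sample table.
ROWS = [
    ("John Doe", "Active", "$75,000", "2021-03-15"),
    ("Jane Smith", "Inactive", "$65,000", "2019-07-22"),
    ("Bob Johnson", "Active", "$80,000", "2022-11-30"),
    ("Alice Brown", "Active", "$70,000", "2023-01-10"),
    ("Charlie Wilson", "Inactive", "$90,000", "2018-05-03"),
]


def matches_challenge_1(status, salary, start_date):
    """Active, started after June 2022, salary above $70,000."""
    salary_num = float(xp_translate(salary, "$,", ""))
    date_num = float(xp_translate(start_date, "-", ""))
    return status == "Active" and date_num > 20220630 and salary_num > 70000


matches = [name for name, status, salary, start in ROWS
           if matches_challenge_1(status, salary, start)]
print(matches)  # ['Bob Johnson']
```

Only Bob Johnson qualifies: Alice Brown started after June 2022 but earns exactly $70,000, which is not greater than 70,000.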
Solution 2: Nested Structure Navigation
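(//div[contains(@class, 'level-3')][not(contains(@class, 'exclude'))]//span[@class='target'])[2]
Explanation:
Selects every 'level-3' div that doesn't have the 'exclude' class
Collects the spans with class 'target' inside those divs, in document order
The outer parentheses make [2] pick the second span of that whole set (Target Content 2, which sits in the third 'level-3' div)
Without the parentheses, //span[@class='target'][2] would ask for a span that is the second 'target' span of its own parent; every div here holds only one, so nothing would match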
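The expected answer can be cross-checked with a short Python sketch using only the standard library. This is an illustration under one reading of the challenge (the second 'target' span that is not inside an 'exclude' div), and `NESTED` is a trimmed, hand-copied version of the nested structure with the headings removed.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the "Nested Structure" markup (headings omitted).
NESTED = """
<body>
  <div class="level-1"><div class="level-2">
    <div class="level-3"><span class="target">Target Content 1</span></div>
    <div class="level-3 exclude"><span class="target">Excluded Target Content</span></div>
    <div class="level-3"><span class="target">Target Content 2</span></div>
  </div></div>
  <div class="level-1"><div class="level-2">
    <div class="level-3"><span class="target">Target Content 3</span></div>
    <div class="level-3"><span class="target">Target Content 4</span></div>
  </div></div>
</body>
"""

root = ET.fromstring(NESTED)

# All level-3 divs in document order, then the ones not marked 'exclude'.
level3 = [d for d in root.iter("div") if "level-3" in d.get("class", "").split()]
kept = [d for d in level3 if "exclude" not in d.get("class", "").split()]

# Flatten the target spans of the kept divs, then index the whole set.
targets = [s for d in kept for s in d.iter("span") if s.get("class") == "target"]
second = targets[1]

# Position (1-based) of the level-3 div that holds the selected span.
holder = next(d for d in level3 if any(s is second for s in d.iter("span")))
position = level3.index(holder) + 1

print(second.text)  # Target Content 2
print(position)     # 3
```

Indexing the flattened list mirrors what a parenthesized XPath expression followed by `[2]` does: the position applies to the whole result set rather than to each parent separately.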
Solution 3: Dynamic Content Filtering
//div[@class='event'][p[@class='special'] and
translate(substring(@data-date, 1, 10), '-', '') > '20230915' and
translate(substring(@data-date, 1, 4), '-', '') < '2024']
Explanation:
Selects 'event' divs that:
Have a paragraph with class 'special'
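Have a date after 2023-09-15 (translate() strips the hyphens, and XPath 1.0 converts both operands of > and < to numbers, so 20231005 is compared numerically with 20230915)
Have a year (the first four characters of data-date) before 2024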
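The same three conditions can be replayed in Python to confirm the expected result. This is a minimal sketch using the standard library; `EVENTS` is a trimmed, hand-copied version of the event list, and `as_number` is a hypothetical helper that mimics stripping the hyphens and converting to a number.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the "Dynamic Content" markup (date spans omitted).
EVENTS = """
<div id="event-list">
  <div class="event" data-date="2023-07-15"><h3>Conference A</h3><p class="special">Special Event</p></div>
  <div class="event" data-date="2023-08-22"><h3>Workshop B</h3></div>
  <div class="event" data-date="2023-10-05"><h3>Seminar C</h3><p class="special">Special Event</p></div>
  <div class="event" data-date="2023-11-18"><h3>Webinar D</h3></div>
  <div class="event" data-date="2024-01-20"><h3>Symposium E</h3><p class="special">Special Event</p></div>
</div>
"""

CURRENT_DATE = "2023-09-15"  # value of data-value on the current-date div


def as_number(iso_date):
    """'2023-10-05' -> 20231005, the same trick as translate(..., '-', '')."""
    return int(iso_date.replace("-", ""))


root = ET.fromstring(EVENTS)

upcoming_special = [
    event.findtext("h3")
    for event in root.findall("div[@class='event']")
    if event.find("p[@class='special']") is not None       # is a special event
    and as_number(event.get("data-date")) > as_number(CURRENT_DATE)  # after today
    and int(event.get("data-date")[:4]) < 2024              # before 2024
]

print(upcoming_special)  # ['Seminar C']
```

Because ISO dates are zero-padded and ordered year, month, day, removing the hyphens yields numbers that sort in the same order as the dates themselves.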
Solution 4: Attribute Manipulation
//div[@class='event'][number(substring(@data-date, 6, 2)) mod 2 = 1]
Explanation:
Selects 'event' divs where:
The month (characters 6-7 of the date; XPath string positions start at 1) is odd
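One detail worth checking by hand is the indexing: XPath's substring(@data-date, 6, 2) counts from 1, whereas most programming languages count from 0. The short Python sketch below, with `EVENT_DATES` copied by hand from the sample page and `month_of` a hypothetical helper, shows the equivalent slice and the events the expression should match.

```python
# Event name -> data-date, copied from the sample page.
EVENT_DATES = {
    "Conference A": "2023-07-15",
    "Workshop B": "2023-08-22",
    "Seminar C": "2023-10-05",
    "Webinar D": "2023-11-18",
    "Symposium E": "2024-01-20",
}


def month_of(iso_date):
    """XPath substring(d, 6, 2) is the 0-based Python slice d[5:7]."""
    return int(iso_date[5:7])


odd_month_events = [name for name, date in EVENT_DATES.items()
                    if month_of(date) % 2 == 1]

print(odd_month_events)  # ['Conference A', 'Webinar D', 'Symposium E']
```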
Solution 5: Complex Conditional Selection
//table[@id='data']//tr[
(td[1][text()='Active'] and number(translate(td[4], '$,', '')) > 75000) or
(td[1][text()='Inactive'] and number(translate(substring(td[5], 1, 4), '-', '')) < 2020)
]
Explanation:
Selects rows where either:
Status is 'Active' and salary is above $75,000, or
Status is 'Inactive' and start date is before 2020
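As a cross-check on the either/or logic, the sketch below parses a trimmed copy of the table with Python's standard library and applies the two branches directly. `TABLE` is hand-copied from the sample page (header row and class attributes omitted), and `selected` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample table body.
TABLE = """
<table id="data">
  <tbody>
    <tr><td>Active</td><td>John Doe</td><td>IT</td><td>$75,000</td><td>2021-03-15</td></tr>
    <tr><td>Inactive</td><td>Jane Smith</td><td>HR</td><td>$65,000</td><td>2019-07-22</td></tr>
    <tr><td>Active</td><td>Bob Johnson</td><td>Sales</td><td>$80,000</td><td>2022-11-30</td></tr>
    <tr><td>Active</td><td>Alice Brown</td><td>Marketing</td><td>$70,000</td><td>2023-01-10</td></tr>
    <tr><td>Inactive</td><td>Charlie Wilson</td><td>Finance</td><td>$90,000</td><td>2018-05-03</td></tr>
  </tbody>
</table>
"""


def selected(row):
    """Active with salary above 75,000, or Inactive with a start year before 2020."""
    status, _name, _dept, salary, start = [td.text for td in row.findall("td")]
    salary_num = int(salary.replace("$", "").replace(",", ""))
    start_year = int(start[:4])
    return (status == "Active" and salary_num > 75000) or (
        status == "Inactive" and start_year < 2020
    )


root = ET.fromstring(TABLE)
names = [row.findtext("td[2]") for row in root.iter("tr") if selected(row)]

print(names)  # ['Jane Smith', 'Bob Johnson', 'Charlie Wilson']
```

John Doe is left out because his salary is exactly $75,000, which is not above the threshold.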
8.4 Testing Your XPath Expressions
To test these XPath expressions, you can use browser developer tools or online XPath testers. Here's how to use Chrome DevTools:
Open the HTML file in Chrome
Right-click and select "Inspect"
In the Console tab, use
$x("your_xpath_here")
to test your XPath. The matching elements are returned in an array.
Example:
$x("//table[@id='data']//tr[td[1][text()='Active']]")
This will return all rows with 'Active' status.
8.5 Conclusion
These challenges demonstrate the power and flexibility of XPath in handling complex document structures and conditions. Practice with these examples to improve your XPath skills, and don't hesitate to experiment with variations of these expressions to deepen your understanding.
Remember, while complex XPath expressions can be powerful, they can also be hard to maintain. In real-world scenarios, strive for a balance between specificity and readability in your locators.