Heritrix is the Internet Archive's web archiving software: essentially a web crawler that takes a list of websites and saves them as ARC/WARC files, in order to build a web archive like the one at archive.org.
Like any other piece of software, it sometimes produces error messages whose cause is not obvious.
One of them is the following:
Caused by: java.nio.file.FileSystemException: /path/to/file: Stale file handle
Besides the exception itself, you might see the following symptoms:
- The REST API returns empty responses for certain jobs, instead of their status.
- The web UI shows a long chain of exceptions (with the Stale file handle FileSystemException as the root cause) when navigating to the job’s status page.
Cause:
One possible cause of this issue is that Heritrix has a file open on a remote filesystem, and the connection to that filesystem broke while Heritrix was running, for example due to a network outage.
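To check whether a path on the remote filesystem is affected, you can probe it from outside Heritrix. The following is a minimal sketch, assuming a Unix-like host where the operating system reports the condition as the `ESTALE` error code; the function name `is_stale_handle` is made up for this example. Note that a fresh probe may succeed once the mount has recovered, even though the handle Heritrix already holds may still be stale.

```python
import errno
import os


def is_stale_handle(path):
    """Return True if stat() on `path` fails with ESTALE (stale file handle).

    Any other outcome, including a missing file, returns False.
    `is_stale_handle` is a hypothetical helper, not part of Heritrix.
    """
    try:
        os.stat(path)
    except OSError as exc:
        # ESTALE is the errno behind the "Stale file handle" message.
        return exc.errno == errno.ESTALE
    return False
```

Calling `is_stale_handle("/path/to/file")` with the path from the exception message gives a quick yes/no answer for that file.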
Solution:
- Safely shut down Heritrix’s other jobs (pause, checkpoint)
- Restart Heritrix
After the restart, the jobs continue normally once you resume them, and the error is gone.
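The pause and checkpoint steps can also be scripted against the Heritrix REST API instead of clicking through the web UI. The sketch below assumes a Heritrix 3 instance whose engine is reachable at a base URL such as `https://localhost:8443/engine`, with HTTP digest authentication and a self-signed certificate; the helper names, credentials and job names are placeholders for this example. Pausing is not instantaneous, so a production script would probably poll the job status before requesting the checkpoint.

```python
import ssl
import urllib.parse
import urllib.request


def build_action_request(base_url, job_name, action):
    """Build a POST request that sends `action` to one job (hypothetical helper)."""
    url = f"{base_url.rstrip('/')}/job/{urllib.parse.quote(job_name)}"
    data = urllib.parse.urlencode({"action": action}).encode("ascii")
    return urllib.request.Request(
        url, data=data, headers={"Accept": "application/xml"}, method="POST"
    )


def make_opener(base_url, user, password):
    """Create an opener with digest auth that accepts a self-signed certificate."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, base_url, user, password)
    # Assumption: the instance uses a self-signed certificate, so
    # verification is disabled here. Do not do this on untrusted networks.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=context),
        urllib.request.HTTPDigestAuthHandler(password_mgr),
    )


def pause_and_checkpoint(base_url, user, password, job_names):
    """Send "pause" and then "checkpoint" to every job in `job_names`."""
    opener = make_opener(base_url, user, password)
    for job in job_names:
        for action in ("pause", "checkpoint"):
            request = build_action_request(base_url, job, action)
            with opener.open(request, timeout=30) as response:
                print(job, action, response.status)
```

A call such as `pause_and_checkpoint("https://localhost:8443/engine", "admin", "admin", ["job1", "job2"])` would then cover the healthy jobs before the restart; the job affected by the stale handle may not respond properly and can be left out of the list.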