We have a digital object repository called DSpace at work, and we use the SWORDv2 protocol to deposit digital objects into it. The DSpace GUI and its SWORDv2 endpoint run as servlets in a Tomcat container, and all of it sits behind Nginx acting as a reverse proxy.
The other day one of my co-workers wanted to deposit a larger digital object package (8 GB) into the repository. The deposit failed because the servlet kept throwing SocketTimeoutException while it was reading the data being deposited, so I had to investigate and solve the problem.
java.net.SocketTimeoutException: Read timed out
I read the Tomcat and DSpace logs, but they revealed nothing. I did notice that DSpace had some interrupted deposits in its upload directory. All of those files were 2 GB in size, which was suspicious, but at first I couldn’t find any configured limit that would explain why a deposit should die at exactly 2 gigs.
I am not an Nginx expert, but I enabled debug logging and started reading the logs. At first sight they didn’t reveal anything either: I saw no errors, only that Tomcat returned 500 during the deposit, at the moment the SocketTimeoutException was raised. However, a few lines caught my attention anyway.
2018/07/18 09:27:55 [debug] 4273#4273: *1 sendfile: @0 2147479552
2018/07/18 09:27:55 [debug] 4273#4273: *1 sendfile: 2147479552 of 2147479552 @0
That big integer was quite suspicious. After some simple math I figured out that 2147479552 divided by 1024 twice is just under 2048, which means it could be a byte count of about 2 GB. That got me thinking: Tomcat returned 500 with that exception after Nginx had sent this much data and then waited a while, so it seemed worth looking into. I started digging in Nginx’s source code and found a comment block with a constant below it:
/*
 * On Linux up to 2.4.21 sendfile() (syscall #187) works with 32-bit
 * offsets only, and the including <sys/sendfile.h> breaks the compiling,
 * if off_t is 64 bit wide. So we use own sendfile() definition, where offset
 * parameter is int32_t, and use sendfile() for the file parts below 2G only,
 * see src/os/unix/ngx_linux_config.h
 *
 * Linux 2.4.21 has the new sendfile64() syscall #239.
 *
 * On Linux up to 2.6.16 sendfile() does not allow to pass the count parameter
 * more than 2G-1 bytes even on 64-bit platforms: it returns EINVAL,
 * so we limit it to 2G-1 bytes.
 */
#define NGX_SENDFILE_MAXSIZE 2147483647L
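The arithmetic on the logged value can be checked with a few lines of Python. This is only a sanity check of the numbers quoted above: it shows that the value from the debug log is just under 2048 MiB and sits 4095 bytes below NGX_SENDFILE_MAXSIZE. Reading that gap as rounding down to a 4096-byte page is an assumption here, not something the log confirms.

```python
# Values copied from the debug log and the Nginx source quoted above.
LOGGED_BYTES = 2147479552
NGX_SENDFILE_MAXSIZE = 2147483647  # 2G-1

# Dividing by 1024 twice converts bytes to MiB.
mib = LOGGED_BYTES / 1024 / 1024

# How far the logged value is below the constant.
gap = NGX_SENDFILE_MAXSIZE - LOGGED_BYTES

# Assumption: 4096 is a typical page size; the log does not state it.
ASSUMED_PAGE_SIZE = 4096
is_two_gib_minus_one_page = LOGGED_BYTES == 2**31 - ASSUMED_PAGE_SIZE

if __name__ == "__main__":
    print(f"{mib:.3f} MiB")
    print(gap)
    print(is_two_gib_minus_one_page)
```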
After some further digging I realized that Nginx was using this sendfile() call to send the data because the sendfile directive was enabled in our setup, and that it can be turned off by setting
sendfile off;
in the http scope of the Nginx config file. As I suspected, this solved the problem, and we could deposit the packages without further trouble. As a short summary, here is what this is about and what happened:
sendfile() is an I/O call that transfers data between file descriptors inside the kernel, without first copying the data into a buffer in the application. That makes it faster than the traditional approach of reading from the source into memory and then writing to the destination. It is commonly enabled in Nginx configurations and is one of the things that make Nginx a fast web server. However, a single call can transfer at most about 2 GB.
So when my co-worker was depositing his package, Nginx accepted the deposit and started sending it to Tomcat. The trouble was that it did not send all of the data. When it had finished with the first 2 GB of the 8 GB file it simply stopped, while Tomcat was still waiting for the rest. After a short while Tomcat timed out and returned HTTP status 500 to Nginx. Turning off sendfile fixes this, because Nginx can then send all the data, but it also makes network I/O slower.
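To make the per-call limit more concrete, here is a minimal sketch of how a caller can cope with it by calling sendfile() in a loop until everything has been transferred. It is not what Nginx does internally, only an illustration using Python’s os.sendfile on Linux. The names send_all, demo and MAX_CHUNK are hypothetical, and the cap is set artificially low so that the loop runs several times on a small test file.

```python
import os
import tempfile

# Hypothetical per-call cap. A real limit would be around 2 GB;
# it is tiny here so the loop is exercised with a small file.
MAX_CHUNK = 64 * 1024


def send_all(out_fd, in_fd, total, max_chunk=MAX_CHUNK):
    """Copy `total` bytes from in_fd to out_fd, one capped sendfile() call at a time."""
    offset = 0
    calls = 0
    while offset < total:
        # Never ask for more than the cap in a single call.
        count = min(max_chunk, total - offset)
        sent = os.sendfile(out_fd, in_fd, offset, count)
        if sent == 0:
            break  # source ended earlier than expected
        offset += sent  # a call may transfer fewer bytes than requested
        calls += 1
    return offset, calls


def demo(size=200_000):
    """Copy a temporary file with send_all and report what happened."""
    payload = os.urandom(size)
    with tempfile.TemporaryFile() as src, tempfile.TemporaryFile() as dst:
        src.write(payload)
        src.flush()
        sent, calls = send_all(dst.fileno(), src.fileno(), size)
        dst.seek(0)
        return sent, calls, dst.read() == payload


if __name__ == "__main__":
    print(demo())
```

The point of the loop is that the caller tracks the offset itself and keeps going after each capped call. Stopping after the first call would leave the receiver waiting for the rest of the data, which matches what Tomcat experienced here.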