Use select() in Python Socket Programming at Your Own Risk

The Python Socket Programming HOWTO introduces non-blocking sockets together with select. This is a tale of how select can cause issues if you're not careful.

Start off by reading How to increase filedescriptor's range in python select(). Pay special attention to this part:

Strictly speaking, select() is limited in the highest file descriptor it can support, as opposed to the number of them in a given call - see the start of the Notes section of the select() manpage.

Let's get the relevant information from man pages on macOS and Linux.

$ man -S 2 select | tail -n 22 | head -n 10
BUGS
    Although the provision of getdtablesize(2) was intended to allow user
    programs to be written independent of the kernel limit on the number of
    open files, the dimension of a sufficiently large bit field for select
    remains a problem.  The default size FD_SETSIZE (currently 1024) is some-
    what smaller than the current kernel limit to the number of open files.
    However, in order to accommodate programs which might potentially use a
    larger number of open files with select, it is possible to increase this
    size within a program by providing a larger definition of FD_SETSIZE
    before the inclusion of <sys/types.h>.

$ curl -s http://man7.org/linux/man-pages/man2/select.2.html#BUGS | grep -A 15 'href="#BUGS"></a>BUGS'
<h2><a id="BUGS" href="#BUGS"></a>BUGS  &nbsp; &nbsp; &nbsp; &nbsp; <a href="#top_of_page"><span class="top-link">top</span></a></h2><pre>
    POSIX allows an implementation to define an upper limit, advertised
    via the constant <b>FD_SETSIZE</b>, on the range of file descriptors that
    can be specified in a file descriptor set.  The Linux kernel imposes
    no fixed limit, but the glibc implementation makes <i>fd_set</i> a fixed-
    size type, with <b>FD_SETSIZE </b>defined as 1024, and the <b>FD_*</b>() macros
    operating according to that limit.  To monitor file descriptors
    greater than 1023, use <a href="../man2/poll.2.html">poll(2)</a> instead.

    According to POSIX, <b>select</b>() should check all specified file
    descriptors in the three file descriptor sets, up to the limit
    <i>nfds-1</i>.  However, the current implementation ignores any file
    descriptor in these sets that is greater than the maximum file
    descriptor number that the process currently has open.  According to
    POSIX, any such file descriptor that is specified in one of the sets
    should result in the error <b>EBADF</b>.

In simpler words, select can only handle file descriptors whose number is strictly less than the value of FD_SETSIZE at the time Python was built. By default that value is 1024, so the usable range is 0 through 1023. The limit applies to the descriptor number, not to how many descriptors you pass: even a single socket passed to select raises an exception -- ValueError: filedescriptor out of range in select() -- if its fileno is greater than or equal to FD_SETSIZE.
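To make the point that the limit is on the descriptor number rather than on how many descriptors are passed, here is a minimal sketch. It relies on the documented fact that select.select accepts plain integers as file descriptors; the helper name select_error_for is made up for illustration, and the value 1024 assumes a Python build with the default FD_SETSIZE.

```python
import select


def select_error_for(fd):
    """Return the ValueError message select() raises for fd, or None.

    Hypothetical helper for illustration only.
    """
    try:
        # A zero timeout makes this a non-blocking poll of a single descriptor.
        select.select([], [fd], [], 0)
    except ValueError as exc:
        # Raised by Python itself when fd is outside the fd_set range.
        return str(exc)
    except OSError:
        # fd is within range but is not an open descriptor.
        return None
    return None


# Assumes FD_SETSIZE is 1024 on this build; no socket needs to be open.
print(select_error_for(1024))
```

On a build with the default limit this should print the same message as the ValueError quoted above, even though only one descriptor was passed.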

This is a long-standing issue and there are multiple workarounds:

  • Rebuild Python with a larger value of FD_SETSIZE, defined before the sys/types.h C header is included (as the macOS man page above describes)
  • Only pass sockets to select whose fileno is less than 1024
  • Do not use select with non-blocking sockets
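For the last workaround, the Linux man page quoted above points to poll(2) as the replacement. A minimal sketch of that idea in Python, using the standard selectors module, might look like the following. The function name wait_writable is hypothetical. Note that DefaultSelector falls back to select on platforms that offer nothing better, in which case the same limit would still apply.

```python
import selectors
import socket


def wait_writable(sock, timeout=None):
    """Return True if sock becomes writable within timeout seconds.

    Hypothetical helper: DefaultSelector picks epoll, kqueue, or poll
    where available, none of which use a fixed-size fd_set.
    """
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_WRITE)
        # select() returns a list of (key, events) pairs; empty on timeout.
        return bool(sel.select(timeout))


if __name__ == "__main__":
    # A connected socket pair stands in for a real client connection.
    a, b = socket.socketpair()
    try:
        print(wait_writable(a, timeout=1))
    finally:
        a.close()
        b.close()
```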

Repro

Let's look at some examples that reproduce this issue.

Here's an example that works.

$ python3.6
Python 3.6.1 (default, May  1 2017, 22:40:40)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.24.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> import select
>>> s = socket.socket(family=socket.AF_INET, type=socket.SOCK_STREAM)
>>> s.fileno()
3
>>> s.connect(('127.0.0.1', 9090))
>>> r, w, err = select.select([], [s], [])
>>> r
[]
>>> w
[<socket.socket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 52739), raddr=('127.0.0.1', 9090)>]
>>> err
[]
>>> s.shutdown(socket.SHUT_RDWR)
>>> s.close()
>>>

The following is functionally the same as above: it only makes the default blocking mode explicit.

$ python3.6
Python 3.6.1 (default, May  1 2017, 22:40:40)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.24.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> import select
>>> s = socket.socket(family=socket.AF_INET, type=socket.SOCK_STREAM)
>>> s.fileno()
3
>>> s.timeout
>>> s.setblocking(1)
>>> s.settimeout(None)
>>> s.connect(('127.0.0.1', 9090))
>>> r, w, err = select.select([], [s], [])
>>> r
[]
>>> w
[<socket.socket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 52748), raddr=('127.0.0.1', 9090)>]
>>> err
[]
>>> s.shutdown(socket.SHUT_RDWR)
>>> s.close()

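Now let's call setblocking(0) followed by settimeout(5) and demonstrate that select still works. Note that the later settimeout(5) call leaves the socket in timeout mode rather than strictly non-blocking mode.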

$ python3.6
Python 3.6.1 (default, May  1 2017, 22:40:40)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.24.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> import select
>>> s = socket.socket(family=socket.AF_INET, type=socket.SOCK_STREAM)
>>> s.fileno()
3
>>> s.timeout
>>> s.setblocking(0)
>>> s.settimeout(5)
>>> s.timeout
5.0
>>> s.connect(('127.0.0.1', 9090))
>>> r, w, err = select.select([], [s], [])
>>> r
[]
>>> w
[<socket.socket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 52774), raddr=('127.0.0.1', 9090)>]
>>> err
[]
>>> s.shutdown(socket.SHUT_RDWR)
>>> s.close()
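The session above calls setblocking(0) and then settimeout(5). As a small sketch of how those two calls interact, the snippet below records the socket's timeout after each step; the function name observe_timeout_modes is made up for illustration, and it assumes socket.setdefaulttimeout has not been called.

```python
import socket


def observe_timeout_modes():
    """Return the socket timeout after creation, setblocking, settimeout."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        modes = [s.gettimeout()]      # blocking mode: timeout is None
        s.setblocking(False)
        modes.append(s.gettimeout())  # non-blocking mode: timeout is 0.0
        s.settimeout(5)
        modes.append(s.gettimeout())  # timeout mode: the last call wins
        return modes
    finally:
        s.close()


print(observe_timeout_modes())  # [None, 0.0, 5.0]
```

A socket is only strictly non-blocking while its timeout is 0.0; a positive timeout makes connect wait up to that long instead of returning immediately.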

Let's reproduce the exception ValueError: filedescriptor out of range in select().

$ python3.6
Python 3.6.1 (default, May  1 2017, 22:40:40)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.24.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> import select
>>> sockets = []
>>> for i in range(1024):
...     s = socket.socket(family=socket.AF_INET, type=socket.SOCK_STREAM)
...     s.setblocking(False)
...     s.settimeout(5)
...     s.connect(('127.0.0.1', 9090))
...     sockets.append(s)
...     r, w, err = select.select([], sockets, [])
...
Traceback (most recent call last):
File "<stdin>", line 6, in <module>
BlockingIOError: [Errno 36] Operation now in progress
>>> s = socket.socket(family=socket.AF_INET, type=socket.SOCK_STREAM)
>>> s.fileno()
1025
>>> s.timeout
>>> s.setblocking(False)
>>> s.settimeout(5)
>>> s.timeout
5.0
>>> s.connect(('127.0.0.1', 9090))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
BlockingIOError: [Errno 36] Operation now in progress
>>> r, w, err = select.select([], [s], [], 5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: filedescriptor out of range in select()
>>> s.shutdown(socket.SHUT_RDWR)
>>> s.close()
>>> for s in sockets:
...     try:
...         s.shutdown(socket.SHUT_RDWR)
...     except OSError:
...         pass
...     try:
...         s.close()
...     except OSError:
...         pass
...
>>>
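Given the failure reproduced above, one defensive option is to check descriptor numbers before calling select. The sketch below is only an illustration: Python does not expose FD_SETSIZE, so ASSUMED_FD_SETSIZE is an assumption matching the default of 1024, and split_by_selectable is a hypothetical helper name.

```python
ASSUMED_FD_SETSIZE = 1024  # assumption: the default build-time limit


def split_by_selectable(socks, limit=ASSUMED_FD_SETSIZE):
    """Split socks into (usable with select, descriptor out of range)."""
    in_range, out_of_range = [], []
    for s in socks:
        # Closed sockets report -1, which select() also rejects.
        if 0 <= s.fileno() < limit:
            in_range.append(s)
        else:
            out_of_range.append(s)
    return in_range, out_of_range
```

Only the first list is safe to hand to select.select; sockets in the second list need a different mechanism.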

Workaround

With a lot of open sockets in a process, opening a new connection raises the exception BlockingIOError: [Errno 36] Operation now in progress. A solution that worked for me was to sleep for a few seconds and call connect again. The second attempt raises a different exception. If that exception is OSError: [Errno 56] Socket is already connected, then the socket is ready and you can use it as intended. The errno values shown here are from macOS; Linux uses different numbers for the same errors.

$ python3.6
Python 3.6.1 (default, May  1 2017, 22:40:40)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.24.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> import select
>>> sockets = []
>>> for i in range(1024):
...     s = socket.socket(family=socket.AF_INET, type=socket.SOCK_STREAM)
...     s.setblocking(False)
...     s.settimeout(5)
...     s.connect(('127.0.0.1', 9090))
...     sockets.append(s)
...     r, w, err = select.select([], sockets, [])
...
Traceback (most recent call last):
File "<stdin>", line 5, in <module>
BlockingIOError: [Errno 36] Operation now in progress
>>> import time
>>> time.sleep(5)
>>> s = socket.socket(family=socket.AF_INET, type=socket.SOCK_STREAM)
>>> s.fileno()
1025
>>> s.connect(('127.0.0.1', 9090))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 56] Socket is already connected
>>> try:
...     s.connect(('127.0.0.1', 9090))
... except OSError as e:
...     if "[Errno 56] Socket is already connected" in str(e):
...         print("All is well")
...     else:
...         print("Something went horribly wrong")
...         raise e
...
All is well
>>> s.shutdown(socket.SHUT_RDWR)
>>> s.close()
>>> for s in sockets:
...     try:
...         s.shutdown(socket.SHUT_RDWR)
...     except OSError:
...         pass
...     try:
...         s.close()
...     except OSError:
...         pass
...
>>>
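The sleep-and-retry idea from the session above can be sketched as a small helper. This version compares errno codes instead of matching the message text, since both the numbers and the wording differ between platforms. The name connect_with_retry and its attempts and delay parameters are hypothetical. It also treats a return code of 0 as success, because on some platforms the second connect call simply succeeds rather than reporting that the socket is already connected.

```python
import errno
import os
import socket
import time

# Codes that mean "connection attempt still in progress".
_PENDING = {errno.EINPROGRESS, errno.EALREADY, errno.EWOULDBLOCK}


def connect_with_retry(sock, address, attempts=5, delay=0.1):
    """Connect a non-blocking socket by retrying; True once connected.

    Hypothetical helper: attempts and delay are illustrative defaults.
    """
    for _ in range(attempts):
        # connect_ex returns an errno code instead of raising.
        code = sock.connect_ex(address)
        if code in (0, errno.EISCONN):
            return True
        if code not in _PENDING:
            # A real failure such as ECONNREFUSED.
            raise OSError(code, os.strerror(code))
        time.sleep(delay)
    return False
```

A caller would set the socket non-blocking, call connect_with_retry(s, ('127.0.0.1', 9090)), and proceed only if it returns True.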

Conclusion

Calling socket.socket.setblocking(False) or socket.socket.settimeout(0) makes the socket non-blocking. Passing such a socket to select when its fileno is greater than or equal to FD_SETSIZE raises the exception ValueError: filedescriptor out of range in select().

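Avoid this case by not using select on non-blocking sockets when a single process could create a lot of sockets. As the Linux man page quoted above suggests, poll(2) is the alternative for file descriptors greater than 1023.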