fix: Handle symbolic linking race condition#3796
Conversation
There was a problem hiding this comment.
Code Review
This pull request replaces standard ln -s calls with ln -sfn in the stage 1 bootstrap template to force symlink creation. However, using ln -sfn is not atomic and can introduce race conditions in concurrent environments due to the underlying unlink-then-symlink sequence, potentially causing transient 'No such file or directory' errors. Additionally, the -n flag is non-standard POSIX. It is recommended to use an atomic fallback pattern instead, such as ln -s ... 2>/dev/null || [[ -L ... ]].
| # The bin/ directory may not exist if it is empty. | ||
| mkdir -p "$(dirname $python_exe)" | ||
| ln -s "$symlink_to" "$python_exe" | ||
| ln -sfn "$symlink_to" "$python_exe" |
There was a problem hiding this comment.
Using ln -sfn is not atomic and can introduce a different race condition.\n\nUnder the hood, ln -sf performs an unlink() followed by a symlink(). If multiple processes run this bootstrap script concurrently, one process might delete the symlink created by another process, leading to a transient window where the symlink does not exist. Any concurrent process attempting to execute or read the file during this window will fail with a No such file or directory error. Additionally, the -n flag is not standard POSIX and may behave differently or be unsupported on some platforms.\n\nA safer, fully atomic approach that avoids deleting existing symlinks is to attempt the creation with ln -s and fallback to verifying the symlink's existence if it fails due to already existing:\n\nbash\nln -s \"$symlink_to\" \"$python_exe\" 2>/dev/null || [[ -L \"$python_exe\" ]]\n\n\nSince symlink() is an atomic system call, this guarantees that the symlink is either created successfully or already exists, without ever unlinking it.
| ln -sfn "$symlink_to" "$python_exe" | |
| ln -s "$symlink_to" "$python_exe" 2>/dev/null || [[ -L "$python_exe" ]] |
There was a problem hiding this comment.
gotta love seeing \n\n in a comment
|
|
||
| mkdir -p "$venv/bin" | ||
| ln -s "$python_exe_actual" "$python_exe" | ||
| ln -sfn "$python_exe_actual" "$python_exe" |
There was a problem hiding this comment.
As mentioned above, ln -sfn is not atomic and can cause transient No such file or directory errors for concurrent processes due to the unlink and symlink sequence.\n\nUsing ln -s ... 2>/dev/null || [[ -L ... ]] provides an atomic alternative that avoids deleting the symlink if it already exists.
| ln -sfn "$python_exe_actual" "$python_exe" | |
| ln -s "$python_exe_actual" "$python_exe" 2>/dev/null || [[ -L "$python_exe" ]] |
| if [[ ! -e "$venv_site_packages" ]]; then | ||
| mkdir -p $(dirname $venv_site_packages) | ||
| ln -s "$runfiles_venv_site_packages" "$venv_site_packages" | ||
| ln -sfn "$runfiles_venv_site_packages" "$venv_site_packages" |
There was a problem hiding this comment.
Using ln -sfn here can also cause race conditions under concurrent execution. We should use the atomic ln -s ... || [[ -L ... ]] pattern instead.
| ln -sfn "$runfiles_venv_site_packages" "$venv_site_packages" | |
| ln -s "$runfiles_venv_site_packages" "$venv_site_packages" 2>/dev/null || [[ -L "$venv_site_packages" ]] |
|
|
||
| if [[ ! -e "$venv/pyvenv.cfg" ]]; then | ||
| ln -s "$runfiles_venv/pyvenv.cfg" "$venv/pyvenv.cfg" | ||
| ln -sfn "$runfiles_venv/pyvenv.cfg" "$venv/pyvenv.cfg" |
There was a problem hiding this comment.
Using ln -sfn here can also cause race conditions under concurrent execution. We should use the atomic ln -s ... || [[ -L ... ]] pattern instead.
| ln -sfn "$runfiles_venv/pyvenv.cfg" "$venv/pyvenv.cfg" | |
| ln -s "$runfiles_venv/pyvenv.cfg" "$venv/pyvenv.cfg" 2>/dev/null || [[ -L "$venv/pyvenv.cfg" ]] |
|
lmk if you'd rather I look into whatever the heck the bot is yelling about, but we've had zero problems with race conditions since patching this in |
I ran into a problem when I packed a py_binary into an image and the users were launching it multiple times simultaneously. This resulted in a race condition for the
lncalls during the venv bootstrapping.My solution was just to add
-fnflags to ln.I'm still on 1.9 so I am targeting that release branch. Let me know if I am missing something obvious or if I should change anything else.