From 58320848d2bc320955d7a7c683f2b680c884ee3e Mon Sep 17 00:00:00 2001 From: Laine Stump Date: Mon, 29 Oct 2012 18:05:41 -0400 Subject: [PATCH] util: do a better job of matching up pids with their binaries This patch resolves: https://bugzilla.redhat.com/show_bug.cgi?id=871201 If libvirt is restarted after updating the dnsmasq or radvd packages, a subsequent "virsh net-destroy" will fail to kill the dnsmasq/radvd process. The problem is that when libvirtd restarts, it re-reads the dnsmasq and radvd pidfiles, then does a sanity check on each pid it finds, including checking that the symbolic link in /proc/$pid/exe actually points to the same file as the path used by libvirt to execute the binary in the first place. If this fails, libvirt assumes that the process is no longer alive. But if the original binary has been replaced, the link in /proc is set to "$binarypath (deleted)" (it literally has the string " (deleted)" appended to the link text stored in the filesystem), so even if a new binary exists in the same location, attempts to resolve the link will fail. In the end, not only is the old dnsmasq/radvd not terminated when the network is stopped, but a new dnsmasq can't be started when the network is later restarted (because the original process is still listening on the ports that the new process wants). The solution is, when the initial "use stat to check for identical inodes" check for identity between /proc/$pid/exe and $binpath fails, to check /proc/$pid/exe for a link ending with " (deleted)" and if so, truncate that part of the link and compare what's left with the original binarypath. A twist to this problem is that on systems with "merged" /sbin and /usr/sbin (i.e. /sbin is really just a symlink to /usr/sbin; Fedora 17+ is an example of this), libvirt may have started the process using one path, but /proc/$pid/exe lists a different path (indeed, on F17 this is the case - libvirtd uses /sbin/dnsmasq, but /proc/$pid/exe shows "/usr/sbin/dnsmasq"). The further bit of code to resolve this is to call virFileResolveAllLinks() on both the original binarypath and on the truncated link we read from /proc/$pid/exe, and compare the results. The resulting code still succeeds in all the same cases it did before, but also succeeds if the binary was deleted or replaced after it was started. (cherry picked from commit 7bafe009d93f8b26330d52dc3289643699cf74f0) --- cfg.mk | 2 +- src/util/virpidfile.c | 96 +++++++++++++++++++++++++++++++++---------- 2 files changed, 76 insertions(+), 22 deletions(-) diff --git a/cfg.mk b/cfg.mk index d3c96bafc4..50e6a50c69 100644 --- a/cfg.mk +++ b/cfg.mk @@ -386,7 +386,7 @@ sc_prohibit_sprintf: $(_sc_search_regexp) sc_prohibit_readlink: - @prohibit='readlink *\(' \ + @prohibit='\ deletedTextLen) + procLink[procLinkLen - deletedTextLen] = 0; + + if ((ret = virFileResolveAllLinks(binPath, &resolvedBinPath)) < 0) + goto cleanup; + if ((ret = virFileResolveAllLinks(procLink, &resolvedProcLink)) < 0) + goto cleanup; + + ret = STREQ(resolvedBinPath, resolvedProcLink) ? 0 : -1; + +cleanup: + VIR_FREE(procPath); + VIR_FREE(procLink); + VIR_FREE(resolvedProcLink); + VIR_FREE(resolvedBinPath); + + /* return the originally set pid of -1 unless we proclaim success */ + if (ret == 0) + *pid = retPid; + return ret; }