logging: lockdown the systemd service configuration

The 'systemd-analyze security' command looks at the unit file
configuration and reports on any settings which increase the
attack surface for the daemon. Since most systemd units are
fairly minimalist, this is generally informing us about settings
that we never put any thought into using before.

In its current configuration it reports

  # systemd-analyze security virtlogd.service
  ...snip...
  → Overall exposure level for virtlogd.service: 9.6 UNSAFE 😨

which is pretty terrible as a score.

If we apply all of the recommendations that appear possible
without (knowingly) breaking functionality it reports:

  # systemd-analyze security virtlogd.service
  ...snip...
  → Overall exposure level for virtlogd.service: 2.2 OK 🙂

which is a pretty decent improvement.

Some of the settings we would like to enable require a systemd
version that is newer than that available in our oldest distro
target - RHEL-8 at v239.

NB, RestrictSUIDSGID is technically newer than 239, but RHEL-8
backported it, and other distros we target have it by default.

Remaining recommendations are

✗ CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER)

  We block FOWNER/IPC_OWNER, but can't block the two DAC
  capabilities. Historically apps/users might point QEMU
  to log files in $HOME, pre-created with their own user
  ID.

✗ IPAddressDeny=

  Not required since RestrictAddressFamilies blocks IP
  usage. Ignoring this avoids the overhead of creating
  a traffic filter than will never be used.

✗ NoNewPrivileges=

  Highly desirable, but cannot enable it yet, because it
  will block the ability to transition to the virtlogd_t
  SELinux domain during execve. The SELinux policy needs
  fixing to permit this transition under NNP first.

✗ PrivateTmp=

  There is a decent chance people have VMs configured
  with a serial port logfile pointing at /tmp. We would
  cause a regression to use private /tmp for logging

✗ PrivateUsers=

  This would put virtlogd inside a user namespace where
  its root is in fact unprivileged. Same problem as the
  User= setting below

✗ ProcSubset=

  Libraries we link to might read certain non-PID related
  files from /proc

✗ ProtectClock=

  Requires v245

✗ ProtectHome=

  Same problem as PrivateTmp=. There's a decent chance
  that someone has a VM configured to write a logfile
  to /home

✗ ProtectHostname=

  Requires v241

✗ ProtectKernelLogs

  Requires v244

✗ ProtectProc

  Requires v247

✗ ProtectSystem=

  We only set it to 'full', as 'strict' is not viable for
  our required usage

✗ RootDirectory=/RootImage=

  We are not capable of running inside a custom chroot
  given needs to write log files to arbitrary places

✗ RestrictAddressFamilies=~AF_UNIX

  We need AF_UNIX to communicate with other libvirt daemons

✗ SystemCallFilter=~@resources

  We link to libvirt.so which links to libnuma.so which has
  a constructor that calls set_mempolicy. This is highly
  undesirable todo during a constructor.

✗ User=/DynamicUser=

  This is highly desirable, but we currently read/write
  logs as root, and directories we're told to write into
  could be anywhere. So using a non-root user would have
  a major risk of regressions for applications and also
  have upgrade implications

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This commit is contained in:
Daniel P. Berrangé 2022-03-04 11:59:23 +00:00
parent bfcf4be172
commit e7facdca25

View File

@ -14,6 +14,100 @@ EnvironmentFile=-@initconfdir@/virtlogd
ExecStart=@sbindir@/virtlogd $VIRTLOGD_ARGS
ExecReload=/bin/kill -USR1 $MAINPID
CapabilityBoundingSet=~CAP_AUDIT_CONTROL
CapabilityBoundingSet=~CAP_AUDIT_READ
CapabilityBoundingSet=~CAP_AUDIT_WRITE
CapabilityBoundingSet=~CAP_BLOCK_SUSPEND
CapabilityBoundingSet=~CAP_CHOWN
# Mgmt app/user might have pre-created log files that we're
# told to open and write to, or be storing them in otherwise
# inaccessible locations like $HOME. So we need to ignore
# DAC permission checks.
#CapabilityBoundingSet=~CAP_DAC_OVERRIDE
#CapabilityBoundingSet=~CAP_DAC_READ_SEARCH
CapabilityBoundingSet=~CAP_FOWNER
CapabilityBoundingSet=~CAP_FSETID
CapabilityBoundingSet=~CAP_IPC_LOCK
CapabilityBoundingSet=~CAP_IPC_OWNER
CapabilityBoundingSet=~CAP_KILL
CapabilityBoundingSet=~CAP_LEASE
CapabilityBoundingSet=~CAP_LINUX_IMMUTABLE
CapabilityBoundingSet=~CAP_MAC_ADMIN
CapabilityBoundingSet=~CAP_MAC_OVERRIDE
CapabilityBoundingSet=~CAP_MKNOD
CapabilityBoundingSet=~CAP_NET_ADMIN
CapabilityBoundingSet=~CAP_NET_BIND_SERVICE
CapabilityBoundingSet=~CAP_NET_BROADCAST
CapabilityBoundingSet=~CAP_NET_RAW
CapabilityBoundingSet=~CAP_SETFCAP
CapabilityBoundingSet=~CAP_SETPCAP
CapabilityBoundingSet=~CAP_SETGID
CapabilityBoundingSet=~CAP_SETUID
CapabilityBoundingSet=~CAP_SYSLOG
CapabilityBoundingSet=~CAP_SYS_ADMIN
CapabilityBoundingSet=~CAP_SYS_BOOT
CapabilityBoundingSet=~CAP_SYS_CHROOT
CapabilityBoundingSet=~CAP_SYS_MODULE
CapabilityBoundingSet=~CAP_SYS_NICE
CapabilityBoundingSet=~CAP_SYS_PACCT
CapabilityBoundingSet=~CAP_SYS_PTRACE
CapabilityBoundingSet=~CAP_SYS_RAWIO
CapabilityBoundingSet=~CAP_SYS_RESOURCE
CapabilityBoundingSet=~CAP_SYS_TIME
CapabilityBoundingSet=~CAP_SYS_TTY_CONFIG
CapabilityBoundingSet=~CAP_WAKE_ALARM
LockPersonality=true
MemoryDenyWriteExecute=true
# Cannot enable this as it prevents transitioning to
# the confined SELinux virtlogd_t domain on execve
# unless we modify the policy to allow this.
#NoNewPrivileges=true
PrivateDevices=true
PrivateMounts=true
PrivateNetwork=true
# XXX someone could configure QEMU to log a serial port to an
# arbitrary directory, including /tmp, even if this is ill-advised
#PrivateTmp=true
# Not until oldest build target has systemd >= v245
#ProtectClock=true
ProtectControlGroups=true
# Not until oldest build target has systemd >= v241
#ProtectHostname=true
# Not until oldest build target has systemd >= v244
#ProtectKernelLogs=true
ProtectKernelModules=true
ProtectKernelTunables=true
# Not until oldest build target has systemd >= v247
#ProtectProc=invisible
ProtectSystem=full
RestrictAddressFamilies=AF_UNIX
RestrictNamespaces=~cgroup
RestrictNamespaces=~ipc
RestrictNamespaces=~mnt
RestrictNamespaces=~net
RestrictNamespaces=~pid
RestrictNamespaces=~user
RestrictNamespaces=~uts
RestrictRealtime=true
RestrictSUIDSGID=true
SystemCallArchitectures=native
SystemCallFilter=~@clock
SystemCallFilter=~@debug
SystemCallFilter=~@module
SystemCallFilter=~@mount
SystemCallFilter=~@raw-io
SystemCallFilter=~@reboot
SystemCallFilter=~@swap
SystemCallFilter=~@privileged
# Unfortunately we link to libnuma via libvirt.so which
# has a constructor that runs unconditionally that invokes
# set_mempolicy()
#SystemCallFilter=~@resources
SystemCallFilter=~@cpu-emulation
SystemCallFilter=~@obsolete
UMask=077
[Install]
WantedBy=multi-user.target
Also=virtlogd.socket