210 Commits

Author SHA1 Message Date
Sergio Lopez
5c06b7f862 vhost_user_block: Implement optional static polling
Actively polling the virtqueue significantly reduces the latency of
each I/O operation, at the expense of using more CPU time. This
features is specially useful when using low-latency devices (SSD,
NVMe) as the backend.

This change implements static polling. When a request arrives after
being idle, vhost_user_block will keep checking the virtqueue for new
requests, until POLL_QUEUE_US (50us) has passed without finding one.

POLL_QUEUE_US is defined to be 50us, based on the current latency of
enterprise SSDs (< 30us) and the overhead of the emulation.

This feature is enabled by default, and can be disabled by using the
"poll_queue" parameter of "block-backend".

This is a test using null_blk as a backend for the image, with the
following parameters:

 - null_blk gb=20 nr_devices=1 irqmode=2 completion_nsec=0 no_sched=1

With "poll_queue=false":

fio --ioengine=sync --bs=4k --rw randread --name randread --direct=1
--filename=/dev/vdb --time_based --runtime=10

randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=sync, iodepth=1
fio-3.14
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=169MiB/s][r=43.2k IOPS][eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=433: Tue Feb 18 11:12:59 2020
  read: IOPS=43.2k, BW=169MiB/s (177MB/s)(1688MiB/10001msec)
    clat (usec): min=17, max=836, avg=21.64, stdev= 3.81
     lat (usec): min=17, max=836, avg=21.77, stdev= 3.81
    clat percentiles (nsec):
     |  1.00th=[19328],  5.00th=[19840], 10.00th=[20352], 20.00th=[21120],
     | 30.00th=[21376], 40.00th=[21376], 50.00th=[21376], 60.00th=[21632],
     | 70.00th=[21632], 80.00th=[21888], 90.00th=[22144], 95.00th=[22912],
     | 99.00th=[28544], 99.50th=[30336], 99.90th=[39168], 99.95th=[42752],
     | 99.99th=[71168]
   bw (  KiB/s): min=168440, max=188496, per=100.00%, avg=172912.00, stdev=3975.63, samples=19
   iops        : min=42110, max=47124, avg=43228.00, stdev=993.91, samples=19
  lat (usec)   : 20=5.90%, 50=94.08%, 100=0.02%, 250=0.01%, 500=0.01%
  lat (usec)   : 750=0.01%, 1000=0.01%
  cpu          : usr=10.35%, sys=25.82%, ctx=432417, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=432220,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=169MiB/s (177MB/s), 169MiB/s-169MiB/s (177MB/s-177MB/s), io=1688MiB (1770MB), run=10001-10001msec

Disk stats (read/write):
  vdb: ios=427867/0, merge=0/0, ticks=7346/0, in_queue=0, util=99.04%

With "poll_queue=true" (default):

fio --ioengine=sync --bs=4k --rw randread --name randread --direct=1
--filename=/dev/vdb --time_based --runtime=10

randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=sync, iodepth=1
fio-3.14
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=260MiB/s][r=66.7k IOPS][eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=422: Tue Feb 18 11:14:47 2020
  read: IOPS=68.5k, BW=267MiB/s (280MB/s)(2674MiB/10001msec)
    clat (usec): min=10, max=966, avg=13.60, stdev= 3.49
     lat (usec): min=10, max=966, avg=13.70, stdev= 3.50
    clat percentiles (nsec):
     |  1.00th=[11200],  5.00th=[11968], 10.00th=[11968], 20.00th=[12224],
     | 30.00th=[12992], 40.00th=[13504], 50.00th=[13760], 60.00th=[13888],
     | 70.00th=[14016], 80.00th=[14144], 90.00th=[14272], 95.00th=[14656],
     | 99.00th=[20352], 99.50th=[23936], 99.90th=[35072], 99.95th=[36096],
     | 99.99th=[47872]
   bw (  KiB/s): min=265456, max=296456, per=100.00%, avg=274229.05, stdev=13048.14, samples=19
   iops        : min=66364, max=74114, avg=68557.26, stdev=3262.03, samples=19
  lat (usec)   : 20=98.84%, 50=1.15%, 100=0.01%, 250=0.01%, 500=0.01%
  lat (usec)   : 750=0.01%, 1000=0.01%
  cpu          : usr=8.24%, sys=21.15%, ctx=684669, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=684611,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=267MiB/s (280MB/s), 267MiB/s-267MiB/s (280MB/s-280MB/s), io=2674MiB (2804MB), run=10001-10001msec

Disk stats (read/write):
  vdb: ios=677855/0, merge=0/0, ticks=7026/0, in_queue=0, util=99.04%

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-19 17:13:47 +00:00
Sergio Lopez
0e4e27ea9d vhost_user_block: Make use of the EVENT_IDX feature
Now that vhost_user_backend and vm-virtio do support EVENT_IDX, use it
in vhost_user_block to reduce the number of notifications sent between
the driver and the device.

This is specially useful when using active polling on the virtqueue,
as it'll be implemented by a future patch.

This is a snapshot of kvm_stat while generating ~60K IOPS with fio on
the guest without EVENT_IDX:

 Event                                         Total %Total CurAvg/s
 kvm_entry                                    393454   20.3    62494
 kvm_exit                                     393446   20.3    62494
 kvm_apic_accept_irq                          378146   19.5    60268
 kvm_msi_set_irq                              369720   19.0    58881
 kvm_fast_mmio                                370497   19.1    58817
 kvm_hv_timer_state                            10197    0.5     1715
 kvm_msr                                        8770    0.5     1443
 kvm_wait_lapic_expire                          7018    0.4     1118
 kvm_apic                                       2768    0.1      538
 kvm_pv_tlb_flush                               2028    0.1      360
 kvm_vcpu_wakeup                                1453    0.1      278
 kvm_apic_ipi                                   1384    0.1      269
 kvm_fpu                                        1148    0.1      164
 kvm_pio                                         574    0.0	  82
 kvm_userspace_exit                              574    0.0	  82
 kvm_halt_poll_ns                                 24    0.0	   3

And this is the snapshot while doing the same thing with EVENT_IDX:

 Event                                         Total %Total CurAvg/s
 kvm_entry                                     35506   26.0     3873
 kvm_exit                                      35499   26.0     3873
 kvm_hv_timer_state                            14740   10.8     1672
 kvm_apic_accept_irq                           13017    9.5     1438
 kvm_msr                                       12845    9.4     1421
 kvm_wait_lapic_expire                         10422    7.6     1118
 kvm_apic                                       3788    2.8      502
 kvm_pv_tlb_flush                               2708    2.0      340
 kvm_vcpu_wakeup                                1992    1.5      258
 kvm_apic_ipi                                   1894    1.4      251
 kvm_fpu                                        1476    1.1      164
 kvm_pio                                         738    0.5       82
 kvm_userspace_exit                              738    0.5	  82
 kvm_msi_set_irq                                 701    0.5	  69
 kvm_fast_mmio                                   238    0.2        4
 kvm_halt_poll_ns                                 50    0.0        1
 kvm_ple_window_update                            28    0.0        0
 kvm_page_fault                                    4    0.0        0

It can be clearly appreciated how the number of vm exits per second,
specially the ones related to notifications (kvm_fast_mmio and
kvm_msi_set_irq) is drastically lower.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-19 17:13:47 +00:00
Sergio Lopez
1ef6996207 vhost_user_backend: Add helpers for EVENT_IDX
Add helpers to Vring and VhostUserSlaveReqHandler for EVENT_IDX, so
consumers of this crate can make use of this feature.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-19 17:13:47 +00:00
Rob Bradford
c33c38bd96 vhost_user_block: Port to new exit event strategy
Implement the newly added exit_event() method on the VhostUserBackend
trait to allow the backend to provide an EventFD for triggering an exit
of the worker thread.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-11 15:21:07 +01:00
Rob Bradford
7ca691fc2f vhost_user_block: Implement and use worker shutdown
Add a kill EventFD and use this to shutdown the worker thread when the
backend terminates.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-10 09:16:49 +01:00
Rob Bradford
710394b872 vhost_user_block: Forward the error from unexpected event
This is consistent with vhost-user-net and vhost-user-fs.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-10 09:16:49 +01:00
Rob Bradford
4f4c3d3ebe vhost_user_block: Make Error behave like net and fs versions
Make the Error struct behave more like the net and fs equivalents.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-10 09:16:49 +01:00
Yang Zhong
99da1dff90 vhost-user-blk: Add MQ support in backend
Adding the num_queues parameter for vhost-user-blk backend, which
can enable MQ support in the backend.

This patch has enabled the MQ support from handle_event, and the
vhost-user-backend crate will enable multiple threads to call this
handle_event to handle read/write operations.

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
2020-02-03 09:49:27 +01:00
Rob Bradford
7f73eebbdb vhost_user_block: Split launching backend into its own function
Split the basic launching functionality into its own function in the
newly added vhost_user_block crate.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-23 10:30:06 +00:00
Rob Bradford
1dd2451895 vhost_user_block: Refactor vhost_user_block backend code into a new crate
Extract the majority of the code that provides the vhost-user-block
backend into its own crate and port the binary to use it.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-23 10:30:06 +00:00