Discussion:
Google Comments on Virtio Draft Spec
Andrew Thornton
2014-04-29 01:37:16 UTC
Permalink
Hi All,

Included below is the Google feedback on “Virtual I/O Device (VIRTIO)
Version 1.0 Committee Specification Draft 01 03 December 2013”. This is
feedback collected
​ ​
​from
​ ​
a number of people
​ ​
but communications will be channeled through me.

The feedback is ordered by spec section numbers:

*General Virtio*

*3.1 Driver Initialization*

A desc​ription of
​ ​
the shutdown process is missing. Specifically the driver *must *reset the
device or disarm the virtio queues or bus mastering before allowing the
memory backing the virtio queues to be reused. It would probably be good
to specify a preferred mechanism.

*4.1.1 PCI Device Discovery*

This use of PCI device ID’s is an unusual way to handle PCI identification
and plays badly with Windows. Could we instead require the device ID be
0x10xx where xx is the already defined sub-vendor code. This reflects the
current reality of implementation requirements.

*4.1.2.1 Common Configuration structure layout*

*device_status*

​The spec states "​
Writing 0 resets the device
​". H
ow does the driver know when its complete. Proposal
​:​
the resetting driver will poll this register for a non-zero value until the
reset is complete after which DMA and interrupts will not occur.

*queue_size*

Is there a way to find the maximum queue size without resetting the whole
device?

*queue_enable*

What happens to the state of the queue when queue_enable transitions from
non-zero to zero?

*queue_desc/queue_avail/queue_used​*

Does a zero address indicate a disabled queue? This is how current devices
work but spec clarification would be useful.

*4.1.3.1 Device Initialization*

The driver must enable PCI bus mastering before programming any virtio
queues. (Otherwise the device shouldn’t be accessing memory but this has
been historically ignored). What happens when the bus master enable bit is
cleared when virtio is running?

*4.1.3.1.1 Virtio Device Configuration Layout Detection*

Instead of using PCI_CAP_ID_VNDR could a unique capability be allocated
from the PCI SIG? It would be useful to be able to identify the VIRTIO
capability without special driver knowledge.

*4.1.3.1.3 Virtio Configuration*

What happens when an invalid size is written to the queue_size register?

*4.1.3.4 Notification of Device Configuration Changes*

What portion of configuration space should be re-examined? We assume just
the device specific fields and not BARs or other generic configuration.
The spec should clarify this.

In the MSI-X enablement case no mention of setting an interrupt status
bit. In the case where a unique MSI isn’t allocated to this table entry
how does the driver detect this?

*Clarification of runtime device error handling*

When an error occurs such as invalid descriptor or a DMA to an invalid
address what does the device do? i.e. does it continue processing the
current descriptor or does it halt (our preference). How does the driver
recover the device? Is a virtio reset sufficient? What about inflight
DMAs?

*Virtio Net*

*5.1.6.4 Control Virtqueue*

Could the virtio_net_ctrl
​ ​
​structure ​be split into an request and response structure
​ ​
with​ command specific data present in both giving a space for the device
to respond with data instead of just a boolean ack. Could a vendor
specific command be added to allow for extensions?

*Virtio SCSI*

*max_channel, max_target, max_lun*

Are comparisons here less-than or less-than-or-equal? Both interpretations
are in the fields.

Could we standardize on returning the VIRTIO_SCSI_S_TRANSPORT_FAILURE code
for in flight commands when the device is unplugged.

The spec states “all task attributes may be mapped to SIMPLE by the
device” - this makes S_ORDERED, S_HEAD_OF_QUEUE or S_ACA commands
impossible to issue - plus some commands have implicit head of queue
attribute conflicting with this statement.

VIRTIO_SCSI_S_OVERRUN - could the length referred to be clarified - the
virtio buffers or the allocation_length in the CBD?

VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET - what is the expected behavior when
commands are in flight? What about an incomplete TMF_ABORT?


That's all folks - t
hanks in advance for your consideration,

Andy​
​
​ Thornton​
--
*"Debugging is twice as hard as writing the code in the first place.*
*Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it."*
(Brian W. Kernighan)
Rusty Russell
2014-04-29 03:39:40 UTC
Permalink
Post by Andrew Thornton
Hi All,
Included below is the Google feedback on “Virtual I/O Device (VIRTIO)
Version 1.0 Committee Specification Draft 01 03 December 2013”. This is
feedback collected
Hi Andrew,

The second draft was released more recently, but most of your
comments still apply.

I've assigned it the following:
https://tools.oasis-open.org/issues/browse/VIRTIO-100

And I'll break it into subtasks by section, for various people to
address.

Thanks!
Rusty.


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-***@lists.oasis-open.org
Unsubscribe: virtio-comment-***@lists.oasis-open.org
List help: virtio-comment-***@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
Rusty Russell
2014-06-05 07:05:31 UTC
Permalink
OK, we've resolved/closed all these issues now. Below is a summary
in conveniently quotable email form. Some responses were via
the mailing list, but this lists all the actual spec changes which
resulted:

VIRTIO-101: Virtio General
Closed (draft 2 has a shutdown section)

VIRTIO-102: PCI discovery
Resolved in r376:
https://tools.oasis-open.org/version-control/browse/wsvn/virtio/?rev=376

VIRTIO-103: PCI Common configuration layout
Resolved in r364 and r365:
https://tools.oasis-open.org/version-control/browse/wsvn/virtio/?rev=364
https://tools.oasis-open.org/version-control/browse/wsvn/virtio/?rev=365

VIRTIO-104: PCI Operation
Resolved in r375:
https://tools.oasis-open.org/version-control/browse/wsvn/virtio/?rev=375

VIRTIO-105: Virtio NET
Closed
(See https://lists.oasis-open.org/archives/virtio-comment/201404/msg00013.html )

VIRTIO-106: Virtio SCSI
Resolved in r372:
https://tools.oasis-open.org/version-control/browse/wsvn/virtio/?rev=372

We don't seem to have closed the loop on getting responses from you, so
we plan on releasing a third and final draft in two weeks.

Cheers,
Rusty.


This publicly archived list offers a means to provide input to the

OASIS Virtual I/O Device (VIRTIO) TC.



In order to verify user consent to the Feedback License terms and

to minimize spam in the list archive, subscription is required

before posting.



Subscribe: virtio-comment-***@lists.oasis-open.org

Unsubscribe: virtio-comment-***@lists.oasis-open.org

List help: virtio-comment-***@lists.oasis-open.org

List archive: https://lists.oasis-open.org/archives/virtio-comment/

Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf

List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists

Committee: https://www.oasis-open.org/committees/virtio/

Join OASIS: https://www.oasis-open.org/join/
Andrew Thornton
2014-06-05 07:45:56 UTC
Permalink
Hi Rusty,

I send a couple of replies that seems to have gone astray.

I have been told there have been some issues with posting to lists from an @
google.com account so I'd be grateful if you could confirm receipt of this
mail.


VIRTIO-101, VIRTIO-102, VIRTIO-103 and VIRTIO-104 all look good.

VIRTIO-105 - we retract the request.

VIRTIO-106 - This is the reply I sent a couple of weeks ago that seems to
'Configuration'
Device configuration layout
1. max_sectors and cmd_per_lun are described as 'hints'

1.1. Can these become hard limits rather than 'hints'? (IE
devices can reject commands above the cmd_per_lun limit
or the max_sectors limit). If so, can we select a specific
error to return in that case?

2. cmd_per_lun describes 'the actual value to be used is the
minimum of cmd_per_lun and the virtqueue size'.

2.1. Does this mean that devices can reject concurrent commands
above min(cmd_per_lun, virtqueue_size)?

2.2. Do you really mean 'virtqueue_size'? At minimum a command
requires at least 2 entries in the virtqueue. Should this
minimum be virtqueue_size / 2?
Device operation: requestq
1. When a transport returns VIRTIO_SCSI_S_BUSY, can we specify that a
guest should retry the request? This would simplify device
implementations
in the face of resource limitations and would allow guests to control
I/O queueing.

2. When a target is hotunplugged with I/O inflight, can we specify which
error
response will be returned for the now-terminated I/Os?
Device operation: controlq
The ordering of Task Management Function completion with
respect to requests they are acting on is unspecified. However
SCSI midlayers require TMF commands complete _after_ the command(s)
they are aborting/reseting.

EX:
requestq controlq
1: REQUEST A
2: TMF ABORT A
3: COMPLETE A
(S_ABORTED)
4: COMPLETION TMF ABORT A
[good ordering]

requestq controlq
1: REQUEST A
2: TMF ABORT A
3: COMPLETION TMF ABORT A
4: COMPLETION A
[bad ordering! surprise completion may corrupt guest memory]

This requires a device ensure ordering between the controlq and
requestq processing; for TMF RESET, this means a reset must
drain all the request queues (searching for undispatched
commands; QEMU does not do this currently and can corrupt guest
memory in the worst case).

1. If we could have a feature flag (VIRTIO_SCSI_F_TMF_ON_REQUESTQ)
that allowed TMF commands to be sent down the requestqueue,ordering
would be naturally enforced and devices would save a lot of complexity.

2. If that is not possible, a guest driver can cycle a
no-op command through request queue(s) before aborting/resetting
a command. To do this, we need to codify a safe no-op command.

We could use a command w/ lun[0] = 0x0 as a safe no-op command.
This is currently the case for QEMU, vhost-scsi, and GCE. We would
like to have this formalized.

Thanks,

Andy
OK, we've resolved/closed all these issues now. Below is a summary
in conveniently quotable email form. Some responses were via
the mailing list, but this lists all the actual spec changes which
VIRTIO-101: Virtio General
Closed (draft 2 has a shutdown section)
VIRTIO-102: PCI discovery
https://tools.oasis-open.org/version-control/browse/wsvn/virtio/?rev=376
VIRTIO-103: PCI Common configuration layout
https://tools.oasis-open.org/version-control/browse/wsvn/virtio/?rev=364
https://tools.oasis-open.org/version-control/browse/wsvn/virtio/?rev=365
VIRTIO-104: PCI Operation
https://tools.oasis-open.org/version-control/browse/wsvn/virtio/?rev=375
VIRTIO-105: Virtio NET
Closed
(See
https://lists.oasis-open.org/archives/virtio-comment/201404/msg00013.html
)
VIRTIO-106: Virtio SCSI
https://tools.oasis-open.org/version-control/browse/wsvn/virtio/?rev=372
We don't seem to have closed the loop on getting responses from you, so
we plan on releasing a third and final draft in two weeks.
Cheers,
Rusty.
--
*"Debugging is twice as hard as writing the code in the first place.*
*Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it."*
(Brian W. Kernighan)
Andrew Thornton
2014-06-06 00:59:12 UTC
Permalink
Comments below;

'Configuration'
Device configuration layout
Post by Paolo Bonzini
Post by Andrew Thornton
1. max_sectors and cmd_per_lun are described as 'hints'
1.1. Can these become hard limits rather than 'hints'? (IE
devices can reject commands above the cmd_per_lun limit
or the max_sectors limit). If so, can we select a specific
error to return in that case?
First of all, I think it's not necessary to select an error for these
cases. These issues are not specific to virtio-scsi and the command
will succeed at the virtio-scsi level; for cmd_per_lun the the status
could be BUSY (not VIRTIO_SCSI_S_BUSY!) or TASK SET FULL, for transfer
length the SCSI standard says you get INVALID FIELD IN CDB. These
status or sense codes are defined in the appropriate SCSI standards.
The configuration limits are imposed by the hypervisor, so transfer
lengths or queue depths higher than the values in the configuration
should cause an error. The reason these are hints is because the issue
is quite complex if you do SCSI passthrough, and in that case a transfer
length or queue depth lower than the limit could trigger an error.
For example, each target or LUN could actually have its own transfer
limit, that is lower than max_sectors. In this case the initiator
should look for the block limits VPD page anyway.
As to cmd_per_lun, you could obey cmd_per_lun and still get TASK SET
FULL responses from the target if the host or other guests are using it
at the same time. Perhaps the hypervisor could change that to BUSY
(again, not VIRTIO_SCSI_S_BUSY), but this is again a generic SCSI target
implementation issue, not specific to virtio-scsi.
If you treat max_sectors and cmd_per_lun as transport-specific limits and
the INQUIRY data/Block Limits VPDs as SCSI-layer limits, the differences
become clearer -- for transfer that fail because of limitations from the
target (INQUIRY/BLOCK LIMITS VPD levels), the target's SCSI-layer response
is the correct one. For transfers that fail because they hit the transport
limit (max_sectors/cmd_per_lun), a transport-level response would make more
sense.
Post by Paolo Bonzini
Post by Andrew Thornton
2. When a target is hotunplugged with I/O inflight, can we specify which
error response will be returned for the now-terminated I/Os?
Either I/O is completed, or it is already documented to be ILLEGAL
REQUEST/LOGICAL UNIT NOT SUPPORTED.
When an I/O is already running against a target and it is hotremoved,
ILLEGAL REQUEST/LOGICAL UNIT NOT SUPPORTED is not a correct SCSI response.
That response would be appropriate if the I/O reached the target after it
was removed, but not for the case where the command was already running.

Also, consider long-running operations -- once a guest has already seen
them running on a target (ex: long running operation in progress), it would
not be correct to return ILLEGAL REQUEST/LOGICAL UNIT NOT SUPPORTED.

We would like to see a transport-layer response here, for example
VIRTIO_SCSI_S_TRANSPORT_FAILURE. Hot-unplugging a disk is analogous to
breaking the link to the target and Linux will reflect this as a
DID_TRANSPORT_DISRUPTED host byte.
Post by Paolo Bonzini
Post by Andrew Thornton
2. If that is not possible, a guest driver can cycle a
no-op command through request queue(s) before aborting/resetting
a command. To do this, we need to codify a safe no-op command.
We could use a command w/ lun[0] = 0x0 as a safe no-op command.
This is currently the case for QEMU, vhost-scsi, and GCE. We would
like to have this formalized.
I think it is also too late for this. It is a safer and smaller change,
but I'm not sure what the properties of the no-op command would be (e.g.
with respect to ordering) so I'm afraid of missing some important
detail.
1. REPORT LUNS to the well-known address would work. No device emulation
currently supports it to the well-known address, however.

2. Can we specify that it is not an error (the device will not enter a
broken state) to send a command with either an invalid LUN or a command
that is too short? (ex: a 1 byte command?). Even an invalid command
would serve as a fine no-op, if it doesn't wedge the device.
Post by Paolo Bonzini
'Configuration'
Post by Andrew Thornton
Device configuration layout
1. max_sectors and cmd_per_lun are described as 'hints'
1.1. Can these become hard limits rather than 'hints'? (IE
devices can reject commands above the cmd_per_lun limit
or the max_sectors limit). If so, can we select a specific
error to return in that case?
First of all, I think it's not necessary to select an error for these
cases. These issues are not specific to virtio-scsi and the command will
succeed at the virtio-scsi level; for cmd_per_lun the the status could be
BUSY (not VIRTIO_SCSI_S_BUSY!) or TASK SET FULL, for transfer length the
SCSI standard says you get INVALID FIELD IN CDB. These status or sense
codes are defined in the appropriate SCSI standards.
The configuration limits are imposed by the hypervisor, so transfer
lengths or queue depths higher than the values in the configuration should
cause an error. The reason these are hints is because the issue is quite
complex if you do SCSI passthrough, and in that case a transfer length or
queue depth lower than the limit could trigger an error.
For example, each target or LUN could actually have its own transfer
limit, that is lower than max_sectors. In this case the initiator should
look for the block limits VPD page anyway.
As to cmd_per_lun, you could obey cmd_per_lun and still get TASK SET FULL
responses from the target if the host or other guests are using it at the
same time. Perhaps the hypervisor could change that to BUSY (again, not
VIRTIO_SCSI_S_BUSY), but this is again a generic SCSI target implementation
issue, not specific to virtio-scsi.
2. cmd_per_lun describes 'the actual value to be used is the
Post by Andrew Thornton
minimum of cmd_per_lun and the virtqueue size'.
2.1. Does this mean that devices can reject concurrent commands
above min(cmd_per_lun, virtqueue_size)?
Yes (with TASK SET FULL status). Though if virtqueue_size < cmd_per_lun,
the driver actually won't have room to queue more than virtqueue_size items.
2.2. Do you really mean 'virtqueue_size'? At minimum a command
Post by Andrew Thornton
requires at least 2 entries in the virtqueue. Should this
minimum be virtqueue_size / 2?
Not if you use indirect descriptors.
Device operation: requestq
Post by Andrew Thornton
1. When a transport returns VIRTIO_SCSI_S_BUSY, can we specify that a
guest should retry the request? This would simplify device
implementations in the face of resource limitations and would
allow guests to control I/O queueing.
That usually makes sense, but it does not have to be that way. For
example, under Linux you can mark a request as "failfast" and avoid the
retry.
2. When a target is hotunplugged with I/O inflight, can we specify which
Post by Andrew Thornton
error response will be returned for the now-terminated I/Os?
Either I/O is completed, or it is already documented to be ILLEGAL
REQUEST/LOGICAL UNIT NOT SUPPORTED.
Device operation: controlq
Post by Andrew Thornton
The ordering of Task Management Function completion with
respect to requests they are acting on is unspecified. However
SCSI midlayers require TMF commands complete _after_ the command(s)
they are aborting/reseting.
I and Venkatesh sorted this out on the upstream linux-scsi mailing list
for the abort case.
The ordering of completing TMFs vs. requests are now documented, but the
Linux driver messed up this case.
This requires a device ensure ordering between the controlq and
Post by Andrew Thornton
requestq processing; for TMF RESET, this means a reset must
drain all the request queues (searching for undispatched
commands; QEMU does not do this currently and can corrupt guest
memory in the worst case).
I think this cannot happen on QEMU if the commands are undispatched _and_
the doorbell register has been written, since QEMU is basically
single-threaded. If the doorbell register has not been written to, the
driver is probably buggy (sending a reset and a command at the same time is
probably not a good idea).
1. If we could have a feature flag (VIRTIO_SCSI_F_TMF_ON_REQUESTQ)
Post by Andrew Thornton
that allowed TMF commands to be sent down the requestqueue,ordering
would be naturally enforced and devices would save a lot of complexity.
This would be too late for 1.0. I'm also not convinced it is a good idea,
for if the request queue is full you cannot send TMFs to abort commands.
Also, the virtio-scsi standard does not document how you use multiple
request queues, and multiple request queues would have the same ordering
problems as the separate control queue.
2. If that is not possible, a guest driver can cycle a
Post by Andrew Thornton
no-op command through request queue(s) before aborting/resetting
a command. To do this, we need to codify a safe no-op command.
We could use a command w/ lun[0] = 0x0 as a safe no-op command.
This is currently the case for QEMU, vhost-scsi, and GCE. We would
like to have this formalized.
I think it is also too late for this. It is a safer and smaller change,
but I'm not sure what the properties of the no-op command would be (e.g.
with respect to ordering) so I'm afraid of missing some important detail.
A REPORT LUNS command should work well as a no-op. Or we could document
that the target SHOULD implement the REPORT LUNS well-known LUN (C1/01),
and then use a TEST UNIT READY command to that LUN.
The C1/01 well-known LUN would be allowed in addition to 01/tgt/xx/yy
format. Since it's a SHOULD, it doesn't even require a feature bit. I
sent a patch for that.
Paolo
--
*"Debugging is twice as hard as writing the code in the first place.*
*Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it."*
(Brian W. Kernighan)
Paolo Bonzini
2014-06-06 07:54:38 UTC
Permalink
Post by Andrew Thornton
Post by Paolo Bonzini
As to cmd_per_lun, you could obey cmd_per_lun and still get TASK SET
FULL responses from the target if the host or other guests are using it
at the same time. Perhaps the hypervisor could change that to BUSY
(again, not VIRTIO_SCSI_S_BUSY), but this is again a generic SCSI target
implementation issue, not specific to virtio-scsi.
If you treat max_sectors and cmd_per_lun as transport-specific limits and
the INQUIRY data/Block Limits VPDs as SCSI-layer limits, the differences
become clearer -- for transfer that fail because of limitations from the
target (INQUIRY/BLOCK LIMITS VPD levels), the target's SCSI-layer response
is the correct one. For transfers that fail because they hit the transport
limit (max_sectors/cmd_per_lun), a transport-level response would make more
sense.
Ok, then that would be the generic VIRTIO_SCSI_S_FAILURE (DID_ERROR).
Post by Andrew Thornton
Post by Paolo Bonzini
Post by Andrew Thornton
2. When a target is hotunplugged with I/O inflight, can we specify which
error response will be returned for the now-terminated I/Os?
Either I/O is completed, or it is already documented to be ILLEGAL
REQUEST/LOGICAL UNIT NOT SUPPORTED.
When an I/O is already running against a target and it is hotremoved,
ILLEGAL REQUEST/LOGICAL UNIT NOT SUPPORTED is not a correct SCSI response.
That response would be appropriate if the I/O reached the target after it
was removed, but not for the case where the command was already running.
Also, consider long-running operations -- once a guest has already seen
them running on a target (ex: long running operation in progress), it would
not be correct to return ILLEGAL REQUEST/LOGICAL UNIT NOT SUPPORTED.
We would like to see a transport-layer response here, for example
VIRTIO_SCSI_S_TRANSPORT_FAILURE. Hot-unplugging a disk is analogous to
breaking the link to the target and Linux will reflect this as a
DID_TRANSPORT_DISRUPTED host byte.
I see what you mean. I agree that VIRTIO_SCSI_S_TRANSPORT_FAILURE is fine.
Post by Andrew Thornton
1. REPORT LUNS to the well-known address would work. No device emulation
currently supports it to the well-known address, however.
2. Can we specify that it is not an error (the device will not enter a
broken state) to send a command with either an invalid LUN or a command
that is too short? (ex: a 1 byte command?). Even an invalid command
would serve as a fine no-op, if it doesn't wedge the device.
Hmm, no, I'd be a bit wary of doing that.

Regarding the failure codes, I think they're a fairly obvious match.
Since there is no "implementation hints" section, I'd rather defer the
change past 1.0.

I guess we could still do the well-known LUN change if I get a review.

Paolo

This publicly archived list offers a means to provide input to the

OASIS Virtual I/O Device (VIRTIO) TC.



In order to verify user consent to the Feedback License terms and

to minimize spam in the list archive, subscription is required

before posting.



Subscribe: virtio-comment-***@lists.oasis-open.org

Unsubscribe: virtio-comment-***@lists.oasis-open.org

List help: virtio-comment-***@lists.oasis-open.org

List archive: https://lists.oasis-open.org/archives/virtio-comment/

Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf

List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists

Committee: https://www.oasis-open.org/committees/virtio/

Join OASIS: https://www.oasis-open.org/join/

Rusty Russell
2014-06-06 03:13:13 UTC
Permalink
Post by Andrew Thornton
Hi Rusty,
I send a couple of replies that seems to have gone astray.
google.com account so I'd be grateful if you could confirm receipt of this
mail.
Sure. List is also publicly archived:

https://lists.oasis-open.org/archives/virtio-comment/
Post by Andrew Thornton
VIRTIO-106 - This is the reply I sent a couple of weeks ago that seems to
'Configuration'
Device configuration layout
1. max_sectors and cmd_per_lun are described as 'hints'
1.1. Can these become hard limits rather than 'hints'? (IE
devices can reject commands above the cmd_per_lun limit
or the max_sectors limit). If so, can we select a specific
error to return in that case?
2. cmd_per_lun describes 'the actual value to be used is the
minimum of cmd_per_lun and the virtqueue size'.
2.1. Does this mean that devices can reject concurrent commands
above min(cmd_per_lun, virtqueue_size)?
2.2. Do you really mean 'virtqueue_size'? At minimum a command
requires at least 2 entries in the virtqueue. Should this
minimum be virtqueue_size / 2?
Indirect descriptors allow this.

The virtqueue_size limit is more a "note: you can't have more than
virtqueue_size descriptors outstanding anyway", rather than something
which needs to be spelled out IMHO.

Leaving the rest to the experts (Paolo cc'd).

Thanks,
Rusty.
Post by Andrew Thornton
Device operation: requestq
1. When a transport returns VIRTIO_SCSI_S_BUSY, can we specify that a
guest should retry the request? This would simplify device
implementations
in the face of resource limitations and would allow guests to control
I/O queueing.
2. When a target is hotunplugged with I/O inflight, can we specify which
error
response will be returned for the now-terminated I/Os?
Device operation: controlq
The ordering of Task Management Function completion with
respect to requests they are acting on is unspecified. However
SCSI midlayers require TMF commands complete _after_ the command(s)
they are aborting/reseting.
requestq controlq
1: REQUEST A
2: TMF ABORT A
3: COMPLETE A
(S_ABORTED)
4: COMPLETION TMF ABORT A
[good ordering]
requestq controlq
1: REQUEST A
2: TMF ABORT A
3: COMPLETION TMF ABORT A
4: COMPLETION A
[bad ordering! surprise completion may corrupt guest memory]
This requires a device ensure ordering between the controlq and
requestq processing; for TMF RESET, this means a reset must
drain all the request queues (searching for undispatched
commands; QEMU does not do this currently and can corrupt guest
memory in the worst case).
1. If we could have a feature flag (VIRTIO_SCSI_F_TMF_ON_REQUESTQ)
that allowed TMF commands to be sent down the requestqueue,ordering
would be naturally enforced and devices would save a lot of complexity.
2. If that is not possible, a guest driver can cycle a
no-op command through request queue(s) before aborting/resetting
a command. To do this, we need to codify a safe no-op command.
We could use a command w/ lun[0] = 0x0 as a safe no-op command.
This is currently the case for QEMU, vhost-scsi, and GCE. We would
like to have this formalized.
Thanks,
Andy
OK, we've resolved/closed all these issues now. Below is a summary
in conveniently quotable email form. Some responses were via
the mailing list, but this lists all the actual spec changes which
VIRTIO-101: Virtio General
Closed (draft 2 has a shutdown section)
VIRTIO-102: PCI discovery
https://tools.oasis-open.org/version-control/browse/wsvn/virtio/?rev=376
VIRTIO-103: PCI Common configuration layout
https://tools.oasis-open.org/version-control/browse/wsvn/virtio/?rev=364
https://tools.oasis-open.org/version-control/browse/wsvn/virtio/?rev=365
VIRTIO-104: PCI Operation
https://tools.oasis-open.org/version-control/browse/wsvn/virtio/?rev=375
VIRTIO-105: Virtio NET
Closed
(See
https://lists.oasis-open.org/archives/virtio-comment/201404/msg00013.html
)
VIRTIO-106: Virtio SCSI
https://tools.oasis-open.org/version-control/browse/wsvn/virtio/?rev=372
We don't seem to have closed the loop on getting responses from you, so
we plan on releasing a third and final draft in two weeks.
Cheers,
Rusty.
This publicly archived list offers a means to provide input to the

OASIS Virtual I/O Device (VIRTIO) TC.



In order to verify user consent to the Feedback License terms and

to minimize spam in the list archive, subscription is required

before posting.



Subscribe: virtio-comment-***@lists.oasis-open.org

Unsubscribe: virtio-comment-***@lists.oasis-open.org

List help: virtio-comment-***@lists.oasis-open.org

List archive: https://lists.oasis-open.org/archives/virtio-comment/

Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf

List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists

Committee: https://www.oasis-open.org/committees/virtio/

Join OASIS: https://www.oasis-open.org/join/
Paolo Bonzini
2014-06-05 09:42:16 UTC
Permalink
Post by Andrew Thornton
'Configuration'
Device configuration layout
1. max_sectors and cmd_per_lun are described as 'hints'
1.1. Can these become hard limits rather than 'hints'? (IE
devices can reject commands above the cmd_per_lun limit
or the max_sectors limit). If so, can we select a specific
error to return in that case?
First of all, I think it's not necessary to select an error for these
cases. These issues are not specific to virtio-scsi and the command
will succeed at the virtio-scsi level; for cmd_per_lun the the status
could be BUSY (not VIRTIO_SCSI_S_BUSY!) or TASK SET FULL, for transfer
length the SCSI standard says you get INVALID FIELD IN CDB. These
status or sense codes are defined in the appropriate SCSI standards.

The configuration limits are imposed by the hypervisor, so transfer
lengths or queue depths higher than the values in the configuration
should cause an error. The reason these are hints is because the issue
is quite complex if you do SCSI passthrough, and in that case a transfer
length or queue depth lower than the limit could trigger an error.

For example, each target or LUN could actually have its own transfer
limit, that is lower than max_sectors. In this case the initiator
should look for the block limits VPD page anyway.

As to cmd_per_lun, you could obey cmd_per_lun and still get TASK SET
FULL responses from the target if the host or other guests are using it
at the same time. Perhaps the hypervisor could change that to BUSY
(again, not VIRTIO_SCSI_S_BUSY), but this is again a generic SCSI target
implementation issue, not specific to virtio-scsi.
Post by Andrew Thornton
2. cmd_per_lun describes 'the actual value to be used is the
minimum of cmd_per_lun and the virtqueue size'.
2.1. Does this mean that devices can reject concurrent commands
above min(cmd_per_lun, virtqueue_size)?
Yes (with TASK SET FULL status). Though if virtqueue_size <
cmd_per_lun, the driver actually won't have room to queue more than
virtqueue_size items.
Post by Andrew Thornton
2.2. Do you really mean 'virtqueue_size'? At minimum a command
requires at least 2 entries in the virtqueue. Should this
minimum be virtqueue_size / 2?
Not if you use indirect descriptors.
Post by Andrew Thornton
Device operation: requestq
1. When a transport returns VIRTIO_SCSI_S_BUSY, can we specify that a
guest should retry the request? This would simplify device
implementations in the face of resource limitations and would
allow guests to control I/O queueing.
That usually makes sense, but it does not have to be that way. For
example, under Linux you can mark a request as "failfast" and avoid the
retry.
Post by Andrew Thornton
2. When a target is hotunplugged with I/O inflight, can we specify which
error response will be returned for the now-terminated I/Os?
Either I/O is completed, or it is already documented to be ILLEGAL
REQUEST/LOGICAL UNIT NOT SUPPORTED.
Post by Andrew Thornton
Device operation: controlq
The ordering of Task Management Function completion with
respect to requests they are acting on is unspecified. However
SCSI midlayers require TMF commands complete _after_ the command(s)
they are aborting/reseting.
I and Venkatesh sorted this out on the upstream linux-scsi mailing list
for the abort case.

The ordering of completing TMFs vs. requests are now documented, but the
Linux driver messed up this case.
Post by Andrew Thornton
This requires a device ensure ordering between the controlq and
requestq processing; for TMF RESET, this means a reset must
drain all the request queues (searching for undispatched
commands; QEMU does not do this currently and can corrupt guest
memory in the worst case).
I think this cannot happen on QEMU if the commands are undispatched
_and_ the doorbell register has been written, since QEMU is basically
single-threaded. If the doorbell register has not been written to, the
driver is probably buggy (sending a reset and a command at the same time
is probably not a good idea).
Post by Andrew Thornton
1. If we could have a feature flag (VIRTIO_SCSI_F_TMF_ON_REQUESTQ)
that allowed TMF commands to be sent down the requestqueue,ordering
would be naturally enforced and devices would save a lot of complexity.
This would be too late for 1.0. I'm also not convinced it is a good
idea, for if the request queue is full you cannot send TMFs to abort
commands. Also, the virtio-scsi standard does not document how you use
multiple request queues, and multiple request queues would have the same
ordering problems as the separate control queue.
Post by Andrew Thornton
2. If that is not possible, a guest driver can cycle a
no-op command through request queue(s) before aborting/resetting
a command. To do this, we need to codify a safe no-op command.
We could use a command w/ lun[0] = 0x0 as a safe no-op command.
This is currently the case for QEMU, vhost-scsi, and GCE. We would
like to have this formalized.
I think it is also too late for this. It is a safer and smaller change,
but I'm not sure what the properties of the no-op command would be (e.g.
with respect to ordering) so I'm afraid of missing some important detail.

A REPORT LUNS command should work well as a no-op. Or we could document
that the target SHOULD implement the REPORT LUNS well-known LUN (C1/01),
and then use a TEST UNIT READY command to that LUN.

The C1/01 well-known LUN would be allowed in addition to 01/tgt/xx/yy
format. Since it's a SHOULD, it doesn't even require a feature bit. I
sent a patch for that.

Paolo

This publicly archived list offers a means to provide input to the

OASIS Virtual I/O Device (VIRTIO) TC.



In order to verify user consent to the Feedback License terms and

to minimize spam in the list archive, subscription is required

before posting.



Subscribe: virtio-comment-***@lists.oasis-open.org

Unsubscribe: virtio-comment-***@lists.oasis-open.org

List help: virtio-comment-***@lists.oasis-open.org

List archive: https://lists.oasis-open.org/archives/virtio-comment/

Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf

List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists

Committee: https://www.oasis-open.org/committees/virtio/

Join OASIS: https://www.oasis-open.org/join/
Loading...