ksummit.lists.linux.dev archive mirror
From: Bart Van Assche <bart.vanassche@sandisk.com>
To: Thomas Gleixner <tglx@linutronix.de>,
	Christoph Hellwig <hch@infradead.org>
Cc: linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-kernel@vger.kernel.org,
	ksummit-discuss@lists.linuxfoundation.org
Subject: Re: [Ksummit-discuss] [TECH TOPIC] IRQ affinity
Date: Wed, 15 Jul 2015 08:41:05 -0700
Message-ID: <55A67F11.1030709@sandisk.com>
In-Reply-To: <alpine.DEB.2.11.1507151408570.18576@nanos>

On 07/15/2015 05:12 AM, Thomas Gleixner wrote:
> On Wed, 15 Jul 2015, Christoph Hellwig wrote:
>> Many years ago we decided to move setting of IRQ to core affinities to
>> userspace with the irqbalance daemon.
>>
>> These days we have systems with lots of MSI-X vectors, and we have
>> hardware and subsystem support for per-CPU I/O queues in the block
>> layer, the RDMA subsystem and probably the network stack (I'm not too
>> familiar with the recent developments there).  It would really help the
>> out of the box performance and experience if we could allow such
>> subsystems to bind interrupt vectors to the node that the queue is
>> configured on.
>>
>> I'd like to discuss whether the rationale for moving the IRQ affinity setting
>> fully to userspace is still correct in today's world, and any pitfalls
>> we'll have to learn from in irqbalanced and the old in-kernel affinity
>> code.
>
> I think setting an initial affinity is not going to create the horror
> of the old in-kernel irq balancer again. It still could be changed
> from user space and does not try to be smart by moving interrupts
> around in circles all the time.

Thanks, Thomas, for your feedback. However, regardless of whether IRQ
balancing happens in user space or in the kernel, the following issues
still need to be addressed:
* irqbalanced is not aware of the relationship between MSI-X vectors.
   If, e.g., two kernel drivers each allocate 24 MSI-X vectors for the
   PCIe interfaces they control, irqbalanced could decide to associate
   all MSI-X vectors of the first PCIe interface with one set of CPUs
   and all MSI-X vectors of the second PCIe interface with a second set
   of CPUs. This will result in suboptimal performance if these two
   PCIe interfaces are used alternately instead of simultaneously,
   because only one of the two CPU sets would then be processing
   interrupts at any given time.
* With blk-mq and scsi-mq, optimal performance can only be achieved if
   the relationship between MSI-X vectors and NUMA nodes does not change
   over time. This is necessary so that a blk-mq/scsi-mq driver can
   ensure that interrupts are processed on the same NUMA node as the
   one on which the data structures for a communication channel have
   been allocated. However, today there is no API that allows
   blk-mq/scsi-mq drivers and irqbalanced to exchange information
   about the relationship between MSI-X vector ranges and NUMA nodes.
   The only approach I know of that works today to define IRQ affinity
   for blk-mq/scsi-mq drivers is to disable irqbalanced and to run a
   custom script that defines IRQ affinity (see e.g. the
   spread-mlx4-ib-interrupts attachment of
   http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/21312/focus=98409).
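
The first point can be made concrete with a toy model. This is a hypothetical sketch, not irqbalance's actual placement policy: it assumes 24 CPUs and two devices with 24 MSI-X vectors each, uses made-up helper names, and counts only how many CPUs receive interrupts, ignoring cache and NUMA effects.

```python
def partitioned(num_cpus, num_vectors, device_index, num_devices):
    """Hypothetical layout: confine all vectors of one device to that
    device's own contiguous slice of CPUs."""
    per_device = num_cpus // num_devices
    first = device_index * per_device
    return [first + v % per_device for v in range(num_vectors)]


def spread(num_cpus, num_vectors):
    """Hypothetical layout: spread one device's vectors round-robin
    over all CPUs."""
    return [v % num_cpus for v in range(num_vectors)]


def busy_cpus(*assignments):
    """Count distinct CPUs that receive interrupts from the active devices."""
    return len({cpu for assignment in assignments for cpu in assignment})


if __name__ == "__main__":
    # Alternating use: only device 0 is active.
    print(busy_cpus(partitioned(24, 24, 0, 2)))  # 12
    print(busy_cpus(spread(24, 24)))             # 24
    # Simultaneous use: both devices are active.
    print(busy_cpus(partitioned(24, 24, 0, 2),
                    partitioned(24, 24, 1, 2)))  # 24
```

Under simultaneous use both layouts keep all 24 CPUs busy; under alternating use the partitioned layout leaves half of them idle, which is the gap a vector-relationship-aware balancer would have to close.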

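The custom-script workaround mentioned in the second point can be sketched in Python. This is not the spread-mlx4-ib-interrupts script itself. It is a minimal illustration that assumes a Linux system exposing the standard files /sys/bus/pci/devices/<BDF>/msi_irqs, /sys/bus/pci/devices/<BDF>/numa_node, /sys/devices/system/node/nodeN/cpulist and /proc/irq/N/smp_affinity_list, and assumes irqbalanced has already been stopped. The helper names and the round-robin policy are hypothetical. The point is only that the vector-to-CPU mapping is derived deterministically from the device's NUMA node, so it does not drift over time.

```python
import os


def parse_cpulist(text):
    """Parse a kernel CPU list such as '0-3,8-11' into a sorted list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return sorted(cpus)


def plan_affinity(irqs, node_cpus):
    """Hypothetical policy: map IRQs (in ascending order) round-robin onto
    the CPUs of the device's node. Same inputs always give the same plan."""
    if not node_cpus:
        raise ValueError("no CPUs on the target NUMA node")
    return {irq: node_cpus[i % len(node_cpus)]
            for i, irq in enumerate(sorted(irqs))}


def device_irqs_and_cpus(bdf):
    """Read a PCI device's MSI/MSI-X IRQ numbers and its node's CPU list."""
    base = f"/sys/bus/pci/devices/{bdf}"
    irqs = [int(name) for name in os.listdir(f"{base}/msi_irqs")]
    with open(f"{base}/numa_node") as f:
        node = int(f.read())
    # numa_node is -1 when the platform reports no locality; falling back
    # to node 0 here is an assumption of this sketch.
    node = max(node, 0)
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        cpus = parse_cpulist(f.read())
    return irqs, cpus


def apply_affinity(plan, dry_run=True):
    """Print (dry run) or write each IRQ's CPU; writing requires root."""
    for irq, cpu in sorted(plan.items()):
        path = f"/proc/irq/{irq}/smp_affinity_list"
        if dry_run:
            print(f"echo {cpu} > {path}")
        else:
            with open(path, "w") as f:
                f.write(f"{cpu}\n")
```

For example, apply_affinity(plan_affinity(*device_irqs_and_cpus("0000:04:00.0"))) prints the echo commands it would run; the PCI address is a placeholder. Passing dry_run=False as root actually writes the affinity files.
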
Bart.


Thread overview: 15+ messages
2015-07-15 12:07 Christoph Hellwig
2015-07-15 12:12 ` Thomas Gleixner
2015-07-15 15:41   ` Bart Van Assche [this message]
2015-07-15 17:19     ` Keith Busch
2015-07-15 17:25       ` Jens Axboe
2015-07-15 18:24         ` Sagi Grimberg
2015-07-15 18:48         ` Matthew Wilcox
2015-07-16  6:13           ` Michael S. Tsirkin
2015-07-17 15:51           ` Thomas Gleixner
2015-07-15 14:38 ` Christoph Lameter
2015-07-15 14:56 ` Marc Zyngier
2015-07-15 16:05 ` Michael S. Tsirkin
2015-10-12 16:09 ` Theodore Ts'o
2015-10-12 18:41   ` Christoph Hellwig
2015-10-14 15:56     ` Theodore Ts'o
