Re: [LSF/MM ATTEND] HMM (heterogeneous memory manager) and GPU

From: Oded Gabbay <oded.gabbay@gmail.com>
To: David Woodhouse <dwmw2@infradead.org>
Cc: Jerome Glisse <j.glisse@gmail.com>,
	lsf-pc@lists.linux-foundation.org,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Joerg Roedel <joro@8bytes.org>
Subject: Re: [LSF/MM ATTEND] HMM (heterogeneous memory manager) and GPU
Date: Wed, 3 Feb 2016 13:41:53 +0200
Message-ID: <CAFCwf12hkEzLbdop760Vuc6t-J71Vb2pu=y-8GPYLPFoguFRbw@mail.gmail.com>
In-Reply-To: <1454499350.4788.170.camel@infradead.org>

On Wed, Feb 3, 2016 at 1:35 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> On Wed, 2016-02-03 at 13:07 +0200, Oded Gabbay wrote:
>> > Another, perhaps trivial, question.
>> > When there is an address fault, who handles it ? the SVM driver, or
>> > each device driver ?
>> >
>> > In other words, is the model the same as (AMD) IOMMU where it binds
>> > amd_iommu driver to the IOMMU H/W, and that driver (amd_iommu/v2) is
>> > the only one which handles the PPR events ?
>> >
>> > If that is the case, then with SVM, how will the device driver be made
>> > aware of faults, if the SVM driver won't notify him about them,
>> > because it has already severed the connection between PASID and
>> > process ?
>
> In the ideal case, there's no need for the device driver to get
> involved at all. When a page isn't found in the page tables, the IOMMU
> code calls handle_mm_fault() and either populates the page and sends
> a 'success' response, or sends an 'invalid fault' response back.
>
> To account for broken hardware, we *have* added a callback into the
> device driver when these faults happen. Ideally it should never be
> used, of course.
>
> In the case where the process has gone away, the PASID is still
> assigned and we still hold mm_count on the MM, just not mm_users. This
> callback into the device driver still occurs if a fault happens during
> process exit between the exit_mm() and exit_files() stage.
>
>> And another question, if I may, aren't you afraid of "false positive"
>> prints to dmesg ? I mean, I'm pretty sure page faults / pasid faults
>> errors will be logged somewhere, probably to dmesg. Aren't you
>> concerned of the users seeing those errors and thinking they may have
>> a bug, while actually the errors were only caused by process
>> termination ?
>
> If that's the case, it's easy enough to silence them. We are already
> explicitly testing for the 'defunct mm' case in our fault handler, to
> prevent us from faulting more pages into an obsolescent MM after its
> mm_users reaches zero and its page tables are supposed to have been
> torn down. That's the 'if(!atomic_inc_not_zero(&svm->mm->mm_users))
> goto bad_req;' part.
>
>> Or in that case you say that the application is broken, because if it
>> still had something running in the H/W, it should not have closed
>> itself ?
>
> That's also true but it's still nice to avoid confusion. Even if only
> to disambiguate cause and effect — we don't want people to see PASID
> faults which were caused by the process crashing, and to think that
> they might be involved in *causing* that process to crash...

Yes, that's why in our model, we aim to kill all running waves
*before* the amd_iommu_v2 driver unbinds the PASID.

>
> --
> David Woodhouse                            Open Source Technology Centre
> David.Woodhouse@intel.com                              Intel Corporation
>


It seems you have most of your bases covered. I'll stop harassing you now :)
But in seriousness, it's interesting to see the different approaches
taken to handling pretty much the same type of H/W (IOMMU).

Thanks for your patience in answering my questions.

Oded
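
For readers following the thread, the 'defunct mm' check David describes
can be sketched in user-space C. This is only an illustrative model, not
the actual IOMMU driver code: struct mm_model, inc_not_zero(),
service_fault(), and the page_ok flag are hypothetical stand-ins for
mm_struct, atomic_inc_not_zero(), the fault handler, and the outcome of
handle_mm_fault(). Dropping the mm_users reference after servicing the
fault is likewise an assumption about how such a handler would balance
its counts.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two reference counts on an mm. */
struct mm_model {
    atomic_int mm_users;  /* users of the address space itself */
    atomic_int mm_count;  /* references keeping the structure allocated */
};

enum fault_response { RESP_SUCCESS, RESP_INVALID };

/* User-space model of atomic_inc_not_zero(): take a reference only if
 * the counter has not already dropped to zero. */
static bool inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}

/* Model of the fault path: refuse to fault pages into a defunct mm,
 * otherwise report whatever the (simulated) page fault produced. */
static enum fault_response service_fault(struct mm_model *mm, bool page_ok)
{
    enum fault_response resp;

    if (!inc_not_zero(&mm->mm_users))
        return RESP_INVALID;            /* the 'goto bad_req' case */

    /* page_ok simulates whether handle_mm_fault() populated the page. */
    resp = page_ok ? RESP_SUCCESS : RESP_INVALID;

    atomic_fetch_sub(&mm->mm_users, 1); /* assumed: drop the reference */
    return resp;
}
```

With mm_users initialised to 1, service_fault() returns RESP_SUCCESS or
RESP_INVALID according to page_ok and leaves the count unchanged. Once
mm_users has reached zero, every call returns RESP_INVALID without
touching the count, even though mm_count may still be held. That is the
window between exit_mm() and the PASID being unbound.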