From: Oded Gabbay <oded.gabbay@gmail.com>
To: David Woodhouse <dwmw2@infradead.org>
Cc: Jerome Glisse <j.glisse@gmail.com>,
lsf-pc@lists.linux-foundation.org,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Joerg Roedel <joro@8bytes.org>
Subject: Re: [LSF/MM ATTEND] HMM (heterogeneous memory manager) and GPU
Date: Wed, 3 Feb 2016 13:41:53 +0200
Message-ID: <CAFCwf12hkEzLbdop760Vuc6t-J71Vb2pu=y-8GPYLPFoguFRbw@mail.gmail.com>
In-Reply-To: <1454499350.4788.170.camel@infradead.org>
On Wed, Feb 3, 2016 at 1:35 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> On Wed, 2016-02-03 at 13:07 +0200, Oded Gabbay wrote:
>> > Another, perhaps trivial, question.
>> > When there is an address fault, who handles it ? the SVM driver, or
>> > each device driver ?
>> >
>> > In other words, is the model the same as (AMD) IOMMU where it binds
>> > amd_iommu driver to the IOMMU H/W, and that driver (amd_iommu/v2) is
>> > the only one which handles the PPR events ?
>> >
>> > If that is the case, then with SVM, how will the device driver be made
>> > aware of faults, if the SVM driver won't notify him about them,
>> > because it has already severed the connection between PASID and
>> > process ?
>
> In the ideal case, there's no need for the device driver to get
> involved at all. When a page isn't found in the page tables, the IOMMU
> code calls handle_mm_fault() and either populates the page and sends a
> 'success' response, or sends an 'invalid fault' response back.
>
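To make the flow described above concrete, here is a minimal user-space
sketch of it. It is not the Intel SVM code: every name in it (toy_mm,
toy_handle_mm_fault, toy_page_request, the RESP_* codes) is hypothetical, and
toy_handle_mm_fault() only stands in for the real handle_mm_fault(). It shows
the two outcomes mentioned: the page gets populated and a 'success' response
goes back, or the access is not legal and an 'invalid' response goes back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical response codes; real hardware defines its own encoding. */
enum fault_response { RESP_SUCCESS, RESP_INVALID };

#define TOY_NPAGES 8

/* Toy address space: which pages may be mapped, and which are present. */
struct toy_mm {
    bool mappable[TOY_NPAGES];
    bool present[TOY_NPAGES];
};

/* Stand-in for handle_mm_fault(): populate the page if the access is legal. */
static bool toy_handle_mm_fault(struct toy_mm *mm, size_t page)
{
    assert(mm != NULL);
    if (page >= TOY_NPAGES || !mm->mappable[page])
        return false;
    mm->present[page] = true;
    return true;
}

/* Stand-in for the IOMMU code's page-request handler. */
static enum fault_response toy_page_request(struct toy_mm *mm, size_t page)
{
    assert(mm != NULL);
    /* Page already in the page tables: nothing to do. */
    if (page < TOY_NPAGES && mm->present[page])
        return RESP_SUCCESS;
    /* Otherwise try to fault it in, with no device driver involved. */
    return toy_handle_mm_fault(mm, page) ? RESP_SUCCESS : RESP_INVALID;
}
```

In this sketch the device driver never appears; the optional driver callback
mentioned in the next paragraph would hook in where RESP_INVALID is returned.
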
> To account for broken hardware, we *have* added a callback into the
> device driver when these faults happen. Ideally it should never be
> used, of course.
>
> In the case where the process has gone away, the PASID is still
> assigned and we still hold mm_count on the MM, just not mm_users. This
> callback into the device driver still occurs if a fault happens during
> process exit between the exit_mm() and exit_files() stage.
>
>> And another question, if I may, aren't you afraid of "false positive"
>> prints to dmesg ? I mean, I'm pretty sure page faults / pasid faults
>> errors will be logged somewhere, probably to dmesg. Aren't you
>> concerned of the users seeing those errors and thinking they may have
>> a bug, while actually the errors were only caused by process
>> termination ?
>
> If that's the case, it's easy enough to silence them. We are already
> explicitly testing for the 'defunct mm' case in our fault handler, to
> prevent us from faulting more pages into an obsolescent MM after its
> mm_users reaches zero and its page tables are supposed to have been
> torn down. That's the 'if (!atomic_inc_not_zero(&svm->mm->mm_users))
> goto bad_req;' part.
>
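The 'defunct mm' test quoted above follows the usual increment-unless-zero
pattern. Below is a minimal user-space sketch of that pattern using C11
atomics. It is only an illustration under stated assumptions: toy_refs,
toy_inc_not_zero and toy_fault_allowed are hypothetical names, and the
kernel's atomic_inc_not_zero() is a separate implementation. The point is
that once mm_users has reached zero, a fault can no longer take a temporary
reference, so it is rejected instead of populating pages.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy counterpart of the two counters discussed in this thread. */
struct toy_refs {
    atomic_int mm_users;  /* users of the address space */
    atomic_int mm_count;  /* references to the structure itself */
};

/* Increment *v unless it is zero; return true if the increment happened. */
static bool toy_inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        /* On failure, 'old' is updated to the current value and we retry. */
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}

/* Returns true if the fault may be serviced, false for the 'bad_req' path. */
static bool toy_fault_allowed(struct toy_refs *mm)
{
    assert(mm != NULL);
    if (!toy_inc_not_zero(&mm->mm_users))
        return false;                    /* defunct mm: do not fault pages in */
    /* ... the page would be populated here ... */
    atomic_fetch_sub(&mm->mm_users, 1);  /* drop the temporary reference */
    return true;
}
```

A plain "load, then increment" would not be enough here, because mm_users
could drop to zero between the two steps; the compare-and-exchange loop makes
the check and the increment a single atomic step.
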
>> Or in that case you say that the application is broken, because if it
>> still had something running in the H/W, it should not have closed
>> itself ?
>
> That's also true but it's still nice to avoid confusion. Even if only
> to disambiguate cause and effect — we don't want people to see PASID
> faults which were caused by the process crashing, and to think that
> they might be involved in *causing* that process to crash...
Yes, that's why in our model, we aim to kill all running waves
*before* the amd_iommu_v2 driver unbinds the PASID.
>
> --
> David Woodhouse Open Source Technology Centre
> David.Woodhouse@intel.com Intel Corporation
>
It seems you have most of your bases covered. I'll stop harassing you now :)
But in seriousness, it's interesting to see the different approaches
taken to handling pretty much the same type of H/W (IOMMU).
Thanks for your patience in answering my questions.
Oded