linux-mm.kvack.org archive mirror
From: Andrew Morton <akpm@linux-foundation.org>
To: Jiajun Xie <jiajun.xie.sh@gmail.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1] mm: fix unmap_mapping_range high bits shift bug
Date: Thu, 21 Dec 2023 14:08:23 -0800
Message-ID: <20231221140823.2908189514c0081ae9efbda8@linux-foundation.org>
In-Reply-To: <CADOw95fss1AY_xuQV+4iOLZOA0ofYAaK7uCJHPiuVVLZDZBa6A@mail.gmail.com>

On Thu, 21 Dec 2023 13:40:11 +0800 Jiajun Xie <jiajun.xie.sh@gmail.com> wrote:

> > (obviously bad, but it's good to spell it out) and under what
> > circumstances it occurs?
> 
> Thanks for the quick reply.
> 
> The issue happens in heterogeneous computing, where the
> device (e.g. a GPU) and the host share the same virtual address space.
> 
> A simple workflow pattern that hits the issue is:
>         /* host */
>     1. userspace first mmaps a file-backed VA range at a specified offset.
>                         e.g. (offset=0x800..., mmap returns va_a)
>     2. write some data to the corresponding system page
>                          e.g. (va_a = 0xAABB)
>         /* device */
>     3. the GPU workload touches the VA, triggers a GPU fault and notifies the host.
>         /* host */
>     4. the host receives the GPU fault notification, then it will:
>             4.1 unmap the host pages and take care of the CPU TLB
>                   (using unmap_mapping_range with offset=0x800...)
>             4.2 migrate the system page to the device
>             4.3 set up the device page table and resolve the device fault.
>         /* device */
>     5. the GPU workload continues; it reads va_a and gets 0xAABB.
>     6. the GPU workload continues; it writes 0xBBCC to va_a.
>         /* host */
>     7. userspace accesses va_a; as expected, this will:
>             7.1 trigger a CPU VM fault.
>             7.2 the driver handles the fault by migrating the GPU-local page to the host.
>     8. userspace can then correctly read 0xBBCC from va_a.
>     9. done
> 
> But if step 4.1 hits the bug this patch fixes, then userspace never
> triggers a CPU fault and still reads the stale value 0xAABB.

Thanks.  Based on the above, I added cc:stable to the changelog so the
fix will be backported into earlier kernels (it looks like that's 20+
years' worth!).  And I pasted the above text into that changelog.

Thread overview: 4+ messages
2023-12-20  5:28 jiajun.xie
2023-12-20 17:53 ` Andrew Morton
2023-12-21  5:40   ` Jiajun Xie
2023-12-21 22:08     ` Andrew Morton [this message]