linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	 Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,  stable <stable@vger.kernel.org>,
	linux-s390 <linux-s390@vger.kernel.org>
Subject: Re: [PATCH v2 2/2] mm/hugetlb: support write-faults in shared mappings
Date: Mon, 15 Aug 2022 20:03:20 +0200
Message-ID: <CADFyXm40iiz-xFpLK4qGgHGh5Qp+98G9qxnqC20c8qtRiKt9_A@mail.gmail.com>
In-Reply-To: <20220815175929.303774fd@thinkpad>

On Mon, Aug 15, 2022 at 5:59 PM Gerald Schaefer
<gerald.schaefer@linux.ibm.com> wrote:
>
> On Mon, 15 Aug 2022 17:07:32 +0200
> David Hildenbrand <david@redhat.com> wrote:
>
> > On Mon, Aug 15, 2022 at 3:36 PM Gerald Schaefer
> > <gerald.schaefer@linux.ibm.com> wrote:
> > >
> > > On Thu, 11 Aug 2022 11:59:09 -0700
> > > Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > >
> > > > On 08/11/22 12:34, David Hildenbrand wrote:
> > > > > If we ever get a write-fault on a write-protected page in a shared mapping,
> > > > > we'd be in trouble (again). Instead, we can simply map the page writable.
> > > > >
> > > > <snip>
> > > > >
> > > > > Reason is that uffd-wp doesn't clear the uffd-wp PTE bit when
> > > > > unregistering and consequently keeps the PTE writeprotected. Reason for
> > > > > this is to avoid the additional overhead when unregistering. Note
> > > > > that this is the case also for !hugetlb and that we will end up with
> > > > > writable PTEs that still have the uffd-wp PTE bit set once we return
> > > > > from hugetlb_wp(). I'm not touching the uffd-wp PTE bit for now, because it
> > > > > seems to be a generic thing -- wp_page_reuse() also doesn't clear it.
> > > > >
> > > > > VM_MAYSHARE handling in hugetlb_fault() for FAULT_FLAG_WRITE
> > > > > indicates that MAP_SHARED handling was at least envisioned, but could never
> > > > > have worked as expected.
> > > > >
> > > > > While at it, make sure that we never end up in hugetlb_wp() on write
> > > > > faults without VM_WRITE, because we don't support maybe_mkwrite()
> > > > > semantics as commonly used in the !hugetlb case -- for example, in
> > > > > wp_page_reuse().
> > > >
> > > > Nit,
> > > > to me 'make sure that we never end up in hugetlb_wp()' implies that
> > > > we would check for condition in callers as opposed to first thing in
> > > > hugetlb_wp().  However, I am OK with description as it.
> > >
> >
> > Hi Gerald,
> >
> > > Is that new WARN_ON_ONCE() in hugetlb_wp() meant to indicate a real bug?
> >
> > Most probably, unless I am missing something important.
> >
> > Something triggers FAULT_FLAG_WRITE on a VMA without VM_WRITE and
> > hugetlb_wp() would map the pte writable.
> > Consequently, we'd have a writable pte inside a VMA that does not have
> > write permissions, which is dubious. My check prevents that and bails
> > out.
> >
> > Ordinary (!hugetlb) faults have maybe_mkwrite() (e.g., for FOLL_FORCE
> > or breaking COW) semantics such that we won't be mapping PTEs writable
> > if the VMA does not have write permissions.
> >
> > I suspect that either
> >
> > a) Some write fault misses a protection check and ends up triggering a
> > FAULT_FLAG_WRITE where we should actually fail early.
> >
> > b) The write fault is valid and some VMA misses proper flags (VM_WRITE).
> >
> > c) The write fault is valid (e.g., for breaking COW or FOLL_FORCE) and
> > we'd actually want maybe_mkwrite semantics.
> >
> > > It is triggered by libhugetlbfs testcase "HUGETLB_ELFMAP=R linkhuge_rw"
> > > (at least on s390), and crashes our CI, because it runs with panic_on_warn
> > > enabled.
> > >
> > > Not sure if this means that we have bug elsewhere, allowing us to
> > > get to the WARN in hugetlb_wp().
> >
> > That's what I suspect. Do you have a backtrace?
>
> Sure, forgot to send it with initial reply...
>
> [   82.574749] ------------[ cut here ]------------
> [   82.574751] WARNING: CPU: 9 PID: 1674 at mm/hugetlb.c:5264 hugetlb_wp+0x3be/0x818
> [   82.574759] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc uvdevice s390_trng vfio_ccw mdev vfio_iommu_type1 eadm_sch vfio zcrypt_cex4 sch_fq_codel configfs ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common pkey zcrypt rng_core autofs4
> [   82.574785] CPU: 9 PID: 1674 Comm: linkhuge_rw Kdump: loaded Not tainted 5.19.0-next-20220815 #36
> [   82.574787] Hardware name: IBM 3931 A01 704 (LPAR)
> [   82.574788] Krnl PSW : 0704c00180000000 00000006c9d4bc6a (hugetlb_wp+0x3c2/0x818)
> [   82.574791]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [   82.574794] Krnl GPRS: 000000000227c000 0000000008640071 0000000000000000 0000000001200000
> [   82.574796]            0000000001200000 00000000b5a98090 0000000000000255 00000000adb2c898
> [   82.574797]            0000000000000000 00000000adb2c898 0000000001200000 00000000b5a98090
> [   82.574799]            000000008c408000 0000000092fd7300 000003800339bc10 000003800339baf8
> [   82.574803] Krnl Code: 00000006c9d4bc5c: f160000407fe        mvo     4(7,%r0),2046(1,%r0)
>            00000006c9d4bc62: 47000700           bc      0,1792
>           #00000006c9d4bc66: af000000           mc      0,0
>           >00000006c9d4bc6a: a7a80040           lhi     %r10,64
>            00000006c9d4bc6e: b916002a           llgfr   %r2,%r10
>            00000006c9d4bc72: eb6ff1600004       lmg     %r6,%r15,352(%r15)
>            00000006c9d4bc78: 07fe               bcr     15,%r14
>            00000006c9d4bc7a: 47000700           bc      0,1792
> [   82.574814] Call Trace:
> [   82.574842]  [<00000006c9d4bc6a>] hugetlb_wp+0x3c2/0x818
> [   82.574846]  [<00000006c9d4c62e>] hugetlb_no_page+0x56e/0x5a8
> [   82.574848]  [<00000006c9d4cac2>] hugetlb_fault+0x45a/0x590
> [   82.574850]  [<00000006c9d06d4a>] handle_mm_fault+0x182/0x220
> [   82.574855]  [<00000006c9a9d70e>] do_exception+0x19e/0x470
> [   82.574858]  [<00000006c9a9dff2>] do_dat_exception+0x2a/0x50
> [   82.574861]  [<00000006ca668a18>] __do_pgm_check+0xf0/0x1b0
> [   82.574866]  [<00000006ca677b3c>] pgm_check_handler+0x11c/0x170
> [   82.574870] Last Breaking-Event-Address:
> [   82.574871]  [<00000006c9d4b926>] hugetlb_wp+0x7e/0x818
> [   82.574873] Kernel panic - not syncing: panic_on_warn set ...
> [   82.574875] CPU: 9 PID: 1674 Comm: linkhuge_rw Kdump: loaded Not tainted 5.19.0-next-20220815 #36
> [   82.574877] Hardware name: IBM 3931 A01 704 (LPAR)
> [   82.574878] Call Trace:
> [   82.574879]  [<00000006ca664f22>] dump_stack_lvl+0x62/0x80
> [   82.574881]  [<00000006ca657af8>] panic+0x118/0x300
> [   82.574884]  [<00000006c9ac3da6>] __warn+0xb6/0x160
> [   82.574887]  [<00000006ca29b1ea>] report_bug+0xba/0x140
> [   82.574890]  [<00000006c9a75194>] monitor_event_exception+0x44/0x80
> [   82.574892]  [<00000006ca668a18>] __do_pgm_check+0xf0/0x1b0
> [   82.574894]  [<00000006ca677b3c>] pgm_check_handler+0x11c/0x170
> [   82.574897]  [<00000006c9d4bc6a>] hugetlb_wp+0x3c2/0x818
> [   82.574899]  [<00000006c9d4c62e>] hugetlb_no_page+0x56e/0x5a8
> [   82.574901]  [<00000006c9d4cac2>] hugetlb_fault+0x45a/0x590
> [   82.574903]  [<00000006c9d06d4a>] handle_mm_fault+0x182/0x220
> [   82.574906]  [<00000006c9a9d70e>] do_exception+0x19e/0x470
> [   82.574907]  [<00000006c9a9dff2>] do_dat_exception+0x2a/0x50
> [   82.574909]  [<00000006ca668a18>] __do_pgm_check+0xf0/0x1b0
> [   82.574912]  [<00000006ca677b3c>] pgm_check_handler+0x11c/0x170


do_dat_exception() sets
  access = VM_ACCESS_FLAGS;

do_exception() sets
  is_write = (trans_exc_code & store_indication) == 0x400;

and then sets FAULT_FLAG_WRITE via
   if (access == VM_WRITE || is_write)
          flags |= FAULT_FLAG_WRITE;

However, the VMA permission check is only
  if (unlikely(!(vma->vm_flags & access)))
          goto out_up;

As VM_ACCESS_FLAGS includes VM_WRITE | VM_READ ..., that check passes
for any VMA that has VM_READ set, even if it lacks VM_WRITE.

We end up triggering a write fault (FAULT_FLAG_WRITE), even though the
VMA does not allow for writes.

I assume that's what happens and that it's a bug in s390x code.
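
To make the suspected sequence concrete, here is a small user-space model
of the logic quoted above. It is only a sketch, not the s390 code: the flag
values are defined locally, and struct fault_result, model_fault_as_described()
and model_fault_narrowed() are hypothetical names. The second function shows
one possible way to tighten the check (narrowing access to VM_WRITE for
stores); whether that is the right fix for arch code is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Model values, defined locally for this sketch only. */
#define VM_READ          0x1u
#define VM_WRITE         0x2u
#define VM_EXEC          0x4u
#define VM_ACCESS_FLAGS  (VM_READ | VM_WRITE | VM_EXEC)
#define FAULT_FLAG_WRITE 0x1u

/* Hypothetical result type: did the VMA check pass, and with which flags? */
struct fault_result {
	bool permitted;      /* false models the "goto out_up" path */
	unsigned int flags;  /* fault flags that would reach the core MM */
};

/* Models the logic as described in this mail. */
static struct fault_result
model_fault_as_described(unsigned int vm_flags, unsigned int access,
			 bool is_write)
{
	struct fault_result r = { false, 0 };

	assert((access & ~VM_ACCESS_FLAGS) == 0);

	if (access == VM_WRITE || is_write)
		r.flags |= FAULT_FLAG_WRITE;

	/* Any overlapping bit is enough, so VM_READ alone passes. */
	r.permitted = (vm_flags & access) != 0;
	return r;
}

/* Hypothetical variant: a store must be backed by VM_WRITE. */
static struct fault_result
model_fault_narrowed(unsigned int vm_flags, unsigned int access,
		     bool is_write)
{
	struct fault_result r = { false, 0 };

	assert((access & ~VM_ACCESS_FLAGS) == 0);

	if (is_write)
		access = VM_WRITE;
	if (access == VM_WRITE)
		r.flags |= FAULT_FLAG_WRITE;

	r.permitted = (vm_flags & access) != 0;
	return r;
}
```

With a read-only VMA (vm_flags == VM_READ), access == VM_ACCESS_FLAGS and
is_write == true, the first model passes the permission check with
FAULT_FLAG_WRITE set, which matches the path into hugetlb_wp() in the
backtrace. The narrowed variant rejects the same fault before it reaches
the fault handler.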



