From: Mina Almasry <almasrymina@google.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>,
Linux-MM <linux-mm@kvack.org>,
open list <linux-kernel@vger.kernel.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Peter Xu <peterx@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [External] [PATCH 0/2] Track reserve map changes to restore on error
Date: Wed, 26 May 2021 19:48:00 -0700
Message-ID: <CAHS8izNDTv37XvowTD2SfFSe3kmVDGGbRBRVAQaJ2UMy42ho_g@mail.gmail.com>
In-Reply-To: <CAHS8izN+-GVOp5cowjkT9WBXYf9Xg6BThWin8tWoKg2ZGFia0Q@mail.gmail.com>
On Wed, May 26, 2021 at 4:19 PM Mina Almasry <almasrymina@google.com> wrote:
>
> On Wed, May 26, 2021 at 10:17 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > On 5/25/21 8:19 PM, Muchun Song wrote:
> > > On Wed, May 26, 2021 at 7:31 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > >>
> > >> Here is a modification to the reservation tracking for fixup on errors.
> > >> It is a more general change, but should work for the hugetlb_mcopy_pte_atomic
> > >> case as well.
> > >>
> > >> Perhaps use this as a prerequisite for your fix(es)? Pretty sure this
> > >> will eliminate the need for the call to hugetlb_unreserve_pages.
> > >
> > > Hi Mike,
> > >
> > > It seems like someone is fixing a bug, right? Maybe a link should be
> > > placed in the cover letter so that someone can know what issue
> > > we are facing.
> > >
> >
> > Thanks Muchun,
> >
> > I wanted to first see if these patches would work in the code Mina is
> > modifying. If this works for Mina, then a more formal patch and request
> > for inclusion will be sent.
> >
>
> So a quick test: I applied my patch and yours on top of linus/master,
> removed the hugetlb_unreserve_pages() call that triggered this
> conversation, and ran the userfaultfd test. resv_huge_pages underflows
> again, so on the surface this doesn't quite work as is.
>
> Not quite sure what to do off the top of my head. I think I will try
> to debug why the 3 patches don't work together and I will fix either
> your patch or mine. I haven't taken a deep look yet; I just ran a
> quick test.
>
OK, found the issue. With the setup I described above, the
hugetlb_shared test case passes:

  ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success

The non-shared test case is the one that underflows:

  ./tools/testing/selftests/vm/userfaultfd hugetlb 10 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success

I've debugged a bit, and this messy hunk 'fixes' the underflow in
the non-shared case (sorry for the messiness):

@@ -2329,17 +2340,14 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
 			 */
 			SetHPageRestoreRsvCnt(page);
 	} else {
-		rc = vma_needs_reservation(h, vma, address);
-		if (rc < 0)
-			/*
-			 * See above comment about rare out of
-			 * memory condition.
-			 */
-			SetHPageRestoreRsvCnt(page);
-		else if (rc)
-			vma_add_reservation(h, vma, address);
-		else
-			vma_end_reservation(h, vma, address);
+		resv = inode_resv_map(vma->vm_file->f_mapping->host);
+		if (resv) {
+			int chg = region_del(resv, idx, idx+1);
+			VM_BUG_ON(chg);
+		}

The reason is that on page allocation we region_add() an entry
into the resv_map regardless of whether this is a shared mapping or
not (vma_needs_reservation() + vma_commit_reservation(), which amounts
to region_add() at the end of the day).

To roll this change back on error, we need a region_del() to undo the
region_add(). The code removed above never ends up calling region_del():
vma_needs_reservation() returns 0, because region_chg() sees that there
is already an entry in the resv_map and returns 0.

The VM_BUG_ON() is only there because I'm not sure how to handle that error.
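
To illustrate the argument above, here is a minimal userspace sketch. It
is not kernel code: the toy_* names and the flag-per-index map are
hypothetical stand-ins, and the helpers only mimic the behaviour of
region_chg(), region_add() and region_del() as described in this mail
(locking, region splitting, -ENOMEM and the shared/private distinction
are all ignored).

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NPAGES 8

/* Hypothetical stand-in for a resv_map: one flag per huge page index. */
struct toy_resv_map {
	bool present[TOY_NPAGES];
};

/* Modeled on region_chg(): count indices in [f, t) that have no entry. */
static long toy_region_chg(const struct toy_resv_map *m, long f, long t)
{
	long missing = 0;

	assert(f >= 0 && f <= t && t <= TOY_NPAGES);
	for (long i = f; i < t; i++)
		if (!m->present[i])
			missing++;
	return missing;
}

/* Modeled on region_add(): add entries for [f, t), return how many were new. */
static long toy_region_add(struct toy_resv_map *m, long f, long t)
{
	long added = 0;

	assert(f >= 0 && f <= t && t <= TOY_NPAGES);
	for (long i = f; i < t; i++) {
		if (!m->present[i]) {
			m->present[i] = true;
			added++;
		}
	}
	return added;
}

/* Modeled on region_del(): drop entries for [f, t), return how many went away. */
static long toy_region_del(struct toy_resv_map *m, long f, long t)
{
	long removed = 0;

	assert(f >= 0 && f <= t && t <= TOY_NPAGES);
	for (long i = f; i < t; i++) {
		if (m->present[i]) {
			m->present[i] = false;
			removed++;
		}
	}
	return removed;
}

/* Allocation as described above: it ends in a region_add() for idx. */
static void toy_alloc(struct toy_resv_map *m, long idx)
{
	toy_region_add(m, idx, idx + 1);
}

/* Old error path: it only acts when region_chg() reports a missing entry. */
static void toy_restore_old(struct toy_resv_map *m, long idx)
{
	long rc = toy_region_chg(m, idx, idx + 1);

	if (rc)
		toy_region_add(m, idx, idx + 1);
	/* rc == 0: nothing is removed, so the entry from toy_alloc() stays. */
}

/* Error path from the hunk above: undo the add with a delete. */
static void toy_restore_new(struct toy_resv_map *m, long idx)
{
	toy_region_del(m, idx, idx + 1);
}
```

In this model, calling toy_alloc() and then toy_restore_old() for the
same index leaves the entry in the map, because toy_region_chg() already
sees it and returns 0. Only toy_restore_new() removes it again.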
> > I believe this issue has existed since the introduction of hugetlb
> > reservations in v2.6.18. Since the bug only shows up when we take error
> > paths, the issue may not have been observed. Mina found a similar issue
> > in an error path which could also expose this issue.
> > --
> > Mike Kravetz