From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mina Almasry
Date: Thu, 20 May 2021 19:05:51 -0700
Subject: Re: [PATCH] mm, hugetlb: fix resv_huge_pages underflow on UFFDIO_COPY
To: Mike Kravetz
Cc: Axel Rasmussen, Peter Xu, Linux-MM, Andrew Morton, open list
References: <20210513234309.366727-1-almasrymina@google.com>
 <09dc0712-48e8-8ba2-f170-4c2febcfff83@oracle.com>
 <61c78897-27a2-768b-f4fe-04e24b617ab6@oracle.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Thu, May 20, 2021 at 1:31 PM Mina Almasry wrote:
>
> On Thu, May 20, 2021 at 1:00 PM Mike Kravetz wrote:
> >
> > On 5/20/21 12:21 PM, Mina Almasry wrote:
> > > On Thu, May 20, 2021 at 12:18 PM Mina Almasry wrote:
> > >>
> > >> On Thu, May 13, 2021 at 5:14 PM Mike Kravetz wrote:
> > >>>
> > >>> How about this approach?
> > >>> - Keep the check for hugetlbfs_pagecache_present in hugetlb_mcopy_atomic_pte
> > >>>   that you added. That will catch the race where the page was added to
> > >>>   the cache before entering the routine.
> > >>> - With the above check in place, we only need to worry about the case
> > >>>   where copy_huge_page_from_user fails and we must drop locks. In this
> > >>>   case we:
> > >>>   - Free the page previously allocated.
> > >>>   - Allocate a 'temporary' huge page without consuming reserves. I'm
> > >>>     thinking of something similar to page migration.
> > >>>   - Drop the locks and let the copy_huge_page_from_user be done to the
> > >>>     temporary page.
> > >>>   - When reentering hugetlb_mcopy_atomic_pte after dropping locks (the
> > >>>     *pagep case), we need to once again check
> > >>>     hugetlbfs_pagecache_present.
> > >>>   - We then try to allocate the huge page, which will consume the
> > >>>     reserve. If successful, copy the contents of the temporary page to the
> > >>>     newly allocated page. Free the temporary page.
> > >>>
> > >>> There may be issues with this, and I have not given it deep thought. It
> > >>> does abuse the temporary huge page concept, but perhaps no more than
> > >>> page migration. Things do slow down if the extra page allocation and
> > >>> copy is required, but that would only be the case if copy_huge_page_from_user
> > >>> needs to be done without locks. Not sure, but hoping that is rare.
> > >>
> > >> Just following up on this a bit: I've implemented this approach locally,
> > >> and with it the test passes as-is. When I hack the code so
> > >> that the copy in hugetlb_mcopy_atomic_pte() always fails, I run into
> > >> this edge case, which causes resv_huge_pages to underflow again (this
> > >> time permanently):
> > >>
> > >> - hugetlb_no_page() is called on an index; a page is allocated and
> > >>   inserted into the cache, consuming the reservation.
> > >> - remove_huge_page() is called on this index and the page is removed
> > >>   from the cache.
> > >> - hugetlb_mcopy_atomic_pte() is called on this index; we do not find
> > >>   the page in the cache, so we take this code path and the copy
> > >>   fails.
> > >> - The allocations in this code path seem to double-consume the
> > >>   reservation, and resv_huge_pages underflows.
> > >>
> > >> I'm looking at this edge case to understand why a prior
> > >> remove_huge_page() causes my code to underflow resv_huge_pages.
> > >>
> > >
> > > I should also mention that without a prior remove_huge_page() this code
> > > path works fine, so it seems the fact that the reservation was consumed
> > > before is what causes trouble, but I'm not sure why yet.
> > >
> >
> > Hi Mina,
> >
> > How about quickly posting the code? I may be able to provide more
> > suggestions if I can see the actual code.
>
> Sure thing, I've attached my patch so far. It's quite messy, with prints
> everywhere and VM_BUG_ON() in error paths that I'm not handling yet.
> I've also hacked the code so that the hugetlb_mcopy_atomic_pte() copy
> always fails, so I exercise that code path.
>

Of course, right after I sent out my patch I figured out what's wrong.
It turns out freeing the allocated page when the copy fails is
extremely complicated (or rather, it was complicated to figure out why it
was broken). It turns out I need to:

restore_page_on_error()
if (!HPageRestoreReserve(page)) {
        hugetlb_unreserve_pages(mapping, idx, idx + 1, 1);
}
put_page(page);

This is because even though we always ask to allocate a page from the
reserves, the page may not actually come from the reserves if the VMA
doesn't have a reservation for this index (which is the case if the page
was previously allocated by hugetlb_no_page() and then removed with
remove_huge_page()). So we need to handle both cases correctly.

Sorry for the noise, but this was hard to track down. A patch should be
incoming soon unless I run into other issues on a closer look.

> > --
> > Mike Kravetz
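For anyone following along, below is a minimal sketch of the failure-path
cleanup described above, written against hugetlb_mcopy_atomic_pte()'s locals
(page, src_addr, h, ret, and an 'out' error exit). The 'inode' and 'idx'
names and the exact placement are assumptions for illustration; this is a
sketch, not the posted patch.

/*
 * Illustrative sketch only: drop the freshly allocated page when
 * copy_huge_page_from_user() fails in hugetlb_mcopy_atomic_pte().
 */
ret = copy_huge_page_from_user(page,
			       (const void __user *)src_addr,
			       pages_per_huge_page(h), false);
if (unlikely(ret)) {
	/*
	 * If the page was backed by a VMA reservation, HPageRestoreReserve
	 * is set and freeing the page hands the reservation back on its
	 * own.  If it was not (e.g. a prior hugetlb_no_page() /
	 * remove_huge_page() already consumed the reservation for this
	 * index), the counts charged at allocation time must be dropped
	 * explicitly here, otherwise the later retry double-consumes them
	 * and resv_huge_pages underflows.
	 */
	if (!HPageRestoreReserve(page))
		hugetlb_unreserve_pages(inode, idx, idx + 1, 1);
	put_page(page);
	ret = -ENOENT;
	goto out;
}

Where exactly the unreserve belongs likely depends on how the temporary-page
retry ends up being structured, so treat the above as a starting point only.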