From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yang Shi <shy828301@gmail.com>
Date: Wed, 24 May 2023 15:45:28 -0700
Subject: Re: [PATCH 25/31] mm/gup: remove FOLL_SPLIT_PMD use of pmd_trans_unstable()
To: Hugh Dickins
Cc: Andrew Morton, Mike Kravetz, Mike Rapoport, "Kirill A. Shutemov",
 Matthew Wilcox, David Hildenbrand, Suren Baghdasaryan, Qi Zheng,
 Mel Gorman, Peter Xu, Peter Zijlstra, Will Deacon, Yu Zhao,
 Alistair Popple, Ralph Campbell, Ira Weiny, Steven Price,
 SeongJae Park, Naoya Horiguchi, Christophe Leroy, Zack Rusin,
 Jason Gunthorpe, Axel Rasmussen, Anshuman Khandual, Pasha Tatashin,
 Miaohe Lin, Minchan Kim, Christoph Hellwig, Song Liu,
 Thomas Hellstrom, linux-kernel@vger.kernel.org, linux-mm@kvack.org
In-Reply-To: <3d548f45-9ff9-d73a-83e0-bdd312f524@google.com>
References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> <3d548f45-9ff9-d73a-83e0-bdd312f524@google.com>
On Tue, May 23, 2023 at 9:26 PM Hugh Dickins wrote:
>
> On Mon, 22 May 2023, Yang Shi wrote:
> > On Mon, May 22, 2023 at 7:26 PM Yang Shi wrote:
> > > On Sun, May 21, 2023 at 10:22 PM Hugh Dickins wrote:
> > > >
> > > > There is now no reason for follow_pmd_mask()'s FOLL_SPLIT_PMD block to
> > > > distinguish huge_zero_page from a normal THP: follow_page_pte() handles
> > > > any instability, and here it's a good idea to replace any pmd_none(*pmd)
> > > > by a page table a.s.a.p, in the huge_zero_page case as for a normal THP.
> > > > (Hmm, couldn't the normal THP case have hit an unstably refaulted THP
> > > > before?  But there are only two, exceptional, users of FOLL_SPLIT_PMD.)
> > > >
> > > > Signed-off-by: Hugh Dickins
> > > > ---
> > > >  mm/gup.c | 19 ++++---------------
> > > >  1 file changed, 4 insertions(+), 15 deletions(-)
> > > >
> > > > diff --git a/mm/gup.c b/mm/gup.c
> > > > index bb67193c5460..4ad50a59897f 100644
> > > > --- a/mm/gup.c
> > > > +++ b/mm/gup.c
> > > > @@ -681,21 +681,10 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
> > > >                  return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
> > > >          }
> > > >          if (flags & FOLL_SPLIT_PMD) {
> > > > -                int ret;
> > > > -                page = pmd_page(*pmd);
> > > > -                if (is_huge_zero_page(page)) {
> > > > -                        spin_unlock(ptl);
> > > > -                        ret = 0;
> > > > -                        split_huge_pmd(vma, pmd, address);
> > > > -                        if (pmd_trans_unstable(pmd))
> > > > -                                ret = -EBUSY;
> > >
> > > IIUC the pmd_trans_unstable() check was transferred to the implicit
> > > pmd_none() in pte_alloc(). But it will return -ENOMEM instead of
> > > -EBUSY. Won't it break some userspace? Or the pmd_trans_unstable() is
> > > never true? If so it seems worth mentioning in the commit log about
> > > this return value change.
>
> Thanks a lot for looking at these, but I disagree here.
>
> > Oops, the above comment is not accurate. It will call
> > follow_page_pte() instead of returning -EBUSY if pmd is none.
>
> Yes. Ignoring secondary races, if pmd is none, pte_alloc() will allocate
> an empty page table there, follow_page_pte() find !pte_present and return
> NULL; or if pmd is not none, follow_page_pte() will return no_page_table()
> i.e. NULL. And page NULL ends up with __get_user_pages() having another
> go round, instead of failing with -EBUSY.
>
> Which I'd say is better handling for such a transient case - remember,
> it's split_huge_pmd() (which should always succeed, but might be raced)
> in use there, not split_huge_page() (which might take years for pins to
> be removed before it can succeed).

It sounds like an improvement.
> > For other unstable cases, it will return -ENOMEM instead of -EBUSY.
>
> I don't think so: the possibly-failing __pte_alloc() only gets called
> in the pmd_none() case.

I mean what if pmd is not none for huge zero page. If it is not pmd_none,
pte_alloc() just returns 0, then it returns -ENOMEM instead of -EBUSY. Or
is it impossible that the pmd ends up being pmd_trans_huge or
!pmd_present? It should be very unlikely, for example, migration does
skip huge zero page, but I'm not sure whether there is any corner case
that I missed.

>
> Hugh
>
> > > >
> > > > -                } else {
> > > > -                        spin_unlock(ptl);
> > > > -                        split_huge_pmd(vma, pmd, address);
> > > > -                        ret = pte_alloc(mm, pmd) ? -ENOMEM : 0;
> > > > -                }
> > > > -
> > > > -                return ret ? ERR_PTR(ret) :
> > > > +                spin_unlock(ptl);
> > > > +                split_huge_pmd(vma, pmd, address);
> > > > +                /* If pmd was left empty, stuff a page table in there quickly */
> > > > +                return pte_alloc(mm, pmd) ? ERR_PTR(-ENOMEM) :
> > > >                          follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
> > > >          }
> > > >          page = follow_trans_huge_pmd(vma, address, pmd, flags);
> > > > --
> > > > 2.35.3