From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C8C37C432BE for ; Tue, 17 Aug 2021 20:14:31 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 63AF161154 for ; Tue, 17 Aug 2021 20:14:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 63AF161154 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id E5EC36B0071; Tue, 17 Aug 2021 16:14:30 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E0EDD900003; Tue, 17 Aug 2021 16:14:30 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D2496900002; Tue, 17 Aug 2021 16:14:30 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0216.hostedemail.com [216.40.44.216]) by kanga.kvack.org (Postfix) with ESMTP id B8CE36B0071 for ; Tue, 17 Aug 2021 16:14:30 -0400 (EDT) Received: from smtpin33.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 612DC183ED993 for ; Tue, 17 Aug 2021 20:14:30 +0000 (UTC) X-FDA: 78485675100.33.9FEB4CA Received: from mail-ej1-f54.google.com (mail-ej1-f54.google.com [209.85.218.54]) by imf19.hostedemail.com (Postfix) with ESMTP id 191FDB00045E for ; Tue, 17 Aug 2021 20:14:29 +0000 (UTC) Received: by mail-ej1-f54.google.com with SMTP id lo4so10591ejb.7 for ; Tue, 17 Aug 2021 13:14:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=/HLRdHZP/EooGkLUc5zwmnFLeg6WQf/Ziajpiz1uE/U=; b=KiZcJ6vUGOj98sXLVwnmPaSKOped8mD5JeVABbC3Ootq6gltSjTWAkErBfwyqPuudJ tE4LVd4e9o5xVEr8SkdxSfJEV1s7ORRUI1eleW7T1evKKKWL1sN0KHhF/JBr+1GQu4GI VFIT8KD0eZ9pG2PT9zT7shtb0C3i//HqE/ipFNpLDZ814va1CNQ5u28t2H66cSr5D8eE aDbsiQJmSo92qiak1hKE1uO9QLWFvQN9ShU/LJx3yM8pxf8pVpmzRge01PMevGRn1TdX zwD2C3AFGT8TwvDecG73a8EXw6b5ilLtTTkE8iL0KiigP0Dj0aJZdRHGVm0Y1Acr/eye k2+g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=/HLRdHZP/EooGkLUc5zwmnFLeg6WQf/Ziajpiz1uE/U=; b=CNUpqp7wI3kp7Fh6Fy7TxFgyGxeE9lZ9d/qRc1hUSiK0vHjt02QyFOQHjKrwQt6loV osSpWyvJ4LtWKcKgTSnIYXjFbAKKdXrLXDT835dmEczHTX1ZEiCaMIxmKnWlNqIsWIyQ S3rHV8FlbWgJMC4FZfuVesSfy/KgPrFG0z3V/2i9Or5BDOFZiBb98AIiu5SWYH1F0Aav +RABV/78c6czNaeeqIRYq0zwGVolThkKhD862TZwrJeSDOhaFLRhp3I+9ciTjBQ1WTyB FCsKMeFx9lsxEESCfDpaAhymI1cs0YGqpcJFmcUsAsQpPf0K7Lcr2eD4kNBL32Yn01zq n4yQ== X-Gm-Message-State: AOAM532cw4sT2Tx45yHUhmUxcPLMKQcBmyiK0I9q21wKzSxWHaiNxU5x AnKApBh1JK7RPEwmbp13IXqf3mZrVKYcyQ4/DPM= X-Google-Smtp-Source: ABdhPJxP4ACqIGCV5q+PFZajjpBxoBCVAmS/PYsWCGjGqtJYEPoLJFDSf/FeHk0cxSK1SvrUMezXqzTdNE8OUe4PPb8= X-Received: by 2002:a17:906:491a:: with SMTP id b26mr5894152ejq.25.1629231268841; Tue, 17 Aug 2021 13:14:28 -0700 (PDT) MIME-Version: 1.0 References: <1355343b-acf-4653-ef79-6aee40214ac5@google.com> In-Reply-To: <1355343b-acf-4653-ef79-6aee40214ac5@google.com> From: Yang Shi Date: Tue, 17 Aug 2021 13:14:16 -0700 Message-ID: Subject: Re: [PATCH 6/9] huge tmpfs: SGP_NOALLOC to stop collapse_file() on race To: Hugh Dickins Cc: Andrew Morton , Shakeel Butt , "Kirill A. Shutemov" , Miaohe Lin , Mike Kravetz , Michal Hocko , Rik van Riel , Matthew Wilcox , Linux Kernel Mailing List , Linux MM Content-Type: text/plain; charset="UTF-8" Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=gmail.com header.s=20161025 header.b=KiZcJ6vU; spf=pass (imf19.hostedemail.com: domain of shy828301@gmail.com designates 209.85.218.54 as permitted sender) smtp.mailfrom=shy828301@gmail.com; dmarc=pass (policy=none) header.from=gmail.com X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 191FDB00045E X-Stat-Signature: ipmo9hubxrawq8pbct7xtssf3941it4j X-HE-Tag: 1629231269-127902 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Aug 17, 2021 at 1:17 AM Hugh Dickins wrote: > > khugepaged's collapse_file() currently uses SGP_NOHUGE to tell > shmem_getpage() not to try allocating a huge page, in the very unlikely > event that a racing hole-punch removes the swapped or fallocated page as > soon as i_pages lock is dropped. > > We want to consolidate shmem's huge decisions, removing SGP_HUGE and > SGP_NOHUGE; but cannot quite persuade ourselves that it's okay to regress > the protection in this case - Yang Shi points out that the huge page > would remain indefinitely, charged to root instead of the intended memcg. > > collapse_file() should not even allocate a small page in this case: why > proceed if someone is punching a hole? SGP_READ is almost the right flag > here, except that it optimizes away from a fallocated page, with NULL to > tell caller to fill with zeroes (like a hole); whereas collapse_file()'s > sequence relies on using a cache page. Add SGP_NOALLOC just for this. > > There are too many consecutive "if (page"s there in shmem_getpage_gfp(): > group it better; and fix the outdated "bring it back from swap" comment. > > Signed-off-by: Hugh Dickins Reviewed-by: Yang Shi > --- > include/linux/shmem_fs.h | 1 + > mm/khugepaged.c | 2 +- > mm/shmem.c | 29 +++++++++++++++++------------ > 3 files changed, 19 insertions(+), 13 deletions(-) > > diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h > index 9b7f7ac52351..7d97b15a2f7a 100644 > --- a/include/linux/shmem_fs.h > +++ b/include/linux/shmem_fs.h > @@ -94,6 +94,7 @@ extern unsigned long shmem_partial_swap_usage(struct address_space *mapping, > /* Flag allocation requirements to shmem_getpage */ > enum sgp_type { > SGP_READ, /* don't exceed i_size, don't allocate page */ > + SGP_NOALLOC, /* similar, but fail on hole or use fallocated page */ > SGP_CACHE, /* don't exceed i_size, may allocate page */ > SGP_NOHUGE, /* like SGP_CACHE, but no huge pages */ > SGP_HUGE, /* like SGP_CACHE, huge pages preferred */ > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index b0412be08fa2..045cc579f724 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -1721,7 +1721,7 @@ static void collapse_file(struct mm_struct *mm, > xas_unlock_irq(&xas); > /* swap in or instantiate fallocated page */ > if (shmem_getpage(mapping->host, index, &page, > - SGP_NOHUGE)) { > + SGP_NOALLOC)) { > result = SCAN_FAIL; > goto xa_unlocked; > } > diff --git a/mm/shmem.c b/mm/shmem.c > index 740d48ef1eb5..226ac3a911e9 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -1871,26 +1871,31 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index, > return error; > } > > - if (page) > + if (page) { > hindex = page->index; > - if (page && sgp == SGP_WRITE) > - mark_page_accessed(page); > - > - /* fallocated page? */ > - if (page && !PageUptodate(page)) { > + if (sgp == SGP_WRITE) > + mark_page_accessed(page); > + if (PageUptodate(page)) > + goto out; > + /* fallocated page */ > if (sgp != SGP_READ) > goto clear; > unlock_page(page); > put_page(page); > - page = NULL; > - hindex = index; > } > - if (page || sgp == SGP_READ) > - goto out; > > /* > - * Fast cache lookup did not find it: > - * bring it back from swap or allocate. > + * SGP_READ: succeed on hole, with NULL page, letting caller zero. > + * SGP_NOALLOC: fail on hole, with NULL page, letting caller fail. > + */ > + *pagep = NULL; > + if (sgp == SGP_READ) > + return 0; > + if (sgp == SGP_NOALLOC) > + return -ENOENT; > + > + /* > + * Fast cache lookup and swap lookup did not find it: allocate. > */ > > if (vma && userfaultfd_missing(vma)) { > -- > 2.26.2 >