From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DB483C77B73 for ; Sat, 27 May 2023 19:46:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3F114900003; Sat, 27 May 2023 15:46:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3A119900002; Sat, 27 May 2023 15:46:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 268C6900003; Sat, 27 May 2023 15:46:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 18943900002 for ; Sat, 27 May 2023 15:46:08 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id D07B780293 for ; Sat, 27 May 2023 19:46:07 +0000 (UTC) X-FDA: 80837065974.13.02F2325 Received: from mail-wr1-f44.google.com (mail-wr1-f44.google.com [209.85.221.44]) by imf11.hostedemail.com (Postfix) with ESMTP id E16ED4000C for ; Sat, 27 May 2023 19:46:04 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=gmail.com header.s=20221208 header.b="gcr+i/Oe"; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf11.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.221.44 as permitted sender) smtp.mailfrom=lstoakes@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1685216765; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=yScXvAOo/xUmomxE33DkwJiq9zbdLwNtpoWsDl3jelk=; b=320sPa/LRvQOvTADHa0NSn/cQtF4HjM3iIVewOmAmGZ6fWSv61kypuvfcif/6Slglq0n42 CqxYgln4irRUOg4+nxQOp7tYGwjyzTQGRU0ldFohedkvYvrz4jeKe4UPxWnCb0gEIslGEW uyoXS1m5lNmMPk4BzcrBdKZmmCarSkU= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=gmail.com header.s=20221208 header.b="gcr+i/Oe"; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf11.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.221.44 as permitted sender) smtp.mailfrom=lstoakes@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1685216765; a=rsa-sha256; cv=none; b=XtTH8vL8lEfserCU75tHTmuQv2+ZjGdx6wETDGHOP9TT0Cj6ivUNMOyNxArHeWVpiJdLwZ mbL5adgR6x8Y0L1HQVvncJ77264TBlYI72tl2wo0ssOtwLk5KJKNNe+I3TES5i0rytsK8+ HEisfY7Xz/WlTw7u0wUKxyS90DmoGSo= Received: by mail-wr1-f44.google.com with SMTP id ffacd0b85a97d-307d20548adso1120155f8f.0 for ; Sat, 27 May 2023 12:46:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1685216763; x=1687808763; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=yScXvAOo/xUmomxE33DkwJiq9zbdLwNtpoWsDl3jelk=; b=gcr+i/Oe2B/fp27UmbsngDZwfGTLNeLqWKY5Gxg7SD/h4R7bkbR9gSYMnb3GG+4/GA xg5Th5Kjpje2q0zbpBKEblCxios13UBcRT6/lYgYw221LcSV5g2Z7kWwiL8+cuF8KT/I Q4PUbQYuI8LTmRMNTj8eRBkNXVNRBVFaZtLWAztiv4G/1mIvO6nBjLa/6O0OC3jeZ6mp PM87Vo+eBj2sr1lJzHeGnOHb5HV5umbeOMhXcVDrd2M6GPjkHcawsRu79fyYkZIeNhH2 tq7RRqppBi7g+cs3jNZEWXIfbWpp5nYwo+YybNFvNEr/CsDOXEYu5ofn82CZM5WjUj3Z orTA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685216763; x=1687808763; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=yScXvAOo/xUmomxE33DkwJiq9zbdLwNtpoWsDl3jelk=; b=CYBzqznUmMpfLFUhKSQXUY0TTch4zcp6wsOfqp31lqsQozAFRY/y5mif2LIXA2dLea Y1LkKqNHUrd+j9Pb8arDib5Gd3W7vKHURqnBx2ct2sWOQ4XzVE0fJ2mOA6CuiJ59dqQx Nz37/2yOp9RHSHpSXDym1/KdPz/Gol5IzXhwCEGt9CEO0YWVg03wReNUwlw9PpDsgADZ vlllMQlA0SDhgfJhffDA4teJS87ekRtOwPPSh+Z2DLiQ+s071wNY7DFTHS+UzWU5Ctcj i7f+gT/fx1zxiAoWYillfNafJ7ctNhj6QRIaPcbHV9QXiwE+lKxcOrTm89R0zXlmuhRK dpOQ== X-Gm-Message-State: AC+VfDzEBW3JgytQDiT5y0bVzapfhqWKSHjsToFbOShWr0xBFC3nfA5Q jToQ2/53zIn/hBpGA2p8lxNhWx3jCHE= X-Google-Smtp-Source: ACHHUZ6HI+9ndP9jA2w0d+e7jXcwjj7zyjBoMOcS+oeZ1wbojrMqinOs2X09wzBcOzCtDZ9DZNBguw== X-Received: by 2002:a05:6000:11c4:b0:306:2fab:1f81 with SMTP id i4-20020a05600011c400b003062fab1f81mr4453730wrx.21.1685216763120; Sat, 27 May 2023 12:46:03 -0700 (PDT) Received: from localhost (host81-154-179-160.range81-154.btcentralplus.com. [81.154.179.160]) by smtp.gmail.com with ESMTPSA id u18-20020adfed52000000b003079986fd71sm8897412wro.88.2023.05.27.12.46.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 27 May 2023 12:46:02 -0700 (PDT) Date: Sat, 27 May 2023 20:46:01 +0100 From: Lorenzo Stoakes To: David Howells Cc: Christoph Hellwig , David Hildenbrand , Jens Axboe , Al Viro , Matthew Wilcox , Jan Kara , Jeff Layton , Jason Gunthorpe , Logan Gunthorpe , Hillf Danton , Christian Brauner , Linus Torvalds , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton Subject: Re: [PATCH v4 1/3] mm: Don't pin ZERO_PAGE in pin_user_pages() Message-ID: References: <20230526214142.958751-1-dhowells@redhat.com> <20230526214142.958751-2-dhowells@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230526214142.958751-2-dhowells@redhat.com> X-Rspamd-Queue-Id: E16ED4000C X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: x3azaqo9h4zttmd5bjh6yjbxj9dxbmy4 X-HE-Tag: 1685216764-910306 X-HE-Meta: U2FsdGVkX19+hDMjCcN5P8qJyOe3ZV2/IgGVkO15XtL2cEr9OFVCAN1T1KGQgsDcIR7w8j9d2689gSWPfVxl6hbnlI2bigMwuz3zpQysDvm7eMu0Gg1ZCuiPJ4YSuGECP8v+iS9NZd4orq0nrTZkLehGyLtaPzWaXNxxLwZgbA/Jl4FbqnXBagzIJ9ru5Ua+FfS1cCFFXjL5GGJgU5BxEOn3EQCcaJpjBCeGA87M6JykzW1/YSr2RY3US7UF2GGSUAzpEHBuB+tMVzWDvjK6hVMuXIq3wurge90LmmEMy2sAWV6Ix5XNjMzFTR2tvE6wS2kn6T3TZbJhDVYhHbrz0vIFKG8/ziM2MTFDOfW7b4dRul+jkSH9ecIQ4q8mTGPiGClxanZJa55Z46AvZv1QlhDMs/I9V4qtxXZm5W2GIClsMKCeKUfmRw0KIWPN0qhNl0mY3MSsWLnj8wHdo0LEOi5qaBbc7yZX7ZVv+Q1Z2WaWuy4czUrQo7IVVTEG39V8l0FnpCodrNTo14xCpCQ0962ewEWXdvy56YnP4DkfFAHc+MWDJcUjN45xEAJ5AFN/349OEIfE4Gd170IJSGWn3m9+ICxKkIVQnWFq2CMlOVqAjpXA6KtCjPrhrv4CIQehQwO3HsiicKFKgD69BJsgXskXsK9yg8rKTq1k8MYFITxx0/dqWKliOC4WzkVmp8FkVmuleHV/igZ9YPjER3ojQeu2y44TO7RQ+oHv3eCtSoKaWCchqfPs+8Fmc9Fke5tbce/ku5PyUUFw5XgvUuhW/F782EFG06+SNzbCDZptSlIj16LbbN1ydl5iicrDW4vL+T/m+UNPozhYLgsO8+NFRS8LxjChkfnlgeVNl57vtx1feSWfh+gZga23nt2z5lqDGq+WZlbv/qyv3DjxccXnpBL1Xj9Gdtt2dq6axhtWgaKxCdtYYKSV8kJQflbgbq/A1qSItu2UN/JbA9ctsok KGq2xCd5 pifVH2oxE16EkDa9+ix3t6hMrT5kLD8TX4vzQkMYJUQVyzEpRjr/EF3IQ5n7bFTM576i4I0zKleqUz0eIilXyG6l5heP9waEUBijORDF2VbM/7Da2DSb6j9yZW/+v8F0G39G0IeNVjsAX7tI5REjfqv8ljwSPbx/CZrzjk2T/zZRQsngb6ifyo68LDbN0iQZloIgsjFNdmQIOSVUIfsu5rxIgkAljhcFabOJWmOXjG9wrdP7fwGaGKipcMsc8AhswC8MivQQG+3KQcSR4NcUSQpcNGvdnK9iND59dybbrTeKfIfc8Rq28MsVkA51AtS10hk1PCBTMfNrwzO6tBJLaz0UJDOINZJLOwjiJ06OiBvOnIbIKCOOQd9jKlrNdYOXg5Ow/OpLDEycW26nflHt+qeNLw0HiP2m1quM5D+o1KytnZj9iqtFIlu/G0nYOXRx2zay5YbCvnz2BX/J99AGNmN3/uwFWCYAXBfHzWNimGvKMUvDrTU4UmBLsxbz4tp1pWM1DPP7DA7yG3qHOENgdB7OS5qNKVbI6ycF8PmHx6its3eY54mzOvAhwORvgnztRKRzbL43n6Y4cClOfpgEUh41xB7adlXhwzRQxj7zrdzyjh2lzhQjLMe9BljWerLDNF0MIEpC/s0C/aiXZpVKEPtLeWP0pELCEu5dNEe5KRHoDBCmiqm2X67t9o+V1kkWO173mwgBX4dCZD37uPciFk2mxqRB4p9TBZnEvfjTHBSop5o73rwpOPMFo1xsILnbmiWXBpFk/7R2oTMxC4t/o6JnDBCczuMBb/7KN+rlQ+7HLMDlGwuxd6fNvgrm1PUOn5K7v+1h30Lifwzmn+8z4xoNT91Fvnz2ckRk2g00oWqmfSB4e8rttEIwJskyljfOyEE9Dfzl6cBrG0687/4zHQWoGYTyldndcBEqeI8f/V6ybxcQ= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, May 26, 2023 at 10:41:40PM +0100, David Howells wrote: > Make pin_user_pages*() leave a ZERO_PAGE unpinned if it extracts a pointer > to it from the page tables and make unpin_user_page*() correspondingly > ignore a ZERO_PAGE when unpinning. We don't want to risk overrunning a > zero page's refcount as we're only allowed ~2 million pins on it - > something that userspace can conceivably trigger. > > Add a pair of functions to test whether a page or a folio is a ZERO_PAGE. > > Signed-off-by: David Howells > cc: Christoph Hellwig > cc: David Hildenbrand > cc: Lorenzo Stoakes > cc: Andrew Morton > cc: Jens Axboe > cc: Al Viro > cc: Matthew Wilcox > cc: Jan Kara > cc: Jeff Layton > cc: Jason Gunthorpe > cc: Logan Gunthorpe > cc: Hillf Danton > cc: Christian Brauner > cc: Linus Torvalds > cc: linux-fsdevel@vger.kernel.org > cc: linux-block@vger.kernel.org > cc: linux-kernel@vger.kernel.org > cc: linux-mm@kvack.org > --- > > Notes: > ver #3) > - Move is_zero_page() and is_zero_folio() to mm.h for dependency reasons. > - Add more comments and adjust the docs. > > ver #2) > - Fix use of ZERO_PAGE(). > - Add is_zero_page() and is_zero_folio() wrappers. > - Return the zero page obtained, not ZERO_PAGE(0) unconditionally. > > Documentation/core-api/pin_user_pages.rst | 6 +++++ > include/linux/mm.h | 26 +++++++++++++++++-- > mm/gup.c | 31 ++++++++++++++++++++++- > 3 files changed, 60 insertions(+), 3 deletions(-) > > diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst > index 9fb0b1080d3b..d3c1f6d8c0e0 100644 > --- a/Documentation/core-api/pin_user_pages.rst > +++ b/Documentation/core-api/pin_user_pages.rst > @@ -112,6 +112,12 @@ pages: > This also leads to limitations: there are only 31-10==21 bits available for a > counter that increments 10 bits at a time. > > +* Because of that limitation, special handling is applied to the zero pages > + when using FOLL_PIN. We only pretend to pin a zero page - we don't alter its > + refcount or pincount at all (it is permanent, so there's no need). The > + unpinning functions also don't do anything to a zero page. This is > + transparent to the caller. > + Great! Cheers. :) > * Callers must specifically request "dma-pinned tracking of pages". In other > words, just calling get_user_pages() will not suffice; a new set of functions, > pin_user_page() and related, must be used. > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 27ce77080c79..3c2f6b452586 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1910,6 +1910,28 @@ static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma, > return page_maybe_dma_pinned(page); > } > > +/** > + * is_zero_page - Query if a page is a zero page > + * @page: The page to query > + * > + * This returns true if @page is one of the permanent zero pages. > + */ > +static inline bool is_zero_page(const struct page *page) > +{ > + return is_zero_pfn(page_to_pfn(page)); > +} > + > +/** > + * is_zero_folio - Query if a folio is a zero page > + * @folio: The folio to query > + * > + * This returns true if @folio is one of the permanent zero pages. > + */ > +static inline bool is_zero_folio(const struct folio *folio) > +{ > + return is_zero_page(&folio->page); > +} > + > /* MIGRATE_CMA and ZONE_MOVABLE do not allow pin pages */ > #ifdef CONFIG_MIGRATION > static inline bool is_longterm_pinnable_page(struct page *page) > @@ -1920,8 +1942,8 @@ static inline bool is_longterm_pinnable_page(struct page *page) > if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE) > return false; > #endif > - /* The zero page may always be pinned */ > - if (is_zero_pfn(page_to_pfn(page))) > + /* The zero page can be "pinned" but gets special handling. */ > + if (is_zero_page(page)) > return true; > > /* Coherent device memory must always allow eviction. */ > diff --git a/mm/gup.c b/mm/gup.c > index bbe416236593..ad28261dcafd 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -51,7 +51,8 @@ static inline void sanity_check_pinned_pages(struct page **pages, > struct page *page = *pages; > struct folio *folio = page_folio(page); > > - if (!folio_test_anon(folio)) > + if (is_zero_page(page) || > + !folio_test_anon(folio)) > continue; > if (!folio_test_large(folio) || folio_test_hugetlb(folio)) > VM_BUG_ON_PAGE(!PageAnonExclusive(&folio->page), page); > @@ -131,6 +132,13 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags) > else if (flags & FOLL_PIN) { > struct folio *folio; > > + /* > + * Don't take a pin on the zero page - it's not going anywhere > + * and it is used in a *lot* of places. > + */ > + if (is_zero_page(page)) > + return page_folio(page); > + > /* > * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a > * right zone, so fail and let the caller fall back to the slow > @@ -180,6 +188,8 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags) > static void gup_put_folio(struct folio *folio, int refs, unsigned int flags) > { > if (flags & FOLL_PIN) { > + if (is_zero_folio(folio)) > + return; > node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs); > if (folio_test_large(folio)) > atomic_sub(refs, &folio->_pincount); > @@ -224,6 +234,13 @@ int __must_check try_grab_page(struct page *page, unsigned int flags) > if (flags & FOLL_GET) > folio_ref_inc(folio); > else if (flags & FOLL_PIN) { > + /* > + * Don't take a pin on the zero page - it's not going anywhere > + * and it is used in a *lot* of places. > + */ > + if (is_zero_page(page)) > + return 0; > + > /* > * Similar to try_grab_folio(): be sure to *also* > * increment the normal page refcount field at least once, > @@ -3079,6 +3096,9 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast); > * > * FOLL_PIN means that the pages must be released via unpin_user_page(). Please > * see Documentation/core-api/pin_user_pages.rst for further details. > + * > + * Note that if a zero_page is amongst the returned pages, it will not have > + * pins in it and unpin_user_page() will not remove pins from it. > */ > int pin_user_pages_fast(unsigned long start, int nr_pages, > unsigned int gup_flags, struct page **pages) > @@ -3110,6 +3130,9 @@ EXPORT_SYMBOL_GPL(pin_user_pages_fast); > * > * FOLL_PIN means that the pages must be released via unpin_user_page(). Please > * see Documentation/core-api/pin_user_pages.rst for details. > + * > + * Note that if a zero_page is amongst the returned pages, it will not have > + * pins in it and unpin_user_page*() will not remove pins from it. > */ > long pin_user_pages_remote(struct mm_struct *mm, > unsigned long start, unsigned long nr_pages, > @@ -3143,6 +3166,9 @@ EXPORT_SYMBOL(pin_user_pages_remote); > * > * FOLL_PIN means that the pages must be released via unpin_user_page(). Please > * see Documentation/core-api/pin_user_pages.rst for details. > + * > + * Note that if a zero_page is amongst the returned pages, it will not have > + * pins in it and unpin_user_page*() will not remove pins from it. > */ > long pin_user_pages(unsigned long start, unsigned long nr_pages, > unsigned int gup_flags, struct page **pages, > @@ -3161,6 +3187,9 @@ EXPORT_SYMBOL(pin_user_pages); > * pin_user_pages_unlocked() is the FOLL_PIN variant of > * get_user_pages_unlocked(). Behavior is the same, except that this one sets > * FOLL_PIN and rejects FOLL_GET. > + * > + * Note that if a zero_page is amongst the returned pages, it will not have > + * pins in it and unpin_user_page*() will not remove pins from it. > */ > long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages, > struct page **pages, unsigned int gup_flags) > All looks good, thanks for adding comments/updating doc! Reviewed-by: Lorenzo Stoakes