Date: Fri, 26 May 2023 09:10:33 +0100
From: Lorenzo Stoakes
To: David Howells
Cc: Christoph Hellwig, David Hildenbrand, Jens Axboe, Al Viro,
	Matthew Wilcox, Jan Kara, Jeff Layton, Jason Gunthorpe,
	Logan Gunthorpe, Hillf Danton, Christian Brauner, Linus Torvalds,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton
Subject: Re: [RFC PATCH v2 1/3] mm: Don't pin ZERO_PAGE in pin_user_pages()
Message-ID: <89c7f535-8fc5-4480-845f-de94f335d332@lucifer.local>
References: <20230525223953.225496-1-dhowells@redhat.com>
 <20230525223953.225496-2-dhowells@redhat.com>
In-Reply-To: <20230525223953.225496-2-dhowells@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
On Thu, May 25, 2023 at 11:39:51PM +0100, David Howells wrote:
> Make pin_user_pages*() leave a ZERO_PAGE unpinned if it extracts a pointer
> to it from the page tables and make unpin_user_page*() correspondingly
> ignore a ZERO_PAGE when unpinning. We don't want to risk overrunning a
> zero page's refcount as we're only allowed ~2 million pins on it -
> something that userspace can conceivably trigger.

I guess we're not quite as concerned about FOLL_GET because FOLL_GET should
be ephemeral and FOLL_PIN (horrifically) adds GUP_PIN_COUNTING_BIAS each
time?

>
> Add a pair of functions to test whether a page or a folio is a ZERO_PAGE.
>
> Signed-off-by: David Howells
> cc: Christoph Hellwig
> cc: David Hildenbrand
> cc: Andrew Morton
> cc: Jens Axboe
> cc: Al Viro
> cc: Matthew Wilcox
> cc: Jan Kara
> cc: Jeff Layton
> cc: Jason Gunthorpe
> cc: Logan Gunthorpe
> cc: Hillf Danton
> cc: Christian Brauner
> cc: Linus Torvalds
> cc: linux-fsdevel@vger.kernel.org
> cc: linux-block@vger.kernel.org
> cc: linux-kernel@vger.kernel.org
> cc: linux-mm@kvack.org
> ---
>
> Notes:
>     ver #2)
>      - Fix use of ZERO_PAGE().
>      - Add is_zero_page() and is_zero_folio() wrappers.
>      - Return the zero page obtained, not ZERO_PAGE(0) unconditionally.
>
>  include/linux/pgtable.h | 10 ++++++++++
>  mm/gup.c                | 25 ++++++++++++++++++++++++-
>  2 files changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index c5a51481bbb9..2b0431a11de2 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1245,6 +1245,16 @@ static inline unsigned long my_zero_pfn(unsigned long addr)
>  }
>  #endif /* CONFIG_MMU */
>
> +static inline bool is_zero_page(const struct page *page)
> +{
> +	return is_zero_pfn(page_to_pfn(page));
> +}
> +
> +static inline bool is_zero_folio(const struct folio *folio)
> +{
> +	return is_zero_page(&folio->page);
> +}
> +
>  #ifdef CONFIG_MMU
>
>  #ifndef CONFIG_TRANSPARENT_HUGEPAGE
> diff --git a/mm/gup.c b/mm/gup.c
> index bbe416236593..69b002628f5d 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -51,7 +51,8 @@ static inline void sanity_check_pinned_pages(struct page **pages,
>  		struct page *page = *pages;
>  		struct folio *folio = page_folio(page);
>
> -		if (!folio_test_anon(folio))
> +		if (is_zero_page(page) ||
> +		    !folio_test_anon(folio))
>  			continue;
>  		if (!folio_test_large(folio) || folio_test_hugetlb(folio))
>  			VM_BUG_ON_PAGE(!PageAnonExclusive(&folio->page), page);
> @@ -131,6 +132,13 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
>  	else if (flags & FOLL_PIN) {
>  		struct folio *folio;
>
> +		/*
> +		 * Don't take a pin on the zero page - it's not going anywhere
> +		 * and it is used in a *lot* of places.
> +		 */
> +		if (is_zero_page(page))
> +			return page_folio(page);
> +

This will capture huge page cases too, which have folio->_pincount and thus
don't suffer the GUP_PIN_COUNTING_BIAS issue; however it is equally logical
to simply skip these when pinning.

This does make me think that we should just skip pinning for FOLL_GET cases
too - there's literally no sane reason we should be pinning zero pages in
any case (unless I'm missing something!)
>  		/*
>  		 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
>  		 * right zone, so fail and let the caller fall back to the slow
> @@ -180,6 +188,8 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
>  static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
>  {
>  	if (flags & FOLL_PIN) {
> +		if (is_zero_folio(folio))
> +			return;
>  		node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
>  		if (folio_test_large(folio))
>  			atomic_sub(refs, &folio->_pincount);
> @@ -224,6 +234,13 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
>  	if (flags & FOLL_GET)
>  		folio_ref_inc(folio);
>  	else if (flags & FOLL_PIN) {
> +		/*
> +		 * Don't take a pin on the zero page - it's not going anywhere
> +		 * and it is used in a *lot* of places.
> +		 */
> +		if (is_zero_page(page))
> +			return 0;
> +
>  		/*
>  		 * Similar to try_grab_folio(): be sure to *also*
>  		 * increment the normal page refcount field at least once,
> @@ -3079,6 +3096,9 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast);
>   *
>   * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
>   * see Documentation/core-api/pin_user_pages.rst for further details.
> + *
> + * Note that if the zero_page is amongst the returned pages, it will not have
> + * pins in it and unpin_user_page() will not remove pins from it.
>   */
>  int pin_user_pages_fast(unsigned long start, int nr_pages,
>  			unsigned int gup_flags, struct page **pages)
> @@ -3161,6 +3181,9 @@ EXPORT_SYMBOL(pin_user_pages);
>   * pin_user_pages_unlocked() is the FOLL_PIN variant of
>   * get_user_pages_unlocked(). Behavior is the same, except that this one sets
>   * FOLL_PIN and rejects FOLL_GET.
> + *
> + * Note that if the zero_page is amongst the returned pages, it will not have
> + * pins in it and unpin_user_page() will not remove pins from it.
>   */
>  long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>  			     struct page **pages, unsigned int gup_flags)
>

Shouldn't this comment be added to pup() and pup_remote() also?

Also I think it's well worth updating
Documentation/core-api/pin_user_pages.rst to reflect this, as that document
is explicitly referenced by a few comments and this is a worthwhile corner
case to cover.

Another nitty thing I noticed is in is_longterm_pinnable_page():

	/* The zero page may always be pinned */
	if (is_zero_pfn(page_to_pfn(page)))
		return true;

Strictly speaking we are no longer 'pinning' it here - rather allowing the
pin to succeed without actually taking one - so to be super pedantic it's
perhaps worth updating this comment too.

Other than the pedantic nits, this patch looks good and makes a lot of
sense, so:

Reviewed-by: Lorenzo Stoakes