From: Oscar Salvador <osalvador@suse.de>
To: John Hubbard <jhubbard@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org, David Hildenbrand <david@redhat.com>,
Vivek Kasireddy <vivek.kasireddy@intel.com>,
Dave Airlie <airlied@redhat.com>,
Gerd Hoffmann <kraxel@redhat.com>,
Matthew Wilcox <willy@infradead.org>,
Christoph Hellwig <hch@infradead.org>,
Jason Gunthorpe <jgg@nvidia.com>, Peter Xu <peterx@redhat.com>,
Arnd Bergmann <arnd@arndb.de>,
Daniel Vetter <daniel.vetter@ffwll.ch>,
Dongwon Kim <dongwon.kim@intel.com>,
Hugh Dickins <hughd@google.com>,
Junxiao Chang <junxiao.chang@intel.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
linux-stable@vger.kernel.org
Subject: Re: [PATCH v2 1/1] [PATCH] mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
Date: Wed, 6 Nov 2024 10:23:34 +0100
Message-ID: <Zys1luxxLWwy0yXh@localhost.localdomain>
In-Reply-To: <20241105032944.141488-2-jhubbard@nvidia.com>
On Mon, Nov 04, 2024 at 07:29:44PM -0800, John Hubbard wrote:
> commit 53ba78de064b ("mm/gup: introduce
> check_and_migrate_movable_folios()") created a new constraint on the
> pin_user_pages*() API family: a potentially large internal allocation
> must now occur, for FOLL_LONGTERM cases.
>
> A user-visible consequence has now appeared: user space can no longer
> pin more than 2GB of memory on x86_64. That's because, on a 4KB
> PAGE_SIZE system, when user space tries to (indirectly, via a device
> driver that calls pin_user_pages()) pin 2GB, this requires an allocation
> of a folio pointers array of MAX_PAGE_ORDER size, which is the limit for
> kmalloc().
>
> In addition to the directly visible effect described above, there is
> also the problem of adding an unnecessary allocation. The **pages array
> argument has already been allocated, and there is no need for a
> redundant **folios array allocation in this case.
>
> Fix this by avoiding the new allocation entirely. This is done by
> referring to either the original page[i] within **pages, or to the
> associated folio. Thanks to David Hildenbrand for suggesting this
> approach and for providing the initial implementation (which I've tested
> and adjusted slightly) as well.
>
> Fixes: 53ba78de064b ("mm/gup: introduce check_and_migrate_movable_folios()")
> Suggested-by: David Hildenbrand <david@redhat.com>
> Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jason Gunthorpe <jgg@nvidia.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Dongwon Kim <dongwon.kim@intel.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Junxiao Chang <junxiao.chang@intel.com>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: linux-stable@vger.kernel.org
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
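
For anyone following along, the arithmetic behind the 2GB figure quoted
above can be sketched as below. This is only an illustration, not kernel
code: the SKETCH_* constants and both helper names are made up here, and
the values assume 4KB pages, 8-byte pointers and MAX_PAGE_ORDER == 10
(which I believe is the x86_64 default).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, hard-coded for illustration only. */
#define SKETCH_PAGE_SHIFT     12 /* 4KB PAGE_SIZE */
#define SKETCH_MAX_PAGE_ORDER 10 /* assumed x86_64 default */

/* Largest contiguous allocation: 2^MAX_PAGE_ORDER pages. */
static unsigned long long max_alloc_bytes(void)
{
	return 1ULL << (SKETCH_MAX_PAGE_ORDER + SKETCH_PAGE_SHIFT);
}

/* Size of an array holding one pointer per pinned page. */
static unsigned long long folio_array_bytes(unsigned long long pin_bytes,
					    size_t ptr_size)
{
	assert(ptr_size > 0);
	return (pin_bytes >> SKETCH_PAGE_SHIFT) * ptr_size;
}
```

Under those assumptions, pinning 2GB needs 524288 pointers, i.e. a 4MB
array, which is exactly the largest allocation. Anything above 2GB no
longer fits.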

Hi John, thanks for doing this.

Reviewed-by: Oscar Salvador <osalvador@suse.de>

Nit below:

> +static int
> +migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
> +				   struct pages_or_folios *pofs)
>  {
>  	int ret;
>  	unsigned long i;
>  
> -	for (i = 0; i < nr_folios; i++) {
> -		struct folio *folio = folios[i];
> +	for (i = 0; i < pofs->nr_entries; i++) {
> +		struct folio *folio = pofs_get_folio(pofs, i);
>  
>  		if (folio_is_device_coherent(folio)) {
>  			/*
> @@ -2344,7 +2380,7 @@ static int migrate_longterm_unpinnable_folios(
>  			 * convert the pin on the source folio to a normal
>  			 * reference.
>  			 */
> -			folios[i] = NULL;
> +			pofs_clear_entry(pofs, i);
>  			folio_get(folio);
>  			gup_put_folio(folio, 1, FOLL_PIN);
>  
> @@ -2363,8 +2399,8 @@ static int migrate_longterm_unpinnable_folios(
>  		 * calling folio_isolate_lru() which takes a reference so the
>  		 * folio won't be freed if it's migrating.
>  		 */
> -		unpin_folio(folios[i]);
> -		folios[i] = NULL;
> +		unpin_folio(pofs_get_folio(pofs, i));

We already retrieved the folio at the top of the loop, so can't we just
pass 'folio' to unpin_folio() here instead of calling pofs_get_folio()
again?

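
The nit above can be sketched as a standalone userspace mock. Everything
here is a stand-in, not the kernel code: the struct layouts, the
pofs_lookups counter and unpin_all() are hypothetical, and only the
names pofs_get_folio(), pofs_clear_entry() and unpin_folio() come from
the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins, NOT the kernel definitions. */
struct folio { int pincount; };
struct page { struct folio *folio; };

struct pages_or_folios {
	union {
		struct page **pages;
		struct folio **folios;
		void **entries;
	};
	bool has_folios;
	long nr_entries;
};

static int pofs_lookups; /* hypothetical counter, for illustration */

static struct folio *pofs_get_folio(struct pages_or_folios *pofs, long i)
{
	assert(i >= 0 && i < pofs->nr_entries);
	pofs_lookups++;
	if (pofs->has_folios)
		return pofs->folios[i];
	/* Stand-in for page_folio(). */
	return pofs->pages[i] ? pofs->pages[i]->folio : NULL;
}

static void pofs_clear_entry(struct pages_or_folios *pofs, long i)
{
	pofs->entries[i] = NULL;
}

static void unpin_folio(struct folio *folio)
{
	folio->pincount--;
}

/* Hypothetical loop: one lookup per entry, then reuse the local. */
static void unpin_all(struct pages_or_folios *pofs)
{
	for (long i = 0; i < pofs->nr_entries; i++) {
		struct folio *folio = pofs_get_folio(pofs, i);

		if (!folio)
			continue;
		unpin_folio(folio); /* no second pofs_get_folio() */
		pofs_clear_entry(pofs, i);
	}
}
```

With the local reused, the order of unpin_folio() and pofs_clear_entry()
also stops mattering, since nothing reads the entry back after it has
been cleared.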
--
Oscar Salvador
SUSE Labs