Date: Wed, 6 Nov 2024 10:23:34 +0100
From: Oscar Salvador
To: John Hubbard
Cc: Andrew Morton, LKML, linux-mm@kvack.org, David Hildenbrand,
	Vivek Kasireddy, Dave Airlie, Gerd Hoffmann, Matthew Wilcox,
	Christoph Hellwig, Jason Gunthorpe, Peter Xu, Arnd Bergmann,
	Daniel Vetter, Dongwon Kim, Hugh Dickins, Junxiao Chang,
	Mike Kravetz, linux-stable@vger.kernel.org
Subject: Re: [PATCH v2 1/1] [PATCH] mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
References: <20241105032944.141488-1-jhubbard@nvidia.com> <20241105032944.141488-2-jhubbard@nvidia.com>
In-Reply-To: <20241105032944.141488-2-jhubbard@nvidia.com>

On Mon, Nov 04, 2024 at 07:29:44PM -0800, John Hubbard wrote:
> commit 53ba78de064b ("mm/gup: introduce
> check_and_migrate_movable_folios()") created a new constraint on the
> pin_user_pages*() API family: a potentially large internal allocation
> must now occur, for FOLL_LONGTERM cases.
>
> A user-visible consequence has now appeared: user space can no longer
> pin more than 2GB of memory anymore on x86_64. That's because, on a 4KB
> PAGE_SIZE system, when user space tries to (indirectly, via a device
> driver that calls pin_user_pages()) pin 2GB, this requires an allocation
> of a folio pointers array of MAX_PAGE_ORDER size, which is the limit for
> kmalloc().
>
> In addition to the directly visible effect described above, there is
> also the problem of adding an unnecessary allocation. The **pages array
> argument has already been allocated, and there is no need for a
> redundant **folios array allocation in this case.
>
> Fix this by avoiding the new allocation entirely. This is done by
> referring to either the original page[i] within **pages, or to the
> associated folio. Thanks to David Hildenbrand for suggesting this
> approach and for providing the initial implementation (which I've tested
> and adjusted slightly) as well.
>
> Fixes: 53ba78de064b ("mm/gup: introduce check_and_migrate_movable_folios()")
> Suggested-by: David Hildenbrand
> Cc: Vivek Kasireddy
> Cc: Dave Airlie
> Cc: Gerd Hoffmann
> Cc: Matthew Wilcox
> Cc: Christoph Hellwig
> Cc: Jason Gunthorpe
> Cc: Peter Xu
> Cc: Arnd Bergmann
> Cc: Daniel Vetter
> Cc: Dongwon Kim
> Cc: Hugh Dickins
> Cc: Junxiao Chang
> Cc: Mike Kravetz
> Cc: Oscar Salvador
> Cc: linux-stable@vger.kernel.org
> Signed-off-by: John Hubbard

Hi John,

thanks for doing this.

Reviewed-by: Oscar Salvador

Nit below:

> +static int
> +migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
> +				   struct pages_or_folios *pofs)
>  {
>  	int ret;
>  	unsigned long i;
>
> -	for (i = 0; i < nr_folios; i++) {
> -		struct folio *folio = folios[i];
> +	for (i = 0; i < pofs->nr_entries; i++) {
> +		struct folio *folio = pofs_get_folio(pofs, i);
>
>  		if (folio_is_device_coherent(folio)) {
>  			/*
> @@ -2344,7 +2380,7 @@ static int migrate_longterm_unpinnable_folios(
>  			 * convert the pin on the source folio to a normal
>  			 * reference.
>  			 */
> -			folios[i] = NULL;
> +			pofs_clear_entry(pofs, i);
>  			folio_get(folio);
>  			gup_put_folio(folio, 1, FOLL_PIN);
>
> @@ -2363,8 +2399,8 @@ static int migrate_longterm_unpinnable_folios(
>  		 * calling folio_isolate_lru() which takes a reference so the
>  		 * folio won't be freed if it's migrating.
>  		 */
> -		unpin_folio(folios[i]);
> -		folios[i] = NULL;
> +		unpin_folio(pofs_get_folio(pofs, i));

We already retrieved the folio at the top of the loop. Can't we pass
'folio' directly and skip the second pofs_get_folio() call here?

-- 
Oscar Salvador
SUSE Labs
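
To make the nit concrete, here is a minimal userspace sketch of the loop
with the local 'folio' reused. It is only an illustration under stated
assumptions, not the kernel code: the struct folio and struct page
stand-ins, the layout of struct pages_or_folios (only nr_entries is
visible in the quoted hunks), the bodies of pofs_get_folio(),
pofs_clear_entry() and unpin_folio(), and the helpers unpin_all(),
demo_lookup_count() and the lookups counter are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins, NOT the kernel definitions. */
struct folio { int refcount; };
struct page { struct folio *folio; };

/* Hypothetical layout: only nr_entries appears in the quoted diff. */
struct pages_or_folios {
	union {
		struct page **pages;
		struct folio **folios;
	};
	bool has_folios;
	long nr_entries;
};

static int lookups;	/* counts accessor calls, for this demo only */

/* Mock accessor: return the folio behind entry i. */
static struct folio *pofs_get_folio(struct pages_or_folios *pofs, long i)
{
	lookups++;
	if (pofs->has_folios)
		return pofs->folios[i];
	return pofs->pages[i]->folio;	/* stands in for a page-to-folio lookup */
}

/* Mock: clear entry i in whichever array is in use. */
static void pofs_clear_entry(struct pages_or_folios *pofs, long i)
{
	if (pofs->has_folios)
		pofs->folios[i] = NULL;
	else
		pofs->pages[i] = NULL;
}

/* Mock: drop one reference. */
static void unpin_folio(struct folio *folio)
{
	folio->refcount--;
}

/* Loop shaped like the hunk under review, reusing the local 'folio'. */
static void unpin_all(struct pages_or_folios *pofs)
{
	for (long i = 0; i < pofs->nr_entries; i++) {
		struct folio *folio = pofs_get_folio(pofs, i);

		unpin_folio(folio);	/* no second pofs_get_folio() call */
		pofs_clear_entry(pofs, i);
	}
}

/* Returns the number of accessor calls made for a two-entry pages array. */
int demo_lookup_count(void)
{
	struct folio f[2] = { { 1 }, { 1 } };
	struct page p[2] = { { &f[0] }, { &f[1] } };
	struct page *pages[2] = { &p[0], &p[1] };
	struct pages_or_folios pofs = {
		.pages = pages, .has_folios = false, .nr_entries = 2,
	};

	lookups = 0;
	unpin_all(&pofs);
	assert(f[0].refcount == 0 && f[1].refcount == 0);
	assert(pages[0] == NULL && pages[1] == NULL);
	return lookups;
}
```

In this mock, demo_lookup_count() makes one accessor call per entry;
writing unpin_folio(pofs_get_folio(pofs, i)) instead would make two per
entry for the same result.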