From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <24f7d8a0-ab92-4544-91dd-5241062aad23@gmail.com>
Date: Fri, 22 Nov 2024 14:54:19 +0000
From: Usama Arif <usamaarif642@gmail.com>
Subject: Re: [PATCH RFC v3 4/4] mm: fall back to four small folios if mTHP allocation fails
To: Barry Song <21cnbao@gmail.com>, akpm@linux-foundation.org, linux-mm@kvack.org
Cc: axboe@kernel.dk, bala.seshasayee@linux.intel.com, chrisl@kernel.org,
 david@redhat.com, hannes@cmpxchg.org, kanchana.p.sridhar@intel.com,
 kasong@tencent.com, linux-block@vger.kernel.org, minchan@kernel.org,
 nphamcs@gmail.com, ryan.roberts@arm.com, senozhatsky@chromium.org,
 surenb@google.com, terrelln@fb.com, v-songbaohua@oppo.com,
 wajdi.k.feghali@intel.com, willy@infradead.org, ying.huang@intel.com,
 yosryahmed@google.com, yuzhao@google.com, zhengtangquan@oppo.com,
 zhouchengming@bytedance.com, Chuanhua Han
In-Reply-To: <20241121222521.83458-5-21cnbao@gmail.com>
References: <20241121222521.83458-1-21cnbao@gmail.com>
 <20241121222521.83458-5-21cnbao@gmail.com>
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 21/11/2024 22:25, Barry Song wrote:
> From: Barry Song
>
> The swapfile can compress/decompress at 4 * PAGES granularity, reducing
> CPU usage and improving the compression ratio. However, if allocating an
> mTHP fails and we fall back to a single small folio, the entire large
> block must still be decompressed. This results in a 16 KiB area requiring
> 4 page faults, where each fault decompresses 16 KiB but retrieves only
> 4 KiB of data from the block. To address this inefficiency, we instead
> fall back to 4 small folios, ensuring that each decompression occurs
> only once.
>
> Allowing swap_read_folio() to decompress and read into an array of
> 4 folios would be extremely complex, requiring extensive changes
> throughout the stack, including swap_read_folio, zeromap,
> zswap, and final swap implementations like zRAM. In contrast,
> having these components fill a large folio with 4 subpages is much
> simpler.
>
> To avoid a full-stack modification, we introduce a per-CPU order-2
> large folio as a buffer. This buffer is used for swap_read_folio(),
> after which the data is copied into the 4 small folios. Finally, in
> do_swap_page(), all these small folios are mapped.
>
> Co-developed-by: Chuanhua Han
> Signed-off-by: Chuanhua Han
> Signed-off-by: Barry Song
> ---
>  mm/memory.c | 203 +++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 192 insertions(+), 11 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 209885a4134f..e551570c1425 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4042,6 +4042,15 @@ static struct folio *__alloc_swap_folio(struct vm_fault *vmf)
>  	return folio;
>  }
>
> +#define BATCH_SWPIN_ORDER 2

Hi Barry,

Thanks for the series and the numbers in the cover letter. Just a few things.

Should BATCH_SWPIN_ORDER be ZSMALLOC_MULTI_PAGES_ORDER instead of 2?

Did you check the performance difference with and without patch 4? I know
that it won't help if you have a lot of unmovable pages scattered everywhere,
but were you able to compare the performance of defrag=always vs patch 4?
I feel like if you have space for 4 folios, then hopefully compaction should
be able to do its job and you can directly fill the large folio once the
unmovable pages are better placed. Johannes' series on preventing type
mixing [1] would help.

[1] https://lore.kernel.org/all/20240320180429.678181-1-hannes@cmpxchg.org/

Thanks,
Usama

> +#define BATCH_SWPIN_COUNT (1 << BATCH_SWPIN_ORDER)
> +#define BATCH_SWPIN_SIZE (PAGE_SIZE << BATCH_SWPIN_ORDER)
> +
> +struct batch_swpin_buffer {
> +	struct folio *folio;
> +	struct mutex mutex;
> +};
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  static inline int non_swapcache_batch(swp_entry_t entry, int max_nr)
>  {
> @@ -4120,7 +4129,101 @@ static inline unsigned long thp_swap_suitable_orders(pgoff_t swp_offset,
>  	return orders;
>  }
>
> -static struct folio *alloc_swap_folio(struct vm_fault *vmf)
> +static DEFINE_PER_CPU(struct batch_swpin_buffer, swp_buf);
> +
> +static int __init batch_swpin_buffer_init(void)
> +{
> +	int ret, cpu;
> +	struct batch_swpin_buffer *buf;
> +
> +	for_each_possible_cpu(cpu) {
> +		buf = per_cpu_ptr(&swp_buf, cpu);
> +		buf->folio = (struct folio *)alloc_pages_node(cpu_to_node(cpu),
> +				GFP_KERNEL | __GFP_COMP, BATCH_SWPIN_ORDER);
> +		if (!buf->folio) {
> +			ret = -ENOMEM;
> +			goto err;
> +		}
> +		mutex_init(&buf->mutex);
> +	}
> +	return 0;
> +
> +err:
> +	for_each_possible_cpu(cpu) {
> +		buf = per_cpu_ptr(&swp_buf, cpu);
> +		if (buf->folio) {
> +			folio_put(buf->folio);
> +			buf->folio = NULL;
> +		}
> +	}
> +	return ret;
> +}
> +core_initcall(batch_swpin_buffer_init);
> +
> +static struct folio *alloc_batched_swap_folios(struct vm_fault *vmf,
> +		struct batch_swpin_buffer **buf, struct folio **folios,
> +		swp_entry_t entry)
> +{
> +	unsigned long haddr = ALIGN_DOWN(vmf->address, BATCH_SWPIN_SIZE);
> +	struct batch_swpin_buffer *sbuf = raw_cpu_ptr(&swp_buf);
> +	struct folio *folio = sbuf->folio;
> +	unsigned long addr;
> +	int i;
> +
> +	if (unlikely(!folio))
> +		return NULL;
> +
> +	for (i = 0; i < BATCH_SWPIN_COUNT; i++) {
> +		addr = haddr + i * PAGE_SIZE;
> +		folios[i] = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vmf->vma, addr);
> +		if (!folios[i])
> +			goto err;
> +		if (mem_cgroup_swapin_charge_folio(folios[i], vmf->vma->vm_mm,
> +				GFP_KERNEL, entry))
> +			goto err;
> +	}
> +
> +	mutex_lock(&sbuf->mutex);
> +	*buf = sbuf;
> +#ifdef CONFIG_MEMCG
> +	folio->memcg_data = (*folios)->memcg_data;
> +#endif
> +	return folio;
> +
> +err:
> +	for (i--; i >= 0; i--)
> +		folio_put(folios[i]);
> +	return NULL;
> +}
> +
> +static void fill_batched_swap_folios(struct vm_fault *vmf,
> +		void *shadow, struct batch_swpin_buffer *buf,
> +		struct folio *folio, struct folio **folios)
> +{
> +	unsigned long haddr = ALIGN_DOWN(vmf->address, BATCH_SWPIN_SIZE);
> +	unsigned long addr;
> +	int i;
> +
> +	for (i = 0; i < BATCH_SWPIN_COUNT; i++) {
> +		addr = haddr + i * PAGE_SIZE;
> +		__folio_set_locked(folios[i]);
> +		__folio_set_swapbacked(folios[i]);
> +		if (shadow)
> +			workingset_refault(folios[i], shadow);
> +		folio_add_lru(folios[i]);
> +		copy_user_highpage(&folios[i]->page, folio_page(folio, i),
> +				addr, vmf->vma);
> +		if (folio_test_uptodate(folio))
> +			folio_mark_uptodate(folios[i]);
> +	}
> +
> +	folio->flags &= ~(PAGE_FLAGS_CHECK_AT_PREP & ~(1UL << PG_head));
> +	mutex_unlock(&buf->mutex);
> +}
> +
> +static struct folio *alloc_swap_folio(struct vm_fault *vmf,
> +		struct batch_swpin_buffer **buf,
> +		struct folio **folios)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
>  	unsigned long orders;
> @@ -4180,6 +4283,9 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
>
>  	pte_unmap_unlock(pte, ptl);
>
> +	if (!orders)
> +		goto fallback;
> +
>  	/* Try allocating the highest of the remaining orders. */
>  	gfp = vma_thp_gfp_mask(vma);
>  	while (orders) {
> @@ -4194,14 +4300,29 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
>  		order = next_order(&orders, order);
>  	}
>
> +	/*
> +	 * During swap-out, a THP might have been compressed into multiple
> +	 * order-2 blocks to optimize CPU usage and compression ratio.
> +	 * Attempt to batch swap-in 4 smaller folios to ensure they are
> +	 * decompressed together as a single unit only once.
> +	 */
> +	return alloc_batched_swap_folios(vmf, buf, folios, entry);
> +
>  fallback:
>  	return __alloc_swap_folio(vmf);
>  }
>  #else /* !CONFIG_TRANSPARENT_HUGEPAGE */
> -static struct folio *alloc_swap_folio(struct vm_fault *vmf)
> +static struct folio *alloc_swap_folio(struct vm_fault *vmf,
> +		struct batch_swpin_buffer **buf,
> +		struct folio **folios)
>  {
>  	return __alloc_swap_folio(vmf);
>  }
> +static inline void fill_batched_swap_folios(struct vm_fault *vmf,
> +		void *shadow, struct batch_swpin_buffer *buf,
> +		struct folio *folio, struct folio **folios)
> +{
> +}
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
>  static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq);
> @@ -4216,6 +4337,8 @@ static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq);
>   */
>  vm_fault_t do_swap_page(struct vm_fault *vmf)
>  {
> +	struct folio *folios[BATCH_SWPIN_COUNT] = { NULL };
> +	struct batch_swpin_buffer *buf = NULL;
>  	struct vm_area_struct *vma = vmf->vma;
>  	struct folio *swapcache, *folio = NULL;
>  	DECLARE_WAITQUEUE(wait, current);
> @@ -4228,7 +4351,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  	pte_t pte;
>  	vm_fault_t ret = 0;
>  	void *shadow = NULL;
> -	int nr_pages;
> +	int nr_pages, i;
>  	unsigned long page_idx;
>  	unsigned long address;
>  	pte_t *ptep;
> @@ -4296,7 +4419,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  	if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
>  	    __swap_count(entry) == 1) {
>  		/* skip swapcache */
> -		folio = alloc_swap_folio(vmf);
> +		folio = alloc_swap_folio(vmf, &buf, folios);
>  		if (folio) {
>  			__folio_set_locked(folio);
>  			__folio_set_swapbacked(folio);
> @@ -4327,10 +4450,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  			mem_cgroup_swapin_uncharge_swap(entry, nr_pages);
>
>  			shadow = get_shadow_from_swap_cache(entry);
> -			if (shadow)
> +			if (shadow && !buf)
>  				workingset_refault(folio, shadow);
> -
> -			folio_add_lru(folio);
> +			if (!buf)
> +				folio_add_lru(folio);
>
>  			/* To provide entry to swap_read_folio() */
>  			folio->swap = entry;
> @@ -4361,6 +4484,16 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  			count_vm_event(PGMAJFAULT);
>  			count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
>  			page = folio_file_page(folio, swp_offset(entry));
> +			/*
> +			 * Copy data into batched small folios from the large
> +			 * folio buffer
> +			 */
> +			if (buf) {
> +				fill_batched_swap_folios(vmf, shadow, buf, folio, folios);
> +				folio = folios[0];
> +				page = &folios[0]->page;
> +				goto do_map;
> +			}
>  		} else if (PageHWPoison(page)) {
>  			/*
>  			 * hwpoisoned dirty swapcache pages are kept for killing
> @@ -4415,6 +4548,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  		lru_add_drain();
>  	}
>
> +do_map:
>  	folio_throttle_swaprate(folio, GFP_KERNEL);
>
>  	/*
> @@ -4431,8 +4565,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  	}
>
>  	/* allocated large folios for SWP_SYNCHRONOUS_IO */
> -	if (folio_test_large(folio) && !folio_test_swapcache(folio)) {
> -		unsigned long nr = folio_nr_pages(folio);
> +	if ((folio_test_large(folio) || buf) && !folio_test_swapcache(folio)) {
> +		unsigned long nr = buf ? BATCH_SWPIN_COUNT : folio_nr_pages(folio);
>  		unsigned long folio_start = ALIGN_DOWN(vmf->address, nr * PAGE_SIZE);
>  		unsigned long idx = (vmf->address - folio_start) / PAGE_SIZE;
>  		pte_t *folio_ptep = vmf->pte - idx;
> @@ -4527,6 +4661,42 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  		}
>  	}
>
> +	/* Batched mapping of allocated small folios for SWP_SYNCHRONOUS_IO */
> +	if (buf) {
> +		for (i = 0; i < nr_pages; i++)
> +			arch_swap_restore(swp_entry(swp_type(entry),
> +					swp_offset(entry) + i), folios[i]);
> +		swap_free_nr(entry, nr_pages);
> +		add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> +		add_mm_counter(vma->vm_mm, MM_SWAPENTS, -nr_pages);
> +		rmap_flags |= RMAP_EXCLUSIVE;
> +		for (i = 0; i < nr_pages; i++) {
> +			unsigned long addr = address + i * PAGE_SIZE;
> +
> +			pte = mk_pte(&folios[i]->page, vma->vm_page_prot);
> +			if (pte_swp_soft_dirty(vmf->orig_pte))
> +				pte = pte_mksoft_dirty(pte);
> +			if (pte_swp_uffd_wp(vmf->orig_pte))
> +				pte = pte_mkuffd_wp(pte);
> +			if ((vma->vm_flags & VM_WRITE) && !userfaultfd_pte_wp(vma, pte) &&
> +			    !pte_needs_soft_dirty_wp(vma, pte)) {
> +				pte = pte_mkwrite(pte, vma);
> +				if ((vmf->flags & FAULT_FLAG_WRITE) && (i == page_idx)) {
> +					pte = pte_mkdirty(pte);
> +					vmf->flags &= ~FAULT_FLAG_WRITE;
> +				}
> +			}
> +			flush_icache_page(vma, &folios[i]->page);
> +			folio_add_new_anon_rmap(folios[i], vma, addr, rmap_flags);
> +			set_pte_at(vma->vm_mm, addr, ptep + i, pte);
> +			arch_do_swap_page_nr(vma->vm_mm, vma, addr, pte, pte, 1);
> +			if (i == page_idx)
> +				vmf->orig_pte = pte;
> +			folio_unlock(folios[i]);
> +		}
> +		goto wp_page;
> +	}
> +
>  	/*
>  	 * Some architectures may have to restore extra metadata to the page
>  	 * when reading from swap. This metadata may be indexed by swap entry
> @@ -4612,6 +4782,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  		folio_put(swapcache);
>  	}
>
> +wp_page:
>  	if (vmf->flags & FAULT_FLAG_WRITE) {
>  		ret |= do_wp_page(vmf);
>  		if (ret & VM_FAULT_ERROR)
> @@ -4638,9 +4809,19 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  	if (vmf->pte)
>  		pte_unmap_unlock(vmf->pte, vmf->ptl);
>  out_page:
> -	folio_unlock(folio);
> +	if (!buf) {
> +		folio_unlock(folio);
> +	} else {
> +		for (i = 0; i < BATCH_SWPIN_COUNT; i++)
> +			folio_unlock(folios[i]);
> +	}
>  out_release:
> -	folio_put(folio);
> +	if (!buf) {
> +		folio_put(folio);
> +	} else {
> +		for (i = 0; i < BATCH_SWPIN_COUNT; i++)
> +			folio_put(folios[i]);
> +	}
>  	if (folio != swapcache && swapcache) {
>  		folio_unlock(swapcache);
>  		folio_put(swapcache);
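
As an aside, the 4-fault pattern described in the commit message can be
exercised from userspace with something like the sketch below. This is
illustrative only and not part of the patch: it assumes 4 KiB pages, a kernel
with MADV_PAGEOUT (Linux 5.4+), and swap backed by zram/zswap; whether the
block is actually compressed as a single 16 KiB unit depends on the earlier
patches in this series and on the reclaim path taken.

/*
 * Illustrative sketch only -- not part of the patch. Populate one 16 KiB
 * block, ask the kernel to reclaim it, then fault it back 4 KiB at a time.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define PAGE_SZ		4096UL			/* assumed page size */
#define BLOCK_SZ	(4 * PAGE_SZ)		/* one order-2 block (16 KiB) */

int main(void)
{
	char *buf = mmap(NULL, BLOCK_SZ, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Dirty the pages so they become swappable anonymous memory. */
	memset(buf, 0xaa, BLOCK_SZ);

	/*
	 * Hint the kernel to reclaim (swap out) the block. With zram/zswap
	 * as the backend this is where the block may be compressed.
	 */
	if (madvise(buf, BLOCK_SZ, MADV_PAGEOUT))
		perror("madvise(MADV_PAGEOUT)");

	/*
	 * Touch the block page by page. Each access below can fault; without
	 * the batched fallback, every fault may decompress the full 16 KiB.
	 */
	for (unsigned long off = 0; off < BLOCK_SZ; off += PAGE_SZ)
		printf("offset %5lu: 0x%02x\n", off, (unsigned char)buf[off]);

	munmap(buf, BLOCK_SZ);
	return 0;
}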