From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43C01C54E5D for ; Tue, 12 Mar 2024 15:35:58 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B12168D0054; Tue, 12 Mar 2024 11:35:57 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id AC2928D0036; Tue, 12 Mar 2024 11:35:57 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9B15D8D0054; Tue, 12 Mar 2024 11:35:57 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 8D07D8D0036 for ; Tue, 12 Mar 2024 11:35:57 -0400 (EDT) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 3038B1C076B for ; Tue, 12 Mar 2024 15:35:57 +0000 (UTC) X-FDA: 81888787554.28.4BA6C35 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf21.hostedemail.com (Postfix) with ESMTP id 7E0651C0008 for ; Tue, 12 Mar 2024 15:35:54 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf21.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1710257754; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=yZlzNQAf99YQq0ZxgNt7gVj1Y9AyxaAIqjhHg4u1XD0=; b=YfWBm+dVChCwFtT3jK3TgGrSk0J9KIUutbkzUwj9U5UgO+S4Qe6fMIvR/MVeuLPBjbwMBv 4gc0HihWGpXXm0pNigCr408AbIW/2IPfNmg0ij4tQmrRvM1A0D/Fy2g3TJrS42drI5oeGB 2PEAeEG+Ntvf/aRu+gpdIsHb33104Sc= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf21.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1710257754; a=rsa-sha256; cv=none; b=RgfIYZsdjuZWT5CtW5g0TCMeLRe8Y4P2aPx9sa+kozx+FLZ0pA2ZqWVgAlkYfZVLZ5EJA1 yseXzOEJk0i/kn0DcPEz7Qh+3K0HFLbl2ewQ/oMRQsf/fpt7UBiH4rYp3w08isAy9wEnDH UQFY+Cqg18xyYKBG1qYG2Oe5528lui4= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 5C6991007; Tue, 12 Mar 2024 08:36:30 -0700 (PDT) Received: from [10.1.27.122] (XHFQ2J9959.cambridge.arm.com [10.1.27.122]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 2FB033F762; Tue, 12 Mar 2024 08:35:48 -0700 (PDT) Message-ID: Date: Tue, 12 Mar 2024 15:35:47 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC PATCH v3 4/5] mm: swap: introduce swapcache_prepare_nr and swapcache_clear_nr for large folios swap-in Content-Language: en-GB To: Barry Song <21cnbao@gmail.com>, akpm@linux-foundation.org, linux-mm@kvack.org Cc: chengming.zhou@linux.dev, chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org, kasong@tencent.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, mhocko@suse.com, nphamcs@gmail.com, shy828301@gmail.com, steven.price@arm.com, surenb@google.com, wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org, ying.huang@intel.com, yosryahmed@google.com, yuzhao@google.com, Barry Song , Hugh Dickins , Minchan Kim , SeongJae Park References: <20240304081348.197341-1-21cnbao@gmail.com> <20240304081348.197341-5-21cnbao@gmail.com> From: Ryan Roberts In-Reply-To: <20240304081348.197341-5-21cnbao@gmail.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 7E0651C0008 X-Stat-Signature: am1yq5nxbysq8fbhysd35antqhha81uw X-HE-Tag: 1710257754-571348 X-HE-Meta: U2FsdGVkX1+GYQVi/K1pzN8YhWfstq7Ry3NzUT1NXIJI2f2XJaVb8SsYMcMUi2sBHuZRCOQTG2ofM5WLdUfLYisi+waF4XtuVx2mxR4/bN7WRGb8L6APNTYSYO2KWXDkE5t95kVj872qZESUh7I8FqMdP1BIrpFy3636/d88Yx/HtBeoZ2Gdxf/IuPxpfNaIddFhQxC3/5xw/vMafjMHQNc/ggSxTIm9MDiUdD7WT/xCzOyRn8Tq7xD0DromAEvIhZrfhplik8XVG211Lc/6AeNnG8VnyIgjhUEYKEKECJ8m7GdWJReDq9skflX8dgkR08s9X1/zEoeVEdC0FtPFusE2IXZgFCq/6qGmYo/rxtfoJi+gyEBblyQl/1k3pTuAtMSv4svLp13kG5tRfsSvLSvohLW1ftDLtzczJBHO/h783QxyChft3JbpdP9W+weSqlo1xbwENSKwIwCMQPUJPy59WzBv7HII7nKoJkadjGI9MWtUM9QjWxuJLMxxierzXZW9pnrdfbDvIxQ60F1fd8/yEy3nptS04ViiMVQDgeSBd4OuLAwWh+xCInw8xKvEVhWFYjJb6xOXhdLgyoHna/SA4jzaOzG8sAleZ3Sgv5TxbzIrod7970rWfiiWQwGmfAxDuKJ3YDbco7k4WwY5tCH/B7tXgSfP2Pnxz5/cHOi0LpfCfamOm3lL7qPhZyHF8OLK3woSIXAkMDgi3/Pd41GDyB1I/vR2GNqXwKmyH97M9lnCnaugS4u6vAgPb9IjQXDithcmqbM7YnRqht4wx95p7oFX1eLWHrEtBZbRawU0jG/kiN+pMGDeC2B2UCfnErirCDqY9FCn9n7+1XKx2IixojRQBmMHq+Fh4PGVBe1zbpwVsByDerQVcQQfRGp2nAZVfuOYu2w= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 04/03/2024 08:13, Barry Song wrote: > From: Barry Song > > Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") supports > one entry only, to support large folio swap-in, we need to handle multiple > swap entries. > > Cc: Kairui Song > Cc: "Huang, Ying" > Cc: David Hildenbrand > Cc: Chris Li > Cc: Hugh Dickins > Cc: Johannes Weiner > Cc: Matthew Wilcox (Oracle) > Cc: Michal Hocko > Cc: Minchan Kim > Cc: Yosry Ahmed > Cc: Yu Zhao > Cc: SeongJae Park > Signed-off-by: Barry Song > --- > include/linux/swap.h | 1 + > mm/swap.h | 1 + > mm/swapfile.c | 118 ++++++++++++++++++++++++++----------------- > 3 files changed, 74 insertions(+), 46 deletions(-) > > diff --git a/include/linux/swap.h b/include/linux/swap.h > index d6ab27929458..22105f0fe2d4 100644 > --- a/include/linux/swap.h > +++ b/include/linux/swap.h > @@ -480,6 +480,7 @@ extern int add_swap_count_continuation(swp_entry_t, gfp_t); > extern void swap_shmem_alloc(swp_entry_t); > extern int swap_duplicate(swp_entry_t); > extern int swapcache_prepare(swp_entry_t); > +extern int swapcache_prepare_nr(swp_entry_t entry, int nr); > extern void swap_free(swp_entry_t); > extern void swap_nr_free(swp_entry_t entry, int nr_pages); > extern void swapcache_free_entries(swp_entry_t *entries, int n); > diff --git a/mm/swap.h b/mm/swap.h > index fc2f6ade7f80..1cec991efcda 100644 > --- a/mm/swap.h > +++ b/mm/swap.h > @@ -42,6 +42,7 @@ void delete_from_swap_cache(struct folio *folio); > void clear_shadow_from_swap_cache(int type, unsigned long begin, > unsigned long end); > void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry); > +void swapcache_clear_nr(struct swap_info_struct *si, swp_entry_t entry, int nr); > struct folio *swap_cache_get_folio(swp_entry_t entry, > struct vm_area_struct *vma, unsigned long addr); > struct folio *filemap_get_incore_folio(struct address_space *mapping, > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 244106998a69..bae1b8165b11 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -3309,7 +3309,7 @@ void si_swapinfo(struct sysinfo *val) > } > > /* > - * Verify that a swap entry is valid and increment its swap map count. > + * Verify that nr swap entries are valid and increment their swap map count. > * > * Returns error code in following case. > * - success -> 0 > @@ -3319,66 +3319,76 @@ void si_swapinfo(struct sysinfo *val) > * - swap-cache reference is requested but the entry is not used. -> ENOENT > * - swap-mapped reference requested but needs continued swap count. -> ENOMEM > */ > -static int __swap_duplicate(swp_entry_t entry, unsigned char usage) > +static int __swap_duplicate_nr(swp_entry_t entry, int nr, unsigned char usage) perhaps its better to pass order instead of nr to all these functions to make it clearer that entry should be naturally aligned and be a power-of-2 number of pages, no bigger than SWAPFILE_CLUSTER? > { > struct swap_info_struct *p; > struct swap_cluster_info *ci; > unsigned long offset; > - unsigned char count; > - unsigned char has_cache; > - int err; > + unsigned char count[SWAPFILE_CLUSTER]; > + unsigned char has_cache[SWAPFILE_CLUSTER]; I'm not sure this 1K stack buffer is a good idea? Could you split it slightly differently so that loop 1 just does error checking and bails out if an error is found, and loop 2 does the new value calculation and writeback? Then you don't need these arrays. > + int err, i; > > p = swp_swap_info(entry); > > offset = swp_offset(entry); > ci = lock_cluster_or_swap_info(p, offset); > > - count = p->swap_map[offset]; > - > - /* > - * swapin_readahead() doesn't check if a swap entry is valid, so the > - * swap entry could be SWAP_MAP_BAD. Check here with lock held. > - */ > - if (unlikely(swap_count(count) == SWAP_MAP_BAD)) { > - err = -ENOENT; > - goto unlock_out; > - } > - > - has_cache = count & SWAP_HAS_CACHE; > - count &= ~SWAP_HAS_CACHE; > - err = 0; > - > - if (usage == SWAP_HAS_CACHE) { > + for (i = 0; i < nr; i++) { > + count[i] = p->swap_map[offset + i]; > > - /* set SWAP_HAS_CACHE if there is no cache and entry is used */ > - if (!has_cache && count) > - has_cache = SWAP_HAS_CACHE; > - else if (has_cache) /* someone else added cache */ > - err = -EEXIST; > - else /* no users remaining */ > + /* > + * swapin_readahead() doesn't check if a swap entry is valid, so the > + * swap entry could be SWAP_MAP_BAD. Check here with lock held. > + */ > + if (unlikely(swap_count(count[i]) == SWAP_MAP_BAD)) { > err = -ENOENT; > + goto unlock_out; > + } > > - } else if (count || has_cache) { > - > - if ((count & ~COUNT_CONTINUED) < SWAP_MAP_MAX) > - count += usage; > - else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX) > - err = -EINVAL; > - else if (swap_count_continued(p, offset, count)) > - count = COUNT_CONTINUED; > - else > - err = -ENOMEM; > - } else > - err = -ENOENT; /* unused swap entry */ > + has_cache[i] = count[i] & SWAP_HAS_CACHE; > + count[i] &= ~SWAP_HAS_CACHE; > + err = 0; > + > + if (usage == SWAP_HAS_CACHE) { > + > + /* set SWAP_HAS_CACHE if there is no cache and entry is used */ > + if (!has_cache[i] && count[i]) > + has_cache[i] = SWAP_HAS_CACHE; > + else if (has_cache[i]) /* someone else added cache */ > + err = -EEXIST; > + else /* no users remaining */ > + err = -ENOENT; > + } else if (count[i] || has_cache[i]) { > + > + if ((count[i] & ~COUNT_CONTINUED) < SWAP_MAP_MAX) > + count[i] += usage; > + else if ((count[i] & ~COUNT_CONTINUED) > SWAP_MAP_MAX) > + err = -EINVAL; > + else if (swap_count_continued(p, offset + i, count[i])) > + count[i] = COUNT_CONTINUED; > + else > + err = -ENOMEM; > + } else > + err = -ENOENT; /* unused swap entry */ > > - if (!err) > - WRITE_ONCE(p->swap_map[offset], count | has_cache); > + if (err) > + break; > + } > > + if (!err) { > + for (i = 0; i < nr; i++) > + WRITE_ONCE(p->swap_map[offset + i], count[i] | has_cache[i]); > + } > unlock_out: > unlock_cluster_or_swap_info(p, ci); > return err; > } > > +static int __swap_duplicate(swp_entry_t entry, unsigned char usage) > +{ > + return __swap_duplicate_nr(entry, 1, usage); > +} > + > /* > * Help swapoff by noting that swap entry belongs to shmem/tmpfs > * (in which case its reference count is never incremented). > @@ -3417,17 +3427,33 @@ int swapcache_prepare(swp_entry_t entry) > return __swap_duplicate(entry, SWAP_HAS_CACHE); > } > > -void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry) > +int swapcache_prepare_nr(swp_entry_t entry, int nr) > +{ > + return __swap_duplicate_nr(entry, nr, SWAP_HAS_CACHE); > +} > + > +void swapcache_clear_nr(struct swap_info_struct *si, swp_entry_t entry, int nr) > { > struct swap_cluster_info *ci; > unsigned long offset = swp_offset(entry); > - unsigned char usage; > + unsigned char usage[SWAPFILE_CLUSTER]; > + int i; > > ci = lock_cluster_or_swap_info(si, offset); > - usage = __swap_entry_free_locked(si, offset, SWAP_HAS_CACHE); > + for (i = 0; i < nr; i++) > + usage[i] = __swap_entry_free_locked(si, offset + i, SWAP_HAS_CACHE); > unlock_cluster_or_swap_info(si, ci); > - if (!usage) > - free_swap_slot(entry); > + for (i = 0; i < nr; i++) { > + if (!usage[i]) { > + free_swap_slot(entry); > + entry.val++; > + } > + } > +} This is pretty similar to swap_nr_free() which you added in patch 2. Except swap_nr_free() passes 1 as last param to __swap_entry_free_locked() and this passes SWAP_HAS_CACHE. Perhaps their should be a common helper? I think swap_nr_free()'s usage bitmap is preferable to this version's char array too. Thanks, Ryan > + > +void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry) > +{ > + swapcache_clear_nr(si, entry, 1); > } > > struct swap_info_struct *swp_swap_info(swp_entry_t entry)