From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Hugh Dickins, Yosry Ahmed,
 "Huang, Ying", Baoquan He, Nhat Pham, Johannes Weiner, Baolin Wang,
 Kalesh Singh, Matthew Wilcox, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 7/7] mm, swap: simplify folio swap allocation
Date: Fri, 14 Mar 2025 00:59:35 +0800
Message-ID: <20250313165935.63303-8-ryncsn@gmail.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250313165935.63303-1-ryncsn@gmail.com>
References: <20250313165935.63303-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
From: Kairui Song

With the slot cache gone, clean up the allocation helpers even more.
folio_alloc_swap will be the only entry point for allocating swap and
adding the folio to the swap cache (except for suspend), making it the
opposite of folio_free_swap.
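
As an illustration of the caller-side contract after this change (a
minimal sketch, not part of the diff below; the fallback label is
hypothetical), a reclaim-style caller now does roughly:

	int err;

	/* The folio must be locked and uptodate. */
	err = folio_alloc_swap(folio, __GFP_HIGH | __GFP_NOWARN);
	if (err)
		goto fallback;	/* -EINVAL or -ENOMEM: split or redirty instead */

	/* On success swap is charged, the folio is in the swap cache and folio->swap is set. */
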
Signed-off-by: Kairui Song
---
 include/linux/swap.h |   8 ++--
 mm/shmem.c           |  21 +++-----
 mm/swap.h            |   6 ---
 mm/swap_state.c      |  57 ----------------------
 mm/swapfile.c        | 111 ++++++++++++++++++++++++++++---------------
 mm/vmscan.c          |  16 ++++++-
 6 files changed, 95 insertions(+), 124 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index c5856dcc263a..9c99eee160f9 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -478,7 +478,7 @@ static inline long get_nr_swap_pages(void)
 }
 
 extern void si_swapinfo(struct sysinfo *);
-swp_entry_t folio_alloc_swap(struct folio *folio);
+int folio_alloc_swap(struct folio *folio, gfp_t gfp_mask);
 bool folio_free_swap(struct folio *folio);
 void put_swap_folio(struct folio *folio, swp_entry_t entry);
 extern swp_entry_t get_swap_page_of_type(int);
@@ -586,11 +586,9 @@ static inline int swp_swapcount(swp_entry_t entry)
 	return 0;
 }
 
-static inline swp_entry_t folio_alloc_swap(struct folio *folio)
+static inline int folio_alloc_swap(struct folio *folio, gfp_t gfp_mask)
 {
-	swp_entry_t entry;
-	entry.val = 0;
-	return entry;
+	return -EINVAL;
 }
 
 static inline bool folio_free_swap(struct folio *folio)
diff --git a/mm/shmem.c b/mm/shmem.c
index 1eed26bf8ae5..7b738d8d6581 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1546,7 +1546,6 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	struct inode *inode = mapping->host;
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
-	swp_entry_t swap;
 	pgoff_t index;
 	int nr_pages;
 	bool split = false;
@@ -1628,14 +1627,6 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 		folio_mark_uptodate(folio);
 	}
 
-	swap = folio_alloc_swap(folio);
-	if (!swap.val) {
-		if (nr_pages > 1)
-			goto try_split;
-
-		goto redirty;
-	}
-
 	/*
 	 * Add inode to shmem_unuse()'s list of swapped-out inodes,
 	 * if it's not already there. Do it now before the folio is
@@ -1648,20 +1639,20 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	if (list_empty(&info->swaplist))
 		list_add(&info->swaplist, &shmem_swaplist);
 
-	if (add_to_swap_cache(folio, swap,
-			__GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN,
-			NULL) == 0) {
+	if (!folio_alloc_swap(folio, __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN)) {
 		shmem_recalc_inode(inode, 0, nr_pages);
-		swap_shmem_alloc(swap, nr_pages);
-		shmem_delete_from_page_cache(folio, swp_to_radix_entry(swap));
+		swap_shmem_alloc(folio->swap, nr_pages);
+		shmem_delete_from_page_cache(folio, swp_to_radix_entry(folio->swap));
 
 		mutex_unlock(&shmem_swaplist_mutex);
 		BUG_ON(folio_mapped(folio));
 		return swap_writepage(&folio->page, wbc);
 	}
 
+	list_del_init(&info->swaplist);
 	mutex_unlock(&shmem_swaplist_mutex);
-	put_swap_folio(folio, swap);
+	if (nr_pages > 1)
+		goto try_split;
 redirty:
 	folio_mark_dirty(folio);
 	if (wbc->for_reclaim)
diff --git a/mm/swap.h b/mm/swap.h
index ad2f121de970..0abb68091b4f 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -50,7 +50,6 @@ static inline pgoff_t swap_cache_index(swp_entry_t entry)
 }
 
 void show_swap_cache_info(void);
-bool add_to_swap(struct folio *folio);
 void *get_shadow_from_swap_cache(swp_entry_t entry);
 int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
 		      gfp_t gfp, void **shadowp);
@@ -163,11 +162,6 @@ struct folio *filemap_get_incore_folio(struct address_space *mapping,
 	return filemap_get_folio(mapping, index);
 }
 
-static inline bool add_to_swap(struct folio *folio)
-{
-	return false;
-}
-
 static inline void *get_shadow_from_swap_cache(swp_entry_t entry)
 {
 	return NULL;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 2b5744e211cd..68fd981b514f 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -166,63 +166,6 @@ void __delete_from_swap_cache(struct folio *folio,
 	__lruvec_stat_mod_folio(folio, NR_SWAPCACHE, -nr);
 }
 
-/**
- * add_to_swap - allocate swap space for a folio
- * @folio: folio we want to move to swap
- *
- * Allocate swap space for the folio and add the folio to the
- * swap cache.
- *
- * Context: Caller needs to hold the folio lock.
- * Return: Whether the folio was added to the swap cache.
- */
-bool add_to_swap(struct folio *folio)
-{
-	swp_entry_t entry;
-	int err;
-
-	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
-	VM_BUG_ON_FOLIO(!folio_test_uptodate(folio), folio);
-
-	entry = folio_alloc_swap(folio);
-	if (!entry.val)
-		return false;
-
-	/*
-	 * XArray node allocations from PF_MEMALLOC contexts could
-	 * completely exhaust the page allocator. __GFP_NOMEMALLOC
-	 * stops emergency reserves from being allocated.
-	 *
-	 * TODO: this could cause a theoretical memory reclaim
-	 * deadlock in the swap out path.
-	 */
-	/*
-	 * Add it to the swap cache.
-	 */
-	err = add_to_swap_cache(folio, entry,
-			__GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN, NULL);
-	if (err)
-		goto fail;
-	/*
-	 * Normally the folio will be dirtied in unmap because its
-	 * pte should be dirty. A special case is MADV_FREE page. The
-	 * page's pte could have dirty bit cleared but the folio's
-	 * SwapBacked flag is still set because clearing the dirty bit
-	 * and SwapBacked flag has no lock protected. For such folio,
-	 * unmap will not set dirty bit for it, so folio reclaim will
-	 * not write the folio out. This can cause data corruption when
-	 * the folio is swapped in later. Always setting the dirty flag
-	 * for the folio solves the problem.
-	 */
-	folio_mark_dirty(folio);
-
-	return true;
-
-fail:
-	put_swap_folio(folio, entry);
-	return false;
-}
-
 /*
  * This must be called only on folios that have
  * been verified to be in the swap cache and locked.
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 9bd95173865d..2eff8b51a945 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1176,9 +1176,8 @@ static bool get_swap_device_info(struct swap_info_struct *si)
  * Fast path try to get swap entries with specified order from current
  * CPU's swap entry pool (a cluster).
  */
-static int swap_alloc_fast(swp_entry_t *entry,
-			   unsigned char usage,
-			   int order)
+static bool swap_alloc_fast(swp_entry_t *entry,
+			    int order)
 {
 	struct swap_cluster_info *ci;
 	struct swap_info_struct *si;
@@ -1197,7 +1196,7 @@ static int swap_alloc_fast(swp_entry_t *entry,
 	if (cluster_is_usable(ci, order)) {
 		if (cluster_is_empty(ci))
 			offset = cluster_offset(si, ci);
-		found = alloc_swap_scan_cluster(si, ci, offset, order, usage);
+		found = alloc_swap_scan_cluster(si, ci, offset, order, SWAP_HAS_CACHE);
 		if (found)
 			*entry = swp_entry(si->type, found);
 	} else {
@@ -1208,47 +1207,30 @@ static int swap_alloc_fast(swp_entry_t *entry,
 	return !!found;
 }
 
-swp_entry_t folio_alloc_swap(struct folio *folio)
+/* Rotate the device and switch to a new cluster */
+static bool swap_alloc_slow(swp_entry_t *entry,
+			    int order)
 {
-	unsigned int order = folio_order(folio);
-	unsigned int size = 1 << order;
-	struct swap_info_struct *si, *next;
-	swp_entry_t entry = {};
-	unsigned long offset;
 	int node;
+	unsigned long offset;
+	struct swap_info_struct *si, *next;
 
-	if (order) {
-		/*
-		 * Should not even be attempting large allocations when huge
-		 * page swap is disabled. Warn and fail the allocation.
-		 */
-		if (!IS_ENABLED(CONFIG_THP_SWAP) || size > SWAPFILE_CLUSTER) {
-			VM_WARN_ON_ONCE(1);
-			return entry;
-		}
-	}
-
-	/* Fast path using percpu cluster */
-	local_lock(&percpu_swap_cluster.lock);
-	if (swap_alloc_fast(&entry, SWAP_HAS_CACHE, order))
-		goto out;
-
-	/* Rotate the device and switch to a new cluster */
+	node = numa_node_id();
 	spin_lock(&swap_avail_lock);
 start_over:
-	node = numa_node_id();
 	plist_for_each_entry_safe(si, next, &swap_avail_heads[node], avail_lists[node]) {
+		/* Rotate the device and switch to a new cluster */
 		plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]);
 		spin_unlock(&swap_avail_lock);
 		if (get_swap_device_info(si)) {
 			offset = cluster_alloc_swap_entry(si, order, SWAP_HAS_CACHE);
 			put_swap_device(si);
 			if (offset) {
-				entry = swp_entry(si->type, offset);
-				goto out;
+				*entry = swp_entry(si->type, offset);
+				return true;
 			}
 			if (order)
-				goto out;
+				return false;
 		}
 
 		spin_lock(&swap_avail_lock);
@@ -1267,16 +1249,67 @@ swp_entry_t folio_alloc_swap(struct folio *folio)
 			goto start_over;
 	}
 	spin_unlock(&swap_avail_lock);
-out:
+	return false;
+}
+
+/**
+ * folio_alloc_swap - allocate swap space for a folio
+ * @folio: folio we want to move to swap
+ * @gfp: gfp mask for shadow nodes
+ *
+ * Allocate swap space for the folio and add the folio to the
+ * swap cache.
+ *
+ * Context: Caller needs to hold the folio lock.
+ * Return: Whether the folio was added to the swap cache.
+ */
+int folio_alloc_swap(struct folio *folio, gfp_t gfp)
+{
+	unsigned int order = folio_order(folio);
+	unsigned int size = 1 << order;
+	swp_entry_t entry = {};
+
+	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+	VM_BUG_ON_FOLIO(!folio_test_uptodate(folio), folio);
+
+	/*
+	 * Should not even be attempting large allocations when huge
+	 * page swap is disabled. Warn and fail the allocation.
+	 */
+	if (order && (!IS_ENABLED(CONFIG_THP_SWAP) || size > SWAPFILE_CLUSTER)) {
+		VM_WARN_ON_ONCE(1);
+		return -EINVAL;
+	}
+
+	local_lock(&percpu_swap_cluster.lock);
+	if (!swap_alloc_fast(&entry, order))
+		swap_alloc_slow(&entry, order);
 	local_unlock(&percpu_swap_cluster.lock);
+
 	/* Need to call this even if allocation failed, for MEMCG_SWAP_FAIL. */
-	if (mem_cgroup_try_charge_swap(folio, entry)) {
-		put_swap_folio(folio, entry);
-		entry.val = 0;
-	}
-	if (entry.val)
-		atomic_long_sub(size, &nr_swap_pages);
-	return entry;
+	if (mem_cgroup_try_charge_swap(folio, entry))
+		goto out_free;
+
+	if (!entry.val)
+		return -ENOMEM;
+
+	/*
+	 * XArray node allocations from PF_MEMALLOC contexts could
+	 * completely exhaust the page allocator. __GFP_NOMEMALLOC
+	 * stops emergency reserves from being allocated.
+	 *
+	 * TODO: this could cause a theoretical memory reclaim
+	 * deadlock in the swap out path.
+	 */
+	if (add_to_swap_cache(folio, entry, gfp | __GFP_NOMEMALLOC, NULL))
+		goto out_free;
+
+	atomic_long_sub(size, &nr_swap_pages);
+	return 0;
+
+out_free:
+	put_swap_folio(folio, entry);
+	return -ENOMEM;
 }
 
 static struct swap_info_struct *_swap_info_get(swp_entry_t entry)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 84ec20f12200..2bc740637a6c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1289,7 +1289,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 					    split_folio_to_list(folio, folio_list))
 						goto activate_locked;
 				}
-				if (!add_to_swap(folio)) {
+				if (folio_alloc_swap(folio, __GFP_HIGH | __GFP_NOWARN)) {
 					int __maybe_unused order = folio_order(folio);
 
 					if (!folio_test_large(folio))
@@ -1305,9 +1305,21 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 					}
 #endif
 					count_mthp_stat(order, MTHP_STAT_SWPOUT_FALLBACK);
-					if (!add_to_swap(folio))
+					if (folio_alloc_swap(folio, __GFP_HIGH | __GFP_NOWARN))
 						goto activate_locked_split;
 				}
+				/*
+				 * Normally the folio will be dirtied in unmap because its
+				 * pte should be dirty. A special case is MADV_FREE page. The
+				 * page's pte could have dirty bit cleared but the folio's
+				 * SwapBacked flag is still set because clearing the dirty bit
+				 * and SwapBacked flag has no lock protected. For such folio,
+				 * unmap will not set dirty bit for it, so folio reclaim will
+				 * not write the folio out. This can cause data corruption when
+				 * the folio is swapped in later. Always setting the dirty flag
+				 * for the folio solves the problem.
+				 */
+				folio_mark_dirty(folio);
 			}
 		}
 
-- 
2.48.1