From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C488CC30653 for ; Thu, 4 Jul 2024 23:23:01 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 51EA56B00B1; Thu, 4 Jul 2024 19:23:01 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4CE066B00B3; Thu, 4 Jul 2024 19:23:01 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 395606B00B4; Thu, 4 Jul 2024 19:23:01 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 1A4EB6B00B1 for ; Thu, 4 Jul 2024 19:23:01 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 8BE98A3FDC for ; Thu, 4 Jul 2024 23:23:00 +0000 (UTC) X-FDA: 82303647720.13.CC891D8 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf09.hostedemail.com (Postfix) with ESMTP id C533A14000F for ; Thu, 4 Jul 2024 23:22:58 +0000 (UTC) Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=Dz1kDJ0T; spf=pass (imf09.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1720135366; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=FnJ5zgYZ2qHOPibO++k221w8eu/HKBo/52AdZR0l3Uc=; b=AHqRZa70ck+fzNA4UJH2KSe5mpl+LOPREvzrfRzXh21HbibhI9iZb5rs8BhL+Ws92Ncjhf 9yJv8Ztd8FlxW8NaS9zQKtBZWR9CyDkFIE1ciZNZ6CQivRMq0r7VStHYIcuCOF/Scw2nt6 jk2VQTvdUaiPeLIySwSOmDuL/FRN69s= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=Dz1kDJ0T; spf=pass (imf09.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1720135366; a=rsa-sha256; cv=none; b=Y1Le1s6w7oa8mGlu25nWHFmFsFhs87JH1Py4n/nfFqnuVJfx+2CLzZn564Z+E0FpdLSkQZ /YjNM6sPgIGM3IUDdDYQl2MKlGj/RgTqYvxSAF9QXC0fndTqZKH6HdlkbNHR29UiafaD0K /n9x2Oy5WCu8QK0vkVyecIbYAcVTuPc= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by dfw.source.kernel.org (Postfix) with ESMTP id CB714628DD; Thu, 4 Jul 2024 23:22:57 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 18DECC3277B; Thu, 4 Jul 2024 23:22:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1720135377; bh=WcyGzBDPm08jDqw9VW7mWQ1QgDFVUCZ5sEiUBFTEALY=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=Dz1kDJ0TGUKuQxQI0YRdtxhLnMEAjmd05nT+q/gspjPw1TfnMUwr5t0zrJ3dmmj6C FTz6hJCYInPVC7ygaRAAkUsj3NV8OQO+ebkTDuq5FiWRyrqXfiQinc9yJsOxm4Pm64 2OYItLZ0tzEtlrUC7YEjNM8n5qEFW/xhhYT7cCzE= Date: Thu, 4 Jul 2024 16:22:56 -0700 From: Andrew Morton To: Usama Arif Cc: Johannes Weiner , shakeel.butt@linux.dev, david@redhat.com, ying.huang@intel.com, hughd@google.com, willy@infradead.org, yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, Andi Kleen Subject: Re: [PATCH v7 1/2] mm: store zero pages to be swapped out in a bitmap Message-Id: <20240704162256.f64fa9b6752d0d5e003f9c18@linux-foundation.org> In-Reply-To: <6f8b64f8-b7b0-42f4-a1d4-bf79624cff1e@gmail.com> References: <20240627105730.3110705-1-usamaarif642@gmail.com> <20240627105730.3110705-2-usamaarif642@gmail.com> <20240627161852.GA469122@cmpxchg.org> <44a57df4-e54c-47ee-96b8-e2361c549239@gmail.com> <5743d4e4-3e34-4ac1-b4a9-0ddc4f0e624d@gmail.com> <6f8b64f8-b7b0-42f4-a1d4-bf79624cff1e@gmail.com> X-Mailer: Sylpheed 3.8.0beta1 (GTK+ 2.24.33; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Rspamd-Server: rspam03 X-Rspam-User: X-Rspamd-Queue-Id: C533A14000F X-Stat-Signature: b9q4xkqofhpxwfp3aznen4ze5t7rrpba X-HE-Tag: 1720135378-723787 X-HE-Meta: U2FsdGVkX180RnPrKAY8WfogvJ+gKd1U3UZXJ+hTxP6IOhDcZwyqNtNKpUykBb/faKgrX9l7sQp0YRMix5KkULcfeZYiLxOpwFdlWkImWYjahLD7f1CRUuy5qHz3EALOvrxguL8BSWx7ta9w099NrzeovnRiE9V5EgVXMlHFYHjkGy8oJ11VUErHGUrHzABejzxnM/RvZYYcMzVK5409QReEZoJsEO+a9lD7yUNslhB6iV0ekeDonYRUT1sX+gn1yBf+Kwqo04e4n6CuthRyjrtIpwVwgiJRMmebUYXyo2xiOOj+PFeOm5lUOGg6maM59De7wuzAHkDo6FVka8I+JxEBJoiiDorVhzwxOp1r6kW9K8uOCeBvxj1M2yHaxMS6RvKHBICJIf7tsbxS8Gp5VwWPd9gPfwZskdvnjhGLU3l5UvaTcfulfcJyX5S35a3orPeib+PJ7fdPYlhZQcTEdWmMDS2/jIdMu501kZD6KM1xXrIMpAJmx7hFlXkmnrB/0wI2S3DBbnKnH6I1PjH4KZmWyrsO82oZgQ77E007BBGkI3J8pWQNm1arjIN85hFl9m85jTOrGg2uazqldZp0go2XtrHxR6xZeJhdqDNzcFphworrwjmHNCTYYSHG/RexUoFH8DPzRk+QEQCAQx1+L/DWEZ9TZTUNHCLJCma/MpqN9Kd//wVrTw5m61egA+JjPK9X6vu/OEkjhG0XzcGOuKV7IPJF2c+uPN6r6NkXrnGAjjWeD0FBSkJZgtpiHvv984789rno15X+vJKrih9JsBVPaPRqkcXe0cCMQoE7a9xdV8n5YfQFhsK7za4R8i92gIDV+6Gc7vfuQQBnsbdOHsfKjDlBtkRE5jgLioPOZTUX5iiTUG3UPFPGLdSe0B7A1f/xHOgBuLURfrhbvummRAnHT+UcQLflDupuQILj2GLmf4JcWleF4c86xWQv0NLiJXbUOMUiogmxa/Ht8Zo R21g+lqz lMfGjlGaKUWDRi6DTRXoPpApbL8AjOsbmE2ctfyzyZCwTL5Uimqd2NSvVCUjSksZBQOIphtMo5Rp2clutD12eitsVbuqqQiB0jwA5EuGWQa0nKsadBtDd6cvLrhpfIvPVmaOyHcVnULlho3Bnwo7uG3k9bMbicNjXBJzVwpVI+0gp4ryMmn+YaU6Jy9LKFif35W9Tnm9PVzXfi1zLSoRyuyG8QyaEaiPAC4UafwnZVK/jHa1l4vVWUNYS0IG3rNys0gAH X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, 1 Jul 2024 20:15:46 +0300 Usama Arif wrote: > Changes from v7 (Johannes): > - Give overview and explain how locking works in zeromap in comments > - Add comment for why last word is checked first when checking if > folio is zero-filled > - Merge is_folio_zero_filled and is_folio_page_zero_filled into > 1 function. > - Use folio_zero_range to fill a folio with zero at readtime. > - Put swap_zeromap_folio_clear in an else branch (even if checkpatch > gives warning) and add comment to make it explicitly clear that it > needs to happen if folio is not zero filled. > - add missing kvfree for zeromap incase swapon fails. I queued the below as a delta against what was in mm-unstable. Can we please get this nailed down? It has been nearly three weeks and this patch is getting in the way of the "mm/zsmalloc: change back to per-size_class lock" series, at least. include/linux/swap.h | 2 - mm/page_io.c | 64 +++++++++++++++++++++-------------------- mm/swapfile.c | 9 +++-- 3 files changed, 40 insertions(+), 35 deletions(-) --- a/include/linux/swap.h~mm-store-zero-pages-to-be-swapped-out-in-a-bitmap-v8 +++ a/include/linux/swap.h @@ -299,7 +299,7 @@ struct swap_info_struct { signed char type; /* strange name for an index */ unsigned int max; /* extent of the swap_map */ unsigned char *swap_map; /* vmalloc'ed array of usage counts */ - unsigned long *zeromap; /* vmalloc'ed bitmap to track zero pages */ + unsigned long *zeromap; /* kvmalloc'ed bitmap to track zero pages */ struct swap_cluster_info *cluster_info; /* cluster info. Only for SSD */ struct swap_cluster_list free_clusters; /* free clusters list */ unsigned int lowest_bit; /* index of first free in swap_map */ --- a/mm/page_io.c~mm-store-zero-pages-to-be-swapped-out-in-a-bitmap-v8 +++ a/mm/page_io.c @@ -172,42 +172,34 @@ bad_bmap: goto out; } -static bool is_folio_page_zero_filled(struct folio *folio, int i) -{ - unsigned long *data; - unsigned int pos, last_pos = PAGE_SIZE / sizeof(*data) - 1; - bool ret = false; - - data = kmap_local_folio(folio, i * PAGE_SIZE); - if (data[last_pos]) - goto out; - for (pos = 0; pos < PAGE_SIZE / sizeof(*data); pos++) { - if (data[pos]) - goto out; - } - ret = true; -out: - kunmap_local(data); - return ret; -} - static bool is_folio_zero_filled(struct folio *folio) { + unsigned int pos, last_pos; + unsigned long *data; unsigned int i; + last_pos = PAGE_SIZE / sizeof(*data) - 1; for (i = 0; i < folio_nr_pages(folio); i++) { - if (!is_folio_page_zero_filled(folio, i)) + data = kmap_local_folio(folio, i * PAGE_SIZE); + /* + * Check last word first, incase the page is zero-filled at + * the start and has non-zero data at the end, which is common + * in real-world workloads. + */ + if (data[last_pos]) { + kunmap_local(data); return false; + } + for (pos = 0; pos < last_pos; pos++) { + if (data[pos]) { + kunmap_local(data); + return false; + } + } + kunmap_local(data); } - return true; -} -static void folio_zero_fill(struct folio *folio) -{ - unsigned int i; - - for (i = 0; i < folio_nr_pages(folio); i++) - clear_highpage(folio_page(folio, i)); + return true; } static void swap_zeromap_folio_set(struct folio *folio) @@ -278,12 +270,24 @@ int swap_writepage(struct page *page, st return ret; } + /* + * Use a bitmap (zeromap) to avoid doing IO for zero-filled pages. + * The bits in zeromap are protected by the locked swapcache folio + * and atomic updates are used to protect against read-modify-write + * corruption due to other zero swap entries seeing concurrent updates. + */ if (is_folio_zero_filled(folio)) { swap_zeromap_folio_set(folio); folio_unlock(folio); return 0; + } else { + /* + * Clear bits this folio occupies in the zeromap to prevent + * zero data being read in from any previous zero writes that + * occupied the same swap entries. + */ + swap_zeromap_folio_clear(folio); } - swap_zeromap_folio_clear(folio); if (zswap_store(folio)) { folio_unlock(folio); return 0; @@ -528,7 +532,7 @@ static bool swap_read_folio_zeromap(stru if (WARN_ON_ONCE(idx < folio_nr_pages(folio))) return true; - folio_zero_fill(folio); + folio_zero_range(folio, 0, folio_size(folio)); folio_mark_uptodate(folio); return true; } --- a/mm/swapfile.c~mm-store-zero-pages-to-be-swapped-out-in-a-bitmap-v8 +++ a/mm/swapfile.c @@ -3181,11 +3181,11 @@ SYSCALL_DEFINE2(swapon, const char __use } /* - * Use kvmalloc_array instead of bitmap_zalloc as the allocation order - * might be above MAX_PAGE_ORDER incase of a large swap file. + * Use kvmalloc_array instead of bitmap_zalloc as the allocation order might + * be above MAX_PAGE_ORDER incase of a large swap file. */ - p->zeromap = kvmalloc_array(BITS_TO_LONGS(maxpages), - sizeof(unsigned long), GFP_KERNEL | __GFP_ZERO); + p->zeromap = kvmalloc_array(BITS_TO_LONGS(maxpages), sizeof(long), + GFP_KERNEL | __GFP_ZERO); if (!p->zeromap) { error = -ENOMEM; goto bad_swap_unlock_inode; @@ -3345,6 +3345,7 @@ bad_swap: p->flags = 0; spin_unlock(&swap_lock); vfree(swap_map); + kvfree(p->zeromap); kvfree(cluster_info); if (inced_nr_rotate_swap) atomic_dec(&nr_rotate_swap); _