Date: Mon, 21 Oct 2024 11:44:26 +0100
Subject: Re: [RFC 3/4] mm/zswap: add support for large folio zswapin
From: Usama Arif <usamaarif642@gmail.com>
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, hannes@cmpxchg.org,
 david@redhat.com, willy@infradead.org, kanchana.p.sridhar@intel.com,
 yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev,
 ryan.roberts@arm.com, ying.huang@intel.com, riel@surriel.com,
 shakeel.butt@linux.dev, kernel-team@meta.com,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
References: <20241018105026.2521366-1-usamaarif642@gmail.com> <20241018105026.2521366-4-usamaarif642@gmail.com>
On 21/10/2024 06:49, Barry Song wrote:
> On Fri, Oct 18, 2024 at 11:50 PM Usama Arif wrote:
>>
>> At time of folio allocation, alloc_swap_folio checks if the entire
>> folio
>> is in zswap to determine folio order.
>> During swap_read_folio, zswap_load will check if the entire folio
>> is in zswap, and if it is, it will iterate through the pages in the
>> folio and decompress them.
>> This will mean the benefits of large folios (fewer page faults, batched
>> PTE and rmap manipulation, reduced lru list, TLB coalescing (for arm64
>> and amd)) are not lost at swap out when using zswap.
>> This patch does not add support for hybrid backends (i.e. folios
>> partly present in swap and zswap).
>>
>> Signed-off-by: Usama Arif
>> ---
>>  mm/memory.c | 13 +++-------
>>  mm/zswap.c  | 68 ++++++++++++++++++++++++-----------------------------
>>  2 files changed, 34 insertions(+), 47 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 49d243131169..75f7b9f5fb32 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4077,13 +4077,14 @@ static bool can_swapin_thp(struct vm_fault *vmf, pte_t *ptep, int nr_pages)
>>
>>         /*
>>          * swap_read_folio() can't handle the case a large folio is hybridly
>> -        * from different backends. And they are likely corner cases. Similar
>> -        * things might be added once zswap support large folios.
>> +        * from different backends. And they are likely corner cases.
>>          */
>>         if (unlikely(swap_zeromap_batch(entry, nr_pages, NULL) != nr_pages))
>>                 return false;
>>         if (unlikely(non_swapcache_batch(entry, nr_pages) != nr_pages))
>>                 return false;
>> +       if (unlikely(!zswap_present_test(entry, nr_pages)))
>> +               return false;
>>
>>         return true;
>>  }
>> @@ -4130,14 +4131,6 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
>>         if (unlikely(userfaultfd_armed(vma)))
>>                 goto fallback;
>>
>> -       /*
>> -        * A large swapped out folio could be partially or fully in zswap. We
>> -        * lack handling for such cases, so fallback to swapping in order-0
>> -        * folio.
>> -        */
>> -       if (!zswap_never_enabled())
>> -               goto fallback;
>> -
>>         entry = pte_to_swp_entry(vmf->orig_pte);
>>         /*
>>          * Get a list of all the (large) orders below PMD_ORDER that are enabled
>> diff --git a/mm/zswap.c b/mm/zswap.c
>> index 9cc91ae31116..a5aa86c24060 100644
>> --- a/mm/zswap.c
>> +++ b/mm/zswap.c
>> @@ -1624,59 +1624,53 @@ bool zswap_present_test(swp_entry_t swp, int nr_pages)
>>
>>  bool zswap_load(struct folio *folio)
>>  {
>> +       int nr_pages = folio_nr_pages(folio);
>>         swp_entry_t swp = folio->swap;
>> +       unsigned int type = swp_type(swp);
>>         pgoff_t offset = swp_offset(swp);
>>         bool swapcache = folio_test_swapcache(folio);
>> -       struct xarray *tree = swap_zswap_tree(swp);
>> +       struct xarray *tree;
>>         struct zswap_entry *entry;
>> +       int i;
>>
>>         VM_WARN_ON_ONCE(!folio_test_locked(folio));
>>
>>         if (zswap_never_enabled())
>>                 return false;
>>
>> -       /*
>> -        * Large folios should not be swapped in while zswap is being used, as
>> -        * they are not properly handled. Zswap does not properly load large
>> -        * folios, and a large folio may only be partially in zswap.
>> -        *
>> -        * Return true without marking the folio uptodate so that an IO error is
>> -        * emitted (e.g. do_swap_page() will sigbus).
>> -        */
>> -       if (WARN_ON_ONCE(folio_test_large(folio)))
>> -               return true;
>> -
>> -       /*
>> -        * When reading into the swapcache, invalidate our entry. The
>> -        * swapcache can be the authoritative owner of the page and
>> -        * its mappings, and the pressure that results from having two
>> -        * in-memory copies outweighs any benefits of caching the
>> -        * compression work.
>> -        *
>> -        * (Most swapins go through the swapcache. The notable
>> -        * exception is the singleton fault on SWP_SYNCHRONOUS_IO
>> -        * files, which reads into a private page and may free it if
>> -        * the fault fails. We remain the primary owner of the entry.)
>> -        */
>> -       if (swapcache)
>> -               entry = xa_erase(tree, offset);
>> -       else
>> -               entry = xa_load(tree, offset);
>> -
>> -       if (!entry)
>> +       if (!zswap_present_test(folio->swap, nr_pages))
>>                 return false;
>
> Hi Usama,
>
> Is there any chance that zswap_present_test() returns true
> in do_swap_page() but false in zswap_load()? If that's
> possible, could we be missing something? For example,
> could it be that zswap has been partially released (with
> part of it still present) during an mTHP swap-in?
>
> If this happens with an mTHP, my understanding is that
> we shouldn't proceed with reading corrupted data from the
> disk backend.
>

If it's not swapcache, the zswap entry is not deleted, so I think it should be ok? We can check over here if the entire folio is in zswap, and if not, return true without marking the folio uptodate to give an error.

>>
>> -       zswap_decompress(entry, &folio->page);
>> +       for (i = 0; i < nr_pages; ++i) {
>> +               tree = swap_zswap_tree(swp_entry(type, offset + i));
>> +               /*
>> +                * When reading into the swapcache, invalidate our entry. The
>> +                * swapcache can be the authoritative owner of the page and
>> +                * its mappings, and the pressure that results from having two
>> +                * in-memory copies outweighs any benefits of caching the
>> +                * compression work.
>> +                *
>> +                * (Swapins with swap count > 1 go through the swapcache.
>> +                * For swap count == 1, the swapcache is skipped and we
>> +                * remain the primary owner of the entry.)
>> +                */
>> +               if (swapcache)
>> +                       entry = xa_erase(tree, offset + i);
>> +               else
>> +                       entry = xa_load(tree, offset + i);
>>
>> -       count_vm_event(ZSWPIN);
>> -       if (entry->objcg)
>> -               count_objcg_events(entry->objcg, ZSWPIN, 1);
>> +               zswap_decompress(entry, folio_page(folio, i));
>>
>> -       if (swapcache) {
>> -               zswap_entry_free(entry);
>> -               folio_mark_dirty(folio);
>> +               if (entry->objcg)
>> +                       count_objcg_events(entry->objcg, ZSWPIN, 1);
>> +               if (swapcache)
>> +                       zswap_entry_free(entry);
>>         }
>>
>> +       count_vm_events(ZSWPIN, nr_pages);
>> +       if (swapcache)
>> +               folio_mark_dirty(folio);
>> +
>>         folio_mark_uptodate(folio);
>>         return true;
>>  }
>> --
>> 2.43.5
>>
>
> Thanks
> barry