From: Yosry Ahmed <yosryahmed@google.com>
Date: Wed, 28 Aug 2024 15:34:13 -0700
Subject: Re: [PATCH v4 0/4] mm: ZSWAP swap-out of mTHP folios
To: "Sridhar, Kanchana P"
Cc: Nhat Pham, linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, ryan.roberts@arm.com, "Huang, Ying", 21cnbao@gmail.com, akpm@linux-foundation.org, "Zou, Nanhai", "Feghali, Wajdi K", "Gopal, Vinodh"
References: <20240819021621.29125-1-kanchana.p.sridhar@intel.com>
On Wed, Aug 28, 2024 at 11:50 AM Sridhar, Kanchana P wrote:
>
> Hi Yosry,
>
> > -----Original Message-----
> > From: Yosry Ahmed
> > Sent: Wednesday, August 28, 2024 12:44 AM
> > To: Sridhar, Kanchana P
> > Cc: Nhat Pham; linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > hannes@cmpxchg.org; ryan.roberts@arm.com; Huang, Ying; 21cnbao@gmail.com;
> > akpm@linux-foundation.org; Zou, Nanhai; Feghali, Wajdi K; Gopal, Vinodh
> > Subject: Re: [PATCH v4 0/4] mm: ZSWAP swap-out of mTHP folios
> >
> > [..]
> > >
> > > This shows that in all cases, reclaim_high() is called only from the return
> > > path to user mode after handling a page-fault.
> >
> > I am sorry I haven't been keeping up with this thread, I don't have a
> > lot of capacity right now.
> >
> > If my understanding is correct, the summary of the problem we are
> > observing here is that with high concurrency (70 processes), we
> > observe worse system time, worse throughput, and higher memory_high
> > events with zswap than SSD swap. This is true (with varying degrees)
> > for 4K or mTHP, and with or without charging zswap compressed memory.
> >
> > Did I get that right?
>
> Thanks for your review and comments! Yes, this is correct.
> >
> > I saw you also mentioned that reclaim latency is directly correlated
> > to higher memory_high events.
>
> That was my observation based on the swap-constrained experiments with 4G SSD.
> With a faster compressor, we allow allocations to proceed quickly, and if the pages
> are not being faulted in, we need more swap slots. This increases the probability of
> running out of swap slots with the 4G SSD backing device, which, as the data in v4
> shows, causes memcg_swap_fail events that drive folios to stay resident in memory
> (triggering memcg_high breaches as allocations proceed even without zswap cgroup
> charging).
>
> Things change when the experiments are run in a situation where there is abundant
> swap space and when the default behavior of zswap compressed data being charged
> to the cgroup is enabled, as in the data with 176GiB ZRAM as ZSWAP's backing
> swapfile posted in v5. Now, the critical path to workload performance changes to
> concurrent reclaims in response to memcg_high events due to allocation and zswap
> usage. We see a smaller increase in swapout activity (as compared to the swap-constrained
> experiments in v4), and compress latency seems to become the bottleneck. Each
> individual process's throughput/sys time degrades mainly as a function of compress
> latency. Anyway, these were some of my learnings from these experiments. Please
> do let me know if there are other insights/analysis I could be missing.
> >
> > Is it possible that with SSD swap, because we wait for IO during
> > reclaim, this gives a chance for other processes to allocate and free
> > the memory they need, while with zswap, because everything is
> > synchronous, all processes are trying to allocate their memory at the
> > same time, resulting in higher reclaim rates?
> >
> > IOW, maybe with zswap all the processes try to allocate their memory
> > at the same time, so the total amount of memory needed at any given
> > instance is much higher than memory.high, so we keep producing
> > memory_high events and reclaiming. If 70 processes all require 1G at
> > the same time, then we need 70G of memory at once, and we will keep
> > thrashing pages in/out of zswap.
> >
> > While with SSD swap, due to the waits imposed by IO, the allocations
> > are more spread out and more serialized, and the amount of memory
> > needed at any given instance is lower, resulting in less reclaim
> > activity and ultimately faster overall execution?
>
> This is a very interesting hypothesis, along the lines of the
> "slower compressor" essentially causing allocation stalls (and buffering us from
> the swap slots unavailability effect) observation I gathered from the 4G SSD
> experiments. I think this is a possibility.
> >
> > Could you please describe what the processes are doing? Are they
> > allocating memory and holding on to it, or immediately freeing it?
>
> I have been using the vm-scalability usemem workload for these experiments.
> Thanks Ying for suggesting I use this workload!
>
> I am running usemem with these config options: usemem --init-time -w -O -n 70 1g.
> This forks 70 processes, each of which does the following:
>
> 1) Allocates 1G of mmap virtual memory with MAP_ANONYMOUS, read/write permissions.
> 2) Steps through and accesses each 8-byte chunk of memory in the mmap-ed region, and:
>    2.a) Writes the index of that chunk to the (unsigned long *) memory at that index.
> 3) Generates statistics on throughput.
>
> There is an "munmap()" after step (2.a) that I have commented out because I wanted to
> see how much cold memory resides in the zswap zpool after the workload exits. Interestingly,
> this was 0 for 64K mTHP, but of the order of several hundreds of MB for 2M THP.

Does the process exit immediately after step (3)? The memory will be
unmapped and freed once the process exits anyway, so removing an unmap
that immediately precedes the process exiting should have no effect.

I wonder how this changes if the processes sleep and keep the memory
mapped for a while, to force the situation where all the memory is
needed at the same time on SSD as well as zswap. This could make the
playing field more even and force the same thrashing to happen on SSD,
for a fairer comparison.

It's not a fix; if very fast reclaim with zswap ends up causing more
problems, perhaps we need to tweak the throttling of memory.high or
something.
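
For concreteness, below is a minimal, self-contained sketch of the
per-process access pattern described above (fork N workers, mmap 1G of
anonymous memory each, write the word index into every 8-byte slot),
extended with an optional pause before exit so that all processes hold
their memory mapped at the same time, as suggested in the reply above.
This is illustrative C only, not the actual vm-scalability usemem
source; the constants mirror the "-n 70 1g" invocation and the
hold-seconds argument is a made-up knob.

/* usemem-sketch.c: illustrative only, not vm-scalability's usemem. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPROC 70
#define SIZE  (1UL << 30)	/* 1 GiB per process */

static void worker(unsigned int hold_seconds)
{
	unsigned long *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		_exit(1);
	}

	/* Touch every 8-byte word, writing its index (steps 2/2.a above). */
	for (unsigned long i = 0; i < SIZE / sizeof(*p); i++)
		p[i] = i;

	/* Keep the memory mapped for a while instead of exiting right away. */
	if (hold_seconds)
		sleep(hold_seconds);

	/* munmap() intentionally omitted, as in the description above. */
	_exit(0);
}

int main(int argc, char **argv)
{
	unsigned int hold_seconds = argc > 1 ? atoi(argv[1]) : 0;

	for (int i = 0; i < NPROC; i++) {
		if (fork() == 0)
			worker(hold_seconds);
	}
	for (int i = 0; i < NPROC; i++)
		wait(NULL);
	return 0;
}

Running it with, say, "./usemem-sketch 30" (the binary name is
hypothetical) would keep all 70 mappings alive for 30 seconds after the
write pass, forcing the "all memory needed at once" situation on SSD
swap as well as zswap.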