From: Chuanhua Han
Date: Mon, 13 Jan 2025 11:16:15 +0800
Subject: Re: [LSF/MM/BPF TOPIC] Large folio (z)swapin
To: Usama Arif
Cc: lsf-pc@lists.linux-foundation.org, Linux Memory Management List, Johannes Weiner, Barry Song <21cnbao@gmail.com>, Yosry Ahmed, Shakeel Butt
References: <58716200-fd10-4487-aed3-607a10e9fdd0@gmail.com>
In-Reply-To: <58716200-fd10-4487-aed3-607a10e9fdd0@gmail.com>

I am also interested in this topic. Please include me in the discussion too.
I'll try to attend, at least remotely :)

On Fri, 10 Jan 2025 at 04:06, Usama Arif wrote:
>
> I would like to propose a session to discuss the work going on
> around large folio swapin, whether it's traditional swap or
> zswap or zram.
>
> Large folios have obvious advantages that have been discussed before,
> like fewer page faults, batched PTE and rmap manipulation, a reduced
> LRU list, and TLB coalescing (for arm64 and AMD).
> However, swapping in large folios has its own drawbacks, like higher
> swap thrashing.
> I had initially sent an RFC of zswapin of large folios in [1],
> but it causes a regression in kernel build time due to swap
> thrashing, which I am confident is happening with zram large
> folio swapin as well (which is merged in the kernel).
>
> Some of the points we could discuss in the session:
>
> - What is the right (preferably open source) benchmark to test for
>   swapin of large folios? Kernel build time in a memory-limited
>   cgroup shows a regression, microbenchmarks show a massive
>   improvement, and maybe there are benchmarks where TLB misses are
>   a big factor and that show an improvement.
>
> - We could have something like
>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/swapin_enabled
>   to enable/disable swapin, but it's going to be difficult to tune, might
>   have different optimum values depending on the workload, and is likely
>   to be left at its default value. Is there some dynamic way to decide when
>   to swap in large folios and when to fall back to smaller folios?
>   The swapin_readahead swapcache path, which only supports 4K folios at
>   the moment, has a readahead window based on hits. However, readahead is
>   a folio flag and not a page flag, so this method can't be used: once a
>   large folio is swapped in, we won't get a fault and subsequent hits on
>   other pages of the large folio won't be recorded.
>
> - For zswap and zram, it might be that doing larger block compression/
>   decompression might offset the regression from swap thrashing, but it
>   brings about its own issues. For example, once a large folio is swapped
>   out, it could fail to swap in as a large folio and fall back
>   to 4K, resulting in redundant decompressions.
>   This will also mean swapin of large folios from traditional swap
>   isn't something we should proceed with?
>
> - Should we even support large folio swapin? You often have high swap
>   activity when the system/cgroup is close to running out of memory. At
>   that point, maybe the best way forward is to just swap in 4K pages and let
>   khugepaged [2], [3] collapse them if the surrounding pages are swapped in
>   as well.
>
> [1] https://lore.kernel.org/all/20241018105026.2521366-1-usamaarif642@gmail.com/
> [2] https://lore.kernel.org/all/20250108233128.14484-1-npache@redhat.com/
> [3] https://lore.kernel.org/lkml/20241216165105.56185-1-dev.jain@arm.com/
>
> Thanks,
> Usama
>

-- 
Thanks,
Chuanhua