Message-ID: <58716200-fd10-4487-aed3-607a10e9fdd0@gmail.com>
Date: Thu, 9 Jan 2025 20:06:43 +0000
From: Usama Arif
To: lsf-pc@lists.linux-foundation.org, Linux Memory Management List
Cc: Johannes Weiner, Barry Song <21cnbao@gmail.com>, Yosry Ahmed, Shakeel Butt
Subject: [LSF/MM/BPF TOPIC] Large folio (z)swapin

I would like to propose a session to discuss the work going on around
large folio swapin, whether it's traditional swap, zswap or zram.

Large folios have obvious advantages that have been discussed before,
like fewer page faults, batched PTE and rmap manipulation, a shorter LRU
list, and TLB coalescing (on arm64 and AMD). However, swapping in large
folios has its own drawbacks, like higher swap thrashing.

I had initially sent an RFC for zswapin of large folios in [1], but it
causes a regression in kernel build time due to swap thrashing, which I
am confident is happening with zram large folio swapin as well (which is
already merged in the kernel).

Some of the points we could discuss in the session:

- What is the right (preferably open source) benchmark to test swapin of
  large folios? Kernel build time in a memory-limited cgroup shows a
  regression, microbenchmarks show a massive improvement, and maybe there
  are benchmarks where TLB misses are a big factor and which show an
  improvement.

- We could have something like
  /sys/kernel/mm/transparent_hugepage/hugepages-*kB/swapin_enabled
  to enable/disable swapin, but it's going to be difficult to tune, might
  have different optimum values for different workloads, and is likely to
  be left at its default value. Is there some dynamic way to decide when
  to swap in large folios and when to fall back to smaller folios?
  The swapin_readahead swapcache path, which only supports 4K folios at
  the moment, has a readahead window based on hits. However, readahead is
  a folio flag and not a page flag, so this method can't be used: once a
  large folio is swapped in, we won't get a fault, and subsequent hits on
  other pages of the large folio won't be recorded.

- For zswap and zram, it might be that doing larger block
  compression/decompression offsets the regression from swap thrashing,
  but it brings its own issues. For example, once a large folio is
  swapped out, it could fail to swap in as a large folio and fall back to
  4K, resulting in redundant decompressions.
  Would this also mean that swapin of large folios from traditional swap
  isn't something we should proceed with?

- Should we even support large folio swapin? You often have high swap
  activity when the system/cgroup is close to running out of memory. At
  that point, maybe the best way forward is to just swap in 4K pages and
  let khugepaged [2], [3] collapse them if the surrounding pages are
  swapped in as well.

[1] https://lore.kernel.org/all/20241018105026.2521366-1-usamaarif642@gmail.com/
[2] https://lore.kernel.org/all/20250108233128.14484-1-npache@redhat.com/
[3] https://lore.kernel.org/lkml/20241216165105.56185-1-dev.jain@arm.com/

Thanks,
Usama
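
To put a rough number on the redundant decompression point above, here
is a small back-of-the-envelope sketch in Python. It is not taken from
any posted patch. The function name decompressed_kb and the model itself
are assumptions made for illustration: a compressed block has to be
decompressed in full to read any page in it, nothing decompressed is
cached between faults, and every page of the folio is eventually touched.

```python
def decompressed_kb(folio_kb: int, block_kb: int, swapin_kb: int) -> int:
    """Total KB decompressed to bring a folio_kb region back into memory.

    Hypothetical model, not kernel behaviour:
      - the region was compressed in blocks of block_kb,
      - each fault swaps in swapin_kb,
      - a block must be fully decompressed to read any part of it,
      - decompressed data is not kept around between faults.
    """
    # Number of faults needed to bring the whole region back.
    faults = folio_kb // swapin_kb
    # Each fault touches at least one compressed block, even when the
    # swapin size is smaller than the block.
    blocks_per_fault = max(1, swapin_kb // block_kb)
    return faults * blocks_per_fault * block_kb


if __name__ == "__main__":
    # 64K folio compressed as one block, swapped in as one 64K folio.
    print(decompressed_kb(64, 64, 64))  # 64
    # Same block, but swapin fell back to 4K pages: 16 faults, each one
    # decompressing the full 64K block.
    print(decompressed_kb(64, 64, 4))   # 1024
    # Per-page compression with 4K swapin, for comparison.
    print(decompressed_kb(64, 4, 4))    # 64
```

Under these assumptions the 4K fallback decompresses 16 times as much
data as either the large folio swapin or plain per-page compression.
Any caching of the decompressed block would lower that figure.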