From: Ryan Roberts
Date: Mon, 20 Jan 2025 16:27:31 +0000
Subject: Re: [RFC 00/11] khugepaged: mTHP support
To: David Hildenbrand, Nico Pache
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com, hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com, surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com, willy@infradead.org, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, sunnanyong@huawei.com, usamaarif642@gmail.com, audra@redhat.com, akpm@linux-foundation.org
Message-ID: <9bf875ad-3e31-464d-bccd-7c737a2c53bc@arm.com>
In-Reply-To: <8305ddf7-1ada-4a75-a2c3-385b530b25d4@redhat.com>

On 20/01/2025 13:56, David Hildenbrand wrote:
> On 20.01.25 14:37, Ryan Roberts wrote:
>> On 20/01/2025 12:54, David Hildenbrand wrote:
>>>>> I think the one problem that emerged during review of Dev's series, which
>>>>> we don't have a proper solution to yet, is the issue of "creep", where
>>>>> regions can be collapsed to progressively higher orders through iterative
>>>>> scans. At each collapse, the required thresholds (e.g. max_ptes_none) are
>>>>> met, and the collapse effectively adds more non-none PTEs, so the next
>>>>> scan will then collapse to an even higher order. Does your solution suffer
>>>>> from this (theoretical/edge case) issue? If not, how did you solve it?
>>>>
>>>> Yes, sadly it suffers from the same issue. Bringing max_ptes_none much
>>>> lower as a default would "help".
>>>
>>> Can we just keep it simple and only support max_ptes_none = 511 ("pagefault
>>> behavior" -- PMD_NR_PAGES - 1) or max_ptes_none = 0 ("deferred behavior") and
>>> document that the other weird configurations will make mTHP skip, because
>>> "weird and unexpected"? :)

nit: Rather than making mTHP skip for values of max_ptes_none other than 0 and
max, perhaps it's better to say we round to whichever of 0 and max is closer?

>>
>> That sounds like a great simplification in principle!
>
> And certainly much easier to start with :)
>
> If we ever get the request to support something else, maybe that's also where
> we can learn *why*, and what we would actually want to do with mTHP.
>
>> We would need to consider the swap and shared tunables too, though. Perhaps
>> we can pull a similar trick with those?
>
> Swapped and shared are a bit more challenging, because they are set to "/ 2"
> or "/ 8" heuristics.
>
> One simple starting point here is of course to say "when collapsing mTHP, all
> have to be unshared and all have to be swapped in", so to essentially ignore
> both tunables (in a memory-friendly way, as if they are set to 0) for mTHP
> collapse and worry about that later, when really required.

For swap, if we assume we start with the whole VMA swapped out, I think setting
max_ptes_swap to 0 could still cause the "creep" problem if pages are faulted
back in sequentially? I guess that's creep due to the faulting pattern rather
than due to collapse, but it still doesn't feel ideal.

I'm not sure what the semantics of "shared" are. I'm guessing it's specifically
for private COWed pages, and khugepaged will trigger the COW on collapse? If so,
depending on the pattern of writes, we could still end up with creep in a
similar way to swap?

>
> Two alternatives I discussed with Nico for these (not sure which is
> implemented here) is to calculate it proportionally to the folio order we are
> collapsing:

You're only listing one option here... what's the other one you discussed?

>
> Assuming max_ptes_swap = 64 (PMD: 512 PTEs) and we are collapsing a 1 MiB mTHP
> (256 PTEs), 32 PTEs would be allowed to be swapped out.

Yeah, this is exactly what Dev's version is doing at the moment. But that's the
behaviour that leads to the "creep" problem.

Thanks,
Ryan