From: David Hildenbrand
Organization: Red Hat
Date: Mon, 2 Oct 2023 14:58:58 +0200
To: Ryan Roberts, John Hubbard, Matthew Wilcox, Yang Shi, "Yin, Fengwei", Yu Zhao, Zi Yan, David Rientjes, Andrew Morton, Vlastimil Babka, "Kirill A. Shutemov", Hugh Dickins
Cc: Linux-MM
Subject: Re: ANON_LARGE_FOLIOS meeting follow-up & refined proposal
In-Reply-To: <1b03f4d6-634d-4786-81a0-5a104799b125@arm.com>
References: <4966f496-9f71-460c-b2ab-8661384ce626@arm.com> <4830fb3e-4a35-4842-98f4-9e7baa0e692a@arm.com> <7301771f-d654-4e5a-a197-3a3d8750440c@nvidia.com> <92937776-1e16-47e5-bef9-4c1a04bc98c0@arm.com> <5fa4aa95-6982-7879-e067-69fdb8b76d01@redhat.com> <1b03f4d6-634d-4786-81a0-5a104799b125@arm.com>

>> Not only that. It was also because we didn't want to confuse users/devs that
>> assume that THP == PMD-sized.
>>
>> I'll CC Hugh, I recall he had an opinion on that (I recall some comments about
>> cleanly separating both features towards the user).
>
> Were those comments made during the first meeting? I don't recall them, but will
> go back and watch the video.

Unless I am daydreaming, Willy made such comments during a THP cabal meeting,
and Hugh somewhere on the list in other context. Maybe they changed their mind
or I am making things up (after 4 days of fever dreams I don't know what's real
anymore :D ).

The biggest concern was that huge really implies PMD (and maybe later PUD) --
and could end up confusing users, stats etc. Maybe it can be handled, I'll have
to take a look at your proposal.

I was advocating just calling these things THP right from the start (and was
using the arguments you are using in this mail :) ), but understood the
concerns.

Apparently, freebsd wants to call these things "Medium-sized superpages" [1].
Of course, superpages are just huge pages, but everybody has to invent a new
term for the same thing [I did not check who had it first ;) ]. Of course,
under Windows static (like hugetlb) huge pages are called large-pages.
[1] https://www.freebsd.org/status/report-2022-04-2022-06/superpages/

>
>>
>>> Personally I think my latest proposal is a way to solve that problem, and in
>>> that case, I personally think exposing it as an extension to THP is neater:
>>>
>>>   - all existing THP controls work as they did before
>>>   - new anon_orders and anon_always_mask files allow opt-in to
>>>     smaller-than-PMD-orders
>>
>> As "enable" controls anon only (that's correct, right?), maybe these should also
>> simply be called "orders" and "always_mask". shmem could get their own set, like
>> "shmem_enable".
>
> Yes, could do it that way. I thought that since "shmem_" was used when shmem was
> introduced, it would be clearer to prefix the new anon controls too. Happy to
> remove though.
>
>>
>>>   - All existing counters remain unchanged, and continue to count PMD-mapped THP
>>>     only:
>>>        - /proc/meminfo:AnonHugePages
>>>        - /sys/devices/system/node/nodeX/meminfo:AnonHugePages
>>>        - /proc/vmstat:nr_anon_transparent_hugepages
>>>        - /proc//smaps[_rollup]:AnonHugePages
>>>        - memory.stat(v1):rss_huge
>>>        - memory.stat(v2):anon_thp
>>>   - New counters introduced to count PTE-mapped THP/large folios:
>>>        - /proc/meminfo:AnonHugePteMap
>>>        - /sys/devices/system/node/nodeX/meminfo:AnonHugePteMap
>>>        - /proc/vmstat:nr_anon_thp_pte
>>>        - /proc//smaps[_rollup]:AnonHugePteMap
>>>        - memory.stat(v1):anon_thp_pte
>>>        - memory.stat(v2):anon_thp_pte
>>>   - It's a lot less code (I have an implementation for both approaches)
>>>
>>> Admittedly, I haven't spent too much time thinking about the other thp counters
>>> in vmstat yet (e.g. thp_fault_alloc, thp_fault_fallback, etc). Proposal is that
>>> for now, they would continue to be PMD-order only. But I think you could
>>> probably hook those up to the PTE-mapped ones as well, instead of duplicating all
>>> the counters.
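
For illustration only, here is a minimal sketch of how a tool might read the
two kinds of counter listed above from meminfo-style text. AnonHugePages is an
existing /proc/meminfo field; AnonHugePteMap is merely the name proposed in
this thread and is treated as hypothetical (it defaults to 0 when absent). The
helper names and the sample values are made up for the example.

```python
def parse_meminfo(text):
    """Parse meminfo-style 'Name:   value kB' lines into a dict of integers."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a 'Name: value' line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def thp_summary(fields):
    """Return (pmd_mapped_kb, pte_mapped_kb) from parsed meminfo fields."""
    pmd_mapped = fields.get("AnonHugePages", 0)
    # Hypothetical field: AnonHugePteMap is only the name proposed in this thread.
    pte_mapped = fields.get("AnonHugePteMap", 0)
    return pmd_mapped, pte_mapped


# Made-up sample values, not real measurements.
SAMPLE = """\
AnonPages:       1048576 kB
AnonHugePages:    262144 kB
AnonHugePteMap:   131072 kB
"""

if __name__ == "__main__":
    pmd_kb, pte_kb = thp_summary(parse_meminfo(SAMPLE))
    print(f"PMD-mapped THP: {pmd_kb} kB, PTE-mapped THP: {pte_kb} kB")
```

On a real system the text would come from reading /proc/meminfo; the two
values are reported separately rather than summed, since nothing in the
proposal says they add up to the total amount of allocated THP.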
>>>
>>> As Kirill mentioned, PTE-mapped THP is already a thing, so this approach just
>>> formalises it.
>>
>> Not quite. PTE-mapped THP were just a side-effect of the transparency handling.
>> We never allocated and populated PTE-mapped PMD-sized THP on allocation. So I
>> don't immediately see the connection between both for this case.
>
> I'm just making the point that when they become PTE-mapped, we don't stop
> calling them THP. I accept that it's not exactly the same though.

... PTE-mapping them does not change their size ;) But I get what you mean.

>
>>
>> Would you account a PTE-mapped (PMD-sized) THP as anon_thp or anon_thp_pte? What
>> if it's mapped via PTEs and PMDs? I don't see how that formalises that case for
>> the existing PMD-sized THP.
>
> I account PTE-mapped THP as anon_thp_pte. And if the same folio is mapped both
> ways, I account it in both counters. I'm not claiming that anon_thp +
> anon_thp_pte = amount of allocated thp in total. anon_thp_pte is intended to
> help debug; you can use it to see what percentage of pte-mapped memory is THP.
>
>>
>>>
>>> I also think the "huge" means PMD-size argument is a bit weak, given that THP
>>> supports PUD-size today for file mappings, and in the context of hugetlb, huge
>>> can mean contpte, pmd, contpmd, pud, etc.
>>
>> I made similar statements in the past but was convinced otherwise :)
>>
>>>
>>> I'll have the patch set ready to post by Friday. How about I post it, then we
>>> can continue the conversation in the context of the actual code? If the
>>> consensus is that this is not the way to do it, then I'll post the large_folio
>>> version instead?
>>
>> No strong opinion from my side, I considered a "fresh start" without the THP
>> implication/terminology after all the previous discussions cleaner [which I
>> think was one of the outcomes of the previous discussions].
>>
>
> My concern is that the "fresh start" is not as simple as it appears. I've come
> to the conclusion that if we have a new interface, then it should really be a
> strict superset of THP to make it extensible in future. But that opens questions

^ +1

> about how you configure PMD-sized allocations when both interfaces disagree. For
> "enabled" it's fairly straightforward; you can do a logical OR. But it's less
> clear how to handle disagreement over defrag. And then you have huge_zero_page
> and khugepaged etc, which might just stay with THP. But eventually we will

Probably we want everything that THP had (khugepaged, zeropage, ...) also on
some (selected?) smaller orders.

> probably want to do async collapse for smaller order folios too, and at that
> point you have to duplicate all those controls... So I concluded that actually
> it is cleaner to just bolt on a small-order extension to THP. I've updated all
> the docs, and that was pretty simple to do, which usually suggests that the
> extension is purely additive and shouldn't be confusing.

Fine with me.

I don't quite like bitmaps exposed to user space, though. Just having a
user-readable list or a "directory" with various options as files might be
cleaner ...

>
> Take a look at the patches, then make a judgement ;-)
>

... but we'll discuss it there :)

-- 
Cheers,

David / dhildenb
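
To make the bitmap-versus-list point above a bit more concrete, here is a
rough sketch that converts an order bitmask into a user-readable size list and
back. Everything in it is an assumption for illustration: the function names,
the comma-separated "<size>kB" format, and the 4 KiB base page size are not
taken from the patch set or from any existing sysfs file.

```python
BASE_PAGE_KB = 4  # Assumption: 4 KiB base pages.


def mask_to_sizes(mask):
    """Convert an order bitmask (bit n set = order n enabled) to 'AkB,BkB,...'."""
    sizes = []
    order = 0
    while mask >> order:
        if (mask >> order) & 1:
            sizes.append(f"{BASE_PAGE_KB << order}kB")
        order += 1
    return ",".join(sizes)


def sizes_to_mask(text):
    """Inverse of mask_to_sizes; rejects anything that is not base page << order."""
    mask = 0
    for item in filter(None, (s.strip() for s in text.split(","))):
        if not item.endswith("kB") or not item[:-2].isdigit():
            raise ValueError(f"unrecognised size: {item!r}")
        pages, remainder = divmod(int(item[:-2]), BASE_PAGE_KB)
        # A valid size is a power-of-two number of base pages.
        if remainder or pages == 0 or pages & (pages - 1):
            raise ValueError(f"not a power-of-two number of pages: {item!r}")
        mask |= 1 << (pages.bit_length() - 1)
    return mask


if __name__ == "__main__":
    # Orders 2 and 4 with 4 KiB base pages are 16 kB and 64 kB folios.
    print(mask_to_sizes(0b10100))
    print(hex(sizes_to_mask("16kB,64kB")))
```

The list form carries the same information as the mask but does not require
the reader to know the base page size or to count bits, which is the
readability argument made above.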