From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 941D4C433EF for ; Tue, 12 Jul 2022 17:57:13 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 341459400C2; Tue, 12 Jul 2022 13:57:13 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2F103940063; Tue, 12 Jul 2022 13:57:13 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 192BC9400C2; Tue, 12 Jul 2022 13:57:13 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 0CB6F940063 for ; Tue, 12 Jul 2022 13:57:13 -0400 (EDT) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id BE6B120FE4 for ; Tue, 12 Jul 2022 17:57:12 +0000 (UTC) X-FDA: 79679204304.28.4E75A58 Received: from mail-pj1-f41.google.com (mail-pj1-f41.google.com [209.85.216.41]) by imf07.hostedemail.com (Postfix) with ESMTP id 564E84003B for ; Tue, 12 Jul 2022 17:57:12 +0000 (UTC) Received: by mail-pj1-f41.google.com with SMTP id o31-20020a17090a0a2200b001ef7bd037bbso9175455pjo.0 for ; Tue, 12 Jul 2022 10:57:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=1mIUwI0XHYyZiqY/mOeOC/6maVCh7qxvHzfc0+pDFM4=; b=EPZibSEZHOUKp6Q8FzqMyCWWgxldNvDmulrRKQjHEM35RD8FZlLvgVU75w3AHpstkp 5S1RkToFVRoN8h1/TTEG+KAwKCtJyEiqs5EH/icz3xSzA2VT0efqc3wWTzGQECDOF8Fk 4EhJ6dOg0sreIklZvkqjhK36dwRYQBGhsskZmD/evO68zp/FixEiDaDB+T4WWH8f+XZ6 qFaHb/8klFwu8rvNHWueJQXoFbTIu2Y9907//Eu3ZBIH4DsGMutIycmAIz2UfhOo6U4q zYCPhV0laa068zBRHYz20Svoy5mOnDQS40IJ/X0YnmzZyBud/+dZMKxsoiw8wkpbLVXX Ycrw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=1mIUwI0XHYyZiqY/mOeOC/6maVCh7qxvHzfc0+pDFM4=; b=yjrPzsjTuIJGe6cDVMem/0Q0Nl+bsfWmbED8Q/7GG1TJK2ZEMDUCZHiN8DKoh2TVLk tgVrdQw0vO4nnqEKF8JI6hn/2mXYIySEdakbKum5A6d8G1tIooz7x3ONRjzYH5Y16DKe x7sbL2q6M9KYOYoO9QR2rEdMVXHxbKQuoqInazJk790qChvd2p7TcL+WdIq0ofEN1AFO 5wqkEodhKNaD3EfV7iu7ViDy8aiyw5C9SAWgoZj+Suvsk9ZhSxONh7XWVLhI3TmwaccF c9OaJrQKT90fPOODfv4Gb0Ua9tEKTho3T8WDcdNOgNxZ8MyJUv8Ure0fYKJhfwE2dhRn RkWg== X-Gm-Message-State: AJIora/GL9Kc0+5AmG6PIT36J9SX3U5P3AhC7JkkiIJc/x2GT0tdnKg5 lFKu02nqQ+4oaBhyRBESNdeegw== X-Google-Smtp-Source: AGRyM1sREMpjZXuKIUWPWfAlrolvguOflsTGUraEFTzGQSGBMA4heil3RCbq8h+J3Ej8jrpSEOVswQ== X-Received: by 2002:a17:90a:4e05:b0:1ec:8de4:1dd5 with SMTP id n5-20020a17090a4e0500b001ec8de41dd5mr5694598pjh.242.1657648631006; Tue, 12 Jul 2022 10:57:11 -0700 (PDT) Received: from google.com (55.212.185.35.bc.googleusercontent.com. [35.185.212.55]) by smtp.gmail.com with ESMTPSA id q17-20020aa79831000000b0052ad5ceb624sm3858938pfl.74.2022.07.12.10.57.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Jul 2022 10:57:10 -0700 (PDT) Date: Tue, 12 Jul 2022 10:57:07 -0700 From: Zach O'Keefe To: Yang Shi Cc: Alex Shi , David Hildenbrand , David Rientjes , Matthew Wilcox , Michal Hocko , Pasha Tatashin , Peter Xu , Rongwei Wang , SeongJae Park , Song Liu , Vlastimil Babka , Zi Yan , Linux MM , Andrea Arcangeli , Andrew Morton , Arnd Bergmann , Axel Rasmussen , Chris Kennelly , Chris Zankel , Helge Deller , Hugh Dickins , Ivan Kokshaysky , "James E.J. Bottomley" , Jens Axboe , "Kirill A. Shutemov" , Matt Turner , Max Filippov , Miaohe Lin , Minchan Kim , Patrick Xia , Pavel Begunkov , Thomas Bogendoerfer , James Houghton Subject: Re: [mm-unstable v7 13/18] proc/smaps: add PMDMappable field to smaps Message-ID: References: <20220706235936.2197195-1-zokeefe@google.com> <20220706235936.2197195-14-zokeefe@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=EPZibSEZ; spf=pass (imf07.hostedemail.com: domain of zokeefe@google.com designates 209.85.216.41 as permitted sender) smtp.mailfrom=zokeefe@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1657648632; a=rsa-sha256; cv=none; b=BKvaYfNcL9dYYmp9tUCxOSSE4eOl68S/FoAIDgBSeFTZ9wOe19MX5gIglV/8n2yevctDo4 RyA2tnxMZJe7aTZ7vBO8KeEuWleGIfpKdBvhzQNH6ZSqaxJm5fYIsKaWvFgo/3qyJQYk8H CIzpLqeqCR2CBdG/aCnQdpS5XeJtI/g= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1657648632; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=1mIUwI0XHYyZiqY/mOeOC/6maVCh7qxvHzfc0+pDFM4=; b=Kq9HM1qEAq96cOI/pLKUBeO1DRBm7nTIvgUSxEDPjY6Blx/AhBi0vhI+tWHOIUo2LQGmo2 6ulcjKI9b0blu4I1Y3ULmdcTjUIUz2b3YmtT72SbIrovVjh55Wn0RER4szHYO8r4VgHhV1 A3C0Pxcfb7GcpYJL/m/eEMD15FI3mRU= X-Rspamd-Queue-Id: 564E84003B Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=EPZibSEZ; spf=pass (imf07.hostedemail.com: domain of zokeefe@google.com designates 209.85.216.41 as permitted sender) smtp.mailfrom=zokeefe@google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspam-User: X-Rspamd-Server: rspam11 X-Stat-Signature: keca5rntaeg1gmuiqt5q4o5mq7p5pmkk X-HE-Tag: 1657648632-614216 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Jul 12 10:27, Yang Shi wrote: > On Tue, Jul 12, 2022 at 9:31 AM Zach O'Keefe wrote: > > > > On Jul 11 14:37, Yang Shi wrote: > > > On Wed, Jul 6, 2022 at 5:06 PM Zach O'Keefe wrote: > > > > > > > > Add PMDMappable field to smaps output which informs the user if memory > > > > in the VMA can be PMD-mapped by MADV_COLLAPSE. > > > > > > > > The distinction from THPeligible is needed for two reasons: > > > > > > > > 1) For THP, MADV_COLLAPSE is not coupled to THP sysfs controls, which > > > > THPeligible reports. > > > > > > > > 2) PMDMappable can also be used in HugeTLB fine-granularity mappings, > > > > which are independent from THP. > > > > > > Could you please elaborate the usecase? The user checks this hint > > > before calling MADV_COLLAPSE? Is it really necessary? > > > > > > And, TBH it sounds confusing and we don't have to maintain both > > > THPeligible and PMDMappable. We could just relax THPeligible to make > > > it return 1 even though THP is disabled by sysfs but MADV_COLLAPSE > > > could collapse it if such hint is useful. > > > > > > > Hey Yang, > > > > Thanks for taking the time to review this series again, and thanks for > > challenging this. > > > > TLDR: "Is it really necessary" - at the moment, no, probably not .. but I think > > it's "useful". > > > > Rationale: > > > > 1. IMO, I thought was was confusing seeing: > > > > ... > > AnonHugePages: 2048 kB > > ShmemPmdMapped: 0 kB > > FilePmdMapped: 0 kB > > Shared_Hugetlb: 0 kB > > Private_Hugetlb: 0 kB > > Swap: 0 kB > > SwapPss: 0 kB > > Locked: 0 kB > > THPeligible: 0 > > ... > > > > Maybe this could simply be clarified in the docs though. I guess we can already > > get: > > > > ... > > AnonHugePages: 0 kB > > ShmemPmdMapped: 0 kB > > FilePmdMapped: 2048 kB > > Shared_Hugetlb: 0 kB > > Private_Hugetlb: 0 kB > > Swap: 0 kB > > SwapPss: 0 kB > > Locked: 0 kB > > THPeligible: 0 > > ... > > > > today[1], so perhaps it's not a big deal. > > Not only that, if you have file PMD mapped then turn the THP sysfs > flag off, you get the same result. It is just a hint and just shows > the status at that moment when reading smaps. > Very good point. > > > > > > 2. It was useful for debugging - similar to rationale for including > > THPeligible1[2], the logic for determining if a VMA is eligible is pretty > > complicated. I.e. is this file mapped suitably? Unlike THPeligible, however, > > madvise(2) has the ability to set errno on failure to help* diagnose why some > > memory isn't being backed. > > I don't disagree it would help for debugging. But as a user who > doesn't know too much about kernel internals, when I see THPeligible > and PMDmappable, I would get confused TBH. And do we have to maintain > another similar hint? Maybe not. > > > > > 3. For the immediately-envisioned usecases, the user "knows" about what memory > > they are acting on. However, eventually we'd like to experiment with moving THP > > utilization policy to userspace. Here, it would be useful if the userspace agent > > managing was made aware of what memory it should be managing. I don't have a > > working prototype of what this would like yet, however. > > It is not a strong justification to add some user visible stuff for a > future feature (not even prototyped) since things may change, it > sounds safer to add such things once the usecase is solid TBH. > Ya, this was a weaker point for inclusion *now* TBH. > > > > 4. I thought it was neat that this field could be reused for HugeTLB > > fine-granularity mappings - but TBH I'm not sure how useful it'd be there. > > > > I figured relaxing existing THPeligible could break existing users / tests, and > > it'd be likewise confusing for them to see THPeligible: 1, but then have faults > > fail and they'd then have to go check sysfs settings and vma flags ; we'd be > > back in pre-commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each > > vma"). > > I'm not sure what applications rely on this hint, but if they are just > some test scripts, I think it should be fine. I don't think we > guarantee the test scripts won't get broken. AFAIK some test scripts > rely on the kernel dmesg text, for example, OOMs. And the meaning of > the fields do change, for example, inactive anon of /proc/meminfo, > which was changed by the patchset which put anon pages on inactive > list first instead of active list. We already noticed the abnormal > value from our monitoring tool when we adopted 5.10+ kernel. And > /proc/vmstat also had some fields renamed, for example, > workingset_refault of /proc/vmstat, it was split to > workseting_refault_anon and workingset_refault_file, so we had to > update our monitoring scripts accordingly. I think /proc/meminfo and > /proc/vmstat are more heavily used than smaps. > Thanks for the great context. My guess is, right now, THPelligible is more useful as-is than if we were to relax it to MADV_COLLAPSE eligibility. As such, I'm fine dropping this until a stronger and more immediate usecase presents itself. Thanks for checking my rationale here. Best, Zach > > > > Thanks, > > Zach > > > > [1] https://lore.kernel.org/linux-mm/YrxbQGiwml24APCx@google.com/ > > > > > > > > > > > > > > > Signed-off-by: Zach O'Keefe > > > > --- > > > > Documentation/filesystems/proc.rst | 10 ++++++++-- > > > > fs/proc/task_mmu.c | 2 ++ > > > > 2 files changed, 10 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst > > > > index 47e95dbc820d..f207903a57a5 100644 > > > > --- a/Documentation/filesystems/proc.rst > > > > +++ b/Documentation/filesystems/proc.rst > > > > @@ -466,6 +466,7 @@ Memory Area, or VMA) there is a series of lines such as the following:: > > > > MMUPageSize: 4 kB > > > > Locked: 0 kB > > > > THPeligible: 0 > > > > + PMDMappable: 0 > > > > VmFlags: rd ex mr mw me dw > > > > > > > > The first of these lines shows the same information as is displayed for the > > > > @@ -518,9 +519,14 @@ replaced by copy-on-write) part of the underlying shmem object out on swap. > > > > does not take into account swapped out page of underlying shmem objects. > > > > "Locked" indicates whether the mapping is locked in memory or not. > > > > > > > > +"PMDMappable" indicates if the memory can be mapped by PMDs - 1 if true, 0 > > > > +otherwise. It just shows the current status. Note that this is memory > > > > +operable on explicitly by MADV_COLLAPSE. > > > > + > > > > "THPeligible" indicates whether the mapping is eligible for allocating THP > > > > -pages as well as the THP is PMD mappable or not - 1 if true, 0 otherwise. > > > > -It just shows the current status. > > > > +pages by the kernel, as well as the THP is PMD mappable or not - 1 if true, 0 > > > > +otherwise. It just shows the current status. Note this is memory the kernel can > > > > +transparently provide as THPs. > > > > > > > > "VmFlags" field deserves a separate description. This member represents the > > > > kernel flags associated with the particular virtual memory area in two letter > > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > > > > index f8cd58846a28..29f2089456ba 100644 > > > > --- a/fs/proc/task_mmu.c > > > > +++ b/fs/proc/task_mmu.c > > > > @@ -867,6 +867,8 @@ static int show_smap(struct seq_file *m, void *v) > > > > > > > > seq_printf(m, "THPeligible: %d\n", > > > > hugepage_vma_check(vma, vma->vm_flags, true, false, true)); > > > > + seq_printf(m, "PMDMappable: %d\n", > > > > + hugepage_vma_check(vma, vma->vm_flags, true, false, false)); > > > > > > > > if (arch_pkeys_enabled()) > > > > seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma)); > > > > -- > > > > 2.37.0.rc0.161.g10f37bed90-goog > > > >