From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B2FC6C433EF for ; Thu, 12 May 2022 15:53:36 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E7C7F6B0074; Thu, 12 May 2022 11:53:35 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E2AFB6B0075; Thu, 12 May 2022 11:53:35 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CF2596B0078; Thu, 12 May 2022 11:53:35 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id C15E16B0074 for ; Thu, 12 May 2022 11:53:35 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 2535F30F49 for ; Thu, 12 May 2022 15:53:35 +0000 (UTC) X-FDA: 79457535990.13.0428A0F Received: from out30-131.freemail.mail.aliyun.com (out30-131.freemail.mail.aliyun.com [115.124.30.131]) by imf25.hostedemail.com (Postfix) with ESMTP id 8FE8FA00AF for ; Thu, 12 May 2022 15:53:14 +0000 (UTC) X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R681e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04394;MF=rongwei.wang@linux.alibaba.com;NM=1;PH=DS;RN=34;SR=0;TI=SMTPD_---0VD.QluQ_1652370797; Received: from 30.30.107.56(mailfrom:rongwei.wang@linux.alibaba.com fp:SMTPD_---0VD.QluQ_1652370797) by smtp.aliyun-inc.com(127.0.0.1); Thu, 12 May 2022 23:53:26 +0800 Message-ID: Date: Thu, 12 May 2022 23:53:16 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:101.0) Gecko/20100101 Thunderbird/101.0 Subject: Re: [PATCH v5 10/13] mm/madvise: add MADV_COLLAPSE to process_madvise() Content-Language: en-US To: Zach O'Keefe Cc: Alex Shi , David Hildenbrand , David Rientjes , Matthew Wilcox , Michal Hocko , Pasha Tatashin , Peter Xu , SeongJae Park , Song Liu , Vlastimil Babka , Yang Shi , Zi Yan , linux-mm@kvack.org, Andrea Arcangeli , Andrew Morton , Arnd Bergmann , Axel Rasmussen , Chris Kennelly , Chris Zankel , Helge Deller , Hugh Dickins , Ivan Kokshaysky , "James E.J. Bottomley" , Jens Axboe , "Kirill A. Shutemov" , Matt Turner , Max Filippov , Miaohe Lin , Minchan Kim , Patrick Xia , Pavel Begunkov , Thomas Bogendoerfer , calling@linux.alibaba.com References: <20220504214437.2850685-1-zokeefe@google.com> <20220504214437.2850685-11-zokeefe@google.com> <502a3ced-f3c6-7117-3b24-d80d204d66ee@linux.alibaba.com> From: Rongwei Wang In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Stat-Signature: g9uwf1ix76jnwmrz5fyofybm6qf4p4aj Authentication-Results: imf25.hostedemail.com; dkim=none; spf=pass (imf25.hostedemail.com: domain of rongwei.wang@linux.alibaba.com designates 115.124.30.131 as permitted sender) smtp.mailfrom=rongwei.wang@linux.alibaba.com; dmarc=pass (policy=none) header.from=alibaba.com X-Rspam-User: X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 8FE8FA00AF X-HE-Tag: 1652370794-859315 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 5/11/22 11:34 PM, Zach O'Keefe wrote: > Hey Rongwei, > > Thanks for taking the time to review! > > On Tue, May 10, 2022 at 5:49 PM Rongwei Wang > wrote: >> >> Hi, Zach >> >> Thanks for your great patchset! >> Recently, We also try to collapse THP in this way, likes performance >> degradation due to using too much hugepages in our scenes. >> >> And there is a doubt about process_madvise(MADV_COLLAPSE) when we test >> this patchset:. It seems that process_madvise(MADV_COLLAPSE) rely on >> madvise(MADV_HUGEPAGE)? If the vma wasn't marked with 'hg', >> process_madvise(MADV_COLLAPSE) will fail to collapse. And if I miss >> something, please let me know. >> > > I tried to have MADV_COLLAPSE follow the same THP eligibility > semantics as khugepaged and at-fault: either THP=always, or > THP=madvise and the vma is marked with MADV_HUGEPAGE, as you point > out. > > If I understand you correctly, the usefulness of > process_madvise(MADV_COLLAPSE) is limited in the case where > THP=madvise and a CAP_SYS_ADMIN user is requesting a collapse of > behalf of another process since they don't have a way to mark the > target memory as eligible (which requires VM_HUGEPAGE). Hi Zach Yes, that's my means. It seems that MADV_[NO]HUGEPAGE for process_madivse(2) just needs little codes. Anyway, looking forward to your next patchset. Thanks -wrw > > If so, I think that's a valid point, and your suggestion below of a > supporting MADV_[NO]HUGEPAGE for process_madvise(2) makes sense. For > the sake of exploring all options, I'll mention that there was also a > previous idea suggested by Yang Shi where MADV_COLLAPSE could also set > VM_HUGEPAGE[1]. > > Since it's possible supporting MADV_[NO]HUGEPAGE for > process_madivse(2) has applications outside a subsequent > MADV_COLLAPSE, and since I don't see process_madvise(MADV_COLLAPSE) to > be in a hot path, I'd vote in favor of your suggestion and include > process_madvise(MADV_[NO]HUGEPAGE) support in v6 unless others object. > > Thanks again for your review and your suggestion! > Zach > > [1] https://lore.kernel.org/linux-mm/CAHbLzkqLRBd6u3qn=KqpOhRcPZtpGXbTXLUjK1z=4d_dQ06Pvw@mail.gmail.com/ > >> If so, how about introducing process_madvise(MADV_HUGEPAGE) or >> process_madvise(MADV_NOHUGEPAGE)? The former helps to mark the target >> vma with 'hg', and the collapse process can be finished completely with >> the help of other processes. the latter could let some special vma avoid >> collapsing when setting 'THP=always'. >> >> Best regards, >> -wrw >> >> On 5/5/22 5:44 AM, Zach O'Keefe wrote: >>> Allow MADV_COLLAPSE behavior for process_madvise(2) if caller has >>> CAP_SYS_ADMIN or is requesting collapse of it's own memory. >>> >>> Signed-off-by: Zach O'Keefe >>> --- >>> mm/madvise.c | 6 ++++-- >>> 1 file changed, 4 insertions(+), 2 deletions(-) >>> >>> diff --git a/mm/madvise.c b/mm/madvise.c >>> index 638517952bd2..08c11217025a 100644 >>> --- a/mm/madvise.c >>> +++ b/mm/madvise.c >>> @@ -1168,13 +1168,15 @@ madvise_behavior_valid(int behavior) >>> } >>> >>> static bool >>> -process_madvise_behavior_valid(int behavior) >>> +process_madvise_behavior_valid(int behavior, struct task_struct *task) >>> { >>> switch (behavior) { >>> case MADV_COLD: >>> case MADV_PAGEOUT: >>> case MADV_WILLNEED: >>> return true; >>> + case MADV_COLLAPSE: >>> + return task == current || capable(CAP_SYS_ADMIN); >>> default: >>> return false; >>> } >>> @@ -1452,7 +1454,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, >>> goto free_iov; >>> } >>> >>> - if (!process_madvise_behavior_valid(behavior)) { >>> + if (!process_madvise_behavior_valid(behavior, task)) { >>> ret = -EINVAL; >>> goto release_task; >>> }