From: "Zach O'Keefe"
Date: Thu, 10 Mar 2022 12:24:35 -0800
Subject: Re: [RFC PATCH 07/14] mm/khugepaged: add vm_flags_ignore to hugepage_vma_revalidate_pmd_count()
To: Yang Shi
Cc: David Rientjes, Alex Shi, David Hildenbrand, Michal Hocko,
    Pasha Tatashin, SeongJae Park, Song Liu, Vlastimil Babka, Zi Yan,
    Linux MM, Andrea Arcangeli, Andrew Morton, Arnd Bergmann,
    Axel Rasmussen, Chris Kennelly, Chris Zankel, Helge Deller,
    Hugh Dickins, Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe,
    "Kirill A. Shutemov", Matthew Wilcox, Matt Turner, Max Filippov,
    Miaohe Lin, Minchan Kim, Patrick Xia, Pavel Begunkov, Peter Xu,
    Thomas Bogendoerfer
References: <20220308213417.1407042-1-zokeefe@google.com>
    <20220308213417.1407042-8-zokeefe@google.com>

On Thu, Mar 10, 2022 at 11:54 AM Yang Shi wrote:
>
> On Thu, Mar 10, 2022 at 10:46 AM David Rientjes wrote:
> >
> > On Thu, 10 Mar 2022, Yang Shi wrote:
> >
> > > > This separates "async-hint" vs "sync-explicit" madvise requests.
> > > > MADV_[NO]HUGEPAGE are hints, and together with thp settings, advise
> > > > the kernel how to treat memory in the future. The kernel uses
> > > > VM_[NO]HUGEPAGE to aid with this. MADV_COLLAPSE, as an explicit
> > > > request, is free to define its own defrag semantics.
> > > >
> > > > This would allow flexibility to separately define async vs sync thp
> > > > policies. For example, highly tuned userspace applications that are
> > > > sensitive to unexpected latency might want to manage their hugepages
> > > > utilization themselves, and ask khugepaged to stay away. There is no
> > > > way in "always" mode to do this without setting VM_NOHUGEPAGE.
> > >
> > > I don't quite get why you set THP to always but don't want to
> > > khugepaged do its job. It may be slow, I think this is why you
> > > introduce MADV_COLLAPSE, right?
> > > But it doesn't mean khugepaged can't
> > > scan the same area, it just doesn't do any real work and wastes some
> > > cpu cycles. But I guess MADV_COLLAPSE doesn't prevent the PMD/THP from
> > > being split, right? So khugepaged still plays a role to re-collapse
> > > the area without calling MADV_COLLAPSE over again and again.
> > >
> >
> > My only real concern for MADV_COLLAPSE was when the span being collapsed
> > includes a mixture of both VM_HUGEPAGE and VM_NOHUGEPAGE. Does this
> > collapse over the eligible memory or does it fail entirely?
> >
> > I'd think it was the former, that we should respect VM_NOHUGEPAGE and only
> > collapse eligible memory when doing MADV_COLLAPSE but now userspace
> > struggles to know whether it was a partial collapse because of
> > ineligibility or because we just couldn't allocate a hugepage.
>
> Yes, I agree we should just try to collapse eligible vmas.
>
> Since we are using madvise, we'd better follow its convention. We
> could return different values for different failures, for example:
> 1. All vmas are collapsed successfully, return 0 (success)
> 2. Run into ineligible vma, return -EINVAL
> 3. Can't allocate hugepage, return -ENOMEM
>
> Or just simply return 0 for success or a single error code for all
> failure cases.
>

Different codes have a benefit (assuming -EINVAL takes precedence over
-EAGAIN, which AFAIK is the madvise convention for memory not being
available): a lazy user wouldn't need to read smaps on -EAGAIN; they
could just reissue the syscall over the same range at a later time.

> >
> > It has the information to figure this out on its own, so given the use of
> > VM_NOHUGEPAGE for non-MADV_NOHUGEPAGE purposes, I think it makes sense to
> > simply ignore these vmas as part of the collapse request.
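
To make the "lazy user" case concrete, here is a rough userspace-side
sketch of the retry flow that distinct error codes would allow. It is
only an illustration under assumptions: try_collapse is a hypothetical
stand-in for madvise(addr, len, MADV_COLLAPSE) that returns 0 or a
negative errno, and the -EINVAL / -EAGAIN split is the proposal being
discussed in this thread, not settled kernel behaviour.

```python
import errno


def collapse_range(try_collapse, max_attempts=3):
    """Reissue a collapse request while the failure looks transient.

    try_collapse is a hypothetical callable standing in for
    madvise(addr, len, MADV_COLLAPSE). It is assumed to return 0 on
    success or a negative errno value, per the proposal in this thread.
    Returns a (status, attempts_used) tuple.
    """
    for attempt in range(1, max_attempts + 1):
        ret = try_collapse()
        if ret == 0:
            return "collapsed", attempt
        if ret == -errno.EINVAL:
            # Ineligible vma in the range: retrying cannot help, so the
            # caller would have to look at smaps or adjust the range.
            return "ineligible", attempt
        if ret != -errno.EAGAIN:
            # Any other error is treated as permanent in this sketch.
            return "error", attempt
        # -EAGAIN: no hugepage available right now. A real caller would
        # likely sleep or back off here before reissuing the request.
    return "gave_up", max_attempts
```

With a single catch-all error code, the caller could not tell the
"ineligible" branch from the "try again later" branch without reading
smaps first.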