From: "Zach O'Keefe"
Date: Tue, 19 Apr 2022 15:42:40 -0700
Subject: Re: [PATCH v2 00/12] mm: userspace hugepage collapse
To: David Hildenbrand
Cc: Peter Xu, Alex Shi, David Rientjes, Matthew Wilcox, Michal Hocko,
 Pasha Tatashin, SeongJae Park, Song Liu, Vlastimil Babka, Yang Shi,
 Zi Yan, linux-mm@kvack.org, Andrea Arcangeli, Andrew Morton,
 Arnd Bergmann, Axel Rasmussen, Chris Kennelly, Chris Zankel,
 Helge Deller, Hugh Dickins, Ivan Kokshaysky, "James E.J. Bottomley",
 Jens Axboe, "Kirill A. Shutemov", Matt Turner, Max Filippov,
 Miaohe Lin, Minchan Kim, Patrick Xia, Pavel Begunkov,
 Thomas Bogendoerfer
References: <20220414180612.3844426-1-zokeefe@google.com>
 <8d8da2fb-aed9-96d0-47ed-94806e190250@redhat.com>
 <0cb08671-52b2-608a-74f1-eb6fdce5f100@redhat.com>
In-Reply-To: <0cb08671-52b2-608a-74f1-eb6fdce5f100@redhat.com>

On Tue, Apr 19, 2022 at 1:03 PM David Hildenbrand wrote:
>
> >> E.g., have with a very sparse memory layout, we don't want to waste
> >> memory by allocating memory where we actually have no page populated yet
> >> -- could be user space won't reuse that memory in the foreseeable
> >> future. With too many swap entries, we don't want to trigger an
> >> eventually unnecessary overhead of swapping in entries if user space
> >> won't access them in the foreseeable future. Something similar applies
> >> to max_ptes_shared, where one might just end up wasting a lot of memory
> >> eventually in some applications.
> >>
> >> So IMHO, with MADV_COLLAPSE we should ignore/disable any heuristics that
> >> try figuring out what user space might be doing. We know exactly what
> >> user space asks for -- and that can be documented properly.
> >>
>
> Just a thought, if we ever want to implement khugepaged in user space,
> it could theoretically obtain similar information using e.g., the
> pagemap. It wouldn't be race-free, but the question is if it would matter.
>
> I consider the primary use case of giving an application more precise
> control over actual THP placement.
>

Good point about the pagemap and agree about the primary use case - I'll
make that clear in the v3 cover letter.

> >
> > Sounds good to me. Would you also be in favor of decoupling allocation
> > semantics from khugepaged? I.e. we'll pick some default gfp flags and
> > not depend on /sys/kernel/mm/transparent_hugepage/khugepaged/defrag?
>
> Good question. It's not really a heuristic like that other stuff.
>
> Easy answer: we're not dealing with khugepaged, so anything in
> /sys/kernel/mm/transparent_hugepage/khugepaged/ shouldn't apply?
>

That's what I'm thinking now too. If there are no objections, I'll
proceed in that direction for v3.

> Sure, we could have a separate toggles for MADV_COLLAPSE.
>
> Maybe we simply want a dedicated syscall where we can specify additional
> options ... but maybe that simply over-complicates the problem.
>

Thankfully process_madvise(2) has flags, and madvise(2) users can
always migrate to using process_madvise(2) on self. Piggy-backing off
madvise infrastructure for these "non-advice actions" (e.g.
MADV_PAGEOUT) seems to be the norm.

Thanks as always for your time and thoughts!

Zach

> --
> Thanks,
>
> David / dhildenb
>
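
To make the pagemap idea above a bit more concrete, here is a minimal
sketch of how a user-space policy might count present, swapped, and
unpopulated pages in a hugepage-sized range before deciding whether to
request a collapse. It is only an illustration, not anything from the
patch series: the helper names (classify, summarize, read_pagemap) are
made up for this example, the 2 MiB hugepage size is an assumption
(x86-64 PMD size), and only the documented present (bit 63) and swapped
(bit 62) bits of a /proc/<pid>/pagemap entry are used.

```python
import mmap
import struct

PAGE_SIZE = mmap.PAGESIZE
HPAGE_SIZE = 2 * 1024 * 1024   # assumption: 2 MiB PMD-sized THP
PAGES_PER_HPAGE = HPAGE_SIZE // PAGE_SIZE

PM_PRESENT = 1 << 63           # pagemap bit 63: page present in RAM
PM_SWAPPED = 1 << 62           # pagemap bit 62: page is in swap


def classify(entry):
    """Classify one 64-bit pagemap entry (hypothetical helper)."""
    if entry & PM_PRESENT:
        return "present"
    if entry & PM_SWAPPED:
        return "swapped"
    return "none"


def summarize(entries):
    """Count how many entries fall into each class."""
    counts = {"present": 0, "swapped": 0, "none": 0}
    for entry in entries:
        counts[classify(entry)] += 1
    return counts


def read_pagemap(pid, vaddr, npages=PAGES_PER_HPAGE):
    """Read npages pagemap entries starting at page-aligned vaddr.

    Each virtual page has one 8-byte entry at offset (vaddr / PAGE_SIZE) * 8.
    The result is a snapshot and can be stale by the time it is used.
    """
    with open(f"/proc/{pid}/pagemap", "rb", buffering=0) as f:
        f.seek((vaddr // PAGE_SIZE) * 8)
        data = f.read(npages * 8)
    n = len(data) // 8
    return list(struct.unpack(f"={n}Q", data[: n * 8]))
```

A caller could compare summarize(read_pagemap("self", addr)) against its
own thresholds, roughly analogous to what max_ptes_none and
max_ptes_swap do for khugepaged, and only then issue the collapse
request. As noted in the quoted text, this is not race-free.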