From: Mina Almasry
Date: Wed, 17 Nov 2021 11:50:23 -0800
Subject: Re: [PATCH v4] mm: Add PM_HUGE_THP_MAPPING to /proc/pid/pagemap
To: Peter Xu
Cc: David Hildenbrand, Matthew Wilcox, "Paul E. McKenney", Yu Zhao,
 Jonathan Corbet, Andrew Morton, Ivan Teterevkov, Florian Schmidt,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org
References: <20211107235754.1395488-1-almasrymina@google.com>

On Mon, Nov 15, 2021 at 5:41 PM Andrew Morton wrote:
>
> On Wed, 10 Nov 2021 14:11:20 -0800 Mina Almasry wrote:
>
> > Add PM_HUGE_THP_MAPPING to allow userspace to detect whether a given virt
> > address is currently mapped by a transparent huge page or not. Example
> > use case is a process requesting THPs from the kernel (via a huge tmpfs
> > mount for example), for a performance critical region of memory. The
> > userspace may want to query whether the kernel is actually backing this
> > memory by hugepages or not.
> >
> > PM_HUGE_THP_MAPPING bit is set if the virt address is mapped at the PMD
> > level and the underlying page is a transparent huge page.
> >
> > A few options were considered:
> > 1. Add /proc/pid/pageflags that exports the same info as
> >    /proc/kpageflags. This is not appropriate because many kpageflags are
> >    inappropriate to expose to userspace processes.
> > 2. Simply get this info from the existing /proc/pid/smaps interface.
> >    There are a couple of issues with that:
> >    1. /proc/pid/smaps output is human readable and unfriendly to
> >       programmatically parse.
> >    2. /proc/pid/smaps is slow.
> >       The cost of reading /proc/pid/smaps into
> >       userspace buffers is about 800us per call, and this doesn't
> >       include parsing the output to get the information you need. The
> >       cost of querying 1 virt address in /proc/pid/pagemap however is
> >       around 5-7us.
> >
> > Tested manually by adding logging into transhuge-stress, and by
> > allocating THP and querying the PM_HUGE_THP_MAPPING flag at those
> > virtual addresses.
> >
> > --- a/tools/testing/selftests/vm/transhuge-stress.c
> > +++ b/tools/testing/selftests/vm/transhuge-stress.c
> > @@ -16,6 +16,12 @@
> >  #include
> >  #include
> >
> > +/*
> > + * We can use /proc/pid/pagemap to detect whether the kernel was able to find
> > + * hugepages or not. This can be very noisy, so is disabled by default.
> > + */
> > +#define NO_DETECT_HUGEPAGES
> > +
> >
> > ...
> >
> > +#ifndef NO_DETECT_HUGEPAGES
> > +	if (!PAGEMAP_THP(ent[0]))
> > +		fprintf(stderr, "WARNING: detected non THP page\n");
> > +#endif
>
> This looks like a developer thing.  Is there any point in leaving it in
> the mainline code?

I used this to test locally and I thought it may be useful, but on
second thought probably not worth it. Removed in v6 I just sent.

On Mon, Nov 15, 2021 at 5:59 PM Peter Xu wrote:
>
> On Mon, Nov 15, 2021 at 02:50:26PM -0800, Mina Almasry wrote:
> > PM_THP_MAPPED sounds good to me.
> >
> > TBH I think I still prefer this approach because it's a very simple 2
> > line patch which addresses the concrete use case I have well. I'm not
> > too familiar with the smaps code to be honest but I think adding a
> > range-based smaps API will be a sizeable patch to add a syscall,
> > handle a stable interface, and handle cases where the memory range
> > doesn't match a VMA boundary. I'm not sure the performance benefit
> > would justify this patch and I'm not sure the extra info from smaps
> > would be widely useful.
> > However if you insist and folks believe this
> > is the better approach I can prototype a range-based smaps and test
> > its performance to see if it works for us as well, just let me know
> > what kind of API you're envisioning.
>
> Yeah indeed I haven't yet thought enough on such a new interface, it's just
> that I think it'll be something that solves a broader range of requests
> including the thp-aware issue, so I raised it up.
>
> That shouldn't require a lot of code change either afaiu, as smap_gather_stats()
> already takes a "start" and I think what's missing is another end where we just
> pass in 0 when we want the default vma->vm_end as the end of range.
>
> I don't have a solid clue on other use cases to ask for that more generic
> interface, so please feel free to move on with it.  If you'll need a repost to
> address the comment from Andrew on removing the debugging lines, please also
> consider using the shorter PM_THP_MAPPED then it looks good to me too.
>

Awesome, thanks! PM_THP_MAPPED sounds good to me and I just sent v6
with these changes.

> Thanks!
>
> --
> Peter Xu
>
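
For reference, the single-address pagemap query discussed in this thread
could look roughly like the sketch below. It relies only on the documented
pagemap layout (one 64-bit entry per virtual page, bit 63 = page present).
The helper names are hypothetical, and the bit position used for
PM_THP_MAPPED is an assumption that is not stated anywhere in this mail, so
please check it against the actual patch before relying on it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Documented pagemap bit: the page is present in RAM. */
#define PM_PRESENT (1ULL << 63)

/*
 * ASSUMPTION: bit position of the flag proposed in this thread. It is a
 * placeholder for illustration, not a value taken from the mail above.
 */
#define PM_THP_MAPPED_BIT_ASSUMED 58

/* Byte offset of the 64-bit pagemap entry that describes vaddr. */
static uint64_t pagemap_offset(uintptr_t vaddr, long page_size)
{
	assert(page_size > 0);
	return (uint64_t)(vaddr / (uintptr_t)page_size) * sizeof(uint64_t);
}

/* Return 1 if the given bit is set in a pagemap entry, 0 otherwise. */
static int pagemap_bit(uint64_t entry, unsigned int bit)
{
	assert(bit < 64);
	return (int)((entry >> bit) & 1);
}

/*
 * Read the pagemap entry for vaddr in the calling process.
 * Returns 0 on success and -1 on any failure (no procfs, short read, ...).
 */
static int pagemap_read_entry(const void *vaddr, uint64_t *entry)
{
	long page_size = sysconf(_SC_PAGESIZE);
	ssize_t n;
	int fd;

	if (page_size <= 0)
		return -1;

	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;

	/* A single 8-byte pread() is all one address costs. */
	n = pread(fd, entry, sizeof(*entry),
		  (off_t)pagemap_offset((uintptr_t)vaddr, page_size));
	close(fd);

	return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

A caller would fetch the entry with pagemap_read_entry(addr, &ent) and then
test pagemap_bit(ent, PM_THP_MAPPED_BIT_ASSUMED). On a kernel without the
patch that bit is presumably just reported as 0, so a clear bit alone does
not prove the address is not THP-backed.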