From: Axel Rasmussen
Date: Fri, 11 Feb 2022 11:08:14 -0800
Subject: Re: [PATCH v2 1/3] mm: enable MADV_DONTNEED for hugetlb mappings
To: Peter Xu
Cc: Mike Kravetz, Linux MM, LKML, Naoya Horiguchi, David Hildenbrand,
    Mina Almasry, Michal Hocko, Andrea Arcangeli, Shuah Khan, Andrew Morton
References: <20220202014034.182008-1-mike.kravetz@oracle.com>
    <20220202014034.182008-2-mike.kravetz@oracle.com>

On Thu, Feb 10, 2022 at 6:29 PM Peter Xu wrote:
>
> On Thu, Feb 10, 2022 at 01:36:57PM -0800, Mike Kravetz wrote:
> > > Another use case of DONTNEED upon hugetlbfs could be uffd-minor, because afaiu
> > > this is the only api that can force strip the hugetlb mapped pgtable without
> > > losing pagecache data.
> >
> > Correct.  However, I do not know if uffd-minor users would ever want to
> > do this.  Perhaps?

I talked with some colleagues, and we didn't come up with any
production *requirement* for it, but it may be a convenience in some
cases (it could make certain code cleaner, e.g. by not having to
unmap-and-remap to tear down page tables, as Peter mentioned). I think
Peter's assessment below is right.

>
> My understanding is before this patch uffd-minor upon hugetlbfs requires the
> huge file to be mapped twice, one to populate the content, then we'll be able
> to trap MINOR faults via the other mapping.  Or we could munmap() the range and
> remap it again on the same file offset to drop the pgtables, I think.  But that
> sounds tricky.  MINOR faults only work with pgtables dropped.
>
> With DONTNEED upon hugetlbfs we can rely on one single mapping of the file,
> because we can explicitly drop the pgtables of hugetlbfs files without any
> other tricks.
>
> However I have no real use case of it.  Initially I thought it could be useful
> for QEMU because QEMU migration routine is run with the same mm context with
> the hypervisor, so by default it doesn't have two mappings of the same guest
> memory.  If QEMU wants to leverage minor faults, DONTNEED could help.
>
> However when I was measuring bitmap transfer (assuming that's what minor fault
> could help with qemu's postcopy) there some months ago I found it's not as slow
> as I thought at all..  Either I could have missed something, or we're facing
> different problems with what it is when uffd minor is firstly proposed by Axel.

Re: the bitmap, that matters most on machines with lots of RAM. For
example, GCE offers some VMs with up to 12 *TB* of RAM
(https://cloud.google.com/compute/docs/memory-optimized-machines). I
think with this size of machine we see a significant benefit, as it
may take a significant amount of time for the bitmap to arrive over
the network.

But I think that's a bit of an edge case; most machines are not that
big. :) I think the benefit is more often seen just in avoiding
copies. E.g. if we find a page is already up-to-date after precopy, we
just install PTEs, with no copying or page allocation needed. And even
when we have to go fetch a page over the network, one can imagine an
RDMA setup where we can avoid any copies/allocations at all even in
that case. I suppose this also has a bigger effect on larger machines,
e.g. ones that are backed by 1G pages instead of 4k.

>
> This is probably too out of topic, though..  Let me go back..
>
> Said that, one thing I'm not sure about DONTNEED on hugetlb is whether this
> could further abuse DONTNEED, as the original POSIX definition is as simple as:
>
>   The application expects that it will not access the specified address range
>   in the near future.
>
> Linux did it by tearing down pgtable, which looks okay so far.  It could be a
> bit more weird to apply it to hugetlbfs because from its definition it's a hint
> to page reclaims, however hugetlbfs is not a target of page reclaim, neither is
> it LRU-aware.  It goes further into some MADV_ZAP styled syscall.
>
> I think it could still be fine as posix doesn't define that behavior
> specifically on hugetlb so it can be defined by Linux, but not sure whether
> there can be other implications.
>
> Thanks,
>
> --
> Peter Xu
>
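
For anyone following along, here is a minimal sketch of the "tear down
the pgtable" behaviour Peter describes. It is not from the patch
series: it assumes Linux and Python 3.8+, and it uses ordinary
anonymous memory rather than hugetlbfs (so it says nothing about what
this series changes). The helper name dontneed_roundtrip is made up
for illustration. The shared case is the analogue of the uffd-minor
point above: the mapping's page tables are dropped, but the backing
page (and its data) survives.

```python
import mmap

PAGE = mmap.PAGESIZE


def dontneed_roundtrip(flags):
    """Write one byte, MADV_DONTNEED the page, and return the byte read back.

    Hypothetical helper for illustration only; uses regular (non-hugetlb)
    anonymous memory on Linux.
    """
    m = mmap.mmap(-1, PAGE, flags=flags)
    try:
        m[0] = 0x5A                    # fault the page in and dirty it
        m.madvise(mmap.MADV_DONTNEED)  # zap the PTEs covering the mapping
        return m[0]                    # re-fault and read the first byte
    finally:
        m.close()


# Private anonymous memory: the page is discarded, so the re-fault is zero-filled.
private_val = dontneed_roundtrip(mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)

# Shared anonymous memory (shmem-backed): only the page tables are dropped,
# so the re-fault finds the same page with its data intact.
shared_val = dontneed_roundtrip(mmap.MAP_SHARED | mmap.MAP_ANONYMOUS)

print(f"private: {private_val:#x}, shared: {shared_val:#x}")
```

In the shared case the second access takes a fault on a page that
already exists in the page cache, which is the situation a
userfaultfd MINOR registration is meant to intercept.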