From: "Zach O'Keefe"
Date: Tue, 17 Jan 2023 15:00:30 -0800
Subject: Re: [PATCH] mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma
To: David Hildenbrand
Cc: Yang Shi, Hugh Dickins, Andrew Morton, Jann Horn, Song Liu, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Mon, Jan 9, 2023 at 12:50 AM David Hildenbrand wrote:
>
> >>>>>>
> >>>>>> Side note: set_huge_pmd() wins the award of "ugliest mm function of early
> >>>>>> 2023". I was briefly concerned how do_set_pmd() decides whether the PMD can be
> >>>>>> writable or not. Turns out it's communicated via vm_fault->flags. Just
> >>>>>> horrible.
> >
> > Hey David,
> >
> > Sorry for the late response here.
> >
> > My first Linux award! :) At least it's not "worst mm security issue of
> > early 2023". I'll take it!
>
> Good that you're not taking my words the wrong way.
>
> MADV_COLLAPSE is a very useful feature (especially also for THP tests
> [1]). I wish I could have looked at some of the patches earlier.
> But we cannot wait forever to get something merged, otherwise we'd
> never get bigger changes upstream.
>
> ... so there is plenty of time left in 2023 to clean up khugepaged.c :P
>
>
> [1] https://lkml.kernel.org/r/20230104144905.460075-1-david@redhat.com

Yes, thank you for these tests. I have them open in another tab, along
with a mental TODO to take a closer look at them and a half-written
response.

In-place collapse of anonymous memory *is* something that I was
interested in exploring later (I have a use case for it: hugepage-aware
malloc() implementations). I'm taking so long on it (sorry) because I
need to review your point (2) (all PTEs mapping exclusively). Hopefully
I can get to it shortly.

> [...]
>
> >> For example: why even *care* about the complexity of installing a PMD in
> >> collapse_pte_mapped_thp() using set_huge_pmd() just for MADV_COLLAPSE?
> >>
> >> Sure, we avoid a single page fault afterwards, but is this *really*
> >> worth the extra code here? I mean, after we installed the PMD, the page
> >> could just get reclaimed either way, so there is no guarantee that we
> >> have a PMD mapped once we return to user space IIUC.
> >
> > A valid question. The first reason is just semantic symmetry for
> > MADV_COLLAPSE called on anon vs file/shmem memory. It would be nice to
> > say that "on success, the memory range provided will be backed by
> > PMD-mapped hugepages", rather than special-casing file/shmem.
>
> But there will never be such a guarantee, right? We could even see a
> split just before we return to user space IIRC.

Absolutely. But at least we are *aiming* for symmetry here, though
admittedly even a successful return code provides no guarantees.
Perhaps this is a weak argument by itself, though.

> >
> > The second reason has a more practical use case.
> > In userfaultfd-based live migration (using UFFDIO_REGISTER_MODE_MINOR),
> > pages are migrated at 4KiB granularity, and it may take a long time
> > (O(many minutes)) for the transfer of all pages to complete. To avoid
> > severe performance degradation on the target guest, the VMM wants to
> > MADV_COLLAPSE hugepage-sized regions as they fill up. Since the guest
> > memory is still uffd-registered, requiring a refault post-MADV_COLLAPSE
> > won't work: the uffd machinery will intercept the fault, and no PMD
> > will be mapped. As such, either uffd needs to be taught to install PMD
> > mappings, or the PMD mapping must already be in place.
>
> That's an interesting point, thanks. I assume we'd get another minor
> fault and when resolving that, we'll default to a PTE mapping.

Yes-ish; I think it depends on how userspace decides to deal with the
event. At least in my own test cases, IIRC (hazy memory here), we ended
up in some loop of:

done faulting all 512 pages -> MADV_COLLAPSE -> fault -> copy page ->
done faulting all 512 pages -> ...

Thanks,
Zach

> --
> Thanks,
>
> David / dhildenb
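For readers unfamiliar with the live-migration flow discussed above, here is a minimal userspace sketch of the VMM-side pattern: track which 4KiB pages of each hugepage-sized region have been populated, and call madvise(MADV_COLLAPSE) once all 512 are present. It is an illustration under stated assumptions, not code from any real VMM: the names hpage_fill, hpage_fill_init(), hpage_fill_mark() and collapse_region() are hypothetical; it assumes x86-64 with 4KiB base pages and 2MiB PMD hugepages; and it leaves out the userfaultfd plumbing (UFFDIO_REGISTER_MODE_MINOR registration and fault resolution) entirely.

```c
#define _DEFAULT_SOURCE         /* expose madvise() in glibc's <sys/mman.h> */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25        /* Linux uapi value (kernel 6.1+), for older libc headers */
#endif

/* Assumption: x86-64 with 4KiB base pages and 2MiB PMD-sized hugepages. */
#define BASE_PAGE_SIZE  4096UL
#define HPAGE_SIZE      (2UL * 1024 * 1024)
#define PAGES_PER_HPAGE (HPAGE_SIZE / BASE_PAGE_SIZE)   /* 512 */

/* Hypothetical per-region bookkeeping kept by the VMM. */
struct hpage_fill {
	uint64_t bits[PAGES_PER_HPAGE / 64];  /* one bit per 4KiB page */
	unsigned long nr_filled;              /* number of bits currently set */
};

static void hpage_fill_init(struct hpage_fill *f)
{
	memset(f, 0, sizeof(*f));
}

/*
 * Record that page idx of the region has been populated (e.g. after the
 * VMM resolved its fault). Returns true only on the call that fills the
 * last missing page; marking an already-populated page is a no-op.
 */
static bool hpage_fill_mark(struct hpage_fill *f, unsigned long idx)
{
	uint64_t mask = UINT64_C(1) << (idx % 64);

	assert(idx < PAGES_PER_HPAGE);
	if (f->bits[idx / 64] & mask)
		return false;
	f->bits[idx / 64] |= mask;
	return ++f->nr_filled == PAGES_PER_HPAGE;
}

/*
 * Ask the kernel to collapse one hugepage-aligned region in place.
 * Returns 0 on success or -errno on failure. Success does not guarantee
 * the range is still PMD-mapped by the time the caller looks at it.
 */
static int collapse_region(void *hpage_addr)
{
	if (madvise(hpage_addr, HPAGE_SIZE, MADV_COLLAPSE) == 0)
		return 0;
	return -errno;
}
```

In this sketch a VMM would call hpage_fill_mark() after resolving each minor fault and, when it returns true, pass the region's 2MiB-aligned base address to collapse_region(). Keeping a bitmap rather than a bare counter makes duplicate notifications for the same page harmless. Per the thread, whether the resulting PMD mapping survives a later fault on a still-registered range is exactly the open question.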