From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 893CDC4708E for ; Tue, 3 Jan 2023 20:41:07 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E82BA8E0003; Tue, 3 Jan 2023 15:41:06 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E32C38E0001; Tue, 3 Jan 2023 15:41:06 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CFA748E0003; Tue, 3 Jan 2023 15:41:06 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id BF0908E0001 for ; Tue, 3 Jan 2023 15:41:06 -0500 (EST) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 5E1C312033E for ; Tue, 3 Jan 2023 20:41:06 +0000 (UTC) X-FDA: 80314657332.28.3FCB3F7 Received: from mail-qv1-f54.google.com (mail-qv1-f54.google.com [209.85.219.54]) by imf25.hostedemail.com (Postfix) with ESMTP id B3D01A0009 for ; Tue, 3 Jan 2023 20:41:04 +0000 (UTC) Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=Ydt5SNFE; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf25.hostedemail.com: domain of hughd@google.com designates 209.85.219.54 as permitted sender) smtp.mailfrom=hughd@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1672778464; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=N0SQYHeEbnE9z4Qjm2C7vh/kCROxi2guPCmpYkchnHY=; b=Ri5zrCeoWDf03LP4AUGmwZ040epOWM3oR8Hg29wcZzxv00FHoDMvNaO2M3xgmJ9avBPFsx srb24ihPZ5b4jMyOiSt+9so5iHhLY9+/NL0v331LAJL8Cs96VhVvKxbVqnV4Cxd1veUCRf BEeDgUXANo/rPycJVsZYSroMVlVy46U= ARC-Authentication-Results: i=1; imf25.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=Ydt5SNFE; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf25.hostedemail.com: domain of hughd@google.com designates 209.85.219.54 as permitted sender) smtp.mailfrom=hughd@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1672778464; a=rsa-sha256; cv=none; b=H89Xk7EsFINroJpyo6PibNrYIxU8vjrIeXa3dFOXCGpFFcBmEqmuaASyr5aknulh3CAgYY xRE9BUjuACqOsm8CwKkKCBKCZN8jCVtts93ba3u2nWDm+JdeZaI6ce8d5/Hs4r4xtPZqVY injcgQKOsoknMq6vY0bw4DsVmpMr8DU= Received: by mail-qv1-f54.google.com with SMTP id i12so22227588qvs.2 for ; Tue, 03 Jan 2023 12:41:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=N0SQYHeEbnE9z4Qjm2C7vh/kCROxi2guPCmpYkchnHY=; b=Ydt5SNFEjKlAC0TY0teiqV18lbgicsulT9jqucck86pLsLHJ7x6EMNy08kfv1ueZYH 9NVYxW0ADVye0azxdLDjWeyIEsNuXX4c8FflSPLY5K58mmuYnPupJT5f6NeGukAey6c+ RJ5k7aly5L6VwVCKjEYHDUGCJYwJvfLZYbvxDO9vhfE2vzQU7dJikknj5+pKeuDUA1jP VAvT1P8mDwk2GD/xBP6XeJSIwtV++WrhLqoK25YJsMPDjwivIuSTtWtGZgrD8cNq7bkV E0rA9X7Z/CnQJuT17SoczLLC0wAQxGJZrh78EBt2Tqr9thwQXz/pwghfg1Ijekx6egcl qQRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=N0SQYHeEbnE9z4Qjm2C7vh/kCROxi2guPCmpYkchnHY=; b=APTh2U5lfyoaqcYy5vcxaMP35ytviIYEI9+EVHVJpsODIIQN/A5pLY1ZnQSHrE/aLQ Zdjcw+iizeoINzpWddySJiOFBSSoRVJHgpFAH3nBIhVUgRrZ1wCs7G2Og80rxgNLYNy+ Qx1prdAHBRdh0YqbM8exU46GTFblJEv+FC1HMJgC5bd22B4GGyyB3PHWAczI95XQHNNp E+WkKV+BrnAqNW2kvDZb/vg4g5Id7GYM+Ibf0C+MPzxLahQ7eF9YxSxpa+doqLaYc1D3 vclI3NIFncIs8FR2STn+ac7yvfl4tgg1nihit1CylAYNBsCIMDNBLXzV1TUOmDLRWOSe Sgtg== X-Gm-Message-State: AFqh2krr0FWZ2h6AhmMqdJF9AbVgrDsuvq+fNwEadCNDiEmZU5xjERzy s6iOHzypaSL0EBvSVvhoqBQfjw== X-Google-Smtp-Source: AMrXdXvt2F1RytVLctXfREjFUMsxuc+TDxv7E9uU7y4X6xZs/JRcJyyex0g0Xd5QWKJR4BBDRfZHhg== X-Received: by 2002:a0c:f783:0:b0:4c7:53b0:2a3c with SMTP id s3-20020a0cf783000000b004c753b02a3cmr63107090qvn.11.1672778463796; Tue, 03 Jan 2023 12:41:03 -0800 (PST) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id v2-20020a05620a440200b006fed2788751sm23239857qkp.76.2023.01.03.12.41.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 03 Jan 2023 12:41:03 -0800 (PST) Date: Tue, 3 Jan 2023 12:40:53 -0800 (PST) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: David Hildenbrand cc: Hugh Dickins , Andrew Morton , Jann Horn , Yang Shi , Zach O'Keefe , Song Liu , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH] mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma In-Reply-To: Message-ID: <32be06f-f64-6632-4c36-bed7c0695a3b@google.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: B3D01A0009 X-Stat-Signature: ch7ktzdj6brc96twierj5cc6wuoo5u99 X-HE-Tag: 1672778464-666831 X-HE-Meta: U2FsdGVkX1+HQeSr0k+2hxbTqpIQfGmnzSkvXUdYJDPVSlwp6VSYKSVExz6+dcGffLTW2Nfl+4Q7bS2KwrMIEjtsgV6fSwgmXK+LI1OUkxqjUzDqM8n+nNrcWcGxceGkmAhFL7NvZwZExMbuCw46rC8LPMf++YGGFa6O/YR6M2tkfeUKktjZ7ZKVyDPmz01Jgm5uL5cIJqVhM1sjuw7PNrfaowf723VgpG0asQJVVRU//BW3jb5TUi49WcF1NFeQ1wKQvttQ2m3z+Pilc5nU4EF/m3Q6WXmNOl+OIoECCyvi1KOpdz51QkSJ6oyYIfSWRZoF60rN+DBY+VA9QxSqacvr6o0Esdq6wx2b+6+EVFEsBRIgUoxv281IOQ7cfBXjXeM/r9Gny4iijxiwqIzNhD71t/i/y00HCsfncWQdCgHYdumOH+phKjwYCF+l8Me4/tGRiYQnt6giuIJOeKsNmXv+O3oetIDoV6KjlphDl7YnIqcpdyHb0VpMECbHNX+0Nd3xTVdCoud3xhjxXYRMVsHhReVhdyeOyq2PSS8u0vLw1M/OVOHJD89UREoPHNoo3RO7DTN18MjBBi97h3bheGHtYjucx0I3KC7GGddxdnsMMthD9A1JDZ6ah0q2wr4VBDrYd68XPtRd2uiR/m7Lv/u2f3pconON9MMz+DOWvl3AoeIERangsgJffIOe3tJ7zuxhxI2MyFtBCUSQWLqy0sMFZR1WF45AhopD/zHblFH/AnH2m/nW/hV7XMUfWr2iG6VjlZqOj/1LNv+vppb5Lyvqp9LFfxngaS+h0oXxRGy2XWyNMn6lPG4BX7nlu1dNm490JVv8Suhg92jjU6hEy5DPwKvOXvh3g6nHEEciDnQA+XXKCFrZcjj0rjhbUgjJSmVaeQk/y7Sadnp3N9wVaBet46Jb3lrHOYx5RmzsQC9zwh50UfgHfpR4cSaektfDxRb3Wy1+UZWl0jkWOZQ VdQ9/MwQ 0aVs9ZmzFJNY8LUfUbh5xkqxYlFbroi0DTuYY+QQZE57a5YXjx/igHiUP6ltjQI0oFRWOWJbwd96VNE83z1HWZM+IvOVYeo24697MzN98Z38lXv7S8oMibYOwyxPPvI7uHiRAkDEXikBVf4o= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, 2 Jan 2023, David Hildenbrand wrote: > On 22.12.22 21:41, Hugh Dickins wrote: > > uprobe_write_opcode() uses collapse_pte_mapped_thp() to restore huge pmd, > > when removing a breakpoint from hugepage text: vma->anon_vma is always > > set in that case, so undo the prohibition. And MADV_COLLAPSE ought to be > > able to collapse some page tables in a vma which happens to have anon_vma > > set from CoWing elsewhere. > > > > Just so I get this correctly: the degradation is that we won't re-collapse a > THP after removing a breakpoint. Certainly "sub-optimal", but I guess nothing > that particularly matters in practice. Yes, it is not a correctness issue: see what I wrote "below the dashes": What this fixes is not a dangerous instability! But I suggest Cc stable because uprobes "healing" has regressed in that way, so this should follow 8d3c106e19e8 into those stable releases where it was backported (and may want adjustment there - I'll supply backports as needed). Which Andrew moved "above the dashes": I expect he shares, or thinks others may share, your scepticism about it - and deserve explanation. > > Or am I wrong? > > > Is anon_vma lock required? Almost not: if any page other than expected > > subpage of the non-anon huge page is found in the page table, collapse is > > aborted without making any change. However, it is possible that an anon > > page was CoWed from this extent in another mm or vma, in which case a > > concurrent lookup might look here: so keep it away while clearing pmd > > (but perhaps we shall go back to using pmd_lock() there in future). > > > > Note that collapse_pte_mapped_thp() is exceptional in freeing a page table > > without having cleared its ptes: I'm uneasy about that, and had thought > > pte_clear()ing appropriate; but exclusive i_mmap lock does fix the problem, > > and we would have to move the mmu_notification if clearing those ptes. > > > > Fixes: 8d3c106e19e8 ("mm/khugepaged: take the right locks for page table > > retraction") > > Signed-off-by: Hugh Dickins > > Cc: Jann Horn > > Cc: Yang Shi > > Cc: David Hildenbrand > > Cc: Zach O'Keefe > > Cc: Song Liu > > Cc: [5.4+] > > --- > > What this fixes is not a dangerous instability! But I suggest Cc stable > > because uprobes "healing" has regressed in that way, so this should follow > > 8d3c106e19e8 into those stable releases where it was backported (and may > > want adjustment there - I'll supply backports as needed). > > If it's really something that doesn't matter in practice (e.g., -1% > performance while debugging :) ), I guess no CC is needed. If there are real > production workloads that suffer, I guess ccing stable is fine. It's about recovering performance *after* debugging. It is not something that is of any value to me personally, nor (so far as I know) to anyone whom I work with. But it is something which Song Liu went to the trouble to make possible in his "THP aware uprobe" series three years ago, and it is something which Jann unintentionally regressed in his recent commit: so I thought it proper to reinstate where regressed. (What I do have more of an investment in, is for MADV_COLLAPSE to be able to collapse some extents in a large vma where some other extent got CoWed, so giving the whole vma an anon_vma. But that's not an issue for -stable, and I cannot tell you offhand whether undoing this anon_vma exclusion is enough to enable that or not - I suspect not, I suspect a result code or switch statement needs to be adjusted too.) > > > > > mm/khugepaged.c | 14 ++++++-------- > > 1 file changed, 6 insertions(+), 8 deletions(-) > > > > --- 6.2-rc/mm/khugepaged.c > > +++ linux/mm/khugepaged.c > > @@ -1460,14 +1460,6 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, > > unsigned long addr, > > if (!hugepage_vma_check(vma, vma->vm_flags, false, false, false)) > > return SCAN_VMA_CHECK; > > - /* > > - * Symmetry with retract_page_tables(): Exclude MAP_PRIVATE mappings > > - * that got written to. Without this, we'd have to also lock the > > - * anon_vma if one exists. > > - */ > > - if (vma->anon_vma) > > - return SCAN_VMA_CHECK; > > - > > /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */ > > if (userfaultfd_wp(vma)) > > return SCAN_PTE_UFFD_WP; > > @@ -1567,8 +1559,14 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, > > unsigned long addr, > > } > > > > /* step 4: remove pte entries */ > > + /* we make no change to anon, but protect concurrent anon page lookup > > */ > > + if (vma->anon_vma) > > + anon_vma_lock_write(vma->anon_vma); > > + > > collapse_and_free_pmd(mm, vma, haddr, pmd); > > + if (vma->anon_vma) > > + anon_vma_unlock_write(vma->anon_vma); > > i_mmap_unlock_write(vma->vm_file->f_mapping); > > > > maybe_install_pmd: > > > > That code is 99% black magic to me, but staring at the original fix and at > collapse_and_free_pmd() makes me assume that grabbing that lock most certainly > won't hurt and should be the right thing to do > > Acked-by: David Hildenbrand Thanks! > > > Side note: set_huge_pmd() wins the award of "ugliest mm function of early > 2023". I was briefly concerned how do_set_pmd() decides whether the PMD can be > writable or not. Turns out it's communicated via vm_fault->flags. Just > horrible. I firmly disagree - it's from 2022! and much too small to be ugliest; but I haven't thought about the aspect that is bothering you there. What's bothered me most about it, is the way its name, and the naming of the do_set_pmd() it interfaces with, give no hint that they are entirely about file (or shmem) vmas, and would not work right on anon vmas (I forget whether it's just a matter of which stats updated, or more). Hugh > > -- > Thanks, > > David / dhildenb