Date: Wed, 23 Oct 2024 02:04:07 +0000
From: Roman Gushchin
To: Sean Christopherson
Cc: Yosry Ahmed, Matthew Wilcox, Andrew Morton, linux-mm@kvack.org, Vlastimil Babka, linux-kernel@vger.kernel.org, stable@vger.kernel.org, Hugh Dickins, kvm@vger.kernel.org, Paolo Bonzini
Subject: Re: [PATCH v2] mm: page_alloc: move mlocked flag clearance into free_pages_prepare()
References: <20241021173455.2691973-1-roman.gushchin@linux.dev>

On Tue, Oct 22, 2024 at 08:39:34AM -0700, Sean Christopherson wrote:
> On Tue, Oct 22, 2024, Yosry Ahmed wrote:
> > On Mon, Oct 21, 2024 at 9:33 PM Roman Gushchin wrote:
> > >
> > > On Tue, Oct 22, 2024 at 04:47:19AM +0100, Matthew Wilcox wrote:
> > > > On Tue, Oct 22, 2024 at 02:14:39AM +0000, Roman Gushchin wrote:
> > > > > On Mon, Oct 21, 2024 at 09:34:24PM +0100, Matthew Wilcox wrote:
> > > > > > On Mon, Oct 21, 2024 at 05:34:55PM +0000, Roman Gushchin wrote:
> > > > > > > Fix it by moving the mlocked flag clearance down to
> > > > > > > free_page_prepare().
> > > > > >
> > > > > > Urgh, I don't like this new reference to folio in free_pages_prepare().
> > > > > > It feels like a layering violation. I'll think about where else we
> > > > > > could put this.
> > > > >
> > > > > I agree, but it feels like it needs quite some work to do it in a nicer way,
> > > > > no way it can be backported to older kernels. As for this fix, I don't
> > > > > have better ideas...
> > > >
> > > > Well, what is KVM doing that causes this page to get mapped to userspace?
> > > > Don't tell me to look at the reproducer as it is 403 Forbidden. All I
> > > > can tell is that it's freed with vfree().
> > > >
> > > > Is it from kvm_dirty_ring_get_page()? That looks like the obvious thing,
> > > > but I'd hate to spend a lot of time on it and then discover I was looking
> > > > at the wrong thing.
> > >
> > > One of the pages is vcpu->run, others belong to kvm->coalesced_mmio_ring.
> >
> > Looking at kvm_vcpu_fault(), it seems like we after mmap'ing the fd
> > returned by KVM_CREATE_VCPU we can access one of the following:
> > - vcpu->run
> > - vcpu->arch.pio_data
> > - vcpu->kvm->coalesced_mmio_ring
> > - a page returned by kvm_dirty_ring_get_page()
> >
> > It doesn't seem like any of these are reclaimable,
>
> Correct, these are all kernel allocated pages that KVM exposes to userspace to
> facilitate bidirectional sharing of large chunks of data.
> >
> > why is mlock()'ing them supported to begin with?
>
> Because no one realized it would be problematic, and KVM would have had to go out
> of its way to prevent mlock().
> >
> > Even if we don't want mlock() to err in this case, shouldn't we just do
> > nothing?
>
> Ideally, yes.
> >
> > I see a lot of checks at the beginning of mlock_fixup() to check
> > whether we should operate on the vma, perhaps we should also check for
> > these KVM vmas?
>
> Definitely not. KVM may be doing something unexpected, but the VMA certainly
> isn't unique enough to warrant mm/ needing dedicated handling.
>
> Focusing on KVM is likely a waste of time. There are probably other subsystems
> and/or drivers that .mmap() kernel allocated memory in the same way. Odds are
> good KVM is just the messenger, because syzkaller knows how to beat on KVM. And
> even if there aren't any other existing cases, nothing would prevent them from
> coming along in the future.

Yeah, I think so too. It seems that bpf/ringbuf.c contains another example,
and there are likely more. So I think we have to fix it either as proposed
or on the mlock side.