Date: Tue, 22 Oct 2024 17:59:41 +0100
From: Matthew Wilcox
To: Sean Christopherson
Cc: Yosry Ahmed, Roman Gushchin, Andrew Morton, linux-mm@kvack.org,
    Vlastimil Babka, linux-kernel@vger.kernel.org, stable@vger.kernel.org,
    Hugh Dickins, kvm@vger.kernel.org, Paolo Bonzini
Subject: Re: [PATCH v2] mm: page_alloc: move mlocked flag clearance into free_pages_prepare()
References: <20241021173455.2691973-1-roman.gushchin@linux.dev>

On Tue, Oct 22, 2024 at 08:39:34AM -0700, Sean Christopherson wrote:
> On Tue, Oct 22, 2024, Yosry Ahmed wrote:
> > Even if we don't want mlock() to err in this case, shouldn't we just do
> > nothing?
>
> Ideally, yes.

Agreed.
There's no sense in having this count against the NR_MLOCK stats,
for example.

> > I see a lot of checks at the beginning of mlock_fixup() to check
> > whether we should operate on the vma, perhaps we should also check for
> > these KVM vmas?
>
> Definitely not.  KVM may be doing something unexpected, but the VMA certainly
> isn't unique enough to warrant mm/ needing dedicated handling.
>
> Focusing on KVM is likely a waste of time.  There are probably other subsystems
> and/or drivers that .mmap() kernel allocated memory in the same way.  Odds are
> good KVM is just the messenger, because syzkaller knows how to beat on KVM.  And
> even if there aren't any other existing cases, nothing would prevent them from
> coming along in the future.

They all need to be fixed.  How to do that is not an answer I have
at this point.  Ideally we can fix them without changing them all
immediately (but they will all need to be fixed eventually because
pages will no longer have a refcount and so get_page() will need to
go away ...)

> > Trying to or maybe set VM_SPECIAL in kvm_vcpu_mmap()? I am not
> > sure tbh, but this doesn't seem right.
>
> Agreed.  VM_DONTEXPAND is the only VM_SPECIAL flag that is remotely appropriate,
> but setting VM_DONTEXPAND could theoretically break userspace, and other than
> preventing mlock(), there is no reason why the VMA can't be expanded.  I doubt
> any userspace VMM is actually remapping and expanding a vCPU mapping, but trying
> to fudge around this outside of core mm/ feels kludgy and has the potential to
> turn into a game of whack-a-mole.

Actually, VM_PFNMAP is probably ideal.  We're not really mapping pages
here (I mean, they are pages, but they're not filesystem pages or
anonymous pages ... there's no rmap to them).  We're mapping blobs of
memory whose refcount is controlled by the vma that maps them.  We don't
particularly want to be able to splice() this memory, or do RDMA to it.
We probably do want gdb to be able to read it (... yes?) which might be
a complication with a PFNMAP VMA.

We've given a lot of flexibility to device drivers about how they
implement mmap() and I think that's now getting in the way of some
important improvements.  I want to see a simpler way of providing the
same functionality, and I'm not quite there yet.
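
To illustrate why a VM_SPECIAL flag such as VM_PFNMAP would keep mlock()
away from these mappings, here is a rough userspace model, not kernel
code.  It assumes the early-exit in mlock_fixup() skips any VMA with a
VM_SPECIAL bit set.  struct toy_vma, toy_mlock_applies() and toy_mlock()
are hypothetical names invented for this sketch, and the flag values are
modeled on include/linux/mm.h but should be treated as illustrative.

```c
#include <stdbool.h>

/* Flag bits modeled on include/linux/mm.h; exact values are illustrative. */
#define VM_PFNMAP     0x00000400UL
#define VM_LOCKED     0x00002000UL
#define VM_IO         0x00004000UL
#define VM_DONTEXPAND 0x00040000UL
#define VM_MIXEDMAP   0x10000000UL

/* The set of flags that marks a mapping as "special" to core mm. */
#define VM_SPECIAL    (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)

/* Hypothetical stand-in for struct vm_area_struct; only the flags matter. */
struct toy_vma {
	unsigned long vm_flags;
};

/*
 * Models only the flag-based part of the early-exit check: returns true
 * if locking should actually touch this VMA, false if it is skipped.
 */
static bool toy_mlock_applies(const struct toy_vma *vma, unsigned long newflags)
{
	if (newflags == vma->vm_flags)
		return false;		/* nothing would change */
	if (vma->vm_flags & VM_SPECIAL)
		return false;		/* special mapping: leave it alone */
	return true;
}

/* Hypothetical helper: try to lock the VMA, silently doing nothing if skipped. */
static void toy_mlock(struct toy_vma *vma)
{
	unsigned long newflags = vma->vm_flags | VM_LOCKED;

	if (toy_mlock_applies(vma, newflags))
		vma->vm_flags = newflags;
}
```

In this model an ordinary VMA picks up VM_LOCKED, while one created with
VM_PFNMAP is left unchanged and mlock() does not report an error, which
is the "just do nothing" behaviour discussed above.  It says nothing
about the gdb/ptrace question.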