Date: Tue, 22 Oct 2024 08:39:34 -0700
References: <20241021173455.2691973-1-roman.gushchin@linux.dev>
Subject: Re: [PATCH v2] mm: page_alloc: move mlocked flag clearance into free_pages_prepare()
From: Sean Christopherson
To: Yosry Ahmed
Cc: Roman Gushchin, Matthew Wilcox, Andrew Morton, linux-mm@kvack.org, Vlastimil Babka, linux-kernel@vger.kernel.org, stable@vger.kernel.org, Hugh Dickins, kvm@vger.kernel.org, Paolo Bonzini

On Tue, Oct 22, 2024, Yosry Ahmed wrote:
> On Mon, Oct 21, 2024 at 9:33 PM Roman Gushchin wrote:
> >
> > On Tue, Oct 22, 2024 at 04:47:19AM +0100, Matthew Wilcox wrote:
> > > On Tue, Oct 22, 2024 at 02:14:39AM +0000, Roman Gushchin wrote:
> > > > On Mon, Oct 21, 2024 at 09:34:24PM +0100, Matthew Wilcox wrote:
> > > > > On Mon, Oct 21, 2024 at 05:34:55PM +0000, Roman Gushchin wrote:
> > > > > > Fix it by moving the mlocked flag clearance down to
> > > > > > free_page_prepare().
> > > > >
> > > > > Urgh, I don't like this new reference to folio in free_pages_prepare().
> > > > > It feels like a layering violation. I'll think about where else we
> > > > > could put this.
> > > >
> > > > I agree, but it feels like it needs quite some work to do it in a nicer way,
> > > > no way it can be backported to older kernels. As for this fix, I don't
> > > > have better ideas...
> > >
> > > Well, what is KVM doing that causes this page to get mapped to userspace?
> > > Don't tell me to look at the reproducer as it is 403 Forbidden. All I
> > > can tell is that it's freed with vfree().
> > >
> > > Is it from kvm_dirty_ring_get_page()? That looks like the obvious thing,
> > > but I'd hate to spend a lot of time on it and then discover I was looking
> > > at the wrong thing.
> >
> > One of the pages is vcpu->run, others belong to kvm->coalesced_mmio_ring.
>
> Looking at kvm_vcpu_fault(), it seems like we after mmap'ing the fd
> returned by KVM_CREATE_VCPU we can access one of the following:
> - vcpu->run
> - vcpu->arch.pio_data
> - vcpu->kvm->coalesced_mmio_ring
> - a page returned by kvm_dirty_ring_get_page()
>
> It doesn't seem like any of these are reclaimable,

Correct, these are all kernel allocated pages that KVM exposes to userspace to
facilitate bidirectional sharing of large chunks of data.

> why is mlock()'ing them supported to begin with?

Because no one realized it would be problematic, and KVM would have had to go
out of its way to prevent mlock().

> Even if we don't want mlock() to err in this case, shouldn't we just do
> nothing?

Ideally, yes.

> I see a lot of checks at the beginning of mlock_fixup() to check
> whether we should operate on the vma, perhaps we should also check for
> these KVM vmas?

Definitely not. KVM may be doing something unexpected, but the VMA certainly
isn't unique enough to warrant mm/ needing dedicated handling.

Focusing on KVM is likely a waste of time. There are probably other subsystems
and/or drivers that .mmap() kernel allocated memory in the same way. Odds are
good KVM is just the messenger, because syzkaller knows how to beat on KVM. And
even if there aren't any other existing cases, nothing would prevent them from
coming along in the future.

> Trying to or maybe set VM_SPECIAL in kvm_vcpu_mmap()? I am not
> sure tbh, but this doesn't seem right.

Agreed. VM_DONTEXPAND is the only VM_SPECIAL flag that is remotely appropriate,
but setting VM_DONTEXPAND could theoretically break userspace, and other than
preventing mlock(), there is no reason why the VMA can't be expanded. I doubt
any userspace VMM is actually remapping and expanding a vCPU mapping, but trying
to fudge around this outside of core mm/ feels kludgy and has the potential to
turn into a game of whack-a-mole.

> FWIW, I think moving the mlock clearing from __page_cache_release ()
> to free_pages_prepare() (or another common function in the page
> freeing path) may be the right thing to do in its own right. I am just
> wondering why we are not questioning the mlock() on the KVM vCPU
> mapping to begin with.
>
> Is there a use case for this that I am missing?

Not that I know of, I suspect mlock() is allowed simply because it's allowed by
default.
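
To make the VM_SPECIAL point concrete, here is a rough userspace model of the
early bail-out in mlock_fixup(). It is only a sketch: the helper names
mlock_would_touch_vma() and demo_vcpu_mapping() are made up for illustration,
the flag values are copied from include/linux/mm.h as I remember them and
should be treated as assumptions, and the real function checks several other
conditions (hugetlb, gate VMA, DAX, secretmem, ...) that are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, mirroring include/linux/mm.h; illustrative only. */
#define VM_PFNMAP     0x00000400UL
#define VM_LOCKED     0x00002000UL
#define VM_IO         0x00004000UL
#define VM_DONTEXPAND 0x00040000UL
#define VM_MIXEDMAP   0x10000000UL
#define VM_SPECIAL    (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)

/*
 * Hypothetical helper: models only the "flags unchanged" and VM_SPECIAL
 * parts of the early-out in mlock_fixup().  Returns true if mlock() would
 * go on to set VM_LOCKED on the VMA, false if the VMA is silently skipped.
 */
bool mlock_would_touch_vma(unsigned long oldflags, unsigned long newflags)
{
	if (newflags == oldflags)
		return false;	/* nothing to change */
	if (oldflags & VM_SPECIAL)
		return false;	/* special mappings are never mlocked */
	return true;
}

/* Hypothetical demo: a vCPU mapping with and without VM_DONTEXPAND. */
void demo_vcpu_mapping(void)
{
	unsigned long plain = 0;		/* today: no VM_SPECIAL flag set */
	unsigned long fudged = VM_DONTEXPAND;	/* the workaround under discussion */

	assert(mlock_would_touch_vma(plain, plain | VM_LOCKED));
	assert(!mlock_would_touch_vma(fudged, fudged | VM_LOCKED));
}
```

In this model, setting VM_DONTEXPAND on the vCPU mapping turns mlock() into a
no-op for that VMA, which is the side effect the workaround relies on. It does
nothing for any other driver that maps kernel allocated pages without a
VM_SPECIAL flag.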