From: Yosry Ahmed
Date: Tue, 22 Oct 2024 01:26:20 -0700
Subject: Re: [PATCH v2] mm: page_alloc: move mlocked flag clearance into free_pages_prepare()
To: Roman Gushchin
Cc: Matthew Wilcox, Andrew Morton, linux-mm@kvack.org, Vlastimil Babka, linux-kernel@vger.kernel.org, stable@vger.kernel.org, Hugh Dickins, kvm@vger.kernel.org, Sean Christopherson, Paolo Bonzini
References: <20241021173455.2691973-1-roman.gushchin@linux.dev>

On Mon, Oct 21, 2024 at 9:33 PM Roman Gushchin wrote:
>
> On Tue, Oct 22, 2024 at 04:47:19AM +0100, Matthew Wilcox wrote:
> > On Tue, Oct 22, 2024 at 02:14:39AM +0000, Roman Gushchin wrote:
> > > On Mon, Oct 21, 2024 at 09:34:24PM +0100, Matthew Wilcox wrote:
> > > > On Mon, Oct 21, 2024 at 05:34:55PM +0000, Roman Gushchin wrote:
> > > > > Fix it by moving the mlocked flag clearance down to
> > > > > free_page_prepare().
> > > >
> > > > Urgh, I don't like this new reference to folio in free_pages_prepare().
> > > > It feels like a layering violation.  I'll think about where else we
> > > > could put this.
> > >
> > > I agree, but it feels like it needs quite some work to do it in a nicer way,
> > > no way it can be backported to older kernels. As for this fix, I don't
> > > have better ideas...
> >
> > Well, what is KVM doing that causes this page to get mapped to userspace?
> > Don't tell me to look at the reproducer as it is 403 Forbidden.  All I
> > can tell is that it's freed with vfree().
> >
> > Is it from kvm_dirty_ring_get_page()?  That looks like the obvious thing,
> > but I'd hate to spend a lot of time on it and then discover I was looking
> > at the wrong thing.
>
> One of the pages is vcpu->run, others belong to kvm->coalesced_mmio_ring.

Looking at kvm_vcpu_fault(), it seems that after mmap'ing the fd
returned by KVM_CREATE_VCPU we can access one of the following:
- vcpu->run
- vcpu->arch.pio_data
- vcpu->kvm->coalesced_mmio_ring
- a page returned by kvm_dirty_ring_get_page()

It doesn't seem like any of these are reclaimable, so why is mlock()'ing
them supported to begin with? Even if we don't want mlock() to err in
this case, shouldn't we just do nothing?

I see a lot of checks at the beginning of mlock_fixup() to check
whether we should operate on the vma. Perhaps we should also check for
these KVM vmas? Or maybe set VM_SPECIAL in kvm_vcpu_mmap()? I am not
sure tbh, but this doesn't seem right.

FWIW, I think moving the mlock clearing from __page_cache_release()
to free_pages_prepare() (or another common function in the page
freeing path) may be the right thing to do in its own right. I am just
wondering why we are not questioning the mlock() on the KVM vCPU
mapping to begin with.

Is there a use case for this that I am missing?
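
For illustration only, the "check the vma flags and do nothing" idea can
be sketched as a small userspace model. This is not kernel code and not
a proposed patch: struct model_vma, model_mlock_applies() and
model_kvm_vcpu_mmap() are hypothetical names made up for the sketch, the
flag values are copied from memory of include/linux/mm.h and should be
treated as assumptions, and the choice of VM_DONTEXPAND as the bit to
set is arbitrary. The real mlock_fixup() checks more conditions than the
two modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only. The flag values below are believed to mirror
 * include/linux/mm.h but are reproduced here purely for illustration.
 */
#define VM_PFNMAP     0x00000400UL
#define VM_LOCKED     0x00002000UL
#define VM_IO         0x00004000UL
#define VM_DONTEXPAND 0x00040000UL
#define VM_MIXEDMAP   0x10000000UL
#define VM_SPECIAL    (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)

/* Hypothetical stand-in for struct vm_area_struct; only the flags matter. */
struct model_vma {
	unsigned long vm_flags;
};

/*
 * Hypothetical model of the early bail-out in mlock_fixup(): returns true
 * if the lock request would go on to change the vma, false if it would be
 * silently skipped. The real function checks more conditions than this.
 */
bool model_mlock_applies(const struct model_vma *vma, unsigned long newflags)
{
	assert(vma != NULL);
	if (newflags == vma->vm_flags)
		return false;	/* nothing to change */
	if (vma->vm_flags & VM_SPECIAL)
		return false;	/* special mapping: leave it alone */
	return true;
}

/*
 * Hypothetical model of an mmap handler for the vCPU fd. With mark_special
 * set it tags the vma with one VM_SPECIAL bit (VM_DONTEXPAND is an
 * arbitrary choice for this sketch, not a recommendation).
 */
void model_kvm_vcpu_mmap(struct model_vma *vma, bool mark_special)
{
	assert(vma != NULL);
	if (mark_special)
		vma->vm_flags |= VM_DONTEXPAND;
}
```

In this model, a vma set up with mark_special == false is still subject
to the VM_LOCKED request, while one set up with mark_special == true is
skipped without an error, which is the "do nothing" behaviour asked
about above.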
>
> Here is the reproducer:
>
> #define _GNU_SOURCE
>
> #include
> #include
> #include
> #include
> #include
> #include
> #include
> #include
> #include
> #include
> #include
>
> #ifndef __NR_mlock2
> #define __NR_mlock2 325
> #endif
>
> uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};
>
> #ifndef KVM_CREATE_VM
> #define KVM_CREATE_VM 0xae01
> #endif
>
> #ifndef KVM_CREATE_VCPU
> #define KVM_CREATE_VCPU 0xae41
> #endif
>
> int main(void)
> {
>   syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>           /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
>           /*offset=*/0ul);
>   syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
>           /*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/ 7ul,
>           /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
>           /*offset=*/0ul);
>   syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>           /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
>           /*offset=*/0ul);
>   intptr_t res = syscall(__NR_openat, /*fd=*/0xffffff9c, /*file=*/"/dev/kvm",
>                          /*flags=*/0, /*mode=*/0);
>   if (res != -1)
>     r[0] = res;
>   res = syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/KVM_CREATE_VM, /*type=*/0ul);
>   if (res != -1)
>     r[1] = res;
>   res = syscall(__NR_ioctl, /*fd=*/r[1], /*cmd=*/KVM_CREATE_VCPU, /*id=*/0ul);
>   if (res != -1)
>     r[2] = res;
>   syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0xb36000ul,
>           /*prot=PROT_SEM|PROT_WRITE|PROT_READ|PROT_EXEC*/ 0xful,
>           /*flags=MAP_FIXED|MAP_SHARED*/ 0x11ul, /*fd=*/r[2], /*offset=*/0ul);
>   syscall(__NR_mlock2, /*addr=*/0x20000000ul, /*size=*/0x400000ul,
>           /*flags=*/0ul);
>   syscall(__NR_mremap, /*addr=*/0x200ab000ul, /*len=*/0x1000ul,
>           /*newlen=*/0x1000ul,
>           /*flags=MREMAP_DONTUNMAP|MREMAP_FIXED|MREMAP_MAYMOVE*/ 7ul,
>           /*newaddr=*/0x20ffc000ul);
>   return 0;
> }
>