From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BDD58C369C2 for ; Fri, 25 Apr 2025 17:27:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CBB466B000C; Fri, 25 Apr 2025 13:27:00 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C6EDE6B000D; Fri, 25 Apr 2025 13:27:00 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B0B576B000E; Fri, 25 Apr 2025 13:27:00 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 913A76B000C for ; Fri, 25 Apr 2025 13:27:00 -0400 (EDT) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id CBE22C02CE for ; Fri, 25 Apr 2025 17:27:01 +0000 (UTC) X-FDA: 83373246642.21.04F82D6 Received: from mail-qt1-f176.google.com (mail-qt1-f176.google.com [209.85.160.176]) by imf08.hostedemail.com (Postfix) with ESMTP id EBC2A16000B for ; Fri, 25 Apr 2025 17:26:59 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=DdxJGNYM; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf08.hostedemail.com: domain of surenb@google.com designates 209.85.160.176 as permitted sender) smtp.mailfrom=surenb@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1745602020; a=rsa-sha256; cv=none; b=usKt3Au6qPT9A7UrLCzMrvK/i3OWEXozr+/BraBWRDOB9ZDH4bZHJQo7sx4ODsvi86hF5M 2nOjbK0kGI8FimJ29pHX1Gon+U5jeYaapbxHi4I52vejSL3zd2Lk/dnWjRtuMI7+D7wPOn kbi+huEj4HO2zfsTJlD1oQWNvexrPBE= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=DdxJGNYM; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf08.hostedemail.com: domain of surenb@google.com designates 209.85.160.176 as permitted sender) smtp.mailfrom=surenb@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1745602020; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=WVMnvFn9n+P1CKhwUPkCgqII5/m+lypiQFfGIbxcLVc=; b=qbdGqUbF1x6Efa1hyZjcuPxbtU4d3omftQPVXwGvw7x1fBpDh1U7ZKpI3Ur1hZEzdiDFnP 3LCNv3dL1kj30gkB2UedT6lfpZvz2FCaJ32ivQU7F6wHCOvnetcFqThYqPoSZ5S+OKyXio +v6aZbhQT6lI2QzWP2Rup7J4lV2gbmI= Received: by mail-qt1-f176.google.com with SMTP id d75a77b69052e-47e9fea29easo9501cf.1 for ; Fri, 25 Apr 2025 10:26:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1745602019; x=1746206819; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=WVMnvFn9n+P1CKhwUPkCgqII5/m+lypiQFfGIbxcLVc=; b=DdxJGNYMMvx/lctb4C76AvBkioDy549ObLph1V9wlpJgxMrGzKmLlac0C+sbMQF5j3 m9lctVve2SXOsZZsPW6Zq4uSNNr6JxP+bFj1SAo8iNkJ1tz+3jWMHsukqSfydL4uE3ox UcQBtNPIyT8WIQtuJJL3FLZKFl3aLohg6kqJAYdEWmeoDeeeTbhkS9E3Rk/397GM45hR FkK+Jcp35gXfFzvzhGlOdXkIseKAPE17jy0WKMJQgHsM+N9SIlwWXGgiF320IGYYeXx6 WJv3jwYr7l4lKl2Pjnq797Yvl9JsnX3H/4iEnEL9GaMKDO8j9T61MVv8KnfC3wLFThQ8 kymg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745602019; x=1746206819; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WVMnvFn9n+P1CKhwUPkCgqII5/m+lypiQFfGIbxcLVc=; b=weZ0rAPypE9eWfqE/fCnaF5lcjAJQGlbR7Pdkll99B/MoEF1haqR34EMK+eDk4E3dX WwS0Yz4H4/41EZbECcdrk0QG2xZB6tURSnN+TfR4uNLCV7NSk4SpbPB15do9RYjYky5C LxCxibOngqxdbFQ+jsjlkinMQAgPGOtJ2M7MACvNyay06pw3jwXlTUIBNvFxpsAq70mf bt4F4E6gKrVcOHA8pEFVQrnjvOr8fT1VGtAWl/+pynh3GM/KogHnJtH7pJdx07Gh+V5D wFrMbWHNPRnPbPwe/bQ2fLUMXvwK3eL7vfH0kYuMHrQtqsrKOTzd0e6ldIIan/Yhirfl T9FQ== X-Forwarded-Encrypted: i=1; AJvYcCUb0XEm97490zNd8ezMbTu3lPNuezH05QADfTIajuHmNntUpN77NL/w55yXRW7AlC8qq6+bkZATZA==@kvack.org X-Gm-Message-State: AOJu0YyOfvK0hTzoIPqcougbmJC39rWhT+fcF6BXCDyjsr7WQqkplrX+ 21SZWdoLySUJk5dNnXr8T0dg0vrQ3u+L4col7Ix2j6BQ8gyUT6hfDShQs7IkhGEkLJowsUyx2ar LEVQTXFVqcJbhFuWq8SCVsNXUro9CnJDtOvje X-Gm-Gg: ASbGncvLvLPqNqgGPwEpDXMxUPZK9qWO46Cu300cHKKnu36hyh9719pQubMRzRp5c5l rX4apK7HK86rqSPaQrZ3kwYCsXcpkVE1Uu9Xr4SdweQj+dOFW0lS9SShamJuIYxosPcZVEFvGI0 T7Ee7m207reNx6SpI6vQYU X-Google-Smtp-Source: AGHT+IG2cM1yVyej3lzRTanwt7H6lzpj/LO2wjSzm/XIby/6ONbcRTy0GQn4YFZVRCdWJ666A7314P2+DMcWcIOFy4s= X-Received: by 2002:a05:622a:5a0e:b0:476:d668:fd1c with SMTP id d75a77b69052e-48140acfd95mr30411cf.2.1745602018724; Fri, 25 Apr 2025 10:26:58 -0700 (PDT) MIME-Version: 1.0 References: <0f848d59f3eea3dd0c0cdc3920644222c40cffe6.1745528282.git.lorenzo.stoakes@oracle.com> <51903B43-2BFC-4BA6-9D74-63F79CF890B7@kernel.org> <7212f5f4-f12b-4b94-834f-b392601360a3@lucifer.local> <202504251010.C5CCE66@keescook> In-Reply-To: <202504251010.C5CCE66@keescook> From: Suren Baghdasaryan Date: Fri, 25 Apr 2025 10:26:47 -0700 X-Gm-Features: ATxdqUFI25tBLuuz7impmOSN8LJFpyEkWMIodbUWLISW0FgBbB8BqB9ZvMK9h3U Message-ID: Subject: Re: [PATCH 2/4] mm: perform VMA allocation, freeing, duplication in mm To: Kees Cook Cc: "Liam R. Howlett" , Lorenzo Stoakes , Andrew Morton , Vlastimil Babka , Jann Horn , Pedro Falcato , David Hildenbrand , Alexander Viro , Christian Brauner , Jan Kara , linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam11 X-Rspamd-Queue-Id: EBC2A16000B X-Stat-Signature: btwmzba1w4p1udy3mbpmnydt38npo1dg X-Rspam-User: X-HE-Tag: 1745602019-194931 X-HE-Meta: U2FsdGVkX18xneVz960V4ozDfgIoU0NQUpW1Qe6FZTHZ9/vGz3yhCnpHuynebkb5z1GdaXWEG9JyPFQKk9xUBFt89+NZoQXZzHFYvx2ScK9yJgt0v7t5ohwbUlyeOAZF+rSLMvGQBW31gbkrxU3Bjk/vNG025YYjCVeSI1mpIuuQneMJlQSG/zX79W/pPMsaNPKB27pIjCMXgYpZmnGEX2qMBVnKwBEwa6H6w8A+yLAg4l1zD2h2yUINrqYOz73Iph0mKjTqvi9VNW1kwvRMlWrZjqpPhloQdAPGsZaQo6db159Z9xrx5f+kNh8RIY+YmUfXjU6i/0f+pn8Pcvd1fMiOr453hcdPAJzgmanZ9fthdi/zMwJOtzf3mTEP8UCmIVlIEBhObaEnzxhLTpTG6DCmHoCK1uaSZVWPTMIB7Sofy3K3/RunHM22w1OFZ3em8zizzKn5j0fMA2Z13pqQLWdM2tPBE1n3bc1kx/Zg8kd5NypdlmxBG1ZWM9n5VPV85eMl/hXbLtTaSIitSgX2s3Q6XtFBfRiiG6ScEPnQjL1zOB8NQ3nttljnBe9HUW1Wf1DJ0R6J3eTTdWHJVIuqGyu6PNEC7ecNL1UXYe5rwNlEqDV5XAjZgqbcU1LF6mCzALAmfVsWAdfd+dxaykgBa/k2IcVFEmxqiCXDp/6+a7WMCFk0dVhFLUvXQZ3UDm5borSkkvi3IbT3j1wPqQjGDWOEVaT2EA/QpZnKz6tgEdbaO8P/IfeR1pCax8EtnVCEwP+3fUZAXwwwsyMgml++pq/Y3AZE2aCtysfcSjt1MgYQzLC5FC2sdOQKZ0u67IUStftl0KyQrYyL6/5p/llzPeDiGpCykpMBM1oNaTlpj3q0uWgBZRnwrMs6eMrIvNIGUZl0XhFWGeKxR8onuzEwfPfRGIBGovVxqt4F6/oibnCehbPuJ4IWW7/jiNkrRbr/wrpEItcRgyfSw77JFci Xg/xTJVS +/pI3wMIyX8w26F5Ea9rJEGpQRUDiNTBt+AbyIzmM6Vncl/mXnJBDBZ9HIlPaVwHOUeKJ85c5xwcpk+2OssFTSI04azgRcCSmgMki0MCzqMW0tWtrW1UbgCPcPDHIH1CgtiZZ632oDoCKaDC3TJoqSw4qYloryWkN3qdemHH6i2aVwX4lKwgyZ5bQjv/wadpcWY+z6znH14dE9mTqBHvLjEN9CI+ut/tQJ1eq0/8reyuo7HZc50n1Ip3aMZfWQXihfWNqyv9MqJLMEBpTAparcUp2VerzYwI1fAOqBWcdk9dnrhI= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Apr 25, 2025 at 10:12=E2=80=AFAM Kees Cook wrote: > > On Fri, Apr 25, 2025 at 08:32:48AM -0700, Suren Baghdasaryan wrote: > > On Fri, Apr 25, 2025 at 6:55=E2=80=AFAM Liam R. Howlett wrote: > > > > > > * Lorenzo Stoakes [250425 06:40]: > > > > On Thu, Apr 24, 2025 at 08:15:26PM -0700, Kees Cook wrote: > > > > > > > > > > > > > > > On April 24, 2025 2:15:27 PM PDT, Lorenzo Stoakes wrote: > > > > > >+static void vm_area_init_from(const struct vm_area_struct *src, > > > > > >+ struct vm_area_struct *dest) > > > > > >+{ > > > > > >+ dest->vm_mm =3D src->vm_mm; > > > > > >+ dest->vm_ops =3D src->vm_ops; > > > > > >+ dest->vm_start =3D src->vm_start; > > > > > >+ dest->vm_end =3D src->vm_end; > > > > > >+ dest->anon_vma =3D src->anon_vma; > > > > > >+ dest->vm_pgoff =3D src->vm_pgoff; > > > > > >+ dest->vm_file =3D src->vm_file; > > > > > >+ dest->vm_private_data =3D src->vm_private_data; > > > > > >+ vm_flags_init(dest, src->vm_flags); > > > > > >+ memcpy(&dest->vm_page_prot, &src->vm_page_prot, > > > > > >+ sizeof(dest->vm_page_prot)); > > > > > >+ /* > > > > > >+ * src->shared.rb may be modified concurrently when called fr= om > > > > > >+ * dup_mmap(), but the clone will reinitialize it. > > > > > >+ */ > > > > > >+ data_race(memcpy(&dest->shared, &src->shared, sizeof(dest->sh= ared))); > > > > > >+ memcpy(&dest->vm_userfaultfd_ctx, &src->vm_userfaultfd_ctx, > > > > > >+ sizeof(dest->vm_userfaultfd_ctx)); > > > > > >+#ifdef CONFIG_ANON_VMA_NAME > > > > > >+ dest->anon_name =3D src->anon_name; > > > > > >+#endif > > > > > >+#ifdef CONFIG_SWAP > > > > > >+ memcpy(&dest->swap_readahead_info, &src->swap_readahead_info, > > > > > >+ sizeof(dest->swap_readahead_info)); > > > > > >+#endif > > > > > >+#ifdef CONFIG_NUMA > > > > > >+ dest->vm_policy =3D src->vm_policy; > > > > > >+#endif > > > > > >+} > > > > > > > > > > I know you're doing a big cut/paste here, but why in the world is= this function written this way? Why not just: > > > > > > > > > > *dest =3D *src; > > > > > > > > > > And then do any one-off cleanups? > > > > > > > > Yup I find it odd, and error prone to be honest. We'll end up with = uninitialised > > > > state for some fields if we miss them here, seems unwise... > > > > > > > > Presumably for performance? > > > > > > > > This is, as you say, me simply propagating what exists, but I do wo= nder. > > > > > > Two things come to mind: > > > > > > 1. How ctors are done. (v3 of Suren's RCU safe patch series, willy m= ade > > > a comment.. I think) > > > > > > 2. Some race that Vlastimil came up with the copy and the RCU safenes= s. > > > IIRC it had to do with the ordering of the setting of things? > > > > > > Also, looking at it again... > > > > > > How is it safe to do dest->anon_name =3D src->anon_name? Isn't that = ref > > > counted? > > > > dest->anon_name =3D src->anon_name is fine here because right after > > vm_area_init_from() we call dup_anon_vma_name() which will bump up the > > refcount. I don't recall why this is done this way but now looking at > > it I wonder if I could call dup_anon_vma_name() directly instead of > > this assignment. Might be just an overlooked legacy from the time we > > memcpy'd the entire structure. I'll need to double-check. > > Oh, is "dest" accessible to other CPU threads? I hadn't looked and was > assuming this was like process creation where everything gets built in > isolation and then attached to the main process tree. I was thinking > this was similar. Yeah, it's process creation time but this structure is created from a SLAB_TYPESAFE_BY_RCU cache which adds complexity. A newly allocated object from this cache might be still accessible from another thread holding a reference to its earlier incarnation. We need an indication for that other thread to say "this object has been released, so the reference you are holding is pointing to a freed or reallocated/wrong object". vm_refcnt in this case is this indication and we are careful not to override it even temporarily during object initialization. Well, in truth we override it later with 0 but for the other thread that will still mean this object is not what it wants. I suspect you know this already but just in case https://elixir.bootlin.com/linux/v6.14.3/source/include/linux/slab.h#L101 has more detailed explanation of SLAB_TYPESAFE_BY_RCU. > > -- > Kees Cook