From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6FFBEE7717D for ; Fri, 13 Dec 2024 04:49:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D8EC56B0083; Thu, 12 Dec 2024 23:49:07 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D173F6B0085; Thu, 12 Dec 2024 23:49:07 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B90C26B0088; Thu, 12 Dec 2024 23:49:07 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 961526B0083 for ; Thu, 12 Dec 2024 23:49:07 -0500 (EST) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 1CCF712056A for ; Fri, 13 Dec 2024 04:49:07 +0000 (UTC) X-FDA: 82888705326.25.E70FA25 Received: from mail-qt1-f180.google.com (mail-qt1-f180.google.com [209.85.160.180]) by imf12.hostedemail.com (Postfix) with ESMTP id 42D154001A for ; Fri, 13 Dec 2024 04:48:54 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=0tsabwUS; spf=pass (imf12.hostedemail.com: domain of surenb@google.com designates 209.85.160.180 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1734065328; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=SyKXtJmevA9J1gjLkDCWf2rvJGPBPi1c5QO3oGVTNNs=; b=ipz0Bil2xgv+tFCVS5HuSCr1rx2rCN1NHFMW5OP4s2EabpRbfmWAIQoFRDz+X917GqUDEo 73eMyfZ+naOx4mLIsg6OO8Fqp2vZ9YD46f3vXv9cSwnd2Jr5YS7MuBnlcWx3rImQCQe4RN TG6t1sHg4aD0WC0BeWCTySZANNJcg3I= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=0tsabwUS; spf=pass (imf12.hostedemail.com: domain of surenb@google.com designates 209.85.160.180 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1734065328; a=rsa-sha256; cv=none; b=L5nfHm26rKcJB7yw6m0E+DhQLO8nNS+QIof5u6JrU+MXYOHJX2IWZWPXjW4/Pva5d6aO0U Ou+fPZtHGXUo+ZO1O7KCd5EyY+SDVW/Vkea1ms7VlJpepbBdQIM8UEFGzKzFo4moo/p0dV YDdSqUYZPTiog5q87dcJN/2a37fp4C8= Received: by mail-qt1-f180.google.com with SMTP id d75a77b69052e-4679b5c66d0so100451cf.1 for ; Thu, 12 Dec 2024 20:49:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1734065344; x=1734670144; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=SyKXtJmevA9J1gjLkDCWf2rvJGPBPi1c5QO3oGVTNNs=; b=0tsabwUSFOs01zTN2Nm853DJDk25j54GKIoBkV2YlJRyzlEsS5oyNygAXtgWJ8oAeM mVEdJcx3Q8UW26a4MjoAWv/+Y4pjvEm7wYdPcXqGMnt6EuXJZjR7vZv0jk8zilJGiKFU /n9x7rE0Lb+Stvymjyg1oEU0ITF3ONdY491hpKH3FN4ABkmfmEtEjEFDxnJI1CPvZQqt k5W0MU38b/mCRlHvzyl6sOLcRKg0ng3tuTz+kC4HfblVKKH9V5v/Q/CuGTlfmhEcpKCw Szq8C4hnuyhCRxyT6uHf/8hS2/MeGrCsun1MpvcQRauOrex77uo1sHFS57u6Khk57Y7U NLcA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1734065344; x=1734670144; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=SyKXtJmevA9J1gjLkDCWf2rvJGPBPi1c5QO3oGVTNNs=; b=r5pzwoHZizDNSQYdtABJ4chwPswhQ3jsfmBLwU8wQ9j8dMuHT9s7kxHvdWBVobRznN Cy9YrAtZvQ+gW/nk8e4Fjd7WvAsfebKqv8J8AcSflMqhoghvuY2bjQ+ZQ4rAns5Wrcxn Lw6w0g09bbaPA/Ckr6qqR8Dk+rujGP4vlFXg1u07q9xDBSKuoLFRcYJ//Tn8LMO+xQBc S0JiLdAUG7zo+zh+h4BPH2VAHjWxuL8etpXkaBw5wePusyFWNUl32YT0PjVEs51de2X/ L099hZ/czrcdu06Tra+8WtVlOglvUw8TXzbNEVRsl9cBM9xOu3V082+QsDnQFqX263m3 bTJQ== X-Forwarded-Encrypted: i=1; AJvYcCUg1OxIc1tEhtFEq0ThWnu6UczEO2zQ7BzscjIRgsIYN2JDLrq7u1DInceeD5lumO5EosXhMTodUA==@kvack.org X-Gm-Message-State: AOJu0YxlKPVH40KxJKr3O4bFnx8bgXX+FawUE/gHWxl+uwk80bVMvBVw 97WJRL2QKOpZafLzHsBImKhsMaaxKleyudaFbJji2BGejF8cg8Bd8xANOrT8lN3ZMmhAgaojIOD Pv0sKxQQQboB0z+kXx6J3arNXyT9tRJ0kMipk X-Gm-Gg: ASbGnctW6e+sO8k0ns4w+I38w8KG2BNMg+oaVATLgsh2lbPBI2yRKZUpnxG6rt7XEg7 utbhg5My2XYB8GcnNc0AJzboWiNxGhr2zor+4mQ== X-Google-Smtp-Source: AGHT+IFVRVj5b+YtZOxsCPfKPMEuiih8mLUGJyZ2MQP37hYAM9kY+0XKaSq6UxN3JVSNRWmfOQDKcyU5UdXVO+YzM/8= X-Received: by 2002:a05:622a:1e92:b0:466:975f:b219 with SMTP id d75a77b69052e-467a424b2e1mr2331841cf.8.1734065344059; Thu, 12 Dec 2024 20:49:04 -0800 (PST) MIME-Version: 1.0 References: <20241111205506.3404479-1-surenb@google.com> <20241111205506.3404479-4-surenb@google.com> <20241210223850.GA2484@noisy.programming.kicks-ass.net> <20241211082541.GQ21636@noisy.programming.kicks-ass.net> <20241212091659.GU21636@noisy.programming.kicks-ass.net> In-Reply-To: From: Suren Baghdasaryan Date: Thu, 12 Dec 2024 20:48:52 -0800 Message-ID: Subject: Re: [PATCH 3/4] mm: replace rw_semaphore with atomic_t in vma_lock To: Peter Zijlstra Cc: Matthew Wilcox , akpm@linux-foundation.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com, minchan@google.com, jannh@google.com, shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam05 X-Stat-Signature: k6tgeba3kfqfku4n8wnstobo8ui1kh5w X-Rspamd-Queue-Id: 42D154001A X-Rspam-User: X-HE-Tag: 1734065334-229358 X-HE-Meta: U2FsdGVkX19uIcBjKURhCwZFKOzO/wHBhg9pmPJGKCUkbCswKb5xRtAqnIEShJ7S9Unp36k3+YcCGvE55UBBt6FcvteFz3k0EWgc8hZd2sVyVJSoZUDmzmhjOI6nxzQk/tUoQVk5M2NRabAk2uy0EkTwloF1apaHDQrfWaYywpXxrcSmAQVsXKjVskFRLVp4snxUkt34xqQ0FXxIVmU72Ip9C0K96/jxc7Yngk/i3wzxsTZPRJLMBb9t2DvxrNIUfCefnpfx9rocBJog0CmIkoNFkZZyHVIhIXj1vVyXJZvKN5hVP9jqUJcL7xgsBMB6+uqxUyliZa0pS9YPrukNoXoWR9cuASah1KRXdHa2nVZ0GPdzXhTjo4ryf9LWXqfwXUl7T/+nBwNle0B3yg3tfZbrjTXqjsAaZwUimWqG54RuDfPMKofhp8A+GZbn8nBIULc30oJhcHBBZ/19oARTrN0yOWfOyJ8Noggo7LiRqTUeERstVAA8bOOh+mXpyerKEloS5Eltx6UOW56KeYlDUYM5DEQtmF7+QCqeMXazzMEAbn36WGYP32PaEEqQXFBQjB72qu/Mek4PVlVAu7vLynqC7Ny/cczQnQcU7zDAuJPTFk8iPr2kzkVbgXBkXdYlQtSKu6esYQnSghzK0S+mhX1K4yJlKEke9S8coMKWAqibiIKbkVWJHdDrLOJt7Ila7XIlx5u/q95QSWz9TptDPjTHC0WEdjbv7RR/pCYSdKPSnvYwz/HFE2KAhEkFLwM9wJ6OZB0Welux2SqsVp9b/w8uZ6KpdLc0zDQm5xCaXWubyqB5CkSAIMlEHxam/NSrirNKNOcTWmAhNcStNQlAwQPUygLedMvtiqbtdt0O0DLv5IeMzfpUHrifVcegofKFtxTMaPx09Oesn+5VNQ5IYjpony796Dp7tO70eCegCmkFMawLO883t+aYRV+y77x8ilb5ykCt0n3SW8q+0Ey VneOiGLi hMhGydn/znznTzMbx7W8kiEONLB/D71/C8AOhczXyV50f8Xw7wdXptrtjvS8SZ/96QyrfKzwVrr8zehq0SK+tljEXvq+o+sMwOQvDYQ8GpWnlQxpQZzfABWzCM21rXUR8XB+5ctLID1IvD/pfq8hvdy2z6tIfldz7xunsfFe6cL9rk63FArP29f1ftIMy1J8qvAqTQAJSS9kIDIAUxVi7w9t+qSRGYuU/P+XxjFZo+oZl13LdjMfIyhR/rQtN52OME00HmeVCl9FC717jtMdEIwa4Yt/PZj6pPfQfzdBpC4tkVl5VTpg/9k1o5cEFE1uWLKqIwjUDJ2ofH7U8DPrEj7X6s6vJqtRs2j71afUzV5j7BCbm1hOP3ikgeJw49zbon067 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000010, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Dec 12, 2024 at 6:19=E2=80=AFAM Suren Baghdasaryan wrote: > > On Thu, Dec 12, 2024 at 6:17=E2=80=AFAM Suren Baghdasaryan wrote: > > > > On Thu, Dec 12, 2024 at 1:17=E2=80=AFAM Peter Zijlstra wrote: > > > > > > On Wed, Dec 11, 2024 at 07:01:16PM -0800, Suren Baghdasaryan wrote: > > > > > > > > > > I think your proposal should work. Let me try to code it and = see if > > > > > > > something breaks. > > > > > > > > Ok, I tried it out and things are a bit more complex: > > > > 1. We should allow write-locking a detached VMA, IOW vma_start_writ= e() > > > > can be called when vm_refcnt is 0. > > > > > > This sounds dodgy, refcnt being zero basically means the object is de= ad > > > and you shouldn't be touching it no more. Where does this happen and > > > why? > > > > > > Notably, it being 0 means it is no longer in the mas tree and can't b= e > > > found anymore. > > > > It happens when a newly created vma that was not yet attached > > (vma->vm_refcnt =3D 0) is write-locked before being added into the vma > > tree. For example: > > mmap() > > mmap_write_lock() > > vma =3D vm_area_alloc() // vma->vm_refcnt =3D 0 (detached) > > //vma attributes are initialized > > vma_start_write() // write 0x8000 0001 into vma->vm_refcnt > > mas_store_gfp() > > vma_mark_attached() > > mmap_write_lock() // vma_end_write_all() > > s/mmap_write_lock()/mmap_write_unlock() > > > > In this scenario, we write-lock the VMA before adding it into the tree > > to prevent readers (pagefaults) from using it until we drop the > > mmap_write_lock(). In your proposal, the first thing vma_start_write() > > does is add(0x8000'0001) and that will trigger a warning. > > For now instead of add(0x8000'0001) I can play this game to avoid the w= arning: > > > > if (refcount_inc_not_zero(&vma->vm_refcnt)) > > refcount_add(0x80000000, &vma->vm_refcnt); > > else > > refcount_set(&vma->vm_refcnt, 0x80000001); > > > > this refcount_set() works because vma with vm_refcnt=3D=3D0 could not b= e > > found by readers. I'm not sure this will still work when we add > > TYPESAFE_BY_RCU and introduce vma reuse possibility. > > > > > > > > > 2. Adding 0x80000000 saturates refcnt, so I have to use a lower bit > > > > 0x40000000 to denote writers. > > > > > > I'm confused, what? We're talking about atomic_t, right? > > > > I thought you suggested using refcount_t. According to > > https://elixir.bootlin.com/linux/v6.13-rc2/source/include/linux/refcoun= t.h#L22 > > valid values would be [0..0x7fff_ffff] and 0x80000000 is outside of > > that range. What am I missing? > > > > > > > > > 3. Currently vma_mark_attached() can be called on an already attach= ed > > > > VMA. With vma->detached being a separate attribute that works fine = but > > > > when we combine it with the vm_lock things break (extra attach woul= d > > > > leak into lock count). I'll see if I can catch all the cases when w= e > > > > do this and clean them up (not call vma_mark_attached() when not > > > > necessary). > > > > > > Right, I hadn't looked at that thing in detail, that sounds like it > > > needs a wee cleanup like you suggest. > > > > Yes, I'll embark on that today. Will see how much of a problem that is. Ok, I think I was able to implement this in a way that ignores duplicate attach/detach calls. One issue that I hit and don't know a good way to fix is a circular dependency in the header files once I try adding rcuwait into mm_struct. Once I include rcuwait.h into mm_types.h leads to the following cycle: In file included from ./arch/x86/include/asm/uaccess.h:12, from ./include/linux/uaccess.h:12, from ./include/linux/sched/task.h:13, from ./include/linux/sched/signal.h:9, from ./include/linux/rcuwait.h:6, from ./include/linux/mm_types.h:22, ./arch/x86/include/asm/uaccess.h includes mm_types.h. But in fact there is a shorter cycle: rcuwait.h needs signal.h since it uses uses inlined signal_pending_state() signal.h needs mm_types.h since it uses vm_fault_t The way I worked around it for now is by removing signal.h include from rcuwait.h and wrapping signal_pending_state() into a non-inlined function so I can forward-declare it. That requires adding linux/sched/signal.h or linux/sched/task.h into many other places: arch/x86/coco/sev/core.c | 1 + arch/x86/kernel/fpu/xstate.c | 1 + block/blk-lib.c | 1 + block/ioctl.c | 1 + drivers/accel/qaic/qaic_control.c | 1 + drivers/base/firmware_loader/main.c | 1 + drivers/block/ublk_drv.c | 1 + drivers/crypto/ccp/sev-dev.c | 1 + drivers/dma-buf/heaps/cma_heap.c | 1 + drivers/dma-buf/heaps/system_heap.c | 1 + drivers/gpio/gpio-sloppy-logic-analyzer.c | 1 + drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 1 + drivers/iommu/iommufd/ioas.c | 1 + drivers/iommu/iommufd/pages.c | 1 + .../staging/vc04_services/interface/vchiq_arm/vchiq_dev.c | 1 + drivers/tty/n_tty.c | 1 + fs/bcachefs/fs-io-buffered.c | 1 + fs/bcachefs/journal_reclaim.c | 1 + fs/bcachefs/thread_with_file.c | 1 + fs/bcachefs/util.c | 1 + fs/btrfs/defrag.h | 1 + fs/btrfs/fiemap.c | 2 ++ fs/btrfs/free-space-cache.h | 1 + fs/btrfs/reflink.c | 1 + fs/exfat/balloc.c | 1 + fs/gfs2/ops_fstype.c | 1 + fs/kernel_read_file.c | 1 + fs/netfs/buffered_write.c | 1 + fs/zonefs/file.c | 1 + include/linux/fs.h | 2 +- include/linux/rcuwait.h | 4 ++-- io_uring/io_uring.h | 1 + kernel/dma/map_benchmark.c | 1 + kernel/futex/core.c | 1 + kernel/futex/pi.c | 1 + kernel/rcu/update.c | 5 +++++ kernel/task_work.c | 1 + lib/kunit/user_alloc.c | 1 + mm/damon/vaddr.c | 1 + mm/memcontrol-v1.c | 1 + mm/shrinker_debug.c | 1 + net/dns_resolver/dns_key.c | 1 + 42 files changed, 48 insertions(+), 3 deletions(-) I'm not sure if this is the best way to deal with this circular dependency. Any other ideas?