From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CC820E7717D for ; Thu, 12 Dec 2024 03:01:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 64EAD6B0095; Wed, 11 Dec 2024 22:01:31 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5D6FE6B0098; Wed, 11 Dec 2024 22:01:31 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 450546B0099; Wed, 11 Dec 2024 22:01:31 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 22FE16B0095 for ; Wed, 11 Dec 2024 22:01:31 -0500 (EST) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id CBE7FA198A for ; Thu, 12 Dec 2024 03:01:30 +0000 (UTC) X-FDA: 82884805584.30.9D176F8 Received: from mail-qt1-f174.google.com (mail-qt1-f174.google.com [209.85.160.174]) by imf25.hostedemail.com (Postfix) with ESMTP id D2F03A0005 for ; Thu, 12 Dec 2024 03:01:11 +0000 (UTC) Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=B5vVd1JO; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf25.hostedemail.com: domain of surenb@google.com designates 209.85.160.174 as permitted sender) smtp.mailfrom=surenb@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1733972466; a=rsa-sha256; cv=none; b=PQKaHDIRU/Nze+hrCnT2tw2nfSf+p443FVsTM056ssGjGGAxfzZvQqDL4h0X5W6jAq/wdJ cVmf1lh9nWSQyfsX14NfeDpNUJG/d7avfe0hEod28fteJw68CiTjYzsZVfN+5t4u/iJEcD F3JjyKpPYTGta43Re7dS8nw55SCja5o= ARC-Authentication-Results: i=1; imf25.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=B5vVd1JO; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf25.hostedemail.com: domain of surenb@google.com designates 209.85.160.174 as permitted sender) smtp.mailfrom=surenb@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1733972466; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=fXguPs9/u7xacodqeOh7BKGSfUT5UVdPBVuE8pC7yn8=; b=b7TeeZstg/ly7C7Y6jduoNwrBVBnh+xfpz4YSClKkdJH/nA33JcB2ES5tbc4KbcJHEnL0U S9s4GsGrTUiVIwLC/w8mjw6Q8o55lbtyMVvVH+h9c0zapYSAFaQPFZhJIjFjiSbzvpK6hN o7f20ej9PFlloyOJHaYL+wwNZU8qeyI= Received: by mail-qt1-f174.google.com with SMTP id d75a77b69052e-467896541e1so79521cf.0 for ; Wed, 11 Dec 2024 19:01:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1733972488; x=1734577288; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=fXguPs9/u7xacodqeOh7BKGSfUT5UVdPBVuE8pC7yn8=; b=B5vVd1JObI+lg01JR5VuDPqkDiVXFlf0A75USQzkwmmZjN8htHIEnJDhtLVw6ee3HZ QjwAbb0YDQaydiZDRKQ31gvWL85T0mWMr1jZ8Wv7NCWZKOcs0kY62YZahT9A/MslG/FE 4W5Y5unRKLOjsdjRlJLCXIwtxdWhWDMeAj6UUx58SS0+dSCgrVGSP0pFjuDlW0/gx0yJ gTjWTuo5fggvyPNtOaRAccpm8pBLs5+fizaz5o5V2tF6OoxAtFi/aAiurA6Wlc9UaO0y zZzMAO60dzB8135eW32UmQHKi4iPYXwGijklkkwpG2vceoOs4sJaeuDJaXNnbIEY3Ihs hTYQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733972488; x=1734577288; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=fXguPs9/u7xacodqeOh7BKGSfUT5UVdPBVuE8pC7yn8=; b=vrLVYCH7C4w41/99JBBjBW3rwa2tYQEjw0h1PqDdmVTsLEhd4HGONnSuWoeQV8xYIA yPxI5O3le/m/TXzCRZfdRNB5JdcqX9vlHhTxaUUtdd5QL0s2iby90SFyEz+mHoISdxlj U4fjOxqJBC6PXd2KBrmciZKzcHykbq5313iGmjTMsINao6Dea2X8YXGdZxRp9AkrEjbE 8U5J9/1kkiV7NPKAocWyo6lH+xhL3YvnPfaAhtVj+PQ62bMWa7PVAOMM5mlf7KEZshXe utFgq3inJ22Cqge/Tl/G4ba7qnhgYWENiyIIAJ0j0AqV1g/tQ2oXYMrR9DTakBgeIlzM EQnw== X-Forwarded-Encrypted: i=1; AJvYcCV2uTtCvM2oHdN1aHYZPKI5Sv80n1cbw93tukZQwVqTKS++mtdtOEOGRg+JlCXx7WlV7ojWlZy/Pw==@kvack.org X-Gm-Message-State: AOJu0YxfEBx0KuBfunLLeT46ubFcoYkUxlTS0G7Vum+Bl2fioQ2UrGQU E/WGYWMucb/jMfRZq0PFE9lsao07v+9JaOlk9DxRjor32Izv7k8aDAVR666LS1Wwc9DTg6mcODi Ks1w8jxeAJF7eu+Mr3mb4Q/y6u8HBMCwmAQGK X-Gm-Gg: ASbGncvhDvfnBsPMEDVWy89SCb0lyjPz+LCpLfNvBaWIAx3k6F4Qx4ZWEFzPqnl+4V0 4+5YN35qmVqW63zSkNZiSAqQEJumoxjc/Fv0SLK6Yl86rWDYpJ2iPGGjVLT6d/KjCjJ3j X-Google-Smtp-Source: AGHT+IHpj3jpVRoMz+xnVrcuUcoE0Wz2BdvQqBZ7aptyXt8bFeS01OnOmjKmsAbYjWH2WPp11bMrk3Uw90Ab7Sp2xZw= X-Received: by 2002:a05:622a:5a16:b0:466:923f:a749 with SMTP id d75a77b69052e-467981befbcmr1102041cf.15.1733972487746; Wed, 11 Dec 2024 19:01:27 -0800 (PST) MIME-Version: 1.0 References: <20241111205506.3404479-1-surenb@google.com> <20241111205506.3404479-4-surenb@google.com> <20241210223850.GA2484@noisy.programming.kicks-ass.net> <20241211082541.GQ21636@noisy.programming.kicks-ass.net> In-Reply-To: From: Suren Baghdasaryan Date: Wed, 11 Dec 2024 19:01:16 -0800 Message-ID: Subject: Re: [PATCH 3/4] mm: replace rw_semaphore with atomic_t in vma_lock To: Peter Zijlstra Cc: Matthew Wilcox , akpm@linux-foundation.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com, minchan@google.com, jannh@google.com, shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: D2F03A0005 X-Stat-Signature: e9n6ojqqpskzc4wt39jwbsbdykjoekde X-Rspam-User: X-HE-Tag: 1733972471-7182 X-HE-Meta: U2FsdGVkX1+aNVBcLJsdyw6UkNzwAhvHY8Sn/HbuzwkTr8LGOxCq8XZxE3AhqnsjL2uDq5Snxv25Ngt0mxLlPJjM7lVQwOEAfRmkjHuGSdE2TclxHESzOW9Z/i5HF04Ie6fyyWR+RwDWv8dDtU0hveFli5vd4WuvorPl86Q8+OWWp9ea0370kJ98C773FisZipw+IGHABHpvtSbANHY5jMxDIBE4lryxX4Tea1I/znwIED7obOVH+smv47tYHg6odqn613jNRqyOgZOfV5mdzHXufVpYUoZ6iC/UgvDByfoFb8X3uFycsR+ha+rYGRAQpRDKHPm3WbjsssZebSAiW183Zc7hlAlG8lTem8cWXhVuEMtyIRyvoCTMILq1Am9EB0+PjE6w4/l7PQ2UgJijBeZPnkLKDFyokdp1Nz/plD2icnkcTUh05Jq2WxBMzJVBNkarhNyYSiJLXIOenS0oEwGjr9dgN7iTbU1mAXVg59ykXp/9j6pfmyMwV2LNoQ25Y+fUrspDHc8nQx5/g2h3ahyg0ouiXU6b3eqH+zR7o4vCuWG4nzZJDGkJAOxMzvNSQs1ZkOm5UjhIgmLnBRZFD4r9WOQLA8Q5jtbZlMjMypHNfTuC+W/PRgBR3PLLloTGjSPfmqZJUR+JdGW/+1a4223Zl2mUdV0eifB/gWYYC6lyJY3ZZSZ0TSuYphinwjlsAkd6lgYsItfqGt0eSF0+WUa6XBlShb3OnX3fZnIa1C+6LTYOvuCAz6PhObAZUi8gWBYDypm5ZIiW7uWBPVMEtSb3rS/RQMMv93ErBxRcHk3pM+CHcMAGLiisFa553ZX8pr/SfNFRIzFy9uMN3f2Jh8XwvRWCm8VWGJhqpM2bm7zUUN1376xDsFQbpICIhohu4h3j+ZBcHPmh/bm3KpFL1CZnhIWErMFaLu2vVHgu5RrOs5H68yY3aMcUW9GCBsL3sYbSN2jfujtAnyG43N6 lTDdPFRW 1e2vB8ZumpqnVHcneHLvyo6kuAiTroL9ACLjs84lltmyhH3u4WUnvDUeZ6GmnBq2B+fPtNiDzf1711dId6mL+VYmBqDx9TWfBr9CYC8UwG6ws70KomKZwg4CtiLWjFVY0uzV30KHMCtVlo7eQCu5vPo3drW6/JN/Z8nv/Ovrd4ViDzV3jJR8mx8rtTFeNWQU50H+z/8Q8M9qOGt3iBtm2qjYr7EO2vnMf0f4Z05dl/+XDkHglYkK3GgrgZzgiqgvpP9dgKUVkEmonnVxJYnSXcTewPykmEotZJU4FOd+ZbZ/hXJaSA4xCOAjfHtdvRAhp7QQLMfB/IjvF4g8= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000066, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Dec 11, 2024 at 7:20=E2=80=AFAM Suren Baghdasaryan wrote: > > On Wed, Dec 11, 2024 at 12:25=E2=80=AFAM Peter Zijlstra wrote: > > > > On Tue, Dec 10, 2024 at 03:37:50PM -0800, Suren Baghdasaryan wrote: > > > > > > Replace vm_lock with vm_refcnt. Replace vm_detached with vm_refcnt = =3D=3D 0 > > > > -- that is, attach sets refcount to 1 to indicate it is part of the= mas, > > > > detached is the final 'put'. > > > > > > I need to double-check if we ever write-lock a detached vma. I don't > > > think we do but better be safe. If we do then that wait-until() shoul= d > > > accept 0x8000'0001 as well. > > > > vma_start_write() > > __is_vma_write_locked() > > mmap_assert_write_locked(vma->vm_mm); > > > > So this really should hold afaict. > > > > > > RCU lookup does the inc_not_zero thing, when increment succeeds, co= mpare > > > > mm/addr to validate. > > > > > > > > vma_start_write() already relies on mmap_lock being held for writin= g, > > > > and thus does not have to worry about writer-vs-writer contention, = that > > > > is fully resolved by mmap_sem. This means we only need to wait for > > > > readers to drop out. > > > > > > > > vma_start_write() > > > > add(0x8000'0001); // could fetch_add and double check the h= igh > > > > // bit wasn't already set. > > > > wait-until(refcnt =3D=3D 0x8000'0002); // mas + writer ref > > > > WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq); > > > > sub(0x8000'0000); > > > > > > > > vma_end_write() > > > > put(); > > > > > > We don't really have vma_end_write(). Instead it's vma_end_write_all(= ) > > > which increments mm_lock_seq unlocking all write-locked VMAs. > > > Therefore in vma_start_write() I think we can sub(0x8000'0001) at the > > > end. > > > > Right, I know you don't, but you should :-), I've suggested adding this > > before. > > I'll look into adding it. IIRC there was some issue with that but I > can't recall... > > > > > > > vma_start_read() then becomes something like: > > > > > > > > if (vm_lock_seq =3D=3D mm_lock_seq) > > > > return false; > > > > > > > > cnt =3D fetch_inc(1); > > > > if (cnt & msb || vm_lock_seq =3D=3D mm_lock_seq) { > > > > put(); > > > > return false; > > > > } > > > > > > > > return true; > > > > > > > > vma_end_read() then becomes: > > > > put(); > > > > > > > > > > > > and the down_read() from uffffffd requires mmap_read_lock() and thu= s > > > > does not have to worry about writers, it can simpy be inc() and put= (), > > > > no? > > > > > > I think your proposal should work. Let me try to code it and see if > > > something breaks. Ok, I tried it out and things are a bit more complex: 1. We should allow write-locking a detached VMA, IOW vma_start_write() can be called when vm_refcnt is 0. In that situation add(0x8000'0001) leads to "addition on 0; use-after-free". Maybe I can introduce a new refcnt function which does not complain when adding to 0? 2. Adding 0x80000000 saturates refcnt, so I have to use a lower bit 0x40000000 to denote writers. 3. Currently vma_mark_attached() can be called on an already attached VMA. With vma->detached being a separate attribute that works fine but when we combine it with the vm_lock things break (extra attach would leak into lock count). I'll see if I can catch all the cases when we do this and clean them up (not call vma_mark_attached() when not necessary). > > > > Btw, for the wait-until() and put() you can use rcuwait; that is the > > simplest wait form we have. It's suitable because we only ever have the > > one waiter. > > Yes, Davidlohr mentioned that before. I'll do that. Thanks!