From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DB2EDE77187 for ; Wed, 18 Dec 2024 20:01:12 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 382926B007B; Wed, 18 Dec 2024 15:01:12 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 309836B0082; Wed, 18 Dec 2024 15:01:12 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 183376B0083; Wed, 18 Dec 2024 15:01:12 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id EAB866B007B for ; Wed, 18 Dec 2024 15:01:11 -0500 (EST) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id A51261610F6 for ; Wed, 18 Dec 2024 20:01:11 +0000 (UTC) X-FDA: 82909147272.19.6AF56E1 Received: from mail-qt1-f174.google.com (mail-qt1-f174.google.com [209.85.160.174]) by imf05.hostedemail.com (Postfix) with ESMTP id 2F28710002A for ; Wed, 18 Dec 2024 20:00:08 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="Uf/JgY8R"; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf05.hostedemail.com: domain of surenb@google.com designates 209.85.160.174 as permitted sender) smtp.mailfrom=surenb@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1734552039; a=rsa-sha256; cv=none; b=hAd/MgxXKchkwb3Z6Yr9a1SxyOGL1pcqxg6VvLqK0o4LC1s84KjXL0kPxscMxOPsdTiavZ b16AzCJarHhkQyOYgPpUwTv23PRTnQGgLYjDwe22GnENsMG71H5IHV7aNYF4oTiNAuc0qw x+1MAw2fO1ezP2iBqOSV7e3rsZX/B2k= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="Uf/JgY8R"; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf05.hostedemail.com: domain of surenb@google.com designates 209.85.160.174 as permitted sender) smtp.mailfrom=surenb@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1734552039; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=/R2Azbdkaj9TcrRiCPTKBbgal58RMiWaSFnNN0SVUfQ=; b=Ygxst8I9HuxtK/utaquTqU+MhIsQ2bi+Jl71lIcNszdX1FQGvtZJFuceTrZPJqMXMmQhLo ybr7ovziqy/ItHI53Mh0nJlw7IXZgUoHVESnEj+9YS8Q/5kqit27lPgyyjiC7GTrlCQAvn f0GrMkQG3WDg+VrwvBc11sROPU84faA= Received: by mail-qt1-f174.google.com with SMTP id d75a77b69052e-4679b5c66d0so33811cf.1 for ; Wed, 18 Dec 2024 12:01:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1734552068; x=1735156868; darn=kvack.org; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=/R2Azbdkaj9TcrRiCPTKBbgal58RMiWaSFnNN0SVUfQ=; b=Uf/JgY8RjbpC70iRdpgtiwtiQblSLYaQzGKXYwnc7jyRZD/9yxh0MwkgLBtD+xTjyx epQVWmMfA80p7bgz5OL5nPSnhryHtr+MiQ664ogNVkKk/0HiteMqmpZ9fdTTMu5HbQC/ mHGxFT1pSC8iEQwygVB2cIYDYclWRxoqbqoTHzE7mkqilNMmUbw/tp4+1EBr5hxu1/HU wUv3f7O1+FFn93hI/RiMgAOKWfCNUd2Yal3+vRzdzCiOjIZ526yJ7GXYP0H2/dlP+B6x PJniaQQTOagBYBcQliZ84KGITyDPmflbdFlPZyjiMDb/v6wPZb3/pbAX2a2V+UKhsELb a14Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1734552068; x=1735156868; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=/R2Azbdkaj9TcrRiCPTKBbgal58RMiWaSFnNN0SVUfQ=; b=v9gYHszXphc/M7Ie0DAlvzJMOk3679aMERwLf1ZYoWBKMjvJ2Zsw3Dh3DiHEgF0guI 8wooW8s/deU0qc99yTvYkS076KDNVm+3acEWaLhyEXPbUwjskSBBj7HLyKFyQ2crA1nm 6Y1QRxzuISwvnp/8nBpaUMmXTe+g1XkUIrObBNOIs/bE33IhCoZdd53NHFw0Jg5ZZh6c podcgrrZgbwJ4gvQga2bdXy8JGzUUvyK4DrtKLF/GbZMeXY0ti5R3aEe5IzXZpEaKAYk ez9WbHFuNvmcqhQ1Yhz6rpKeCDuotQftf0wlGg0ZCVrYaiATHSUiT1Hd2h56XPYDo0Sr fNuw== X-Forwarded-Encrypted: i=1; AJvYcCXN+fdXLjpjSWr0NITrC9SV8ATZQoC0RkvNgil08bi/sIbQMGfy3XwwSdJQK2Jcx3umU4vgtf6vFA==@kvack.org X-Gm-Message-State: AOJu0YwbDkhj1TwdGnH8V+8YxHiXnop+HlLwGYYLYKQmqTGlqF3h0ZwA gWaAJYR2ruNTuFN6gO+y8Bdu/HF5bXit7qXkQiVfeBelNUe1fPr/x+gTrblDw0E9OHWvbdibinT MlrdSRVSmWw79X8NhycQT3E6ApAug0K4AGKsD X-Gm-Gg: ASbGncsYF4xpWdijCntb9ORLmuvYvZMa1k+9NmsO9DaqbJWWktXyH2f6NtvO5AI2gFM Pwz0QVYsEvp4xIP0KcPuBC/sVzvFcOXTaK9x546VUsooShkukXCvBs6T3dEA1WlSmMUpf X-Google-Smtp-Source: AGHT+IHGCFsVgJFmpVC4q4ly5hhcScG7IXhIrf2ICPiDSy35AQoMyktvXGdKUL5cv5mHtIQe9QMJ+JYWrsDRBcQgFQo= X-Received: by 2002:a05:622a:4d4:b0:466:8887:6751 with SMTP id d75a77b69052e-46a3bc2ffb2mr252221cf.23.1734552068350; Wed, 18 Dec 2024 12:01:08 -0800 (PST) MIME-Version: 1.0 References: <20241218100601.GI12500@noisy.programming.kicks-ass.net> <20241218161850.GG2354@noisy.programming.kicks-ass.net> <20241218174428.GQ2354@noisy.programming.kicks-ass.net> In-Reply-To: From: Suren Baghdasaryan Date: Wed, 18 Dec 2024 12:00:56 -0800 Message-ID: Subject: Re: [PATCH v6 10/16] mm: replace vm_lock and detached flag with a reference count To: "Liam R. Howlett" , Suren Baghdasaryan , Peter Zijlstra , akpm@linux-foundation.org, willy@infradead.org, lorenzo.stoakes@oracle.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com, lokeshgidra@google.com, minchan@google.com, jannh@google.com, shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com, klarasmodin@gmail.com, corbet@lwn.net, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 2F28710002A X-Stat-Signature: okec3rt37jp1ne41aqf9shrswy78bsy6 X-Rspam-User: X-HE-Tag: 1734552008-424867 X-HE-Meta: U2FsdGVkX1+E9GQNbUFrIWjbXGs+/tUKNTKD9fCTduFLjYx7Mo9z/LBC0TU1iY4XQdNCsNW2XKhadUDb1/GfWnATIjlZqFiinttsWhFfUAIhcY9SGIl9g8mrMZk4/TJjmYc1gfRPR/MYdP/+Mfg6oejQv18VoE6DCroDjBYzLJ5p6mtNJGN/iUvBJzt4Fri7TTMZg7TansAmmcvmQDM3M1RmkOdDWK1Z/gWDcnxXp8rGVh3isbgj/v53Z//MnwUUu8ETuogTZUmo79ZtPAE5KgtGwoHgUhkYdZCRpl2En903WLTa8wVBv9h3K8eg+rdbJ2lbDIjMA8yXA0eDltBDUMuGm8mQMaPappkKSb5Nkqa6i2dpnv8lLBhsoFfMcH+ZMlrlGi1tI/n7ydKCmE+5bNA2HhTXejgKEo08i92FnYBgoa4CsogsNLOw5ecGvPFaOqoZawS9KRHezvTVBa73gEH+a+R8Nd+/sGXcZZRMofj1TgrXCtd4/Geh3mVYjP7a/JJRXfsJnopnwSYLwDcZCBtIA1stRfEGiteTkISU7h/PCu4Hz30xKO8xCKIz8RFK/rXx+TXq8UDcV3qxaQC6Avf9QrljSA99BEwoQGoljZ54mUPXg6XVU1Eny4HswyVP2tEqwjlENEy2qjE4ZUYRoiA0qvsnNBU0GHhzkit4e+NN3a+TlBo39gYdpL8fM3guzq7SDVgUqMFhyxW+MLDP4AHlBc0qsQK0IZdoN+186GZQZ8vuQNMq8BvpLJdD120s4wTbBGr3p/mMEHwpO9HnzxKNxBZdWJrpXDbGAeRxjaaaBcMyptlF7gWxwaAV73lO9VoUozEBRCvkpaa5YCQ7V1CeYq2FCWzsLnV2AB7+b9Els/FlrmJkbRVeZh1+TVjnprA5HprlGAOcK6CU87W9QiLFMACzoIA81hV1xiOkLrCkOx50tp1yUOfTbAh2ZlGBCnq+t+pBnHq3S9z6K40 pIFagkuS 3Ln3Qs0IItzO5VP75RGp0MccAXFyXyrPG1oDk+8dNm2Pl7G2YyAfwU7rrZvyI/55inIf3wPuVktlZNt+RJ0yVkAnfu6NluvcDSDXsJQ6KNRb2o2g44NFMBEThaEDaTPDB6rDNc2ZfM5LZVGQZJ8JjOhhF7/UMLV6c0VA11x3SrzxiCp4r0IMNWofujGH0cyWSfeaNaKVZ2SxB8krAHC3aS2ToqRFVvS6QlubfHluWOY4XSlFvPCrV/AICjHeIogURy2tONIwJVcuhjK/BBYTMCsc+/aiamHU88Ioy7zl0yWsRZ86Ll5rYinkDrijVeALSGMzXKnTn4Ri4kEtw9A90yDRfCoZ/XVZC/xYVQ9l2T7oCSnAf7rVG/vPzhGIZixXhtToE8jefOzSbmzI= X-Bogosity: Ham, tests=bogofilter, spamicity=0.019444, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Dec 18, 2024 at 11:38=E2=80=AFAM 'Liam R. Howlett' via kernel-team wrote: > > * Suren Baghdasaryan [241218 14:29]: > > On Wed, Dec 18, 2024 at 11:07=E2=80=AFAM Suren Baghdasaryan wrote: > > > > > > On Wed, Dec 18, 2024 at 11:00=E2=80=AFAM 'Liam R. Howlett' via kernel= -team > > > wrote: > > > > > > > > * Suren Baghdasaryan [241218 12:58]: > > > > > On Wed, Dec 18, 2024 at 9:44=E2=80=AFAM Peter Zijlstra wrote: > > > > > > > > > > > > On Wed, Dec 18, 2024 at 09:36:42AM -0800, Suren Baghdasaryan wr= ote: > > > > > > > > > > > > > > You will not. vms_complete_munmap_vmas() will call remove_v= ma() to > > > > > > > > remove PTEs IIRC, and if you do start_write() and detach() = before > > > > > > > > dropping mmap_lock_write, you should be good. > > > > > > > > > > > > > > Ok, I think we will have to move mmap_write_downgrade() insid= e > > > > > > > vms_complete_munmap_vmas() to be called after remove_vma(). > > > > > > > vms_clear_ptes() is using vmas, so we can't move remove_vma()= before > > > > > > > mmap_write_downgrade(). > > > > > > > > > > > > Why ?! > > > > > > > > > > > > vms_clear_ptes() and remove_vma() are fine where they are -- th= ere is no > > > > > > concurrency left at this point. > > > > > > > > > > > > Note that by doing vma_start_write() inside vms_complete_munmap= _vmas(), > > > > > > which is *after* the vmas have been unhooked from the mm, you w= ait for > > > > > > any concurrent user to go away. > > > > > > > > > > > > And since they're unhooked, there can't be any new users. > > > > > > > > > > > > So you're the one and only user left, and code is fine the way = it is. > > > > > > > > > > Ok, let me make sure I understand this part of your proposal. Fro= m > > > > > your earlier email: > > > > > > > > > > @@ -1173,6 +1173,11 @@ static void vms_complete_munmap_vmas(struc= t > > > > > vma_munmap_struct *vms, > > > > > struct vm_area_struct *vma; > > > > > struct mm_struct *mm; > > > > > > > > > > + mas_for_each(mas_detach, vma, ULONG_MAX) { > > > > > + vma_start_write(next); > > > > > + vma_mark_detached(next, true); > > > > > + } > > > > > + > > > > > mm =3D current->mm; > > > > > mm->map_count -=3D vms->vma_count; > > > > > mm->locked_vm -=3D vms->locked_vm; > > > > > > > > > > This would mean: > > > > > > > > > > vms_complete_munmap_vmas > > > > > vma_start_write > > > > > vma_mark_detached > > > > > mmap_write_downgrade > > > > > vms_clear_ptes > > > > > remove_vma > > > > > > > > > > And remove_vma will be just freeing the vmas. Is that correct? > > > > > I'm a bit confused because the original thinking was that > > > > > vma_mark_detached() would drop the last refcnt and if it's 0 we w= ould > > > > > free the vma right there. If that's still what we want to do then= I > > > > > think the above sequence should look like this: > > > > > > > > > > vms_complete_munmap_vmas > > > > > vms_clear_ptes > > > > > remove_vma > > > > > vma_start_write > > > > > vma_mark_detached > > > > > mmap_write_downgrade > > > > > > > > > > because vma_start_write+vma_mark_detached should be done under m= map_write_lock. > > > > > Please let me know which way you want to move forward. > > > > > > > > > > > > > Are we sure we're not causing issues with the MAP_FIXED path here? > > > > > > > > With the above change, we'd be freeing the PTEs before marking the = vmas > > > > as detached or vma_start_write(). > > > > > > IIUC when we call vms_complete_munmap_vmas() all vmas inside > > > mas_detach have been already write-locked, no? > > That's the way it is today - but I thought you were moving the lock to > the complete stage, not adding a new one? (why add a new one otherwise?) Is my understanding correct that mas_detach is populated by vms_gather_munmap_vmas() only with vmas that went through __split_vma() (and were write-locked there)? I don't see any path that would add any other vma into mas_detach but maybe I'm missing something? > > > > > Yeah, I think we can simply do this: > > > > vms_complete_munmap_vmas > > vms_clear_ptes > > remove_vma > > vma_mark_detached > > mmap_write_downgrade > > > > If my assumption is incorrect, assertion inside vma_mark_detached() > > should trigger. I tried a quick test and so far nothing exploded. > > > > If they are write locked, then the page faults are not a concern. There > is also the rmap race that Jann found in mmap_region() [1]. This is > probably also fine since you are keeping the write lock in place earlier > on in the gather stage. Note the ptes will already be cleared by the > time vms_complete_munmap_vmas() is called in this case. > > [1] https://lore.kernel.org/all/CAG48ez0ZpGzxi=3D-5O_uGQ0xKXOmbjeQ0LjZsRJ= 1Qtf2X5eOr1w@mail.gmail.com/ > > Thanks, > Liam > > To unsubscribe from this group and stop receiving emails from it, send an= email to kernel-team+unsubscribe@android.com. >