From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28E0BE7717F for ; Mon, 16 Dec 2024 21:53:22 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id ACF636B00AF; Mon, 16 Dec 2024 16:53:21 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A57BF6B00B0; Mon, 16 Dec 2024 16:53:21 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8D70E6B00B4; Mon, 16 Dec 2024 16:53:21 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 67D586B00AF for ; Mon, 16 Dec 2024 16:53:21 -0500 (EST) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 21C6B1401AF for ; Mon, 16 Dec 2024 21:53:21 +0000 (UTC) X-FDA: 82902172626.13.CC44DB3 Received: from mail-qt1-f174.google.com (mail-qt1-f174.google.com [209.85.160.174]) by imf27.hostedemail.com (Postfix) with ESMTP id 3E08E40010 for ; Mon, 16 Dec 2024 21:52:46 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=DeLYc4E+; spf=pass (imf27.hostedemail.com: domain of surenb@google.com designates 209.85.160.174 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1734385966; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Aq05gPIyw7EB2xqbSOReuVtBjU7kZl4ZHSk4xKdPdrg=; b=Zn8PT0IlQilbb5fYeukk03twqu47n7UBu+FjbyEPQuj6HIOXItV/nKA/HoDt1E7tUPHbv0 hWdHybCeu+R5SUiGiupXeVYydZiWz+Uvm041wSp75C2uSv2ZRg3PbAWRxMYVqtXtEoVdH+ l+t4+ozT0++j41pi5/5THhyDX3RfxgE= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1734385966; a=rsa-sha256; cv=none; b=M5OfxxtV+BbQxOm46ettQHumP6z49hkQvdBLFaWtPWjk73bWT1l/CaX/Ou5fMzJAjIwPBw x4aZDzd2wm6VtwA5A5hpVAkkzoq6fJHNMZIj9if9oOWU4DQCg1IJv1jKUapXtKFCl9SUs5 EM/Js+UyMURTZrzozF5ouCUhhGXrf1I= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=DeLYc4E+; spf=pass (imf27.hostedemail.com: domain of surenb@google.com designates 209.85.160.174 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-qt1-f174.google.com with SMTP id d75a77b69052e-4679b5c66d0so21161cf.1 for ; Mon, 16 Dec 2024 13:53:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1734385998; x=1734990798; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Aq05gPIyw7EB2xqbSOReuVtBjU7kZl4ZHSk4xKdPdrg=; b=DeLYc4E+5YB0Q9nGGCaW8u3QqqLG79wAYh2pG9nJMvm5yog62PuMmtZApSapbQli6f ILeI3iHU45/xYdlA2K4VwH/EXwt7p7N6QVR5cnzCCujh1kPyfGclyGL2a1cQWiNXQRdU 27JmSLnWAPPDVaPsLyQ/dbM3cHkXw/vo0cYEZVZnLVY7SX4B1uqId24bdXnSPBxXNh5Z yDGXZHXmTxDsvauHevNN2IQobjG37agoHuDefTkm693XprFHDnQCWLQXsg2m5frLrBxs OWn31Wx+IXAMECU8irqp0EM03GxhYnycxtWBr4Mu26QuSIwRY8fS0hn/xLPnmQG0bOON /Gvg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1734385998; x=1734990798; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Aq05gPIyw7EB2xqbSOReuVtBjU7kZl4ZHSk4xKdPdrg=; b=goxGfUKDglLayGsDj6+8Ic0To36KUZ+wZ8YuR5kOOQgrLB2MzNWT7jrdkMRMZDkvgl igoVaipJYl2wkHLFCRRlkdR8YBHuxQzl+MEQElfQ5+htXsRs/4st+QDfIeq9uHx6R14u V+x3/Xjg3K2ca3g/255z8AVsUzsX4jLX4Jc2TklYDnIrsAkCcbWwdcGIu54INzbvooV/ 6R1Lnag3FbV+z2qBSbrBToX48g+UCKjell+3YXeRRwZd+wlJOyGCfgz/mF3YN49T4s28 WUSaX4orCp0TmH0NjcJRRFblQP+bgcmHluHISJsMKQFsragofjAUfoNhRRBId19z/1d3 q8sQ== X-Forwarded-Encrypted: i=1; AJvYcCUSav9TqD8cNST3zZTiuOOSb2PBt1zaDN7X2aXL1ghxKDvCjxRyDkcEhs/wOPifcOifL1h7qmI+xg==@kvack.org X-Gm-Message-State: AOJu0YzZH4V/jLLqIgHlpEgMWIbAHRvQdbBR/DKXusHKqHFeN4frCiNO B8XIDYC9ZagR6nrntskMwELUkKpO18bVux3gYeLM6/C8twkQMM11Xu43X9MUN+nm6TIFYZ1qIRD sprIcKRLe6ZF45MLMcfcO+FKSWsf5OF06NIFQ X-Gm-Gg: ASbGncudkBPBJNlhT/+e+la4SvFgPs/17DRjKCLHJqWreJCM3i5O0wuMafPFc5D/wZo WHlaahG7XkpVQVcTCQvAYctxcV7laNWkHf/ibZw== X-Google-Smtp-Source: AGHT+IHtfvEBTV4XwO44y08RCoG8lRTuslg34UX+xLARmWTUYFF2aI3xBP0gB1KtSzzFun+sUfSbEsihc7M1sONbcGY= X-Received: by 2002:ac8:574d:0:b0:466:861a:f633 with SMTP id d75a77b69052e-468fb0b4c55mr151201cf.5.1734385998024; Mon, 16 Dec 2024 13:53:18 -0800 (PST) MIME-Version: 1.0 References: <20241216192419.2970941-1-surenb@google.com> <20241216192419.2970941-11-surenb@google.com> <20241216211520.GB9803@noisy.programming.kicks-ass.net> In-Reply-To: <20241216211520.GB9803@noisy.programming.kicks-ass.net> From: Suren Baghdasaryan Date: Mon, 16 Dec 2024 13:53:06 -0800 Message-ID: Subject: Re: [PATCH v6 10/16] mm: replace vm_lock and detached flag with a reference count To: Peter Zijlstra Cc: akpm@linux-foundation.org, willy@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com, lokeshgidra@google.com, minchan@google.com, jannh@google.com, shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com, klarasmodin@gmail.com, corbet@lwn.net, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 3E08E40010 X-Rspam-User: X-Rspamd-Server: rspam07 X-Stat-Signature: hrc7nwe9ndhmjjo9h3uum3siohhfcn17 X-HE-Tag: 1734385966-716793 X-HE-Meta: U2FsdGVkX19+j+gNQlw5QJ6h3iFDxOVsGdlXHrq6sLLeUwf8D3DbBEhPsoTbfVwgZDznFmSBiVjTgGljQHj8KLjIOltQE4iY5DGFBnl2bBaqADil0L2ipDRUBKybCQph26km6F8Et3FSaK19XzrTFo1nKRTsy+jRoyLfmxjni8VMfKe1zJW0yS+JDmP/Ok6hNmHEsri8pVBPfZEJuQ0SyoUUTZwCGsi3R0jkc4pAkTJCQkGG2jhzarARiaC6BLiqeQY05aNs32eW1sJfzmhlDGacqS52CJL3eguKB34+9iATRG/D1mWWbyFGBVaUechNI0jHxZUVk48Ez4oB6JK7gnS9UPkGZW9Mjp6iPcIO1NvuvnXmISfWX8nB6/COYJUHbL9A4ca2iU+pZ0403CVuXlOASBtX5ZUXn8eLMyO0v7kfW1F5QbEXpgHUSiTPjhQxD0O55Kc8L0jWTqjid9cYsfBSy3ZWIVUaxl2qcVwB70MKJAMpojQNuoun9WeAbPmZKs4yXTRhrLBnRwEkau6FzatxGNbDsgXUohVe3ZiSMDhiGtD5KwEFMKAd3PeCy9jEeTNDu6AHSEU7MKY6b6jh3Rvp4UGlJAYyAnigIz6DgxzFV0xTdeOgrPbDq0qXYnzFegdYwcFJ+zAFPkh+vaOq150RBF5bw5l3tUHW3YfshmQCufAvT7u2sZzcW/W1OEq9vI2xymF1GS8PAZCt/FFzQq3gGCDeooB8IY1roQoffEtkaPkrfYaXCPkxsZd6qONoaWQSxIQkKvYvo/guHKtU2F8BdWzRRJ2Sim5EMBLtQ8Edl8wXoUBGNQG/iLL5nC51BzVRjiPrsWuSkWx6Mj7i6mCG8Do2rlCO4JBWuIfR7tgPV7jj/oZjjLtGjIz1KDxeRRV/3VvIml7qrjzyyeVGBFGa/rm8qECVUSWYszdJT9yjiBYsCBaft2GXClGKFtEo4TWMbaPUh1R4GGPPEbG 3f1SgmDx iwhi6r6Zg5uE7FXAS8r3I/PrKMNMRIe63E2H2zTno4xy4Cg3kG4rIVOxmqmHqgBbk6do6faLZIil65yDajTZZa4ZTYXiGHUZbHmcVOl/pNOapFtmKFSfM4HkD6ooNvJ8VHW94lu0/VSk9on0Y5gfLnyNPvYfmQ7IxyUTAsQgzt3yrRZD4iOC2kw3YH2oMMAUVWhxIlMZ0KwL0hIjUYqR4CwTcD7+lmYlGl8Oy/HpcizHa9FpiVDuAEqysd7vzKSU6hV/NWIfCr+Q8FFgpqnNseCP5qT/IHL7PdYKrutlP/y4jZRs= X-Bogosity: Ham, tests=bogofilter, spamicity=0.065473, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Dec 16, 2024 at 1:15=E2=80=AFPM Peter Zijlstra wrote: > > On Mon, Dec 16, 2024 at 11:24:13AM -0800, Suren Baghdasaryan wrote: > > FWIW, I find the whole VMA_STATE_{A,DE}TATCHED thing awkward. And I'm bad with naming things, so any better suggestions are welcome. Are you suggesting to drop VMA_STATE_{A,DE}TATCHED nomenclature and use 0/1 values directly? > perhaps s/VMA_STATE_LOCKED/VMA_LOCK_OFFSET/ ? Sounds good. I'll change it to VMA_LOCK_OFFSET. > > Also, perhaps: > > #define VMA_REF_LIMIT (VMA_LOCK_OFFSET - 2) Ack. > > > @@ -699,10 +700,27 @@ static inline void vma_numab_state_free(struct vm= _area_struct *vma) {} > > #ifdef CONFIG_PER_VMA_LOCK > > static inline void vma_lock_init(struct vm_area_struct *vma) > > { > > - init_rwsem(&vma->vm_lock.lock); > > +#ifdef CONFIG_DEBUG_LOCK_ALLOC > > + static struct lock_class_key lockdep_key; > > + > > + lockdep_init_map(&vma->vmlock_dep_map, "vm_lock", &lockdep_key, 0= ); > > +#endif > > + refcount_set(&vma->vm_refcnt, VMA_STATE_DETACHED); > > vma->vm_lock_seq =3D UINT_MAX; > > Depending on how you do the actual allocation (GFP_ZERO) you might want > to avoid that vm_refcount store entirely. I think we could initialize it to 0 in the slab cache constructor and when vma is freed we already ensure it's 0. So, even when reused it will be in the correct 0 state. > > Perhaps instead write: VM_WARN_ON(refcount_read(&vma->vm_refcnt)); Yes, with the above approach that should work. > > > @@ -813,25 +849,42 @@ static inline void vma_assert_write_locked(struct= vm_area_struct *vma) > > > > static inline void vma_assert_locked(struct vm_area_struct *vma) > > { > > - if (!rwsem_is_locked(&vma->vm_lock.lock)) > > + if (refcount_read(&vma->vm_refcnt) <=3D VMA_STATE_ATTACHED) > if (is_vma_detached(vma)) > > > vma_assert_write_locked(vma); > > } > > > > -static inline void vma_mark_attached(struct vm_area_struct *vma) > > +/* > > + * WARNING: to avoid racing with vma_mark_attached(), should be called= either > > + * under mmap_write_lock or when the object has been isolated under > > + * mmap_write_lock, ensuring no competing writers. > > + */ > > +static inline bool is_vma_detached(struct vm_area_struct *vma) > > { > > - vma->detached =3D false; > > + return refcount_read(&vma->vm_refcnt) =3D=3D VMA_STATE_DETACHED; > return !refcount_read(&vma->vm_refcnt); > > } > > > > -static inline void vma_mark_detached(struct vm_area_struct *vma) > > +static inline void vma_mark_attached(struct vm_area_struct *vma) > > { > > - /* When detaching vma should be write-locked */ > > vma_assert_write_locked(vma); > > - vma->detached =3D true; > > + > > + if (is_vma_detached(vma)) > > + refcount_set(&vma->vm_refcnt, VMA_STATE_ATTACHED); > > Urgh, so it would be really good to not call this at all them not 0. > I've not tried to untangle the mess, but this is really awkward. Surely > you don't add it to the mas multiple times either. The issue is that when we merge/split/shrink/grow vmas, we skip on marking them detached while modifying them. Therefore from vma_mark_attached() POV it will look like we are attaching an already attached vma. I can try to clean that up if this is really a concern. > > Also: > > refcount_set(&vma->vm_refcnt, 1); > > is so much clearer. Ok, IIUC you are in favour of dropping VMA_STATE_ATTACHED/VMA_STATE_DETACHE= D. > > That is, should this not live in vma_iter_store*(), right before > mas_store_gfp() ? Currently it's done right *after* mas_store_gfp() but I was debating with myself if it indeed should be *before* insertion into the tree... > > Also, ISTR having to set vm_lock_seq right before it? Yes, vma_mark_attached() requires vma to be write-locked beforehand, hence the above vma_assert_write_locked(). But oftentimes it's locked not right before vma_mark_attached() because some other modification functions also require vma to be write-locked. > > > } > > > > -static inline bool is_vma_detached(struct vm_area_struct *vma) > > +static inline void vma_mark_detached(struct vm_area_struct *vma) > > { > > - return vma->detached; > > + vma_assert_write_locked(vma); > > + > > + if (is_vma_detached(vma)) > > + return; > > Again, this just reads like confusion :/ Surely you don't have the same > with mas_detach? I'll double-check if we ever double-mark vma as detached. Thanks for the review! > > > + > > + /* We are the only writer, so no need to use vma_refcount_put(). = */ > > + if (!refcount_dec_and_test(&vma->vm_refcnt)) { > > + /* > > + * Reader must have temporarily raised vm_refcnt but it w= ill > > + * drop it without using the vma since vma is write-locke= d. > > + */ > > + } > > }