References: <20241206225204.4008261-1-surenb@google.com> <20241206225204.4008261-5-surenb@google.com> <6b29b2a5-c244-4930-a5a0-1a24a04e7e35@suse.cz> <643beb6c-4226-46ca-b7e9-292467479aea@suse.cz>
In-Reply-To: <643beb6c-4226-46ca-b7e9-292467479aea@suse.cz>
From: Suren Baghdasaryan <surenb@google.com>
Date: Tue, 10 Dec 2024 09:16:23 -0800
Subject: Re: [PATCH v5 4/6] mm: make vma cache SLAB_TYPESAFE_BY_RCU
To: Vlastimil Babka
Cc: akpm@linux-foundation.org, willy@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, mhocko@suse.com, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com, minchan@google.com, jannh@google.com, shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com, corbet@lwn.net, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com

On Tue, Dec 10, 2024 at 8:32 AM Vlastimil Babka wrote:
>
> On 12/10/24 17:20, Suren Baghdasaryan wrote:
> > On Tue, Dec 10, 2024 at 6:21 AM Vlastimil Babka wrote:
> >>
> >> On 12/6/24 23:52, Suren Baghdasaryan wrote:
> >> > To enable SLAB_TYPESAFE_BY_RCU for the vma cache we need to ensure
> >> > that object reuse before the RCU grace period is over will be
> >> > detected inside lock_vma_under_rcu().
> >> > lock_vma_under_rcu() enters an RCU read section, finds the vma at
> >> > the given address, locks the vma and checks if it got detached or
> >> > remapped to cover a different address range. These last checks are
> >> > there to ensure that the vma was not modified after we found it but
> >> > before locking it.
> >> > vma reuse introduces several new possibilities:
> >> > 1. vma can be reused after it was found but before it is locked;
> >> > 2. vma can be reused and reinitialized (including changing its vm_mm)
> >> > while being locked in vma_start_read();
> >> > 3. vma can be reused and reinitialized after it was found but before
> >> > it is locked, then attached at a new address or to a new mm while
> >> > read-locked;
> >> > For case #1 the current checks will help detect cases when:
> >> > - vma was reused but not yet added into the tree (detached check)
> >> > - vma was reused at a different address range (address check);
> >> > We are missing the check for vm_mm to ensure the reused vma was not
> >> > attached to a different mm. This patch adds the missing check.
> >> > For case #2, we pass mm to vma_start_read() to prevent access to
> >> > unstable vma->vm_mm. This might lead to vma_start_read() returning
> >> > a false locked result but that's not critical if it's rare because
> >> > it will only lead to a retry under mmap_lock.
> >> > For case #3, we ensure the order in which the vma->detached flag and
> >> > vm_start/vm_end/vm_mm are set and checked. The vma gets attached
> >> > after vm_start/vm_end/vm_mm were set, and lock_vma_under_rcu() should
> >> > check vma->detached before checking vm_start/vm_end/vm_mm. This is
> >> > required because attaching a vma happens without the vma write-lock,
> >> > as opposed to vma detaching, which requires the vma write-lock. This
> >> > patch adds memory barriers inside is_vma_detached() and
> >> > vma_mark_attached() needed to order reads and writes to
> >> > vma->detached vs vm_start/vm_end/vm_mm.
> >> > After these provisions, SLAB_TYPESAFE_BY_RCU is added to
> >> > vm_area_cachep. This will facilitate vm_area_struct reuse and will
> >> > minimize the number of call_rcu() calls.
> >> >
> >> > Signed-off-by: Suren Baghdasaryan
> >>
> >> I'm wondering about the vma freeing path. Consider vma_complete():
> >>
> >> vma_mark_detached(vp->remove);
> >>   vma->detached = true; - plain write
> >> vm_area_free(vp->remove);
> >>   vma->vm_lock_seq = UINT_MAX; - plain write
> >>   kmem_cache_free(vm_area_cachep)
> >> ...
> >> potential reallocation
> >>
> >> against:
> >>
> >> lock_vma_under_rcu()
> >> - mas_walk finds a stale vma due to the race
> >> vma_start_read()
> >>   if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence))
> >>   - can be false, the vma was not being locked on the freeing side?
> >>   down_read_trylock(&vma->vm_lock.lock) - succeeds, wasn't locked
> >>   this is acquire, but was there any release?
> >
> > Yes, there was a release. I think what you missed is that
> > vma_mark_detached(), which is called from vma_complete(), requires the
> > VMA to be write-locked (see vma_assert_write_locked() in
> > vma_mark_detached()). The rule is that a VMA can be attached without
> > write-locking, but only a write-locked VMA can be detached. So, after
>
> OK, but write unlocking means the mm's seqcount is bumped and becomes
> non-equal with the vma's vma->vm_lock_seq, right?
>
> Yet in the example above we happily set it to UINT_MAX and thus
> effectively false-unlock it for vma_start_read()?
>
> And this is all done before the vma_complete() side would actually reach
> mmap_write_unlock(), AFAICS.

Ah, you are right.
With the possibility of reuse, even a freed VMA should be kept
write-locked until it is unlocked by mmap_write_unlock(). I think the fix
for this is simply not to reset vma->vm_lock_seq inside vm_area_free().
I'll also need to add a comment for vm_lock_seq explaining these
requirements. Do you agree that such a change would resolve the issue?

> > vma_mark_detached() and before down_read_trylock(&vma->vm_lock.lock)
> > in vma_start_read() the VMA write-lock should have been released by
> > mmap_write_unlock() and therefore vma->detached=false should be
> > visible to the reader when it executed lock_vma_under_rcu().
> >
> >> is_vma_detached() - false negative as the write above didn't propagate
> >> here yet; a read barrier, but where is the write barrier?
> >> checks for vma->vm_mm, vm_start, vm_end - nobody reset them yet, so
> >> false positive, or they got reset on reallocation but the writes
> >> didn't propagate
> >>
> >> Am I missing something that would prevent lock_vma_under_rcu() falsely
> >> succeeding here?
> >>
>