From: Suren Baghdasaryan
Date: Sun, 12 Jan 2025 09:35:25 -0800
Subject: Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count
To: Wei Yang
Cc: akpm@linux-foundation.org, peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com, lokeshgidra@google.com, minchan@google.com, jannh@google.com, shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com, klarasmodin@gmail.com, corbet@lwn.net, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com
References: <20250111042604.3230628-1-surenb@google.com> <20250111042604.3230628-12-surenb@google.com> <20250112025935.7mxi3klm5ijkb73m@master>
In-Reply-To: <20250112025935.7mxi3klm5ijkb73m@master>

On Sat, Jan 11, 2025 at 6:59 PM Wei Yang wrote:
>
> On Fri, Jan 10, 2025 at 08:25:58PM -0800, Suren Baghdasaryan wrote:
> >rw_semaphore is a sizable structure of 40 bytes and consumes
> >considerable space for each vm_area_struct. However vma_lock has
> >two important specifics which can be used to replace rw_semaphore
> >with a simpler structure:
> >1. Readers never wait. They try to take the vma_lock and fall back to
> >mmap_lock if that fails.
> >2. Only one writer at a time will ever try to write-lock a vma_lock
> >because writers first take mmap_lock in write mode.
> >Because of these requirements, full rw_semaphore functionality is not
> >needed and we can replace rw_semaphore and the vma->detached flag with
> >a refcount (vm_refcnt).
>
> This paragraph is merged into the above one in the commit log, which may not
> be what you expect.
>
> Just a format issue, not sure why they are not separated.

I'll double-check the formatting. Thanks!

>
> >When vma is in detached state, vm_refcnt is 0 and only a call to
> >vma_mark_attached() can take it out of this state. Note that unlike
> >before, now we enforce both vma_mark_attached() and vma_mark_detached()
> >to be done only after vma has been write-locked. vma_mark_attached()
> >changes vm_refcnt to 1 to indicate that it has been attached to the vma
> >tree. When a reader takes read lock, it increments vm_refcnt, unless the
> >top usable bit of vm_refcnt (0x40000000) is set, indicating presence of
> >a writer. When writer takes write lock, it sets the top usable bit to
> >indicate its presence. If there are readers, writer will wait using newly
> >introduced mm->vma_writer_wait. Since all writers take mmap_lock in write
> >mode first, there can be only one writer at a time. The last reader to
> >release the lock will signal the writer to wake up.
> >refcount might overflow if there are many competing readers, in which case
> >read-locking will fail. Readers are expected to handle such failures.
> >In summary:
> >1. all readers increment the vm_refcnt;
> >2. writer sets top usable (writer) bit of vm_refcnt;
> >3. readers cannot increment the vm_refcnt if the writer bit is set;
> >4. in the presence of readers, writer must wait for the vm_refcnt to drop
> >to 1 (ignoring the writer bit), indicating an attached vma with no readers;
>
> It waits until it drops to (VMA_LOCK_OFFSET + 1), as indicated in
> __vma_start_write(), if I am right.

Yeah, that's why I mentioned "(ignoring the writer bit)" but maybe
that's too confusing. How about "drop to 1 (plus the VMA_LOCK_OFFSET
writer bit)"?

>
> >5. vm_refcnt overflow is handled by the readers.
> >
> >While this vm_lock replacement does not yet result in a smaller
> >vm_area_struct (it stays at 256 bytes due to cacheline alignment), it
> >allows for further size optimization by structure member regrouping
> >to bring the size of vm_area_struct below 192 bytes.
>
>
> --
> Wei Yang
> Help you, Help me
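
For readers following the thread, the counting rules in the summary above
can be sketched in user-space C11. This is only an illustration, not the
kernel code: struct vma_sketch and every sketch_* function are
hypothetical names, the rcuwait-based sleeping on mm->vma_writer_wait is
reduced to a boolean check, and the point at which the writer bit is
cleared again (sketch_write_end()) is an assumption, since the message
does not describe it. Only the 0x40000000 value, the VMA_LOCK_OFFSET name
and the "0 = detached, 1 = attached" convention come from the discussion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Writer bit; the value is the one quoted in the commit message. */
#define VMA_LOCK_OFFSET 0x40000000u

/* Hypothetical stand-in for the vm_refcnt field of vm_area_struct. */
struct vma_sketch {
    atomic_uint vm_refcnt; /* 0 = detached, 1 = attached with no readers */
};

/* Attach: take the counter from 0 (detached) to 1 (attached). */
static void sketch_mark_attached(struct vma_sketch *vma)
{
    unsigned int old = atomic_exchange(&vma->vm_refcnt, 1u);
    assert(old == 0);
    (void)old;
}

/*
 * Reader: never waits. Fails if the vma is detached, if the writer bit is
 * set, or if one more reader would run the count into the writer bit
 * (a simplified form of the overflow handling). On failure the caller is
 * expected to fall back to mmap_lock.
 */
static bool sketch_read_trylock(struct vma_sketch *vma)
{
    unsigned int old = atomic_load(&vma->vm_refcnt);

    do {
        if (old == 0 || (old & VMA_LOCK_OFFSET) ||
            old + 1 >= VMA_LOCK_OFFSET)
            return false;
    } while (!atomic_compare_exchange_weak(&vma->vm_refcnt, &old, old + 1));
    return true;
}

/*
 * Reader release. Returns true when this was the last reader and a writer
 * is pending, i.e. the counter is now exactly VMA_LOCK_OFFSET + 1; the
 * real code would wake the writer at this point.
 */
static bool sketch_read_unlock(struct vma_sketch *vma)
{
    unsigned int now = atomic_fetch_sub(&vma->vm_refcnt, 1u) - 1;
    return now == VMA_LOCK_OFFSET + 1;
}

/* Writer (already serialized by mmap_lock): announce presence. */
static void sketch_write_begin(struct vma_sketch *vma)
{
    atomic_fetch_add(&vma->vm_refcnt, VMA_LOCK_OFFSET);
}

/* Writer may proceed once only the attach reference and its bit remain. */
static bool sketch_write_may_proceed(struct vma_sketch *vma)
{
    return atomic_load(&vma->vm_refcnt) == VMA_LOCK_OFFSET + 1;
}

/* Assumed release step: drop the writer bit so readers can enter again. */
static void sketch_write_end(struct vma_sketch *vma)
{
    atomic_fetch_sub(&vma->vm_refcnt, VMA_LOCK_OFFSET);
}
```

With one reader holding the lock, sketch_write_begin() leaves the counter
at VMA_LOCK_OFFSET + 2, so the writer has to wait; once that reader calls
sketch_read_unlock() the counter reaches VMA_LOCK_OFFSET + 1, which is the
value item 4 and the proposed rewording refer to.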