From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 13 Jan 2025 00:59:37 +0000
From: Wei Yang
To: Suren Baghdasaryan
Cc: Wei Yang , akpm@linux-foundation.org, peterz@infradead.org,
	willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com,
	mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org,
	mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
	brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com,
	hughd@google.com, lokeshgidra@google.com, minchan@google.com,
	jannh@google.com, shakeel.butt@linux.dev, souravpanda@google.com,
	pasha.tatashin@soleen.com, klarasmodin@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a
	reference count
Message-ID: <20250113005937.b5xp67uksxyf3g7b@master>
Reply-To: Wei Yang
References: <20250111042604.3230628-1-surenb@google.com>
	<20250111042604.3230628-12-surenb@google.com>
	<20250112025935.7mxi3klm5ijkb73m@master>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: NeoMutt/20170113 (1.7.2)
Sender: owner-linux-mm@kvack.org
Precedence: bulk

On Sun, Jan 12, 2025 at 09:35:25AM -0800, Suren Baghdasaryan wrote:
>On Sat, Jan 11, 2025 at 6:59 PM Wei Yang wrote:
>>
>> On Fri, Jan 10, 2025 at 08:25:58PM -0800, Suren Baghdasaryan wrote:
>> >rw_semaphore is a sizable structure of 40 bytes and consumes
>> >considerable space for each vm_area_struct. However vma_lock has
>> >two important specifics which can be used to replace rw_semaphore
>> >with a simpler structure:
>> >1. Readers never wait. They try to take the vma_lock and fall back to
>> >mmap_lock if that fails.
>> >2. Only one writer at a time will ever try to write-lock a vma_lock
>> >because writers first take mmap_lock in write mode.
>> >Because of these requirements, full rw_semaphore functionality is not
>> >needed and we can replace rw_semaphore and the vma->detached flag with
>> >a refcount (vm_refcnt).
>>
>> This paragraph is merged into the above one in the commit log, which may
>> not be what you expect.
>>
>> Just a format issue, not sure why they are not separated.
>
>I'll double-check the formatting. Thanks!
>
>>
>> >When vma is in detached state, vm_refcnt is 0 and only a call to
>> >vma_mark_attached() can take it out of this state. Note that unlike
>> >before, now we enforce both vma_mark_attached() and vma_mark_detached()
>> >to be done only after vma has been write-locked. vma_mark_attached()
>> >changes vm_refcnt to 1 to indicate that it has been attached to the vma
>> >tree. When a reader takes read lock, it increments vm_refcnt, unless the
>> >top usable bit of vm_refcnt (0x40000000) is set, indicating presence of
>> >a writer. When writer takes write lock, it sets the top usable bit to
>> >indicate its presence. If there are readers, writer will wait using newly
>> >introduced mm->vma_writer_wait. Since all writers take mmap_lock in write
>> >mode first, there can be only one writer at a time. The last reader to
>> >release the lock will signal the writer to wake up.
>> >refcount might overflow if there are many competing readers, in which case
>> >read-locking will fail. Readers are expected to handle such failures.
>> >In summary:
>> >1. all readers increment the vm_refcnt;
>> >2. writer sets top usable (writer) bit of vm_refcnt;
>> >3. readers cannot increment the vm_refcnt if the writer bit is set;
>> >4. in the presence of readers, writer must wait for the vm_refcnt to drop
>> >to 1 (ignoring the writer bit), indicating an attached vma with no readers;
>>
>> It waits until it drops to (VMA_LOCK_OFFSET + 1), as indicated in
>> __vma_start_write(), if I am right.
>
>Yeah, that's why I mentioned "(ignoring the writer bit)" but maybe
>that's too confusing. How about "drop to 1 (plus the VMA_LOCK_OFFSET
>writer bit)"?
>

Hmm.. hard to say. It is a little confusing, but I don't have a better one :-(

>>
>> >5. vm_refcnt overflow is handled by the readers.
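
To make the (VMA_LOCK_OFFSET + 1) wait target concrete, the five rules
above can be modelled in userspace with C11 atomics. This is only a sketch
under stated assumptions, not the code from the patch: struct vma_model and
all model_*() helpers are hypothetical names, VMA_REF_LIMIT as the reader
cap is an assumption, the value 0x40000000 for VMA_LOCK_OFFSET is taken
from the commit log, and the real writer sleeps on mm->vma_writer_wait
instead of polling a predicate. How the writer later clears its bit is not
described in the quoted text, so model_write_end() is a guess as well.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define VMA_LOCK_OFFSET 0x40000000u           /* writer bit (value from the commit log) */
#define VMA_REF_LIMIT   (VMA_LOCK_OFFSET - 1) /* assumption: cap on reader references */

/* Hypothetical stand-in for the vm_refcnt field; 0 means detached. */
struct vma_model { atomic_uint vm_refcnt; };

/* Rules 1, 3, 5: increment unless detached, writer bit set, or at the cap. */
static bool model_read_trylock(struct vma_model *v)
{
    unsigned int old = atomic_load(&v->vm_refcnt);
    do {
        if (old == 0 || (old & VMA_LOCK_OFFSET) || old >= VMA_REF_LIMIT)
            return false;                     /* caller falls back to mmap_lock */
    } while (!atomic_compare_exchange_weak(&v->vm_refcnt, &old, old + 1));
    return true;
}

/* Drop a reader reference; true means "last reader, wake the writer". */
static bool model_read_unlock(struct vma_model *v)
{
    unsigned int now = atomic_fetch_sub(&v->vm_refcnt, 1) - 1;
    return now == VMA_LOCK_OFFSET + 1;
}

/* Rule 2: the single writer announces itself by setting the writer bit. */
static void model_write_begin(struct vma_model *v)
{
    atomic_fetch_add(&v->vm_refcnt, VMA_LOCK_OFFSET);
}

/* Rule 4: writer may proceed once only "attached + writer bit" remains. */
static bool model_write_may_proceed(struct vma_model *v)
{
    return atomic_load(&v->vm_refcnt) == VMA_LOCK_OFFSET + 1;
}

/* Assumption: the writer clears its bit when it is done. */
static void model_write_end(struct vma_model *v)
{
    atomic_fetch_sub(&v->vm_refcnt, VMA_LOCK_OFFSET);
}

/* Single-threaded walk through the states; returns true if all checks hold. */
bool model_demo(void)
{
    struct vma_model v;
    bool ok = true;

    atomic_init(&v.vm_refcnt, 0);
    ok &= !model_read_trylock(&v);        /* detached: readers fail */
    atomic_store(&v.vm_refcnt, 1);        /* attached, no readers */
    ok &= model_read_trylock(&v);         /* refcnt == 2 */
    model_write_begin(&v);                /* refcnt == VMA_LOCK_OFFSET + 2 */
    ok &= !model_read_trylock(&v);        /* writer bit blocks new readers */
    ok &= !model_write_may_proceed(&v);   /* one reader still inside */
    ok &= model_read_unlock(&v);          /* last reader signals the writer */
    ok &= model_write_may_proceed(&v);    /* refcnt == VMA_LOCK_OFFSET + 1 */
    model_write_end(&v);                  /* back to 1: attached, unlocked */
    ok &= model_read_trylock(&v);         /* readers work again */
    return ok;
}
```

Calling model_demo() from a small main() should return true. The point of
the model is that "1 (ignoring the writer bit)" and "VMA_LOCK_OFFSET + 1"
describe the same counter value, seen with and without the writer bit.
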
>> >
>> >While this vm_lock replacement does not yet result in a smaller
>> >vm_area_struct (it stays at 256 bytes due to cacheline alignment), it
>> >allows for further size optimization by structure member regrouping
>> >to bring the size of vm_area_struct below 192 bytes.
>> >
>> --
>> Wei Yang
>> Help you, Help me

-- 
Wei Yang
Help you, Help me