Date: Sun, 12 Jan 2025 02:59:35 +0000
From: Wei Yang
To: Suren Baghdasaryan
Cc: akpm@linux-foundation.org, peterz@infradead.org, willy@infradead.org,
 liam.howlett@oracle.com, lorenzo.stoakes@oracle.com,
 david.laight.linux@gmail.com, mhocko@suse.com, vbabka@suse.cz,
 hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
 mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
 oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
 dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
 lokeshgidra@google.com, minchan@google.com, jannh@google.com,
 shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
 klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
 linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 kernel-team@android.com
Subject: Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count
Message-ID: <20250112025935.7mxi3klm5ijkb73m@master>
Reply-To: Wei Yang
References: <20250111042604.3230628-1-surenb@google.com>
 <20250111042604.3230628-12-surenb@google.com>
In-Reply-To: <20250111042604.3230628-12-surenb@google.com>

On Fri, Jan 10, 2025 at 08:25:58PM -0800, Suren Baghdasaryan wrote:
>rw_semaphore is a sizable structure of 40 bytes and consumes
>considerable space for each vm_area_struct. However vma_lock has
>two important specifics which can be used to replace rw_semaphore
>with a simpler structure:
>1. Readers never wait. They try to take the vma_lock and fall back to
>mmap_lock if that fails.
>2. Only one writer at a time will ever try to write-lock a vma_lock
>because writers first take mmap_lock in write mode.
>Because of these requirements, full rw_semaphore functionality is not
>needed and we can replace rw_semaphore and the vma->detached flag with
>a refcount (vm_refcnt).

This paragraph is merged into the above one in the commit log, which may not
be what you expect.

Just a format issue, not sure why they are not separated.

>When vma is in detached state, vm_refcnt is 0 and only a call to
>vma_mark_attached() can take it out of this state. Note that unlike
>before, now we enforce both vma_mark_attached() and vma_mark_detached()
>to be done only after vma has been write-locked.
>vma_mark_attached()
>changes vm_refcnt to 1 to indicate that it has been attached to the vma
>tree. When a reader takes read lock, it increments vm_refcnt, unless the
>top usable bit of vm_refcnt (0x40000000) is set, indicating presence of
>a writer. When writer takes write lock, it sets the top usable bit to
>indicate its presence. If there are readers, writer will wait using newly
>introduced mm->vma_writer_wait. Since all writers take mmap_lock in write
>mode first, there can be only one writer at a time. The last reader to
>release the lock will signal the writer to wake up.
>refcount might overflow if there are many competing readers, in which case
>read-locking will fail. Readers are expected to handle such failures.
>In summary:
>1. all readers increment the vm_refcnt;
>2. writer sets top usable (writer) bit of vm_refcnt;
>3. readers cannot increment the vm_refcnt if the writer bit is set;
>4. in the presence of readers, writer must wait for the vm_refcnt to drop
>to 1 (ignoring the writer bit), indicating an attached vma with no readers;

It waits until vm_refcnt drops to (VMA_LOCK_OFFSET + 1), as indicated in
__vma_start_write(), if I am right.

>5. vm_refcnt overflow is handled by the readers.
>
>While this vm_lock replacement does not yet result in a smaller
>vm_area_struct (it stays at 256 bytes due to cacheline alignment), it
>allows for further size optimization by structure member regrouping
>to bring the size of vm_area_struct below 192 bytes.
>

-- 
Wei Yang
Help you, Help me
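For reference, the counting rules summarized in the quoted commit log can be
sketched in userspace C11. This is only an illustration of the arithmetic
(in particular why the writer's wait target is VMA_LOCK_OFFSET + 1), not the
kernel code: struct vma_sketch, SKETCH_REF_LIMIT and every sketch_*() helper
are hypothetical names made up here, the exact overflow limit is an
assumption, and the actual sleeping on mm->vma_writer_wait is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Writer bit: the "top usable bit" value quoted in the commit log. */
#define VMA_LOCK_OFFSET  0x40000000
/* Hypothetical cap so reader increments can never reach the writer bit. */
#define SKETCH_REF_LIMIT (VMA_LOCK_OFFSET - 1)

/* Hypothetical stand-in for the vm_refcnt field of a vma. */
struct vma_sketch {
	atomic_int vm_refcnt;	/* 0: detached, 1: attached, n > 1: n - 1 readers */
};

/* Detached (0) -> attached (1). The patch requires the write lock here. */
static void sketch_mark_attached(struct vma_sketch *v)
{
	atomic_store(&v->vm_refcnt, 1);
}

/*
 * Readers never wait: fail if the vma is detached, if the writer bit is
 * set, or if the count would overflow. The caller falls back to mmap_lock.
 */
static bool sketch_read_trylock(struct vma_sketch *v)
{
	int old = atomic_load(&v->vm_refcnt);

	do {
		if (old == 0 || old >= SKETCH_REF_LIMIT)
			return false;
	} while (!atomic_compare_exchange_weak(&v->vm_refcnt, &old, old + 1));
	return true;
}

/* Drop a read reference; returns true if a waiting writer should be woken. */
static bool sketch_read_unlock(struct vma_sketch *v)
{
	int old = atomic_fetch_sub(&v->vm_refcnt, 1);

	assert((old & ~VMA_LOCK_OFFSET) > 1);	/* caller must hold a read lock */
	return old - 1 == VMA_LOCK_OFFSET + 1;
}

/* Writer announces itself by adding the writer bit to the count. */
static void sketch_write_begin(struct vma_sketch *v)
{
	atomic_fetch_add(&v->vm_refcnt, VMA_LOCK_OFFSET);
}

/* Wait condition for the writer: writer bit + attached + no readers. */
static bool sketch_writer_may_proceed(struct vma_sketch *v)
{
	return atomic_load(&v->vm_refcnt) == VMA_LOCK_OFFSET + 1;
}

/* Writer removes its bit again, leaving an attached vma at 1. */
static void sketch_write_end(struct vma_sketch *v)
{
	atomic_fetch_sub(&v->vm_refcnt, VMA_LOCK_OFFSET);
}
```

With one reader holding the lock on an attached vma, the count is 2. Once the
writer adds its bit it becomes VMA_LOCK_OFFSET + 2, and the writer can only
proceed after that reader drops it to VMA_LOCK_OFFSET + 1, which is "1,
ignoring the writer bit" in the wording of the commit log.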