From: Suren Baghdasaryan
Date: Thu, 12 Dec 2024 06:19:51 -0800
Subject: Re: [PATCH 3/4] mm: replace rw_semaphore with atomic_t in vma_lock
To: Peter Zijlstra
Cc: Matthew Wilcox, akpm@linux-foundation.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com, minchan@google.com, jannh@google.com, shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com
References: <20241111205506.3404479-1-surenb@google.com> <20241111205506.3404479-4-surenb@google.com> <20241210223850.GA2484@noisy.programming.kicks-ass.net> <20241211082541.GQ21636@noisy.programming.kicks-ass.net> <20241212091659.GU21636@noisy.programming.kicks-ass.net>

On Thu, Dec 12, 2024 at
6:17 AM Suren Baghdasaryan wrote:
>
> On Thu, Dec 12, 2024 at 1:17 AM Peter Zijlstra wrote:
> >
> > On Wed, Dec 11, 2024 at 07:01:16PM -0800, Suren Baghdasaryan wrote:
> >
> > > > > > I think your proposal should work. Let me try to code it and see if
> > > > > > something breaks.
> > >
> > > Ok, I tried it out and things are a bit more complex:
> > > 1. We should allow write-locking a detached VMA, IOW vma_start_write()
> > > can be called when vm_refcnt is 0.
> >
> > This sounds dodgy, refcnt being zero basically means the object is dead
> > and you shouldn't be touching it no more. Where does this happen and
> > why?
> >
> > Notably, it being 0 means it is no longer in the mas tree and can't be
> > found anymore.
>
> It happens when a newly created vma that was not yet attached
> (vma->vm_refcnt = 0) is write-locked before being added into the vma
> tree. For example:
> mmap()
>   mmap_write_lock()
>   vma = vm_area_alloc() // vma->vm_refcnt = 0 (detached)
>   //vma attributes are initialized
>   vma_start_write() // write 0x8000 0001 into vma->vm_refcnt
>   mas_store_gfp()
>   vma_mark_attached()
>   mmap_write_lock() // vma_end_write_all()

s/mmap_write_lock()/mmap_write_unlock()

>
> In this scenario, we write-lock the VMA before adding it into the tree
> to prevent readers (pagefaults) from using it until we drop the
> mmap_write_lock(). In your proposal, the first thing vma_start_write()
> does is add(0x8000'0001) and that will trigger a warning.
> For now instead of add(0x8000'0001) I can play this game to avoid the warning:
>
> if (refcount_inc_not_zero(&vma->vm_refcnt))
>     refcount_add(0x80000000, &vma->vm_refcnt);
> else
>     refcount_set(&vma->vm_refcnt, 0x80000001);
>
> this refcount_set() works because vma with vm_refcnt==0 could not be
> found by readers. I'm not sure this will still work when we add
> TYPESAFE_BY_RCU and introduce vma reuse possibility.
>
> >
> > > 2. Adding 0x80000000 saturates refcnt, so I have to use a lower bit
> > > 0x40000000 to denote writers.
> >
> > I'm confused, what? We're talking about atomic_t, right?
>
> I thought you suggested using refcount_t. According to
> https://elixir.bootlin.com/linux/v6.13-rc2/source/include/linux/refcount.h#L22
> valid values would be [0..0x7fff_ffff] and 0x80000000 is outside of
> that range. What am I missing?
>
> >
> > > 3. Currently vma_mark_attached() can be called on an already attached
> > > VMA. With vma->detached being a separate attribute that works fine but
> > > when we combine it with the vm_lock things break (extra attach would
> > > leak into lock count). I'll see if I can catch all the cases when we
> > > do this and clean them up (not call vma_mark_attached() when not
> > > necessary).
> >
> > Right, I hadn't looked at that thing in detail, that sounds like it
> > needs a wee cleanup like you suggest.
>
> Yes, I'll embark on that today. Will see how much of a problem that is.
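
For anyone following the arithmetic in points 1 and 2, here is a minimal userspace sketch of it. This is not kernel code and not the patch itself: struct vma_model, VMA_WRITER_BIT and the model_*() helpers are hypothetical names, C11 atomics stand in for refcount_t, and the writer bit is assumed to be 0x40000000 so that the counter stays inside the [0..0x7fff_ffff] range quoted above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical writer flag; 0x40000000 keeps the sum below 0x80000000. */
#define VMA_WRITER_BIT 0x40000000u
/* Upper bound of the valid refcount range quoted in the thread. */
#define MODEL_REFCOUNT_MAX 0x7fffffffu

/* Hypothetical model: 0 = detached, 1 = attached, +1 per reader. */
struct vma_model {
	atomic_uint vm_refcnt;
};

/* True if v lies in the valid range [0..0x7fff_ffff]. */
static inline bool model_in_refcount_range(unsigned int v)
{
	return v <= MODEL_REFCOUNT_MAX;
}

/* Take a reference unless the count is zero (the inc_not_zero contract). */
static inline bool model_inc_not_zero(atomic_uint *r)
{
	unsigned int old = atomic_load(r);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(r, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Model of the start-write step: an attached VMA gets one reference plus
 * the writer bit; a detached one (count 0) is set directly, which is only
 * safe while no reader can find it.
 */
static inline void model_start_write(struct vma_model *vma)
{
	if (model_inc_not_zero(&vma->vm_refcnt))
		atomic_fetch_add(&vma->vm_refcnt, VMA_WRITER_BIT);
	else
		atomic_store(&vma->vm_refcnt, VMA_WRITER_BIT + 1);

	assert(model_in_refcount_range(atomic_load(&vma->vm_refcnt)));
}
```

With these assumptions a detached VMA ends up at 0x40000001 and an attached one with no readers at 0x40000002, both inside the valid range, whereas 0x80000001 would fall outside it.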