From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 54505C04A94 for ; Mon, 31 Jul 2023 18:03:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E6EEA280093; Mon, 31 Jul 2023 14:03:14 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E1F3428007A; Mon, 31 Jul 2023 14:03:14 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CE702280093; Mon, 31 Jul 2023 14:03:14 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id C230628007A for ; Mon, 31 Jul 2023 14:03:14 -0400 (EDT) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 81237120237 for ; Mon, 31 Jul 2023 18:03:14 +0000 (UTC) X-FDA: 81072678708.02.889C9FC Received: from mail-lf1-f54.google.com (mail-lf1-f54.google.com [209.85.167.54]) by imf12.hostedemail.com (Postfix) with ESMTP id BE91F40078 for ; Mon, 31 Jul 2023 18:02:56 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=google header.b=SBKNBWiB; spf=pass (imf12.hostedemail.com: domain of torvalds@linuxfoundation.org designates 209.85.167.54 as permitted sender) smtp.mailfrom=torvalds@linuxfoundation.org; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1690826576; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=sbvUvGljtEHmYL7qi+HXkrNPoFOxFR5BvYu2TjHTwr0=; b=ez77H8gE5SKmxiqtvnopw2r4BsXBScjoXpxKAQBXhBss9vATPAp2uEjX9LFrOQCTNmtu9s 2Q5j+ny9ypAMdGDr1DpnW2ryDF9aUy/2JkW8jur8RZ+kWFAeFTc9/2/QUc96tjT7yDQm8h BnMvl/ebvKz+lPxg03AEHBHl7lBU3Uw= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=google header.b=SBKNBWiB; spf=pass (imf12.hostedemail.com: domain of torvalds@linuxfoundation.org designates 209.85.167.54 as permitted sender) smtp.mailfrom=torvalds@linuxfoundation.org; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1690826576; a=rsa-sha256; cv=none; b=0hLt097WasdAomVVHWuE29pbN9xnAQ4LNNMAPTZ9//1tVE1xyZGJzte6yj6y+kNDttX3Pl zHMyXQZLKTZpIvvHYUTKT/OrVWtEpR6ixBzki6lpxK3EMqXURftAqviR3EwXfJNpdUrgAp FpZCqpa4dL4aL5sDFBEv1XjZOuUma+k= Received: by mail-lf1-f54.google.com with SMTP id 2adb3069b0e04-4fe41f7c7ffso197471e87.1 for ; Mon, 31 Jul 2023 11:02:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; t=1690826575; x=1691431375; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=sbvUvGljtEHmYL7qi+HXkrNPoFOxFR5BvYu2TjHTwr0=; b=SBKNBWiBFxUs7uuCWTsicxwpGCcvM4Kef0rRGrJ5UvaO5lXvG/bc8AvvhnlKBVRzYT X5Xqtndb5rdHGVAr1rvsjK1DdwHu3oXiODm5tNpOUfPsBIQxLQ1DWCh+uOKfy0hV9WMH svztTReWCclA2IGFAZYqnaQYxB4fGKTPjmbF8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690826575; x=1691431375; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=sbvUvGljtEHmYL7qi+HXkrNPoFOxFR5BvYu2TjHTwr0=; b=iC68bikpXf2OSxexnJ1TkRhFdoxKBUoaN9xcbAe7iCtsVW+aukpQiXz9Tuj6svbySa eOfh/zk9pHINNARHS5Pwduq7VZGsfcWaDgWtLjlecZX6pTQ2aAFMDYayzMoihSKuXq2p hVJhEXuhC5H8N9iFFNzPtRX0U+sJ9GihuZjTLbRezvvfinMCWP7dY2eSN4Sc6f3IRXmw sdzLXq+azdbbUayCcE2P4LZNH45d5TrRvsOmw2D7+3+kVkcOsY1XYXcqd8RjLY7k9/Qw 1f93CvxIDKlG2kviogK0BYwg2Bpyg3fPwKrRl9YEPjU0mMMx/Rahf2D6V7n/b3AsrewV vpkg== X-Gm-Message-State: ABy/qLYLdXRVB7zl86L5K3WwnIBlysR0249pcGQJu6CogEX2sC6eE7kn s9Noqe92cbSPbERDHlHXRB0ghQF8eSEaG8DV3LxwhA== X-Google-Smtp-Source: APBJJlGTKAWp4r1gsFpVV1W6oed2oWaalmhxmySSf2P76GrkQG3dhyBfqYULaA98WeBTRLF1bBXgsw== X-Received: by 2002:a05:6512:b93:b0:4fd:e113:f5fa with SMTP id b19-20020a0565120b9300b004fde113f5famr469567lfv.7.1690826574744; Mon, 31 Jul 2023 11:02:54 -0700 (PDT) Received: from mail-lf1-f51.google.com (mail-lf1-f51.google.com. [209.85.167.51]) by smtp.gmail.com with ESMTPSA id t28-20020ac243bc000000b004fe31fa2490sm801198lfl.247.2023.07.31.11.02.53 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 31 Jul 2023 11:02:53 -0700 (PDT) Received: by mail-lf1-f51.google.com with SMTP id 2adb3069b0e04-4fe11652b64so7966756e87.0 for ; Mon, 31 Jul 2023 11:02:53 -0700 (PDT) X-Received: by 2002:ac2:4e07:0:b0:4fe:ec5:2698 with SMTP id e7-20020ac24e07000000b004fe0ec52698mr498383lfr.50.1690826573309; Mon, 31 Jul 2023 11:02:53 -0700 (PDT) MIME-Version: 1.0 References: <20230731171233.1098105-1-surenb@google.com> <20230731171233.1098105-2-surenb@google.com> In-Reply-To: <20230731171233.1098105-2-surenb@google.com> From: Linus Torvalds Date: Mon, 31 Jul 2023 11:02:36 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH 1/6] mm: enable page walking API to lock vmas during the walk To: Suren Baghdasaryan Cc: akpm@linux-foundation.org, jannh@google.com, willy@infradead.org, liam.howlett@oracle.com, david@redhat.com, peterx@redhat.com, ldufour@linux.ibm.com, vbabka@suse.cz, michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, hannes@cmpxchg.org, dave@stgolabs.net, hughd@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, stable@vger.kernel.org Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: BE91F40078 X-Rspam-User: X-Stat-Signature: y6uopyugj1dwe8s7h8excp8wgf7c9jqo X-Rspamd-Server: rspam01 X-HE-Tag: 1690826576-97411 X-HE-Meta: U2FsdGVkX1/7t92j/Pg52M/9Vo41O/3j1NQ4mx18fwr05shlONmPy0UOaAmcDXWrA7dDY/HRkdxKLXnnJfyA/zULxBj2BMOntvg89L1sCsncqAYdJdolwoqTswpOi7+OcarjZgGO/Dj7b4rYU1pJ21Z14bdNeMfHOoYyJwZJv3cpRArx83brHsiaR/zWl0CYy2INi3KpgLxu5GZc4Bjo7h/sglBQyi3qtcQhamNTpDEzeT85czoykgnz/pfqbFuiRyWPM233nabZneo5J3QTbuKZatle9/NXlYaTNufEUdzTYoWJVIiI5XrIr6op0w9AKqzxIR63I3iArxI7UxU486o63Kd/cW8QN3estxmGZiltpw0vUZR28t633Lc74qufpOnWxRVen2bOya9d+izRyoxm1BzqVBjaC0+0aysdb02/PgXu8pDtNMkL0znCsOuyZaMIVdfAI1ZHSNF72W7QtwoSSw56Q5q59+Z+gUXE1KDt1Cj4SNa+vxiEv+tT9hLjYrdRdweohJypTQu80FEFVfW4oAqf5PqU/c0CfExKzo004IR9HDBwh2duar0++STPRW6NXaF0cLexgmmEV5IlCCMhm8+1lsmyXfmOm87aM7c9Eby0yyynhxCDyhsOfVOZ/x0NoTWf8aOfsAKPblwV6Ps3UVkeSyxQnwpvFMofodJFvAUWxtKJzPQcGL3u9SyxQ1CJ7YiXhhyQ0TyS1dKRnX9qhrC3rhQf4WrkLt2HCVsCxcJG2FLVXGoCy1o0Cu5RmttR3DnNojcNfoN3++YnSYGNVu14RCa/IllFgShcW+2M+bm3IPdnqFfsd0i4tsPvGaDgAWWxfa/YfnVI0XYC7N0hzz6CQ15SE5SwdEtvDEF6L0C2yKqZdp8YvqSmpUfDC9i0QVasYjNTcV3t1Qv8dJskn2aOM8+luxUdzmipJQ5MtbAYCiz1SSwnYF2vIj3/GSMtG4fInXYOY8UbJf6 8otTs16F E8bLTh0v9kCncOWmVcdhveK73PEAlOC/Nq8x/UugTzN3UN4P2KV3pO8bwh0LgyFBfeIWNOKrbHtb5PZTFnRyST6LXipNbWmz/gGZA4E6vRpVPeR1+SaJM52Ij0MDbuXwszWctRrKJey/J5btpbYlHLkwsDGPgox8pyk37psq3y52jAA5vDQGK4bJ7mdbPXf9Zq6ho4nTfNq95Uz+rGipEi+TNZFlz9aEVnvBl+UKuBvDDui3a1ELbI3Sph54Q0dj4ufwWMVFBesxqHThkEQncoFDIGjDDQaXRfICKqqMnr+8r7VQa+l5pBsudOYGlvLdYCezoeETJveYnKMyfx/glQ741xWubzF134g+c7BVKkRcIZcQJWGj9jpNzhCmTnyOAziPDY6q9eMnD9T1KbroQFpZXfHV2ldacyoZPhnC+KVecz5pSzSovjM+1B5oI4eCOMk8O8QvbsaVTZNHD3lOu9EfsvUg+LurrWrYOC7gX3CIA1bTEkC04af4mKohYEwHU1cJDiU3x1xZkLHM= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, 31 Jul 2023 at 10:12, Suren Baghdasaryan wrote: > > - walk_page_vma(vma, &subpage_walk_ops, NULL); > + walk_page_vma(vma, &subpage_walk_ops, true, NULL); Rather than add a new argument to the walk_page_*() functions, I really think you should just add the locking rule to the 'const struct mm_walk_ops' structure. The locking rule goes along with the rules for what you are actually doing, after all. Plus it would actually make it all much more legible when it's not just some random "true/false" argument, but a an actual .write_lock = 1 in the ops definition. Yes, yes, that might mean that some ops might need duplicating in case you really have a walk that sometimes takes the lock, and sometimes doesn't, but that is odd to begin with. The only such case I found from a quick look was the very strange queue_pages_range() case. Is it really true that do_mbind() needs the write-lock, but do_migrate_pages() does not? And if they really are that different maybe they should have different walk_ops? Maybe there were other cases that I didn't notice. > error = walk_page_range(current->mm, start, end, > - &prot_none_walk_ops, &new_pgprot); > + &prot_none_walk_ops, true, &new_pgprot); This looks odd. You're adding vma locking to a place that didn't do it before. Yes, the mmap semaphore is held for writing, but this particular walk doesn't need it as far as I can tell. In fact, this feels like that walker should maybe *verify* that it's held for writing, but not try to write it again? Maybe the "lock_vma" flag should be a tri-state: - lock for reading (no-op per vma), verify that the mmap sem is held for reading - lock for reading (no-op per vma), but with mmap sem held for writing (this kind of "check before doing changes" walker) - lock for writing (with mmap sem obviously needs to be held for writing) > mmap_assert_locked(walk.mm); > + if (lock_vma) > + vma_start_write(vma); So I think this should also be tightened up, and something like switch (ops->locking) { case WRLOCK: vma_start_write(vma); fallthrough; case WRLOCK_VERIFY: mmap_assert_write_locked(mm); break; case RDLOCK: mmap_assert_locked(walk.mm); } because we shouldn't have a 'vma_start_write()' without holding the mmap sem for *writing*, and the above would also allow that mprotect_fixup() "walk to see if we can merge, verify that it was already locked" thing. Hmm? NOTE! The above names are just completely made up. I dcon't think it should actually be some "WRLOCK" enum. There are probably much better names. Take the above as a "maybe something kind of in this direction" rather than "do it exactly like this". Linus