From: Suren Baghdasaryan
Date: Fri, 20 Oct 2023 07:09:28 -0700
Subject: Re: [PATCH v3 2/3] userfaultfd: UFFDIO_MOVE uABI
To: David Hildenbrand
Cc: Peter Xu, Lokesh Gidra, akpm@linux-foundation.org,
 viro@zeniv.linux.org.uk, brauner@kernel.org, shuah@kernel.org,
 aarcange@redhat.com, hughd@google.com, mhocko@suse.com,
 axelrasmussen@google.com, rppt@kernel.org, willy@infradead.org,
 Liam.Howlett@oracle.com, jannh@google.com, zhangpeng362@huawei.com,
 bgeffon@google.com, kaleshsingh@google.com, ngeoffray@google.com,
 jdduke@google.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 kernel-team@android.com
In-Reply-To: <81cf0943-e258-494c-812a-0c00b11cf807@redhat.com>

On Fri, Oct 20, 2023 at 3:02 AM David Hildenbrand wrote:
>
> On 19.10.23 21:53, Peter Xu wrote:
> > On Thu, Oct 19, 2023 at 05:41:01PM +0200, David Hildenbrand wrote:
> >> That's not my main point. It can easily become a maintenance burden without
> >> any real use cases yet that we are willing to support.
> >
> > That's why I requested a few times that we can discuss the complexity of
> > cross-mm support already here, and I'm all ears if I missed something on
> > the "maintenance burden" part..
> >
> > I started by listing what I think might be different, and we can easily
> > speedup single-mm with things like "if (ctx->mm != mm)" checks with
> > e.g. memcg, just like what this patch already did with pgtable depositions.
> >
> > We keep saying "maintenance burden" but we refuse to discuss what is that..
>
> Let's recap
>
> (1) We have person A up-streaming code written by person B, whereby B is
> not involved in the discussions nor seems to be active to maintain that
> code.
>
> Worse, the code that is getting up-streamed was originally based on a
> different kernel version that has significant differences in some key
> areas -- for example, page pinning, exclusive vs. shared.
>
> I claim that nobody here fully understands the code at hand (just look
> at the previous discussions), and reviewers have to sort out the mess
> that was created by the very way this stuff is getting upstreamed here.
>
> We're already struggling to get the single-mm case working correctly.
>
>
> (2) Cross-mm was not even announced anywhere nor mentioned which use it
> would have; I had to stumble over this while digging through the code.
> Further, is it even *tested*? AFAIKS in patch #3 no. Why do we have to
> make the life of reviewers harder by forcing them to review code that
> currently *nobody* on this earth needs?
>
>
> (3) You said "What else we can benefit from single mm? One less mmap
> read lock, but probably that's all we can get;" and I presented two
> non-obvious issues. I did not even look any further because I really
> have better things to do than review complicated code without real use
> cases at hand. As I said "maybe that works as expected, I
> don't know and I have no time to spare on reviewing features with no
> real use cases)"; apparently I was right by just guessing that memcg
> handling is missing.
>
>
> The sub-feature in question (cross-mm) has no solid use cases; at this
> point I am not even convinced the use case you raised requires
> *userfaultfd*; for the purpose of moving a whole VMA worth of pages
> between two processes; I don't see the immediate need to get userfaultfd
> involved and move individual pages under page lock etc.

You make a compelling case against cross-mm support. While I can't force
Andrea to participate in upstreaming nor do I have his background,
keeping it simple, as you requested, is doable. That's what I plan on
doing by splitting the patch and I think we all agreed to that. I'll
also see if I can easily add a separate patch to test cross-mm support.
I do apologize for the extra effort required from reviewers to cover for
the gaps in my patches. I'm doing my best to minimize that and I really
appreciate your time.

>
> >
> > I'll leave that to Suren and Lokesh to decide. For me the worst case is
> > one more flag which might be confusing, which is not the end of the world..
> > Suren, you may need to work more thoroughly to remove cross-mm implications
> > if so, just like when renaming REMAP to MOVE.
>
> I'm asking myself why you are pushing so hard to include complexity
> "just because we can"; doesn't make any sense to me, honestly.
>
> Maybe you have some other real use cases that ultimately require
> userfaultfd for cross-mm that you cannot share?
>
> Will the world end when we have to use a separate flag so we can open
> this pandora's box when really required?
>
>
> Again, moving anon pages within a process is a known thing; we do that
> already via mremap; the only difference here really is, that we have to
> get the rmap right because we don't adjust VMAs. It's a shame we don't
> try to combine both code paths, maybe it's not easily possible like we
> did with mprotect vs. uffd-wp.

That's a good point. With cross-mm support baked in, the overlap was
not obvious to me.
I'll see how much we can reuse from the mremap path.

>
> Moving anon pages between processes is currently only done via COW, where
> all things (page pinning, memcg, ...) have been figured out and are
> simply working as expected. Making uffd special by coding-up their own
> thing does not sound compelling to me.
>
>
> I am clearly against any unwarranted features+complexity. Again, I will
> stop arguing further, the whole thing of "include it just because we
> can" to avoid a flag (that we might never even see) doesn't make any
> sense to me and likely never will.
>
> The whole way this feature is getting upstreamed is just messed up IMHO
> and the reasoning used in this thread to stick
> as-close-as-possible to some code person B wrote some years ago (e.g.,
> naming, sub-features) is far out of my comprehension.

I don't think staying as-close-as-possible to the original version was
the way I was driving this so far. At least that was not my conscious
intention. I'm open to further suggestions whenever it makes sense to
deviate from it.
Thanks,
Suren.

>
> --
> Cheers,
>
> David / dhildenb
>