From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 18FD4C43334 for ; Wed, 15 Jun 2022 19:42:17 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 312326B0071; Wed, 15 Jun 2022 15:42:17 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2C0976B0072; Wed, 15 Jun 2022 15:42:17 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 163086B0074; Wed, 15 Jun 2022 15:42:17 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 073706B0071 for ; Wed, 15 Jun 2022 15:42:17 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id CBF08204A0 for ; Wed, 15 Jun 2022 19:42:16 +0000 (UTC) X-FDA: 79581491472.12.483132D Received: from mail-pj1-f50.google.com (mail-pj1-f50.google.com [209.85.216.50]) by imf22.hostedemail.com (Postfix) with ESMTP id 741C3C009D for ; Wed, 15 Jun 2022 19:42:16 +0000 (UTC) Received: by mail-pj1-f50.google.com with SMTP id g16-20020a17090a7d1000b001ea9f820449so3081260pjl.5 for ; Wed, 15 Jun 2022 12:42:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=E1FZhJxOu9sqrzI5vTqDD02jx8mqt52sVLiwvYYDFeA=; b=ZRVqtXbZUMpFGGk0z4E6a8OrjE3GZLI80Suftb6q+FxnID85rVTXe6PImchrt08pwC 5fmSfiAJUOR/jnFdju6hF2picIKm1rDeGPyV35qSJZo0U/HHcavlZ1LIlp0iBrgLboHp ESYRFKLPGFtB4Cyo4ys4qhlUyQ8IdDiP2+e/kMShsMdndU2XWbzpZhG2kEtju4yRUlVr VeQ3v1oBFlCbZCFp5YfVmcHOduLKFktM7R0xey3rc1rZgpxSuam3QHzJ5vABcKVYRfZO v4RwZSF0oYa6oeVEtWds5Or3L2orEn8U5o8twKp/AyvUth/vyi5ffZj0tJoVBQfXiSXp xEfA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=E1FZhJxOu9sqrzI5vTqDD02jx8mqt52sVLiwvYYDFeA=; b=vaj7wsJssiHDclw6JW+uSETEkx8VAsQCA8fTO5Re/7IeG1qAbtxIN24X/i3rtr/nhA 4kcrHQaFqtR2a+1BiI4kn+yZwk19IUwHDKzwS3ssImU/tRYPkYsYa29F4fvnPKo7J8UM xhfE75SeP1cmKNIawRtIslN+X67xrHOMkrvZhCudAp0pONd+6N8tW+Fj6xWKK2s8MDLz FWsPIKcd9Eg/WjXx5yZLm8R6Grfwm/cOXArbJE7XiWAYWI+YJdnWK1+UcxT8Ley1SYuj JlRD8D67O31FnXtHwX4K/iMTxfTchQH5vbnRzAVNI8EbNAe3HuvauITbl0LpRv4tgfS3 GUBQ== X-Gm-Message-State: AJIora+bY74kD3/weQUzqkrlWDRpEPE7r1+HomlcyCkb8Ra888G/Dq2W Y5njvSEarIm/sragkd0pbLw= X-Google-Smtp-Source: AGRyM1v47vUx3l54W3ZjoQR4e86vOa1V+edp2LNQ6ZKRj7xwYdn7vg4kD0Gpva9vMstyVHreb+MhIg== X-Received: by 2002:a17:902:ef94:b0:168:a730:4abd with SMTP id iz20-20020a170902ef9400b00168a7304abdmr1103843plb.152.1655322135036; Wed, 15 Jun 2022 12:42:15 -0700 (PDT) Received: from smtpclient.apple (c-24-6-216-183.hsd1.ca.comcast.net. [24.6.216.183]) by smtp.gmail.com with ESMTPSA id jc15-20020a17090325cf00b00168dadc7354sm11535plb.78.2022.06.15.12.42.13 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 15 Jun 2022 12:42:14 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.100.31\)) Subject: Re: [PATCH RFC] userfaultfd: introduce UFFDIO_COPY_MODE_YOUNG From: Nadav Amit In-Reply-To: Date: Wed, 15 Jun 2022 12:42:13 -0700 Cc: Mike Rapoport , John Hubbard , David Hildenbrand , Linux MM , Mike Kravetz , Hugh Dickins , Andrew Morton , Axel Rasmussen Content-Transfer-Encoding: quoted-printable Message-Id: References: <20220613204043.98432-1-namit@vmware.com> <3eea2e6e-1646-546a-d9ef-d30052c00c7d@redhat.com> <481fc9d0-6122-bf59-9d04-23c10d256764@nvidia.com> <0BB58ACF-2801-4622-BF3B-9913A23AE46C@gmail.com> To: Peter Xu X-Mailer: Apple Mail (2.3696.100.31) ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1655322136; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=E1FZhJxOu9sqrzI5vTqDD02jx8mqt52sVLiwvYYDFeA=; b=n3SqXC0J00cnYU7gpupV4ssW9GvWO/xLOpYjVsYxVgTMieSrQY8Ed2vH3qtZ/LzPm80CNI bdIOQysc+y3FYFBGu7YPUDkhoVli1lIkqNwYyF5YOw2ZEzHg4SbMdJkmiR26lP83k5h3IV Un4M7cxn8RDLNBYTUgbAqCx1lARS+WQ= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=ZRVqtXbZ; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf22.hostedemail.com: domain of nadav.amit@gmail.com designates 209.85.216.50 as permitted sender) smtp.mailfrom=nadav.amit@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1655322136; a=rsa-sha256; cv=none; b=E3DqjrBcJao4/Ajx/UZ/mzsQNNje8fJLkHRUytYMma0XmfgdRGCGhQUjzyPe+umox0RFoI Y1hUis1G5UXSzjafzCTh/5nrxRwalfVHmfp+kxbZ3yPu0ahqCHATqUwSlZjJpcFTJ4S8G9 RDwP7TQETMwrhoSOz6i1P54jvd39U48= X-Rspamd-Queue-Id: 741C3C009D X-Rspam-User: X-Stat-Signature: 5jbyk4bktsa4i6bch68g7nczyf71m3c7 Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=ZRVqtXbZ; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf22.hostedemail.com: domain of nadav.amit@gmail.com designates 209.85.216.50 as permitted sender) smtp.mailfrom=nadav.amit@gmail.com X-Rspamd-Server: rspam04 X-HE-Tag: 1655322136-691692 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Jun 15, 2022, at 11:39 AM, Peter Xu wrote: > On Wed, Jun 15, 2022 at 09:58:27AM -0700, Nadav Amit wrote: >> On Jun 15, 2022, at 8:43 AM, Peter Xu wrote: >>=20 >>> On Wed, Jun 15, 2022 at 10:26:21AM +0300, Mike Rapoport wrote: >>>> On Tue, Jun 14, 2022 at 01:56:56PM -0700, Nadav Amit wrote: >>>>> On Jun 14, 2022, at 1:40 PM, John Hubbard = wrote: >>>>>=20 >>>>>> On 6/14/22 11:56, Mike Rapoport wrote: >>>>>>>> But, I cannot take it anymore: the list of arguments for uffd = stuff is >>>>>>>> crazy. I would like to collect all the possible arguments that = are used for >>>>>>>> uffd operation into some =E2=80=9Cstruct uffd_op=E2=80=9D. >>>>>>> Squashing boolean parameters into int flags will also reduce the = insane >>>>>>> amount of parameters. No strong feelings though. >>>>>>=20 >>>>>> Just a quick drive-by comment about boolean arguments: they ruin = the >>>>>> readability of the call sites. In practice, sometimes a single = boolean >>>>>> argument can be OK-ish (still poor to read at the call site, but = easier >>>>>> to code initially), but once you get past one boolean argument in = the >>>>>> function, readability is hopeless: >>>>>>=20 >>>>>> foo(ptr, true, false, a =3D=3D b); >>>>>>=20 >>>>>> So if you have a choice, I implore you to prefer flags and/or = enums. :) >>>>>=20 >>>>> Thanks for the feedback - I am aware it is very confusing to have = booleans >>>>> and especially multiple ones in a func call. >>>>>=20 >>>>> Just not sure how it maps to what I proposed. I thought of passing = as an >>>>> argument reference (pointer) to something similar to the following = struct, >>>>> which I think is very self-descriptive: >>>>>=20 >>>>> struct uffd_op { >>>>> /* various fields */ >>>>> struct vm_area_struct *dst_vma; >>>>> unsigned long len; >>>>> atomic_t *mmap_changing; >>>>>=20 >>>>> ... >>>>> =09 >>>>> /* ... and some flags */ >>>>> int wp: 1; >>>>> int zero: 1; >>>>> int read_likely: 1; >>>>>=20 >>>>> ... >>>>> }; >>>>>=20 >>>>> I think that fits what you were asking for. The only thing I am = not sure of, >>>>> is whether to include in uffd_op fields that are internal to = mm/userfaultfd >>>>> such as =E2=80=9Cpage=E2=80=9D and =E2=80=9Cnewly_allocated=E2=80=9D= . I guess not. >>>>=20 >>>> mfill_atomic_install_pte() is called by shmem_mfill_atomic_pte() so = it's >>>> not entirely internal to mm/userfaultfd.c. >>>>=20 >>>> Another thing is that with all the parameters packed into a struct, = the >>>> call sites could become really hairy, so maybe the best way would = be to >>>> pack some of the parameters and leave the others. >>>>=20 >>>> But you'll never know until you try :) >>>=20 >>> Yeh. Axel packed some booleans in f619147104c8e into = mcopy_atomic_mode. >>> The other option (besides uffd_ops) could be making = mcopy_atomic_mode a >>> bitmask and keep the rest, the mode itself only took 2 bits. >>>=20 >>> uffd_ops sounds good too if the final outcome looks clean, since we = do pass >>> quite a few things over and over deep into the stack. >>=20 >> Thanks. >>=20 >> I see 3 options: >>=20 >> 1. Pack only fs/mm flags: WP, read-likely, write-likely. >> 2. (1) + as part of the flags internally include Axel=E2=80=99s = copy_atomic_mode. >> 3. The uffd_op approach: include all relevant fields. >>=20 >> For the time being I=E2=80=99m going with (1) since I do not have too = much time to >> finish all of that and upstream the rest of my work (Broadcom is = knocking). >>=20 >> (3) also has the downside of stack-protector that would be added due = to >> stack-protector strong, which is not-that-bad, but I hate it. >=20 > Any short (but further) explanations? Sure, but it might not be short. So one time Mathew Wilcox tried to convert some function arguments into = a struct that would hold the arguments and transfer it as a single = argument, and he found out that the binary size actually increased. Since I like = to waste my time, I analyzed why. IIRC, there were two main reasons. The first one is that the kernel by default uses the =E2=80=9Cstrong=E2=80= =9D stack-protector. It means that not all functions would have a stack protector, and actually only quite few would. One main reason that you = have a stack protector is that you provide dereference a local variable. So = if you have a local struct that hold the arguments - you get a stack = protector, and it does introduce slight overhead (~10ns IIRC). There may be some = pragma to prevent the stack protector, but clearly I will be shot if I used it. The second reason is that the compiler either reloads data from the = struct you use to hold the arguments or might spill it to the stack if you try = to cache it. Consider the following two scenarios: A. You access an argument multiple times: local1 =3D args->arg1; another_fn(); // Or some store to the heap local2 =3D args->arg1; // You use local1 and local2 In this case the compiler would reload args->arg1 from memory, even if = there is a register that holds the value. The compiler is concerned that = another_fn() might have overwritten args->arg1 or - in the case of a store - that the = value was overwritten. The reload might prevent further optimizations of the = compiler. B. You cache the argument locally (aka, you decided to be =E2=80=9Csmart=E2= =80=9D): arg1 =3D args->arg1; local1 =3D arg1; another_fn(); local2 =3D arg1; You may think that this prevents the reload. But this might even be = worse. The compiler might run out of registers spill arg1 to the stack and then access it from the stack. Or it might need to spill something else, or shuffle registers around. So what can you do? You can mark another_fn() as pure if it is so, which does help in very certain cases. There are various limitations though. = IIRC, gcc (or is it clang?) ignores it for inline functions. So if you have an inline function which does some write that you don=E2=80=99t care about = you cannot force the compiler to ignore it. Note that at least gcc (IIRC) regards inline assembly as something that = might write to arbitrary memory address. So having BUG_ON() would require a = reload of the argument from the struct. You may say: =E2=80=9Cwhat about the restrict keyword?=E2=80=9D Well, = this is nice in theory, but it does not always help and the implementation of gcc and = clang are not even the same in this regard. IIRC, in one of them (I forgot = which), the restrict keyword will prevent the reload if instead another_fn() = there was a store, but once you call a different function, that compiler says = =E2=80=9Call bets are off, I ignore the restrict=E2=80=9D and would reload/spill arg1 = (depending on A or B). >> Three more points for consideration in future cleanups: >>=20 >> 1. This __always_inline thingy is crazy IMHO. The size of the = compilation >> unit is almost double because of it, and I saw no explanation for its = use in >> the commit log (unless I missed it). The overheads in userfaultfd are = mostly >> due to memory copying, scheduling, IPIs. >=20 > Do you meant __mcopy_atomic()? I agree it's suspicious. That's a = huge > function with three callers. Even if to mark it inline I think it's = more > proper to mark it at the three callers, or I'd think we can leave that = for > the compilers. Among others. But yes, __mcopy_atomic() is the craziest one. >=20 >> 2. I think it makes more sense to strive not to have more than 6 = arguments >> for each function (as supported in registers on x86). For that it is = possible >> to get rid of dst_mm when it can be retrieved from dst_vma. Anyhow we = access >> dst_vma->vm_flags which share a cache-line with dst_vma->vm_mm. >>=20 >> 3. These BUG_ON()s all around are also ... excessive. I guess they = were >> introduced before the age in which Linus got angry on each BUG_ON(). = Is >> there any good reason not to change them into VM_BUG_ON()? >=20 > I'm not sure how that was handled, I'd personally think it's fine to = keep > them if they're there for ages. They're gone already if they can be > triggered in any bad ways anyway.. But for sure we'd keep an eye on = new > ones when reviewing. Whatever. They might just cause slightly less optimized code to be generated.