From: Andy Lutomirski
Date: Tue, 3 Jan 2023 12:52:07 -0800
Subject: Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings
To: "Jason A. Donenfeld"
Cc: Andy Lutomirski, Ingo Molnar, linux-kernel@vger.kernel.org,
    patches@lists.linux.dev, tglx@linutronix.de,
    linux-crypto@vger.kernel.org, linux-api@vger.kernel.org,
    x86@kernel.org, Greg Kroah-Hartman, Adhemerval Zanella Netto,
    "Carlos O'Donell", Florian Weimer, Arnd Bergmann, Jann Horn,
    Christian Brauner, linux-mm@kvack.org, Linus Torvalds
References: <20230101162910.710293-1-Jason@zx2c4.com>
    <20230101162910.710293-3-Jason@zx2c4.com>

On Tue, Jan 3, 2023 at 11:06 AM Jason A. Donenfeld wrote:
>
> Hi Andy,
>
> Thanks for your constructive suggestions.
>
> On Tue, Jan 03, 2023 at 10:36:01AM -0800, Andy Lutomirski wrote:
> > > > c) If there's not enough memory to service a page fault, it's not fatal,
> > > >    and no signal is sent. Instead, writes are simply lost.
> >
> > This just seems massively overcomplicated to me. If there isn't
> > enough memory to fault in a page of code, we don't have some magic
> > instruction emulator in the kernel. We either OOM or we wait for
> > memory to show up.
>
> Before addressing the other parts of your email, I thought I'd touch on
> this.
> Quoting from the email I just wrote Ingo:
>
> | *However* - if your big objection to this patch is that the instruction
> | skipping is problematic, we could actually punt that part. The result
> | will be that userspace just retries the memory write and the fault
> | happens again, and eventually it succeeds. From a perspective of
> | vgetrandom(), that's perhaps worse -- with this v14 patchset, it'll
> | immediately fall back to the syscall under memory pressure -- but you
> | could argue that nobody really cares about performance at that point
> | anyway, and so just retrying the fault until it succeeds is a less
> | complex behavior that would be just fine.
> |
> | Let me know if you think that'd be an acceptable compromise, and I'll
> | roll it into v15. As a preview, it pretty much amounts to dropping 3/7
> | and editing the commit message in this 2/7 patch.
>
> IOW, I think the main ideas of the patch work just fine without "point
> c" with the instruction skipping. Instead, waiting/retrying could
> potentially work. So, okay, it seems like the two of you both hate the
> instruction decoder stuff, so I'll plan on working that part in, in one
> way or another, for v15.
>
> > On Tue, Jan 3, 2023 at 2:50 AM Ingo Molnar wrote:
> > > > The vDSO getrandom() implementation works with a buffer allocated with a
> > > > new system call that has certain requirements:
> > > >
> > > > - It shouldn't be written to core dumps.
> > > >   * Easy: VM_DONTDUMP.
> > > > - It should be zeroed on fork.
> > > >   * Easy: VM_WIPEONFORK.
> >
> > I have a rather different suggestion: make a special mapping. Jason,
> > you're trying to shoehorn all kinds of bizarre behavior into the core
> > mm, and none of that seems to me to belong to the core mm. Instead,
> > have an actual special mapping with callbacks that does the right
> > thing. No fancy VM flags.
>
> Oooo! I like this. Avoiding adding VM_* flags would indeed be nice.
> I had seen things that I thought looked in this direction with the shmem
> API, but when I got into the details, it looked like this was meant for
> something else and couldn't address most of what I wanted here.
>
> If you say this is possible, I'll look again to see if I can figure it
> out. Though, if you have some API name at the top of your head, you
> might save me some code squinting time.

Look for _install_special_mapping().

--Andy

> > Want to mlock it? No, don't do that -- that's absurd. Just arrange
> > so that, if it gets evicted, it's not written out anywhere. And when
> > it gets faulted back in it does the right thing -- see above.
>
> Presumably mlock calls are redirected to some function pointer so I can
> just return EINTR?

Or just don't worry about it. If someone mlocks() it, that's their
problem. The point is that no one needs to.

>
> > Zero on fork? I'm sure that's manageable with a special mapping. If
> > not, you can add a new vm operation or similar to make it work. (Kind
> > of like how we extended special mappings to get mremap right a couple
> > years ago.) But maybe you don't want to *zero* it on fork and you want
> > to do something more intelligent. Fine -- you control ->fault!
>
> Doing something more intelligent would be an interesting development, I
> guess... But, before I think about that, all mappings have flags;
> couldn't I *still* set VM_WIPEONFORK on the special mapping? Or does the
> API you have in mind not work that way? (Side note: I also want
> VM_DONTDUMP to work.)

You really want unmap (the pages, not the vma) on fork, not wipe on
fork. It'll be VM_SHARED, and I'm not sure what VM_WIPEONFORK |
VM_SHARED does.
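
For reference, the two "easy" flags quoted earlier in the thread have
userspace analogues in madvise(2): MADV_DONTDUMP and MADV_WIPEONFORK.
The sketch below only illustrates wipe-on-fork semantics on a private
anonymous mapping, assuming Linux 4.14 or later. It is not the special
mapping proposed here, and the function name
child_sees_zero_after_fork() is made up for the example.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18 /* value from the Linux UAPI headers */
#endif

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if the child sees a zeroed page after fork(),
 *         0 if the child still sees the parent's data,
 *        -1 if setup failed (e.g. no MADV_WIPEONFORK support).
 */
int child_sees_zero_after_fork(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Fill the page so a wipe is observable. */
    memset(buf, 0xAA, len);

    /* Userspace counterparts of VM_WIPEONFORK and VM_DONTDUMP. */
    if (madvise(buf, len, MADV_WIPEONFORK) != 0 ||
        madvise(buf, len, MADV_DONTDUMP) != 0) {
        munmap(buf, len);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(buf, len);
        return -1;
    }

    if (pid == 0) {
        /* Child: exit 0 only if every byte reads back as zero. */
        int wiped = 1;
        for (size_t i = 0; i < len; i++) {
            if (buf[i] != 0) {
                wiped = 0;
                break;
            }
        }
        _exit(wiped ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
        munmap(buf, len);
        return -1;
    }

    /* The parent's copy must be untouched by the fork. */
    int parent_intact = (buf[0] == 0xAA);
    munmap(buf, len);
    if (!parent_intact)
        return -1;

    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Calling child_sees_zero_after_fork() from a small main() and printing
the result is enough to try it. Note that madvise(2) documents EINVAL
for MADV_WIPEONFORK on MAP_SHARED ranges, so this userspace path does
not answer what VM_WIPEONFORK | VM_SHARED would do on a mapping
installed by the kernel itself.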