Date: Tue, 3 Jan 2023 19:51:38 +0100
From: "Jason A. Donenfeld"
To: Ingo Molnar
Cc: linux-kernel@vger.kernel.org, patches@lists.linux.dev,
    tglx@linutronix.de, linux-crypto@vger.kernel.org,
    linux-api@vger.kernel.org, x86@kernel.org, Greg Kroah-Hartman,
    Adhemerval Zanella Netto, Carlos O'Donell, Florian Weimer,
    Arnd Bergmann, Jann Horn, Christian Brauner, linux-mm@kvack.org,
    Linus Torvalds
Subject: Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always
    lazily freeable mappings
References: <20230101162910.710293-1-Jason@zx2c4.com>
    <20230101162910.710293-3-Jason@zx2c4.com>

On Tue, Jan 03, 2023 at 07:15:50PM +0100, Ingo Molnar wrote:
> Frankly, I don't appreciate your condescending discussion style that
> borders on the toxic, and to save time I'm nacking this technical approach
> until both the patch-set and your reaction to constructive review feedback
> improves:
>
> NAcked-by: Ingo Molnar

Your initial email to me did not strike me as constructive at all. All I
gotta say is that you really seem to live up to your reputation here...

But trying to steer things back to the technical realm:

> For a single architecture: x86.
>
> And it's only 19 lines because x86 already happens to have a bunch of
> complexity implemented, such as a safe instruction decoder that allows the
> skipping of an instruction - which relies on thousands of lines of
> complexity.
>
> On an architecture where this isn't present, it would have to be
> implemented to support the instruction-skipping aspect of VM_DROPPABLE.

My assumption is actually the opposite: that x86 (CISC) is basically the
most complex, and that the RISC architectures will all be a good deal
more trivial -- e.g. just adding 4 to IP on some. It looks like some
architectures also already have mature decoders where required.

> Even on x86, it's not common today for the software-decoder to be used in
> unprivileged code - primary use was debugging & instrumentation code. So
> your patches bring this piece of complexity to a much larger scope of
> untrusted user-space functionality.

As far as I can tell, this decoder *is* used with userspace already.
It's used by SEV and by UMIP, in a totally userspace-accessible way. Am
I misunderstanding its use there? It looks to me like that operates on
untrusted code.

*However* - if your big objection to this patch is that the instruction
skipping is problematic, we could actually punt that part. The result
will be that userspace just retries the memory write and the fault
happens again, and eventually it succeeds. From the perspective of
vgetrandom(), that's perhaps worse -- with this v14 patchset, it'll
immediately fall back to the syscall under memory pressure -- but you
could argue that nobody really cares about performance at that point
anyway, and so just retrying the fault until it succeeds is a less
complex behavior that would be just fine.

Let me know if you think that'd be an acceptable compromise, and I'll
roll it into v15. As a preview, it pretty much amounts to dropping 3/7
and editing the commit message in this 2/7 patch.

> I did not suggest to swap it: my suggestion is to just pin these vDSO data
> pages. The per thread memory overhead is infinitesimal on the vast majority
> of the target systems, and the complexity trade-off you are proposing is
> poorly reasoned IMO.
>
> I think my core point that it would be much simpler to simply pin those
> pages and not introduce rarely-excercised 'discardable memory' semantics in
> Linux is a fair one - so it's straightforward to lift this NAK.

Okay, so this is where I think we're really not lined up, and it's a
large part of why I wondered whether you'd read the commit messages
before dismissing this.

This VM_DROPPABLE mapping comes as a result of a vgetrandom_alloc()
syscall, which (g)libc makes at some point, and then the result of that
is passed to the vDSO getrandom() function. The memory from
vgetrandom_alloc() is then carved up, one per thread, with (g)libc's
various internal pthread creation/exit functions.

So that means this isn't a thing that's trivially limited to just one
per thread.
Userspace can call vgetrandom_alloc() all it wants.

Thus, I'm having a hard time seeing how relaxing rlimits here as you
suggested doesn't amount to an rlimit backdoor. I'm also not seeing
other fancy options for "pinning pages" as you mentioned in this email.
Something about having the kernel allocate them on clone()? That seems
terrible and complex. And if you do want this all to go through mlock(),
somehow, there's still the fork() inheritability issue. (This was all
discussed on the thread a few versions ago that surfaced these issues,
by the way.)

So I'm not really seeing any good alternatives, no matter how hard I
squint at your suggestions. Maybe you can elaborate a bit?
Alternatively, perhaps the compromise I suggested above where we ditch
the instruction decoder stuff is actually fine with you?

> rarely-excercised 'discardable memory' semantics in
> Linux is a fair one - so it's straightforward to lift this NAK.

I still don't think calling this "rarely-exercised" is true. Desktop
machines regularly OOM with lots of Chrome tabs, and this is
functionality that'll be in glibc, so it'll be exercised quite often.
Even on servers, many operators work with the philosophy that unused RAM
is wasted RAM, and so servers are run pretty close to capacity. Not to
mention Android, where lots of handsets have way too little RAM.
Swapping and memory pressure and so forth are very real. So claiming
that this is somehow obscure or rarely used or what have you isn't very
accurate. These are code paths that will certainly get exercised.

Jason