Date: Fri, 4 Mar 2022 10:12:33 +0100
From: Michal Hocko
To: Johannes Weiner
Cc: Andrew Morton, Vlastimil Babka, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: madvise: MADV_DONTNEED_LOCKED
References: <20220303212956.229409-1-hannes@cmpxchg.org>

[please CC linux-api if you are going to repost with the fix suggested
by Nadav]

On Thu 03-03-22 16:47:34, Johannes Weiner wrote:
> On Thu, Mar 03, 2022 at 04:29:56PM -0500, Johannes Weiner wrote:
> > MADV_DONTNEED historically rejects mlocked ranges, but with
> > MLOCK_ONFAULT and MCL_ONFAULT allowing to mlock without populating,
> > there are valid use cases for depopulating locked ranges as well.
> >
> > Users mlock memory to protect secrets. There are allocators for secure
> > buffers that want in-use memory generally mlocked, but cleared and
> > invalidated memory to give up the physical pages.
> > This could be done with explicit munlock -> mlock calls on free ->
> > alloc of course, but that adds two unnecessary syscalls, heavy
> > mmap_sem write locks, vma splits and re-merges - only to get rid of
> > the backing pages.
> >
> > Users also mlockall(MCL_ONFAULT) to suppress sustained paging, but are
> > okay with on-demand initial population. It seems valid to selectively
> > free some memory during the lifetime of such a process, without having
> > to mess with its overall policy.
> >
> > Why add a separate flag? Isn't this a pretty niche usecase?
> >
> > - MADV_DONTNEED has been bailing on locked vmas forever. It's at least
> >   conceivable that someone, somewhere is relying on mlock to protect
> >   data from perhaps broader invalidation calls. Changing this behavior
> >   now could lead to quiet data corruption.
> >
> > - It also clarifies expectations around MADV_FREE and maybe
> >   MADV_REMOVE. It avoids the situation where one quietly behaves
> >   different than the others. MADV_FREE_LOCKED can be added later.
> >
> > - The combination of mlock() and madvise() in the first place is
> >   probably niche. But where it happens, I'd say that dropping pages
> >   from a locked region once they don't contain secrets or won't page
> >   anymore is much saner than relying on mlock to protect memory from
> >   speculative or errant invalidation calls. It's just that we can't
> >   change the default behavior because of the two previous points.
> >
> > Given that, an explicit new flag seems to make the most sense.
> >
> > Signed-off-by: Johannes Weiner
>
> Just for context, I found this discussion back from 2018:
>
> https://lkml.iu.edu/hypermail/linux/kernel/1806.1/00483.html
>
> It seems to me that the usecase wasn't really in question, but people
> weren't sure about the API, and then Jason found a workaround before
> the discussion really concluded.
> I was asked internally about this feature, so I'm submitting another
> patch in this direction, but with more thoughts on why I chose to go
> with a new flag. Hopefully we can work it out this time around :-)

Thanks for the link. The topic sounded familiar but I couldn't really
remember any details anymore. Now I do remember that I wasn't happy
about special casing MLOCK_ONFAULT. A dedicated madvise operation is
definitely safer and I am OK with that. The presented usecases make
sense to me as well.

Btw. I have a recollection that Mike is working on MADV_DONTNEED support
for hugetlb pages. I do not know the current state of that work. Not
that it would make any impact on your new flag but some minor changes
might be needed.

Anyway, after the madvise_need_mmap_write is addressed, feel free to add
Acked-by: Michal Hocko

Thanks!
-- 
Michal Hocko
SUSE Labs
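
For reference, the userspace pattern discussed in this thread could look
roughly like the sketch below. It is only an illustration under stated
assumptions, not code from the patch: release_locked_range() is a
hypothetical helper name, the fallback assumes the range was locked with
MLOCK_ONFAULT, and the numeric fallback defines mirror the asm-generic
uapi values for the case where libc headers do not carry them yet.

```c
#define _GNU_SOURCE		/* for mlock2() and MLOCK_ONFAULT */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Older libc headers may lack these; values as in the asm-generic uapi. */
#ifndef MLOCK_ONFAULT
#define MLOCK_ONFAULT 0x01
#endif
#ifndef MADV_DONTNEED_LOCKED
#define MADV_DONTNEED_LOCKED 24
#endif

/*
 * Hypothetical helper: give up the physical pages behind a page-aligned,
 * mlocked range while keeping the range itself locked.
 * Returns 0 on success, -1 with errno set on failure.
 */
int release_locked_range(void *addr, size_t len)
{
	assert(addr != NULL && len > 0);

	/* Preferred path: a single madvise() call, no unlock/relock. */
	if (madvise(addr, len, MADV_DONTNEED_LOCKED) == 0)
		return 0;
	if (errno != EINVAL)
		return -1;

	/*
	 * EINVAL most likely means the kernel does not know the new flag.
	 * Fall back to the munlock -> MADV_DONTNEED -> mlock sequence from
	 * the changelog, which costs two extra syscalls and vma churn.
	 * Assumption: the range was locked with MLOCK_ONFAULT, so relocking
	 * the same way does not repopulate the pages.
	 */
	if (munlock(addr, len) != 0)
		return -1;
	if (madvise(addr, len, MADV_DONTNEED) != 0)
		return -1;
	return mlock2(addr, len, MLOCK_ONFAULT);
}
```

A secure-buffer allocator would call something like this on free, after
clearing the buffer. For private anonymous memory the next access to the
range then faults in a fresh zero page, which stays locked.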