To: Nadav Amit <nadav.amit@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, Linux-MM <linux-mm@kvack.org>,
 Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
 Peter Xu <peterx@redhat.com>, Andrea Arcangeli <aarcange@redhat.com>,
 Minchan Kim <minchan@kernel.org>, Colin Cross <ccross@google.com>,
 Suren Baghdasarya <surenb@google.com>,
 Mike Rapoport <rppt@linux.vnet.ibm.com>
References: <20210926161259.238054-1-namit@vmware.com>
 <7ce823c8-cfbf-cc59-9fc7-9aa3a79740c3@redhat.com>
 <6E8A03DD-175F-4A21-BCD7-383D61344521@gmail.com>
 <2753a311-4d5f-8bc5-ce6f-10063e3c6167@redhat.com>
 <AE756194-07D4-4467-92CA-9E986140D85D@gmail.com>
 <f47970f5-faa7-9d5f-f07a-9399e4626eda@redhat.com>
 <9DE833C8-515F-4427-9867-E5BF9AD380FB@gmail.com>
 <9b53a85c-83f4-4548-c3b5-c65bd8737670@redhat.com>
 <C533782D-9E4B-41F5-9120-A31A4782BCE5@gmail.com>
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
Subject: Re: [RFC PATCH 0/8] mm/madvise: support
 process_madvise(MADV_DONTNEED)
Message-ID: <a456a41d-c089-a639-b223-4412bad82e8d@redhat.com>
Date: Mon, 4 Oct 2021 19:58:29 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.11.0
MIME-Version: 1.0
In-Reply-To: <C533782D-9E4B-41F5-9120-A31A4782BCE5@gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
List-ID: <linux-mm.kvack.org>

>>
>> Thanks for the pointer.
>>
>> And my question would be if something like DAMON would actually be what you want.
>
> I looked into DAMON and even with the proposed future extensions it sounds
> as a different approach with certain benefits but with many limitations.
>
> The major limitation of DAMON is that you need to predefine the logic you
> want for reclamation into the kernel. You can add programmability through
> some API or even eBPF, but it would never be as easy or as versatile as
> what user manager can achieve. We already have pretty much all the
> facilities to do so from userspace, and the missing parts (at least for
> basic userspace manager) are almost already there. In contrast, see how
> many iterations are needed for the basic DAMON implementation.

I can see what you're saying when looking at optimizing a handful of
special applications. I have yet to see how something like that could
work as a full replacement for in-kernel swapping. I'm happy to learn.

>
> The second, also big, difference is that DAMON looks only on reclamation.
> If you want a custom prefetch scheme or different I/O stack for backing
> storage, you cannot have such one.

I do wonder if it could be extended for prefetching. But I am absolutely
not a DAMON expert.

[...]

>>
>> You raise a very excellent point (and it should have been part of your initial sales pitch): how does it differ to process_vm_writev().
>>
>> I can say that it differs in a way that you can break applications in more extreme ways. Let me give you two examples:
>>
>> 1. longterm pinnings: you raised this yourself; this can break an application silently and there is barely a safe way your tooling could handle it.
>>
>> 2. pagemap: applications can depend on the populated (present | swap) information in the pagemap for correctness. For example, there was recently a discussion to use pagemap information to speed up live migration of VMs, by skipping migration of !populated pages. There is currently no way your tooling can fake that. In comparison, ordinary swapping in the kernel can handle it.
>
> I understand (1). As for (2): the scenario that you mention sounds
> very specific, and one can argue that ignoring UFFD-registered
> regions in such a case is either (1) wrong or (2) should trigger
> some UFFD event.
>
>>
>> Is it easy to break an application with process_vm_writev()? Yes. When talking about dynamic debugging, it's expected that you break the target already -- or the target is already broken. Is it easier to break an application with process_madvise(MADV_DONTNEED)? I'd say yes, especially when implementing something way beyond debugging as you describe.
>
> If you do not know what you are doing, you can easily break anything.
> Note that there are other APIs that can break your application even
> worse, specifically ptrace().
>
>> I'm giving you "a hard time" for the reason Michal raised: we discussed this in the past already at least two times IIRC and "it is a free ticket to all sorts of hard to debug problem" in our opinion; especially when we mess around in other process address spaces besides for debugging.
>>
>> I'm not the person to ack/nack this, I'm just asking the questions :)
>
> I see your points and I try to look for a path of least resistance.
> I thought that process_madvise() is a nice interface to hook into.

It would be the right interface -- iff the operation wouldn't have a bad
smell to it. We don't really want applications to mess around in the
page table layout of some other process; however, that is exactly what
you require. By unlocking that interface for that use case we agree that
what you are proposing is a "sane use case", but ...

>
> But if you are concerned it will be misused, how about adding instead
> an IOCTL that will zap pages but only in UFFD-registered regions?
> A separate IOCTL for this matter has an advantage of being more
> tailored for UFFD, not to notify UFFD upon “remove” and to be less
> likely to be misused.

... that won't change the fact that with your user-space swapping
approach that requires this interface we can break some applications
silently, and that's really the major concern I have.

I think there are more cases where you can simply harm the target
application, for example if the target application uses
SOFTDIRTY tracking.
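
To make the pagemap and soft-dirty concern concrete, here is a minimal
sketch (not code from this thread) of how a tool might decode one 64-bit
/proc/<pid>/pagemap entry. The bit positions are the ones given in the
kernel's pagemap documentation; the names PageState and
decode_pagemap_entry are hypothetical and exist only for illustration.

```python
from typing import NamedTuple

# Bit positions in a 64-bit /proc/<pid>/pagemap entry, as listed in the
# kernel's Documentation/admin-guide/mm/pagemap.rst.
PM_SOFT_DIRTY = 1 << 55  # pte is soft-dirty
PM_SWAP = 1 << 62        # page is swapped out
PM_PRESENT = 1 << 63     # page is present in RAM


class PageState(NamedTuple):
    """Hypothetical container for the bits discussed in this thread."""
    present: bool
    swapped: bool
    soft_dirty: bool

    @property
    def populated(self) -> bool:
        # "populated" in the sense used above: present or swapped.
        return self.present or self.swapped


def decode_pagemap_entry(entry: int) -> PageState:
    """Extract the present, swap and soft-dirty bits from one entry."""
    return PageState(
        present=bool(entry & PM_PRESENT),
        swapped=bool(entry & PM_SWAP),
        soft_dirty=bool(entry & PM_SOFT_DIRTY),
    )
```

An application that skips !populated pages, or that relies on the
soft-dirty bit, reads exactly these bits. As far as I can tell, a page
held only by a user-space swap manager would show up as neither present
nor swapped, which is the kind of silent breakage described above.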


To judge if this is a sane use case we want to support, it would help a
lot if there were actual code and an evaluation implementing
some of these advanced policies. You raise a lot of interesting
points in your reply to Michal to back your use case, and naive me
thinks "this sounds interesting but ... aren't we losing a lot of
flexibility+features when doing this in user space? Does anyone actually
want to do it like that?".

Again, I'm not the person to ack/nack this. I'm just questioning whether the
use case that requires this interface is actually something that will
get used later in real life because it has real advantages, or whether it's a
pure research project that will get abandoned at some point, leaving us
with an exposed interface we really didn't want to expose so far
(especially because all other requests so far were bogus).

-- 
Thanks,

David / dhildenb