From: David Hildenbrand
Organization: Red Hat
To: Nadav Amit
Cc: Andrew Morton, Linux-MM, Linux Kernel Mailing List, Peter Xu,
 Andrea Arcangeli, Minchan Kim, Colin Cross, Suren Baghdasarya,
 Mike Rapoport
Subject: Re: [RFC PATCH 0/8] mm/madvise: support process_madvise(MADV_DONTNEED)
Date: Mon, 4 Oct 2021 19:58:29 +0200
References: <20210926161259.238054-1-namit@vmware.com>
 <7ce823c8-cfbf-cc59-9fc7-9aa3a79740c3@redhat.com>
 <6E8A03DD-175F-4A21-BCD7-383D61344521@gmail.com>
 <2753a311-4d5f-8bc5-ce6f-10063e3c6167@redhat.com>
 <9DE833C8-515F-4427-9867-E5BF9AD380FB@gmail.com>
 <9b53a85c-83f4-4548-c3b5-c65bd8737670@redhat.com>

>>
>> Thanks for the pointer.
>>
>> And my question would be if something like DAMON would actually be
>> what you want.
>
> I looked into DAMON and even with the proposed future extensions it sounds
> like a different approach with certain benefits but with many limitations.
>
> The major limitation of DAMON is that you need to predefine the logic you
> want for reclamation into the kernel. You can add programmability through
> some API or even eBPF, but it would never be as easy or as versatile as
> what a userspace manager can achieve. We already have pretty much all the
> facilities to do so from userspace, and the missing parts (at least for
> a basic userspace manager) are almost already there. In contrast, see how
> many iterations are needed for the basic DAMON implementation.

I can see what you're saying when looking at optimizing a handful of
special applications. I still fail to see how something like that could
work as a full replacement for in-kernel swapping. I'm happy to learn.

>
> The second, also big, difference is that DAMON looks only at reclamation.
> If you want a custom prefetch scheme or different I/O stack for backing
> storage, you cannot have one.

I do wonder if it could be extended for prefetching. But I am absolutely
not a DAMON expert.

[...]

>>
>> You raise a very excellent point (and it should have been part of your
>> initial sales pitch): how does it differ from process_vm_writev()?
>>
>> I can say that it differs in a way that you can break applications in
>> more extreme ways. Let me give you two examples:
>>
>> 1. longterm pinnings: you raised this yourself; this can break an
>> application silently and there is barely a safe way your tooling could
>> handle it.
>>
>> 2. pagemap: applications can depend on the populated (present | swap)
>> information in the pagemap for correctness. For example, there was
>> recently a discussion to use pagemap information to speed up live
>> migration of VMs, by skipping migration of !populated pages. There is
>> currently no way your tooling can fake that.
>> In comparison, ordinary swapping in the kernel can handle it.
>
> I understand (1). As for (2): the scenario that you mention sounds
> very specific, and one can argue that ignoring UFFD-registered
> regions in such a case is either (1) wrong or (2) should trigger
> some UFFD event.
>
>>
>> Is it easy to break an application with process_vm_writev()? Yes. When
>> talking about dynamic debugging, it's expected that you break the
>> target already -- or the target is already broken. Is it easier to
>> break an application with process_madvise(MADV_DONTNEED)? I'd say yes,
>> especially when implementing something way beyond debugging as you
>> describe.
>
> If you do not know what you are doing, you can easily break anything.
> Note that there are other APIs that can break your application even
> worse, specifically ptrace().
>
>> I'm giving you "a hard time" for the reason Michal raised: we discussed
>> this in the past already at least two times IIRC and "it is a free
>> ticket to all sorts of hard to debug problem" in our opinion;
>> especially when we mess around in other process address spaces other
>> than for debugging.
>>
>> I'm not the person to ack/nack this, I'm just asking the questions :)
>
> I see your points and I try to look for a path of least resistance.
> I thought that process_madvise() is a nice interface to hook into.

It would be the right interface -- iff the operation didn't have a bad
smell to it. We don't really want applications to mess around in the
page table layout of some other process: however, that is exactly what
you require. By unlocking that interface for that use case we agree that
what you are proposing is a "sane use case", but ...

>
> But if you are concerned it will be misused, how about adding instead
> an IOCTL that will zap pages but only in UFFD-registered regions?
> A separate IOCTL for this matter has the advantage of being more
> tailored for UFFD, not to notify UFFD upon “remove” and to be less
> likely to be misused.

... that won't change the fact that with your user-space swapping
approach that requires this interface we can break some applications
silently, and that's really the major concern I have.

I mean, there are more cases where you can just harm the target
application I think, for example if the target application uses
SOFTDIRTY tracking.

To judge if this is a sane use case we want to support, it would help a
lot if there were actual code+evaluation from actually implementing
some of these advanced policies. Because you raise a lot of interesting
points in your reply to Michal to back your use case, and naive me
thinks "this sounds interesting but ... aren't we losing a lot of
flexibility+features when doing this in user space? Does anyone actually
want to do it like that?".

Again, I'm not the person to ack/nack this, I'm just questioning if the
use case that requires this interface is actually something that will
get used later in real life because it has real advantages, or if it's a
pure research project that will get abandoned at some point and we end
up exposing an interface we really didn't want to expose so far
(especially, because all other requests so far were bogus).

-- 
Thanks,

David / dhildenb
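
For reference, the pagemap and SOFTDIRTY concerns raised above can be made concrete with a minimal sketch of how a consumer might read the "populated (present | swap)" and soft-dirty state from /proc/<pid>/pagemap. The bit positions follow the kernel's pagemap documentation; the helper names decode_entry() and read_entry() are hypothetical, chosen only for this illustration. A consumer written this way has no means of telling a page that was never touched from one that an external manager zapped, which is presumably the kind of silent breakage being described.

```python
import mmap
import struct

# Bit positions of a 64-bit /proc/<pid>/pagemap entry, per the kernel's
# pagemap documentation (Documentation/admin-guide/mm/pagemap.rst).
PM_PRESENT = 1 << 63     # page is present in RAM
PM_SWAPPED = 1 << 62     # page is in swap
PM_SOFT_DIRTY = 1 << 55  # PTE is soft-dirty


def decode_entry(raw: bytes) -> dict:
    """Decode one 8-byte pagemap entry (hypothetical helper)."""
    (value,) = struct.unpack("=Q", raw)  # native byte order, 64 bits
    present = bool(value & PM_PRESENT)
    swapped = bool(value & PM_SWAPPED)
    return {
        "present": present,
        "swapped": swapped,
        # "populated" in the sense used in this thread: present | swap
        "populated": present or swapped,
        "soft_dirty": bool(value & PM_SOFT_DIRTY),
    }


def read_entry(pid: int, vaddr: int, page_size: int = mmap.PAGESIZE) -> dict:
    """Read and decode the pagemap entry covering vaddr (hypothetical helper).

    Needs permission to read /proc/<pid>/pagemap of the target process.
    """
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // page_size) * 8)  # one 8-byte entry per virtual page
        raw = f.read(8)
    if len(raw) != 8:
        raise ValueError("address not covered by pagemap")
    return decode_entry(raw)
```

A migration tool that skips pages where "populated" is false would, under these assumptions, also skip pages whose contents a userspace manager had zapped and stored elsewhere.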