Date: Tue, 21 Jan 2020 10:32:12 -0800
From: Minchan Kim
To: Michal Hocko
Cc: sspatil@google.com, kirill@shutemov.name, akpm@linux-foundation.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-api@vger.kernel.org, oleksandr@redhat.com, surenb@google.com,
 timmurray@google.com, dancol@google.com, sonnyrao@google.com,
 bgeffon@google.com, hannes@cmpxchg.org, shakeelb@google.com,
 joaodias@google.com, ktkhai@virtuozzo.com,
 christian.brauner@ubuntu.com, sjpark@amazon.de
Subject: Re: [PATCH v2 2/5] mm: introduce external memory hinting API
Message-ID: <20200121183212.GF140922@google.com>
References: <20200116235953.163318-1-minchan@kernel.org>
 <20200116235953.163318-3-minchan@kernel.org>
 <20200117115225.GV19428@dhcp22.suse.cz>
 <20200117155837.bowyjpndfiym6cgs@box>
 <20200117173239.GB140922@google.com>
 <20200117212653.7uftw3lk35oykkmb@box>
 <20200119161431.GA94410@google.com>
 <20200120075825.GH18451@dhcp22.suse.cz>
In-Reply-To: <20200120075825.GH18451@dhcp22.suse.cz>

On Mon, Jan 20, 2020 at 08:58:25AM +0100, Michal Hocko wrote:
> On Sun 19-01-20 08:14:31, sspatil@google.com wrote:
> > On Sat, Jan 18, 2020 at 12:26:53AM +0300, Kirill A. Shutemov wrote:
> > > On Fri, Jan 17, 2020 at 09:32:39AM -0800, Minchan Kim wrote:
> > > > On Fri, Jan 17, 2020 at 06:58:37PM +0300, Kirill A.
> > > > Shutemov wrote:
> > > > > On Fri, Jan 17, 2020 at 12:52:25PM +0100, Michal Hocko wrote:
> > > > > > On Thu 16-01-20 15:59:50, Minchan Kim wrote:
> > > > > > > There is a usecase where System Management Software (SMS) wants to give
> > > > > > > a memory hint like MADV_[COLD|PAGEOUT] to other processes and
> > > > > > > in the case of Android, it is the ActivityManagerService.
> > > > > > >
> > > > > > > It's similar in spirit to madvise(MADV_DONTNEED), but the information
> > > > > > > required to make the reclaim decision is not known to the app. Instead,
> > > > > > > it is known to the centralized userspace daemon (ActivityManagerService),
> > > > > > > and that daemon must be able to initiate reclaim on its own without
> > > > > > > any app involvement.
> > > > > > >
> > > > > > > To solve the issue, this patch introduces a new syscall, process_madvise(2).
> > > > > > > It uses the pidfd of an external process to give the hint.
> > > > > > >
> > > > > > > int process_madvise(int pidfd, void *addr, size_t length, int advise,
> > > > > > > unsigned long flag);
> > > > > > >
> > > > > > > Since it could affect another process's address range, only a privileged
> > > > > > > process (CAP_SYS_PTRACE) or one that otherwise has the right to ptrace
> > > > > > > the target (e.g., by having the same UID) can use it successfully.
> > > > > > > The flag argument is reserved for future use if we need to extend the
> > > > > > > API.
> > > > > > >
> > > > > > > I think supporting in process_madvise every hint that madvise supports
> > > > > > > or will support is rather risky, because we are not sure all hints make
> > > > > > > sense from an external process, and the implementation of a hint may rely
> > > > > > > on the caller being in the current context, so it could be error-prone.
> > > > > > > Thus, I limited the hints to MADV_[COLD|PAGEOUT] in this patch.
> > > > > > >
> > > > > > > If someone wants to add other hints, we can hear the usecase and
> > > > > > > review each hint. That is safer for maintenance than
> > > > > > > introducing a buggy syscall that is hard to fix later.
> > > > > >
> > > > > > I have brought this up when we discussed this in the past but there is
> > > > > > no reflection on that here so let me bring that up again.
> > > > > >
> > > > > > I believe that the interface has an inherent problem that it is racy.
> > > > > > The external entity needs to know the address space layout of the target
> > > > > > process to do anything useful on it. The address space is however under
> > > > > > the full control of the target process and the external entity
> > > > > > has no means to find out that the layout has changed. So
> > > > > > time-to-check-time-to-act is an inherent problem.
> > > > > >
> > > > > > This is a serious design flaw and it should be explained why it doesn't
> > > > > > matter or how to use the interface properly to prevent that problem.
> > > > >
> > > > > I agree, it looks flawed.
> > > > >
> > > > > Also I don't see what System Management Software can generically do on
> > > > > sub-process level. I mean how can it decide which part of the address space
> > > > > is less important than another.
> > > > >
> > > > > I see how a manager can indicate that this process (or a group of
> > > > > processes) is less important than another, but on a per-address-range basis?
> > > >
> > > > For example, for memory ranges shared by several processes or critical for
> > > > latency, we could avoid marking those ranges cold/pageout to prevent
> > > > unnecessary CPU burning/paging.
> > >
> > > Hmm.. I still don't see why any external entity has a better (or any)
> > > knowledge about the matter. The process has to do this, no?
> >
> > FWIW, I totally agree with the time-to-check-time-to-react problem. However,
> > I'd like to clarify the ActivityManager/SystemServer case (I'll call it
> > SystemServer from now on)
> >
> > For Android, every application (including the special SystemServer) is forked
> > from Zygote. The reason of course is to share as many libraries and classes between
> > the two as possible to benefit from the preloading during boot.
> >
> > After applications start, (almost) all of the APIs end up calling into this
> > SystemServer process over IPC (binder) and back to the application.
> >
> > In a fully running system, the SystemServer monitors every single process
> > periodically to calculate their PSS / RSS and also decides which process is
> > "important" to the user for interactivity.
> >
> > So, because of how these processes start _and_ the fact that the SystemServer
> > is looping to monitor each process, it does tend to *know* which address
> > range of the application is not used / useful.
> >
> > Besides, we can never rely on applications to clean things up themselves.
> > We've had the "hey app1, the system is low on memory, please trim your
> > memory usage down" notifications for a long time[1]. They rely on
> > applications honoring the broadcasts and very few do.
> >
> > So, if we want to avoid the inevitable killing of the application and
> > restarting it, some way to be able to tell the OS about unimportant memory in
> > these applications will be useful.

Thanks for adding a more useful description, Sandeep.

>
> This is useful information that should be a part of the changelog. I

I will include it in the next respin.

> do see how the current form of the API might fit into the Android model
> without many problems. But we are not designing an API for a single
> usecase, right? In highly cooperative environments you can use ptrace
> code injection as mentioned by Kirill. Or is there any fundamental
> problem about that?

I replied to that in Kirill's thread, so let's discuss it there.

>
> The interface really has to be robust to future potential usecases.

I do understand your concern, but to me it's a chicken-and-egg problem.
We usually make a best effort to get things as right as possible, but we
also don't over-engineer from the beginning without a real usecase.

I already explained how we could synchronize among processes, as well as
the potential extension Daniel suggested (that's why the current API has
an extra field for the cookie), even though we don't need it right now.

If you want to suggest another way, please explain why your idea is
better and why we need it at this moment. I don't think this is a blocker
for the progress of this API, since we already have several ways to
synchronize processes.