Date: Mon, 3 Jun 2019 22:32:30 +0200
From: Michal Hocko
To: Johannes Weiner
Cc: Minchan Kim, Andrew Morton, linux-mm, LKML, linux-api@vger.kernel.org, Tim Murray, Joel Fernandes, Suren Baghdasaryan, Daniel Colascione, Shakeel Butt, Sonny Rao, Brian Geffon, jannh@google.com, oleg@redhat.com, christian@brauner.io, oleksandr@redhat.com, hdanton@sina.com
Subject: Re: [RFCv2 1/6] mm: introduce MADV_COLD
Message-ID: <20190603203230.GB22799@dhcp22.suse.cz>
References: <20190531064313.193437-1-minchan@kernel.org> <20190531064313.193437-2-minchan@kernel.org> <20190531084752.GI6896@dhcp22.suse.cz> <20190531133904.GC195463@google.com> <20190531140332.GT6896@dhcp22.suse.cz> <20190531143407.GB216592@google.com>
 <20190603071607.GB4531@dhcp22.suse.cz> <20190603172717.GA30363@cmpxchg.org>
In-Reply-To: <20190603172717.GA30363@cmpxchg.org>

On Mon 03-06-19 13:27:17, Johannes Weiner wrote:
> On Mon, Jun 03, 2019 at 09:16:07AM +0200, Michal Hocko wrote:
> > On Fri 31-05-19 23:34:07, Minchan Kim wrote:
> > > On Fri, May 31, 2019 at 04:03:32PM +0200, Michal Hocko wrote:
> > > > On Fri 31-05-19 22:39:04, Minchan Kim wrote:
> > > > > On Fri, May 31, 2019 at 10:47:52AM +0200, Michal Hocko wrote:
> > > > > > On Fri 31-05-19 15:43:08, Minchan Kim wrote:
> > > > > > > When a process expects no accesses to a certain memory range, it could
> > > > > > > give a hint to the kernel that the pages can be reclaimed when memory
> > > > > > > pressure happens but their data should be preserved for future use.
> > > > > > > This could reduce workingset eviction and so end up increasing
> > > > > > > performance.
> > > > > > >
> > > > > > > This patch introduces the new MADV_COLD hint to the madvise(2) syscall.
> > > > > > > MADV_COLD can be used by a process to mark a memory range as not
> > > > > > > expected to be used in the near future. The hint can help the kernel
> > > > > > > decide which pages to evict early during memory pressure.
> > > > > > >
> > > > > > > Internally, it works by deactivating private pages from the active
> > > > > > > list to the head of the inactive list, because the inactive list
> > > > > > > could be full of used-once pages, which are the first candidates for
> > > > > > > reclaim; that is also why MADV_FREE moves pages to the head of the
> > > > > > > inactive LRU list.
> > > > > > > Therefore, under memory pressure they will be reclaimed earlier than
> > > > > > > other active pages unless they are accessed again in the meantime.
> > > > > >
> > > > > > [I am intentionally not looking at the implementation because the points
> > > > > > below should be clear from the changelog - sorry about nagging ;)]
> > > > > >
> > > > > > What kind of pages can be deactivated? Anonymous/file backed?
> > > > > > Private/shared? If shared, are there any restrictions?
> > > > >
> > > > > Both file and private pages can be deactivated from their active LRU
> > > > > to the corresponding inactive LRU if the page is mapped at most once.
> > > > > In other words,
> > > > >
> > > > > 	if (page_mapcount(page) <= 1)
> > > > > 		deactivate_page(page);
> > > >
> > > > Why do we restrict this to pages that are singly mapped?
> > >
> > > Because the page table of another process sharing the page would have the
> > > access bit set, so in the end we couldn't reclaim the page. The more
> > > processes share the page, the more often reclaim fails.
> >
> > So what? In other words, why should it be restricted solely based on the
> > map count? I can see a reason to restrict based on the access
> > permissions because we do not want to simplify all sorts of side channel
> > attacks, but memory reclaim is capable of reclaiming shared pages and so
> > far I haven't heard any sound argument why madvise should skip those.
> > Again, if there are any reasons, then document them in the changelog.
>
> I think it makes sense. It could be explained, but it also follows
> established madvise semantics, and I'm not sure it's necessarily
> Minchan's job to re-iterate those.
>
> Sharing isn't exactly transparent to userspace. The kernel does COW,
> KSM etc. When you madvise, you can really only speak for your own
> reference to that memory - "*I* am not using this."
>
> This is in line with other madvise calls: MADV_DONTNEED clears the
> local page table entries and drops the corresponding references, so
> shared pages won't get freed.
> MADV_FREE clears the pte dirty bit and also has explicit mapcount
> checks before clearing PG_dirty, so again shared pages don't get freed.

Right, being consistent with other madvise syscalls is certainly a way to
go. And I am not pushing one way or another; I just want this to be
documented with the reasoning behind it. Consistency is certainly an
argument to use.

On the other hand, these non-destructive madvise operations are quite
different, and the policy for shared pages might differ as a result as
well. We are aging objects rather than destroying them, after all. Being
able to age the page cache with sufficient privileges sounds like a
useful use case to me. In other words, you are able to cause the same
effect indirectly without the madvise operation, so it kinda makes sense
to allow it in a more sophisticated way.

That being said, madvise is just a _hint_ and the kernel will always be
free to ignore it, so the implementation might change in the future. We
can therefore start simple and consistent with the existing MADV_$FOO
operations now and extend later on. But let's document the intention in
the changelog and make the decision clear. I am sorry to be so anal about
this, but I have seen so many undocumented ad-hoc policies, and it was
very hard to make sense of them when revisiting them later on.

-- 
Michal Hocko
SUSE Labs
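The thread is about the kernel-side policy, but from userspace the hint is a single madvise(2) call. The following is a minimal sketch, not part of the patch series. It assumes a Linux system and falls back to the mainline asm-generic value 20 for MADV_COLD when the libc headers lack it (the RFCv2 numbering may differ). The helper names hint_cold() and demo_cold_is_nondestructive() are hypothetical. The sketch only exercises a private anonymous mapping, so it says nothing about how shared pages are treated.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: fallback for libc headers that predate MADV_COLD, using the
 * asm-generic value from mainline Linux (5.4+). The RFC under review may
 * have used a different number. */
#ifndef MADV_COLD
#define MADV_COLD 20
#endif

/* Hypothetical helper: ask the kernel to deactivate [addr, addr + len).
 * Returns 0 if the hint was accepted, 1 if this kernel rejects the advice
 * value (EINVAL, e.g. a kernel without MADV_COLD), and -1 on any other
 * error. addr must be page aligned. */
static int hint_cold(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_COLD) == 0)
        return 0;
    return errno == EINVAL ? 1 : -1;
}

/* Hypothetical demo: populate a private anonymous buffer, mark it cold,
 * then check that the hint did not destroy any data. Unlike MADV_DONTNEED
 * or MADV_FREE, MADV_COLD only ages the pages. Returns 0 on success. */
static int demo_cold_is_nondestructive(void)
{
    size_t len = 4 * (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memset(buf, 0xab, len);              /* fault in every page */
    int rc = hint_cold(buf, len);        /* 0 = accepted, 1 = unsupported */

    /* Even if the pages get reclaimed, reading them back must see the data. */
    int intact = buf[0] == 0xab && buf[len - 1] == 0xab;
    munmap(buf, len);
    return (rc >= 0 && intact) ? 0 : -1;
}
```

The helper deliberately treats EINVAL as "not supported" rather than as a failure. As Michal notes, madvise advice is only a hint, so callers should not depend on the pages actually being deactivated.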