Date: Fri, 4 Oct 2019 11:28:08 +0200
From: Michal Hocko
To: David Rientjes
Cc: Vlastimil Babka, Mike Kravetz, Linus Torvalds, Andrea Arcangeli,
 Andrew Morton, Mel Gorman, "Kirill A. Shutemov",
 Linux Kernel Mailing List, Linux-MM
Subject: Re: [rfc] mm, hugetlb: allow hugepage allocations to excessively reclaim
Message-ID: <20191004092808.GC9578@dhcp22.suse.cz>

On Thu 03-10-19 12:52:33, David Rientjes wrote:
> On Thu, 3 Oct 2019, Vlastimil Babka wrote:
> 
> > I think the key differences between Mike's tests and Michal's is this part
> > from Mike's mail linked above:
> > 
> > "I 'tested' by simply creating some background activity and then seeing
> > how many hugetlb pages could be allocated.  Of course, many tries over
> > time in a loop."
> > > > - "some background activity" might be different than Michal's pre-filling > > of the memory with (clean) page cache > > - "many tries over time in a loop" could mean that kswapd has time to > > reclaim and eventually the new condition for pageblock order will pass > > every few retries, because there's enough memory for compaction and it > > won't return COMPACT_SKIPPED > > > > I'll rely on Mike, the hugetlb maintainer, to assess the trade-off between > the potential for encountering very expensive reclaim as Andrea did and > the possibility of being able to allocate additional hugetlb pages at > runtime if we did that expensive reclaim. That tradeoff has been expressed by __GFP_RETRY_MAYFAIL which got broken by b39d0ee2632d. > For parity with previous kernels it seems reasonable to ask that this > remains unchanged since allocating large amounts of hugetlb pages has > different latency expectations than during page fault. This patch is > available if he'd prefer to go that route. > > On the other hand, userspace could achieve similar results if it were to > use vm.drop_caches and explicitly triggered compaction through either > procfs or sysfs before writing to vm.nr_hugepages, and that would be much > faster because it would be done in one go. Users who allocate through the > kernel command line would obviously be unaffected. Requesting the userspace to drop _all_ page cache in order allocate a number of hugetlb pages or any other affected __GFP_RETRY_MAYFAIL requests is simply not reasonable IMHO. -- Michal Hocko SUSE Labs