Date: Tue, 23 Apr 2019 09:19:53 +0200
From: Michal Hocko
To: Mike Kravetz
Cc: "linux-mm@kvack.org", linux-kernel, Andrea Arcangeli, Mel Gorman, Vlastimil Babka, Johannes Weiner
Subject: Re: [Question] Should direct reclaim time be bounded?
Message-ID: <20190423071953.GC25106@dhcp22.suse.cz>

On Mon 22-04-19 21:07:28, Mike Kravetz wrote:
[...]
> However, consider the case of a 2 node system where:
> node 0 has 2GB memory
> node 1 has 4GB memory
>
> Now, if one wants to allocate 4GB of huge pages they may be tempted to simply,
> "echo 2048 > nr_hugepages".  At first this will go well until node 0 is out
> of memory.  When this happens, alloc_pool_huge_page() will continue to be
> called.  Because of that for_each_node_mask_to_alloc() macro, it will likely
> attempt to first allocate a page from node 0.  It will call direct reclaim and
> compaction until it fails.  Then, it will successfully allocate from node 1.

Yeah, the even distribution is quite a strong statement. We just try to
distribute somehow, and it is unlikely to work really well on systems with
nodes of different sizes. I know it sucks, but I've been recommending using
/sys/devices/system/node/node$N/hugepages/hugepages-2048kB/nr_hugepages
because it allows one to define the actual policy much better. I guess we
want to be more specific about this in the documentation, at least.

> In our distro kernel, I am thinking about making allocations try "less hard"
> on nodes where we start to see failures.  less hard == NORETRY/NORECLAIM.
> I was going to try something like this on an upstream kernel when I noticed
> that it seems like direct reclaim may never end/exit.  It 'may' exit, but I
> instrumented __alloc_pages_slowpath() and saw it take well over an hour
> before I 'tricked' it into exiting.
>
> [ 5916.248341] hpage_slow_alloc: jiffies 5295742 tries 2 node 0 success
> [ 5916.249271] reclaim 5295741 compact 1

This is unexpected, though. What does "tries" mean? The number of reclaim
attempts? If yes, could you enable tracing to see what takes so long in the
reclaim path?

> This is where it stalled after "echo 4096 > nr_hugepages" on a little VM
> with 8GB total memory.
>
> I have not started looking at the direct reclaim code to see exactly where
> we may be stuck, or trying really hard.
> My question is, "Is this expected
> or should direct reclaim be somewhat bounded?"  With __alloc_pages_slowpath
> getting 'stuck' in direct reclaim, the documented behavior for huge page
> allocation is not going to happen.

Well, our "how hard to try for hugetlb pages" is quite arbitrary. We used to
retry as long as at least an order's worth of pages had been reclaimed, but
that stopped making sense once lumpy reclaim was gone. So the semantic has
changed to reclaim & compact as long as there is some progress. From what I
understand above, it seems that you are not thrashing and calling reclaim
again and again; rather, a single reclaim round takes ages.

That being said, I do not think __GFP_RETRY_MAYFAIL is wrong here. It looks
like something is going wrong in the reclaim itself.

--
Michal Hocko
SUSE Labs