To: Hugh Dickins, Andrew Morton
Cc: Rik van Riel, xuyu@linux.alibaba.com, mgorman@suse.de,
 aarcange@redhat.com, willy@infradead.org, linux-kernel@vger.kernel.org,
 kernel-team@fb.com, linux-mm@kvack.org, mhocko@suse.com
References: <20201124194925.623931-1-riel@surriel.com>
From: Vlastimil Babka
Subject: Re: [PATCH v6 0/3] mm,thp,shm: limit shmem THP alloc gfp_mask
Date: Mon, 14 Dec 2020 23:52:54 +0100

On 12/14/20 10:16 PM, Hugh Dickins wrote:
> On Tue, 24 Nov 2020, Rik van Riel wrote:
>
>> The allocation flags of anonymous transparent huge pages can be controlled
>> through the files in /sys/kernel/mm/transparent_hugepage/defrag, which can
>> help the system from getting bogged down in the page reclaim and compaction
>> code when many THPs are getting allocated simultaneously.
>>
>> However, the gfp_mask for shmem THP allocations were not limited by those
>> configuration settings, and some workloads ended up with all CPUs stuck
>> on the LRU lock in the page reclaim code, trying to allocate dozens of
>> THPs simultaneously.
>>
>> This patch applies the same configurated limitation of THPs to shmem
>> hugepage allocations, to prevent that from happening.
>>
>> This way a THP defrag setting of "never" or "defer+madvise" will result
>> in quick allocation failures without direct reclaim when no 2MB free
>> pages are available.
>>
>> With this patch applied, THP allocations for tmpfs will be a little
>> more aggressive than today for files mmapped with MADV_HUGEPAGE,
>> and a little less aggressive for files that are not mmapped or
>> mapped without that flag.
>>
>> v6: make khugepaged actually obey tmpfs mount flags
>> v5: reduce gfp mask further if needed, to accomodate i915 (Matthew Wilcox)
>> v4: rename alloc_hugepage_direct_gfpmask to vma_thp_gfp_mask (Matthew Wilcox)
>> v3: fix NULL vma issue spotted by Hugh Dickins & tested
>> v2: move gfp calculation to shmem_getpage_gfp as suggested by Yu Xu
>
> Andrew, please don't rush
>
> mmthpshmem-limit-shmem-thp-alloc-gfp_mask.patch
> mmthpshm-limit-gfp-mask-to-no-more-than-specified.patch
> mmthpshmem-make-khugepaged-obey-tmpfs-mount-flags.patch
>
> to Linus in your first wave of mmotm->5.11 sendings.
> Or, alternatively, go ahead and send them to Linus, but
> be aware that I'm fairly likely to want adjustments later.
>
> Sorry for limping along so far behind, but I still have more
> re-reading of the threads to do, and I'm still investigating
> why tmpfs huge=always becomes so ineffective in my testing with
> these changes, even if I ramp up from default defrag=madvise to
> defrag=always:
>                          5.10    mmotm
> thp_file_alloc        4641788   216027
> thp_file_fallback      275339  8895647

So AFAICS before the patch shmem allocated hugepages basically with:

  mapping_gfp_mask(inode->i_mapping) | __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN

where mapping_gfp_mask() should be the default GFP_HIGHUSER_MOVABLE unless I
missed some shmem-specific override of the mask.

So the important flags mean all zones available, both __GFP_DIRECT_RECLAIM and
__GFP_KSWAPD_RECLAIM, but also __GFP_NORETRY which makes it less aggressive.

Now, with defrag=madvise and without madvised vma, there's just
GFP_TRANSHUGE_LIGHT, which means no __GFP_DIRECT_RECLAIM (and no
__GFP_KSWAPD_RECLAIM). Thus no reclaim and compaction at all. Indeed "little
less aggressive" is an understatement.

On the other hand, with defrag=always and again without madvised vma there
should be GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM | __GFP_NORETRY, so
compared to "before the patch" this is only missing __GFP_KSWAPD_RECLAIM. I
would be surprised if this meant so much difference in your testing as you show
above - I think you would have to be allocating those THPs just at a rate where
kswapd+kcompactd can keep up and nothing else "steals" the pages that background
reclaim+compaction creates.
In that (subjectively unlikely) case, I think significant improvement should be
visible with defrag=defer over defrag=madvise.

> I've been looking into it off and on for weeks (gfp_mask wrangling is
> not my favourite task! so tend to find higher priorities to divert me);
> hoped to arrive at a conclusion before merge window, but still have
> nothing constructive to say yet, hence my silence so far.
>
> Above's "a little less aggressive" appears understatement at present.
> I respect what Rik is trying to achieve here, and I may end up
> concluding that there's nothing better to be done than what he has.
> My kind of hugepage-thrashing-in-low-memory may be so remote from
> normal usage, and could be skirting the latency horrors we all want
> to avoid: but I haven't reached that conclusion yet - the disparity
> in effectiveness still deserves more investigation.
>
> (There's also a specific issue with the gfp_mask limiting: I have
> not yet reviewed the allowing and denying in detail, but it looks
> like it does not respect the caller's GFP_ZONEMASK - the gfp in
> shmem_getpage_gfp() and shmem_read_mapping_page_gfp() is there to
> satisfy the gma500, which wanted to use shmem but could only manage
> DMA32.
> I doubt it wants THPS, but shmem_enabled=force forces them.)
>
> Thanks,
> Hugh
>
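
To make the flag comparison above easier to check, here is a minimal userspace
sketch in C. It is only a model: the M_* bit values and the helper names
shmem_huge_gfp_before() and thp_gfp_not_madvised() are made up for
illustration and are not the kernel's definitions (those live in
include/linux/gfp.h and change between versions). It covers only the flags
named in this mail, lumps the zone/mobility bits into one placeholder, and
assumes that defrag=defer adds just the kswapd bit for a non-madvised vma.

```c
#include <assert.h>

/* Illustrative stand-ins only: NOT the kernel's gfp bit values. */
typedef unsigned int gfp_model_t;

#define M_DIRECT_RECLAIM (1u << 0) /* models __GFP_DIRECT_RECLAIM */
#define M_KSWAPD_RECLAIM (1u << 1) /* models __GFP_KSWAPD_RECLAIM */
#define M_NORETRY        (1u << 2) /* models __GFP_NORETRY */
#define M_COMP           (1u << 3) /* models __GFP_COMP */
#define M_NOWARN         (1u << 4) /* models __GFP_NOWARN */
#define M_ZONES          (1u << 5) /* placeholder for zone/mobility bits */

#define M_RECLAIM (M_DIRECT_RECLAIM | M_KSWAPD_RECLAIM)

/* Models GFP_HIGHUSER_MOVABLE: all zones plus both reclaim bits. */
#define M_HIGHUSER_MOVABLE (M_ZONES | M_RECLAIM)

/* Models GFP_TRANSHUGE_LIGHT: same base, but with reclaim masked out. */
#define M_TRANSHUGE_LIGHT \
	((M_HIGHUSER_MOVABLE | M_COMP | M_NOWARN) & ~M_RECLAIM)

enum defrag_mode { DEFRAG_ALWAYS, DEFRAG_DEFER, DEFRAG_MADVISE };

/* Mask shmem used for huge pages before the series, per the mail. */
static inline gfp_model_t shmem_huge_gfp_before(void)
{
	return M_HIGHUSER_MOVABLE | M_COMP | M_NORETRY | M_NOWARN;
}

/* Mask after the series for a vma that is not madvised. */
static inline gfp_model_t thp_gfp_not_madvised(enum defrag_mode mode)
{
	switch (mode) {
	case DEFRAG_ALWAYS:
		return M_TRANSHUGE_LIGHT | M_DIRECT_RECLAIM | M_NORETRY;
	case DEFRAG_DEFER:
		/* assumption: background reclaim/compaction only */
		return M_TRANSHUGE_LIGHT | M_KSWAPD_RECLAIM;
	case DEFRAG_MADVISE:
	default:
		return M_TRANSHUGE_LIGHT; /* no reclaim at all */
	}
}
```

In this model, XORing shmem_huge_gfp_before() with
thp_gfp_not_madvised(DEFRAG_ALWAYS) leaves only M_KSWAPD_RECLAIM, which is
the "only missing __GFP_KSWAPD_RECLAIM" point, while DEFRAG_MADVISE has
neither reclaim bit set.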