Subject: Re: [RFC 0/6] Reclaim zero subpages of thp to avoid memory bloat
From: ning zhang <ningzhang@linux.alibaba.com>
To: "Kirill A. Shutemov"
Cc: linux-mm@kvack.org, Andrew Morton, Johannes Weiner, Michal Hocko, Vladimir Davydov, Yu Zhao, Gang Deng
Date: Fri, 29 Oct 2021 20:07:35 +0800
In-Reply-To: <20211028141333.kgcjgsnrrjuq4hjx@box.shutemov.name>
References: <1635422215-99394-1-git-send-email-ningzhang@linux.alibaba.com> <20211028141333.kgcjgsnrrjuq4hjx@box.shutemov.name>

On 2021/10/28 10:13 PM, Kirill A. Shutemov wrote:
> On Thu, Oct 28, 2021 at 07:56:49PM +0800, Ning Zhang wrote:
>> As we know, THP may lead to memory bloat, which may cause OOM.
>> Through testing with some applications, we found that the cause of
>> the memory bloat is that a huge page may contain zero subpages
>> (whether accessed or not), and that most zero subpages are
>> concentrated in a few huge pages.
>>
>> Following is a text_classification_rnn case for TensorFlow:
>>
>> zero_subpages   huge_pages   waste
>> [   0,   1)        186        0.00%
>> [   1,   2)         23        0.01%
>> [   2,   4)         36        0.02%
>> [   4,   8)         67        0.08%
>> [   8,  16)         80        0.23%
>> [  16,  32)        109        0.61%
>> [  32,  64)         44        0.49%
>> [  64, 128)         12        0.30%
>> [ 128, 256)         28        1.54%
>> [ 256, 513)        159       18.03%
>>
>> In this case, 187 huge pages (25% of the total huge pages) contain
>> more than 128 zero subpages, and those huge pages account for 19.57%
>> of the total RSS. That means we can reclaim 19.57% of memory by
>> splitting the 187 huge pages and reclaiming their zero subpages.
>>
>> This patch set introduces a new mechanism that splits huge pages
>> containing zero subpages and reclaims those subpages.
>>
>> We add each anonymous huge page to a list to reduce the cost of
>> finding candidate huge pages. When memory reclaim is triggered, the
>> list is walked, and a huge page containing enough zero subpages may
>> be reclaimed, with its zero subpages replaced by ZERO_PAGE(0).
> Does it actually help your workload?
>
> I mean this will only be triggered via vmscan that was going to split
> pages and free anyway.
>
> You prioritize splitting THP and freeing zero subpages over reclaiming
> other pages. It may or may not be the right thing to do, depending on
> the workload.
>
> Maybe it makes more sense to check for all-zero pages just after
> split_huge_page_to_list() in vmscan and free such pages immediately
> rather than add all this complexity?
>
The purpose of zero-subpage reclaim (ZSR) is to pick out the huge pages
that contain waste and reclaim them. We do this for two reasons:

1. If swap is off, anonymous pages will not be scanned, so we never get
   the opportunity to split the huge page. ZSR helps in this case.

2. If swap is on, splitting first will not only split the huge page but
   also swap out the nonzero subpages, while ZSR only splits the huge
   page.
   Splitting first will result in more performance degradation. If ZSR
   cannot reclaim enough pages, swap can still work.

Why use a separate ZSR list instead of the default LRU list? Because
scanning for target huge pages can incur high CPU overhead when many
regular pages and huge pages coexist, and it can be especially bad when
swap is off, since we may scan the whole LRU list many times. A huge
page is deleted from the ZSR list once it has been scanned, so each
page is scanned only once. The LRU list is hard to use for this
because new pages may be added to it continuously while we scan.

Also, we can use the reclaim priority to prefer reclaiming file-backed
pages, for example by triggering ZSR only when the priority drops
below 4.

>> Yu Zhao has done some similar work to accelerate splitting when a
>> huge page is swapped out or migrated [1]. We instead do this in the
>> normal memory shrink path, to avoid OOM in the swap-off case.
>>
>> In the future, we will reclaim "cold" huge pages proactively, to
>> preserve the performance benefit of THP as far as possible. Beyond
>> that, some users want memory usage with THP to equal the usage with
>> 4K pages.
> Proactive reclaim can be harmful if your max_ptes_none setting allows
> THP to be recreated.

Thanks! We will consider it.