Subject: Re: [RFC PATCH] mm: shmem: allow split THP when truncating THP partially
From: Yang Shi
To: "Kirill A. Shutemov"
Cc: hughd@google.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, akpm@linux-foundation.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 25 Nov 2019 10:24:38 -0800
Message-ID: <3a35da3a-dff0-a8ca-8269-3018fff8f21b@linux.alibaba.com>
In-Reply-To: <20191125093611.hlamtyo4hvefwibi@box>
References: <1574471132-55639-1-git-send-email-yang.shi@linux.alibaba.com> <20191125093611.hlamtyo4hvefwibi@box>

On 11/25/19 1:36 AM, Kirill A. Shutemov wrote:
> On Sat, Nov 23, 2019 at 09:05:32AM +0800, Yang Shi wrote:
>> Currently when truncating shmem file, if the range is partial of THP
>> (start or end is in the middle of THP), the pages actually will just get
>> cleared rather than being freed unless the range cover the whole THP.
>> Even though all the subpages are truncated (randomly or sequentially),
>> the THP may still be kept in page cache. This might be fine for some
>> usecases which prefer preserving THP.
>>
>> But, when doing balloon inflation in QEMU, QEMU actually does hole punch
>> or MADV_DONTNEED in base page size granularity if hugetlbfs is not used.
>> So, when using shmem THP as memory backend QEMU inflation actually doesn't
>> work as expected since it doesn't free memory. But, the inflation
>> usecase really needs get the memory freed. Anonymous THP will not get
>> freed right away too but it will be freed eventually when all subpages are
>> unmapped, but shmem THP would still stay in page cache.
>>
>> To protect the usecases which may prefer preserving THP, introduce a
>> new fallocate mode: FALLOC_FL_SPLIT_HPAGE, which means splitting THP is
>> preferred behavior if truncating partial THP. This mode just makes
>> sense to tmpfs for the time being.
> We need to clarify interaction with khugepaged. This implementation
> doesn't do anything to prevent khugepaged from collapsing the range back
> to THP just after the split.

Yes, it doesn't. Will clarify this in the commit log.

>
>> @@ -976,8 +1022,31 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>  			}
>>  			unlock_page(page);
>>  		}
>> +rescan_split:
>>  		pagevec_remove_exceptionals(&pvec);
>>  		pagevec_release(&pvec);
>> +
>> +		if (split && PageTransCompound(page)) {
>> +			/* The THP may get freed under us */
>> +			if (!get_page_unless_zero(compound_head(page)))
>> +				goto rescan_out;
>> +
>> +			lock_page(page);
>> +
>> +			/*
>> +			 * The extra pins from page cache lookup have been
>> +			 * released by pagevec_release().
>> +			 */
>> +			if (!split_huge_page(page)) {
>> +				unlock_page(page);
>> +				put_page(page);
>> +				/* Re-look up page cache from current index */
>> +				goto again;
>> +			}
>> +			unlock_page(page);
>> +			put_page(page);
>> +		}
>> +rescan_out:
>>  		index++;
>>  	}
> Doing get_page_unless_zero() just after you've dropped the pin for the
> page looks very suboptimal.

If I don't drop the pins the THP can't be split.
And, there might be more than one pin from find_get_entries() if I read
the code correctly. For example, when truncating 8K in the middle of a
THP, the THP's refcount would get bumped twice since two subpages would
be returned.

>