From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH] mm/page_alloc: skip THP-sized PCP list when allocating non-CMA THP-sized page
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, baolin.wang@linux.alibaba.com, liuzixing@hygon.cn
References: <1717492460-19457-1-git-send-email-yangge1116@126.com>
From: yangge1116
Message-ID: <2e3a3a3f-737c-ed01-f820-87efee0adc93@126.com>
Date: Mon, 17 Jun 2024 20:47:09 +0800

On 2024/6/17 6:26 PM, Barry Song wrote:
> On Tue, Jun 4, 2024 at 9:15 PM wrote:
>>
>> From: yangge
>>
>> Since commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
>> THP-sized allocations") no longer differentiates the migration type
>> of pages in THP-sized PCP list, it's possible to get a CMA page from
>> the list, in some cases, it's not acceptable, for example, allocating
>> a non-CMA page with PF_MEMALLOC_PIN flag returns a CMA page.
>>
>> The patch forbids allocating non-CMA THP-sized page from THP-sized
>> PCP list to avoid the issue above.
>
> Could you please describe the impact on users in the commit log?

If a large amount of CMA memory is configured in the system (for
example, CMA memory accounts for 50% of system memory), starting a
virtual machine with device passthrough will get stuck.

While the virtual machine is starting, pin_user_pages_remote(...,
FOLL_LONGTERM, ...) is called to pin memory. If a page is in the CMA
area, pin_user_pages_remote() migrates the page from the CMA area to a
non-CMA area because of the FOLL_LONGTERM flag. If the non-movable
allocation request made for that migration returns CMA memory,
pin_user_pages_remote() loops endlessly.

backtrace:
pin_user_pages_remote
----__gup_longterm_locked // causes endless loops in this function
--------__get_user_pages_locked
--------check_and_migrate_movable_pages // check always fails, so migration is retried
------------migrate_longterm_unpinnable_pages
----------------alloc_migration_target // non-movable allocation

> Is it possible that some CMA memory might be used by non-movable
> allocation requests?

Yes.

> If so, will CMA somehow become unable to migrate, causing cma_alloc() to fail?

No, it will cause endless loops in __gup_longterm_locked().
If a non-movable allocation request returns CMA memory,
migrate_longterm_unpinnable_pages() migrates a CMA page to another CMA
page. That migration is useless, so the check fails again and
__gup_longterm_locked() loops endlessly.

backtrace:
pin_user_pages_remote
----__gup_longterm_locked // causes endless loops in this function
--------__get_user_pages_locked
--------check_and_migrate_movable_pages // check always fails, so migration is retried
------------migrate_longterm_unpinnable_pages

>>
>> Fixes: 5d0a661d808f ("mm/page_alloc: use only one PCP list for THP-sized allocations")
>> Signed-off-by: yangge
>> ---
>>  mm/page_alloc.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 2e22ce5..0bdf471 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -2987,10 +2987,20 @@ struct page *rmqueue(struct zone *preferred_zone,
>>          WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
>>
>>          if (likely(pcp_allowed_order(order))) {
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +                if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
>> +                                order != HPAGE_PMD_ORDER) {
>> +                        page = rmqueue_pcplist(preferred_zone, zone, order,
>> +                                        migratetype, alloc_flags);
>> +                        if (likely(page))
>> +                                goto out;
>> +                }
>
> This seems not ideal, because non-CMA THP gets no chance to use PCP. But it
> still seems better than causing the failure of CMA allocation.
>
> Is there a possible approach to avoiding adding CMA THP into pcp from the first
> beginning? Otherwise, we might need a separate PCP for CMA.
>
>> +#else
>>                  page = rmqueue_pcplist(preferred_zone, zone, order,
>>                                         migratetype, alloc_flags);
>>                  if (likely(page))
>>                          goto out;
>> +#endif
>>          }
>>
>>          page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags,
>> --
>> 2.7.4
>
> Thanks
> Barry
>
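
To make the two points above concrete, here is a rough userspace sketch. It is not kernel code. toy_may_use_pcp() copies the condition the patch adds to rmqueue(), and toy_pin_rounds() is a simplified model of the retry in __gup_longterm_locked(). The names toy_may_use_pcp, toy_pin_rounds and TOY_HPAGE_PMD_ORDER, the value 9, and the max_rounds cap are assumptions made only for illustration; the real loop has no such cap, which is why it never terminates.

```c
#include <stdbool.h>

/* Assumed value for illustration only (e.g. 2 MiB THP with 4 KiB pages). */
#define TOY_HPAGE_PMD_ORDER 9u

/*
 * Same boolean condition as the one added to rmqueue() by the patch:
 *   !IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
 *   order != HPAGE_PMD_ORDER
 * Returns true when the allocation may be served from the PCP list.
 */
static bool toy_may_use_pcp(bool cma_enabled, bool alloc_cma,
			    unsigned int order)
{
	return !cma_enabled || alloc_cma || order != TOY_HPAGE_PMD_ORDER;
}

/*
 * Simplified model of the long-term pin retry: while the page sits in
 * CMA it is migrated, and the check runs again on the migration target.
 * Returns the number of migration rounds, or -1 when max_rounds is hit
 * (a hypothetical cap; the kernel loop described above has none).
 */
static int toy_pin_rounds(bool page_in_cma, bool target_is_cma,
			  int max_rounds)
{
	int rounds = 0;

	while (page_in_cma) {
		if (rounds == max_rounds)
			return -1;	/* would loop forever without the cap */
		page_in_cma = target_is_cma;	/* page now lives in the target */
		rounds++;
	}
	return rounds;
}
```

With CMA enabled and no ALLOC_CMA, a THP-sized request skips the PCP list and goes to the buddy allocator, so the migration target is non-CMA and the model finishes after one round. If the target can still be a CMA page, the model only stops because of the artificial cap.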