From: Baolin Wang
Date: Thu, 20 Oct 2022 15:15:26 +0800
Subject: Re: [RFC PATCH] mm: Introduce new MADV_NOMOVABLE behavior
To: David Hildenbrand, akpm@linux-foundation.org
Cc: arnd@arndb.de, jingshan@linux.alibaba.com, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org
Message-ID: <70610ea1-5932-a19f-5eba-c4fba06335da@linux.alibaba.com>
In-Reply-To: <470dc638-a300-f261-94b4-e27250e42f96@redhat.com>
References: <6227ba4c-9455-9652-7434-7842b2b3edcb@redhat.com> <8007f4fc-d2e6-7aae-7297-805326adce2a@linux.alibaba.com> <470dc638-a300-f261-94b4-e27250e42f96@redhat.com>

On 10/19/2022 11:17 PM, David Hildenbrand wrote:
>> I observed one migration failure case (which is not easy to reproduce)
>> in which the 'thp_migration_fail' count is 1 and the
>> 'thp_split_page_failed' count is also 1.
>>
>> That means that when migrating a THP which is in a CMA area, we can not
>> allocate a new THP due to memory fragmentation, so it will split the
>> THP. However, the THP split also failed, probably because of a temporary
>> reference count on this THP. The temporary reference count can be
>> caused by dropping page caches (I observed the drop caches operation in
>> the system), but we can not drop the shmem page caches because they are
>> already dirty at that time.
>>
>> So can we try again in migrate_pages() if the THP split fails, to
>> mitigate the migration failure, especially when the failure reason is a
>> temporary reference count? Does this sound reasonable to you?
>
> It sounds reasonable, and I understand that debugging these issues is
> tricky. But we really have to figure out the root cause to make these
> pages that are indeed movable (but only temporarily not movable for
> reason XYZ) movable.
>
> We'd need some indication to retry migration longer / again.

OK. Let me try this and see if there are other possible failure cases in
the products.

>>
>> However, I am still worried that there are other possible cases that can
>> cause migration failure, so no CMA allocation for our case seems more
>> stable IMO.
>
> Yes, I can understand that. But as one example, your approach doesn't
> handle the case that a page that was allocated on !CMA/!ZONE_MOVABLE
> would get migrated to CMA/ZONE_MOVABLE just before you would try pinning
> the page (to migrate it again off CMA/ZONE_MOVABLE).

Indeed, like you said before, it is just helpful to minimize page migration
for now. Maybe I can take MADV_PINNABLE into consideration when allocating
new pages, such as in alloc_migration_target(). Anyway, let me try to fix
the root cause first to see if it can solve our problem.

> We really have to fix the root cause.

OK. Thanks for your input.
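
For illustration only, the "retry if the THP split fails because of a
temporary reference" idea discussed above could be modeled in user space
roughly as below. This is a minimal sketch and not kernel code: struct
fake_thp, try_split() and migrate_with_retry() are hypothetical names
invented for this example, and the model simply assumes that each failed
attempt gives one temporary reference time to be released.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a THP; not a kernel structure. */
struct fake_thp {
	int extra_refs;	/* temporary references that block a split */
};

/*
 * Simulated split: fails while any temporary reference is held.
 * Each failed call drops one reference, modeling (as an assumption)
 * the holder releasing it between attempts.
 */
static bool try_split(struct fake_thp *thp)
{
	if (thp->extra_refs > 0) {
		thp->extra_refs--;
		return false;
	}
	return true;
}

/*
 * Try the split once, then retry up to max_retries more times.
 * Returns the number of attempts used on success, or -1 if every
 * attempt failed.
 */
static int migrate_with_retry(struct fake_thp *thp, int max_retries)
{
	for (int attempt = 1; attempt <= max_retries + 1; attempt++) {
		if (try_split(thp))
			return attempt;
	}
	return -1;
}
```

With two temporary references and three retries allowed, the sketch
succeeds on the third attempt; with five references it gives up. The retry
bound matters because a reference that is not actually temporary would
otherwise be retried forever.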