From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C6AE6C0015E for ; Wed, 9 Aug 2023 13:39:18 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 422616B0071; Wed, 9 Aug 2023 09:39:18 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3D3326B0072; Wed, 9 Aug 2023 09:39:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 29B326B0075; Wed, 9 Aug 2023 09:39:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 14F246B0071 for ; Wed, 9 Aug 2023 09:39:18 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id E64C61A0C41 for ; Wed, 9 Aug 2023 13:39:17 +0000 (UTC) X-FDA: 81104672754.14.4EE62BB Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf28.hostedemail.com (Postfix) with ESMTP id 23966C0005 for ; Wed, 9 Aug 2023 13:39:14 +0000 (UTC) Authentication-Results: imf28.hostedemail.com; dkim=none; spf=pass (imf28.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1691588355; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WldecN6R3cTLDxcj3wMgyUdPXQZRnNjt/80ZLUzl3N8=; b=Tu+JLfGFsYo687LNj4XYQT09XE6WNksQov9GcJADTbpCyxF86fzZKn1izDxv2hzCpSsiaB lP7Oojc48KPComso4x70TsrY4W8wrUQwDg1nRKNe6L07/dWy2qeB4xGOIJFkePg5Ymfm1s H5urXkRHpd0qmtPR7N/Y2aBnt9dnrTY= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1691588355; a=rsa-sha256; cv=none; b=gI9mDBog8VYKC23Dm4A+4yE019sl5l7Vto/KmNaKutC7jGMJe+hpoNuUCI4T9rqYpx1ZKX lsMe3d0j8nyRLY+ds1/KUWGn2+k57uSJWjC/8w89ytqE2aG2LMHJqUZs15T9oIUSQz39tJ 31TarFGlzQ8wOcOivj5s7IAlbYquDcQ= ARC-Authentication-Results: i=1; imf28.hostedemail.com; dkim=none; spf=pass (imf28.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6F724D75; Wed, 9 Aug 2023 06:39:56 -0700 (PDT) Received: from [10.57.79.142] (unknown [10.57.79.142]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 83F963F6C4; Wed, 9 Aug 2023 06:39:12 -0700 (PDT) Message-ID: Date: Wed, 9 Aug 2023 14:39:11 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH 2/2] selftests/migration: Disable NUMA balancing and check migration status To: David Hildenbrand , Alistair Popple Cc: Andrew Morton , linux-mm@kvack.org, mhocko@suse.com, jhubbard@nvidia.com, ying.huang@intel.com, osalvador@suse.de, baolin.wang@linux.alibaba.com, ziy@nvidia.com, shy828301@gmail.com References: <20230807063945.911582-1-apopple@nvidia.com> <20230807063945.911582-2-apopple@nvidia.com> <9de42ace-dab1-5f60-af8a-26045ada27b9@arm.com> <87leen2f4y.fsf@nvdebian.thelocal> <460ae27d-987f-ec45-9d37-1566f7c4f576@arm.com> <87fs4sal9n.fsf@nvdebian.thelocal> Content-Language: en-GB From: Ryan Roberts In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: 23966C0005 X-Rspam-User: X-Stat-Signature: nsqqrrsygotshpxaiw3y3nbd4gtbia47 X-Rspamd-Server: rspam03 X-HE-Tag: 1691588354-955265 X-HE-Meta: U2FsdGVkX1+2Uj7MaXm653DKTD6RVehJsJE9cJJWROnD3QKBT5BrvMIOkCGZCPUxtbgUtqbR+6PJdoIDErcU060txqV0ipEhr2DXPn4lWZ0BtQ9x3L3dm7J/KAJGzFgbtASlXYLVO2Av3zNWCJ398uMYMlTggh9LCg+Ac0yXJ8mv8ok5amWyqY6pXR9YaDATX75NHQOGnA0fIscBBCALiwY23UNnFzzbCjhXEjdC87WAnYPSSHcYOP8xi3NU9FHCQ+MSBTcy7s8d6IoMhOId0vw0Ngqaqd2+ea16lU+L+cXJ3MC6NsG5202IouWhtwLmDLvHoVuZTwrHP3doXt0wu5NJK+lynreMtuSzvuesmVLu+qOfCsukiHr+XfkV2NCfqpfTbEFB02SysefUTf9LmtlDmZRdwPekp45m0dB1bTB0C5Mi8DyWbgrBzmF62zrBGX9GXscyK+PDxVYzH9m/XtJnolDvV4IBCwx6tqpY0+6qDfX4bO2JZnV9ooeMG2TcW5tYrYZmzF8dGR4zFaFLTlrkmk+EAJMte63O6jyEjugVVCwjglNEBZiXwHsZXkt17EmLoMkNGz17N5AX5zbPy28oW/dJlD5KfnLWmNvDqc/WYg3H1sezhqw1LCfFyuRb8ztJypqUpccbxLYqX+dcEHmF/Y+sjpnBIkWgjSgMyCkH/TITxP7lLIuh9TXOfDRUlcn0HjBZhyxGt48LVB84cvLICJg7xr6SNioHmZPzrxXqkYMfQ9+nH998U0R1AlyxO9vpnwT9Ixgg5BLojFccIEj5agnk421Snyu6Gt6t7e0OEH5oJkHT6pr94ihH4VrRRDhbdKGVY9efegJVvKr+qn7+wXBCpdBT8hj9rBKFpJrAy+JB1kpN7CwC00lVWTQOWBPztThBUkMjNi8Vp7Mt6wQLyWAoB89ZWwtCokj84NodifErAQy28r5eoRo6F9uhYJqozdCXGcSiluoqNou H0RdA7Rx cCdquQSA5KPhYtXwGz1cSb6U+sidix+fEJocpU3cBbYa1PrdAt/CfOlYiZKqVH9nHVPJd8QsJNPbVqoPT2JPHryFp63jg9RteSrCAritzG0hXE1T3aYfjyheZ4GREJCtKv9ksU7JkWJ56Zq3jWVKmjxx3ul1rhU1CGQYvaiLbiUeu0Q6YR5rONlKwxcf+oL1FSyUkXdbQQNFYaE8qpQG9e8nj2Qn11hap6JW8bEvl0CMcwInTnQV0vQ9+pbU/27a4n0DsUSzEhdNdgi8LLsdlHL2sVpBiRoArIZI0vLgyuH53lD2mp2ZJDPUknEO07QxNLd0HXb69pcGhBCFWjohM2guYXd5C0MX2zPjic7WrU22x/AS728XhDLmcBvSLwTfM+ZQmP0G1g/jcsG+lsJtrEtk8+GgoUUb+HOm+h2NUktYyLSmesXNvrZi6EC2GtIMN5IsIzZp3pyswvWouNGCdE2iDoqG5b5AuWuRuReSDSrPetxf+xQ8hQ+WXASZ8uvl6NcqGTJmx/m5X1GE= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 09/08/2023 10:34, David Hildenbrand wrote: > On 09.08.23 06:21, Alistair Popple wrote: >> >> Ryan Roberts writes: >> >>> On 07/08/2023 13:41, Alistair Popple wrote: >>>> >>>> Ryan Roberts writes: >>>> >>>>> On 07/08/2023 07:39, Alistair Popple wrote: >>>>>> The migration selftest was only checking the return code and not the >>>>>> status array for migration success/failure. Update the test to check >>>>>> both. This uncovered a bug in the return code handling of >>>>>> do_pages_move(). >>>>>> >>>>>> Also disable NUMA balancing as that can lead to unexpected migration >>>>>> failures. >>>>>> >>>>>> Signed-off-by: Alistair Popple >>>>>> Suggested-by: Ryan Roberts >>>>>> --- >>>>>> >>>>>> Ryan, this will still cause the test to fail if a migration failed. I >>>>>> was unable to reproduce a migration failure for any cases on my system >>>>>> once I disabled NUMA balancing though so I'd be curious if you are >>>>>> still seeing failures with this patch applied. AFAIK there shouldn't >>>>>> be anything else that would be causing migration failure so would like >>>>>> to know what is causing failures. Thanks! >>>>> >>>>> >>>>> Hi Alistair, >>>>> >>>>> Afraid I'm still seeing unmigrated pages when running with these 2 patches: >>>>> >>>>> >>>>> #  RUN           migration.shared_anon ... >>>>> Didn't migrate 1 pages >>>>> # migration.c:183:shared_anon:Expected migrate(ptr, self->n1, self->n2) >>>>> (-2) == 0 (0) >>>>> # shared_anon: Test terminated by assertion >>>>> #          FAIL  migration.shared_anon >>>>> not ok 2 migration.shared_anon >>>>> >>>>> >>>>> I added some instrumentation; it usually fails on the second time >>>>> through the loop in migrate() but I've also seen it fail the first >>>>> time. Never seen it get though 2 iterations successfully though. >>>> >>>> Interesting. I guess migration failure is always possible for various >>>> reasons so I will update the test to report the number of failed >>>> migrations rather than making it a test failure. >>> >>> I find it odd that migration always succeeds for the private_anon and >>> private_anon_thp cases, but fails for the fork case. I guess there is something >>> about the page being RO (for CoW) or having higher mapcount/refcount that causes >>> the migration to fail? >> >> But the fork case uses a shared mapping, so there shouldn't be any >> RO/CoW AFAIK. A higher refcount than expected would cause migration >> failure, but there's nothing I can think of that would be causing that >> now NUMA balancing is disabled in the test (that caused migration >> failures for me in the private cases due to the concurrent migration >> attempts). > > Yes, we do have MAP_SHARED | MAP_ANONYMOUS, which is just shmem with extra-steps. Yep my bad - not reading the code properly. I assumed it was private because the other one is. > > In add_page_for_migration(), we do have > >     if (page_mapcount(page) > 1 && !migrate_all) > > but the test seems to always specify MPOL_MF_MOVE_ALL when calling move_pages(), > so it should be ignored. > > It would be helpful to dump the page when migration fails in that case. > > > Note that in add_page_for_migration(), we perform a follow_page() that used to > accidentally honor NUMA hinting faults. Fixed by > > commit 3c471b04db7604974e3bea45e2d9c7c27a905536 > Author: David Hildenbrand > Date:   Thu Aug 3 16:32:02 2023 +0200 > >     mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT > > in mm-unstable. This fix does not change the behaviour of the test; it still fails. HOWEVER... It turns out the test has started passing in mm-unstable due to a commit after this one. See bisect: Note terms are inverted: good => test *fails* bad => test *passes* Initial bounds: good: 3f943756e8b3 => commit suggested by DavidH (test still failing there) bad: 84d9f657acae => mm-unstable head as of a few days ago All of these commits are from mm-unstable; none are in v6.5-rc3, which is where I originally reported the issue. git bisect start # bad: [84d9f657acaecc43dc52f25d52230db85fd5ffdd] mm: move vma locking out of vma_prepare and dup_anon_vma git bisect bad 84d9f657acaecc43dc52f25d52230db85fd5ffdd # good: [3f943756e8b3e0b5d0ea1f3087658eb559b0c7b0] mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT git bisect good 3f943756e8b3e0b5d0ea1f3087658eb559b0c7b0 # good: [7dfbad370c66f35789d193a758575d31aabd25a4] selftests/mm: optionally pass duration to transhuge-stress git bisect good 7dfbad370c66f35789d193a758575d31aabd25a4 # bad: [fc99f767390266b5436575c00445d4445f6c9be6] mips: convert various functions to use ptdescs git bisect bad fc99f767390266b5436575c00445d4445f6c9be6 # bad: [af89742c0bf319979e00cfb066ead6b510f3e296] powerpc/book3s64/vmemmap: switch radix to use a different vmemmap handling function git bisect bad af89742c0bf319979e00cfb066ead6b510f3e296 # good: [dfebce290a7b44985eda4ddd76378cdc82e3541c] maple_tree: adjust node allocation on mas_rebalance() git bisect good dfebce290a7b44985eda4ddd76378cdc82e3541c # good: [9517da22a74a49102bcd6c8556eeceaca965b0a6] mm: move FAULT_FLAG_VMA_LOCK check down from do_fault() git bisect good 9517da22a74a49102bcd6c8556eeceaca965b0a6 # bad: [5ec6b3644e50d859ebf4cba5cc29cfbda0e6d109] mm: change pudp_huge_get_and_clear_full take vm_area_struct as arg git bisect bad 5ec6b3644e50d859ebf4cba5cc29cfbda0e6d109 # bad: [bbecc62bae72ec22e4276464a5ef359511923954] mm: handle faults that merely update the accessed bit under the VMA lock git bisect bad bbecc62bae72ec22e4276464a5ef359511923954 # bad: [2132f14c5bc1b10ea964ab89bd6291fc05eaccaa] mm: handle swap and NUMA PTE faults under the VMA lock git bisect bad 2132f14c5bc1b10ea964ab89bd6291fc05eaccaa # bad: [8890e186b3470fc690d3022656e98c0c41e27eca] mm: run the fault-around code under the VMA lock git bisect bad 8890e186b3470fc690d3022656e98c0c41e27eca # first bad commit: [8890e186b3470fc690d3022656e98c0c41e27eca] mm: run the fault-around code under the VMA lock First commit where test passes: commit 8890e186b3470fc690d3022656e98c0c41e27eca Author: Matthew Wilcox (Oracle) Date: Mon Jul 24 19:54:08 2023 +0100 mm: run the fault-around code under the VMA lock The map_pages fs method should be safe to run under the VMA lock instead of the mmap lock. This should have a measurable reduction in contention on the mmap lock. Link: https://lkml.kernel.org/r/20230724185410.1124082-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Suren Baghdasaryan Cc: Arjun Roy Cc: Eric Dumazet Cc: Punit Agrawal Signed-off-by: Andrew Morton mm/memory.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) It's not really clear to me why this change should turn the fail into a pass though... Perhaps migration tries to get the mmap_lock and if it fails, then it skips migration? > > > Maybe that fix does no longer require you to disable NUMA hinting for this test > case. Maybe you're lucky and it resolves the  shared_anon case as well, but I > somewhat doubt it :) > > It doesn't really explain why the shared case would fail, though... >