Message-ID: <772a2c59-7616-4ec7-9050-17d3abf0b6eb@collabora.com>
Date: Fri, 12 Jan 2024 11:16:32 +0500
Cc: Muhammad Usama Anjum, linmiaohe@huawei.com, mike.kravetz@oracle.com,
 naoya.horiguchi@nec.com, akpm@linux-foundation.org,
 songmuchun@bytedance.com, shy828301@gmail.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, jthoughton@google.com,
 "kernel@collabora.com", "Matthew Wilcox (Oracle)", Linux Regressions
Subject: Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
To: Jiaqi Yan, Sidhartha Kumar
References: <20230713001833.3778937-1-jiaqiyan@google.com>
 <20230713001833.3778937-5-jiaqiyan@google.com>
 <079335ab-190f-41f7-b832-6ffe7528fd8b@collabora.com>
From: Muhammad Usama Anjum
In-Reply-To: <079335ab-190f-41f7-b832-6ffe7528fd8b@collabora.com>

On 1/10/24 3:15 PM, Muhammad Usama Anjum wrote:
> On 1/10/24 11:49 AM, Muhammad Usama Anjum wrote:
>> On 1/6/24 2:13 AM, Jiaqi Yan wrote:
>>> On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm trying to convert this test to TAP as I think the failures sometimes go
>>>> unnoticed on CI systems if we only depend on the return value of the
>>>> application. I've enabled the following configurations which aren't already
>>>> present in tools/testing/selftests/mm/config:
>>>> CONFIG_MEMORY_FAILURE=y
>>>> CONFIG_HWPOISON_INJECT=m
>>>>
>>>> I'll send a patch to add these configs later. Right now I'm trying to
>>>> investigate the failure when we are trying to inject the poison page by
>>>> madvise(MADV_HWPOISON). I'm getting device busy every single time. The test
>>>> fails as it doesn't expect the hugetlb memory to be busy. I'm not
>>>> sure if the poison handling code has issues or the test isn't robust enough.
>>>>
>>>> ./hugetlb-read-hwpoison
>>>> Write/read chunk size=0x800
>>>> ... HugeTLB read regression test...
>>>> ... ... expect to read 0x200000 bytes of data in total
>>>> ... ... actually read 0x200000 bytes of data in total
>>>> ... HugeTLB read regression test...TEST_PASSED
>>>> ... HugeTLB read HWPOISON test...
>>>> [ 9.280854] Injecting memory failure for pfn 0x102f01 at process virtual
>>>> address 0x7f28ec101000
>>>> [ 9.282029] Memory failure: 0x102f01: huge page still referenced by 511
>>>> users
>>>> [ 9.282987] Memory failure: 0x102f01: recovery action for huge page: Failed
>>>> ... !!! MADV_HWPOISON failed: Device or resource busy
>>>> ... HugeTLB read HWPOISON test...TEST_FAILED
>>>>
>>>> I'm testing on v6.7-rc8. Not sure if this was working previously or not.
>>>
>>> Thanks for reporting this, Usama!
>>>
>>> I am also able to repro the MADV_HWPOISON failure at "501a06fe8e4c
>>> (akpm/mm-stable, mm-stable) zswap: memcontrol: implement zswap
>>> writeback disabling."
>>>
>>> Then I checked out the earliest commit "ba91e7e5d15a (HEAD -> Base)
>>> selftests/mm: add tests for HWPOISON hugetlbfs read". The
>>> MADV_HWPOISON injection works and the test passes:
>>>
>>> ... HugeTLB read HWPOISON test...
>>> ... ... expect to read 0x101000 bytes of data in total
>>> ... !!! read failed: Input/output error
>>> ... ... actually read 0x101000 bytes of data in total
>>> ... HugeTLB read HWPOISON test...TEST_PASSED
>>> ... HugeTLB seek then read HWPOISON test...
>>> ... ... init val=4 with offset=0x102000
>>> ... ... expect to read 0xfe000 bytes of data in total
>>> ... ... actually read 0xfe000 bytes of data in total
>>> ... HugeTLB seek then read HWPOISON test...TEST_PASSED
>>> ...
>>>
>>> [ 2109.209225] Injecting memory failure for pfn 0x3190d01 at process
>>> virtual address 0x7f75e3101000
>>> [ 2109.209438] Memory failure: 0x3190d01: recovery action for huge
>>> page: Recovered
>>> ...
>>>
>>> I think something in between broke MADV_HWPOISON on hugetlbfs, and we
>>> should be able to figure it out via bisection (and of course by
>>> reading delta commits between them, probably related to page
>>> refcount).
>> Thank you for this information.
>>
>>>
>>> That being said, I will be on vacation from tomorrow until the end of
>>> next week. So I will get back to this after next weekend. Meanwhile if
>>> you want to go ahead and bisect the problematic commit, that will be
>>> very much appreciated.
>> I'll try to bisect and post here if I find something.
> Found the culprit commit by bisection:
>
> a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3
> mm/filemap: remove hugetlb special casing in filemap.c

#regzbot title: hugetlbfs hwpoison handling
#regzbot introduced: a08c7193e4f1
#regzbot monitor: https://lore.kernel.org/all/20240111191655.295530-1-sidhartha.kumar@oracle.com

>
> hugetlb-read-hwpoison started failing from this patch. I've added the
> author of this patch to this bug report.
>
>>
>>>
>>> Thanks,
>>> Jiaqi
>>>
>>>
>>>>
>>>> Regards,
>>>> Usama
>>>>
>

-- 
BR,
Muhammad Usama Anjum
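
For illustration only, a minimal sketch of the kind of TAP-style result
line the conversion mentioned in the quoted report is aiming for. The
helper name tap_format_result() and its signature are hypothetical and are
not taken from hugetlb-read-hwpoison.c; the sketch assumes the caller
passes 0 on success or the errno value left by a failed call such as
madvise(MADV_HWPOISON).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: format one TAP result line.
 * err == 0 means the step passed; any other value is treated as an errno
 * code (for example EBUSY) and reported in a trailing "# ..." comment.
 * Returns whatever snprintf() returns.
 */
int tap_format_result(char *buf, size_t len, unsigned int num,
		      const char *name, int err)
{
	assert(buf != NULL && name != NULL);

	if (err == 0)
		return snprintf(buf, len, "ok %u %s\n", num, name);

	/* A failing step becomes "not ok", so CI sees it without the exit code. */
	return snprintf(buf, len, "not ok %u %s # %s\n", num, name,
			strerror(err));
}
```

With this sketch, the failing run quoted above would yield a line starting
with "not ok 2 HugeTLB read HWPOISON test" instead of relying only on the
process return value. In the real selftest, the in-tree kselftest.h
helpers would likely be the natural choice over a standalone function like
this one.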