Date: Mon, 7 Jul 2025 14:37:15 +0200
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
To: Jinjiang Tu, Oscar Salvador
Cc: akpm@linux-foundation.org, linmiaohe@huawei.com, mhocko@kernel.org, linux-mm@kvack.org, wangkefeng.wang@huawei.com
Subject: Re: [PATCH v2 2/2] mm/memory_hotplug: fix hwpoisoned large folio handling in do_migrate_range
In-Reply-To: <924d9d25-e53c-f159-6ec0-e1fd4e96d6e2@huawei.com>
References: <20250627125747.3094074-1-tujinjiang@huawei.com> <20250627125747.3094074-3-tujinjiang@huawei.com> <373d02c5-2b62-8543-b786-8fd591ad56eb@huawei.com> <61325284-d1d6-a973-8aa7-c0f226db95fa@huawei.com> <7b2c054b-fc33-4127-aaa9-9edf6a63e142@redhat.com> <924d9d25-e53c-f159-6ec0-e1fd4e96d6e2@huawei.com>

On 07.07.25 13:51, Jinjiang Tu wrote:
>
> On 2025/7/3 17:06, David Hildenbrand wrote:
>> On 03.07.25 10:24, Jinjiang Tu wrote:
>>>
>>> On 2025/7/3 15:57, David Hildenbrand wrote:
>>>> On 03.07.25 09:46, Jinjiang Tu wrote:
>>>>>
>>>>> On 2025/7/1 22:21, Oscar Salvador wrote:
>>>>>> On Fri, Jun 27, 2025 at 08:57:47PM +0800, Jinjiang Tu wrote:
>>>>>>> In do_migrate_range(), the hwpoisoned folio may be large folio, which
>>>>>>> can't be handled by unmap_poisoned_folio().
>>>>>>>
>>>>>>> I can reproduce this issue in qemu after adding delay in
>>>>>>> memory_failure()
>>>>>>>
>>>>>>> BUG: kernel NULL pointer dereference, address: 0000000000000000
>>>>>>> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
>>>>>>> RIP: 0010:try_to_unmap_one+0x16a/0xfc0
>>>>>>>
>>>>>>>     rmap_walk_anon+0xda/0x1f0
>>>>>>>     try_to_unmap+0x78/0x80
>>>>>>>     ? __pfx_try_to_unmap_one+0x10/0x10
>>>>>>>     ? __pfx_folio_not_mapped+0x10/0x10
>>>>>>>     ? __pfx_folio_lock_anon_vma_read+0x10/0x10
>>>>>>>     unmap_poisoned_folio+0x60/0x140
>>>>>>>     do_migrate_range+0x4d1/0x600
>>>>>>>     ? slab_memory_callback+0x6a/0x190
>>>>>>>     ? notifier_call_chain+0x56/0xb0
>>>>>>>     offline_pages+0x3e6/0x460
>>>>>>>     memory_subsys_offline+0x130/0x1f0
>>>>>>>     device_offline+0xba/0x110
>>>>>>>     acpi_bus_offline+0xb7/0x130
>>>>>>>     acpi_scan_hot_remove+0x77/0x290
>>>>>>>     acpi_device_hotplug+0x1e0/0x240
>>>>>>>     acpi_hotplug_work_fn+0x1a/0x30
>>>>>>>     process_one_work+0x186/0x340
>>>>>>>
>>>>>>> In this case, just make offline_pages() fail.
>>>>>>>
>>>>>>> Besides, do_migrate_range() may be called between memory_failure set
>>>>>>> hwpoison flag and isolate the folio from lru, so remove WARN_ON().
>>>>>>> In other places, unmap_poisoned_folio() is called when the folio is
>>>>>>> isolated, obey it in do_migrate_range() too.
>>>>>>>
>>>>>>> Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
>>>>>>> Signed-off-by: Jinjiang Tu
>>>>>> ...
>>>>>>> @@ -2041,11 +2048,9 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>>>>>>>                    ret = scan_movable_pages(pfn, end_pfn, &pfn);
>>>>>>>                 if (!ret) {
>>>>>>> -                /*
>>>>>>> -                 * TODO: fatal migration failures should bail
>>>>>>> -                 * out
>>>>>>> -                 */
>>>>>>> -                do_migrate_range(pfn, end_pfn);
>>>>>>> +                ret = do_migrate_range(pfn, end_pfn);
>>>>>>> +                if (ret)
>>>>>>> +                    break;
>>>>>> I am not really sure about this one.
>>>>>> I get the reason you're adding it, but note that migrate_pages() can
>>>>>> also return "fatal" errors and we don't propagate that.
>>>>>>
>>>>>> The motto has always been to migrate as much as possible, and this
>>>>>> changes this behaviour.
>>>>> If we just skip to next pfn, offline_pages() will deadloop meaningless
>>>>> until received signal.
>>>>
>>>> Yeah, that's also not good,
>>>>
>>>>> It seems there is no document to guarantee memory offline have to
>>>>> migrate as much as possible.
>>>>
>>>> We should try offlining as good as possible. But if there is something
>>>> we just cannot possibly migrate, there is no sense in retrying.
>>>>
>>>> Now, could we run into this case here because we are racing with other
>>>> code, and actually retrying again could make it work?
>>>>
>>>> Remind me again: how exactly do we arrive at this point of having a
>>>> large folio that is hwpoisoned but still mapped?
>>>>
>>>> In memory_failure(), we do on a large folio
>>>>
>>>> 1) folio_set_has_hwpoisoned
>>>> 2) try_to_split_thp_page
>>>> 3) if splitting fails, kill_procs_now
>>> If 2) is executed when do_migrate_range() increment the refcount of the
>>> folio, the split fails, and retry is meaningless.
>>
>> kill_procs_now will kill all processes, effectively unmapping the
>> folio in that case?
>>
>> So retrying would later just ... get us an unmapped folio and we can
>> make progress?
>>
> kill_procs_now()->collect_procs() collects the tasks to kill. But not
> all tasks that maps the folio will be collected,
> collect_procs_anon()->task_early_kill()->find_early_kill_thread() will not
> select the task (not current) if PF_MCE_PROCESS isn't set and
> sysctl_memory_failure_early_kill isn't enabled (this is the default
> behaviour).

I think you're right, that's rather nasty.

We fail to split, but keep the folio mapped into some processes.

And we can't unmap it because unmap_poisoned_folio() does not properly
support large folios yet.

We really should unmap the folio when splitting fails. :(

-- 
Cheers,

David / dhildenb