linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Shuai Xue <xueshuai@linux.alibaba.com>
To: James Morse <james.morse@arm.com>, Borislav Petkov <bp@alien8.de>
Cc: rafael@kernel.org, wangkefeng.wang@huawei.com,
	tanxiaofei@huawei.com, mawupeng1@huawei.com, tony.luck@intel.com,
	linmiaohe@huawei.com, naoya.horiguchi@nec.com,
	gregkh@linuxfoundation.org, will@kernel.org, jarkko@kernel.org,
	linux-acpi@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-edac@vger.kernel.org,
	acpica-devel@lists.linuxfoundation.org, stable@vger.kernel.org,
	x86@kernel.org, justin.he@arm.com, ardb@kernel.org,
	ying.huang@intel.com, ashish.kalra@amd.com,
	baolin.wang@linux.alibaba.com, tglx@linutronix.de,
	mingo@redhat.com, dave.hansen@linux.intel.com, lenb@kernel.org,
	hpa@zytor.com, robert.moore@intel.com, lvying6@huawei.com,
	xiexiuqi@huawei.com, zhuo.song@linux.alibaba.com
Subject: Re: [PATCH v9 0/2] ACPI: APEI: handle synchronous errors in task work with proper si_code
Date: Fri, 1 Dec 2023 11:37:47 +0800	[thread overview]
Message-ID: <e7d55b9b-9819-434e-b642-8325728b638b@linux.alibaba.com> (raw)
In-Reply-To: <1758585c-219b-c5df-a3cd-35be8b020fd2@arm.com>



On 2023/12/1 01:39, James Morse wrote:
> Hi Boris, Shuai,
> 
> On 29/11/2023 18:54, Borislav Petkov wrote:
>> On Sun, Nov 26, 2023 at 08:25:38PM +0800, Shuai Xue wrote:
>>>> On Sat, Nov 25, 2023 at 02:44:52PM +0800, Shuai Xue wrote:
>>>>> - an AR error consumed by current process is deferred to handle in a
>>>>>   dedicated kernel thread, but memory_failure() assumes that it runs in the
>>>>>   current context
>>>>
>>>> On x86? ARM?
>>>>
>>>> Pease point to the exact code flow.
> 
> 
>>> An AR error consumed by current process is deferred to handle in a
>>> dedicated kernel thread on ARM platform. The AR error is handled in bellow
>>> flow:
> 
> Please don't think of errors as "action required" - that's a user-space signal code. If
> the page could be fixed by memory-failure(), you may never get a signal. (all this was the
> fix for always sending an action-required signal)
> 
> I assume you mean the CPU accessed a poisoned location and took a synchronous error.

Yes, I mean that CPU accessed a poisoned location and took a synchronous error.
> 
> 
>>> -----------------------------------------------------------------------------
>>> [usr space task einj_mem_uc consumd data poison, CPU 3]         STEP 0
>>>
>>> -----------------------------------------------------------------------------
>>> [ghes_sdei_critical_callback: current einj_mem_uc, CPU 3]		STEP 1
>>> ghes_sdei_critical_callback
>>>     => __ghes_sdei_callback
>>>         => ghes_in_nmi_queue_one_entry 		// peak and read estatus
>>>         => irq_work_queue(&ghes_proc_irq_work) <=> ghes_proc_in_irq // irq_work
>>> [ghes_sdei_critical_callback: return]
>>> -----------------------------------------------------------------------------
>>> [ghes_proc_in_irq: current einj_mem_uc, CPU 3]			        STEP 2
>>>             => ghes_do_proc
>>>                 => ghes_handle_memory_failure
>>>                     => ghes_do_memory_failure
>>>                         => memory_failure_queue	 // put work task on current CPU
>>>                             => if (kfifo_put(&mf_cpu->fifo, entry))
>>>                                   schedule_work_on(smp_processor_id(), &mf_cpu->work);
>>>             => task_work_add(current, &estatus_node->task_work, TWA_RESUME);
>>> [ghes_proc_in_irq: return]
>>> -----------------------------------------------------------------------------
>>> // kworker preempts einj_mem_uc on CPU 3 due to RESCHED flag	STEP 3
>>> [memory_failure_work_func: current kworker, CPU 3]	
>>>      => memory_failure_work_func(&mf_cpu->work)
>>>         => while kfifo_get(&mf_cpu->fifo, &entry);	// until get no work
>>>             => memory_failure(entry.pfn, entry.flags);
>>
>> From the comment above that function:
>>
>>  * The function is primarily of use for corruptions that
>>  * happen outside the current execution context (e.g. when
>>  * detected by a background scrubber)
>>  *
>>  * Must run in process context (e.g. a work queue) with interrupts
>>  * enabled and no spinlocks held.
>>
>>> -----------------------------------------------------------------------------
>>> [ghes_kick_task_work: current einj_mem_uc, other cpu]           STEP 4
>>>                 => memory_failure_queue_kick
>>>                     => cancel_work_sync - waiting memory_failure_work_func finish
>>>                     => memory_failure_work_func(&mf_cpu->work)
>>>                         => kfifo_get(&mf_cpu->fifo, &entry); // no work
>>> -----------------------------------------------------------------------------
>>> [einj_mem_uc resume at the same PC, trigger a page fault        STEP 5
>>>
>>> STEP 0: A user space task, named einj_mem_uc consume a poison. The firmware
>>> notifies hardware error to kernel through is SDEI
>>> (ACPI_HEST_NOTIFY_SOFTWARE_DELEGATED).
>>>
>>> STEP 1: The swapper running on CPU 3 is interrupted. irq_work_queue() rasie
>>> a irq_work to handle hardware errors in IRQ context
>>>
>>> STEP2: In IRQ context, ghes_proc_in_irq() queues memory failure work on
>>> current CPU in workqueue and add task work to sync with the workqueue.
>>>
>>> STEP3: The kworker preempts the current running thread and get CPU 3. Then
>>> memory_failure() is processed in kworker.
>>
>> See above.
>>
>>> STEP4: ghes_kick_task_work() is called as task_work to ensure any queued
>>> workqueue has been done before returning to user-space.
>>>
>>> STEP5: Upon returning to user-space, the task einj_mem_uc resumes at the
>>> current instruction, because the poison page is unmapped by
>>> memory_failure() in step 3, so a page fault will be triggered.
>>>
>>> memory_failure() assumes that it runs in the current context on both x86
>>> and ARM platform.
>>>
>>>
>>> for example:
>>> 	memory_failure() in mm/memory-failure.c:
>>>
>>> 		if (flags & MF_ACTION_REQUIRED) {
>>> 			folio = page_folio(p);
>>> 			res = kill_accessing_process(current, folio_pfn(folio), flags);
>>> 		}
>>
>> And?
>>
>> Do you see the check above it?
>>
>> 	if (TestSetPageHWPoison(p)) {
>>
>> test_and_set_bit() returns true only when the page was poisoned already.
>>
>>  * This function is intended to handle "Action Required" MCEs on already
>>  * hardware poisoned pages. They could happen, for example, when
>>  * memory_failure() failed to unmap the error page at the first call, or
>>  * when multiple local machine checks happened on different CPUs.
>>
>> And that's kill_accessing_process().
>>
>> So AFAIU, the kworker running memory_failure() would only mark the page
>> as poison.
>>
>> The killing happens when memory_failure() runs again and the process
>> touches the page again.
>>
>> But I'd let James confirm here.
> 
> Yes, this is what is expected to happen with the existing code.
> 
> The first pass will remove the pages from all processes that have it mapped before this
> user-space task can restart. Restarting the task will make it access a poisoned page,
> kicking off the second path which delivers the signal.
> 
> The reason for two passes is send_sig_mceerr() likes to clear_siginfo(), so even if you
> queued action-required before leaving GHES, memory-failure() would stomp on it.
> 
> 
>> I still don't know what you're fixing here.
> 
> The problem is if the user-space process registered for early messages, it gets a signal
> on the first pass. If it returns from that signal, it will access the poisoned page and
> get the action-required signal.
> 
> How is this making Qemu go wrong?

The problem here is that we need to assume, the first pass memory failure
handle and unmap the poisoned page successfully.

- If so, it may work by the second pass action-requried signal because it
  access an unmapped page. But IMHO, we can improve by just sending one
  pass signal, so that the Guest will vmexit only once, right?

- If not, there is no second pass signal. The exist code does not handle
  the error code from memory_failure(), so a exception loop happens
  resulting a hard lockup panic.

Besides, in production environment, a second access to an already known
poison page will introduce more risk of error propagation.

> 
> 
> As to how this works for you given Boris' comments above: kill_procs() is also called from
> hwpoison_user_mappings(), which takes the flags given to memory-failure(). This is where
> the action-optional signals come from.
> 
> 

Thank you very much for involving to review and comment.

Best Regards,
Shuai


  reply	other threads:[~2023-12-01  3:38 UTC|newest]

Thread overview: 148+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20221027042445.60108-1-xueshuai@linux.alibaba.com>
2023-03-17  7:24 ` [PATCH v3 0/2] ACPI: APEI: handle synchronous exceptions " Shuai Xue
2023-03-20 18:03   ` Rafael J. Wysocki
2023-03-30  6:11     ` Shuai Xue
2023-03-30  9:52       ` Rafael J. Wysocki
2023-03-21  7:17   ` mawupeng
2023-03-22  1:27     ` Shuai Xue
2023-03-17  7:24 ` [PATCH v3 1/2] ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events Shuai Xue
2023-03-17  7:24 ` [PATCH v3 2/2] ACPI: APEI: handle synchronous exceptions in task work Shuai Xue
2023-04-06 12:39   ` Xiaofei Tan
2023-04-07  2:21     ` Shuai Xue
2023-04-08  9:13 ` [PATCH v4 0/2] ACPI: APEI: handle synchronous exceptions with proper si_code Shuai Xue
2023-04-08  9:13 ` [PATCH v4 1/2] ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events Shuai Xue
2023-04-08  9:13 ` [PATCH v4 2/2] ACPI: APEI: handle synchronous exceptions in task work Shuai Xue
2023-04-11  1:44   ` Xiaofei Tan
2023-04-11  3:16     ` Shuai Xue
2023-04-11  9:02       ` Xiaofei Tan
2023-04-11  9:48         ` Shuai Xue
2023-04-11 10:48 ` [PATCH v5 0/2] ACPI: APEI: handle synchronous exceptions with proper si_code Shuai Xue
2023-04-11 10:48 ` [PATCH v5 1/2] ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events Shuai Xue
2023-04-11 14:17   ` Kefeng Wang
2023-04-12  2:54     ` Shuai Xue
2023-04-12  3:55   ` Xiaofei Tan
2023-04-13  1:42     ` Shuai Xue
2023-04-11 10:48 ` [PATCH v5 2/2] ACPI: APEI: handle synchronous exceptions in task work Shuai Xue
2023-04-11 14:28   ` Kefeng Wang
2023-04-12  2:58     ` Shuai Xue
2023-04-12  4:05   ` Xiaofei Tan
2023-04-13  1:49     ` Shuai Xue
2023-04-12 11:27 ` [PATCH v6 0/2] ACPI: APEI: handle synchronous exceptions with proper si_code Shuai Xue
2023-04-12 11:28 ` [PATCH v6 1/2] ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events Shuai Xue
2023-04-12 11:28 ` [PATCH v6 2/2] ACPI: APEI: handle synchronous exceptions in task work Shuai Xue
2023-04-17  1:14 ` [PATCH v7 0/2] ACPI: APEI: handle synchronous exceptions with proper si_code Shuai Xue
2023-04-24  6:24   ` Shuai Xue
2023-05-08  1:55     ` Shuai Xue
2023-04-17  1:14 ` [PATCH v7 1/2] ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events Shuai Xue
2023-04-17  1:14 ` [PATCH v7 2/2] ACPI: APEI: handle synchronous exceptions in task work Shuai Xue
2023-09-19  2:21 ` [RESEND PATCH v8 0/2] ACPI: APEI: handle synchronous errors in task work with proper si_code Shuai Xue
2023-09-19  2:21 ` [RESEND PATCH v8 1/2] ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events Shuai Xue
2023-09-25 14:43   ` Jarkko Sakkinen
2023-09-26  6:23     ` Shuai Xue
2023-09-19  2:21 ` [RESEND PATCH v8 2/2] ACPI: APEI: handle synchronous exceptions in task work Shuai Xue
2023-09-25 15:00   ` Jarkko Sakkinen
2023-09-26  6:38     ` Shuai Xue
2023-10-03  8:28   ` Naoya Horiguchi
2023-10-07  2:01     ` Shuai Xue
2023-10-07  7:28 ` [PATCH v9 0/2] ACPI: APEI: handle synchronous errors in task work with proper si_code Shuai Xue
2023-11-21  1:48   ` Shuai Xue
2023-11-23 15:07   ` Borislav Petkov
2023-11-25  6:44     ` Shuai Xue
2023-11-25 12:10       ` Borislav Petkov
2023-11-26 12:25         ` Shuai Xue
2023-11-29 18:54           ` Borislav Petkov
2023-11-30  2:58             ` Shuai Xue
2023-11-30 14:40               ` Borislav Petkov
2023-11-30 17:43                 ` James Morse
2023-12-01  2:58                   ` Shuai Xue
2023-11-30 17:39             ` James Morse
2023-12-01  3:37               ` Shuai Xue [this message]
2023-10-07  7:28 ` [PATCH v9 1/2] ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events Shuai Xue
2023-11-30 17:39   ` James Morse
2023-12-01  5:22     ` Shuai Xue
2023-10-07  7:28 ` [PATCH v9 2/2] ACPI: APEI: handle synchronous exceptions in task work Shuai Xue
2023-11-30 17:39   ` James Morse
2023-12-01  7:03     ` Shuai Xue
2023-12-18  6:45 ` [PATCH v10 0/4] ACPI: APEI: handle synchronous errors in task work with proper si_code Shuai Xue
2023-12-18  6:45 ` [PATCH v10 1/4] ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events Shuai Xue
2023-12-18  6:53   ` Greg KH
2023-12-21 13:55   ` Rafael J. Wysocki
2023-12-22  1:07     ` Shuai Xue
2023-12-18  6:45 ` [PATCH v10 2/4] ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered Shuai Xue
2023-12-18  6:54   ` Greg KH
2023-12-18  6:45 ` [PATCH v10 3/4] mm: memory-failure: move memory_failure() return value documentation to function declaration Shuai Xue
2023-12-18  6:54   ` Greg KH
2023-12-18  6:45 ` [PATCH v10 4/4] ACPI: APEI: handle synchronous exceptions in task work Shuai Xue
2023-12-18  6:54   ` Greg KH
2024-02-04  8:01 ` [PATCH v11 0/3] ACPI: APEI: handle synchronous exceptions in task work to send correct SIGBUS si_code Shuai Xue
2024-02-19  1:46   ` Shuai Xue
2024-02-04  8:01 ` [PATCH v11 1/3] ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered Shuai Xue
2024-02-19  9:25   ` Borislav Petkov
2024-02-22  2:07     ` Shuai Xue
2024-02-23  5:26       ` Dan Williams
2024-02-23 12:08         ` Jonathan Cameron
2024-02-23 12:17           ` Jonathan Cameron
2024-02-24  6:08             ` Shuai Xue
2024-02-26 10:29               ` Borislav Petkov
2024-02-27  1:23                 ` Shuai Xue
2024-02-24 19:42             ` Dan Williams
2024-02-24 19:40     ` Dan Williams
2024-02-04  8:01 ` [PATCH v11 2/3] mm: memory-failure: move return value documentation to function declaration Shuai Xue
2024-02-26 10:46   ` Borislav Petkov
2024-02-27  1:27     ` Shuai Xue
2024-02-04  8:01 ` [PATCH v11 3/3] ACPI: APEI: handle synchronous exceptions in task work to send correct SIGBUS si_code Shuai Xue
2024-02-29  7:05   ` Shuai Xue
2024-03-08 10:18   ` Borislav Petkov
2024-03-12  6:05     ` Shuai Xue
2024-09-02  3:00 ` [PATCH v12 0/3] ACPI: APEI: handle synchronous errors in task work Shuai Xue
2024-09-18  2:16   ` Shuai Xue
2024-09-02  3:00 ` [PATCH v12 1/3] ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered Shuai Xue
2024-09-03 16:09   ` Jarkko Sakkinen
2024-09-05  3:04     ` Shuai Xue
2024-09-05 14:14       ` Jarkko Sakkinen
2024-09-05 14:17         ` Jarkko Sakkinen
2024-09-06  1:53           ` Shuai Xue
2024-09-06 14:42             ` Jarkko Sakkinen
2024-09-02  3:00 ` [PATCH v12 2/3] mm: memory-failure: move return value documentation to function declaration Shuai Xue
2024-09-03 16:10   ` Jarkko Sakkinen
2024-09-05  3:08     ` Shuai Xue
2024-09-02  3:00 ` [PATCH v12 3/3] ACPI: APEI: handle synchronous exceptions in task work Shuai Xue
2024-09-03 16:11   ` Jarkko Sakkinen
2024-09-05  3:09     ` Shuai Xue
2024-09-20  4:30 ` [PATCH v13 0/3] ACPI: APEI: handle synchronous errors " Shuai Xue
2024-09-20  4:30 ` [PATCH v13 1/3] ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered Shuai Xue
2024-09-20 11:35   ` Jarkko Sakkinen
2024-09-20  4:30 ` [PATCH v13 2/3] mm: memory-failure: move return value documentation to function declaration Shuai Xue
2024-09-20 11:35   ` Jarkko Sakkinen
2024-09-20  4:30 ` [PATCH v13 3/3] ACPI: APEI: handle synchronous exceptions in task work Shuai Xue
2024-09-20 11:44   ` Jarkko Sakkinen
2024-09-20 12:14     ` Shuai Xue
2024-10-14  8:42 ` [PATCH v14 0/3] ACPI: APEI: handle synchronous errors " Shuai Xue
2024-10-14  8:42 ` [PATCH v14 1/3] ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered Shuai Xue
2024-10-17  9:39   ` Jonathan Cameron
2024-10-17 23:41     ` Shuai Xue
2024-10-14  8:42 ` [PATCH v14 2/3] mm: memory-failure: move return value documentation to function declaration Shuai Xue
2024-10-17  9:41   ` Jonathan Cameron
2024-10-17 23:43     ` Shuai Xue
2024-10-14  8:42 ` [PATCH v14 3/3] ACPI: APEI: handle synchronous exceptions in task work Shuai Xue
2024-10-17  9:56   ` Jonathan Cameron
2024-10-18  0:08     ` Shuai Xue
2024-10-22  1:11   ` Shuai Xue
2024-10-25 14:40     ` Jarkko Sakkinen
2024-10-26  6:46       ` Shuai Xue
2024-10-28  8:11 ` [PATCH v15 0/3] ACPI: APEI: handle synchronous errors " Shuai Xue
2024-10-28  8:11 ` [PATCH v15 1/3] ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered Shuai Xue
2024-10-29 20:48   ` Yazen Ghannam
2024-10-30  1:54     ` Shuai Xue
2024-10-30 14:08       ` Yazen Ghannam
2024-10-31  1:36         ` Shuai Xue
2024-10-28  8:11 ` [PATCH v15 2/3] mm: memory-failure: move return value documentation to function declaration Shuai Xue
2024-10-28  8:11 ` [PATCH v15 3/3] ACPI: APEI: handle synchronous exceptions in task work Shuai Xue
2024-11-04  1:54 ` [PATCH v16 0/3] ACPI: APEI: handle synchronous errors " Shuai Xue
2024-11-04  1:54 ` [PATCH v16 1/3] ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered Shuai Xue
2024-11-05 15:09   ` Yazen Ghannam
2024-11-05 16:56     ` Jarkko Sakkinen
2024-11-06  6:07     ` Shuai Xue
2024-11-04  1:54 ` [PATCH v16 2/3] mm: memory-failure: move return value documentation to function declaration Shuai Xue
2024-11-05 15:18   ` Yazen Ghannam
2024-11-06  6:12     ` Shuai Xue
2024-11-04  1:54 ` [PATCH v16 3/3] ACPI: APEI: handle synchronous exceptions in task work Shuai Xue

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e7d55b9b-9819-434e-b642-8325728b638b@linux.alibaba.com \
    --to=xueshuai@linux.alibaba.com \
    --cc=acpica-devel@lists.linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=ardb@kernel.org \
    --cc=ashish.kalra@amd.com \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hpa@zytor.com \
    --cc=james.morse@arm.com \
    --cc=jarkko@kernel.org \
    --cc=justin.he@arm.com \
    --cc=lenb@kernel.org \
    --cc=linmiaohe@huawei.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lvying6@huawei.com \
    --cc=mawupeng1@huawei.com \
    --cc=mingo@redhat.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=rafael@kernel.org \
    --cc=robert.moore@intel.com \
    --cc=stable@vger.kernel.org \
    --cc=tanxiaofei@huawei.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=wangkefeng.wang@huawei.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    --cc=xiexiuqi@huawei.com \
    --cc=ying.huang@intel.com \
    --cc=zhuo.song@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox