From: zhenwei pi
Date: Mon, 30 May 2022 19:33:35 +0800
Subject: Re: Re: [PATCH 0/3] recover hardware corrupted page by virtio balloon
To: David Hildenbrand, Peter Xu, Jue Wang, Paolo Bonzini
Cc: Andrew Morton, jasowang@redhat.com, LKML, Linux MM, mst@redhat.com, HORIGUCHI NAOYA(堀口 直也), qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org
Message-ID: <4b0c3e37-b882-681a-36fc-16cee7e1fff0@bytedance.com>
In-Reply-To: <0d266c61-605d-ce0c-4274-b0c7e10f845a@redhat.com>
References: <24a95dea-9ea6-a904-7c0b-197961afa1d1@bytedance.com> <0d266c61-605d-ce0c-4274-b0c7e10f845a@redhat.com>

On 5/30/22 15:41, David Hildenbrand wrote:
> On 27.05.22 08:32, zhenwei pi wrote:
>> On 5/27/22 02:37, Peter Xu wrote:
>>> On Wed, May 25, 2022 at 01:16:34PM -0700, Jue Wang wrote:
>>>> The hypervisor _must_ emulate poisons identified in guest physical
>>>> address space (could be transported from the source VM), this is to
>>>> prevent silent data corruption in the guest. With a paravirtual
>>>> approach like this patch series, the hypervisor can clear some of the
>>>> poisoned HVAs knowing for certain that the guest OS has isolated the
>>>> poisoned page. I wonder how much value it provides to the guest if the
>>>> guest and workload are _not_ in a pressing need for the extra KB/MB
>>>> worth of memory.
>>>
>>> I'm curious the same on how unpoisoning could help here. The reasoning
>>> behind would be great material to be mentioned in the next cover letter.
>>>
>>> Shouldn't we consider migrating serious workloads off the host already
>>> where there's a sign of more severe hardware issues, instead?
>>>
>>> Thanks,
>>>
>>
>> I'm maintaining 1000,000+ virtual machines, from my experience:
>> UE is quite unusual and occurs randomly, and I did not hit UE storm case
>> in the past years. The memory also has no obvious performance drop after
>> hitting UE.
>>
>> I hit several CE storm case, the performance memory drops a lot. But I
>> can't find obvious relationship between UE and CE.
>>
>> So from the point of my view, to fix the corrupted page for VM seems
>> good enough. And yes, unpoisoning several pages does not help
>> significantly, but it is still a chance to make the virtualization better.
>>
>
> I'm curious why we should care about resurrecting a handful of poisoned
> pages in a VM. The cover letter doesn't touch on that.
>
> IOW, I'm missing the motivation why we should add additional
> code+complexity to unpoison pages at all.
>
> If we're talking about individual 4k pages, it's certainly sub-optimal,
> but does it matter in practice? I could understand if we're losing
> megabytes of memory. But then, I assume the workload might be seriously
> harmed either way already?
>

Yes, resurrecting a handful of poisoned pages does not help
significantly. Still, in some ways, it seems nice to have. :D

Consider a VM whose RAM is backed by 2M huge pages. Once an MCE occurs
(at HVAy in [HVAx, HVAz)), the whole 2M range [HVAx, HVAz) becomes
inaccessible on the hypervisor, but the guest poisons only the 4K page
at GPAy in [GPAx, GPAz). The guest may then hit up to another 511 MCEs
(anywhere in [GPAx, GPAz) except GPAy). This is the worst case, which is
why I want to add '__le32 corrupted_pages' to struct
virtio_balloon_config. It is used in the next step: by reporting
512 * 4K 'corrupted_pages' to the guest, the guest gets a chance to
isolate the other 511 pages ahead of time. And since the guest actually
loses 2M in this case, fixing 512 * 4K does seem to help significantly.

>
> I assume when talking about "the performance memory drops a lot", you
> imply that this patch set can mitigate that performance drop?
>
> But why do you see a performance drop? Because we might lose some
> possible THP candidates (in the host or the guest) and you want to plug
> does holes? I assume you'll see a performance drop simply because
> poisoning memory is expensive, including migrating pages around on CE.
>
> If you have some numbers to share, especially before/after this change,
> that would be great.
>

A CE storm leads to two problems that I have seen:
1. Memory bandwidth drops to 10%~20%, and the CPU's cycles per
   instruction increase a lot.
2. The THR interrupt (see /proc/interrupts) fires frequently, so the CPU
   spends a lot of time handling IRQs.

But no corrupted page occurs. Migrating the VM to another healthy host
seems a good choice. This patch series does not handle the CE storm case.

-- 
zhenwei pi
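
For reference, the 2M/4K arithmetic in the huge page case above can be sketched in a few lines of C. This is only an illustration, not code from the patch series: the helper names huge_range_start() and sibling_pages(), the macro names, and the example address are all made up here, and it assumes the guest range [GPAx, GPAz) is 2M-aligned in the same way as the host range [HVAx, HVAz).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SMALL_PAGE_SIZE   4096ULL                 /* 4K page poisoned by the guest */
#define HUGE_PAGE_SIZE    (2ULL * 1024 * 1024)    /* 2M huge page lost on the host */
#define SUBPAGES_PER_HUGE (HUGE_PAGE_SIZE / SMALL_PAGE_SIZE)   /* 512 */

/* Hypothetical helper: start of the 2M-aligned range containing addr (GPAx). */
static uint64_t huge_range_start(uint64_t addr)
{
	return addr & ~(HUGE_PAGE_SIZE - 1);
}

/*
 * Hypothetical helper: collect the 4K pages of the 2M range that the guest
 * has NOT poisoned yet, i.e. every page in [GPAx, GPAz) except the one
 * containing poisoned_addr. At most cap entries are stored in out[]; the
 * return value is the total number of such pages.
 */
static size_t sibling_pages(uint64_t poisoned_addr, uint64_t *out, size_t cap)
{
	uint64_t start = huge_range_start(poisoned_addr);
	uint64_t poisoned_page = poisoned_addr & ~(SMALL_PAGE_SIZE - 1);
	size_t n = 0;

	assert(out != NULL || cap == 0);

	for (uint64_t i = 0; i < SUBPAGES_PER_HUGE; i++) {
		uint64_t page = start + i * SMALL_PAGE_SIZE;

		if (page == poisoned_page)
			continue;	/* already isolated by the guest */
		if (n < cap)
			out[n] = page;
		n++;
	}
	return n;
}
```

With a poisoned address of 0x40201234, huge_range_start() gives 0x40200000 and sibling_pages() counts 511 remaining 4K pages, which are the pages the guest could isolate ahead of time if the host reported the whole 2M range as corrupted.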