Message-ID: <1cbc6332-8a45-3af1-c648-99437819bb5a@redhat.com>
Date: Wed, 10 Nov 2021 12:21:35 +0100
To: Dave Young
Cc: Baoquan He, boris.ostrovsky@oracle.com, bp@alien8.de, Andrew Morton,
 hpa@zytor.com, jasowang@redhat.com, jgross@suse.com, linux-mm@kvack.org,
 mhocko@suse.com, mingo@redhat.com, mm-commits@vger.kernel.org,
 mst@redhat.com, osalvador@suse.de, rafael.j.wysocki@intel.com,
 rppt@kernel.org, sstabellini@kernel.org, tglx@linutronix.de,
 torvalds@linux-foundation.org, vgoyal@redhat.com
References: <20211108183057.809e428e841088b657a975ec@linux-foundation.org>
 <20211109023148.b1OlyuiXG%akpm@linux-foundation.org>
 <20211110072225.GA18768@MiWiFi-R3L-srv>
 <0c68b366-38f4-94fd-da11-57e40a44cb48@redhat.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [patch 08/87] proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore callbacks
List-ID: linux-mm@kvack.org

On 10.11.21 12:11, Dave Young wrote:
> Hi David,
> On
11/10/21 at 09:10am, David Hildenbrand wrote:
>> On 10.11.21 08:22, Baoquan He wrote:
>>> On 11/08/21 at 06:31pm, Andrew Morton wrote:
>>>> From: David Hildenbrand
>>>> Subject: proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore callbacks
>>>>
>>>> Let's support multiple registered callbacks, making sure that registering
>>>> vmcore callbacks cannot fail. Make the callback return a bool instead of
>>>> an int, handling how to deal with errors internally. Drop unused
>>>> HAVE_OLDMEM_PFN_IS_RAM.
>>>>
>>>> We soon want to make use of this infrastructure from other drivers:
>>>> virtio-mem, registering one callback for each virtio-mem device, to
>>>> prevent reading unplugged virtio-mem memory.
>>>>
>>>> Handle it via a generic vmcore_cb structure, prepared for future
>>>> extensions: for example, once we support virtio-mem on s390x where the
>>>> vmcore is completely constructed in the second kernel, we want to detect
>>>> and add plugged virtio-mem memory ranges to the vmcore in order for them
>>>> to get dumped properly.
>>>>
>>>> Handle corner cases that are unexpected and shouldn't happen in sane
>>>> setups: registering a callback after the vmcore has already been opened
>>>> (warn only) and unregistering a callback after the vmcore has already been
>>>> opened (warn and essentially read only zeroes from that point on).
>>>                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> I am fine with the whole patch except of one concern. As above sentence
>>> underscored states, if a callback is unregistered when vmcore has been
>>> opened, it will read out zeros from that point on. And it's done by
>>> judging global variable 'vmcore_cb_unstable' in pfn_is_ram(). This will
>>> cause vmcore dumping in makedumpfile only being able to read out zero
>>> page since then, and may cost long extra time to finish.
>>>
>>> Please see remap_oldmem_pfn_checked(). In makedumpfile, we default to
>>> mmap 4M memory region at one time, then copy out.
>>> With this patch, and if
>>> vmcore_cb_unstable is true, kernel will mmap page by page. The extra
>>> time could be huge, e.g on machine with TBs memory, and we only get a
>>> useless vmcore because of loss of core data with high probability.
>>
>> Thanks Baoquan for the quick review!
>>
>> This code is really just to handle the unlikely case of a driver getting
>> unbound from a device that has a callback registered (e.g., a
>> virtio-mem-pci device). Something like this will never happen in
>> practice in a *sane* environment.
>>
>> The only known way I know is if userspace manually unbinds the driver
>> from a virtio-mem-pci device -- which is possible but especially in a
>> kdump environment something without any sane use case. In that case, we'll
>>
>> pr_warn_once("Unexpected vmcore callback unregistration\n");
>>
>> to let user space know that something weird/unsupported is going on.
>>
>> Long story short: if user space does something nasty, I don't see a
>> problem in some action taking a little longer.
>>
>>
>>>
>>> I am thinking if we can simply panic in the case, since the left dumping
>>> are all zeroed, very likely the vmcore is unavailable any more.
>>
>> IMHO panic() is a little bit too much. Instead of returning zeroes, we
>> could fail the read/mmap operation -- I considered that as an option
>> when I crafted/tested this patch, however, this approach here turned out
>> to be the easiest way to handle something that's really not
>> supported/advised and won't really happen in a sane environment.
>
> I would still say that the most important task for kdump is to save the
> vmcore successfully. Even the above issue is not a common case it could
> cause the vmcore to be useless. It is understandable if the zeroed part
> is only the virtio-mem part, but if all the remaining vmcore is zeroed
> that it is bad and not acceptable for kdump.

Again, in a sane environment this will never happen.
Why are we discussing how to optimize a scenario where user space does
something that's clearly unsupported and will not happen in real life?

My take is to warn and fail as simply as possible, without hacking
around the issue (like blocking driver unloading while user space has
/proc/vmcore opened).

"remaining vmcore is zeroed that it is bad and not acceptable for kdump."

Which scenario are you concerned about?

User space plays stupid games (unbinding a driver from a virtio-mem
device in a *kdump kernel* after opening /proc/vmcore) and wins stupid
prizes (a warning and a vmcore filled (partially) with zeroes). Why
isn't a warning sufficient for something like that?

I appreciate all the feedback (even if it comes in late :) ), but I'm
missing why we are trying to optimize something here.

I'm happy to send a patch that does whatever we decide to do, but I
really don't see the need for a change. Most probably I'm missing
something important?

(the patch landed in mainline in the meantime)

-- 
Thanks,

David / dhildenb