Date: Fri, 11 Sep 2020 11:12:52 +0200
From: Michal Hocko
To: David Hildenbrand
Cc: Dave Hansen, Gerald Schaefer, akpm@linux-foundation.org, Greg KH,
 Jan Höppner, Heiko Carstens, linux-mm@kvack.org, linux-api@vger.kernel.org,
 Dave Hansen, linux-kernel@vger.kernel.org
Subject: Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
Message-ID: <20200911091252.GD7986@dhcp22.suse.cz>
In-Reply-To: <02cdbf90-b29f-a9ec-c83d-49f2548e3e91@redhat.com>

On Fri 11-09-20 10:09:07, David Hildenbrand wrote:
[...]
> Consider two cases:
>
> 1. Hot(un)plugging huge DIMMs: many (not all!) use cases want to
> online/offline the whole thing. HW can effectively only plug/unplug the
> whole thing. It makes sense in some (most?) setups to represent one DIMM
> as one memory block device.

Yes, for the physical hotplug it doesn't really make much sense to me to
offline portions that the HW cannot hotremove.

> 2. Hot(un)plugging small memory increments. This is mostly the case in
> virtualized environments - especially hyper-v balloon, xen balloon,
> virtio-mem and (drumroll) ppc dlpar and s390x standby memory.
> On PPC, you want at least all (16MB!) memory block devices that can get
> unplugged again individually ("LMBs") as separate memory blocks. Same on
> s390x on memory increment size (currently effectively the memory block
> size).

Yes, I do recognize those use cases, even though I will not pretend I
don't consider them questionable. E.g. any hotplug with a smaller
granularity than the memory model in Linux allows is just dubious. We
simply cannot implement that without a lot of waste, and then the
question is what the real point is.

> In summary, larger memory block devices mostly only make sense with
> DIMMs (and for boot memory in some cases). We will still end up with
> many memory block devices in other configurations.

And that is fine because the boot time memory is still likely the
primary source of memory. And reducing memory devices for those is a
huge improvement already (just think of a multi-TB system with gazillions
of pointless memory devices).

> I do agree that a "disable sysfs" option is interesting - even with
> memory hotplug (we mostly need a way to configure it and a way to notify
> kexec-tools about memory hot(un)plug events). I am currently (once
> again) looking into improving auto-onlining support in the kernel.
>
> Having that said, I much rather want to see smaller improvements (that
> can be fine-tuned individually - like allowing variable-sized memory
> blocks) than doing a switch to "new shiny" and figuring out after a
> while that we need "new shiny2".

There is only one certainty: providing a long-term interface with
ever-growing (ab)users is a hard target. And shinyN might be needed in
the end. Who knows. My main point is that the existing interface is
hitting a wall on use cases which _do_not_care_ about memory hotplug.
And that is something we should be looking at.

> I consider removing "phys_device" as one of these tunables. The question
> would be how to make such sysfs changes easy to configure
> ("-phys_device", "+variable_sized_blocks" ...)
I am with you on that. There are more candidates in memory block
directories which have dubious value. The deprecation process is a PITA,
and that's why I thought it would make sense to focus on something that
we can mis^Wdesign with existing and forming use cases in mind, something
that would get rid of all the cruft that we know doesn't work ("removable"
would be another one).

I am definitely not going to insist, and I appreciate that you are trying
to clean this up. That is highly appreciated of course.

-- 
Michal Hocko
SUSE Labs