Date: Tue, 3 Dec 2024 15:33:08 +0100
From: Sumanth Korikkar
To: David Hildenbrand
Cc: linux-mm, Andrew Morton, Oscar Salvador, Gerald Schaefer,
    Heiko Carstens, Vasily Gorbik, Alexander Gordeev, linux-s390, LKML
Subject: Re: [RFC PATCH 1/4] mm/memory_hotplug: Add interface for runtime (de)configuration of memory
References: <20241202082732.3959803-1-sumanthk@linux.ibm.com> <20241202082732.3959803-2-sumanthk@linux.ibm.com> <3151b9a0-3e96-4820-b6af-9f9ec4996ee1@redhat.com>
In-Reply-To: <3151b9a0-3e96-4820-b6af-9f9ec4996ee1@redhat.com>

On Mon, Dec 02, 2024 at 05:55:19PM +0100, David Hildenbrand wrote:
> Hi!
>
> Not completely what I had in mind, especially not that we need something
> that generic without any indication of ranges :)
>
> In general, the flow is as follows:
>
> 1) Driver detects memory and adds it
> 2) Something auto-onlines that memory (e.g., udev rule)
>
> For dax/kmem, 1) can be controlled using devdax, and usually it also tries
> to take care of 2).
>
> s390x standby storage really is the weird thing here, because it does 1) and
> doesn't want 2). It shouldn't do 1) until a user wants to make use of
> standby memory.

Hi David,

The current RFC design doesn't do 1) until the user initiates it.

The current RFC design relies on the fact that there cannot be memory
holes when standby memory is available (which holds true for both LPARs
and z/VM guests).

Given the combined count of online and standby memory ranges
(max_configurable), the prototype lsmem/chmem can determine the memory
ranges that are not yet configured, i.e.
(configurable_memory = max_configurable - online ranges from sysfs
/sys/devices/system/memory/memory*).
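To make the subtraction above concrete, here is a rough sketch of how a
tool could derive the not-yet-configured ranges. It is only an
illustration, not the prototype lsmem/chmem code. It assumes that
max_configurable counts memory blocks, that block indexes are contiguous
from 0 (no holes), and that every configured block shows up as a memoryN
directory in sysfs. The function names are made up for this example.

```python
import re
from pathlib import Path

BLOCK_NAME = re.compile(r"^memory(\d+)$")


def present_blocks(sysfs_dir="/sys/devices/system/memory"):
    """Return the indexes of the memoryN directories that exist in sysfs."""
    root = Path(sysfs_dir)
    if not root.is_dir():
        return set()
    found = set()
    for entry in root.iterdir():
        match = BLOCK_NAME.match(entry.name)
        if match:
            found.add(int(match.group(1)))
    return found


def unconfigured_ranges(max_configurable, present, block_size):
    """Group the block indexes in [0, max_configurable) that are missing
    from `present` into contiguous ranges.

    Each result is (start_addr, end_addr, first_block, last_block).
    Assumption: block N covers [N * block_size, (N + 1) * block_size).
    """
    ranges = []
    start = None
    # Run one index past the end so a trailing range gets closed.
    for idx in range(max_configurable + 1):
        missing = idx < max_configurable and idx not in present
        if missing and start is None:
            start = idx
        elif not missing and start is not None:
            ranges.append((start * block_size, idx * block_size - 1,
                           start, idx - 1))
            start = None
    return ranges


if __name__ == "__main__":
    # Hypothetical values: 128 blocks of 128M, as in a 16G guest.
    for lo, hi, first, last in unconfigured_ranges(
            128, present_blocks(), 128 << 20):
        print(f"0x{lo:016x}-0x{hi:016x} deconfigured {first}-{last}")
```

With 128 blocks of 128M and blocks 0-95 present, this yields a single
range covering blocks 96-127, i.e. 0x0000000300000000-0x00000003ffffffff.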

Example prototype implementation of lsmem/chmem looks like:

./lsmem -o RANGE,SIZE,STATE,BLOCK,ALTMAP
RANGE                                  SIZE STATE        BLOCK   ALTMAP
0x0000000000000000-0x00000002ffffffff   12G online       0-95    0
0x0000000300000000-0x00000003ffffffff    4G deconfigured 96-127  -

# Configure range with altmap
./chmem -c 0x0000000300000000-0x00000003ffffffff -a
./lsmem -o RANGE,SIZE,STATE,BLOCK,ALTMAP
RANGE                                  SIZE STATE        BLOCK   ALTMAP
0x0000000000000000-0x00000002ffffffff   12G online       0-95    0
0x0000000300000000-0x00000003ffffffff    4G offline      96-127  1

# Online range
./chmem -e 0x0000000300000000-0x00000003ffffffff && ./lsmem -o RANGE,SIZE,STATE,BLOCK,ALTMAP
RANGE                                  SIZE STATE        BLOCK   ALTMAP
0x0000000000000000-0x00000002ffffffff   12G online       0-95    0
0x0000000300000000-0x00000003ffffffff    4G online       96-127  1

Memory block size:    128M
Total online memory:   16G
Total offline memory:   0B
Total deconfigured:     0B

# Offline range
./chmem -d 0x0000000300000000-0x00000003ffffffff && ./lsmem -o RANGE,SIZE,STATE,BLOCK,ALTMAP
RANGE                                  SIZE STATE        BLOCK   ALTMAP
0x0000000000000000-0x00000002ffffffff   12G online       0-95    0
0x0000000300000000-0x00000003ffffffff    4G offline      96-127  1

Memory block size:    128M
Total online memory:   12G
Total offline memory:   4G
Total deconfigured:     0B

# Deconfigure range
./chmem -g 0x0000000300000000-0x00000003ffffffff && ./lsmem -o RANGE,SIZE,STATE,BLOCK,ALTMAP
RANGE                                  SIZE STATE        BLOCK   ALTMAP
0x0000000000000000-0x00000002ffffffff   12G online       0-95    0
0x0000000300000000-0x00000003ffffffff    4G deconfigured 96-127  -

Memory block size:    128M
Total online memory:   12G
Total offline memory:   0B
Total deconfigured:     4G

With this approach, the user can still determine the available memory
ranges and configure them using tools like lsmem or chmem, at least on
s390.

> My thinking was that s390x would expose the standby memory ranges somewhere
> arch specific in sysfs. From there, one could simply trigger the adding
> (maybe specifying e.g, memmap_on_memory) of selected ranges.

As far as I understand, the sysfs interface limits the size of the buffer
used in show() to 4 KB. When there is a huge number of standby memory
ranges, wouldn't it be an issue to display everything in one attribute?
Or should sysfs binary attributes be used to overcome the limitation?
Please correct me if I am wrong.

Questions:
1. If we go ahead with this sysfs interface approach to list all standby
   memory ranges, could the list be made available via
   /sys/devices/system/memory/configurable_memlist? This could be helpful,
   as /sys/devices/system/memory/configure_memory performs
   architecture-independent checks and could also be useful for other
   architectures in the future.

2. Should the new interface also be compatible with lsmem/chmem?

3. Or can we have an s390-specific path (e.g.
   /sys/firmware/memory/standy_range) to list all standby memory ranges
   that are in the deconfigured state, and also use the current design
   (max_configurable) to make it easier for the lsmem/chmem tools to
   detect these standby memory ranges?

> To disable standby memory, one would first offline the memory to then
> trigger removal using the arch specific interface. It is very similar to
> dax/kmem's way of handling offline+removal.

ok

> Now I wonder if dax/kmem could be (ab)used on s390x for standby storage.
> Likely a simple sysfs interface could be easier to implement.

I haven't checked dax/kmem in detail yet. I will look into it.

Thank you