Subject: Re: [External] [LSF/MM/BPF TOPIC] CXL Fabric Manager (FM) architecture
From: "Viacheslav A.Dubeyko"
Date: Thu, 9 Feb 2023 14:22:45 -0800
To: Adam Manzanares
Cc: Jonathan Cameron, "lsf-pc@lists.linux-foundation.org", "linux-mm@kvack.org", "linux-cxl@vger.kernel.org", Dan Williams, Cong Wang, Viacheslav Dubeyko
Message-Id: <4E272B5A-5952-4680-94A6-14F588ACA7C4@bytedance.com>
In-Reply-To: <20230209221045.GA416928@bgt-140510-bm01>
References: <7F001EAF-C512-436A-A9DD-E08730C91214@bytedance.com> <20230131174115.00007493@Huawei.com> <5671D3B3-83B3-49FF-A662-509648E6D297@bytedance.com> <20230202095402.0000585d@Huawei.com> <20230208163844.GA407917@bgt-140510-bm01> <7E864E85-A36F-487B-8B70-C8C49FBECD73@bytedance.com> <20230209221045.GA416928@bgt-140510-bm01>

> On Feb 9, 2023, at 2:10 PM, Adam Manzanares wrote:
>
> On Wed, Feb 08, 2023 at 10:03:57AM -0800, Viacheslav A.Dubeyko wrote:
>>
>>
>>> On Feb 8, 2023, at 8:38 AM, Adam Manzanares wrote:
>>>
>>> On Thu, Feb 02, 2023 at 09:54:02AM +0000, Jonathan Cameron wrote:
>>>> On Wed, 1 Feb 2023 12:04:56 -0800
>>>> "Viacheslav A.Dubeyko" wrote:
>>>>
>>>>>>
>>
>>
>>
>>>>>
>>>>> Most probably, we will have multiple FM implementations in firmware.
>>>>> Yes, FM on host could be important for debug and to verify the
>>>>> correctness of firmware-based implementations. But an FM daemon on
>>>>> host could also be important to receive notifications and react to
>>>>> these events. Also, journalling of events/messages could be an
>>>>> important responsibility of the FM daemon on host.
>>>>
>>>> I agree with an FM daemon somewhere (potentially running on the BMC type chip
>>>> that also has the lower level FM-API access). I think it is somewhat
>>>> separate from the rest of this on the basis that it may well just be talking
>>>> Redfish to the FM and there are lots of tools for that sort of handling already.
>>>>
>>>
>>> I would be interested in participating in a BOF about this topic. I wonder what
>>> happens when we have multiple switches with multiple FMs each on a separate BMC.
>>> In this case, does it make more sense to have an owner of the global FM state
>>> be a user space application? Is this the job of the orchestrator?
>>>
>>> The BMC based FM seems to have scalability issues, but will we hit them in
>>> practice any time soon?
>>
>> I had a discussion recently and it looks like there are some interesting points:
>> (1) If we have multiple CXL switches (especially with a complex hierarchy), then
>> managing them is a very compute-intensive activity. So, potentially, FM on the
>> firmware side might not be capable of handling and executing all of its
>> responsibilities without performance degradation.
>> (2) However, if we have FM on the host side, then there are security concerns
>> because the FM sees all details of multiple hosts and subsystems.
>> (3) Technically speaking, one potential capability is that a user-space FM daemon
>> can run on the host side as well as on the CXL switch side. I mean that if we
>> implement a user-space FM daemon, then it could also be used to execute FM
>> functionality on the CXL switch side (maybe????). :)
>>
>>
>>
>>>>>>> - Manage surprise removal of devices
>>>>>>
>>>>>> Likewise, beyond reporting I wouldn't expect the FM daemon to have any idea
>>>>>> what to do in the way of managing this. Scream loudly?
>>>>>>
>>>>>
>>>>> Maybe, it could require application(s) notification. Let’s imagine that an
>>>>> application uses some resources from the removed device. Maybe, FM can manage
>>>>> kernel-space metadata correction and help to manage application requests to
>>>>> non-existent entities.
>>>>
>>>> Notifications for the host are likely to come via inband means - so type3 driver
>>>> handling rather than related to FM. As far as the host is concerned this is the
>>>> same as the case where there is no FM and someone ripped a device out.
>>>>
>>>> There might indeed be metadata to manage, but I doubt it will have anything to
>>>> do with the kernel.
>>>>
>>>
>>> I've also had similar thoughts; I think the OS responds to notifications that
>>> are generated in-band after changes to the state of the FM are made through
>>> OOB means.
>>>
>>> I envision the host sends REDFISH requests to a switch BMC that has an FM
>>> implementation. Once the changes are implemented by the FM they would show up
>>> as changes to the PCIe hierarchy on a host, which is capable of responding to
>>> such changes.
>>>
>>
>> I think I do not completely follow your point. :) First of all, I assume that if
>> the host sends a REDFISH request, then a confirmation of the request execution
>> will be expected. To me, it means that the host needs to receive some packet that
>> reports whether the request executed successfully or failed. It means that some
>> subsystem or application requested this change, and only after receiving the
>> confirmation can the requested capabilities be used. And if FM is on the CXL
>> switch side, then how will FM show the changes? It sounds to me that some FM
>> subsystem should be on the host side to receive the confirmation/notification
>> and to execute the real changes in the PCIe hierarchy. Am I missing something here?
>
> Hopefully I have a point ;). I do expect a host to receive a response for a
> given REDFISH request, but the request/response would be OOB. I would go back
> to the example of hot plugging a PCIe based device. For example, if an NVMe
> SSD is hot plugged, then the OS is notified by HW that a new PCIe device has been
> added. Going back to changes made by the FM, if the changes impact the CXL
> hierarchy that is visible to a host, it is my expectation that the host OS will
> be informed of the changes requested of the FM when the host HW becomes aware
> of the changes (the in-band change).
>

You are right if we talk about hardware directly connected to the host. It
means that the CPU (or any other hardware subsystem) can receive an interrupt
and the kernel can process this hardware change. But the FM can be remote and
be shared by multiple hosts. In such a case, we need to have some software
subsystem on the host(s) side that either polls or expects to receive a
network packet with a notification or confirmation of the change. Or we need
to have some hardware subsystem on every host that can interact with the
remote FM in the background and issue an interrupt locally with the goal of
refreshing kernel metadata.

Thanks,
Slava.
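
The polling option described above can be sketched roughly as follows. This is
only an illustration under stated assumptions: FakeRemoteFM, HostAgent, the
generation counter, and the change descriptions are all hypothetical names
invented for this sketch. They are not part of the CXL FM API, Redfish, or any
kernel interface. A real agent would query the remote FM over the network and
ask the kernel to refresh its metadata; here both sides are in-process
stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeRemoteFM:
    """Hypothetical stand-in for a remote FM shared by several hosts."""

    generation: int = 0  # bumped once per applied fabric change
    log: List[str] = field(default_factory=list)

    def apply_change(self, description: str) -> int:
        # In a real setup this would be the outcome of an OOB request.
        self.log.append(description)
        self.generation += 1
        return self.generation

    def changes_since(self, generation: int) -> List[str]:
        # Entry i of the log is the change that produced generation i + 1.
        return self.log[generation:]


class HostAgent:
    """Hypothetical host-side subsystem that polls the FM for changes."""

    def __init__(self, fm: FakeRemoteFM,
                 refresh: Callable[[List[str]], None]) -> None:
        self.fm = fm
        self.refresh = refresh      # e.g. "refresh kernel metadata"
        self.seen = fm.generation   # last generation this host has handled

    def poll_once(self) -> bool:
        """Return True if new changes were found and handed to refresh()."""
        current = self.fm.generation
        if current == self.seen:
            return False
        self.refresh(self.fm.changes_since(self.seen))
        self.seen = current
        return True


if __name__ == "__main__":
    fm = FakeRemoteFM()
    agent = HostAgent(fm, lambda changes: print("refresh:", changes))
    agent.poll_once()                       # nothing to do yet
    fm.apply_change("memory device bound to host A")  # illustrative text
    agent.poll_once()                       # picks up the change
```

The notification variant would invert the direction: the FM side would call
into the agent instead of the agent calling poll_once() on a timer, but the
"compare generation, then refresh" step on the host would stay the same.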