Date: Tue, 17 Jan 2023 16:33:02 +0100
From: David Hildenbrand
Organization: Red Hat
To: Sudarshan Rajagopalan, Johannes Weiner, Suren Baghdasaryan,
    Mike Rapoport, Oscar Salvador, Anshuman Khandual,
    mark.rutland@arm.com, will@kernel.org,
    virtualization@lists.linux-foundation.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-arm-msm@vger.kernel.org
Cc: "Trilok Soni (QUIC)", "Sukadev Bhattiprolu (QUIC)",
    "Srivatsa Vaddagiri (QUIC)", "Patrick Daly (QUIC)"
Subject: Re: [RFC] memory pressure detection in VMs using PSI mechanism for dynamically inflating/deflating VM memory
In-Reply-To: <072de3f4-6bd3-f9ce-024d-e469288fc46a@quicinc.com>
References: <072de3f4-6bd3-f9ce-024d-e469288fc46a@quicinc.com>

On 15.01.23 04:57, Sudarshan Rajagopalan wrote:
> Hello all,
>

Hi,

I'll focus on the virtio-mem side of things :)

> We’re from the Linux memory team here at Qualcomm. We are currently
> devising a VM memory resizing feature where we dynamically inflate or
> deflate the Linux VM based on ongoing memory demands in the VM. We
> wanted to propose few details about this userspace daemon in form of RFC
> and wanted to know the upstream’s opinion. Here are few details –

I'd avoid using the terminology of inflating/deflating VM memory when
talking about virtio-mem. Just call it "dynamically resizing VM memory".
virtio-mem is one way of doing it using memory devices.
Inflation/deflation, in contrast, reminds one of a traditional balloon
driver, along the lines of virtio-balloon.

>
> 1. This will be a native userspace daemon that will be running only in
> the Linux VM which will use virtio-mem driver that uses memory hotplug
> to add/remove memory. The VM (aka Secondary VM, SVM) will request for
> memory from the host which is Primary VM, PVM via the backend hypervisor
> which takes care of cross-VM communication.
>
> 2. This will be guest driver. This daemon will use PSI mechanism to
> monitor memory pressure to keep track of memory demands in the system.
> It will register to few memory pressure events and make an educated
> guess on when demand for memory in system is increasing.

Is that running in the primary or the secondary VM?

>
> 3. Currently, min PSI window size is 500ms, so PSI monitor sampling
> period will be 50ms. In order to get quick response time from PSI, we’ve
> reduced the min window size to 50ms so that as small as 5ms increase in
> memory pressure can be reported to userspace by PSI.
>
> /* PSI trigger definitions */
> -#define WINDOW_MIN_US 500000   /* Min window size is 500ms */
> +#define WINDOW_MIN_US 50000    /* Min window size is 50ms */
>
> 4. Detecting increase in memory demand – when a certain usecase starts
> in VM that does memory allocations, it will stall causing PSI mechanism
> to generate a memory pressure event to userspace. To simply put, when
> pressure increases certain set threshold, it can make educated guess
> that a memory requiring usecase has ran and VM system needs memory to be
> added.
>
> 5. Detecting decrease in memory pressure – the reverse part where we
> give back memory to PVM when memory is no longer needed is bit tricky.
> We look for pressure decay and see if PSI averages (avg10, avg60,
> avg300) go down, and along with other memory stats (such as free memory
> etc) we make an educated guess that usecase has ended and memory has
> been free’ed by the usecase, and this memory can be given back to PVM
> when it's no longer needed.
>
> 6. I’m skimming much on the logic and intelligence but the daemon relies
> on PSI mechanism to know when memory demand is going up and down, and
> communicates with virtio-mem driver for hot-plugging/unplugging memory.

For now, the hypervisor is in charge of triggering a virtio-mem device
resize request. Will the Linux VM expose a virtio-mem device to the SVM
and request to resize the SVM memory via that virtio-mem device?

> We also factor in the latency involved with roundtrips between SVM<->PVM
> so we size the memory chunk that needs to be plugged-in accordingly.
>
> 7. The whole purpose of daemon using PSI mechanism is to make this
> guest driven rather than host driven, which currently is the case mostly
> with virtio-mem users. The memory pressure and usage monitoring happens
> inside the SVM and the SVM makes the decisions to request for memory
> from PVM. This avoids any intervention such as admin in PVM to monitor
> and control the knobs. We have also set max limit of how much SVMs can
> grow in terms of memory, so that a rogue VM would not abuse this scheme.

Something I envisioned at some point is to

1) Have the virtio-mem guest driver request a size change. The
   hypervisor will react accordingly by adjusting the requested size.

   Such a driver<->device request could be communicated to the
   hypervisor via any other communication mechanism, but doing it via
   the virtio-mem protocol directly has already come up a couple of
   times.

2) Configure the hypervisor to have a lower/upper range. Within that
   range, resize requests by the driver can be granted. The current
   values of these properties can be exposed via the device to the
   driver as well.
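
To make 2) a bit more concrete, here is a rough sketch of the check the
hypervisor side could do when the driver asks for a new size. Everything
in it is made up for illustration: struct resize_limits, its fields and
grant_resize_request() are hypothetical names, and the block-size
alignment check is my assumption about what a sane policy would want.
None of this is an existing virtio-mem interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device limits configured on the hypervisor side. */
struct resize_limits {
	uint64_t min_size;   /* lower bound for the requested size, bytes */
	uint64_t max_size;   /* upper bound for the requested size, bytes */
	uint64_t block_size; /* resize granularity, bytes (assumed) */
};

/*
 * Decide whether a driver-initiated resize request can be granted.
 * On success, *new_requested_size is set to the size the hypervisor
 * would then apply; on failure it is left untouched.
 */
static bool grant_resize_request(const struct resize_limits *lim,
				 uint64_t wanted,
				 uint64_t *new_requested_size)
{
	/* Sanity-check the configuration itself. */
	assert(lim->block_size != 0 && lim->min_size <= lim->max_size);

	/* Only accept sizes that are a multiple of the granularity. */
	if (wanted % lim->block_size != 0)
		return false;

	/* Only grant requests inside the configured range. */
	if (wanted < lim->min_size || wanted > lim->max_size)
		return false;

	*new_requested_size = wanted;
	return true;
}
```

With limits of 1 GiB to 4 GiB and a 2 MiB granularity, a request for
2 GiB would be granted, while a request for 8 GiB or for an unaligned
size would be refused and the requested size would stay where it was.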

Is that what you are also proposing here? If so, great.

>
> This daemon is currently in just Beta stage now and we have basic
> functionality running. We are yet to add more flesh to this scheme to

Good to hear that the basics are running with virtio-mem (I assume :) ).

> make sure any potential risks or security concerns are taken care as well.

It would be great to draw/explain the architecture in more detail.

-- 
Thanks,

David / dhildenb