From: zhenwei pi
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, virtualization@lists.linux.dev
Cc: mst@redhat.com, david@redhat.com, jasowang@redhat.com, xuanzhuo@linux.alibaba.com, akpm@linux-foundation.org, zhenwei pi
Subject: [RFC 0/3] Improve memory statistics for virtio balloon
Date: Mon, 15 Apr 2024 16:41:10 +0800
Message-Id: <20240415084113.1203428-1-pizhenwei@bytedance.com>
Hi,

When the guest runs under critical memory pressure, it becomes too
slow; even sshd enters D state (uninterruptible sleep) on memory
allocation. We can't log in to the VM to do any troubleshooting.

The guest kernel log via the virtual TTY (on the host side) only
provides a few necessary logs after OOM. More detailed memory
statistics are required so that we can see explicit memory events and
estimate the memory pressure.
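The statistics proposed in this series are cumulative event counters, so estimating pressure from them on the host means sampling them periodically and looking at their rate of change. A minimal C sketch of that step follows; the `counter_sample` type and `counter_rate()` helper are hypothetical illustrations, not part of the virtio balloon API, and the host-side timestamp is an assumption about how a monitoring tool would record samples.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: one host-side sample of a cumulative guest event counter,
 * e.g. the value reported for a single balloon statistics tag. */
struct counter_sample {
    uint64_t value;   /* cumulative event count reported by the guest */
    uint64_t time_ms; /* host-side timestamp when the sample was taken */
};

/* Events per second between two samples of the same counter.
 * Unsigned subtraction is modulo 2^64, so the delta stays correct even if
 * the counter wrapped (at most once) between the two samples. */
static double counter_rate(const struct counter_sample *prev,
                           const struct counter_sample *cur)
{
    assert(cur->time_ms > prev->time_ms);
    uint64_t delta = cur->value - prev->value;
    return (double)delta * 1000.0 / (double)(cur->time_ms - prev->time_ms);
}
```

Working with rates rather than raw totals matters because a guest that thrashed once hours ago and a guest thrashing right now can report the same cumulative value.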
I'm going to introduce several VM counters for virtio balloon:
- oom-kill
- alloc-stall
- scan-async
- scan-direct
- reclaim-async
- reclaim-direct

With these counters we have a metric to analyze memory performance
(this metric is also described in the patch 'virtio_balloon: introduce
memory scan/reclaim info'):

y: counter increases
n: counter does not change
h: the rate of counter change is high
l: the rate of counter change is low
*: any value (not relevant)

OOM: VIRTIO_BALLOON_S_OOM_KILL
STALL: VIRTIO_BALLOON_S_ALLOC_STALL
ASCAN: VIRTIO_BALLOON_S_SCAN_ASYNC
DSCAN: VIRTIO_BALLOON_S_SCAN_DIRECT
ARCLM: VIRTIO_BALLOON_S_RECLAIM_ASYNC
DRCLM: VIRTIO_BALLOON_S_RECLAIM_DIRECT

- OOM[y], STALL[*], ASCAN[*], DSCAN[*], ARCLM[*], DRCLM[*]:
  the guest runs under really critical memory pressure.

- OOM[n], STALL[h], ASCAN[*], DSCAN[l], ARCLM[*], DRCLM[l]:
  memory allocation stalls due to cgroup memory pressure, not global
  memory pressure.

- OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[h]:
  memory allocation stalls due to global memory pressure, and
  performance suffers a lot. A high DRCLM/DSCAN ratio shows that direct
  reclaim is quite effective.

- OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[l]:
  memory allocation stalls due to global memory pressure, and the
  DRCLM/DSCAN ratio is low: the guest OS is thrashing heavily. In
  serious cases this leads to poor performance and makes troubleshooting
  difficult. For example, sshd may block on memory allocation when
  accepting new connections, so a user cannot log in to the VM with ssh.

- OOM[n], STALL[n], ASCAN[h], DSCAN[n], ARCLM[l], DRCLM[n]:
  the low ARCLM/ASCAN ratio shows that the guest tries to reclaim more
  memory asynchronously but cannot. Once more memory is required in the
  future, it will struggle to reclaim memory.
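The decision table above can be encoded directly on the host side. A minimal C sketch, assuming the host has already sampled the six counters and computed their per-second rates; the `rate` enum, `balloon_rates` struct, `rate_class()`, `diagnose()` and the `high_threshold` parameter are hypothetical names for illustration, and any concrete threshold value is an assumption that would need tuning per workload and guest size.

```c
#include <assert.h>

/* Hypothetical encoding of the table's markers:
 * n = RATE_NONE, l = RATE_LOW, h = RATE_HIGH; y (counter increases)
 * is any value other than RATE_NONE. */
enum rate { RATE_NONE, RATE_LOW, RATE_HIGH };

/* Per-interval rate classes for the six counters (hypothetical struct). */
struct balloon_rates {
    enum rate oom, stall, ascan, dscan, arclm, drclm;
};

enum diagnosis {
    DIAG_OOM,              /* OOM[y]: really critical pressure */
    DIAG_CGROUP_STALL,     /* STALL[h], DSCAN[l], DRCLM[l] */
    DIAG_GLOBAL_RECLAIM,   /* STALL[h], DSCAN[h], DRCLM[h] */
    DIAG_GLOBAL_THRASHING, /* STALL[h], DSCAN[h], DRCLM[l] */
    DIAG_ASYNC_STRUGGLING, /* STALL[n], ASCAN[h], DSCAN[n], ARCLM[l], DRCLM[n] */
    DIAG_UNKNOWN           /* combination not covered by the table */
};

/* Map an events-per-second rate to a table marker. */
static enum rate rate_class(double per_sec, double high_threshold)
{
    assert(high_threshold > 0.0);
    if (per_sec <= 0.0)
        return RATE_NONE;
    return per_sec >= high_threshold ? RATE_HIGH : RATE_LOW;
}

/* Apply the table rows in order; '*' columns are simply not checked. */
static enum diagnosis diagnose(const struct balloon_rates *r)
{
    if (r->oom != RATE_NONE)
        return DIAG_OOM;
    if (r->stall == RATE_HIGH) {
        if (r->dscan == RATE_LOW && r->drclm == RATE_LOW)
            return DIAG_CGROUP_STALL;
        if (r->dscan == RATE_HIGH && r->drclm == RATE_HIGH)
            return DIAG_GLOBAL_RECLAIM;
        if (r->dscan == RATE_HIGH && r->drclm == RATE_LOW)
            return DIAG_GLOBAL_THRASHING;
    }
    if (r->stall == RATE_NONE && r->ascan == RATE_HIGH &&
        r->dscan == RATE_NONE && r->arclm == RATE_LOW &&
        r->drclm == RATE_NONE)
        return DIAG_ASYNC_STRUGGLING;
    return DIAG_UNKNOWN;
}
```

Combinations the table does not list fall through to `DIAG_UNKNOWN` instead of being guessed, which keeps the sketch faithful to the rows as written.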
zhenwei pi (3):
  virtio_balloon: introduce oom-kill invocations
  virtio_balloon: introduce memory allocation stall counter
  virtio_balloon: introduce memory scan/reclaim info

 drivers/virtio/virtio_balloon.c     | 30 ++++++++++++++++++++++++++++-
 include/uapi/linux/virtio_balloon.h | 16 +++++++++++++--
 2 files changed, 43 insertions(+), 3 deletions(-)

-- 
2.34.1