Date: Fri, 27 Jan 2023 19:00:15 -0800
From: Minchan Kim
To: Michal Hocko
Cc: Andrew Morton, Suren Baghdasaryan, Matthew Wilcox, linux-mm, LKML
Subject: Re: [PATCH v2] mm/madvise: add vmstat statistics for madvise_[cold|pageout]

On Fri, Jan 27, 2023 at 10:48:25AM +0100, Michal Hocko wrote:
> On Thu 26-01-23 16:08:43, Minchan Kim wrote:
> > On Thu, Jan 26, 2023 at 08:58:57PM +0100, Michal Hocko wrote:
> > > On Thu 26-01-23 09:10:46, Minchan Kim wrote:
> > > > On Thu, Jan 26, 2023 at 09:50:37AM +0100, Michal Hocko wrote:
> > > [...]
> > > > > I suspect you try to mimic pgscan/pgsteal effectiveness metric on the
> > > > > address space but that is a fundamentally different thing.
> > > >
> > > > I don't see anything different, fundamentally.
> > >
> > > OK, this really explains our disconnect here. Your metric reports
> > > nr_page_tables (nr_scanned) and number of aged and potentially reclaimed
> > > pages. You do not know whether that reclaim was successful. So you
> > > effectively learn how many pages have already been unmapped before your
> > > call. Can this be sometimes useful? Probably yes. Does it say anything
> > > about the reclaim efficiency? I do not think so. You could have hit
> > > pinned pages or countless other conditions why those pages couldn't have
> > > been reclaimed and they have stayed mapped after madvise call.
> > >
> > > pgsteal tells you how many pages from those scanned have been reclaimed.
> > > See the difference?
> >
> > That's why my previous version kept counting the exact number of reclaimed/
> > deactivated pages, but I changed my mind since I observed that the majority
> > of failures came from already-paged-out ranges and shared pages rather than
> > from the countless other minor conditions in real practice. Without finding
> > present pages, the madvise hints couldn't do anything from the beginning,
> > and that's the major cost we are facing.
>
> I cannot really comment on your user space reclaim policy but I would
> have expected that you at least check for rss before trying to use
> madvise on the range. Learning that from the operation sounds like a
> suboptimal policy to me.

The current rss can't say where the present pages are within a huge
address space. And learning that is not what I want from the operation;
I want to keep monitoring the trend across the fleet.
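
To make the rss point concrete, here is a minimal sketch (not part of the
patch) of the kind of user-space check suggested above. It assumes the
usual /proc/<pid>/smaps text layout, a VMA header line followed by an
"Rss: N kB" line; the helper names parse_smaps_rss and resident_fraction
are hypothetical. It yields one resident size per mapping, which also
shows the limit being argued here: the number says how much of a range is
present, not which pages.

```python
import re

# VMA header line, e.g. "7f0000000000-7f0000200000 rw-p 00000000 00:00 0".
_VMA_RE = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s")
# Per-VMA resident size line, e.g. "Rss:                 512 kB".
_RSS_RE = re.compile(r"^Rss:\s+(\d+)\s+kB")


def parse_smaps_rss(text):
    """Return [(start, end, rss_kb), ...] from smaps-formatted text."""
    vmas = []
    current = None
    for line in text.splitlines():
        m = _VMA_RE.match(line)
        if m:
            # Remember the address range until its Rss line shows up.
            current = (int(m.group(1), 16), int(m.group(2), 16))
            continue
        m = _RSS_RE.match(line)
        if m and current is not None:
            vmas.append((current[0], current[1], int(m.group(1))))
            current = None
    return vmas


def resident_fraction(start, end, rss_kb):
    """Fraction of the mapping that is resident (0.0 for an empty range)."""
    size_kb = (end - start) // 1024
    return rss_kb / size_kb if size_kb else 0.0
```

The input text could come from reading /proc/self/smaps on a Linux
system. A caller might skip madvise on mappings whose fraction is near
zero, but it would still have to walk the range to find the pages that
are actually present.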
>
> > Saying again, I don't think the global stat could cover all the minor
> > cases you are insisting on, and I agree tracepoints could do a better job
> > of pinpointing root causes, but the global stat still has a role in
> > providing basic ground to sense abnormalities and guide our next steps
> > with an easier interface and in a more efficient way.
>
> I hate to repeat myself but the more we discuss this the more I am
> convinced that vmstat is a bad fit. Sooner or later you end up realizing
> that nr_reclaimed/nr_scanned is insufficient metric because you would
> need to learn more about those reclaim failures. Really what you want is
> to have a tracepoint with a full reclaim metric and grow monitoring tooling
> around that. This will deal with the major design flaw of global stat
> mentioned earlier (that you cannot attribute specific stats to the
> corresponding madvise caller).

Then let me ask you back. Which statistics among the accumulated
counters in the current vmstat fields, or in pending fields (to be
merged), sound reasonable to you as vmstat fields rather than
tracepoints?

Almost every stat has corner cases for various reasons, and people want
to know the reason at process, context, function or block scope
depending on how they use the stat. Even the tracepoints you prefer
couldn't tell every detail people want without adding more and more of
them as the code changes. However, contrary to your worry, people have
used such high-level, vague vmstat fields very well to understand and
monitor system health, even though the fields have various miscounting
cases, since they know the corner cases are really minor.

I am really curious what metric we could add to vmstat instead of a
tracepoint in the future if we follow your logic.
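
For reference, the pgscan/pgsteal efficiency ratio discussed in this
thread can be derived from two /proc/vmstat snapshots, since the counters
are cumulative. The following is a minimal sketch, not anything proposed
in the patch. It assumes the pgscan_kswapd, pgscan_direct, pgsteal_kswapd
and pgsteal_direct fields; the exact set of fields varies between kernel
versions, and the helper names parse_vmstat and reclaim_efficiency are
hypothetical.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style text ("name value" per line) into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def reclaim_efficiency(before, after, sources=("kswapd", "direct")):
    """Return pgsteal/pgscan over the interval between two snapshots.

    Returns None when nothing was scanned in the interval.
    The default sources are an assumption about which fields exist.
    """
    scanned = sum(
        after.get(f"pgscan_{s}", 0) - before.get(f"pgscan_{s}", 0)
        for s in sources
    )
    stolen = sum(
        after.get(f"pgsteal_{s}", 0) - before.get(f"pgsteal_{s}", 0)
        for s in sources
    )
    return stolen / scanned if scanned else None
```

Both snapshots could be taken by reading /proc/vmstat a fixed interval
apart. As the thread points out, the result is system-wide: it cannot be
attributed to a particular madvise caller.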