Date: Mon, 14 Aug 2023 14:14:18 -1000
From: Tejun Heo
To: Yosry Ahmed
Cc: Michal Hocko, Shakeel Butt, Johannes Weiner, Roman Gushchin, Andrew Morton, Muchun Song, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: memcg: provide accurate stats for userspace reads

Hello,

On Sat, Aug 12, 2023 at 04:04:32AM -0700, Yosry Ahmed wrote:
> Taking a step back though, and considering there have been other
> complaints about unified flushing causing expensive reads from
> memory.stat [1], I am wondering if we should tackle the fundamental
> problem.
>
> We have a single global rstat lock for flushing, which protects the
> global per-cgroup counters as far as I understand. A single lock means
> a lot of contention, which is why we implemented unified flushing on
> the memcg side in the first place, where we only let one flusher
> operate and everyone else skip, but that flusher needs to flush the
> entire tree.
>
> This can be unnecessarily expensive (see [1]), and to avoid how
> expensive it is we sacrifice accuracy (what this patch is about). I am
> exploring breaking down that lock into per-cgroup locks, where a
> flusher acquires locks in a top down fashion. This allows for some
> concurrency in flushing, and makes unified flushing unnecessary. If we
> retire unified flushing we fix both accuracy and expensive reads at
> the same time, while not sacrificing performance for concurrent
> in-kernel flushers.
>
> What do you think? I am prototyping something now and running some
> tests, it seems promising and simple-ish (unless I am missing a big
> correctness issue).

So, the original design used a mutex to synchronize flushing, with the
idea being that updates are high-frequency but reads are low-frequency
and can be relatively slow. Using rstat for mm-internal operations
changed this assumption quite a bit, and we ended up replacing that
mutex with a spinlock.

Here are some suggestions:

* Update side: the per-cpu locks should be fine. I don't think splitting
  them would buy us anything meaningful.

* Flush side: maybe we can break flushing up per-cpu or whatnot, but
  there's no avoiding the fact that flushing can take quite a while if
  there is a lot to flush, whether the locks are split or not. I wonder
  whether it'd be possible to go back to a mutex for flushing and update
  the users to either consume the cached values or operate in a
  sleepable context if a synchronous read is necessary, which is the
  right thing to do anyway given how long flushes can take.

Thanks.

--
tejun
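A rough userspace sketch of the update/flush split suggested above, assuming POSIX threads as a stand-in for kernel primitives: per-slot pthread spinlocks model the per-cpu update-side locks, a pthread mutex models a sleepable flush lock, and readers either take the cached total or flush synchronously. It ignores the cgroup tree and the per-cgroup "updated" lists entirely. All names here (struct stat_counter, stat_update(), stat_read_cached(), stat_read_sync(), NR_SLOTS) are hypothetical and are not the kernel's cgroup rstat API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical number of per-CPU slots; the kernel would use real per-cpu data. */
#define NR_SLOTS 4

/* Update side: one small lock per slot, so updaters on different CPUs don't contend. */
struct stat_slot {
	pthread_spinlock_t lock;
	int64_t pending;		/* delta not yet folded into the total */
};

struct stat_counter {
	struct stat_slot slots[NR_SLOTS];
	pthread_mutex_t flush_mutex;	/* flush side: a sleepable lock */
	_Atomic int64_t cached;		/* total as of the last flush */
};

int stat_init(struct stat_counter *c)
{
	for (int i = 0; i < NR_SLOTS; i++) {
		if (pthread_spin_init(&c->slots[i].lock, PTHREAD_PROCESS_PRIVATE))
			return -1;
		c->slots[i].pending = 0;
	}
	atomic_init(&c->cached, 0);
	return pthread_mutex_init(&c->flush_mutex, NULL) ? -1 : 0;
}

/* Hot path: only touches the caller's own slot. */
void stat_update(struct stat_counter *c, unsigned int cpu, int64_t delta)
{
	assert(cpu < NR_SLOTS);
	pthread_spin_lock(&c->slots[cpu].lock);
	c->slots[cpu].pending += delta;
	pthread_spin_unlock(&c->slots[cpu].lock);
}

/* Fold every slot's pending delta into the total. Caller holds flush_mutex. */
static void stat_flush_locked(struct stat_counter *c)
{
	for (int i = 0; i < NR_SLOTS; i++) {
		struct stat_slot *s = &c->slots[i];
		int64_t delta;

		/* Each per-slot lock is held only briefly; the mutex serializes flushers. */
		pthread_spin_lock(&s->lock);
		delta = s->pending;
		s->pending = 0;
		pthread_spin_unlock(&s->lock);
		atomic_fetch_add(&c->cached, delta);
	}
}

/* Cheap read for any context: stale by whatever hasn't been flushed yet. */
int64_t stat_read_cached(struct stat_counter *c)
{
	return atomic_load(&c->cached);
}

/*
 * Synchronous read: may block on flush_mutex, so only for sleepable contexts.
 * Reflects all updates that completed before the call.
 */
int64_t stat_read_sync(struct stat_counter *c)
{
	int64_t v;

	pthread_mutex_lock(&c->flush_mutex);
	stat_flush_locked(c);
	v = atomic_load(&c->cached);
	pthread_mutex_unlock(&c->flush_mutex);
	return v;
}
```

In this model a caller that cannot sleep would use stat_read_cached() and accept some staleness, while something like a userspace memory.stat read would pay for stat_read_sync(). Because the flush lock is a mutex, a long flush makes other flushers sleep rather than spin, which is the trade-off argued for above.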