From: Yu Zhao
Date: Tue, 22 Nov 2022 12:46:15 -0700
Subject: Re: Low TCP throughput due to vmpressure with swap enabled
To: Ivan Babrou
Cc: Linux MM, Linux Kernel Network Developers, linux-kernel,
    Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
    Muchun Song, Andrew Morton, Eric Dumazet, "David S. Miller",
    Hideaki YOSHIFUJI, David Ahern, Jakub Kicinski, Paolo Abeni,
    cgroups@vger.kernel.org, kernel-team

On Mon, Nov 21, 2022 at 5:53 PM Ivan Babrou wrote:
>
> Hello,
>
> We have observed a negative TCP throughput behavior from the following commit:
>
> * 8e8ae645249b mm: memcontrol: hook up vmpressure to socket pressure
>
> It landed back in 2016 in v4.5, so it's not exactly a new issue.
>
> The crux of the issue is that in some cases with swap present the
> workload can be unfairly throttled in terms of TCP throughput.
>
> I am able to reproduce this issue in a VM locally on v6.1-rc6 with 8
> GiB of RAM with zram enabled.
>
> The setup is fairly simple:
>
> 1. Run the following go proxy in one cgroup (it has some memory
> ballast to simulate useful memory usage):
>
> * https://gist.github.com/bobrik/2c1a8a19b921fefe22caac21fda1be82
>
> sudo systemd-run --scope -p MemoryLimit=6G go run main.go
>
> 2. Run the following fio config in another cgroup to simulate mmapped
> page cache usage:
>
> [global]
> size=8g
> bs=256k
> iodepth=256
> direct=0
> ioengine=mmap
> group_reporting
> time_based
> runtime=86400
> numjobs=8
> name=randread
> rw=randread

Is it practical for your workload to apply some madvise/fadvise hint?
For the above repro, it would be fadvise_hint=1, which is mapped to
MADV_RANDOM automatically. The kernel also supports MADV_SEQUENTIAL,
but not POSIX_FADV_NOREUSE at the moment.

We actually have similar issues, but unfortunately I haven't been able
to come up with any solution beyond recommending the above flags. The
problem is that harvesting the accessed bit from mmapped memory is
costly, and when random accesses happen fast enough, the cost of doing
that prevents LRU from collecting more information to make better
decisions. In a nutshell, LRU can't tell whether there is genuine
memory locality with your test case.

It's a very difficult problem to solve from LRU's POV. I'd like to
hear more about your workloads and see whether there are workarounds
other than tackling the problem head-on, if applying hints is not
practical or preferable.
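
For a workload that mmaps files itself rather than going through fio,
applying the MADV_RANDOM hint might look roughly like the sketch below.
It is only an illustration, assuming Linux and Python 3.8 or later (for
mmap.madvise); the function name read_random_pages and its parameters
are hypothetical and not taken from the repro above.

```python
import mmap
import os
import random


def read_random_pages(path, samples, seed=0):
    """Map `path` read-only, hint random access, and read the first byte
    of `samples` randomly chosen pages. Hypothetical helper for
    illustration only."""
    size = os.path.getsize(path)
    rng = random.Random(seed)
    first_bytes = []
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # MADV_RANDOM tells the kernel to expect page references in
        # random order. The constant only exists on platforms that
        # provide it, so guard the call.
        if hasattr(mmap, "MADV_RANDOM"):
            m.madvise(mmap.MADV_RANDOM)
        for _ in range(samples):
            # Pick a page-aligned offset inside the file.
            offset = rng.randrange(0, size, mmap.PAGESIZE)
            first_bytes.append(m[offset])
    return first_bytes
```

With no start or length arguments, madvise() here covers the whole
mapping. A workload with a genuinely sequential access pattern would
pass mmap.MADV_SEQUENTIAL instead.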