From: Axel Rasmussen
Date: Fri, 8 Mar 2024 11:18:28 -0800
Subject: Re: MGLRU premature memcg OOM on slow writes
To: Chris Down
Cc: cgroups@vger.kernel.org, hannes@cmpxchg.org, kernel-team@fb.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, yuzhao@google.com
References: <20240229235134.2447718-1-axelrasmussen@google.com>

On Thu, Feb 29, 2024 at 4:30 PM Chris Down wrote:
>
> Axel Rasmussen writes:
> >A couple of dumb questions. In your test, do you have any of the following
> >configured / enabled?
> >
> >/proc/sys/vm/laptop_mode
> >memory.low
> >memory.min
>
> None of these are enabled. The issue is trivially reproducible by writing to
> any slow device with memory.max enabled, but from the code it looks like MGLRU
> is also susceptible to this on global reclaim (although it's less likely due to
> page diversity).
>
> >Besides that, it looks like the place non-MGLRU reclaim wakes up the
> >flushers is in shrink_inactive_list() (which calls wakeup_flusher_threads()).
> >Since MGLRU calls shrink_folio_list() directly (from evict_folios()), I agree it
> >looks like it simply will not do this.
> >
> >Yosry pointed out [1], where MGLRU used to call this but stopped doing that. It
> >makes sense to me at least that doing writeback every time we age is too
> >aggressive, but doing it in evict_folios() makes some sense to me, basically to
> >copy the behavior the non-MGLRU path (shrink_inactive_list()) has.
>
> Thanks! We may also need reclaim_throttle(), depending on how you implement it.
> Current non-MGLRU behaviour on slow storage is also highly suspect in terms of
> (lack of) throttling after moving away from VMSCAN_THROTTLE_WRITEBACK, but one
> thing at a time :-)

Hmm, so I have a patch which I think will help with this situation,
but I'm having some trouble reproducing the problem on 6.8-rc7 (so
then I can verify the patch fixes it).

If I understand the issue right, all we should need to do is get a
slow filesystem, and then generate a bunch of dirty file pages on it,
while running in a tightly constrained memcg.

To that end, I tried the following script. But, in reality I seem to
get little or no accumulation of dirty file pages.

I thought maybe fio does something different than rsync which you said
you originally tried, so I also tried rsync (copying /usr/bin into
this loop mount) and didn't run into an OOM situation either.

Maybe some dirty ratio settings need tweaking or something to get the
behavior you see? Or maybe my test has a dumb mistake in it.
:)

#!/usr/bin/env bash

echo 0 > /proc/sys/vm/laptop_mode || exit 1
echo y > /sys/kernel/mm/lru_gen/enabled || exit 1

echo "Allocate disk image"
IMAGE_SIZE_MIB=1024
IMAGE_PATH=/tmp/slow.img
dd if=/dev/zero of=$IMAGE_PATH bs=1024k count=$IMAGE_SIZE_MIB || exit 1

echo "Setup loop device"
LOOP_DEV=$(losetup --show --find $IMAGE_PATH) || exit 1
LOOP_BLOCKS=$(blockdev --getsize $LOOP_DEV) || exit 1

echo "Create dm-slow"
DM_NAME=dm-slow
DM_DEV=/dev/mapper/$DM_NAME
echo "0 $LOOP_BLOCKS delay $LOOP_DEV 0 100" | dmsetup create $DM_NAME || exit 1

echo "Create fs"
mkfs.ext4 "$DM_DEV" || exit 1

echo "Mount fs"
MOUNT_PATH="/tmp/$DM_NAME"
mkdir -p "$MOUNT_PATH" || exit 1
mount -t ext4 "$DM_DEV" "$MOUNT_PATH" || exit 1

echo "Generate dirty file pages"
systemd-run --wait --pipe --collect -p MemoryMax=32M \
        fio -name=writes -directory=$MOUNT_PATH -readwrite=randwrite \
          -numjobs=10 -nrfiles=90 -filesize=1048576 \
          -fallocate=posix \
          -blocksize=4k -ioengine=mmap \
          -direct=0 -buffered=1 -fsync=0 -fdatasync=0 -sync=0 \
          -runtime=300 -time_based
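
One way to check whether dirty file pages really accumulate while the
fio job runs might be to sample the file_dirty and file_writeback
counters (in bytes) from the cgroup's memory.stat. The sketch below
assumes cgroup v2. The helper name memcg_dirty_stats and the cgroup
path in the usage comment are hypothetical and not part of the script
above; the path in particular depends on the transient unit name that
systemd-run picks.

```shell
#!/usr/bin/env bash
# Sketch only: memcg_dirty_stats is a hypothetical helper, not an
# existing tool. It assumes a cgroup v2 memory.stat file.

# Print "file_dirty=<bytes> file_writeback=<bytes>" for the given
# memory.stat file. Counters missing from the file are reported as 0.
memcg_dirty_stats() {
    local stat_file=$1
    awk '
        BEGIN { dirty = 0; writeback = 0 }
        $1 == "file_dirty"     { dirty = $2 }
        $1 == "file_writeback" { writeback = $2 }
        END { printf "file_dirty=%s file_writeback=%s\n", dirty, writeback }
    ' "$stat_file"
}

# Usage (assumed path; substitute the real transient unit name):
# while sleep 1; do
#     memcg_dirty_stats /sys/fs/cgroup/system.slice/<unit>.service/memory.stat
# done
```

If file_dirty stays near zero for the whole run, the workload is
probably not building up dirty page cache inside the memcg, which
would match the "little or no accumulation" observation above.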