From: Waiman Long
To: Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, Johannes Weiner, Chris Down, Yu Zhao, Axel Rasmussen
Cc: Linux Kernel Mailing List, Linux Memory Management List, Rafael Aquini, "cgroups@vger.kernel.org"
Subject: MGLRU OOM problem
Date: Mon, 24 Jun 2024 13:32:34 -0400
Message-ID: <6c3fbc2d-85d9-4502-b43c-0950ccdd6f7e@redhat.com>

Hi,

We are hitting an OOM issue with our OpenShift middleware, which is based on Kubernetes. Currently, it sets only memory.max when applying a memory limit. OOM kills are encountered rather frequently when we try to write a large data file that exceeds memory.max to an NFS-mounted filesystem.
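To make the "memory.max only" setup concrete, here is a minimal sketch of running a command in such a cgroup. It is not the OpenShift configuration itself. It assumes cgroup v2, a root directory (normally /sys/fs/cgroup) under which the caller may create child cgroups, and that memory.high is left at its default of "max". The helper name run_in_memcg is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): run a command in a
# cgroup v2 memory cgroup that has only memory.max set, leaving memory.high
# at its default, mirroring the setup described above.
# Assumes $1 is a cgroup v2 mount (normally /sys/fs/cgroup) under which the
# caller is allowed to create child cgroups.
run_in_memcg() {
    local root="$1" name="$2" limit="$3"
    shift 3
    local cg="$root/$name"

    # Make the memory controller available to child cgroups.
    echo "+memory" > "$root/cgroup.subtree_control" || return 1
    mkdir -p "$cg" || return 1
    # Set the hard limit only; memory.max accepts K/M/G suffixes.
    echo "$limit" > "$cg/memory.max" || return 1

    # A fresh shell writes its own PID into cgroup.procs, then execs the
    # command in place (same PID), so only that command runs under the limit.
    sh -c 'echo "$$" > "$1" && shift && exec "$@"' sh "$cg/cgroup.procs" "$@"
}
```

For example, run as root with /disk being the NFS mount from the report (the cgroup name mglru-test is also hypothetical):

  # run_in_memcg /sys/fs/cgroup mglru-test 600M dd if=/dev/urandom of=/disk/2G.bin bs=32K count=65536 status=progress iflag=fullblock

Moving a dedicated child shell, rather than the calling shell, into the cgroup keeps the interactive session itself outside the 600M limit.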
I have bisected the problem down to commit 14aa8b2d5c2e ("mm/mglru: don't sync disk for each aging cycle"). The following command can be used to trigger an OOM kill when run in a memory cgroup with a memory.max limit of 600M on an NFS-mounted filesystem:

  # dd if=/dev/urandom of=/disk/2G.bin bs=32K count=65536 status=progress iflag=fullblock

In my case, I could trigger an OOM when I ran the reproducer a second time on a test system. In the first, successful run, the reported data rate was:

  2147483648 bytes (2.1 GB, 2.0 GiB) copied, 57.5474 s, 37.3 MB/s

After reverting commit 14aa8b2d5c2e ("mm/mglru: don't sync disk for each aging cycle"), the OOM could no longer be reproduced, and the data rate was:

  2147483648 bytes (2.1 GB, 2.0 GiB) copied, 25.694 s, 83.6 MB/s

With MGLRU disabled (echo 0 > /sys/kernel/mm/lru_gen/enabled), the data rate was:

  2147483648 bytes (2.1 GB, 2.0 GiB) copied, 21.184 s, 101 MB/s

I know that the purpose of commit 14aa8b2d5c2e is to prevent premature aging of SSDs. However, I would like to find a way to wake up the flusher whenever the cgroup is under memory pressure and has a lot of dirty pages, but I don't have a solid clue yet. I am aware that there was a previous discussion about this commit in [1], so I would like to engage the same community to see if there can be a proper solution to this problem.

[1] https://lore.kernel.org/lkml/ZcWOh9u3uqZjNFMa@chrisdown.name/

Cheers,
Longman