Date: Wed, 24 Apr 2024 11:20:28 -0400
From: Peter Xu
To: "Matthew Wilcox (Oracle)"
Cc: Andrew Morton, linux-mm@kvack.org, Alex Williamson
Subject: Re: [PATCH 1/5] mm: Free non-hugetlb large folios in a batch
References: <20240405153228.2563754-1-willy@infradead.org> <20240405153228.2563754-2-willy@infradead.org>
In-Reply-To: <20240405153228.2563754-2-willy@infradead.org>

On Fri, Apr 05, 2024 at 04:32:23PM +0100, Matthew Wilcox (Oracle) wrote:
> free_unref_folios() can now handle non-hugetlb large folios, so keep
> normal large folios in the batch. hugetlb folios still need to be
> handled specially. I believe that folios freed using put_pages_list()
> cannot be accounted to a memcg (or the small folios would trip the "page
> still charged to cgroup" warning), but put an assertion in to check that.

There is such a user: iommu uses put_pages_list() to free IOMMU page
tables, and those can be memcg-accounted, since in 2023 iommu_map()
switched to GFP_KERNEL_ACCOUNT. I hit the panic below while testing my
local branch on top of mm-everything with some VFIO workloads.

For this specific VFIO use case, see 160912fc3d4a ("vfio/type1: account
iommu allocations").

I think we should remove the VM_BUG_ON_FOLIO() line, as the memcg charge
will then be properly taken care of later in free_pages_prepare(). A fixup
that fixes this crash for me is attached at the end.

Thanks,

[ 10.092411] kernel BUG at mm/swap.c:152!
[ 10.092686] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 10.093034] CPU: 3 PID: 634 Comm: vfio-pci-mmap-t Tainted: G W 6.9.0-rc4-peterx+ #2
[ 10.093628] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 10.094361] RIP: 0010:put_pages_list+0x12b/0x150
[ 10.094675] Code: 6d 08 48 81 c4 00 01 00 00 5b 5d c3 cc cc cc cc 48 c7 c6 f0 fd 9f 82 e8 63 e8 03 00 0f 0b 48 c7 c6 48 00 a0 82 e8 55 e8 03 00 <0f> 0b 48 c7 c6 28 fe 9f 82 e8 47f
[ 10.095896] RSP: 0018:ffffc9000221bc50 EFLAGS: 00010282
[ 10.096242] RAX: 0000000000000038 RBX: ffffea00042695c0 RCX: 0000000000000000
[ 10.096707] RDX: 0000000000000001 RSI: 0000000000000027 RDI: 00000000ffffffff
[ 10.097177] RBP: ffffc9000221bd68 R08: 0000000000000000 R09: 0000000000000003
[ 10.097642] R10: ffffc9000221bb08 R11: ffffffff8335db48 R12: ffff8881070172c0
[ 10.098113] R13: ffff888102fd0000 R14: ffff888107017210 R15: ffff888110a6c7c0
[ 10.098586] FS: 0000000000000000(0000) GS:ffff888276a00000(0000) knlGS:0000000000000000
[ 10.099117] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 10.099494] CR2: 00007f1910000000 CR3: 000000000323c006 CR4: 0000000000770ef0
[ 10.099972] PKRU: 55555554
[ 10.100154] Call Trace:
[ 10.100321]
[ 10.100466]  ? die+0x32/0x80
[ 10.100666]  ? do_trap+0xd9/0x100
[ 10.100897]  ? put_pages_list+0x12b/0x150
[ 10.101168]  ? put_pages_list+0x12b/0x150
[ 10.101434]  ? do_error_trap+0x81/0x110
[ 10.101688]  ? put_pages_list+0x12b/0x150
[ 10.101957]  ? exc_invalid_op+0x4c/0x60
[ 10.102216]  ? put_pages_list+0x12b/0x150
[ 10.102484]  ? asm_exc_invalid_op+0x16/0x20
[ 10.102771]  ? put_pages_list+0x12b/0x150
[ 10.103026]  ? 0xffffffff81000000
[ 10.103246]  ? dma_pte_list_pagetables.isra.0+0x38/0xa0
[ 10.103592]  ? dma_pte_list_pagetables.isra.0+0x9b/0xa0
[ 10.103933]  ? dma_pte_clear_level+0x18c/0x1a0
[ 10.104228]  ? domain_unmap+0x65/0x130
[ 10.104481]  ? domain_unmap+0xe6/0x130
[ 10.104735]  domain_exit+0x47/0x80
[ 10.104968]  vfio_iommu_type1_detach_group+0x3f1/0x5f0
[ 10.105308]  ? vfio_group_detach_container+0x3c/0x1a0
[ 10.105644]  vfio_group_detach_container+0x60/0x1a0
[ 10.105977]  vfio_group_fops_release+0x46/0x80
[ 10.106274]  __fput+0x9a/0x2d0
[ 10.106479]  task_work_run+0x55/0x90
[ 10.106717]  do_exit+0x32f/0xb70
[ 10.106945]  ? _raw_spin_unlock_irq+0x24/0x50
[ 10.107237]  do_group_exit+0x32/0xa0
[ 10.107481]  __x64_sys_exit_group+0x14/0x20
[ 10.107760]  do_syscall_64+0x75/0x190
[ 10.108007]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

==================================

diff --git a/mm/swap.c b/mm/swap.c
index f0d478eee292..8ae5cd4ed180 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -149,7 +149,6 @@ void put_pages_list(struct list_head *pages)
 			free_huge_folio(folio);
 			continue;
 		}
-		VM_BUG_ON_FOLIO(folio_memcg(folio), folio);
 		/* LRU flag must be clear because it's passed using the lru */
 		if (folio_batch_add(&fbatch, folio) > 0)
 			continue;

-- 
Peter Xu
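
To illustrate why dropping the assertion is safe under the reasoning above, here is a hedged userspace C model of the put_pages_list() loop after the fixup. Everything in it (struct memcg, model_put_pages_list(), model_free_pages_prepare(), model_batch_add(), BATCH_SIZE) is hypothetical and simplified, not a kernel API. It assumes, as the email states, that the memcg charge is dropped when each folio reaches free_pages_prepare(), so a charged folio, small or large, can go through the batch like any other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of put_pages_list() after the fixup.
 * None of these types or functions are kernel APIs.
 */
#define BATCH_SIZE 4 /* arbitrary small batch, chosen to force a flush */

struct memcg {
	long charged_pages;
};

struct folio {
	int refcount;
	bool hugetlb;
	unsigned int order;   /* folio covers 1 << order pages */
	struct memcg *memcg;  /* NULL when not charged */
	bool freed;
	struct folio *next;   /* stands in for the lru list linkage */
};

struct folio_batch {
	size_t nr;
	struct folio *folios[BATCH_SIZE];
};

static long hugetlb_freed;

/* Assumed behaviour (per the email): the charge is dropped at free time. */
static void model_free_pages_prepare(struct folio *f)
{
	if (f->memcg) {
		f->memcg->charged_pages -= 1L << f->order;
		f->memcg = NULL;
	}
	f->freed = true;
}

/* Frees every folio in the batch, small or large, then empties it. */
static void model_free_unref_folios(struct folio_batch *b)
{
	for (size_t i = 0; i < b->nr; i++)
		model_free_pages_prepare(b->folios[i]);
	b->nr = 0;
}

/* Adds a folio and returns the remaining space, like folio_batch_add(). */
static size_t model_batch_add(struct folio_batch *b, struct folio *f)
{
	b->folios[b->nr++] = f;
	return BATCH_SIZE - b->nr;
}

static void model_put_pages_list(struct folio *head)
{
	struct folio_batch batch = { .nr = 0 };

	for (struct folio *f = head, *next; f; f = next) {
		next = f->next;
		if (--f->refcount != 0)
			continue;           /* still referenced elsewhere */
		if (f->hugetlb) {
			hugetlb_freed++;    /* hugetlb still takes its own path */
			f->freed = true;
			continue;
		}
		/* No memcg assertion here: charged folios are batched too. */
		if (model_batch_add(&batch, f) > 0)
			continue;
		model_free_unref_folios(&batch);
	}
	if (batch.nr)
		model_free_unref_folios(&batch);
}
```

In this model a memcg-charged folio, including an order-2 one, passes through the batch and its charge is released once it is freed; with the assertion kept, the first charged folio would instead trip it, which matches the oops above.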