linux-mm.kvack.org archive mirror
From: Konstantin Khlebnikov <koct9i@gmail.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>,
	Michel Lespinasse <walken@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Tim Hartrick <tim@edgecast.com>
Subject: Re: [PATCH] Repeated fork() causes SLAB to grow without bound
Date: Tue, 25 Nov 2014 16:13:16 +0400
Message-ID: <CALYGNiPZmf4Y1_vX_FaiALKp-BPvct7fAiaPEjnDGnVx9paS9w@mail.gmail.com>
In-Reply-To: <20141125105953.GC4607@dhcp22.suse.cz>

[-- Attachment #1: Type: text/plain, Size: 16416 bytes --]

On Tue, Nov 25, 2014 at 1:59 PM, Michal Hocko <mhocko@suse.cz> wrote:
> On Mon 24-11-14 11:09:40, Konstantin Khlebnikov wrote:
>> On Thu, Nov 20, 2014 at 6:03 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
>> > On Thu, Nov 20, 2014 at 5:50 PM, Rik van Riel <riel@redhat.com> wrote:
>> >> On 11/20/2014 09:42 AM, Konstantin Khlebnikov wrote:
>> >>
>> >>> I'm thinking about limitation for reusing anon_vmas which might
>> >>> increase performance without breaking asymptotic estimation of
>> >>> count anon_vma in the worst case. For example this heuristic: allow
>> >>> to reuse only anon_vma with single direct descendant. It seems
>> >>> there will be around up to two times more anon_vmas but
>> >>> false-aliasing must be much lower.
>>
>> Done. RFC patch in attachment.
>
> This is triggering BUG_ON(anon_vma->degree); in unlink_anon_vmas. I have
> applied the patch on top of 3.18.0-rc6.

It seems I've screwed up the counter in the case where the anon_vma is
merged in anon_vma_prepare. The increment must be in the next if block:

--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -182,8 +182,6 @@ int anon_vma_prepare(struct vm_area_struct *vma)
                        if (unlikely(!anon_vma))
                                goto out_enomem_free_avc;
                        allocated = anon_vma;
-                       /* Bump degree, root anon_vma is its own parent. */
-                       anon_vma->degree++;
                }

                anon_vma_lock_write(anon_vma);
@@ -192,6 +190,7 @@ int anon_vma_prepare(struct vm_area_struct *vma)
                if (likely(!vma->anon_vma)) {
                        vma->anon_vma = anon_vma;
                        anon_vma_chain_link(vma, avc, anon_vma);
+                       anon_vma->degree++;
                        allocated = NULL;
                        avc = NULL;
                }

I've tested it with trinity, but probably not for long enough.

>
> [   12.380189] ------------[ cut here ]------------
> [   12.380221] kernel BUG at mm/rmap.c:385!
> [   12.380239] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [   12.380272] Modules linked in: i915 cfbfillrect cfbimgblt i2c_algo_bit fbcon bitblit softcursor cfbcopyarea font drm_kms_helper drm fb fbdev binfmt_misc fuse uvcvideo videobuf2_vmalloc videobuf2_memops arc4 videobuf2_core v4l2_common sdhci_pci iwldvm videodev media mac80211 i2c_i801 i2c_core sdhci mmc_core iwlwifi cfg80211 snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm video backlight snd_timer snd
> [   12.380518] CPU: 1 PID: 3704 Comm: kdm_greet Not tainted 3.18.0-rc6-test-00001-gf5bc00c103ff #409
> [   12.380554] Hardware name: Dell Inc. Latitude E6320/09PHH9, BIOS A08 10/18/2011
> [   12.380584] task: ffff8801272bc2c0 ti: ffff8800bcaf0000 task.ti: ffff8800bcaf0000
> [   12.380614] RIP: 0010:[<ffffffff81125f09>]  [<ffffffff81125f09>] unlink_anon_vmas+0x12b/0x169
> [   12.380653] RSP: 0018:ffff8800bcaf3d28  EFLAGS: 00010286
> [   12.380676] RAX: ffff8800bcb3e690 RBX: ffff8800bcb35e28 RCX: ffff8801272bcb60
> [   12.380706] RDX: ffff8800bcb38e70 RSI: 0000000000000001 RDI: ffff8800bcb38e70
> [   12.380734] RBP: ffff8800bcaf3d78 R08: 0000000000000000 R09: 0000000000000000
> [   12.380764] R10: 0000000000000000 R11: ffff8800bcb3e6a0 R12: ffff8800bcb3e680
> [   12.380793] R13: ffff8800bcb3e690 R14: ffff8800bcb38e70 R15: ffff8800bcb38e70
> [   12.380822] FS:  0000000000000000(0000) GS:ffff88012d440000(0000) knlGS:0000000000000000
> [   12.380855] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   12.380880] CR2: 00007fcd2603b0e8 CR3: 0000000001a11000 CR4: 00000000000407e0
> [   12.380908] Stack:
> [   12.380918]  ffff8801272e9dc0 ffff8800bcb35e38 ffff8800bcb35e38 ffff8800bcb3e680
> [   12.380953]  ffff8800bcaf3d78 ffff8800bcb35dc0 ffff8800bcaf3dd8 0000000000000000
> [   12.380989]  0000000000000000 ffff8800bcb35dc0 ffff8800bcaf3dc8 ffffffff81119e26
> [   12.381024] Call Trace:
> [   12.381038]  [<ffffffff81119e26>] free_pgtables+0x8e/0xcc
> [   12.381062]  [<ffffffff81121ac1>] exit_mmap+0x84/0x123
> [   12.381086]  [<ffffffff8103ff09>] mmput+0x5e/0xbb
> [   12.381107]  [<ffffffff81044d8c>] do_exit+0x39c/0x97e
> [   12.381131]  [<ffffffff810f49b4>] ? context_tracking_user_exit+0x79/0x116
> [   12.381160]  [<ffffffff8127f43a>] ? __this_cpu_preempt_check+0x13/0x15
> [   12.381188]  [<ffffffff810453f1>] do_group_exit+0x4c/0xc9
> [   12.381212]  [<ffffffff81045482>] SyS_exit_group+0x14/0x14
> [   12.381238]  [<ffffffff81524f52>] system_call_fastpath+0x12/0x17
> [   12.381262] Code: 32 f5 ff 49 8b 45 78 48 8b 18 4c 8d 60 f0 48 83 eb 10 4d 8d 6c 24 10 4c 3b 6d b8 74 3d 49 8b 7c 24 08 83 bf 98 00 00 00 00 74 02 <0f> 0b f0 ff 8f 88 00 00 00 74 1d 4c 89 ef e8 61 96 15 00 4c 89
> [   12.381445] RIP  [<ffffffff81125f09>] unlink_anon_vmas+0x12b/0x169
> [   12.381473]  RSP <ffff8800bcaf3d28>
> [   12.386659] ---[ end trace 5761ee18fca12427 ]---
> [   12.386662] Fixing recursive fault but reboot is needed!
> [   13.158240] e1000e 0000:00:19.0: irq 25 for MSI/MSI-X
> [   13.259294] e1000e 0000:00:19.0: irq 25 for MSI/MSI-X
> [   13.259468] IPv6: ADDRCONF(NETDEV_UP): lan0: link is not ready
> [   16.790917] e1000e: lan0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [   16.790957] IPv6: ADDRCONF(NETDEV_CHANGE): lan0: link becomes ready
> [   18.846524] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
> [   18.846742] iwlwifi 0000:02:00.0: Radio type=0x0-0x3-0x1
> [   18.941594] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
> [   19.145595] e1000e: lan0 NIC Link is Down
> [   19.287399] e1000e 0000:00:19.0: irq 25 for MSI/MSI-X
> [   19.391325] e1000e 0000:00:19.0: irq 25 for MSI/MSI-X
> [   19.391475] IPv6: ADDRCONF(NETDEV_UP): lan0: link is not ready
> [   19.573640] e1000e: lan0 NIC Link is Down
> [   19.717813] e1000e 0000:00:19.0: irq 25 for MSI/MSI-X
> [   19.819729] e1000e 0000:00:19.0: irq 25 for MSI/MSI-X
> [   19.819883] IPv6: ADDRCONF(NETDEV_UP): lan0: link is not ready
> [   22.938849] e1000e: lan0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [   22.938889] IPv6: ADDRCONF(NETDEV_CHANGE): lan0: link becomes ready
> [   23.404027] ------------[ cut here ]------------
> [   23.404056] kernel BUG at mm/rmap.c:385!
> [   23.404074] invalid opcode: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC
> [   23.404107] Modules linked in: i915 cfbfillrect cfbimgblt i2c_algo_bit fbcon bitblit softcursor cfbcopyarea font drm_kms_helper drm fb fbdev binfmt_misc fuse uvcvideo videobuf2_vmalloc videobuf2_memops arc4 videobuf2_core v4l2_common sdhci_pci iwldvm videodev media mac80211 i2c_i801 i2c_core sdhci mmc_core iwlwifi cfg80211 snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm video backlight snd_timer snd
> [   23.404353] CPU: 1 PID: 4506 Comm: synaptikscfg Tainted: G      D        3.18.0-rc6-test-00001-gf5bc00c103ff #409
> [   23.404395] Hardware name: Dell Inc. Latitude E6320/09PHH9, BIOS A08 10/18/2011
> [   23.404425] task: ffff8800a337c2c0 ti: ffff88009f4ec000 task.ti: ffff88009f4ec000
> [   23.404455] RIP: 0010:[<ffffffff81125f09>]  [<ffffffff81125f09>] unlink_anon_vmas+0x12b/0x169
> [   23.404494] RSP: 0018:ffff88009f4efd28  EFLAGS: 00010282
> [   23.405766] RAX: ffff88009f54d010 RBX: ffff88009f54c488 RCX: 0000000000000000
> [   23.407062] RDX: ffff88009f5a3a50 RSI: 0000000000000001 RDI: ffff88009f5a3a50
> [   23.408352] RBP: ffff88009f4efd78 R08: 0000000000000000 R09: 0000000000000000
> [   23.409597] R10: 0000000000000000 R11: ffff88009f54d020 R12: ffff88009f54d000
> [   23.410816] R13: ffff88009f54d010 R14: ffff88009f5a3a50 R15: ffff88009f5a3a50
> [   23.411998] FS:  0000000000000000(0000) GS:ffff88012d440000(0000) knlGS:0000000000000000
> [   23.413167] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   23.414320] CR2: 00007f7a855608f0 CR3: 00000000a328c000 CR4: 00000000000407e0
> [   23.415471] Stack:
> [   23.416603]  ffff8800a3390e00 ffff88009f54c498 ffff88009f54c498 ffff88009f54d000
> [   23.417747]  ffff88009f4efd78 ffff88009f54c420 ffff88009f4efdd8 0000000000000000
> [   23.418892]  0000000000000000 ffff88009f54c420 ffff88009f4efdc8 ffffffff81119e26
> [   23.420027] Call Trace:
> [   23.421153]  [<ffffffff81119e26>] free_pgtables+0x8e/0xcc
> [   23.422273]  [<ffffffff81121ac1>] exit_mmap+0x84/0x123
> [   23.423411]  [<ffffffff81044d48>] ? do_exit+0x358/0x97e
> [   23.424537]  [<ffffffff8103ff09>] mmput+0x5e/0xbb
> [   23.425665]  [<ffffffff81044d8c>] do_exit+0x39c/0x97e
> [   23.426766]  [<ffffffff810f49b4>] ? context_tracking_user_exit+0x79/0x116
> [   23.427866]  [<ffffffff8127f43a>] ? __this_cpu_preempt_check+0x13/0x15
> [   23.428962]  [<ffffffff810453f1>] do_group_exit+0x4c/0xc9
> [   23.430064]  [<ffffffff81045482>] SyS_exit_group+0x14/0x14
> [   23.431162]  [<ffffffff81524f52>] system_call_fastpath+0x12/0x17
> [   23.432262] Code: 32 f5 ff 49 8b 45 78 48 8b 18 4c 8d 60 f0 48 83 eb 10 4d 8d 6c 24 10 4c 3b 6d b8 74 3d 49 8b 7c 24 08 83 bf 98 00 00 00 00 74 02 <0f> 0b f0 ff 8f 88 00 00 00 74 1d 4c 89 ef e8 61 96 15 00 4c 89
> [   23.434722] RIP  [<ffffffff81125f09>] unlink_anon_vmas+0x12b/0x169
> [   23.435924]  RSP <ffff88009f4efd28>
> [   23.441996] ---[ end trace 5761ee18fca12428 ]---
> [   23.442001] Fixing recursive fault but reboot is needed!
> [  838.179454] ------------[ cut here ]------------
> [  838.180658] kernel BUG at mm/rmap.c:385!
> [  838.181843] invalid opcode: 0000 [#3] PREEMPT SMP DEBUG_PAGEALLOC
> [  838.183046] Modules linked in: i915 cfbfillrect cfbimgblt i2c_algo_bit fbcon bitblit softcursor cfbcopyarea font drm_kms_helper drm fb fbdev binfmt_misc fuse uvcvideo videobuf2_vmalloc videobuf2_memops arc4 videobuf2_core v4l2_common sdhci_pci iwldvm videodev media mac80211 i2c_i801 i2c_core sdhci mmc_core iwlwifi cfg80211 snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm video backlight snd_timer snd
> [  838.186983] CPU: 1 PID: 6643 Comm: colord-sane Tainted: G      D        3.18.0-rc6-test-00001-gf5bc00c103ff #409
> [  838.188240] Hardware name: Dell Inc. Latitude E6320/09PHH9, BIOS A08 10/18/2011
> [  838.189503] task: ffff8800c4fd8000 ti: ffff880079c6c000 task.ti: ffff880079c6c000
> [  838.190765] RIP: 0010:[<ffffffff81125f09>]  [<ffffffff81125f09>] unlink_anon_vmas+0x12b/0x169
> [  838.192045] RSP: 0018:ffff880079c6fb68  EFLAGS: 00010286
> [  838.193324] RAX: ffff8800c5a70150 RBX: ffff8800a6fd5748 RCX: 0000000000000000
> [  838.194616] RDX: ffff8800a5379840 RSI: 0000000000000001 RDI: ffff8800a5379840
> [  838.195879] RBP: ffff880079c6fbb8 R08: 0000000000000000 R09: 0000000000000000
> [  838.197100] R10: 0000000000000000 R11: ffff8800c5a70160 R12: ffff8800c5a70140
> [  838.198289] R13: ffff8800c5a70150 R14: ffff8800a5379840 R15: ffff8800a5379840
> [  838.199448] FS:  0000000000000000(0000) GS:ffff88012d440000(0000) knlGS:0000000000000000
> [  838.200604] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  838.201753] CR2: 00007fdfd692cde8 CR3: 0000000079d0d000 CR4: 00000000000407e0
> [  838.202902] Stack:
> [  838.204029]  ffff88011e6fc540 ffff8800a6fd5758 ffff8800a6fd5758 ffff8800c5a70140
> [  838.205180]  ffff880079c6fbb8 ffff8800a6fd56e0 ffff880079c6fc18 0000000000000000
> [  838.206328]  0000000000000000 ffff8800a6fd56e0 ffff880079c6fc08 ffffffff81119e26
> [  838.207477] Call Trace:
> [  838.208614]  [<ffffffff81119e26>] free_pgtables+0x8e/0xcc
> [  838.209762]  [<ffffffff81121ac1>] exit_mmap+0x84/0x123
> [  838.210897]  [<ffffffff81044d48>] ? do_exit+0x358/0x97e
> [  838.212020]  [<ffffffff8103ff09>] mmput+0x5e/0xbb
> [  838.213132]  [<ffffffff81044d8c>] do_exit+0x39c/0x97e
> [  838.214232]  [<ffffffff8104ea16>] ? get_signal+0xdb/0x68a
> [  838.215324]  [<ffffffff8115de6d>] ? poll_select_copy_remaining+0xfe/0xfe
> [  838.216420]  [<ffffffff810453f1>] do_group_exit+0x4c/0xc9
> [  838.217521]  [<ffffffff8104ef82>] get_signal+0x647/0x68a
> [  838.218612]  [<ffffffff810f48bd>] ? context_tracking_user_enter+0xdb/0x159
> [  838.219705]  [<ffffffff8100228f>] do_signal+0x28/0x657
> [  838.220796]  [<ffffffff810c1e10>] ? __acct_update_integrals+0xbf/0xd4
> [  838.221894]  [<ffffffff81063e43>] ? preempt_count_sub+0xcd/0xdb
> [  838.222998]  [<ffffffff8106972e>] ? vtime_account_user+0x88/0x95
> [  838.224105]  [<ffffffff815243a3>] ? _raw_spin_unlock+0x32/0x47
> [  838.225205]  [<ffffffff810f49b4>] ? context_tracking_user_exit+0x79/0x116
> [  838.226308]  [<ffffffff810f49b4>] ? context_tracking_user_exit+0x79/0x116
> [  838.227401]  [<ffffffff810028fd>] do_notify_resume+0x3f/0x94
> [  838.228495]  [<ffffffff81525218>] int_signal+0x12/0x17
> [  838.229581] Code: 32 f5 ff 49 8b 45 78 48 8b 18 4c 8d 60 f0 48 83 eb 10 4d 8d 6c 24 10 4c 3b 6d b8 74 3d 49 8b 7c 24 08 83 bf 98 00 00 00 00 74 02 <0f> 0b f0 ff 8f 88 00 00 00 74 1d 4c 89 ef e8 61 96 15 00 4c 89
> [  838.231909] RIP  [<ffffffff81125f09>] unlink_anon_vmas+0x12b/0x169
> [  838.233003]  RSP <ffff880079c6fb68>
> [  838.234248] ---[ end trace 5761ee18fca12429 ]---
> [  838.234251] Fixing recursive fault but reboot is needed!
> [ 1806.784267] ------------[ cut here ]------------
> [ 1806.785322] kernel BUG at mm/rmap.c:385!
> [ 1806.786361] invalid opcode: 0000 [#4] PREEMPT SMP DEBUG_PAGEALLOC
> [ 1806.787397] Modules linked in: i915 cfbfillrect cfbimgblt i2c_algo_bit fbcon bitblit softcursor cfbcopyarea font drm_kms_helper drm fb fbdev binfmt_misc fuse uvcvideo videobuf2_vmalloc videobuf2_memops arc4 videobuf2_core v4l2_common sdhci_pci iwldvm videodev media mac80211 i2c_i801 i2c_core sdhci mmc_core iwlwifi cfg80211 snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm video backlight snd_timer snd
> [ 1806.790682] CPU: 1 PID: 8135 Comm: DNS Resolver #7 Tainted: G      D        3.18.0-rc6-test-00001-gf5bc00c103ff #409
> [ 1806.791728] Hardware name: Dell Inc. Latitude E6320/09PHH9, BIOS A08 10/18/2011
> [ 1806.792779] task: ffff8800b3d40000 ti: ffff880079e34000 task.ti: ffff880079e34000
> [ 1806.793816] RIP: 0010:[<ffffffff81125f09>]  [<ffffffff81125f09>] unlink_anon_vmas+0x12b/0x169
> [ 1806.794863] RSP: 0018:ffff880079e37d38  EFLAGS: 00010282
> [ 1806.795894] RAX: ffff8800b508d790 RBX: ffff8800bcaa4e28 RCX: 0000000000000000
> [ 1806.796948] RDX: ffff880124ce0f20 RSI: 0000000000000001 RDI: ffff880124ce0f20
> [ 1806.798011] RBP: ffff880079e37d88 R08: 0000000000000000 R09: 0000000000000000
> [ 1806.799048] R10: 00007fc2827f9db0 R11: ffff8800b508d7a0 R12: ffff8800b508d780
> [ 1806.800105] R13: ffff8800b508d790 R14: ffff880124ce0f20 R15: ffff880124ce0f20
> [ 1806.801143] FS:  00007fc2827fa700(0000) GS:ffff88012d440000(0000) knlGS:0000000000000000
> [ 1806.802206] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1806.803244] CR2: 00007fc2c6b87000 CR3: 00000000a3063000 CR4: 00000000000407e0
> [ 1806.804305] Stack:
> [ 1806.805329]  00007fc280754000 ffff8800bcaa4e38 ffff8800bcaa4e38 ffff8800b508d780
> [ 1806.806382]  0000000081098bfb ffff8800bcaa4dc0 ffff880079e37df8 00007fc27ff00000
> [ 1806.807467]  00007fc280a00000 ffff8800bcaa4dc0 ffff880079e37dd8 ffffffff81119e26
> [ 1806.808536] Call Trace:
> [ 1806.809570]  [<ffffffff81119e26>] free_pgtables+0x8e/0xcc
> [ 1806.810617]  [<ffffffff8111fe4c>] unmap_region+0xc8/0xec
> [ 1806.811658]  [<ffffffff81270329>] ? __rb_erase_color+0x122/0x1f9
> [ 1806.812724]  [<ffffffff8112192b>] do_munmap+0x275/0x2f7
> [ 1806.813792]  [<ffffffff811219f5>] vm_munmap+0x48/0x61
> [ 1806.814841]  [<ffffffff81121a34>] SyS_munmap+0x26/0x2f
> [ 1806.815884]  [<ffffffff81524f52>] system_call_fastpath+0x12/0x17
> [ 1806.816951] Code: 32 f5 ff 49 8b 45 78 48 8b 18 4c 8d 60 f0 48 83 eb 10 4d 8d 6c 24 10 4c 3b 6d b8 74 3d 49 8b 7c 24 08 83 bf 98 00 00 00 00 74 02 <0f> 0b f0 ff 8f 88 00 00 00 74 1d 4c 89 ef e8 61 96 15 00 4c 89
> [ 1806.819300] RIP  [<ffffffff81125f09>] unlink_anon_vmas+0x12b/0x169
> [ 1806.820457]  RSP <ffff880079e37d38>
> [ 1806.822068] ---[ end trace 5761ee18fca1242a ]---
> --
> Michal Hocko
> SUSE Labs

[-- Attachment #2: mm-prevent-endless-growth-of-anon_vma-hierarchy-v2 --]
[-- Type: application/octet-stream, Size: 5520 bytes --]

mm: prevent endless growth of anon_vma hierarchy

From: Konstantin Khlebnikov <koct9i@gmail.com>

A constantly forking task causes unlimited growth of the anon_vma chain.
Each new child allocates a new level of anon_vmas and links its vmas to all
previous levels because it inherits pages from them. None of the anon_vmas
can be freed because there might be pages which point to them.

This patch adds a heuristic which decides whether to reuse an existing
anon_vma instead of forking a new one. It counts vmas and direct descendants
for each anon_vma. An anon_vma with degree lower than two will be reused at
the next fork. As a result, each anon_vma has either a live vma or at least
two descendants, endless chains are no longer possible, and the count of
anon_vmas is no more than twice the count of vmas.

v2: update degree in anon_vma_prepare for merged anon_vma

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Link: http://lkml.kernel.org/r/20120816024610.GA5350@evergreen.ssec.wisc.edu
---
 include/linux/rmap.h |   16 ++++++++++++++++
 mm/rmap.c            |   30 +++++++++++++++++++++++++++++-
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index c0c2bce..b1d140c 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -45,6 +45,22 @@ struct anon_vma {
 	 * mm_take_all_locks() (mm_all_locks_mutex).
 	 */
 	struct rb_root rb_root;	/* Interval tree of private "related" vmas */
+
+	/*
+	 * Count of child anon_vmas and VMAs which point to this anon_vma.
+	 *
+	 * This counter is used to decide whether to reuse an old anon_vma
+	 * instead of forking a new one. It detects anon_vmas which have just
+	 * one direct descendant and no vmas. Reusing such an anon_vma does not
+	 * cause a significant performance regression, but it prevents the
+	 * anon_vma hierarchy from degrading into an endless linear chain.
+	 *
+	 * The root anon_vma is never reused because it is its own parent and
+	 * has at least one vma or child, so at fork its degree is at least 2.
+	 */
+	unsigned degree;
+
+	struct anon_vma *parent;	/* Parent of this anon_vma */
 };
 
 /*
diff --git a/mm/rmap.c b/mm/rmap.c
index 19886fb..df5c44e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -72,6 +72,8 @@ static inline struct anon_vma *anon_vma_alloc(void)
 	anon_vma = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
 	if (anon_vma) {
 		atomic_set(&anon_vma->refcount, 1);
+		anon_vma->degree = 1;	/* Reference for first vma */
+		anon_vma->parent = anon_vma;
 		/*
 		 * Initialise the anon_vma root to point to itself. If called
 		 * from fork, the root will be reset to the parents anon_vma.
@@ -188,6 +190,8 @@ int anon_vma_prepare(struct vm_area_struct *vma)
 		if (likely(!vma->anon_vma)) {
 			vma->anon_vma = anon_vma;
 			anon_vma_chain_link(vma, avc, anon_vma);
+			/* vma link if merged or child link for new root */
+			anon_vma->degree++;
 			allocated = NULL;
 			avc = NULL;
 		}
@@ -256,7 +260,17 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
 		anon_vma = pavc->anon_vma;
 		root = lock_anon_vma_root(root, anon_vma);
 		anon_vma_chain_link(dst, avc, anon_vma);
+
+		/*
+		 * Reuse an existing anon_vma if its degree is lower than two;
+		 * that means it has no vma and just one anon_vma child.
+		 */
+		if (!dst->anon_vma && anon_vma != src->anon_vma &&
+				anon_vma->degree < 2)
+			dst->anon_vma = anon_vma;
 	}
+	if (dst->anon_vma)
+		dst->anon_vma->degree++;
 	unlock_anon_vma_root(root);
 	return 0;
 
@@ -279,6 +293,9 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 	if (!pvma->anon_vma)
 		return 0;
 
+	/* Drop inherited anon_vma, we'll reuse an old one or allocate a new one. */
+	vma->anon_vma = NULL;
+
 	/*
 	 * First, attach the new VMA to the parent VMA's anon_vmas,
 	 * so rmap can find non-COWed pages in child processes.
@@ -286,6 +303,10 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 	if (anon_vma_clone(vma, pvma))
 		return -ENOMEM;
 
+	/* An old anon_vma has been reused. */
+	if (vma->anon_vma)
+		return 0;
+
 	/* Then add our own anon_vma. */
 	anon_vma = anon_vma_alloc();
 	if (!anon_vma)
@@ -299,6 +320,7 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 	 * lock any of the anon_vmas in this anon_vma tree.
 	 */
 	anon_vma->root = pvma->anon_vma->root;
+	anon_vma->parent = pvma->anon_vma;
 	/*
 	 * With refcounts, an anon_vma can stay around longer than the
 	 * process it belongs to. The root anon_vma needs to be pinned until
@@ -309,6 +331,7 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 	vma->anon_vma = anon_vma;
 	anon_vma_lock_write(anon_vma);
 	anon_vma_chain_link(vma, avc, anon_vma);
+	anon_vma->parent->degree++;
 	anon_vma_unlock_write(anon_vma);
 
 	return 0;
@@ -339,12 +362,16 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
 		 * Leave empty anon_vmas on the list - we'll need
 		 * to free them outside the lock.
 		 */
-		if (RB_EMPTY_ROOT(&anon_vma->rb_root))
+		if (RB_EMPTY_ROOT(&anon_vma->rb_root)) {
+			anon_vma->parent->degree--;
 			continue;
+		}
 
 		list_del(&avc->same_vma);
 		anon_vma_chain_free(avc);
 	}
+	if (vma->anon_vma)
+		vma->anon_vma->degree--;
 	unlock_anon_vma_root(root);
 
 	/*
@@ -355,6 +382,7 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
 	list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
 		struct anon_vma *anon_vma = avc->anon_vma;
 
+		BUG_ON(anon_vma->degree);
 		put_anon_vma(anon_vma);
 
 		list_del(&avc->same_vma);
