From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from mail-ob0-f178.google.com (mail-ob0-f178.google.com [209.85.214.178])
    by kanga.kvack.org (Postfix) with ESMTP id D45066B0253
    for ; Tue, 22 Sep 2015 14:52:51 -0400 (EDT)
Received: by obbda8 with SMTP id da8so15461723obb.1
    for ; Tue, 22 Sep 2015 11:52:51 -0700 (PDT)
Received: from mail-ob0-f170.google.com (mail-ob0-f170.google.com. [209.85.214.170])
    by mx.google.com with ESMTPS id 11si1811362obs.46.2015.09.22.11.52.51
    for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
    Tue, 22 Sep 2015 11:52:51 -0700 (PDT)
Received: by obbbh8 with SMTP id bh8so15448115obb.0
    for ; Tue, 22 Sep 2015 11:52:51 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To:
References: <1442903021-3893-1-git-send-email-mingo@kernel.org>
    <1442903021-3893-6-git-send-email-mingo@kernel.org>
From: Andy Lutomirski
Date: Tue, 22 Sep 2015 11:52:31 -0700
Message-ID:
Subject: Re: [PATCH 05/11] mm: Introduce arch_pgd_init_late()
Content-Type: text/plain; charset=UTF-8
Sender: owner-linux-mm@kvack.org
List-ID:
To: Linus Torvalds
Cc: Ingo Molnar, Linux Kernel Mailing List, linux-mm, Andrew Morton,
    Denys Vlasenko, Brian Gerst, Peter Zijlstra, Borislav Petkov,
    "H. Peter Anvin", Oleg Nesterov, Waiman Long, Thomas Gleixner

On Tue, Sep 22, 2015 at 11:44 AM, Linus Torvalds wrote:
> On Tue, Sep 22, 2015 at 11:37 AM, Andy Lutomirski wrote:
>> kinds of mess.
>>
>> I don't think that anyone really wants to move #PF to IST, which means
>> that we simply cannot handle vmalloc faults that happen when switching
>> stacks after SYSCALL, no matter what fanciness we shove into the
>> page_fault asm.
>
> But that's fine. The kernel stack is special. So yes, we want to make
> sure that the kernel stack is always mapped in the thread whose stack
> it is.
>
> But that's not a big and onerous guarantee to make.
> Not when the
> *real* problem is "random vmalloc allocations made by other processes
> that we are not in the least interested in, and we don't want to add
> synchronization for".
>

It's the kernel stack, the TSS (for sp0) and rsp_scratch at least.
But yes, that's not that onerous, and it's never lazily initialized
elsewhere.

How about this (long-term, not right now):

Never free pgd entries.  For each pgd, track the number of populated
kernel entries.  Also track the global (init_mm) number of existing
kernel entries.  At context switch time, if new_pgd has fewer entries
than the total, sync it.

This hits *at most* 256 times per thread, and otherwise it's just a
single unlikely branch.  It guarantees that we only ever take a
vmalloc fault when accessing maps that didn't exist when we last
context switched, which gets us all of the important percpu stuff and
the kernel stack, even if we schedule onto a cpu that didn't exist
when the mm was created.

--Andy

>              Linus

-- 
Andy Lutomirski
AMA Capital Management, LLC
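
For illustration only, the counting scheme proposed above could be
sketched in user-space C roughly as follows.  This is not kernel code
and not a patch: struct pgd_sim, init_pgd_sim, kernel_pgd_populate()
and switch_sync_pgd() are all hypothetical names made up for this
sketch, and the layout (512 top-level slots, the upper 256 reserved
for the kernel) is an assumption chosen to match the "at most 256"
bound in the text.

```c
#include <assert.h>
#include <stddef.h>

#define PTRS_PER_PGD      512
#define KERNEL_PGD_START  256   /* assumption: upper half is kernel space */

/* Hypothetical stand-in for one top-level page table plus its counter. */
struct pgd_sim {
    unsigned long entry[PTRS_PER_PGD];
    unsigned int nr_kernel;     /* populated kernel entries in this pgd */
};

/* Hypothetical reference table, playing the role of init_mm's pgd. */
struct pgd_sim init_pgd_sim;

/*
 * Populate a kernel slot in the reference table.  Entries are never
 * freed, so the global count only ever grows.
 */
void kernel_pgd_populate(size_t idx, unsigned long val)
{
    assert(idx >= KERNEL_PGD_START && idx < PTRS_PER_PGD && val != 0);
    if (init_pgd_sim.entry[idx] == 0)
        init_pgd_sim.nr_kernel++;
    init_pgd_sim.entry[idx] = val;
}

/*
 * Context-switch hook: if the incoming pgd has fewer kernel entries
 * than the reference table, copy the missing ones.  Returns the number
 * of entries copied (0 on the common fast path).
 */
unsigned int switch_sync_pgd(struct pgd_sim *next)
{
    unsigned int copied = 0;

    /* Fast path: a single compare, expected to be the usual case. */
    if (next->nr_kernel >= init_pgd_sim.nr_kernel)
        return 0;

    for (size_t i = KERNEL_PGD_START; i < PTRS_PER_PGD; i++) {
        if (next->entry[i] == 0 && init_pgd_sim.entry[i] != 0) {
            next->entry[i] = init_pgd_sim.entry[i];
            copied++;
        }
    }
    next->nr_kernel += copied;
    return copied;
}
```

Comparing counts instead of contents is only sound because entries are
never freed and a per-process pgd only ever receives kernel entries
copied from the reference table, so equal counts imply identical sets.
The sketch ignores locking and TLB concerns entirely.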