* [RFC] mm/kasan: Add Allocation, Free, Error timestamps to KASAN report @ 2023-10-17 19:39 Juntong Deng 2023-10-17 20:10 ` Ricardo B. Marliere 2023-10-25 19:22 ` Andrey Konovalov 0 siblings, 2 replies; 11+ messages in thread From: Juntong Deng @ 2023-10-17 19:39 UTC (permalink / raw) To: ryabinin.a.a, glider, andreyknvl, dvyukov, vincenzo.frascino, akpm Cc: kasan-dev, linux-mm, linux-kernel, linux-kernel-mentees The idea came from the bug I was fixing recently, 'KASAN: slab-use-after-free Read in tls_encrypt_done'. This bug is caused by subtle race condition, where the data structure is freed early on another CPU, resulting in use-after-free. Like this bug, some of the use-after-free bugs are caused by race condition, but it is not easy to quickly conclude that the cause of the use-after-free is race condition if only looking at the stack trace. I did not think this use-after-free was caused by race condition at the beginning, it took me some time to read the source code carefully and think about it to determine that it was caused by race condition. By adding timestamps for Allocation, Free, and Error to the KASAN report, it will be much easier to determine if use-after-free is caused by race condition. If the free time is slightly before the error time, then there is a high probability that this is an error caused by race condition. If the free time is long before the error time, then this is obviously not caused by race condition, but by something else. In addition, I read the source code of KASAN, and it is not a difficult task to add the function of recording timestamps, which can be done by adding a member to struct kasan_track. If it is a good idea, I can do this part of the work. Welcome to discuss! Juntong Deng ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] mm/kasan: Add Allocation, Free, Error timestamps to KASAN report 2023-10-17 19:39 [RFC] mm/kasan: Add Allocation, Free, Error timestamps to KASAN report Juntong Deng @ 2023-10-17 20:10 ` Ricardo B. Marliere 2023-10-25 19:22 ` Andrey Konovalov 1 sibling, 0 replies; 11+ messages in thread From: Ricardo B. Marliere @ 2023-10-17 20:10 UTC (permalink / raw) To: Juntong Deng Cc: ryabinin.a.a, glider, andreyknvl, dvyukov, vincenzo.frascino, akpm, linux-mm, linux-kernel-mentees, linux-kernel, kasan-dev On 23/10/18 03:39AM, Juntong Deng wrote: > If the free time is slightly before the error time, then there is a > high probability that this is an error caused by race condition. > > If the free time is long before the error time, then this is obviously > not caused by race condition, but by something else. That sounds a bit arbitrary to me. How do you set the threshold for each case? I mean, the fact remains: an invalid read after the object being freed. Does it matter what it was caused by? It should be fixed regardless. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] mm/kasan: Add Allocation, Free, Error timestamps to KASAN report 2023-10-17 19:39 [RFC] mm/kasan: Add Allocation, Free, Error timestamps to KASAN report Juntong Deng 2023-10-17 20:10 ` Ricardo B. Marliere @ 2023-10-25 19:22 ` Andrey Konovalov 2023-10-29 9:05 ` Juntong Deng 1 sibling, 1 reply; 11+ messages in thread From: Andrey Konovalov @ 2023-10-25 19:22 UTC (permalink / raw) To: Juntong Deng Cc: ryabinin.a.a, glider, dvyukov, vincenzo.frascino, akpm, kasan-dev, linux-mm, linux-kernel, linux-kernel-mentees On Tue, Oct 17, 2023 at 9:40 PM Juntong Deng <juntong.deng@outlook.com> wrote: > > The idea came from the bug I was fixing recently, > 'KASAN: slab-use-after-free Read in tls_encrypt_done'. > > This bug is caused by subtle race condition, where the data structure > is freed early on another CPU, resulting in use-after-free. > > Like this bug, some of the use-after-free bugs are caused by race > condition, but it is not easy to quickly conclude that the cause of the > use-after-free is race condition if only looking at the stack trace. > > I did not think this use-after-free was caused by race condition at the > beginning, it took me some time to read the source code carefully and > think about it to determine that it was caused by race condition. > > By adding timestamps for Allocation, Free, and Error to the KASAN > report, it will be much easier to determine if use-after-free is > caused by race condition. An alternative would be to add the CPU number to the alloc/free stack traces. Something like: Allocated by task 42 on CPU 2: (stack trace) The bad access stack trace already prints the CPU number. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] mm/kasan: Add Allocation, Free, Error timestamps to KASAN report 2023-10-25 19:22 ` Andrey Konovalov @ 2023-10-29 9:05 ` Juntong Deng 2023-10-30 6:29 ` Dmitry Vyukov 0 siblings, 1 reply; 11+ messages in thread From: Juntong Deng @ 2023-10-29 9:05 UTC (permalink / raw) To: Andrey Konovalov Cc: ryabinin.a.a, glider, dvyukov, vincenzo.frascino, akpm, kasan-dev, linux-mm, linux-kernel, linux-kernel-mentees On 2023/10/26 3:22, Andrey Konovalov wrote: > On Tue, Oct 17, 2023 at 9:40 PM Juntong Deng <juntong.deng@outlook.com> wrote: >> >> The idea came from the bug I was fixing recently, >> 'KASAN: slab-use-after-free Read in tls_encrypt_done'. >> >> This bug is caused by subtle race condition, where the data structure >> is freed early on another CPU, resulting in use-after-free. >> >> Like this bug, some of the use-after-free bugs are caused by race >> condition, but it is not easy to quickly conclude that the cause of the >> use-after-free is race condition if only looking at the stack trace. >> >> I did not think this use-after-free was caused by race condition at the >> beginning, it took me some time to read the source code carefully and >> think about it to determine that it was caused by race condition. >> >> By adding timestamps for Allocation, Free, and Error to the KASAN >> report, it will be much easier to determine if use-after-free is >> caused by race condition. > > An alternative would be to add the CPU number to the alloc/free stack > traces. Something like: > > Allocated by task 42 on CPU 2: > (stack trace) > > The bad access stack trace already prints the CPU number. Yes, that is a great idea and the CPU number would help a lot. But I think the CPU number cannot completely replace the free timestamp, because some freeing really should be done at another CPU. We need the free timestamp to help us distinguish whether it was freed a long time ago or whether it was caused to be freed during the current operation. I think both the CPU number and the timestamp should be displayed, more information would help us find the real cause of the error faster. Should I implement these features? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] mm/kasan: Add Allocation, Free, Error timestamps to KASAN report 2023-10-29 9:05 ` Juntong Deng @ 2023-10-30 6:29 ` Dmitry Vyukov 2023-10-30 9:28 ` Juntong Deng 0 siblings, 1 reply; 11+ messages in thread From: Dmitry Vyukov @ 2023-10-30 6:29 UTC (permalink / raw) To: Juntong Deng Cc: Andrey Konovalov, ryabinin.a.a, glider, vincenzo.frascino, akpm, kasan-dev, linux-mm, linux-kernel, linux-kernel-mentees On Sun, 29 Oct 2023 at 10:05, Juntong Deng <juntong.deng@outlook.com> wrote: > > On 2023/10/26 3:22, Andrey Konovalov wrote: > > On Tue, Oct 17, 2023 at 9:40 PM Juntong Deng <juntong.deng@outlook.com> wrote: > >> > >> The idea came from the bug I was fixing recently, > >> 'KASAN: slab-use-after-free Read in tls_encrypt_done'. > >> > >> This bug is caused by subtle race condition, where the data structure > >> is freed early on another CPU, resulting in use-after-free. > >> > >> Like this bug, some of the use-after-free bugs are caused by race > >> condition, but it is not easy to quickly conclude that the cause of the > >> use-after-free is race condition if only looking at the stack trace. > >> > >> I did not think this use-after-free was caused by race condition at the > >> beginning, it took me some time to read the source code carefully and > >> think about it to determine that it was caused by race condition. > >> > >> By adding timestamps for Allocation, Free, and Error to the KASAN > >> report, it will be much easier to determine if use-after-free is > >> caused by race condition. > > > > An alternative would be to add the CPU number to the alloc/free stack > > traces. Something like: > > > > Allocated by task 42 on CPU 2: > > (stack trace) > > > > The bad access stack trace already prints the CPU number. > > Yes, that is a great idea and the CPU number would help a lot. > > But I think the CPU number cannot completely replace the free timestamp, > because some freeing really should be done at another CPU. > > We need the free timestamp to help us distinguish whether it was freed > a long time ago or whether it was caused to be freed during the > current operation. > > I think both the CPU number and the timestamp should be displayed, more > information would help us find the real cause of the error faster. > > Should I implement these features? Hi Juntong, There is also an aspect of memory consumption. KASAN headers increase the size of every heap object. So we tried to keep them as compact as possible. At some point CPU numbers and timestamps (IIRC) were already part of the header, but we removed them to shrink the header to 16 bytes. PID gives a good approximation of potential races. I usually look at PIDs to understand if it's a "plain old single-threaded use-after-free", or free and access happened in different threads. Re timestamps, I see you referenced a syzbot report. With syzkaller most timestamps will be very close even for non-racing case. So if this is added, this should be added at least under a separate config. If you are looking for potential KASAN improvements, here is a good list: https://bugzilla.kernel.org/buglist.cgi?bug_status=__open__&component=Sanitizers&list_id=1134168&product=Memory%20Management ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] mm/kasan: Add Allocation, Free, Error timestamps to KASAN report 2023-10-30 6:29 ` Dmitry Vyukov @ 2023-10-30 9:28 ` Juntong Deng 2023-10-30 10:10 ` Dmitry Vyukov 0 siblings, 1 reply; 11+ messages in thread From: Juntong Deng @ 2023-10-30 9:28 UTC (permalink / raw) To: Dmitry Vyukov Cc: Andrey Konovalov, ryabinin.a.a, glider, vincenzo.frascino, akpm, kasan-dev, linux-mm, linux-kernel, linux-kernel-mentees On 2023/10/30 14:29, Dmitry Vyukov wrote: > On Sun, 29 Oct 2023 at 10:05, Juntong Deng <juntong.deng@outlook.com> wrote: >> >> On 2023/10/26 3:22, Andrey Konovalov wrote: >>> On Tue, Oct 17, 2023 at 9:40 PM Juntong Deng <juntong.deng@outlook.com> wrote: >>>> >>>> The idea came from the bug I was fixing recently, >>>> 'KASAN: slab-use-after-free Read in tls_encrypt_done'. >>>> >>>> This bug is caused by subtle race condition, where the data structure >>>> is freed early on another CPU, resulting in use-after-free. >>>> >>>> Like this bug, some of the use-after-free bugs are caused by race >>>> condition, but it is not easy to quickly conclude that the cause of the >>>> use-after-free is race condition if only looking at the stack trace. >>>> >>>> I did not think this use-after-free was caused by race condition at the >>>> beginning, it took me some time to read the source code carefully and >>>> think about it to determine that it was caused by race condition. >>>> >>>> By adding timestamps for Allocation, Free, and Error to the KASAN >>>> report, it will be much easier to determine if use-after-free is >>>> caused by race condition. >>> >>> An alternative would be to add the CPU number to the alloc/free stack >>> traces. Something like: >>> >>> Allocated by task 42 on CPU 2: >>> (stack trace) >>> >>> The bad access stack trace already prints the CPU number. >> >> Yes, that is a great idea and the CPU number would help a lot. >> >> But I think the CPU number cannot completely replace the free timestamp, >> because some freeing really should be done at another CPU. >> >> We need the free timestamp to help us distinguish whether it was freed >> a long time ago or whether it was caused to be freed during the >> current operation. >> >> I think both the CPU number and the timestamp should be displayed, more >> information would help us find the real cause of the error faster. >> >> Should I implement these features? > > Hi Juntong, > > There is also an aspect of memory consumption. KASAN headers increase > the size of every heap object. So we tried to keep them as compact as > possible. At some point CPU numbers and timestamps (IIRC) were already > part of the header, but we removed them to shrink the header to 16 > bytes. > PID gives a good approximation of potential races. I usually look at > PIDs to understand if it's a "plain old single-threaded > use-after-free", or free and access happened in different threads. > Re timestamps, I see you referenced a syzbot report. With syzkaller > most timestamps will be very close even for non-racing case. > So if this is added, this should be added at least under a separate config. > > If you are looking for potential KASAN improvements, here is a good list: > https://bugzilla.kernel.org/buglist.cgi?bug_status=__open__&component=Sanitizers&list_id=1134168&product=Memory%20Management Hi Dmitry, I think PID cannot completely replace timestamp for reason similar to CPU number, some frees really should be done in another thread, but it is difficult for us to distinguish if it is a free that was done some time ago, or under subtle race conditions. As to whether most of the timestamps will be very close even for non-racing case, this I am not sure, because I do not have enough samples. I agree that these features should be in a separate config and the user should be free to choose whether to enable them or not. We can divide KASAN into normal mode and depth mode. Normal mode records only minimal critical information, while depth mode records more potentially useful information. Also, honestly, I think a small amount of extra memory consumption should not stop us from recording more information. Because if someone enables KASAN for debugging, then memory consumption and performance are no longer his main concern. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] mm/kasan: Add Allocation, Free, Error timestamps to KASAN report 2023-10-30 9:28 ` Juntong Deng @ 2023-10-30 10:10 ` Dmitry Vyukov 2023-10-30 11:32 ` Juntong Deng 0 siblings, 1 reply; 11+ messages in thread From: Dmitry Vyukov @ 2023-10-30 10:10 UTC (permalink / raw) To: Juntong Deng Cc: Andrey Konovalov, ryabinin.a.a, glider, vincenzo.frascino, akpm, kasan-dev, linux-mm, linux-kernel, linux-kernel-mentees On Mon, 30 Oct 2023 at 10:28, Juntong Deng <juntong.deng@outlook.com> wrote: > > On 2023/10/30 14:29, Dmitry Vyukov wrote: > > On Sun, 29 Oct 2023 at 10:05, Juntong Deng <juntong.deng@outlook.com> wrote: > >> > >> On 2023/10/26 3:22, Andrey Konovalov wrote: > >>> On Tue, Oct 17, 2023 at 9:40 PM Juntong Deng <juntong.deng@outlook.com> wrote: > >>>> > >>>> The idea came from the bug I was fixing recently, > >>>> 'KASAN: slab-use-after-free Read in tls_encrypt_done'. > >>>> > >>>> This bug is caused by subtle race condition, where the data structure > >>>> is freed early on another CPU, resulting in use-after-free. > >>>> > >>>> Like this bug, some of the use-after-free bugs are caused by race > >>>> condition, but it is not easy to quickly conclude that the cause of the > >>>> use-after-free is race condition if only looking at the stack trace. > >>>> > >>>> I did not think this use-after-free was caused by race condition at the > >>>> beginning, it took me some time to read the source code carefully and > >>>> think about it to determine that it was caused by race condition. > >>>> > >>>> By adding timestamps for Allocation, Free, and Error to the KASAN > >>>> report, it will be much easier to determine if use-after-free is > >>>> caused by race condition. > >>> > >>> An alternative would be to add the CPU number to the alloc/free stack > >>> traces. Something like: > >>> > >>> Allocated by task 42 on CPU 2: > >>> (stack trace) > >>> > >>> The bad access stack trace already prints the CPU number. > >> > >> Yes, that is a great idea and the CPU number would help a lot. > >> > >> But I think the CPU number cannot completely replace the free timestamp, > >> because some freeing really should be done at another CPU. > >> > >> We need the free timestamp to help us distinguish whether it was freed > >> a long time ago or whether it was caused to be freed during the > >> current operation. > >> > >> I think both the CPU number and the timestamp should be displayed, more > >> information would help us find the real cause of the error faster. > >> > >> Should I implement these features? > > > > Hi Juntong, > > > > There is also an aspect of memory consumption. KASAN headers increase > > the size of every heap object. So we tried to keep them as compact as > > possible. At some point CPU numbers and timestamps (IIRC) were already > > part of the header, but we removed them to shrink the header to 16 > > bytes. > > PID gives a good approximation of potential races. I usually look at > > PIDs to understand if it's a "plain old single-threaded > > use-after-free", or free and access happened in different threads. > > Re timestamps, I see you referenced a syzbot report. With syzkaller > > most timestamps will be very close even for non-racing case. > > So if this is added, this should be added at least under a separate config. > > > > If you are looking for potential KASAN improvements, here is a good list: > > https://bugzilla.kernel.org/buglist.cgi?bug_status=__open__&component=Sanitizers&list_id=1134168&product=Memory%20Management > > Hi Dmitry, > > I think PID cannot completely replace timestamp for reason similar to > CPU number, some frees really should be done in another thread, but it > is difficult for us to distinguish if it is a free that was done some > time ago, or under subtle race conditions. I agree it's not a complete replacement, it just does not consume additional memory. > As to whether most of the timestamps will be very close even for > non-racing case, this I am not sure, because I do not have > enough samples. > > I agree that these features should be in a separate config and > the user should be free to choose whether to enable them or not. > > We can divide KASAN into normal mode and depth mode. Normal mode > records only minimal critical information, while depth mode records > more potentially useful information. > > Also, honestly, I think a small amount of extra memory consumption > should not stop us from recording more information. > > Because if someone enables KASAN for debugging, then memory consumption > and performance are no longer his main concern. There are a number of debugging tools created with the "performance does not matter" attitude. They tend to be barely usable, not usable in wide scale testing, not usable in canaries, etc. All of sanitizers were created with lots of attention to performance, attention on the level of the most performance critical production code (sanitizer code is hotter than any production piece of code). That's what made them so widely used. Think of interactive uses, smaller devices, etc. Please let's keep this attitude. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] mm/kasan: Add Allocation, Free, Error timestamps to KASAN report 2023-10-30 10:10 ` Dmitry Vyukov @ 2023-10-30 11:32 ` Juntong Deng 2023-10-31 9:46 ` Dmitry Vyukov 0 siblings, 1 reply; 11+ messages in thread From: Juntong Deng @ 2023-10-30 11:32 UTC (permalink / raw) To: Dmitry Vyukov Cc: Andrey Konovalov, ryabinin.a.a, glider, vincenzo.frascino, akpm, kasan-dev, linux-mm, linux-kernel, linux-kernel-mentees On 2023/10/30 18:10, Dmitry Vyukov wrote: > On Mon, 30 Oct 2023 at 10:28, Juntong Deng <juntong.deng@outlook.com> wrote: >> >> On 2023/10/30 14:29, Dmitry Vyukov wrote: >>> On Sun, 29 Oct 2023 at 10:05, Juntong Deng <juntong.deng@outlook.com> wrote: >>>> >>>> On 2023/10/26 3:22, Andrey Konovalov wrote: >>>>> On Tue, Oct 17, 2023 at 9:40 PM Juntong Deng <juntong.deng@outlook.com> wrote: >>>>>> >>>>>> The idea came from the bug I was fixing recently, >>>>>> 'KASAN: slab-use-after-free Read in tls_encrypt_done'. >>>>>> >>>>>> This bug is caused by subtle race condition, where the data structure >>>>>> is freed early on another CPU, resulting in use-after-free. >>>>>> >>>>>> Like this bug, some of the use-after-free bugs are caused by race >>>>>> condition, but it is not easy to quickly conclude that the cause of the >>>>>> use-after-free is race condition if only looking at the stack trace. >>>>>> >>>>>> I did not think this use-after-free was caused by race condition at the >>>>>> beginning, it took me some time to read the source code carefully and >>>>>> think about it to determine that it was caused by race condition. >>>>>> >>>>>> By adding timestamps for Allocation, Free, and Error to the KASAN >>>>>> report, it will be much easier to determine if use-after-free is >>>>>> caused by race condition. >>>>> >>>>> An alternative would be to add the CPU number to the alloc/free stack >>>>> traces. Something like: >>>>> >>>>> Allocated by task 42 on CPU 2: >>>>> (stack trace) >>>>> >>>>> The bad access stack trace already prints the CPU number. >>>> >>>> Yes, that is a great idea and the CPU number would help a lot. >>>> >>>> But I think the CPU number cannot completely replace the free timestamp, >>>> because some freeing really should be done at another CPU. >>>> >>>> We need the free timestamp to help us distinguish whether it was freed >>>> a long time ago or whether it was caused to be freed during the >>>> current operation. >>>> >>>> I think both the CPU number and the timestamp should be displayed, more >>>> information would help us find the real cause of the error faster. >>>> >>>> Should I implement these features? >>> >>> Hi Juntong, >>> >>> There is also an aspect of memory consumption. KASAN headers increase >>> the size of every heap object. So we tried to keep them as compact as >>> possible. At some point CPU numbers and timestamps (IIRC) were already >>> part of the header, but we removed them to shrink the header to 16 >>> bytes. >>> PID gives a good approximation of potential races. I usually look at >>> PIDs to understand if it's a "plain old single-threaded >>> use-after-free", or free and access happened in different threads. >>> Re timestamps, I see you referenced a syzbot report. With syzkaller >>> most timestamps will be very close even for non-racing case. >>> So if this is added, this should be added at least under a separate config. >>> >>> If you are looking for potential KASAN improvements, here is a good list: >>> https://bugzilla.kernel.org/buglist.cgi?bug_status=__open__&component=Sanitizers&list_id=1134168&product=Memory%20Management >> >> Hi Dmitry, >> >> I think PID cannot completely replace timestamp for reason similar to >> CPU number, some frees really should be done in another thread, but it >> is difficult for us to distinguish if it is a free that was done some >> time ago, or under subtle race conditions. > > I agree it's not a complete replacement, it just does not consume > additional memory. > >> As to whether most of the timestamps will be very close even for >> non-racing case, this I am not sure, because I do not have >> enough samples. >> >> I agree that these features should be in a separate config and >> the user should be free to choose whether to enable them or not. >> >> We can divide KASAN into normal mode and depth mode. Normal mode >> records only minimal critical information, while depth mode records >> more potentially useful information. >> >> Also, honestly, I think a small amount of extra memory consumption >> should not stop us from recording more information. >> >> Because if someone enables KASAN for debugging, then memory consumption >> and performance are no longer his main concern. > > There are a number of debugging tools created with the "performance > does not matter" attitude. They tend to be barely usable, not usable > in wide scale testing, not usable in canaries, etc. > All of sanitizers were created with lots of attention to performance, > attention on the level of the most performance critical production > code (sanitizer code is hotter than any production piece of code). > That's what made them so widely used. Think of interactive uses, > smaller devices, etc. Please let's keep this attitude. Yes, I agree that debugging tools used at a wide scale need to have more rigorous performance considerations. Do you think it is worth using the extra bytes to record more information? If this is a user-configurable feature. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] mm/kasan: Add Allocation, Free, Error timestamps to KASAN report 2023-10-30 11:32 ` Juntong Deng @ 2023-10-31 9:46 ` Dmitry Vyukov 2023-11-02 14:58 ` Andrey Konovalov 0 siblings, 1 reply; 11+ messages in thread From: Dmitry Vyukov @ 2023-10-31 9:46 UTC (permalink / raw) To: Juntong Deng Cc: Andrey Konovalov, ryabinin.a.a, glider, vincenzo.frascino, akpm, kasan-dev, linux-mm, linux-kernel, linux-kernel-mentees On Mon, 30 Oct 2023 at 12:32, Juntong Deng <juntong.deng@outlook.com> wrote: > > On 2023/10/30 18:10, Dmitry Vyukov wrote: > > On Mon, 30 Oct 2023 at 10:28, Juntong Deng <juntong.deng@outlook.com> wrote: > >> > >> On 2023/10/30 14:29, Dmitry Vyukov wrote: > >>> On Sun, 29 Oct 2023 at 10:05, Juntong Deng <juntong.deng@outlook.com> wrote: > >>>> > >>>> On 2023/10/26 3:22, Andrey Konovalov wrote: > >>>>> On Tue, Oct 17, 2023 at 9:40 PM Juntong Deng <juntong.deng@outlook.com> wrote: > >>>>>> > >>>>>> The idea came from the bug I was fixing recently, > >>>>>> 'KASAN: slab-use-after-free Read in tls_encrypt_done'. > >>>>>> > >>>>>> This bug is caused by subtle race condition, where the data structure > >>>>>> is freed early on another CPU, resulting in use-after-free. > >>>>>> > >>>>>> Like this bug, some of the use-after-free bugs are caused by race > >>>>>> condition, but it is not easy to quickly conclude that the cause of the > >>>>>> use-after-free is race condition if only looking at the stack trace. > >>>>>> > >>>>>> I did not think this use-after-free was caused by race condition at the > >>>>>> beginning, it took me some time to read the source code carefully and > >>>>>> think about it to determine that it was caused by race condition. > >>>>>> > >>>>>> By adding timestamps for Allocation, Free, and Error to the KASAN > >>>>>> report, it will be much easier to determine if use-after-free is > >>>>>> caused by race condition. > >>>>> > >>>>> An alternative would be to add the CPU number to the alloc/free stack > >>>>> traces. Something like: > >>>>> > >>>>> Allocated by task 42 on CPU 2: > >>>>> (stack trace) > >>>>> > >>>>> The bad access stack trace already prints the CPU number. > >>>> > >>>> Yes, that is a great idea and the CPU number would help a lot. > >>>> > >>>> But I think the CPU number cannot completely replace the free timestamp, > >>>> because some freeing really should be done at another CPU. > >>>> > >>>> We need the free timestamp to help us distinguish whether it was freed > >>>> a long time ago or whether it was caused to be freed during the > >>>> current operation. > >>>> > >>>> I think both the CPU number and the timestamp should be displayed, more > >>>> information would help us find the real cause of the error faster. > >>>> > >>>> Should I implement these features? > >>> > >>> Hi Juntong, > >>> > >>> There is also an aspect of memory consumption. KASAN headers increase > >>> the size of every heap object. So we tried to keep them as compact as > >>> possible. At some point CPU numbers and timestamps (IIRC) were already > >>> part of the header, but we removed them to shrink the header to 16 > >>> bytes. > >>> PID gives a good approximation of potential races. I usually look at > >>> PIDs to understand if it's a "plain old single-threaded > >>> use-after-free", or free and access happened in different threads. > >>> Re timestamps, I see you referenced a syzbot report. With syzkaller > >>> most timestamps will be very close even for non-racing case. > >>> So if this is added, this should be added at least under a separate config. > >>> > >>> If you are looking for potential KASAN improvements, here is a good list: > >>> https://bugzilla.kernel.org/buglist.cgi?bug_status=__open__&component=Sanitizers&list_id=1134168&product=Memory%20Management > >> > >> Hi Dmitry, > >> > >> I think PID cannot completely replace timestamp for reason similar to > >> CPU number, some frees really should be done in another thread, but it > >> is difficult for us to distinguish if it is a free that was done some > >> time ago, or under subtle race conditions. > > > > I agree it's not a complete replacement, it just does not consume > > additional memory. > > > >> As to whether most of the timestamps will be very close even for > >> non-racing case, this I am not sure, because I do not have > >> enough samples. > >> > >> I agree that these features should be in a separate config and > >> the user should be free to choose whether to enable them or not. > >> > >> We can divide KASAN into normal mode and depth mode. Normal mode > >> records only minimal critical information, while depth mode records > >> more potentially useful information. > >> > >> Also, honestly, I think a small amount of extra memory consumption > >> should not stop us from recording more information. > >> > >> Because if someone enables KASAN for debugging, then memory consumption > >> and performance are no longer his main concern. > > > > There are a number of debugging tools created with the "performance > > does not matter" attitude. They tend to be barely usable, not usable > > in wide scale testing, not usable in canaries, etc. > > All of sanitizers were created with lots of attention to performance, > > attention on the level of the most performance critical production > > code (sanitizer code is hotter than any production piece of code). > > That's what made them so widely used. Think of interactive uses, > > smaller devices, etc. Please let's keep this attitude. > > Yes, I agree that debugging tools used at a wide scale need to have > more rigorous performance considerations. > > Do you think it is worth using the extra bytes to record more > information? If this is a user-configurable feature. If it's user-configurable, then it is OK. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] mm/kasan: Add Allocation, Free, Error timestamps to KASAN report 2023-10-31 9:46 ` Dmitry Vyukov @ 2023-11-02 14:58 ` Andrey Konovalov 2023-11-09 19:40 ` Juntong Deng 0 siblings, 1 reply; 11+ messages in thread From: Andrey Konovalov @ 2023-11-02 14:58 UTC (permalink / raw) To: Dmitry Vyukov, Juntong Deng Cc: ryabinin.a.a, glider, vincenzo.frascino, akpm, kasan-dev, linux-mm, linux-kernel, linux-kernel-mentees On Tue, Oct 31, 2023 at 10:46 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > > >>> There is also an aspect of memory consumption. KASAN headers increase > > >>> the size of every heap object. So we tried to keep them as compact as > > >>> possible. At some point CPU numbers and timestamps (IIRC) were already > > >>> part of the header, but we removed them to shrink the header to 16 > > >>> bytes. > > Do you think it is worth using the extra bytes to record more > > information? If this is a user-configurable feature. > > If it's user-configurable, then it is OK. FWIW, Generic KASAN already stores the auxiliary stack handles in the redzone, so the size of the redzone header is 24 bytes. Perhaps, we should hide them under a config as well. However, the increase of the redzone header size will only affect small kmalloc allocations (<= 16 bytes, as kmalloc allocations are aligned to the size of the object and the redzone is thus as big as the object anyway) and small non-kmalloc slab allocations (<= 64 bytes, for which optimal_redzone returns 16). So I don't think adding new fields to the redzone will increase the memory usage by much. But this needs to be tested to make sure. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] mm/kasan: Add Allocation, Free, Error timestamps to KASAN report 2023-11-02 14:58 ` Andrey Konovalov @ 2023-11-09 19:40 ` Juntong Deng 0 siblings, 0 replies; 11+ messages in thread From: Juntong Deng @ 2023-11-09 19:40 UTC (permalink / raw) To: Andrey Konovalov, Dmitry Vyukov Cc: ryabinin.a.a, glider, vincenzo.frascino, akpm, kasan-dev, linux-mm, linux-kernel, linux-kernel-mentees On 2023/11/2 22:58, Andrey Konovalov wrote: > On Tue, Oct 31, 2023 at 10:46 AM Dmitry Vyukov <dvyukov@google.com> wrote: >> >>>>>> There is also an aspect of memory consumption. KASAN headers increase >>>>>> the size of every heap object. So we tried to keep them as compact as >>>>>> possible. At some point CPU numbers and timestamps (IIRC) were already >>>>>> part of the header, but we removed them to shrink the header to 16 >>>>>> bytes. > >>> Do you think it is worth using the extra bytes to record more >>> information? If this is a user-configurable feature. >> >> If it's user-configurable, then it is OK. > > FWIW, Generic KASAN already stores the auxiliary stack handles in the > redzone, so the size of the redzone header is 24 bytes. Perhaps, we > should hide them under a config as well. > > However, the increase of the redzone header size will only affect > small kmalloc allocations (<= 16 bytes, as kmalloc allocations are > aligned to the size of the object and the redzone is thus as big as > the object anyway) and small non-kmalloc slab allocations (<= 64 > bytes, for which optimal_redzone returns 16). So I don't think adding > new fields to the redzone will increase the memory usage by much. But > this needs to be tested to make sure. Yes, I read the design documentation and source code of KASAN in depth today. Currently in Generic mode, the alloc meta is stored in the redzone (unless it doesn't fit) and the free meta is stored in the object (or in the redzone if it cannot be stored in the object). Therefore, I also think that using a few extra bytes to record more information may consume less extra memory than we expected. I am trying to implement the feature to let KASAN record more information (configurable) and test it, I will send an PATCH RFC when I am done. Thanks. ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-11-09 19:40 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-10-17 19:39 [RFC] mm/kasan: Add Allocation, Free, Error timestamps to KASAN report Juntong Deng 2023-10-17 20:10 ` Ricardo B. Marliere 2023-10-25 19:22 ` Andrey Konovalov 2023-10-29 9:05 ` Juntong Deng 2023-10-30 6:29 ` Dmitry Vyukov 2023-10-30 9:28 ` Juntong Deng 2023-10-30 10:10 ` Dmitry Vyukov 2023-10-30 11:32 ` Juntong Deng 2023-10-31 9:46 ` Dmitry Vyukov 2023-11-02 14:58 ` Andrey Konovalov 2023-11-09 19:40 ` Juntong Deng
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox