
[RFC v2 0/2] mm: SLUB Freelist randomization


Thomas Garnier
This is RFC v2 for the SLUB Freelist randomization. The patch set is now
based on the Linux master branch (as the base SLAB patch was merged).

Changes since RFC v1:
 - Redid the slab_test benchmarking to decide on the best entropy
   approach for new page creation.
 - Moved to get_random_int as the best approach: it still uses hardware
   randomization when available, at a lower cost when it is not.
 - Updated the SLAB implementation to use get_random_long and
   get_random_int as part of the refactoring.
 - Updated testing, which highlights a 3-4% impact on slab_test.

Based on 0e01df100b6bf22a1de61b66657502a6454153c5.

***Background:
This proposal follows the previous SLAB freelist patch submitted to -next.
It reuses parts of the previous implementation and keeps a similar approach.

The kernel heap allocators use a sequential freelist, making their
allocations predictable. This predictability makes kernel heap overflows
easier to exploit: an attacker can carefully prepare the kernel heap so
that the chunk following the overflowed one is under their control.

For example, these attacks exploit the predictability of the heap:
 - Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU)
 - Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95)

***Problems that needed solving:
 - Randomize the freelist (singly linked) used in the SLUB allocator.
 - Ensure good performance to encourage usage.
 - Get the best available entropy at the early boot stage.

***Parts:
 - 01/02 Reorganize the SLAB Freelist randomization to share elements
   with the SLUB implementation.
 - 02/02 The SLUB Freelist randomization implementation. A similar
   approach to the SLAB one, tailored to the singly linked freelist used
   in SLUB.

***Performance data:

slab_test impact is between 3% and 4% on average:

Before:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
100000 times kmalloc(8) -> 49 cycles kfree -> 77 cycles
100000 times kmalloc(16) -> 51 cycles kfree -> 79 cycles
100000 times kmalloc(32) -> 53 cycles kfree -> 83 cycles
100000 times kmalloc(64) -> 62 cycles kfree -> 90 cycles
100000 times kmalloc(128) -> 81 cycles kfree -> 97 cycles
100000 times kmalloc(256) -> 98 cycles kfree -> 121 cycles
100000 times kmalloc(512) -> 95 cycles kfree -> 122 cycles
100000 times kmalloc(1024) -> 96 cycles kfree -> 126 cycles
100000 times kmalloc(2048) -> 115 cycles kfree -> 140 cycles
100000 times kmalloc(4096) -> 149 cycles kfree -> 171 cycles
2. Kmalloc: alloc/free test
100000 times kmalloc(8)/kfree -> 70 cycles
100000 times kmalloc(16)/kfree -> 70 cycles
100000 times kmalloc(32)/kfree -> 70 cycles
100000 times kmalloc(64)/kfree -> 70 cycles
100000 times kmalloc(128)/kfree -> 70 cycles
100000 times kmalloc(256)/kfree -> 69 cycles
100000 times kmalloc(512)/kfree -> 70 cycles
100000 times kmalloc(1024)/kfree -> 73 cycles
100000 times kmalloc(2048)/kfree -> 72 cycles
100000 times kmalloc(4096)/kfree -> 71 cycles

After:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
100000 times kmalloc(8) -> 57 cycles kfree -> 78 cycles
100000 times kmalloc(16) -> 61 cycles kfree -> 81 cycles
100000 times kmalloc(32) -> 76 cycles kfree -> 93 cycles
100000 times kmalloc(64) -> 83 cycles kfree -> 94 cycles
100000 times kmalloc(128) -> 106 cycles kfree -> 107 cycles
100000 times kmalloc(256) -> 118 cycles kfree -> 117 cycles
100000 times kmalloc(512) -> 114 cycles kfree -> 116 cycles
100000 times kmalloc(1024) -> 115 cycles kfree -> 118 cycles
100000 times kmalloc(2048) -> 147 cycles kfree -> 131 cycles
100000 times kmalloc(4096) -> 214 cycles kfree -> 161 cycles
2. Kmalloc: alloc/free test
100000 times kmalloc(8)/kfree -> 66 cycles
100000 times kmalloc(16)/kfree -> 66 cycles
100000 times kmalloc(32)/kfree -> 66 cycles
100000 times kmalloc(64)/kfree -> 66 cycles
100000 times kmalloc(128)/kfree -> 65 cycles
100000 times kmalloc(256)/kfree -> 67 cycles
100000 times kmalloc(512)/kfree -> 67 cycles
100000 times kmalloc(1024)/kfree -> 64 cycles
100000 times kmalloc(2048)/kfree -> 67 cycles
100000 times kmalloc(4096)/kfree -> 67 cycles

Kernbench, before:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 101.873 (1.16069)
User Time 1045.22 (1.60447)
System Time 88.969 (0.559195)
Percent CPU 1112.9 (13.8279)
Context Switches 189140 (2282.15)
Sleeps 99008.6 (768.091)

After:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.47 (0.562732)
User Time 1045.3 (1.34263)
System Time 88.311 (0.342554)
Percent CPU 1105.8 (6.49444)
Context Switches 189081 (2355.78)
Sleeps 99231.5 (800.358)


[RFC v2 1/2] mm: Reorganize SLAB freelist randomization

Thomas Garnier
This commit reorganizes the previous SLAB freelist randomization code to
prepare for the SLUB implementation. It moves the functions that will be
shared to slab_common. It also moves the definition of freelist_idx_t
into the slab_def header so that a similar type can be used by all common
functions. The entropy functions are changed to align with the SLUB
implementation, now using the get_random_* functions.

Signed-off-by: Thomas Garnier <[hidden email]>
---
Based on 0e01df100b6bf22a1de61b66657502a6454153c5
---
 include/linux/slab_def.h | 11 +++++++-
 mm/slab.c                | 68 ++----------------------------------------------
 mm/slab.h                | 16 ++++++++++++
 mm/slab_common.c         | 48 ++++++++++++++++++++++++++++++++++
 4 files changed, 76 insertions(+), 67 deletions(-)

diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 8694f7a..e05a871 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -3,6 +3,15 @@
 
 #include <linux/reciprocal_div.h>
 
+#define FREELIST_BYTE_INDEX (((PAGE_SIZE >> BITS_PER_BYTE) \
+ <= SLAB_OBJ_MIN_SIZE) ? 1 : 0)
+
+#if FREELIST_BYTE_INDEX
+typedef unsigned char freelist_idx_t;
+#else
+typedef unsigned short freelist_idx_t;
+#endif
+
 /*
  * Definitions unique to the original Linux SLAB allocator.
  */
@@ -81,7 +90,7 @@ struct kmem_cache {
 #endif
 
 #ifdef CONFIG_SLAB_FREELIST_RANDOM
- void *random_seq;
+ freelist_idx_t *random_seq;
 #endif
 
  struct kmem_cache_node *node[MAX_NUMNODES];
diff --git a/mm/slab.c b/mm/slab.c
index cc8bbc1..8e32562 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -157,15 +157,6 @@
 #define ARCH_KMALLOC_FLAGS SLAB_HWCACHE_ALIGN
 #endif
 
-#define FREELIST_BYTE_INDEX (((PAGE_SIZE >> BITS_PER_BYTE) \
- <= SLAB_OBJ_MIN_SIZE) ? 1 : 0)
-
-#if FREELIST_BYTE_INDEX
-typedef unsigned char freelist_idx_t;
-#else
-typedef unsigned short freelist_idx_t;
-#endif
-
 #define SLAB_OBJ_MAX_NUM ((1 << sizeof(freelist_idx_t) * BITS_PER_BYTE) - 1)
 
 /*
@@ -1236,61 +1227,6 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
  }
 }
 
-#ifdef CONFIG_SLAB_FREELIST_RANDOM
-static void freelist_randomize(struct rnd_state *state, freelist_idx_t *list,
- size_t count)
-{
- size_t i;
- unsigned int rand;
-
- for (i = 0; i < count; i++)
- list[i] = i;
-
- /* Fisher-Yates shuffle */
- for (i = count - 1; i > 0; i--) {
- rand = prandom_u32_state(state);
- rand %= (i + 1);
- swap(list[i], list[rand]);
- }
-}
-
-/* Create a random sequence per cache */
-static int cache_random_seq_create(struct kmem_cache *cachep, gfp_t gfp)
-{
- unsigned int seed, count = cachep->num;
- struct rnd_state state;
-
- if (count < 2)
- return 0;
-
- /* If it fails, we will just use the global lists */
- cachep->random_seq = kcalloc(count, sizeof(freelist_idx_t), gfp);
- if (!cachep->random_seq)
- return -ENOMEM;
-
- /* Get best entropy at this stage */
- get_random_bytes_arch(&seed, sizeof(seed));
- prandom_seed_state(&state, seed);
-
- freelist_randomize(&state, cachep->random_seq, count);
- return 0;
-}
-
-/* Destroy the per-cache random freelist sequence */
-static void cache_random_seq_destroy(struct kmem_cache *cachep)
-{
- kfree(cachep->random_seq);
- cachep->random_seq = NULL;
-}
-#else
-static inline int cache_random_seq_create(struct kmem_cache *cachep, gfp_t gfp)
-{
- return 0;
-}
-static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
-#endif /* CONFIG_SLAB_FREELIST_RANDOM */
-
-
 /*
  * Initialisation.  Called after the page allocator have been initialised and
  * before smp_init().
@@ -2554,7 +2490,7 @@ static bool freelist_state_initialize(union freelist_init_state *state,
  unsigned int rand;
 
  /* Use best entropy available to define a random shift */
- get_random_bytes_arch(&rand, sizeof(rand));
+ rand = get_random_int();
 
  /* Use a random state if the pre-computed list is not available */
  if (!cachep->random_seq) {
@@ -3979,7 +3915,7 @@ static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp)
  int shared = 0;
  int batchcount = 0;
 
- err = cache_random_seq_create(cachep, gfp);
+ err = cache_random_seq_create(cachep, cachep->num, gfp);
  if (err)
  goto end;
 
diff --git a/mm/slab.h b/mm/slab.h
index dedb1a9..2c33bf3 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -42,6 +42,7 @@ struct kmem_cache {
 #include <linux/kmemcheck.h>
 #include <linux/kasan.h>
 #include <linux/kmemleak.h>
+#include <linux/random.h>
 
 /*
  * State of the slab allocator.
@@ -464,4 +465,19 @@ int memcg_slab_show(struct seq_file *m, void *p);
 
 void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
 
+#ifdef CONFIG_SLAB_FREELIST_RANDOM
+void freelist_randomize(struct rnd_state *state, freelist_idx_t *list,
+ size_t count);
+int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count,
+ gfp_t gfp);
+void cache_random_seq_destroy(struct kmem_cache *cachep);
+#else
+static inline int cache_random_seq_create(struct kmem_cache *cachep,
+ unsigned int count, gfp_t gfp)
+{
+ return 0;
+}
+static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
+#endif /* CONFIG_SLAB_FREELIST_RANDOM */
+
 #endif /* MM_SLAB_H */
diff --git a/mm/slab_common.c b/mm/slab_common.c
index a65dad7..657ecf1 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1142,6 +1142,54 @@ int memcg_slab_show(struct seq_file *m, void *p)
 }
 #endif
 
+#ifdef CONFIG_SLAB_FREELIST_RANDOM
+/* Randomize a generic freelist */
+void freelist_randomize(struct rnd_state *state, freelist_idx_t *list,
+ size_t count)
+{
+ size_t i;
+ unsigned int rand;
+
+ for (i = 0; i < count; i++)
+ list[i] = i;
+
+ /* Fisher-Yates shuffle */
+ for (i = count - 1; i > 0; i--) {
+ rand = prandom_u32_state(state);
+ rand %= (i + 1);
+ swap(list[i], list[rand]);
+ }
+}
+
+/* Create a random sequence per cache */
+int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count,
+    gfp_t gfp)
+{
+ struct rnd_state state;
+
+ if (count < 2)
+ return 0;
+
+ /* If it fails, we will just use the global lists */
+ cachep->random_seq = kcalloc(count, sizeof(freelist_idx_t), gfp);
+ if (!cachep->random_seq)
+ return -ENOMEM;
+
+ /* Get best entropy at this stage */
+ prandom_seed_state(&state, get_random_long());
+
+ freelist_randomize(&state, cachep->random_seq, count);
+ return 0;
+}
+
+/* Destroy the per-cache random freelist sequence */
+void cache_random_seq_destroy(struct kmem_cache *cachep)
+{
+ kfree(cachep->random_seq);
+ cachep->random_seq = NULL;
+}
+#endif /* CONFIG_SLAB_FREELIST_RANDOM */
+
 /*
  * slabinfo_op - iterator that generates /proc/slabinfo
  *
--
2.8.0.rc3.226.g39d4020


[RFC v2 2/2] mm: SLUB Freelist randomization

Thomas Garnier
In reply to this post by Thomas Garnier
This implements freelist randomization for the SLUB allocator. It was
previously implemented for the SLAB allocator. Both use the same
configuration option (CONFIG_SLAB_FREELIST_RANDOM).

The list is randomized during the initialization of a new set of pages.
The order for the different freelist sizes is pre-computed at boot for
performance. Each kmem_cache has its own randomized freelist. This
security feature reduces the predictability of the kernel SLUB allocator
against heap overflows, rendering such attacks much less stable.

For example, these attacks exploit the predictability of the heap:
 - Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU)
 - Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95)

Performance results:

slab_test impact is between 3% and 4% on average:

Before:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
100000 times kmalloc(8) -> 49 cycles kfree -> 77 cycles
100000 times kmalloc(16) -> 51 cycles kfree -> 79 cycles
100000 times kmalloc(32) -> 53 cycles kfree -> 83 cycles
100000 times kmalloc(64) -> 62 cycles kfree -> 90 cycles
100000 times kmalloc(128) -> 81 cycles kfree -> 97 cycles
100000 times kmalloc(256) -> 98 cycles kfree -> 121 cycles
100000 times kmalloc(512) -> 95 cycles kfree -> 122 cycles
100000 times kmalloc(1024) -> 96 cycles kfree -> 126 cycles
100000 times kmalloc(2048) -> 115 cycles kfree -> 140 cycles
100000 times kmalloc(4096) -> 149 cycles kfree -> 171 cycles
2. Kmalloc: alloc/free test
100000 times kmalloc(8)/kfree -> 70 cycles
100000 times kmalloc(16)/kfree -> 70 cycles
100000 times kmalloc(32)/kfree -> 70 cycles
100000 times kmalloc(64)/kfree -> 70 cycles
100000 times kmalloc(128)/kfree -> 70 cycles
100000 times kmalloc(256)/kfree -> 69 cycles
100000 times kmalloc(512)/kfree -> 70 cycles
100000 times kmalloc(1024)/kfree -> 73 cycles
100000 times kmalloc(2048)/kfree -> 72 cycles
100000 times kmalloc(4096)/kfree -> 71 cycles

After:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
100000 times kmalloc(8) -> 57 cycles kfree -> 78 cycles
100000 times kmalloc(16) -> 61 cycles kfree -> 81 cycles
100000 times kmalloc(32) -> 76 cycles kfree -> 93 cycles
100000 times kmalloc(64) -> 83 cycles kfree -> 94 cycles
100000 times kmalloc(128) -> 106 cycles kfree -> 107 cycles
100000 times kmalloc(256) -> 118 cycles kfree -> 117 cycles
100000 times kmalloc(512) -> 114 cycles kfree -> 116 cycles
100000 times kmalloc(1024) -> 115 cycles kfree -> 118 cycles
100000 times kmalloc(2048) -> 147 cycles kfree -> 131 cycles
100000 times kmalloc(4096) -> 214 cycles kfree -> 161 cycles
2. Kmalloc: alloc/free test
100000 times kmalloc(8)/kfree -> 66 cycles
100000 times kmalloc(16)/kfree -> 66 cycles
100000 times kmalloc(32)/kfree -> 66 cycles
100000 times kmalloc(64)/kfree -> 66 cycles
100000 times kmalloc(128)/kfree -> 65 cycles
100000 times kmalloc(256)/kfree -> 67 cycles
100000 times kmalloc(512)/kfree -> 67 cycles
100000 times kmalloc(1024)/kfree -> 64 cycles
100000 times kmalloc(2048)/kfree -> 67 cycles
100000 times kmalloc(4096)/kfree -> 67 cycles

Kernbench, before:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 101.873 (1.16069)
User Time 1045.22 (1.60447)
System Time 88.969 (0.559195)
Percent CPU 1112.9 (13.8279)
Context Switches 189140 (2282.15)
Sleeps 99008.6 (768.091)

After:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.47 (0.562732)
User Time 1045.3 (1.34263)
System Time 88.311 (0.342554)
Percent CPU 1105.8 (6.49444)
Context Switches 189081 (2355.78)
Sleeps 99231.5 (800.358)

Signed-off-by: Thomas Garnier <[hidden email]>
---
Based on 0e01df100b6bf22a1de61b66657502a6454153c5
---
 include/linux/slub_def.h |   8 +++
 init/Kconfig             |   4 +-
 mm/slub.c                | 133 ++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 136 insertions(+), 9 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 665cd0c..22d487e 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -56,6 +56,9 @@ struct kmem_cache_order_objects {
  unsigned long x;
 };
 
+/* Index used for freelist randomization */
+typedef unsigned int freelist_idx_t;
+
 /*
  * Slab cache management.
  */
@@ -99,6 +102,11 @@ struct kmem_cache {
  */
  int remote_node_defrag_ratio;
 #endif
+
+#ifdef CONFIG_SLAB_FREELIST_RANDOM
+ freelist_idx_t *random_seq;
+#endif
+
  struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/init/Kconfig b/init/Kconfig
index a9c4aefd..fbb6678 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1771,10 +1771,10 @@ endchoice
 
 config SLAB_FREELIST_RANDOM
  default n
- depends on SLAB
+ depends on SLAB || SLUB
  bool "SLAB freelist randomization"
  help
-  Randomizes the freelist order used on creating new SLABs. This
+  Randomizes the freelist order used on creating new pages. This
   security feature reduces the predictability of the kernel slab
   allocator against heap overflows.
 
diff --git a/mm/slub.c b/mm/slub.c
index 825ff45..217aa8a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1405,6 +1405,109 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
  return page;
 }
 
+#ifdef CONFIG_SLAB_FREELIST_RANDOM
+/* Pre-initialize the random sequence cache */
+static int init_cache_random_seq(struct kmem_cache *s)
+{
+ int err;
+ unsigned long i, count = oo_objects(s->oo);
+
+ err = cache_random_seq_create(s, count, GFP_KERNEL);
+ if (err) {
+ pr_err("SLUB: Unable to initialize free list for %s\n",
+ s->name);
+ return err;
+ }
+
+ /* Transform to an offset on the set of pages */
+ if (s->random_seq) {
+ for (i = 0; i < count; i++)
+ s->random_seq[i] *= s->size;
+ }
+ return 0;
+}
+
+/* Initialize each random sequence freelist per cache */
+static void __init init_freelist_randomization(void)
+{
+ struct kmem_cache *s;
+
+ mutex_lock(&slab_mutex);
+
+ list_for_each_entry(s, &slab_caches, list)
+ init_cache_random_seq(s);
+
+ mutex_unlock(&slab_mutex);
+}
+
+/* Get the next entry on the pre-computed freelist randomized */
+static void *next_freelist_entry(struct kmem_cache *s, struct page *page,
+ unsigned long *pos, void *start,
+ unsigned long page_limit,
+ unsigned long freelist_count)
+{
+ freelist_idx_t idx;
+
+ /*
+ * If the target page allocation failed, the number of objects on the
+ * page might be smaller than the usual size defined by the cache.
+ */
+ do {
+ idx = s->random_seq[*pos];
+ *pos += 1;
+ if (*pos >= freelist_count)
+ *pos = 0;
+ } while (unlikely(idx >= page_limit));
+
+ return (char *)start + idx;
+}
+
+/* Shuffle the single linked freelist based on a random pre-computed sequence */
+static bool shuffle_freelist(struct kmem_cache *s, struct page *page)
+{
+ void *start;
+ void *cur;
+ void *next;
+ unsigned long idx, pos, page_limit, freelist_count;
+
+ if (page->objects < 2 || !s->random_seq)
+ return false;
+
+ freelist_count = oo_objects(s->oo);
+ pos = get_random_int() % freelist_count;
+
+ page_limit = page->objects * s->size;
+ start = fixup_red_left(s, page_address(page));
+
+ /* First entry is used as the base of the freelist */
+ cur = next_freelist_entry(s, page, &pos, start, page_limit,
+ freelist_count);
+ page->freelist = cur;
+
+ for (idx = 1; idx < page->objects; idx++) {
+ setup_object(s, page, cur);
+ next = next_freelist_entry(s, page, &pos, start, page_limit,
+ freelist_count);
+ set_freepointer(s, cur, next);
+ cur = next;
+ }
+ setup_object(s, page, cur);
+ set_freepointer(s, cur, NULL);
+
+ return true;
+}
+#else
+static inline int init_cache_random_seq(struct kmem_cache *s)
+{
+ return 0;
+}
+static inline void init_freelist_randomization(void) { }
+static inline bool shuffle_freelist(struct kmem_cache *s, struct page *page)
+{
+ return false;
+}
+#endif /* CONFIG_SLAB_FREELIST_RANDOM */
+
 static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 {
  struct page *page;
@@ -1412,6 +1515,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
  gfp_t alloc_gfp;
  void *start, *p;
  int idx, order;
+ bool shuffle;
 
  flags &= gfp_allowed_mask;
 
@@ -1473,15 +1577,19 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 
  kasan_poison_slab(page);
 
- for_each_object_idx(p, idx, s, start, page->objects) {
- setup_object(s, page, p);
- if (likely(idx < page->objects))
- set_freepointer(s, p, p + s->size);
- else
- set_freepointer(s, p, NULL);
+ shuffle = shuffle_freelist(s, page);
+
+ if (!shuffle) {
+ for_each_object_idx(p, idx, s, start, page->objects) {
+ setup_object(s, page, p);
+ if (likely(idx < page->objects))
+ set_freepointer(s, p, p + s->size);
+ else
+ set_freepointer(s, p, NULL);
+ }
+ page->freelist = fixup_red_left(s, start);
  }
 
- page->freelist = fixup_red_left(s, start);
  page->inuse = page->objects;
  page->frozen = 1;
 
@@ -3207,6 +3315,7 @@ static void free_kmem_cache_nodes(struct kmem_cache *s)
 
 void __kmem_cache_release(struct kmem_cache *s)
 {
+ cache_random_seq_destroy(s);
  free_percpu(s->cpu_slab);
  free_kmem_cache_nodes(s);
 }
@@ -3431,6 +3540,13 @@ static int kmem_cache_open(struct kmem_cache *s, unsigned long flags)
 #ifdef CONFIG_NUMA
  s->remote_node_defrag_ratio = 1000;
 #endif
+
+ /* Initialize the pre-computed randomized freelist if slab is up */
+ if (slab_state >= UP) {
+ if (init_cache_random_seq(s))
+ goto error;
+ }
+
  if (!init_kmem_cache_nodes(s))
  goto error;
 
@@ -3947,6 +4063,9 @@ void __init kmem_cache_init(void)
  setup_kmalloc_cache_index_table();
  create_kmalloc_caches(0);
 
+ /* Setup random freelists for each cache */
+ init_freelist_randomization();
+
 #ifdef CONFIG_SMP
  register_cpu_notifier(&slab_notifier);
 #endif
--
2.8.0.rc3.226.g39d4020


Re: [RFC v2 2/2] mm: SLUB Freelist randomization

Kees Cook
On Tue, May 24, 2016 at 2:15 PM, Thomas Garnier <[hidden email]> wrote:

> Implements Freelist randomization for the SLUB allocator. It was
> previous implemented for the SLAB allocator. Both use the same
> configuration option (CONFIG_SLAB_FREELIST_RANDOM).
>
> The list is randomized during initialization of a new set of pages. The
> order on different freelist sizes is pre-computed at boot for
> performance. Each kmem_cache has its own randomized freelist. This
> security feature reduces the predictability of the kernel SLUB allocator
> against heap overflows rendering attacks much less stable.
>
> For example these attacks exploit the predictability of the heap:
>  - Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU)
>  - Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95)
>
> Performance results:
>
> slab_test impact is between 3% to 4% on average:

Seems like slab_test is pretty intensive (so the impact appears
higher). On a more "regular" load like kernbench, the impact seems to
be almost 0. Is that accurate?

Regardless, please consider both patches:

Reviewed-by: Kees Cook <[hidden email]>

-Kees




--
Kees Cook
Chrome OS & Brillo Security

Re: [RFC v2 1/2] mm: Reorganize SLAB freelist randomization

Joonsoo Kim-2
In reply to this post by Thomas Garnier
On Tue, May 24, 2016 at 02:15:22PM -0700, Thomas Garnier wrote:
> This commit reorganizes the previous SLAB freelist randomization to
> prepare for the SLUB implementation. It moves functions that will be
> shared to slab_common. It also moves the definition of freelist_idx_t in
> the slab_def header so a similar type can be used for all common
> functions. The entropy functions are changed to align with the SLUB
> implementation, now using get_random_* functions.

Could you explain more about the difference between get_random_*()
and get_random_bytes_arch(), and why this change is needed?

And I think that it should be a separate patch.

>
> Signed-off-by: Thomas Garnier <[hidden email]>
> ---
> Based on 0e01df100b6bf22a1de61b66657502a6454153c5
> ---
>  include/linux/slab_def.h | 11 +++++++-
>  mm/slab.c                | 68 ++----------------------------------------------
>  mm/slab.h                | 16 ++++++++++++
>  mm/slab_common.c         | 48 ++++++++++++++++++++++++++++++++++
>  4 files changed, 76 insertions(+), 67 deletions(-)
>
> diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> index 8694f7a..e05a871 100644
> --- a/include/linux/slab_def.h
> +++ b/include/linux/slab_def.h
> @@ -3,6 +3,15 @@
>  
>  #include <linux/reciprocal_div.h>
>  
> +#define FREELIST_BYTE_INDEX (((PAGE_SIZE >> BITS_PER_BYTE) \
> + <= SLAB_OBJ_MIN_SIZE) ? 1 : 0)
> +
> +#if FREELIST_BYTE_INDEX
> +typedef unsigned char freelist_idx_t;
> +#else
> +typedef unsigned short freelist_idx_t;
> +#endif
> +

This is a SLAB-specific index size definition and I don't want to export
it to SLUB. Please use 'void *random_seq' and allocate sizeof(void *)
memory for each entry, then do the type casting when shuffling in
SLAB. There is some memory waste, but not that much, so we can
tolerate it.

Others look fine to me.

Thanks.


Re: [RFC v2 2/2] mm: SLUB Freelist randomization

js1304
In reply to this post by Thomas Garnier
2016-05-25 6:15 GMT+09:00 Thomas Garnier <[hidden email]>:

> Implements Freelist randomization for the SLUB allocator. It was
> previously implemented for the SLAB allocator. Both use the same
> configuration option (CONFIG_SLAB_FREELIST_RANDOM).
>
> The list is randomized during initialization of a new set of pages. The
> order on different freelist sizes is pre-computed at boot for
> performance. Each kmem_cache has its own randomized freelist. This
> security feature reduces the predictability of the kernel SLUB allocator
> against heap overflows rendering attacks much less stable.
>
> For example these attacks exploit the predictability of the heap:
>  - Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU)
>  - Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95)
>
> Performance results:
>
> slab_test impact is between 3% to 4% on average:
>
> Before:
>
> Single thread testing
> =====================
> 1. Kmalloc: Repeatedly allocate then free test
> 100000 times kmalloc(8) -> 49 cycles kfree -> 77 cycles
> 100000 times kmalloc(16) -> 51 cycles kfree -> 79 cycles
> 100000 times kmalloc(32) -> 53 cycles kfree -> 83 cycles
> 100000 times kmalloc(64) -> 62 cycles kfree -> 90 cycles
> 100000 times kmalloc(128) -> 81 cycles kfree -> 97 cycles
> 100000 times kmalloc(256) -> 98 cycles kfree -> 121 cycles
> 100000 times kmalloc(512) -> 95 cycles kfree -> 122 cycles
> 100000 times kmalloc(1024) -> 96 cycles kfree -> 126 cycles
> 100000 times kmalloc(2048) -> 115 cycles kfree -> 140 cycles
> 100000 times kmalloc(4096) -> 149 cycles kfree -> 171 cycles
> 2. Kmalloc: alloc/free test
> 100000 times kmalloc(8)/kfree -> 70 cycles
> 100000 times kmalloc(16)/kfree -> 70 cycles
> 100000 times kmalloc(32)/kfree -> 70 cycles
> 100000 times kmalloc(64)/kfree -> 70 cycles
> 100000 times kmalloc(128)/kfree -> 70 cycles
> 100000 times kmalloc(256)/kfree -> 69 cycles
> 100000 times kmalloc(512)/kfree -> 70 cycles
> 100000 times kmalloc(1024)/kfree -> 73 cycles
> 100000 times kmalloc(2048)/kfree -> 72 cycles
> 100000 times kmalloc(4096)/kfree -> 71 cycles
>
> After:
>
> Single thread testing
> =====================
> 1. Kmalloc: Repeatedly allocate then free test
> 100000 times kmalloc(8) -> 57 cycles kfree -> 78 cycles
> 100000 times kmalloc(16) -> 61 cycles kfree -> 81 cycles
> 100000 times kmalloc(32) -> 76 cycles kfree -> 93 cycles
> 100000 times kmalloc(64) -> 83 cycles kfree -> 94 cycles
> 100000 times kmalloc(128) -> 106 cycles kfree -> 107 cycles
> 100000 times kmalloc(256) -> 118 cycles kfree -> 117 cycles
> 100000 times kmalloc(512) -> 114 cycles kfree -> 116 cycles
> 100000 times kmalloc(1024) -> 115 cycles kfree -> 118 cycles
> 100000 times kmalloc(2048) -> 147 cycles kfree -> 131 cycles
> 100000 times kmalloc(4096) -> 214 cycles kfree -> 161 cycles
> 2. Kmalloc: alloc/free test
> 100000 times kmalloc(8)/kfree -> 66 cycles
> 100000 times kmalloc(16)/kfree -> 66 cycles
> 100000 times kmalloc(32)/kfree -> 66 cycles
> 100000 times kmalloc(64)/kfree -> 66 cycles
> 100000 times kmalloc(128)/kfree -> 65 cycles
> 100000 times kmalloc(256)/kfree -> 67 cycles
> 100000 times kmalloc(512)/kfree -> 67 cycles
> 100000 times kmalloc(1024)/kfree -> 64 cycles
> 100000 times kmalloc(2048)/kfree -> 67 cycles
> 100000 times kmalloc(4096)/kfree -> 67 cycles
>
> Kernbench, before:
>
> Average Optimal load -j 12 Run (std deviation):
> Elapsed Time 101.873 (1.16069)
> User Time 1045.22 (1.60447)
> System Time 88.969 (0.559195)
> Percent CPU 1112.9 (13.8279)
> Context Switches 189140 (2282.15)
> Sleeps 99008.6 (768.091)
>
> After:
>
> Average Optimal load -j 12 Run (std deviation):
> Elapsed Time 102.47 (0.562732)
> User Time 1045.3 (1.34263)
> System Time 88.311 (0.342554)
> Percent CPU 1105.8 (6.49444)
> Context Switches 189081 (2355.78)
> Sleeps 99231.5 (800.358)
>
> Signed-off-by: Thomas Garnier <[hidden email]>
> ---
> Based on 0e01df100b6bf22a1de61b66657502a6454153c5
> ---
>  include/linux/slub_def.h |   8 +++
>  init/Kconfig             |   4 +-
>  mm/slub.c                | 133 ++++++++++++++++++++++++++++++++++++++++++++---
>  3 files changed, 136 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index 665cd0c..22d487e 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -56,6 +56,9 @@ struct kmem_cache_order_objects {
>         unsigned long x;
>  };
>
> +/* Index used for freelist randomization */
> +typedef unsigned int freelist_idx_t;
> +
>  /*
>   * Slab cache management.
>   */
> @@ -99,6 +102,11 @@ struct kmem_cache {
>          */
>         int remote_node_defrag_ratio;
>  #endif
> +
> +#ifdef CONFIG_SLAB_FREELIST_RANDOM
> +       freelist_idx_t *random_seq;
> +#endif
> +
>         struct kmem_cache_node *node[MAX_NUMNODES];
>  };
>
> diff --git a/init/Kconfig b/init/Kconfig
> index a9c4aefd..fbb6678 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1771,10 +1771,10 @@ endchoice
>
>  config SLAB_FREELIST_RANDOM
>         default n
> -       depends on SLAB
> +       depends on SLAB || SLUB
>         bool "SLAB freelist randomization"
>         help
> -         Randomizes the freelist order used on creating new SLABs. This
> +         Randomizes the freelist order used on creating new pages. This
>           security feature reduces the predictability of the kernel slab
>           allocator against heap overflows.
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 825ff45..217aa8a 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1405,6 +1405,109 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
>         return page;
>  }
>
> +#ifdef CONFIG_SLAB_FREELIST_RANDOM
> +/* Pre-initialize the random sequence cache */
> +static int init_cache_random_seq(struct kmem_cache *s)
> +{
> +       int err;
> +       unsigned long i, count = oo_objects(s->oo);
> +
> +       err = cache_random_seq_create(s, count, GFP_KERNEL);
> +       if (err) {
> +               pr_err("SLUB: Unable to initialize free list for %s\n",
> +                       s->name);
> +               return err;
> +       }
> +
> +       /* Transform to an offset on the set of pages */
> +       if (s->random_seq) {
> +               for (i = 0; i < count; i++)
> +                       s->random_seq[i] *= s->size;
> +       }
> +       return 0;
> +}
> +
> +/* Initialize each random sequence freelist per cache */
> +static void __init init_freelist_randomization(void)
> +{
> +       struct kmem_cache *s;
> +
> +       mutex_lock(&slab_mutex);
> +
> +       list_for_each_entry(s, &slab_caches, list)
> +               init_cache_random_seq(s);
> +
> +       mutex_unlock(&slab_mutex);
> +}
> +
> +/* Get the next entry on the pre-computed freelist randomized */
> +static void *next_freelist_entry(struct kmem_cache *s, struct page *page,
> +                               unsigned long *pos, void *start,
> +                               unsigned long page_limit,
> +                               unsigned long freelist_count)
> +{
> +       freelist_idx_t idx;
> +
> +       /*
> +        * If the target page allocation failed, the number of objects on the
> +        * page might be smaller than the usual size defined by the cache.
> +        */
> +       do {
> +               idx = s->random_seq[*pos];
> +               *pos += 1;
> +               if (*pos >= freelist_count)
> +                       *pos = 0;
> +       } while (unlikely(idx >= page_limit));
> +
> +       return (char *)start + idx;
> +}
> +
> +/* Shuffle the single linked freelist based on a random pre-computed sequence */
> +static bool shuffle_freelist(struct kmem_cache *s, struct page *page)
> +{
> +       void *start;
> +       void *cur;
> +       void *next;
> +       unsigned long idx, pos, page_limit, freelist_count;
> +
> +       if (page->objects < 2 || !s->random_seq)
> +               return false;
> +
> +       freelist_count = oo_objects(s->oo);
> +       pos = get_random_int() % freelist_count;
> +
> +       page_limit = page->objects * s->size;
> +       start = fixup_red_left(s, page_address(page));
> +
> +       /* First entry is used as the base of the freelist */
> +       cur = next_freelist_entry(s, page, &pos, start, page_limit,
> +                               freelist_count);
> +       page->freelist = cur;
> +
> +       for (idx = 1; idx < page->objects; idx++) {
> +               setup_object(s, page, cur);
> +               next = next_freelist_entry(s, page, &pos, start, page_limit,
> +                       freelist_count);
> +               set_freepointer(s, cur, next);
> +               cur = next;
> +       }
> +       setup_object(s, page, cur);
> +       set_freepointer(s, cur, NULL);
> +
> +       return true;
> +}
> +#else
> +static inline int init_cache_random_seq(struct kmem_cache *s)
> +{
> +       return 0;
> +}
> +static inline void init_freelist_randomization(void) { }
> +static inline bool shuffle_freelist(struct kmem_cache *s, struct page *page)
> +{
> +       return false;
> +}
> +#endif /* CONFIG_SLAB_FREELIST_RANDOM */
> +
>  static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  {
>         struct page *page;
> @@ -1412,6 +1515,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>         gfp_t alloc_gfp;
>         void *start, *p;
>         int idx, order;
> +       bool shuffle;
>
>         flags &= gfp_allowed_mask;
>
> @@ -1473,15 +1577,19 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>
>         kasan_poison_slab(page);
>
> -       for_each_object_idx(p, idx, s, start, page->objects) {
> -               setup_object(s, page, p);
> -               if (likely(idx < page->objects))
> -                       set_freepointer(s, p, p + s->size);
> -               else
> -                       set_freepointer(s, p, NULL);
> +       shuffle = shuffle_freelist(s, page);
> +
> +       if (!shuffle) {
> +               for_each_object_idx(p, idx, s, start, page->objects) {
> +                       setup_object(s, page, p);
> +                       if (likely(idx < page->objects))
> +                               set_freepointer(s, p, p + s->size);
> +                       else
> +                               set_freepointer(s, p, NULL);
> +               }
> +               page->freelist = fixup_red_left(s, start);
>         }
>
> -       page->freelist = fixup_red_left(s, start);
>         page->inuse = page->objects;
>         page->frozen = 1;
>
> @@ -3207,6 +3315,7 @@ static void free_kmem_cache_nodes(struct kmem_cache *s)
>
>  void __kmem_cache_release(struct kmem_cache *s)
>  {
> +       cache_random_seq_destroy(s);
>         free_percpu(s->cpu_slab);
>         free_kmem_cache_nodes(s);
>  }
> @@ -3431,6 +3540,13 @@ static int kmem_cache_open(struct kmem_cache *s, unsigned long flags)
>  #ifdef CONFIG_NUMA
>         s->remote_node_defrag_ratio = 1000;
>  #endif
> +
> +       /* Initialize the pre-computed randomized freelist if slab is up */
> +       if (slab_state >= UP) {
> +               if (init_cache_random_seq(s))
> +                       goto error;
> +       }
> +
>         if (!init_kmem_cache_nodes(s))
>                 goto error;
>
> @@ -3947,6 +4063,9 @@ void __init kmem_cache_init(void)
>         setup_kmalloc_cache_index_table();
>         create_kmalloc_caches(0);
>
> +       /* Setup random freelists for each cache */
> +       init_freelist_randomization();

dma kmalloc caches are initialized with slab_state = UP.
That means their random_seq is initialized twice and
some memory would leak.

Maybe you need to check whether random_seq is already
initialized in init_cache_random_seq().

Others look fine to me.

Thanks.

Re: [RFC v2 2/2] mm: SLUB Freelist randomization

Thomas Garnier
In reply to this post by Kees Cook
On Wed, May 25, 2016 at 3:25 PM, Kees Cook <[hidden email]> wrote:

> On Tue, May 24, 2016 at 2:15 PM, Thomas Garnier <[hidden email]> wrote:
>> Implements Freelist randomization for the SLUB allocator. It was
>> previously implemented for the SLAB allocator. Both use the same
>> configuration option (CONFIG_SLAB_FREELIST_RANDOM).
>>
>> The list is randomized during initialization of a new set of pages. The
>> order on different freelist sizes is pre-computed at boot for
>> performance. Each kmem_cache has its own randomized freelist. This
>> security feature reduces the predictability of the kernel SLUB allocator
>> against heap overflows rendering attacks much less stable.
>>
>> For example these attacks exploit the predictability of the heap:
>>  - Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU)
>>  - Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95)
>>
>> Performance results:
>>
>> slab_test impact is between 3% to 4% on average:
>
> Seems like slab_test is pretty intensive (so the impact appears
> higher). On a more "regular" load like kernbench, the impact seems to
> be almost 0. Is that accurate?
>

Yes, because the slab_test run is intensive on a single thread, it
shows a higher perf impact than a global benchmark would. The overall
impact on the system is of course much smaller. I will detail that in
the performance section.

> Regardless, please consider both patches:
>
> Reviewed-by: Kees Cook <[hidden email]>
>
> -Kees

Re: [RFC v2 2/2] mm: SLUB Freelist randomization

Thomas Garnier
In reply to this post by js1304
On Wed, May 25, 2016 at 6:49 PM, Joonsoo Kim <[hidden email]> wrote:

> 2016-05-25 6:15 GMT+09:00 Thomas Garnier <[hidden email]>:
>> Implements Freelist randomization for the SLUB allocator. It was
>> previously implemented for the SLAB allocator. Both use the same
>> configuration option (CONFIG_SLAB_FREELIST_RANDOM).
>>
>> The list is randomized during initialization of a new set of pages. The
>> order on different freelist sizes is pre-computed at boot for
>> performance. Each kmem_cache has its own randomized freelist. This
>> security feature reduces the predictability of the kernel SLUB allocator
>> against heap overflows rendering attacks much less stable.
>>
>> For example these attacks exploit the predictability of the heap:
>>  - Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU)
>>  - Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95)
>>
>> [...]
>> @@ -1405,6 +1405,109 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
>>         return page;
>>  }
>>
>> +#ifdef CONFIG_SLAB_FREELIST_RANDOM
>> +/* Pre-initialize the random sequence cache */
>> +static int init_cache_random_seq(struct kmem_cache *s)
>> +{
>> +       int err;
>> +       unsigned long i, count = oo_objects(s->oo);
>> +
>> +       err = cache_random_seq_create(s, count, GFP_KERNEL);
>> +       if (err) {
>> +               pr_err("SLUB: Unable to initialize free list for %s\n",
>> +                       s->name);
>> +               return err;
>> +       }
>> +
>> +       /* Transform to an offset on the set of pages */
>> +       if (s->random_seq) {
>> +               for (i = 0; i < count; i++)
>> +                       s->random_seq[i] *= s->size;
>> +       }
>> +       return 0;
>> +}
>> +
>> +/* Initialize each random sequence freelist per cache */
>> +static void __init init_freelist_randomization(void)
>> +{
>> +       struct kmem_cache *s;
>> +
>> +       mutex_lock(&slab_mutex);
>> +
>> +       list_for_each_entry(s, &slab_caches, list)
>> +               init_cache_random_seq(s);
>> +
>> +       mutex_unlock(&slab_mutex);
>> +}
>> +
>> +/* Get the next entry on the pre-computed freelist randomized */
>> +static void *next_freelist_entry(struct kmem_cache *s, struct page *page,
>> +                               unsigned long *pos, void *start,
>> +                               unsigned long page_limit,
>> +                               unsigned long freelist_count)
>> +{
>> +       freelist_idx_t idx;
>> +
>> +       /*
>> +        * If the target page allocation failed, the number of objects on the
>> +        * page might be smaller than the usual size defined by the cache.
>> +        */
>> +       do {
>> +               idx = s->random_seq[*pos];
>> +               *pos += 1;
>> +               if (*pos >= freelist_count)
>> +                       *pos = 0;
>> +       } while (unlikely(idx >= page_limit));
>> +
>> +       return (char *)start + idx;
>> +}
>> +
>> +/* Shuffle the single linked freelist based on a random pre-computed sequence */
>> +static bool shuffle_freelist(struct kmem_cache *s, struct page *page)
>> +{
>> +       void *start;
>> +       void *cur;
>> +       void *next;
>> +       unsigned long idx, pos, page_limit, freelist_count;
>> +
>> +       if (page->objects < 2 || !s->random_seq)
>> +               return false;
>> +
>> +       freelist_count = oo_objects(s->oo);
>> +       pos = get_random_int() % freelist_count;
>> +
>> +       page_limit = page->objects * s->size;
>> +       start = fixup_red_left(s, page_address(page));
>> +
>> +       /* First entry is used as the base of the freelist */
>> +       cur = next_freelist_entry(s, page, &pos, start, page_limit,
>> +                               freelist_count);
>> +       page->freelist = cur;
>> +
>> +       for (idx = 1; idx < page->objects; idx++) {
>> +               setup_object(s, page, cur);
>> +               next = next_freelist_entry(s, page, &pos, start, page_limit,
>> +                       freelist_count);
>> +               set_freepointer(s, cur, next);
>> +               cur = next;
>> +       }
>> +       setup_object(s, page, cur);
>> +       set_freepointer(s, cur, NULL);
>> +
>> +       return true;
>> +}
>> +#else
>> +static inline int init_cache_random_seq(struct kmem_cache *s)
>> +{
>> +       return 0;
>> +}
>> +static inline void init_freelist_randomization(void) { }
>> +static inline bool shuffle_freelist(struct kmem_cache *s, struct page *page)
>> +{
>> +       return false;
>> +}
>> +#endif /* CONFIG_SLAB_FREELIST_RANDOM */
>> +
>>  static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>>  {
>>         struct page *page;
>> @@ -1412,6 +1515,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>>         gfp_t alloc_gfp;
>>         void *start, *p;
>>         int idx, order;
>> +       bool shuffle;
>>
>>         flags &= gfp_allowed_mask;
>>
>> @@ -1473,15 +1577,19 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>>
>>         kasan_poison_slab(page);
>>
>> -       for_each_object_idx(p, idx, s, start, page->objects) {
>> -               setup_object(s, page, p);
>> -               if (likely(idx < page->objects))
>> -                       set_freepointer(s, p, p + s->size);
>> -               else
>> -                       set_freepointer(s, p, NULL);
>> +       shuffle = shuffle_freelist(s, page);
>> +
>> +       if (!shuffle) {
>> +               for_each_object_idx(p, idx, s, start, page->objects) {
>> +                       setup_object(s, page, p);
>> +                       if (likely(idx < page->objects))
>> +                               set_freepointer(s, p, p + s->size);
>> +                       else
>> +                               set_freepointer(s, p, NULL);
>> +               }
>> +               page->freelist = fixup_red_left(s, start);
>>         }
>>
>> -       page->freelist = fixup_red_left(s, start);
>>         page->inuse = page->objects;
>>         page->frozen = 1;
>>
>> @@ -3207,6 +3315,7 @@ static void free_kmem_cache_nodes(struct kmem_cache *s)
>>
>>  void __kmem_cache_release(struct kmem_cache *s)
>>  {
>> +       cache_random_seq_destroy(s);
>>         free_percpu(s->cpu_slab);
>>         free_kmem_cache_nodes(s);
>>  }
>> @@ -3431,6 +3540,13 @@ static int kmem_cache_open(struct kmem_cache *s, unsigned long flags)
>>  #ifdef CONFIG_NUMA
>>         s->remote_node_defrag_ratio = 1000;
>>  #endif
>> +
>> +       /* Initialize the pre-computed randomized freelist if slab is up */
>> +       if (slab_state >= UP) {
>> +               if (init_cache_random_seq(s))
>> +                       goto error;
>> +       }
>> +
>>         if (!init_kmem_cache_nodes(s))
>>                 goto error;
>>
>> @@ -3947,6 +4063,9 @@ void __init kmem_cache_init(void)
>>         setup_kmalloc_cache_index_table();
>>         create_kmalloc_caches(0);
>>
>> +       /* Setup random freelists for each cache */
>> +       init_freelist_randomization();
>
> dma kmalloc caches are initialized with slab_state = UP.
> That means their random_seq is initialized twice, so some
> memory would leak.
>
> Maybe you need to check whether random_seq is already initialized
> in init_cache_random_seq().

Thanks, I will look into that.

>
> Others look fine to me.
>

Thanks, I will move this to PATCH on the next iteration (based on
linux-next, per the other thread).

> Thanks.

Re: [RFC v2 1/2] mm: Reorganize SLAB freelist randomization

Thomas Garnier
In reply to this post by Joonsoo Kim-2
On Wed, May 25, 2016 at 6:09 PM, Joonsoo Kim <[hidden email]> wrote:

> On Tue, May 24, 2016 at 02:15:22PM -0700, Thomas Garnier wrote:
>> This commit reorganizes the previous SLAB freelist randomization to
>> prepare for the SLUB implementation. It moves functions that will be
>> shared to slab_common. It also moves the definition of freelist_idx_t in
>> the slab_def header so a similar type can be used for all common
>> functions. The entropy functions are changed to align with the SLUB
>> implementation, now using get_random_* functions.
>
> Could you explain more about the difference between get_random_*
> and get_random_bytes_arch(), and why this change is needed?
>
> And, I think that it should be another patch.
>

Sure. In my testing (a limited scenario), get_random_bytes_arch was much
slower than get_random_int when the hardware random instructions were
not available. From looking at both implementations, it also seems that
get_random_int may provide a bit more entropy at this early boot stage.

>>
>> Signed-off-by: Thomas Garnier <[hidden email]>
>> ---
>> Based on 0e01df100b6bf22a1de61b66657502a6454153c5
>> ---
>>  include/linux/slab_def.h | 11 +++++++-
>>  mm/slab.c                | 68 ++----------------------------------------------
>>  mm/slab.h                | 16 ++++++++++++
>>  mm/slab_common.c         | 48 ++++++++++++++++++++++++++++++++++
>>  4 files changed, 76 insertions(+), 67 deletions(-)
>>
>> diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
>> index 8694f7a..e05a871 100644
>> --- a/include/linux/slab_def.h
>> +++ b/include/linux/slab_def.h
>> @@ -3,6 +3,15 @@
>>
>>  #include <linux/reciprocal_div.h>
>>
>> +#define FREELIST_BYTE_INDEX (((PAGE_SIZE >> BITS_PER_BYTE) \
>> +                             <= SLAB_OBJ_MIN_SIZE) ? 1 : 0)
>> +
>> +#if FREELIST_BYTE_INDEX
>> +typedef unsigned char freelist_idx_t;
>> +#else
>> +typedef unsigned short freelist_idx_t;
>> +#endif
>> +
>
> This is a SLAB-specific index size definition and I don't want to export
> it to SLUB. Please use 'void *random_seq', allocate sizeof(void *)
> memory for each entry, and then do the type casting when shuffling in
> SLAB. There is some memory waste, but not that much, so we can tolerate
> it.
>
> Others look fine to me.
>

Ok, will do. Thanks.

> Thanks.
>