Very slow writes to the Django file-based cache

2021-03-11

My Django application uses the built-in file-based backend django.core.cache.backends.filebased.FileBasedCache. Writing 1000 entries (calling the set() method 1000 times) takes seconds, even though LOCATION points at shared memory (/dev/shm), which should make file I/O nearly free.
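
For reference, the setup looks roughly like this; the cache directory name and the timing loop are illustrative, not from the original configuration:

# settings.py
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
        'LOCATION': '/dev/shm/django_cache',  # tmpfs, i.e. RAM-backed
    }
}

# A quick benchmark of 1000 writes:
import time
from django.core.cache import cache

start = time.monotonic()
for i in range(1000):
    cache.set(f'key-{i}', {'value': i})
print(time.monotonic() - start)  # several seconds with the stock backend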

After reading the source code of filebased.py, I found that Django calls the _cull() method every time set() is invoked. _cull() is designed to remove random cache entries once MAX_ENTRIES is reached (the maximum number of entries allowed in the cache before old values are deleted). How does Django know the number of existing entries? It takes the length of the list of filenames returned by _list_cache_files(), an expensive method that scans the entire cache directory on every write.
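
The culprit, paraphrased from Django's filebased.py (simplified; the exact code varies slightly across versions):

def set(self, key, value, timeout=DEFAULT_TIMEOUT, version=None):
    ...
    self._cull()  # make some room if necessary -- runs on EVERY write
    ...  # write the new cache file

def _cull(self):
    # Listing every file in the cache directory is the expensive part.
    filelist = self._list_cache_files()
    num_entries = len(filelist)
    if num_entries < self._max_entries:
        return
    # Delete a random 1/_cull_frequency fraction of all entries.
    filelist = random.sample(filelist, int(num_entries / self._cull_frequency))
    for fname in filelist:
        self._delete(fname)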

Customized cache backend

In my use case, the number of cache entries is bounded by the number of objects of a model, so there is no need to cull the cache on every set(). I made a copy of filebased.py and commented out the line self._cull()  # make some room if necessary in the set() method.
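
Instead of copying the whole module, the same change can be expressed as a subclass. A minimal sketch, where the module path and class name are my own choices:

# myapp/cache.py
from django.core.cache.backends.filebased import FileBasedCache

class UnculledFileBasedCache(FileBasedCache):
    def _cull(self):
        # set() still calls _cull(), but it is now a no-op,
        # so no directory scan happens on every write.
        pass

Then point BACKEND at it in settings.py:

CACHES = {
    'default': {
        'BACKEND': 'myapp.cache.UnculledFileBasedCache',
        'LOCATION': '/dev/shm/django_cache',
    }
}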

On Linux, a file occupies at least one filesystem block (typically 4 KB), so compressing values that are already well under the block size saves no disk space. Since my cached values are generally smaller than 1 KB, the zlib.compress and zlib.decompress calls can be removed to avoid the CPU overhead. It is also a good idea to change cache_suffix so that the modified backend never tries to read files written in the old compressed format.
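
Following the same subclassing idea, a sketch of an uncompressed variant. It mirrors the structure of Django 3.x's filebased.py, where writes go through a _write_content() helper; the class name and the '.djcache2' suffix are mine, and touch() would also need its read path adjusted, so verify against your version's source:

import pickle
from django.core.cache.backends.filebased import FileBasedCache

class UncompressedFileBasedCache(FileBasedCache):
    # New suffix so files written in the old compressed format
    # are never picked up by this backend.
    cache_suffix = '.djcache2'

    def get(self, key, default=None, version=None):
        fname = self._key_to_file(key, version)
        try:
            with open(fname, 'rb') as f:
                if not self._is_expired(f):
                    return pickle.loads(f.read())  # no zlib.decompress
        except FileNotFoundError:
            pass
        return default

    def _write_content(self, file, timeout, value):
        expiry = self.get_backend_timeout(timeout)
        file.write(pickle.dumps(expiry, self.pickle_protocol))
        file.write(pickle.dumps(value, self.pickle_protocol))  # no zlib.compress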

Remove expired cache files

The following method was implemented to actively remove expired cache files. It works because Django's _is_expired() deletes the underlying file as a side effect when the entry has expired.

def remove_expired(self):
    for fname in self._list_cache_files():
        try:
            with open(fname, 'rb') as f:
                # _is_expired() deletes the file if it has expired.
                self._is_expired(f)
        except FileNotFoundError:
            pass  # already removed by another process
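
Nothing calls this method automatically, so it has to be triggered periodically, for example from a cron job or a custom management command:

from django.core.cache import caches

caches['default'].remove_expired()  # assumes the customized backend is 'default'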