UNLINK vs DEL - A deep dive into how it works internally in Redis 🪢
A couple of days back, I found myself debating the differences between Redis' UNLINK and DEL commands with my friend Sarthak in a social media comment section. An interesting take I came across is that the majority of people seem to believe "DEL is a blocking command, while UNLINK is non-blocking - so UNLINK is better". I don't fully agree with this characterisation.
It's somewhat true, but it's not the full story. As an engineer, it's my moral duty to unnecessarily dive into the rabbit hole and dig into the Redis codebase to see the actual implementation. Let's see what's actually happening under the hood.
TL;DR
DEL vs UNLINK: the only difference is the way they free the value (freeing the key itself is straightforward). Respectfully, it would be completely wrong to just say one is blocking and the other is not.
UNLINK is a smart command: it's not always non-blocking/async. It calculates the deallocation cost of an object, and if that cost is small (an effort of 64 or less), it just does what DEL would do and frees the object immediately. Otherwise, the object is sent to a background queue for processing.
What are UNLINK and DEL commands?
For my internet friends who aren't familiar with Redis: it's a super popular in-memory key-value database, often deployed as a distributed cluster. As the names suggest, both the UNLINK and DEL commands do the same thing - they remove keys from Redis.
A quick Google search tells you why UNLINK was introduced in Redis 4.0 when we already had DEL. UNLINK is very similar to DEL, but it performs the memory reclamation in a different thread, so it doesn't block other operations the way DEL can. In simple terms, UNLINK just unlinks the key from the keyspace, and the actual memory reclamation happens later, asynchronously - so it's faster.
Fun fact - in Redis 6.0, a new configuration directive, lazyfree-lazy-user-del, was introduced. If it is set to yes, your DEL command runs like UNLINK.
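If you want to try it, the directive goes straight into redis.conf (the directive name is real; the value shown is just the non-default choice):

```conf
# redis.conf: make plain DEL take the lazy (UNLINK-style) free path
lazyfree-lazy-user-del yes
```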
But how is it implemented internally?
DEL command - source code analysis
It doesn't take a genius to find the very famous db.c in the Redis codebase and head over to the delCommand function, thanks to GitHub symbol search.
```c
void delCommand(client *c) {
    delGenericCommand(c,server.lazyfree_lazy_user_del);
}
```
Not surprised - it just proxies the delGenericCommand function, passing the lazy flag from the lazyfree_lazy_user_del server config. I'm sure for unlinkCommand, they're gonna just set it to always true. Neat.
```c
/* This command implements DEL and UNLINK. */
void delGenericCommand(client *c, int lazy) {
    int numdel = 0, j;

    for (j = 1; j < c->argc; j++) {
        /* clean up already-expired data */
        if (expireIfNeeded(c->db,c->argv[j],0) == KEY_DELETED)
            continue;
        /* ternary operator to call a method based on the lazy flag */
        int deleted = lazy ? dbAsyncDelete(c->db,c->argv[j]) :
                             dbSyncDelete(c->db,c->argv[j]);
        if (deleted) {
            signalModifiedKey(c,c->db,c->argv[j]);
            notifyKeyspaceEvent(NOTIFY_GENERIC,
                "del",c->argv[j],c->db->id);
            server.dirty++;
            numdel++;
        }
    }
    addReplyLongLong(c,numdel);
}
```
There is too much to go through, but this beautiful ternary operator got my attention: the lazy flag decides which method to call. Now we will go deeper into dbSyncDelete and come back to dbAsyncDelete in the UNLINK code analysis.
Hold on - Redis is using j as a variable name? Where are all those clean code evangelists lecturing me about meaningful variable names? Never mind.
Let's get into dbSyncDelete. Let me guess - it shouldn't be hard to delete a key from a hash table. Ah, not really. My immature, high-level-language-loving mind took garbage collectors for granted. In a language like C, it's very interesting and important to pay attention to how memory gets released.
```c
/* Delete a key, value, and associated expiration entry if any, from the DB */
int dbSyncDelete(redisDb *db, robj *key) {
    return dbGenericDelete(db, key, 0, DB_FLAG_KEY_DELETED);
}
```
Ah, it's a proxy to dbGenericDelete with the async argument set to 0. Cool. Let's go to dbGenericDelete.
```c
/* Helper for sync and async delete. */
int dbGenericDelete(redisDb *db, robj *key, int async, int flags) {
    dictEntry **plink;
    int table;
    int slot = getKeySlot(key->ptr);
    dictEntry *de = kvstoreDictTwoPhaseUnlinkFind(db->keys, slot, key->ptr,
                                                  &plink, &table);
    if (de) {
        robj *val = dictGetVal(de);

        /* remove key from histogram */
        updateKeysizesHist(db, slot, val->type, getObjectLength(val), 0);

        /* If hash object with expiry on fields, remove it from HFE DS of DB */
        if (val->type == OBJ_HASH)
            hashTypeRemoveFromExpires(&db->hexpires, val);

        /* RM_StringDMA may call dbUnshareStringValue which may free val, so we
         * need to incr to retain val */
        incrRefCount(val);
        /* Tells the module that the key has been unlinked from the database. */
        moduleNotifyKeyUnlink(key,val,db->id,flags);
        /* We want to try to unblock any module clients or clients using a
         * blocking XREADGROUP */
        signalDeletedKeyAsReady(db,key,val->type);
        /* We should call decr before freeObjAsync. If not, the refcount may be
         * greater than 1, so freeObjAsync doesn't work */
        decrRefCount(val);
        if (async) {
            /* Because of dbUnshareStringValue, the val in de may change. */
            freeObjAsync(key, dictGetVal(de), db->id);
            kvstoreDictSetVal(db->keys, slot, de, NULL);
        }
        /* Deleting an entry from the expires dict will not free the sds of
         * the key, because it is shared with the main dictionary. */
        kvstoreDictDelete(db->expires, slot, key->ptr);

        kvstoreDictTwoPhaseUnlinkFree(db->keys, slot, de, plink, table);
        return 1;
    } else {
        return 0;
    }
}
```
Ahh, I found it overwhelming too! But I did find a couple of things really interesting here.
1. Key slot calculation
The line int slot = getKeySlot(key->ptr); calculates the slot for the given key. Redis uses a hash slot mechanism to distribute keys across nodes in a cluster. Shamelessly plugging my tweet that explains the mechanism in a little more detail.
2. Two-phase unlinking
Redis moved away from the traditional way of removing a key long ago. The kvstoreDictTwoPhaseUnlinkFind function finds the key in the main dictionary using the slot number and the key. This is the first phase: it does not delete the entry; instead it sets up plink and table. In the second phase, the entry is safely removed - kvstoreDictTwoPhaseUnlinkFree is what I'm referring to here. This is what actually releases the entry's memory.
3. Asynchronous deletion
The if (async) code block does a bunch of really nice things, like scheduling the value's memory to be freed asynchronously, and setting the entry's value in the main dict to NULL so the pending background free doesn't leave a dangling pointer behind. We will dive into it in the UNLINK section.
4. Delete expiration data
An interesting thing to note is that Redis maintains two dictionaries - the main key dictionary and the expiration dictionary. So as a key is deleted, it needs to be removed from the expiration dict as well, manually. This is what the kvstoreDictDelete call does.
Well - so what have we learnt here? The normal DEL command implementation is quite straightforward: synchronously remove the key and release the memory using the two-phase unlink mechanism. This is done in both the main key dictionary and the expiration dictionary. And if the value holds inner references, recursive deletion comes to the rescue.
UNLINK command - source code analysis
As we have already covered the core logic, and the code is generic with the async flag being passed down, let's dive straight into how the async delete actually happens.
```c
if (async) {
    /* Because of dbUnshareStringValue, the val in de may change. */
    freeObjAsync(key, dictGetVal(de), db->id);
    kvstoreDictSetVal(db->keys, slot, de, NULL);
}
```
Wohoo, jump to freeObjAsync.
```c
void freeObjAsync(robj *key, robj *obj, int dbid) {
    size_t free_effort = lazyfreeGetFreeEffort(key,obj,dbid);

    /* Note that if the object is shared, to reclaim it now it is not
     * possible. This rarely happens, however sometimes the implementation
     * of parts of the Redis core may call incrRefCount() to protect
     * objects, and then call dbDelete().
     * LAZYFREE_THRESHOLD = 64 */
    if (free_effort > LAZYFREE_THRESHOLD && obj->refcount == 1) {
        atomicIncr(lazyfree_objects,1);
        bioCreateLazyFreeJob(lazyfreeFreeObject,1,obj);
    } else {
        decrRefCount(obj);
    }
}
```
Damn, that's smart - it calculates the cost of deleting the object. free_effort is roughly proportional to the number of allocations that must be released, so it approximates how much CPU time the free will take. Based on that, Redis decides to either free the memory immediately or defer it to a background thread.
LAZYFREE_THRESHOLD is set to 64. And the check obj->refcount == 1 ensures that the object is no longer referenced anywhere else; a shared object cannot be reclaimed yet, so Redis just drops its own reference instead.
atomicIncr is just an atomic increment of the global counter lazyfree_objects, which tracks the number of objects currently queued to be freed lazily. It helps Redis keep tabs on the pending background work.
bioCreateLazyFreeJob is responsible for creating a job in the background I/O (BIO) system to lazily free the object. lazyfreeFreeObject is a callback function defined on line 13 of the same file, lazyfree.c.
How is the effort for deleting a key calculated?
Redis' codebase says the return value is not always the actual number of allocations the object is composed of, but a number proportional to it.
- for strings, it is always 1, as a string does not require multiple allocations to free - so constant effort
- for list objects, it is the number of elements in the quicklist
- for set objects, it is the number of elements in the hash table, as a set is backed by a hash table
- for sorted set objects, it is the length of the skiplist
- for hash objects, it is the number of elements in the hash table
- for stream objects, it is the number of RAX nodes + consumer groups + entries in the pending entries list (PEL)
- for module objects, it uses moduleGetFreeEffort; the calculation there seems a bit too complicated for me to unpack
- by default, it's always 1
The magic number LAZYFREE_THRESHOLD = 64
There is no clear explanation of why LAZYFREE_THRESHOLD is 64, so it seems a bit arbitrary. A couple of my internet stranger friends and ChatGPT say this number was chosen by the Redis developers after extensive benchmarking, trading off latency vs. blocking, memory management, and scheduling overhead.
Ok, I trust that.
Now that I've wrapped up my ramblings, I invite your comments. If you find any technical inaccuracies, please let me know. I'm active on X (Twitter) as @the2ndfloorguy, and if you are interested in what an unfunny & strange programmer will do next, see you there!