Extending ejabberd, the first time

Have you ever worked with XMPP or ejabberd? First things first: both are great! The XMPP protocol is robust, extensible and well documented. ejabberd, on the other hand, is a scalable and easy-to-use messaging server that implements the XMPP protocol. We extended it for our own needs.

The Issue™

We build tools for real estate agents and for customers who want to sell their properties. The first thing that comes to mind is the communication between the two. Some like to write emails, others prefer a phone call, and then there are the people who like to chat via instant messages. Think of SMS or its modern counterpart: WhatsApp.

Because we provide solutions for both parties, we have to deal with multiple kinds of devices: tablets, mobile phones, desktop systems - each of which offers different possibilities. Our issue sits in the instant messaging component and in the way a single user is spread across several devices.

Use case from the user’s perspective

Say a real estate agent has a communication channel for each property he is responsible for, and there he can get in contact with HAUSGOLD for support. Commonly an agent works on multiple properties concurrently. So he asks for help with one of them and receives an answer a bit later that day. In the meantime he does some other tasks and time passes. Then he gets notified about the answer, opens the communication channel on his desktop system and starts to chat with one of our service agents. Soon his questions are answered and he is on his way to a property for a visit. Now he uses his mobile phone with our native app to check for updates on another property. Is there a new message? - He just opens the communication dashboard and sees a badge with a counter for the messages he missed on each channel. Straightforward, eh?

That’s the point where ejabberd falls short: it has no support for marking messages as already read or for counting the unread messages of a specific room. The feature is commonly known from WhatsApp - inside the conversation the already read messages are marked, and outside of the conversation there is a badge with a counter for missed messages. This information is synced between the native app and the web browser version. We want to offer the exact same functionality to our users.

Unread message counter

As you can see in the figure, there is a card with a message bubble in the upper right corner, which indicates that there are 5 new messages the user has not read yet.

Let’s have a look at the technical foundation

XMPP to the rescue: some solutions are already specified. The best matching specification out there is XEP-0333 Chat Markers, but at the time of writing there was no implementation available for ejabberd. That led us to look into XEP-0012 Last Activity, which is available for ejabberd but does not meet the minimum requirements for our feature.

We discussed multiple approaches. One was based on the idea of sending custom message stanzas inside the multi-user conversation to acknowledge the last message a user has read. The upside of this solution is the easy implementation on the client side, without modifying the server side at all. But the drawbacks outweighed the benefits, because this would mess up the history quite badly and the database would grow rapidly. Most of the persisted stanzas would be acknowledgment messages instead of real text messages. Furthermore, there is no lightweight way to retrieve the last acknowledged message without fetching the last N messages from the history, which makes it quite bad for mobile internet connections. The unread message count suffers from the same problem: a painful client-side aggregation over the history.
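To make the cost of that rejected approach concrete, here is a minimal sketch of the client-side aggregation it would require. The stanza layout (dictionaries with `kind` and `sender` keys) and the function name `unread_count` are assumptions made for illustration, not part of any XMPP specification or of our code.

```python
from typing import Dict, List


def unread_count(history: List[Dict[str, str]], user: str) -> int:
    """Count text messages from others that arrived after the user's latest ack."""
    unread = 0
    # Walk the fetched history from newest to oldest until the user's
    # most recent acknowledgment stanza shows up.
    for stanza in reversed(history):
        if stanza["kind"] == "ack" and stanza["sender"] == user:
            break
        if stanza["kind"] == "text" and stanza["sender"] != user:
            unread += 1
    return unread


# Hypothetical room history: acknowledgments are persisted next to real messages.
history = [
    {"kind": "text", "sender": "agent", "body": "Any news on the listing?"},
    {"kind": "ack", "sender": "support"},
    {"kind": "text", "sender": "support", "body": "Yes, the expose is ready."},
    {"kind": "ack", "sender": "agent"},
    {"kind": "text", "sender": "support", "body": "I also booked a viewing."},
    {"kind": "text", "sender": "support", "body": "Friday at 10:00."},
]

print(unread_count(history, "agent"))  # 2
```

Note that the client has to download the history before it can count anything, and that two of the six persisted stanzas in this tiny example are already acknowledgments.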

The Solution™

We implemented a lightweight mix of XEP-0333 Chat Markers and XEP-0012 Last Activity ourselves as a custom ejabberd module written in Erlang. It covers the currently required feature set and leaves room for future enhancements.

And that’s how it works under the hood:

mod_read_markers architecture

The custom message stanza approach was dropped in favor of a dedicated IQ stanza workflow for acknowledging and retrieving the last read message, which keeps the database pollution to a minimum. The database schema allows the module to persist and query the last read message for each user in each multi-user conversation very efficiently in terms of storage size and query performance. Each user in each conversation gets one row, which is updated when the last read message changes or the unseen message counter increases. Let’s do the math to get a feeling for the data size.
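The one-row-per-user-and-conversation idea can be sketched with a small in-memory model. This is not the module's Erlang code or its real schema: the class and field names are hypothetical, and resetting the counter to zero on every acknowledgment is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Marker:
    """One row: the state of a single user in a single conversation."""
    last_read_id: Optional[str] = None
    unseen: int = 0


class ReadMarkerStore:
    """In-memory stand-in for the table, keyed by (user, room)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], Marker] = {}

    def on_message(self, room: str, sender: str, members: Iterable[str]) -> None:
        # A new message bumps the unseen counter of everyone except the sender.
        for user in members:
            if user != sender:
                self._rows.setdefault((user, room), Marker()).unseen += 1

    def acknowledge(self, user: str, room: str, message_id: str) -> None:
        # Assumption: acknowledging a message marks everything as seen.
        row = self._rows.setdefault((user, room), Marker())
        row.last_read_id = message_id
        row.unseen = 0

    def query(self, user: str, room: str) -> Marker:
        # Reading never creates a row; unknown pairs yield an empty marker.
        return self._rows.get((user, room), Marker())

    def row_count(self) -> int:
        return len(self._rows)


members = ["agent", "owner", "support"]
store = ReadMarkerStore()
store.on_message("room-1", "support", members)
store.on_message("room-1", "support", members)
print(store.query("agent", "room-1").unseen)  # 2
store.acknowledge("agent", "room-1", "msg-2")
print(store.query("agent", "room-1"))
```

Rows are updated in place, so the table size depends on the number of users and conversations and not on the number of messages exchanged.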

There are some assumptions we postulate first: There is one multi-user conversation (R) per property. A property is normally managed by one to three real estate agents (A) of one office. There is one property owner (O) and one HAUSGOLD service agent (SA) involved as well. So the formula is $R \times (A + O + SA)$. With some real values the formula looks like this:
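As a rough illustration, the formula can be evaluated for a few inputs. The room count below is a made-up placeholder, not a real HAUSGOLD figure; only the ranges for A, O and SA come from the assumptions stated above, and the function name `marker_rows` is hypothetical.

```python
def marker_rows(rooms: int, agents: int, owners: int = 1, service_agents: int = 1) -> int:
    """Number of rows: R * (A + O + SA)."""
    return rooms * (agents + owners + service_agents)


rooms = 10_000  # hypothetical number of properties, i.e. conversations
for agents in (1, 3):  # a property is managed by one to three agents
    print(f"A={agents}: {marker_rows(rooms, agents)} rows")
# A=1: 30000 rows
# A=3: 50000 rows
```

The row count grows linearly with the number of conversations and never with the number of messages.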

This looks promising and should scale well in the future. But the storage efficiency comes at the cost of having no historic data. We persist and update only the last read message without tracking the milestones. This is perfectly fine for our requirements, because we do not want to display which message in the conversation was read by whom and at what time.

We open sourced the mod_read_markers module together with a concept paper. If you are interested in the details of the project, have a look at the docs and the codebase itself. The code is heavily commented to make it easier to understand and to customize further if you like.

Behind the scenes

The module comes with many goodies which eased the development. The first one is a Docker packaged ejabberd server which takes next to no time to set up and comes up running with a database (PostgreSQL). The project also ships a GNU Make manifest to help people get the thing running through a standardized user interface ($ make install and/or $ make start). This is our internal practice to streamline all projects for every developer, regardless of experience level or discipline. And it’s so much fun when everything in your ecosystem works this way, because you can play with new things and quickly become productive.

Another major component of the module is the end-to-end test suite, which helped to check that the module was doing the right things while it was being developed. The suite reloads the running ejabberd server with the current build of the module, seeds some database objects (users, multi-user conversations, messages, etc.), sends the IQs and verifies their results. It runs on Travis CI, too.

For me it was the first time that I wrote something in Erlang, but after around three days I was comfortable with the syntax and the major concepts. Erlang is really well documented and the compiler prints quite helpful messages, which eased the whole development a lot. Unfortunately the ejabberd documentation was not as helpful. Almost everything I read was outdated, incomplete or hard to find. It took me hours to find out how to write a new XMPP specification for the custom elements (XMPP codec) and how to integrate it correctly. There is no up-to-date list of all usable hooks and how to use them correctly, which was quite annoying.

But the ejabberd community is really active and helpful (thanks processone!), and the code of the ejabberd core modules serves as a kind of living documentation.

From my perspective it was a lot of fun to get the thing implemented, and the result works as expected, so the project was a success. If you’re brave enough to hand roll your own custom ejabberd module, take a look at the ejabberd-contrib repository and/or use the mod_read_markers repository as a template to get up and running quickly.