The Story of the IT-depression, birds and EJDB 2.0
Apart from development, I also teach at the Department of Information Systems at a university. Once, during an exam, I told one of my students that by the end of the course students should have at least a basic idea of how a stop-and-copy garbage collector works. The student argued that intellect and analytical skills can always offset not knowing some facts, and that the topic itself looks minuscule compared to the fundamental laws of nature. The student pointed out:
I want to tell you a story about one small but useful software project with some thought-provoking implications.
It is 2011: popularization of NoSQL
The question about birds was asked back in 2011, and I really started to think about it. My lazy stream of thoughts was interrupted by a mundane developer’s task: to build a convenient storage and search system for audio file metadata tags, for a small media player written in C++.
Taking on the task, like any engineer, I started looking for available solutions and, surprisingly, found quite poor results: the implementations were either too bulky (SQLite) or had stability issues (GigaBASE). “Bulky SQLite?” you might ask. Strangely enough, yes. My inner perfectionist was rebelling: my whole project was one tenth the size of the single sqlite.c file. At that time I didn’t compare project size to the system kernel. 2011 was the golden age of NoSQL solutions of every kind. I took a closer look at MongoDB as a possible solution, but ran into two essential limitations:
- It was difficult to use MongoDB as an add-on DLL because its developers had not planned for such a use case, and MongoDB was not technically ready for it.
- MongoDB was under the AGPL license, which is, in my opinion, detrimental. AGPL doesn’t bring anything to the world of open software and is usually just a convenient cover for commercially licensed products with shareware code.
Looking for a better lightweight data storage system for small projects, I defined the key qualities of the product:
- Document-oriented, with the ability to store a hierarchy of documents with arbitrary structure.
- Support for collections of JSON documents.
- Implemented in C99.
- A free license that allows use in closed-source products.
Same trap twice
As you can guess, we created our own solution, inspired by MongoDB: a database management system implemented as a shared library (EJDB 1.0).
EJDB 1.0 is based on TokyoCabinet, a fascinating project that is, alas, now abandoned. EJDB became quite popular on GitHub among developers from all over the world. At that time one of the 10gen managers contacted me to discuss query compatibility between EJDB and MongoDB, but it didn’t go any further than a discussion.
However, we took too much from MongoDB, and that became a problem. One example is the bulky format of the search queries:
Eventually we realized that such a query representation suits robots, not people.
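To give a feel for the style in question, here is a minimal sketch of a MongoDB-style filter written as a Python dictionary, together with a tiny matcher that evaluates it. The operators `$and`, `$in`, `$gte` and `$lt` are standard MongoDB query operators; the field names, the sample query and the `matches` helper are hypothetical and are not taken from EJDB 1.0.

```python
# Hypothetical query: tracks by one of two artists, recorded in 1955..1964.
QUERY = {
    "$and": [
        {"artist": {"$in": ["Miles Davis", "John Coltrane"]}},
        {"year": {"$gte": 1955, "$lt": 1965}},
    ]
}

# Only the handful of operators used above; a real engine supports many more.
_OPS = {
    "$in": lambda value, arg: value in arg,
    "$gte": lambda value, arg: value is not None and value >= arg,
    "$lt": lambda value, arg: value is not None and value < arg,
}


def matches(doc, query):
    """Return True if `doc` satisfies the MongoDB-style `query`."""
    for key, cond in query.items():
        if key == "$and":
            # Every sub-query must match.
            if not all(matches(doc, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            # Operator object: every operator must hold for the field value.
            value = doc.get(key)
            if not all(_OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif doc.get(key) != cond:
            # Plain value: simple equality.
            return False
    return True
```

Even this small condition needs three levels of nesting and four operator keywords, which is the kind of verbosity meant above.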
Another controversial feature taken from MongoDB was keeping a document’s primary key inside the document itself, as a special _id field. This mixes the original structure of the stored document with the implementation of the storage system. Undoubtedly, the better approach is to separate the original documents from the data structures used to manage collections.
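The separation described here can be sketched in a few lines. This is only an illustration of the idea, not EJDB's implementation: the `Collection` class and its methods are hypothetical, and the key is a simple in-memory counter.

```python
import itertools


class Collection:
    """Toy collection that keeps primary keys outside the documents."""

    def __init__(self):
        self._docs = {}                      # key -> document
        self._next_id = itertools.count(1)   # assumed key scheme: 1, 2, 3, ...

    def add(self, doc):
        """Store a copy of `doc` and return the key assigned to it."""
        doc_id = next(self._next_id)
        self._docs[doc_id] = dict(doc)
        return doc_id

    def get(self, doc_id):
        """Return a copy of the stored document, exactly as it was given."""
        return dict(self._docs[doc_id])
```

The stored document comes back unchanged: the key lives in the collection's own bookkeeping, and no `_id` field is injected into the user's data.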
Despite the inner beauty of TokyoCabinet, it imposed its LGPL 2.1 license on EJDB, which is freer than AGPL but still rules out the use of EJDB in many projects.
Somewhat disappointed, I released a number of versions of EJDB and then left it inactive for a few years. From time to time I recalled the project whenever I thought about that very question of birds.
2018: inspiration gained
It took a few years to finally convince myself that birds are not smarter than humans. They use the laws of nature instinctively, unable to comprehend them or to apply them deliberately to their needs. But aren’t many developers doing the same? Instead of comprehending “the laws of nature”, don’t they thoughtlessly migrate between technologies and frameworks, like IT birds migrating with the seasons? A true story (from job interviews):
“You said you used TypeScript a lot. Does that mean you’re also familiar with JS, and that you understand how TypeScript improves on it and what the difference is?”
“Well, I haven’t studied JS, so I can’t say.”
“Why was your tenure so short at every company you worked for?”
“The projects were generally boring. Wild outsourcing, you see? I want to develop real products!”
After interviewing yet another candidate, Andrew and I went for a drink, discussed the candidates and talked about the unnecessary complexity of modern software. Many specialists who came to the interview were surprised that we do not use Spring Boot in our projects, “because everybody does”. But almost no one is surprised by Slack-like desktop apps based on Electron, which are effectively full-featured web browsers in disguise, eating up hundreds of megabytes of virtual memory to serve a simple chat.
During the discussion we started to understand what irritates us so much about modern IT: the inappropriate use of monstrous, hyped solutions.
“Stop complicating!” we decided, and with that as our inspiration we took on the second version of EJDB.
EJDB 2.0: philosophy of openness and simplicity
Open source cannot be half-open, so we decided to use the MIT license for all the project components, completely getting rid of the LGPL code from TokyoCabinet.
Specifically for EJDB2 we developed a key-value storage engine, iowow (iowow.io), also under the MIT license.
Iowow can keep many key-value sub-databases in a single file, which simplifies transferring data between devices and making backups, and also reduces the chance of inconsistencies in the stored data.
Iowow is based on a simple data structure, the skip list, which allowed us to create a much more understandable implementation of persistent storage, with a much smaller code base than a B+ tree or LSM tree would need and with high performance (according to benchmarks).
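For readers who have not met the structure before, here is a minimal in-memory skip list sketch. It only illustrates why the structure is easy to follow and says nothing about iowow's actual on-disk layout; the class names, the maximum level of 16 and the promotion probability of 0.5 are all assumptions made for this example.

```python
import random


class _Node:
    __slots__ = ("key", "value", "forward")

    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * level  # forward[i] = next node on level i


class SkipList:
    MAX_LEVEL = 16  # assumed cap on tower height
    P = 0.5         # assumed chance of promoting a node one level up

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, None, self.MAX_LEVEL)  # sentinel, holds no data
        self._level = 1

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self.P:
            level += 1
        return level

    def put(self, key, value):
        # update[i] is the last node on level i that precedes `key`.
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        candidate = node.forward[0]
        if candidate is not None and candidate.key == key:
            candidate.value = value  # overwrite an existing key
            return
        level = self._random_level()
        self._level = max(self._level, level)
        new = _Node(key, value, level)
        for i in range(level):  # splice the new node into each of its levels
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def get(self, key, default=None):
        node = self._head
        for i in range(self._level - 1, -1, -1):  # descend from the top level
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        if node is not None and node.key == key:
            return node.value
        return default

    def items(self):
        node = self._head.forward[0]  # level 0 links every node in key order
        while node is not None:
            yield node.key, node.value
            node = node.forward[0]
```

Insertion and lookup share the same descent loop and there is no rebalancing step, which is a large part of why a skip list implementation tends to stay small.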
The maximum size of a database file is 512 GB, which is a consequence of compromises in implementing a single-file database on skip lists.
Documents stored in EJDB2 collections are serialized in a simple binary format, Binn.
While working on EJDB2 we reconsidered the search query format and introduced new, XPath-like search queries that are more intuitive than those of the 1.x version: the JQL language.
JQL supports the JSON Patch and JSON Merge Patch standards. Here are some examples:
Select all documents and then keep (project) only firstName and lastName fields in the resulting documents:
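In plain Python rather than JQL, the semantics of a projection and of a JSON Merge Patch (RFC 7396) can be sketched as follows. This shows what the two operations do to a document, not how EJDB2 implements them; the helper names `project` and `merge_patch` and the sample data are hypothetical.

```python
def project(doc, fields):
    """Keep only the listed top-level fields of a document."""
    return {name: doc[name] for name in fields if name in doc}


def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7396) and return the patched value.

    In the patch, None stands for JSON null and means "remove this member".
    """
    if not isinstance(patch, dict):
        return patch                 # a non-object patch replaces the target
    if not isinstance(target, dict):
        target = {}                  # patching a non-object starts from {}
    result = dict(target)
    for name, value in patch.items():
        if value is None:
            result.pop(name, None)   # null removes the member if present
        else:
            result[name] = merge_patch(result.get(name), value)
    return result


docs = [
    {"firstName": "John", "lastName": "Doe", "age": 28},
    {"firstName": "Jane", "lastName": "Roe", "email": "jane@example.com"},
]
names_only = [project(d, ["firstName", "lastName"]) for d in docs]
patched = merge_patch(docs[0], {"age": None, "address": {"city": "Oslo"}})
```

Here `names_only` contains just the two name fields of each document, and `patched` is the first document with `age` removed and a nested `address` object added.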
It is always better to have a choice than not. The easiest and most efficient way of accessing the data is to do it locally, but with EJDB2 you can also expose your database over HTTP/WebSocket and access it remotely using a simple text protocol. This substantially broadens the range of architectures you can build with EJDB2. To enable remote access, the built-in HTTP server can be configured and started through the C API, or a separate executable can be launched. The network API lets you both query and manage your data.
Aside from the hype-rays
Having finished with the birds, I became haunted by thoughts about a spanner. This invention was never intended to tighten every possible kind of nut, but the number of nuts that exist just for this spanner is enough to give it its purpose in life. On the other hand, it is weird to tighten all nuts with one spanner. It’s quite common to promise developers a ride on a spaceship when in reality they dig a tunnel with a pickaxe. We give them a pickaxe right away, because it is the only way to stop being a bird and to understand which spanner suits which nut.