Metastocle - a decentralized data storage

Alexander Balasyan
·Jun 13, 2020

What if you need to store a variety of data in a decentralized way? Objects, arrays, dates, numbers, strings, anything at all. Do you really need to develop a powerful DBMS for this? Often we just need to store and retrieve data in a distributed, open way, as simply as possible and without any special requirements.

In this article, I would like to introduce the metastocle library, which solves the above problem easily, albeit with some limitations.

A bit of background

About a year ago, I wanted and needed to create a music storage. See this article for more details. From the very beginning, it was clear that everything had to be written so that the same could later be done with other entities: books, videos, etc. So it was decided to divide everything into layers that can be used independently.

Metastocle is one of the layers that allows you to store and retrieve many types of data (but not files), as opposed to the storacle layer, which implements working with files.

When we save files, we need to write the hashes somewhere so that we can access them later. This is exactly why we need metastocle. It's where we keep everything we need: the names of the songs, links to files etc.

As a result, all this was brought to a certain universal form, and the system consists of three main entities:

  • Collections - an entity for defining the data structure, various options, and so on.
  • Documents - data itself, as objects.
  • Actions (instructions) - a set of rules for processing the required data: filtering, sorting, limiting, and so on.

Let's look at a couple of examples:

Server:

const Node = require('metastocle').Node;

(async () => {  
  try {
    const node = new Node({
      port: 4000,
      hostname: 'localhost'
    });
    // Creating a collection
    await node.addCollection('test', { limit: 10000, pk: 'id' });
    await node.init();
  }
  catch(err) {
    console.error(err.stack);
    process.exit(1);
  }
})();

Client:

const Client = require('metastocle').Client;

(async () => {  
  try {
    const client = new Client({
      address: 'localhost:4000'
    });
    await client.init();

    // Adding a document
    const doc = await client.addDocument('test', { text: 'hi' });

    // Updating this document
    await client.updateDocuments('test', { text: 'bye' }, {
      filter: { id: doc.id }
    });

    // Adding another document
    await client.addDocument('test', { id: 2, text: 'new' });

    // Getting the second document
    const results = await client.getDocuments('test', {
      filter: { id: 2 }
    });

    // Getting it differently
    const doc2 = await client.getDocumentByPk('test', 2);

    // Adding more documents
    for(let i = 10; i <= 20; i++) {
      await client.addDocument('test', { id: i, x: i });
    }

    // Getting the documents that meet all the conditions
    const results2 = await client.getDocuments('test', {
      filter: { id: { $gt: 15 } },
      sort: [['x', 'desc']],
      limit: 2,
      offset: 1,
      fields: ['id']
    });

    // Deleting documents with id > 15
    await client.deleteDocuments('test', {
      filter: { id: { $gt: 15 } }
    });
  }
  catch(err) {
    console.error(err.stack);
    process.exit(1);
  }
})();

Clients cannot create collections. The network defines the structure itself, and users only work with documents. Collections can also be described declaratively, via node options:

const node = new Node({
  port: 4000,
  hostname: 'localhost',
  collections: {
    test: { limit: 10000, pk: 'id' }
  }
});

Main collection parameters:

  • pk - the primary key field. You can omit this if it is not required. If it is specified, a UUID is generated for each document by default, but you can also pass any integer or string yourself.
  • limit - maximum number of documents per node
  • queue - queue mode: if enabled, when the limit is reached, certain documents are deleted to record new ones
  • limitationOrder - if the limit and queue are enabled, then you can specify sorting rules to determine which documents to delete. By default, those that have not been used for a long time are deleted.
  • schema - document field structure
  • defaults - default values for document fields
  • hooks - document field hooks
  • preferredDuplicates - you can specify the preferred number of duplicate documents in the network
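
To make the interplay of limit and queue more concrete, here is a minimal in-memory sketch. It is not how metastocle is implemented: the TinyCollection class and its fields are hypothetical, and it only mimics the default behavior described above, where the documents that have not been used for the longest time are removed first.

```javascript
// Illustrative only: a tiny in-memory collection with `limit` and `queue`.
// TinyCollection is a hypothetical class, not part of metastocle.
class TinyCollection {
  constructor({ limit, queue = false }) {
    this.limit = limit;
    this.queue = queue;
    this.docs = new Map(); // pk -> { doc, accessedAt }
    this.tick = 0;         // logical clock used instead of real timestamps
  }

  add(pk, doc) {
    if (!this.docs.has(pk) && this.docs.size >= this.limit) {
      if (!this.queue) {
        throw new Error('Collection limit reached');
      }
      this.evictLeastRecentlyUsed();
    }
    this.docs.set(pk, { doc, accessedAt: ++this.tick });
  }

  get(pk) {
    const entry = this.docs.get(pk);
    if (!entry) return null;
    entry.accessedAt = ++this.tick; // reading a document counts as using it
    return entry.doc;
  }

  evictLeastRecentlyUsed() {
    let oldestPk = null;
    let oldestTick = Infinity;
    for (const [pk, entry] of this.docs) {
      if (entry.accessedAt < oldestTick) {
        oldestTick = entry.accessedAt;
        oldestPk = pk;
      }
    }
    this.docs.delete(oldestPk);
  }
}

const collection = new TinyCollection({ limit: 2, queue: true });
collection.add(1, { text: 'a' });
collection.add(2, { text: 'b' });
collection.get(1);                // document 1 becomes the most recently used
collection.add(3, { text: 'c' }); // limit reached, so document 2 is evicted
const remainingKeys = [...collection.docs.keys()];
console.log(remainingKeys); // [ 1, 3 ]
```

Without queue mode, the same sketch simply refuses new documents once the limit is reached.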

The structure of the collection fields (schema) can be described as:

{ 
  type: 'object',
  props: {
    count: 'number',
    title: 'string',
    description: { type: 'string' },
    priority: {
      type: 'number',
      value: val => val >= -1 && val <= 1
    },
    goods: {
      type: 'array',
      items: {
        type: 'object',
        props: {
          title: 'string',
          isAble: 'boolean'
        }
      }
    }
  }
}

All rules can be found in the function utils.validateSchema() in https://github.com/ortexx/spreadable/blob/master/src/utils.js
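
To give a feel for what such a schema expresses, here is a heavily simplified checker. It is a sketch and not the spreadable implementation: matchesSchema is a hypothetical helper that covers only the features shown above (type, props, items, value), and it assumes that every declared property must be present.

```javascript
// Simplified sketch, not the code behind utils.validateSchema().
// `matchesSchema` is a hypothetical helper name.
function matchesSchema(schema, value) {
  // A bare string such as 'number' is treated as shorthand for { type: 'number' }
  if (typeof schema === 'string') {
    schema = { type: schema };
  }

  if (schema.type === 'array') {
    if (!Array.isArray(value)) return false;
    if (schema.items && !value.every(item => matchesSchema(schema.items, item))) {
      return false;
    }
  }
  else if (schema.type === 'object') {
    if (value === null || typeof value !== 'object' || Array.isArray(value)) {
      return false;
    }
    const props = schema.props || {};
    for (const key of Object.keys(props)) {
      // Assumption: every declared property is required
      if (!matchesSchema(props[key], value[key])) return false;
    }
  }
  else if (typeof value !== schema.type) {
    return false;
  }

  // Optional custom check, like the one on `priority` above
  if (typeof schema.value === 'function' && !schema.value(value)) return false;
  return true;
}

const exampleSchema = {
  type: 'object',
  props: {
    title: 'string',
    priority: { type: 'number', value: val => val >= -1 && val <= 1 },
    goods: {
      type: 'array',
      items: { type: 'object', props: { title: 'string', isAble: 'boolean' } }
    }
  }
};

const validDoc = { title: 'x', priority: 0, goods: [{ title: 'g', isAble: true }] };
const invalidDoc = { title: 'x', priority: 5, goods: [] };
console.log(matchesSchema(exampleSchema, validDoc));   // true
console.log(matchesSchema(exampleSchema, invalidDoc)); // false
```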

Default values and hooks can look like this:

{ 
  defaults: {
    date: Date.now,
    priority: 0,
    'nested.prop': (key, doc) => Date.now() - doc.date
  },
  hooks: {
    priority: (val, key, doc, prevDoc) => prevDoc ? prevDoc.priority + 1 : val
  }
}
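
Roughly, you can think of defaults and hooks as being applied to a document like in the sketch below. This is only my illustration of the idea, not the library code: prepareDocument, setPath and getPath are hypothetical helpers, the field nested.createdBy is made up for the example, and the order (defaults first, then hooks) is an assumption. The callback signatures are the ones shown above.

```javascript
// Sketch only: these helpers are hypothetical, not metastocle functions.
function getPath(obj, path) {
  return path.split('.').reduce((acc, key) => (acc == null ? undefined : acc[key]), obj);
}

function setPath(obj, path, value) {
  const keys = path.split('.');
  let target = obj;
  for (const key of keys.slice(0, -1)) {
    if (target[key] === undefined) target[key] = {};
    target = target[key];
  }
  target[keys[keys.length - 1]] = value;
}

function prepareDocument(doc, { defaults = {}, hooks = {} }, prevDoc = null) {
  // Documents are serializable, so a JSON round trip is enough for a deep copy here
  const result = JSON.parse(JSON.stringify(doc));

  // Fill in missing fields; a default can be a plain value or a function (key, doc)
  for (const [key, def] of Object.entries(defaults)) {
    if (getPath(result, key) === undefined) {
      setPath(result, key, typeof def === 'function' ? def(key, result) : def);
    }
  }

  // Let every hook rewrite its field; prevDoc is null when the document is new
  for (const [key, hook] of Object.entries(hooks)) {
    setPath(result, key, hook(getPath(result, key), key, result, prevDoc));
  }
  return result;
}

const collectionOptions = {
  defaults: {
    priority: 0,
    'nested.createdBy': () => 'system'
  },
  hooks: {
    priority: (val, key, doc, prevDoc) => prevDoc ? prevDoc.priority + 1 : val
  }
};

const createdDoc = prepareDocument({ text: 'hi' }, collectionOptions);
const updatedDoc = prepareDocument({ text: 'bye' }, collectionOptions, createdDoc);
console.log(createdDoc.priority, updatedDoc.priority); // 0 1
```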

Main features of the library:

  • Works on the CRUD principle.
  • Stores all serializable JavaScript data types, including nested ones.
  • Data can be added to the storage through any node.
  • Data can be duplicated for greater reliability.
  • Queries may contain nested filters.

Isomorphism

The client is written in JavaScript and is isomorphic, so it can be used directly in the browser. You can load the file https://github.com/ortexx/metastocle/blob/master/dist/metastocle.client.js as a script and get access to window.ClientMetastocle, or import it via your build system.

Client API

  • async Client.prototype.addDocument() - adding a document to the collection
  • async Client.prototype.getDocuments() - getting documents from the collection according to some instructions
  • async Client.prototype.getDocumentsCount() - getting the number of documents in the collection
  • async Client.prototype.getDocumentByPk() - getting a document from a collection using the primary key
  • async Client.prototype.updateDocuments() - updating documents in the collection according to some instructions
  • async Client.prototype.deleteDocuments() - deleting documents from the collection according to some instructions

Basic actions (instructions)

.filter - data filtering, example:

{ 
  a: { $lt: 1 },
  $and: [
    { x: 1 },
    { y: { $gt: 2 } },
    { 
      $or: [
        { z: 1 },
        { "b.c": 2 }
      ] 
    }
  ]
}
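
If the semantics of such a filter are not obvious, the sketch below evaluates it against a single document. It is an illustration and not the library's matcher: matchesFilter and getByPath are hypothetical helpers, and only plain equality, $lt, $gt, $and, $or and dotted paths are handled.

```javascript
// Sketch of evaluating a filter against one document (hypothetical helpers).
function getByPath(doc, path) {
  return path.split('.').reduce((acc, key) => (acc == null ? undefined : acc[key]), doc);
}

function matchesFilter(doc, filter) {
  // Every key of the filter object must match
  return Object.entries(filter).every(([key, cond]) => {
    if (key === '$and') return cond.every(sub => matchesFilter(doc, sub));
    if (key === '$or') return cond.some(sub => matchesFilter(doc, sub));

    const value = getByPath(doc, key);
    if (cond !== null && typeof cond === 'object') {
      return Object.entries(cond).every(([op, operand]) => {
        if (op === '$lt') return value < operand;
        if (op === '$gt') return value > operand;
        throw new Error(`Unsupported operator in this sketch: ${op}`);
      });
    }
    return value === cond; // a plain value means equality
  });
}

const exampleFilter = {
  a: { $lt: 1 },
  $and: [
    { x: 1 },
    { y: { $gt: 2 } },
    { $or: [{ z: 1 }, { 'b.c': 2 }] }
  ]
};

console.log(matchesFilter({ a: 0, x: 1, y: 3, b: { c: 2 } }, exampleFilter)); // true
console.log(matchesFilter({ a: 0, x: 1, y: 3, z: 2 }, exampleFilter));        // false
```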

.sort - data sorting, example:

{ sort: [['x', 'asc'], ['y.z', 'desc']] }
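
Each pair is a field (dotted paths allowed) and a direction, and later pairs only break ties left by earlier ones. Here is a small sketch of that idea on a plain array. The sortDocuments helper is hypothetical and only shows the ordering, not how the library sorts internally.

```javascript
// Sketch of multi-field sorting with dotted paths (hypothetical helpers).
function getByPath(doc, path) {
  return path.split('.').reduce((acc, key) => (acc == null ? undefined : acc[key]), doc);
}

function sortDocuments(docs, sort) {
  // Copy first so the input array is not mutated
  return [...docs].sort((a, b) => {
    for (const [field, direction] of sort) {
      const left = getByPath(a, field);
      const right = getByPath(b, field);
      if (left === right) continue; // tie: fall through to the next rule
      const order = left < right ? -1 : 1;
      return direction === 'desc' ? -order : order;
    }
    return 0;
  });
}

const unsortedDocs = [
  { id: 1, x: 2, y: { z: 1 } },
  { id: 2, x: 1, y: { z: 5 } },
  { id: 3, x: 1, y: { z: 9 } }
];

const sortedIds = sortDocuments(unsortedDocs, [['x', 'asc'], ['y.z', 'desc']]).map(doc => doc.id);
console.log(sortedIds); // [ 3, 2, 1 ]
```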

.limit - the maximum number of documents to return

.offset - starting position for data selection

.fields - the required fields
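
To see how these three fit together, here is a sketch that reproduces the selection from the client example above on a plain array. The applyWindow helper is hypothetical, handles only top-level fields, and assumes that offset and limit are applied after filtering and sorting.

```javascript
// Sketch of offset, limit and fields on an already filtered and sorted array.
// `applyWindow` is a hypothetical helper, not a metastocle function.
function applyWindow(docs, { offset = 0, limit, fields } = {}) {
  const end = limit === undefined ? undefined : offset + limit;
  return docs.slice(offset, end).map(doc => {
    if (!fields) return doc;
    const picked = {};
    for (const field of fields) {
      if (field in doc) picked[field] = doc[field];
    }
    return picked;
  });
}

// Documents with id 16..20, already filtered (id > 15) and sorted by x descending
const sortedDocs = [20, 19, 18, 17, 16].map(i => ({ id: i, x: i }));

const page = applyWindow(sortedDocs, { offset: 1, limit: 2, fields: ['id'] });
console.log(page); // [ { id: 19 }, { id: 18 } ]
```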

All instructions and possible values are described in more detail in the readme.

Using the command line

The library can be used via the command line. To do this you need to install it globally: npm i -g metastocle --unsafe-perm=true --allow-root. After that, you can run the necessary actions from the project directory.

For example, metastocle -a getDocumentByPk -o test -p 1 -c ./config.js gets the document with the primary key 1 from the collection "test". All actions can be found in https://github.com/ortexx/metastocle/blob/master/bin/actions.js

Limitations

  • All data is first stored in memory, and later written to a file, at certain intervals, and when exiting the process. Therefore, first, you need to have enough RAM, and second, keep in mind that you will not be able to run multiple processes to work with the same database.
  • Sharding at the level of the entire network has not been implemented very effectively yet. Priority is given to duplication, because the size of the network is unstable: nodes can be disconnected, connected, and so on at any time. So if you want to get a large amount of data from the network, keep in mind that all this will be collected via the HTTP protocol, without much optimization.

I chose this stack and accepted these restrictions deliberately, because creating a full-fledged DBMS was neither the goal nor feasible.

The library is still a bit crude in terms of optimizing data requests, but if you follow certain rules, everything works fine:

  • You need to narrow the selection of data as much as possible, and try to organize everything so that you get documents by keys, or by some other fields, but filtered to the optimal size.
  • If you still need to pull a lot of data, you will have to limit the collection on each server so that its contents stay an optimal size to transfer over the network. For example, if 10,000 documents in a collection weigh 100 KB in compressed form, then by limiting the collection at each node to this value, we will get everything at an acceptable speed.

For any questions, please contact: