Refactor to support column families, direct rocksdb access

Expose low-level helpers, fix dialyzer warnings

WIP column families and mrdb API

Basic functionality in place

started adding documentation

remove doc/ from .gitignore

add doc/* files

recognize pre-existing tabs at startup

wip: most of the functionality in place (not yet merge ops)

wip: adding transaction support

wip: add transaction test case (currently dumps core)

First draft, mnesia plugin user guide

Fix note formatting

WIP working on indexing

Index iterators, dialyzer, xref fixes

open db with optimistic transactions

Use rocksdb-1.7.0

Use seanhinde rocksdb patch, enable rollback

Call the right transaction_get() function

WIP add 'snap_tx' activity type

tx restart using mrdb_mutex

Fix test suite sync bugs

WIP instrumented for debugging

WIP working on migration test case

Add migration test suite

Migration works, subscribe to schema changes

WIP fix batch handling

Manage separate batches per db_ref

Add mrdb:fold/3

Add some docs, erlang_ls config

Use seanhinde's rocksdb vsn
Ulf Wiger
2020-12-22 10:36:24 +01:00
parent c0ce3afe39
commit d5dafb5b7e
41 changed files with 8688 additions and 1571 deletions
@@ -0,0 +1,282 @@
# Mnesia Rocksdb - Rocksdb backend plugin for Mnesia #
Copyright (c) 2013-21 Klarna AB
__Authors:__ Ulf Wiger ([`ulf@wiger.net`](mailto:ulf@wiger.net)).
The Mnesia DBMS, part of Erlang/OTP, supports 'backend plugins', making
it possible to utilize more capable key-value stores than the `dets`
module (limited to 2 GB per table). Unfortunately, this support is
undocumented. Some informal documentation of the plugin system is
provided below.
### <a name="Table_of_Contents">Table of Contents</a> ###
1. [Usage](#Usage)
    1. [Prerequisites](#Prerequisites)
    1. [Getting started](#Getting_started)
    1. [Special features](#Special_features)
    1. [Customization](#Customization)
    1. [Handling of errors in write operations](#Handling_of_errors_in_write_operations)
    1. [Caveats](#Caveats)
1. [Mnesia backend plugins](#Mnesia_backend_plugins)
    1. [Background](#Background)
    1. [Design](#Design)
1. [Mnesia index plugins](#Mnesia_index_plugins)
1. [Rocksdb](#Rocksdb)
### <a name="Usage">Usage</a> ###
#### <a name="Prerequisites">Prerequisites</a> ####
* rocksdb (included as dependency)
* sext (included as dependency)
* Erlang/OTP 21.0 or newer (https://github.com/erlang/otp)
#### <a name="Getting_started">Getting started</a> ####
Call `mnesia_rocksdb:register()` immediately after
starting mnesia.
Put `{rocksdb_copies, [node()]}` into the table definitions of
tables you want to be in RocksDB.
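A minimal sketch of these two steps as a module, assuming a node with no existing schema and the `mnesia_rocksdb` application (with its `rocksdb` and `sext` dependencies) on the code path. The module name `rdb_getting_started`, the `{attributes, [key, value]}` layout, and the `{ok, _Alias}` return shape of `mnesia_rocksdb:register/0` are assumptions for illustration.

```erlang
%% Hypothetical module: a minimal getting-started sketch.
-module(rdb_getting_started).
-export([setup/1]).

%% Start mnesia with the RocksDB backend and create Tab as a
%% rocksdb_copies table with two attributes.
setup(Tab) ->
    %% Assumption: a disc-based schema is wanted for persistent tables.
    %% create_schema/1 must run before mnesia:start/0; an error such as
    %% an already existing schema is deliberately ignored here.
    _ = mnesia:create_schema([node()]),
    ok = mnesia:start(),
    %% Register the backend immediately after starting mnesia.
    %% Assumption: register/0 returns {ok, Alias} on success.
    {ok, _Alias} = mnesia_rocksdb:register(),
    {atomic, ok} = mnesia:create_table(Tab, [{rocksdb_copies, [node()]},
                                             {attributes, [key, value]}]),
    ok.
```

After `rdb_getting_started:setup(demo_tab)`, `demo_tab` should accept ordinary Mnesia reads and writes. The disc schema step is included on the assumption that a persistent backend needs one.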
#### <a name="Special_features">Special features</a> ####
RocksDB tables support efficient selects on _prefix keys_.
The backend uses the `sext` module (see
[`https://github.com/uwiger/sext`](https://github.com/uwiger/sext)) for mapping between Erlang terms and the
binary data stored in the tables. This provides two useful properties:
* The records are stored in the Erlang term order of their keys.
* A prefix of a composite key is ordered just before any key for which
it is a prefix. For example, `{x, '_'}` is a prefix for keys `{x, a}`, `{x, b}` and so on.
This means that a prefix key identifies the start of the sequence of
entries whose keys match the prefix. The backend uses this to optimize
selects on prefix keys.
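A sketch of a prefix-key select, assuming a table whose records have the shape `{Tab, {Owner, Item}, Value}`; the module `prefix_select_example` and function `items_for/2` are hypothetical. The key pattern `{Owner, '_'}` is the kind of prefix the backend can serve from a contiguous key range. The code itself only uses the standard `mnesia:dirty_select/2`, so it also runs against a plain `ordered_set` table, which the test uses.

```erlang
%% Hypothetical module illustrating a prefix-key select.
-module(prefix_select_example).
-export([items_for/2]).

%% Return every record in Tab whose key matches {Owner, _}.
%% Assumes records shaped {Tab, {Owner, Item}, Value}, where the record
%% name equals the table name (the Mnesia default).
items_for(Tab, Owner) ->
    MatchSpec = [{{Tab, {Owner, '_'}, '_'}, [], ['$_']}],
    mnesia:dirty_select(Tab, MatchSpec).
```

Storing `{Owner, Item}` keys in a set table like this is one way to hold bag-like data without using a `bag` table.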
#### <a name="Customization">Customization</a> ####
RocksDB supports a number of customization options. These can be specified
by providing a `{Key, Value}` list named `rocksdb_opts` under `user_properties`,
for example:
```
mnesia:create_table(foo, [{rocksdb_copies, [node()]},
                          %% ... other table options ...
                          {user_properties,
                           [{rocksdb_opts, [{max_open_files, 1024}]}]}
                         ]).
```
Consult the [RocksDB documentation](https://github.com/facebook/rocksdb/wiki/Setup-Options-and-Basic-Tuning)
for information on configuration parameters. Also see the section below on handling write errors.
The default configuration for tables in `mnesia_rocksdb` is:
```
default_open_opts() ->
    [ {create_if_missing, true}
    , {cache_size,
       list_to_integer(get_env_default("ROCKSDB_CACHE_SIZE", "32212254"))}
    , {block_size, 1024}
    , {max_open_files, 100}
    , {write_buffer_size,
       list_to_integer(get_env_default(
                         "ROCKSDB_WRITE_BUFFER_SIZE", "4194304"))}
    , {compression,
       list_to_atom(get_env_default("ROCKSDB_COMPRESSION", "true"))}
    , {use_bloomfilter, true}
    ].
```
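The body of `get_env_default/2` is not shown here. The sketch below assumes it reads an OS environment variable with `os:getenv/1` and falls back to the given default. If so, the defaults can be overridden by setting, e.g., `ROCKSDB_WRITE_BUFFER_SIZE` before tables are opened. The module `rdb_env_example` is hypothetical.

```erlang
%% Hypothetical module; get_env_default/2 here is an assumed
%% reimplementation, not the backend's own code.
-module(rdb_env_example).
-export([get_env_default/2, write_buffer_size/0]).

%% Return the OS environment value for Key, or Default if it is unset.
get_env_default(Key, Default) ->
    case os:getenv(Key) of
        false -> Default;
        Value -> Value
    end.

%% Mirrors the write_buffer_size entry of default_open_opts/0.
write_buffer_size() ->
    list_to_integer(get_env_default("ROCKSDB_WRITE_BUFFER_SIZE", "4194304")).
```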
It is also possible, for larger databases, to produce a tuning parameter file.
This is experimental, and mostly copied from `mnesia_leveldb`. Consult the
source code in `mnesia_rocksdb_tuning.erl` and `mnesia_rocksdb_params.erl`.
Contributions are welcome.
#### <a name="Caveats">Caveats</a> ####
Avoid placing `bag` tables in RocksDB. Although they work, each write
requires additional reads, causing substantial runtime overheads. There
are better ways to represent and process bag data (see above about
_prefix keys_).
The `mnesia:table_info(T, size)` call always returns zero for RocksDB
tables. RocksDB itself does not track the number of elements in a table, and
although it is possible to make the `mnesia_rocksdb` backend maintain a size
counter, it incurs a high runtime overhead for writes and deletes since it
forces them to first do a read to check the existence of the key. If you
depend on having an up-to-date size count at all times, you need to maintain
it yourself. If you only need the size occasionally, you may traverse the
table to count the elements.
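A sketch of the occasional traversal count, using the standard `mnesia:foldl/3` inside an `async_dirty` activity. The module `rdb_count_example` is hypothetical. The cost is one pass over the whole table. Because no transaction locks are taken, the result may be slightly off while concurrent writes are happening.

```erlang
%% Hypothetical module: count objects by traversing the table.
-module(rdb_count_example).
-export([count/1]).

%% O(N) traversal; intended for occasional use only.
count(Tab) ->
    mnesia:activity(async_dirty,
                    fun() ->
                            mnesia:foldl(fun(_Obj, Acc) -> Acc + 1 end, 0, Tab)
                    end).
```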
### <a name="Mnesia_backend_plugins">Mnesia backend plugins</a> ###
#### <a name="Background">Background</a> ####
Mnesia was initially designed to be a RAM-only DBMS, and Erlang's
`ets` tables were developed for this purpose. In order to support
persistence, e.g. for configuration data, a disk-based version of `ets`
(called `dets`) was created. The `dets` API mimics the `ets` API,
and `dets` is quite convenient and fast for (nowadays) small datasets.
However, using a 32-bit bucket system, it is limited to 2GB of data.
It also doesn't support ordered sets. When used in Mnesia, dets-based
tables are called `disc_only_copies`.
To circumvent these limitations, another table type, called `disc_copies`,
was added. This is a combination of `ets` and `disk_log`, where Mnesia
periodically snapshots the `ets` data to a log file on disk, and meanwhile
maintains a log of updates, which can be applied at startup. These tables
are quite performant (especially on read access), but all data is kept in
RAM, which can become a serious limitation.
A backend plugin system was proposed by Ulf Wiger in 2016, further
developed with Klarna's support, and finally included in OTP 19.
Klarna uses a LevelDb backend, but Aeternity, in 2017, instead chose
to implement a Rocksdb backend plugin.
#### <a name="Design">Design</a> ####
As backend plugins were added to a long-since legacy-stable Mnesia,
they had to conform to the existing code structure. For this reason,
the plugin callbacks hook into the already present low-level access
API in the `mnesia_lib` module. As a consequence, backend plugins have
the same access semantics and granularity as `ets` and `dets`. This
isn't much of a disadvantage for key-value stores like LevelDb and RocksDB,
but a more serious issue is that the update part of this API is called
_after_ the point of no return. That is, Mnesia does not expect
these updates to fail, and has no recourse if they do. As an aside,
this could also happen if a `disc_only_copies` table exceeds the 2 GB
limit (Mnesia will not check it, and `dets` will not complain, but simply
drop the update).
### <a name="Mnesia_index_plugins">Mnesia index plugins</a> ###
When adding support for backend plugins, index plugins were also added. Unfortunately, they remain undocumented.
An index plugin can be added in one of two ways:
1. When creating a schema, provide `{index_plugins, [{Name, Module, Function}]}` options.
1. Call the function `mnesia_schema:add_index_plugin(Name, Module, Function)`
`Name` must be an atom wrapped as a 1-tuple, e.g. `{words}`.
The plugin callback is called as `Module:Function(Table, Pos, Obj)`, where `Pos=={words}` in
our example. It returns a list of index terms.
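A sketch of the first option, assuming `mnesia:create_schema/2` accepts the `index_plugins` option in the form described above and that no schema exists on the node yet. To stay self-contained, the hypothetical module `ix_schema_example` carries its own copy of the `words_f/3` indexing logic from the `words` example in this section.

```erlang
%% Hypothetical module: registers an index plugin at schema creation.
-module(ix_schema_example).
-export([init/0, words_f/3]).

%% Create a disc schema with the {words} index plugin, start mnesia,
%% and create a table indexed by the plugin.
init() ->
    ok = mnesia:create_schema([node()],
                              [{index_plugins, [{{words}, ?MODULE, words_f}]}]),
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(i, [{index, [{words}]}]),
    ok.

%% Same logic as words:words_f/3: collect whitespace-separated words
%% from every binary in the object, including binaries inside lists.
words_f(_Tab, _Pos, Obj) when is_tuple(Obj) ->
    words_(tuple_to_list(Obj)).

words_(Str) when is_binary(Str) ->
    string:lexemes(Str, [$\s, $\n, [$\r, $\n]]);
words_(L) when is_list(L) ->
    lists:flatmap(fun words_/1, L);
words_(_) ->
    [].
```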
<strong>Example</strong>
Given the following index plugin implementation:
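```
-module(words).
-export([words_f/3]).

%% Index callback, called as words_f(Tab, Pos, Obj). Returns the index
%% terms for Obj: every whitespace-separated word in any binary field,
%% including binaries nested in lists.
words_f(_Tab, _Pos, Obj) when is_tuple(Obj) ->
    words_(tuple_to_list(Obj)).

words_(Str) when is_binary(Str) ->
    string:lexemes(Str, [$\s, $\n, [$\r,$\n]]);
words_(L) when is_list(L) ->
    lists:flatmap(fun words_/1, L);
words_(_) ->
    [].
```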
We can register the plugin and use it in table definitions:
```
Eshell V12.1.3 (abort with ^G)
1> mnesia:start().
ok
2> mnesia_schema:add_index_plugin({words}, words, words_f).
{atomic,ok}
3> mnesia:create_table(i, [{index, [{words}]}]).
{atomic,ok}
```
Note that in this case, we had neither a backend plugin, nor even a persistent schema.
Index plugins can be used with all table types. The registered indexing function (arity 3) must exist
as an exported function along the node's code path.
To see what happens when we insert an object, we can turn on call trace.
```
4> dbg:tracer().
{ok,<0.108.0>}
5> dbg:tp(words, x).
{ok,[{matched,nonode@nohost,3},{saved,x}]}
6> dbg:p(all,[c]).
{ok,[{matched,nonode@nohost,60}]}
7> mnesia:dirty_write({i,<<"one two">>, [<<"three">>, <<"four">>]}).
(<0.84.0>) call words:words_f(i,{words},{i,<<"one two">>,[<<"three">>,<<"four">>]})
(<0.84.0>) returned from words:words_f/3 -> [<<"one">>,<<"two">>,<<"three">>,
<<"four">>]
(<0.84.0>) call words:words_f(i,{words},{i,<<"one two">>,[<<"three">>,<<"four">>]})
(<0.84.0>) returned from words:words_f/3 -> [<<"one">>,<<"two">>,<<"three">>,
<<"four">>]
ok
8> dbg:ctp('_'), dbg:stop().
ok
9> mnesia:dirty_index_read(i, <<"one">>, {words}).
[{i,<<"one two">>,[<<"three">>,<<"four">>]}]
```
(The fact that the indexing function is called twice seems like a performance bug.)
We can observe that the indexing callback is able to operate on the whole object.
It needs to be side-effect free and efficient, since it will be called at least once for each update
(if an old object exists in the table, the indexing function will be called on it too, before it is
replaced by the new object).
### <a name="Rocksdb">Rocksdb</a> ###
## Modules ##
<table width="100%" border="0" summary="list of modules">
<tr><td><a href="mnesia_rocksdb.md" class="module">mnesia_rocksdb</a></td></tr>
<tr><td><a href="mnesia_rocksdb_admin.md" class="module">mnesia_rocksdb_admin</a></td></tr>
<tr><td><a href="mnesia_rocksdb_app.md" class="module">mnesia_rocksdb_app</a></td></tr>
<tr><td><a href="mnesia_rocksdb_lib.md" class="module">mnesia_rocksdb_lib</a></td></tr>
<tr><td><a href="mnesia_rocksdb_params.md" class="module">mnesia_rocksdb_params</a></td></tr>
<tr><td><a href="mnesia_rocksdb_sup.md" class="module">mnesia_rocksdb_sup</a></td></tr>
<tr><td><a href="mnesia_rocksdb_tuning.md" class="module">mnesia_rocksdb_tuning</a></td></tr>
<tr><td><a href="mrdb.md" class="module">mrdb</a></td></tr>
<tr><td><a href="mrdb_index.md" class="module">mrdb_index</a></td></tr>
<tr><td><a href="mrdb_mutex.md" class="module">mrdb_mutex</a></td></tr>
<tr><td><a href="mrdb_select.md" class="module">mrdb_select</a></td></tr></table>
@@ -0,0 +1,5 @@
%% encoding: UTF-8
{application,mnesia_rocksdb}.
{modules,[mnesia_rocksdb,mnesia_rocksdb_admin,mnesia_rocksdb_app,
mnesia_rocksdb_lib,mnesia_rocksdb_params,mnesia_rocksdb_sup,
mnesia_rocksdb_tuning,mrdb,mrdb_index,mrdb_mutex,mrdb_select]}.
@@ -0,0 +1,326 @@
# Module mnesia_rocksdb_admin #
* [Data Types](#types)
* [Function Index](#index)
* [Function Details](#functions)
__Behaviours:__ [`gen_server`](http://www.erlang.org/doc/man/gen_server.html).
<a name="types"></a>
## Data Types ##
### <a name="type-alias">alias()</a> ###
<pre><code>
alias() = atom()
</code></pre>
### <a name="type-backend">backend()</a> ###
<pre><code>
backend() = #{db_ref =&gt; <a href="#type-db_ref">db_ref()</a>, cf_info =&gt; #{<a href="#type-table">table()</a> =&gt; <a href="#type-cf">cf()</a>}}
</code></pre>
### <a name="type-cf">cf()</a> ###
<pre><code>
cf() = <a href="mrdb.md#type-db_ref">mrdb:db_ref()</a>
</code></pre>
### <a name="type-db_ref">db_ref()</a> ###
<pre><code>
db_ref() = rocksdb:db_handle()
</code></pre>
### <a name="type-gen_server_noreply">gen_server_noreply()</a> ###
<pre><code>
gen_server_noreply() = {noreply, <a href="#type-st">st()</a>} | {stop, <a href="#type-reason">reason()</a>, <a href="#type-st">st()</a>}
</code></pre>
### <a name="type-gen_server_reply">gen_server_reply()</a> ###
<pre><code>
gen_server_reply() = {reply, <a href="#type-reply">reply()</a>, <a href="#type-st">st()</a>} | {stop, <a href="#type-reason">reason()</a>, <a href="#type-reply">reply()</a>, <a href="#type-st">st()</a>}
</code></pre>
### <a name="type-properties">properties()</a> ###
<pre><code>
properties() = [{atom(), any()}]
</code></pre>
### <a name="type-reason">reason()</a> ###
<pre><code>
reason() = any()
</code></pre>
### <a name="type-reply">reply()</a> ###
<pre><code>
reply() = any()
</code></pre>
### <a name="type-req">req()</a> ###
<pre><code>
req() = {create_table, <a href="#type-table">table()</a>, <a href="#type-properties">properties()</a>} | {delete_table, <a href="#type-table">table()</a>} | {load_table, <a href="#type-table">table()</a>} | {related_resources, <a href="#type-table">table()</a>} | {get_ref, <a href="#type-table">table()</a>} | {add_aliases, [<a href="#type-alias">alias()</a>]} | {write_table_property, <a href="#type-tabname">tabname()</a>, tuple()} | {remove_aliases, [<a href="#type-alias">alias()</a>]} | {migrate, [{<a href="#type-tabname">tabname()</a>, map()}]} | {prep_close, <a href="#type-table">table()</a>} | {close_table, <a href="#type-table">table()</a>}
</code></pre>
### <a name="type-st">st()</a> ###
<pre><code>
st() = #st{backends = #{<a href="#type-alias">alias()</a> =&gt; <a href="#type-backend">backend()</a>}, standalone = #{{<a href="#type-alias">alias()</a>, <a href="#type-table">table()</a>} =&gt; <a href="#type-cf">cf()</a>}, default_opts = [{atom(), term()}]}
</code></pre>
### <a name="type-table">table()</a> ###
<pre><code>
table() = <a href="#type-tabname">tabname()</a> | {admin, <a href="#type-alias">alias()</a>} | {<a href="#type-tabname">tabname()</a>, index, any()} | {<a href="#type-tabname">tabname()</a>, retainer, any()}
</code></pre>
### <a name="type-tabname">tabname()</a> ###
<pre><code>
tabname() = atom()
</code></pre>
<a name="index"></a>
## Function Index ##
<table width="100%" border="1" cellspacing="0" cellpadding="2" summary="function index"><tr><td valign="top"><a href="#add_aliases-1">add_aliases/1</a></td><td></td></tr><tr><td valign="top"><a href="#close_table-2">close_table/2</a></td><td></td></tr><tr><td valign="top"><a href="#code_change-3">code_change/3</a></td><td></td></tr><tr><td valign="top"><a href="#create_table-3">create_table/3</a></td><td></td></tr><tr><td valign="top"><a href="#delete_table-2">delete_table/2</a></td><td></td></tr><tr><td valign="top"><a href="#ensure_started-0">ensure_started/0</a></td><td></td></tr><tr><td valign="top"><a href="#get_ref-1">get_ref/1</a></td><td></td></tr><tr><td valign="top"><a href="#get_ref-2">get_ref/2</a></td><td></td></tr><tr><td valign="top"><a href="#handle_call-3">handle_call/3</a></td><td></td></tr><tr><td valign="top"><a href="#handle_cast-2">handle_cast/2</a></td><td></td></tr><tr><td valign="top"><a href="#handle_info-2">handle_info/2</a></td><td></td></tr><tr><td valign="top"><a href="#init-1">init/1</a></td><td></td></tr><tr><td valign="top"><a href="#load_table-2">load_table/2</a></td><td></td></tr><tr><td valign="top"><a href="#meta-0">meta/0</a></td><td></td></tr><tr><td valign="top"><a href="#migrate_standalone-2">migrate_standalone/2</a></td><td></td></tr><tr><td valign="top"><a href="#prep_close-2">prep_close/2</a></td><td></td></tr><tr><td valign="top"><a href="#read_info-1">read_info/1</a></td><td></td></tr><tr><td valign="top"><a href="#read_info-2">read_info/2</a></td><td></td></tr><tr><td valign="top"><a href="#read_info-4">read_info/4</a></td><td></td></tr><tr><td valign="top"><a href="#related_resources-2">related_resources/2</a></td><td></td></tr><tr><td valign="top"><a href="#remove_aliases-1">remove_aliases/1</a></td><td></td></tr><tr><td valign="top"><a href="#request_ref-2">request_ref/2</a></td><td></td></tr><tr><td valign="top"><a href="#start_link-0">start_link/0</a></td><td></td></tr><tr><td valign="top"><a 
href="#terminate-2">terminate/2</a></td><td></td></tr><tr><td valign="top"><a href="#write_info-4">write_info/4</a></td><td></td></tr><tr><td valign="top"><a href="#write_table_property-3">write_table_property/3</a></td><td></td></tr></table>
<a name="functions"></a>
## Function Details ##
<a name="add_aliases-1"></a>
### add_aliases/1 ###
`add_aliases(Aliases) -> any()`
<a name="close_table-2"></a>
### close_table/2 ###
`close_table(Alias, Name) -> any()`
<a name="code_change-3"></a>
### code_change/3 ###
`code_change(FromVsn, St, Extra) -> any()`
<a name="create_table-3"></a>
### create_table/3 ###
`create_table(Alias, Name, Props) -> any()`
<a name="delete_table-2"></a>
### delete_table/2 ###
<pre><code>
delete_table(Alias::<a href="#type-alias">alias()</a>, Name::<a href="#type-tabname">tabname()</a>) -&gt; ok
</code></pre>
<br />
<a name="ensure_started-0"></a>
### ensure_started/0 ###
<pre><code>
ensure_started() -&gt; ok
</code></pre>
<br />
<a name="get_ref-1"></a>
### get_ref/1 ###
`get_ref(Name) -> any()`
<a name="get_ref-2"></a>
### get_ref/2 ###
`get_ref(Name, Default) -> any()`
<a name="handle_call-3"></a>
### handle_call/3 ###
<pre><code>
handle_call(Req::{<a href="#type-alias">alias()</a>, <a href="#type-req">req()</a>}, From::any(), St::<a href="#type-st">st()</a>) -&gt; <a href="#type-gen_server_reply">gen_server_reply()</a>
</code></pre>
<br />
<a name="handle_cast-2"></a>
### handle_cast/2 ###
<pre><code>
handle_cast(Msg::any(), St::<a href="#type-st">st()</a>) -&gt; <a href="#type-gen_server_noreply">gen_server_noreply()</a>
</code></pre>
<br />
<a name="handle_info-2"></a>
### handle_info/2 ###
<pre><code>
handle_info(Msg::any(), St::<a href="#type-st">st()</a>) -&gt; <a href="#type-gen_server_noreply">gen_server_noreply()</a>
</code></pre>
<br />
<a name="init-1"></a>
### init/1 ###
`init(X1) -> any()`
<a name="load_table-2"></a>
### load_table/2 ###
`load_table(Alias, Name) -> any()`
<a name="meta-0"></a>
### meta/0 ###
`meta() -> any()`
<a name="migrate_standalone-2"></a>
### migrate_standalone/2 ###
`migrate_standalone(Alias, Tabs) -> any()`
<a name="prep_close-2"></a>
### prep_close/2 ###
`prep_close(Alias, Tab) -> any()`
<a name="read_info-1"></a>
### read_info/1 ###
`read_info(TRec) -> any()`
<a name="read_info-2"></a>
### read_info/2 ###
`read_info(Alias, Tab) -> any()`
<a name="read_info-4"></a>
### read_info/4 ###
`read_info(Alias, Tab, K, Default) -> any()`
<a name="related_resources-2"></a>
### related_resources/2 ###
`related_resources(Alias, Name) -> any()`
<a name="remove_aliases-1"></a>
### remove_aliases/1 ###
`remove_aliases(Aliases) -> any()`
<a name="request_ref-2"></a>
### request_ref/2 ###
`request_ref(Alias, Name) -> any()`
<a name="start_link-0"></a>
### start_link/0 ###
`start_link() -> any()`
<a name="terminate-2"></a>
### terminate/2 ###
`terminate(X1, St) -> any()`
<a name="write_info-4"></a>
### write_info/4 ###
`write_info(Alias, Tab, K, V) -> any()`
<a name="write_table_property-3"></a>
### write_table_property/3 ###
`write_table_property(Alias, Tab, Prop) -> any()`
@@ -0,0 +1,32 @@
# Module mnesia_rocksdb_app #
* [Function Index](#index)
* [Function Details](#functions)
__Behaviours:__ [`application`](http://www.erlang.org/doc/man/application.html).
<a name="index"></a>
## Function Index ##
<table width="100%" border="1" cellspacing="0" cellpadding="2" summary="function index"><tr><td valign="top"><a href="#start-2">start/2</a></td><td></td></tr><tr><td valign="top"><a href="#stop-1">stop/1</a></td><td></td></tr></table>
<a name="functions"></a>
## Function Details ##
<a name="start-2"></a>
### start/2 ###
`start(StartType, StartArgs) -> any()`
<a name="stop-1"></a>
### stop/1 ###
`stop(State) -> any()`
@@ -0,0 +1,168 @@
# Module mnesia_rocksdb_lib #
* [Description](#description)
* [Function Index](#index)
* [Function Details](#functions)
RocksDB update wrappers, in separate module for easy tracing and mocking.
<a name="description"></a>
## Description ##
<a name="index"></a>
## Function Index ##
<table width="100%" border="1" cellspacing="0" cellpadding="2" summary="function index"><tr><td valign="top"><a href="#check_encoding-2">check_encoding/2</a></td><td></td></tr><tr><td valign="top"><a href="#create_mountpoint-1">create_mountpoint/1</a></td><td></td></tr><tr><td valign="top"><a href="#data_mountpoint-1">data_mountpoint/1</a></td><td></td></tr><tr><td valign="top"><a href="#decode-2">decode/2</a></td><td></td></tr><tr><td valign="top"><a href="#decode_key-1">decode_key/1</a></td><td></td></tr><tr><td valign="top"><a href="#decode_key-2">decode_key/2</a></td><td></td></tr><tr><td valign="top"><a href="#decode_val-1">decode_val/1</a></td><td></td></tr><tr><td valign="top"><a href="#decode_val-3">decode_val/3</a></td><td></td></tr><tr><td valign="top"><a href="#default_encoding-3">default_encoding/3</a></td><td></td></tr><tr><td valign="top"><a href="#delete-3">delete/3</a></td><td></td></tr><tr><td valign="top"><a href="#encode-2">encode/2</a></td><td></td></tr><tr><td valign="top"><a href="#encode_key-1">encode_key/1</a></td><td></td></tr><tr><td valign="top"><a href="#encode_key-2">encode_key/2</a></td><td></td></tr><tr><td valign="top"><a href="#encode_val-1">encode_val/1</a></td><td></td></tr><tr><td valign="top"><a href="#encode_val-2">encode_val/2</a></td><td></td></tr><tr><td valign="top"><a href="#keypos-1">keypos/1</a></td><td></td></tr><tr><td valign="top"><a href="#open_rocksdb-3">open_rocksdb/3</a></td><td></td></tr><tr><td valign="top"><a href="#put-4">put/4</a></td><td></td></tr><tr><td valign="top"><a href="#tabname-1">tabname/1</a></td><td></td></tr><tr><td valign="top"><a href="#valid_key_type-2">valid_key_type/2</a></td><td></td></tr><tr><td valign="top"><a href="#valid_obj_type-2">valid_obj_type/2</a></td><td></td></tr><tr><td valign="top"><a href="#write-3">write/3</a></td><td></td></tr></table>
<a name="functions"></a>
## Function Details ##
<a name="check_encoding-2"></a>
### check_encoding/2 ###
`check_encoding(Encoding, Attributes) -> any()`
<a name="create_mountpoint-1"></a>
### create_mountpoint/1 ###
`create_mountpoint(Tab) -> any()`
<a name="data_mountpoint-1"></a>
### data_mountpoint/1 ###
`data_mountpoint(Tab) -> any()`
<a name="decode-2"></a>
### decode/2 ###
`decode(Val, X2) -> any()`
<a name="decode_key-1"></a>
### decode_key/1 ###
<pre><code>
decode_key(CodedKey::binary()) -&gt; any()
</code></pre>
<br />
<a name="decode_key-2"></a>
### decode_key/2 ###
`decode_key(CodedKey, Enc) -> any()`
<a name="decode_val-1"></a>
### decode_val/1 ###
<pre><code>
decode_val(CodedVal::binary()) -&gt; any()
</code></pre>
<br />
<a name="decode_val-3"></a>
### decode_val/3 ###
`decode_val(CodedVal, K, Ref) -> any()`
<a name="default_encoding-3"></a>
### default_encoding/3 ###
`default_encoding(X1, Type, As) -> any()`
<a name="delete-3"></a>
### delete/3 ###
`delete(Ref, K, Opts) -> any()`
<a name="encode-2"></a>
### encode/2 ###
`encode(Value, X2) -> any()`
<a name="encode_key-1"></a>
### encode_key/1 ###
<pre><code>
encode_key(Key::any()) -&gt; binary()
</code></pre>
<br />
<a name="encode_key-2"></a>
### encode_key/2 ###
`encode_key(Key, X2) -> any()`
<a name="encode_val-1"></a>
### encode_val/1 ###
<pre><code>
encode_val(Val::any()) -&gt; binary()
</code></pre>
<br />
<a name="encode_val-2"></a>
### encode_val/2 ###
`encode_val(Val, Enc) -> any()`
<a name="keypos-1"></a>
### keypos/1 ###
`keypos(Tab) -> any()`
<a name="open_rocksdb-3"></a>
### open_rocksdb/3 ###
`open_rocksdb(MPd, RdbOpts, CFs) -> any()`
<a name="put-4"></a>
### put/4 ###
`put(Ref, K, V, Opts) -> any()`
<a name="tabname-1"></a>
### tabname/1 ###
`tabname(Tab) -> any()`
<a name="valid_key_type-2"></a>
### valid_key_type/2 ###
`valid_key_type(X1, Key) -> any()`
<a name="valid_obj_type-2"></a>
### valid_obj_type/2 ###
`valid_obj_type(X1, Obj) -> any()`
<a name="write-3"></a>
### write/3 ###
`write(X1, L, Opts) -> any()`
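A sketch exercising the `encode_key/1` and `decode_key/1` signatures listed above. It assumes the default key encoding is `sext`, as the README's Special features section describes, so that sorting the encoded binaries bytewise reproduces Erlang term order. The module `key_codec_example` is hypothetical, and it needs `mnesia_rocksdb` and `sext` on the code path.

```erlang
%% Hypothetical module exercising mnesia_rocksdb_lib key encoding.
-module(key_codec_example).
-export([roundtrip/1, sorted_by_encoding/1]).

%% Encode a key to a binary and decode it back.
roundtrip(Key) ->
    mnesia_rocksdb_lib:decode_key(mnesia_rocksdb_lib:encode_key(Key)).

%% Sort keys by comparing their encoded binaries, then decode them.
%% Assuming a sext-based encoding, this should equal lists:sort(Keys).
sorted_by_encoding(Keys) ->
    Encoded = lists:sort([mnesia_rocksdb_lib:encode_key(K) || K <- Keys]),
    [mnesia_rocksdb_lib:decode_key(B) || B <- Encoded].
```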
@@ -0,0 +1,80 @@
# Module mnesia_rocksdb_params #
* [Function Index](#index)
* [Function Details](#functions)
__Behaviours:__ [`gen_server`](http://www.erlang.org/doc/man/gen_server.html).
<a name="index"></a>
## Function Index ##
<table width="100%" border="1" cellspacing="0" cellpadding="2" summary="function index"><tr><td valign="top"><a href="#code_change-3">code_change/3</a></td><td></td></tr><tr><td valign="top"><a href="#delete-1">delete/1</a></td><td></td></tr><tr><td valign="top"><a href="#handle_call-3">handle_call/3</a></td><td></td></tr><tr><td valign="top"><a href="#handle_cast-2">handle_cast/2</a></td><td></td></tr><tr><td valign="top"><a href="#handle_info-2">handle_info/2</a></td><td></td></tr><tr><td valign="top"><a href="#init-1">init/1</a></td><td></td></tr><tr><td valign="top"><a href="#lookup-2">lookup/2</a></td><td></td></tr><tr><td valign="top"><a href="#start_link-0">start_link/0</a></td><td></td></tr><tr><td valign="top"><a href="#store-2">store/2</a></td><td></td></tr><tr><td valign="top"><a href="#terminate-2">terminate/2</a></td><td></td></tr></table>
<a name="functions"></a>
## Function Details ##
<a name="code_change-3"></a>
### code_change/3 ###
`code_change(X1, S, X3) -> any()`
<a name="delete-1"></a>
### delete/1 ###
`delete(Tab) -> any()`
<a name="handle_call-3"></a>
### handle_call/3 ###
`handle_call(X1, X2, S) -> any()`
<a name="handle_cast-2"></a>
### handle_cast/2 ###
`handle_cast(X1, S) -> any()`
<a name="handle_info-2"></a>
### handle_info/2 ###
`handle_info(X1, S) -> any()`
<a name="init-1"></a>
### init/1 ###
`init(X1) -> any()`
<a name="lookup-2"></a>
### lookup/2 ###
`lookup(Tab, Default) -> any()`
<a name="start_link-0"></a>
### start_link/0 ###
`start_link() -> any()`
<a name="store-2"></a>
### store/2 ###
`store(Tab, Params) -> any()`
<a name="terminate-2"></a>
### terminate/2 ###
`terminate(X1, X2) -> any()`
@@ -0,0 +1,32 @@
# Module mnesia_rocksdb_sup #
* [Function Index](#index)
* [Function Details](#functions)
__Behaviours:__ [`supervisor`](http://www.erlang.org/doc/man/supervisor.html).
<a name="index"></a>
## Function Index ##
<table width="100%" border="1" cellspacing="0" cellpadding="2" summary="function index"><tr><td valign="top"><a href="#init-1">init/1</a></td><td></td></tr><tr><td valign="top"><a href="#start_link-0">start_link/0</a></td><td></td></tr></table>
<a name="functions"></a>
## Function Details ##
<a name="init-1"></a>
### init/1 ###
`init(X1) -> any()`
<a name="start_link-0"></a>
### start_link/0 ###
`start_link() -> any()`
@@ -0,0 +1,126 @@
# Module mnesia_rocksdb_tuning #
* [Function Index](#index)
* [Function Details](#functions)
<a name="index"></a>
## Function Index ##
<table width="100%" border="1" cellspacing="0" cellpadding="2" summary="function index"><tr><td valign="top"><a href="#cache-1">cache/1</a></td><td></td></tr><tr><td valign="top"><a href="#calc_sizes-0">calc_sizes/0</a></td><td></td></tr><tr><td valign="top"><a href="#calc_sizes-1">calc_sizes/1</a></td><td></td></tr><tr><td valign="top"><a href="#count_rdb_tabs-0">count_rdb_tabs/0</a></td><td></td></tr><tr><td valign="top"><a href="#count_rdb_tabs-1">count_rdb_tabs/1</a></td><td></td></tr><tr><td valign="top"><a href="#default-1">default/1</a></td><td></td></tr><tr><td valign="top"><a href="#describe_env-0">describe_env/0</a></td><td></td></tr><tr><td valign="top"><a href="#get_avail_ram-0">get_avail_ram/0</a></td><td></td></tr><tr><td valign="top"><a href="#get_maxfiles-0">get_maxfiles/0</a></td><td></td></tr><tr><td valign="top"><a href="#get_maxfiles-1">get_maxfiles/1</a></td><td></td></tr><tr><td valign="top"><a href="#ideal_max_files-0">ideal_max_files/0</a></td><td></td></tr><tr><td valign="top"><a href="#ideal_max_files-1">ideal_max_files/1</a></td><td></td></tr><tr><td valign="top"><a href="#max_files-1">max_files/1</a></td><td></td></tr><tr><td valign="top"><a href="#rdb_indexes-0">rdb_indexes/0</a></td><td></td></tr><tr><td valign="top"><a href="#rdb_indexes-1">rdb_indexes/1</a></td><td></td></tr><tr><td valign="top"><a href="#rdb_tabs-0">rdb_tabs/0</a></td><td></td></tr><tr><td valign="top"><a href="#rdb_tabs-1">rdb_tabs/1</a></td><td></td></tr><tr><td valign="top"><a href="#write_buffer-1">write_buffer/1</a></td><td></td></tr></table>
<a name="functions"></a>
## Function Details ##
<a name="cache-1"></a>
### cache/1 ###
`cache(X1) -> any()`
<a name="calc_sizes-0"></a>
### calc_sizes/0 ###
`calc_sizes() -> any()`
<a name="calc_sizes-1"></a>
### calc_sizes/1 ###
`calc_sizes(D) -> any()`
<a name="count_rdb_tabs-0"></a>
### count_rdb_tabs/0 ###
`count_rdb_tabs() -> any()`
<a name="count_rdb_tabs-1"></a>
### count_rdb_tabs/1 ###
`count_rdb_tabs(Db) -> any()`
<a name="default-1"></a>
### default/1 ###
`default(X1) -> any()`
<a name="describe_env-0"></a>
### describe_env/0 ###
`describe_env() -> any()`
<a name="get_avail_ram-0"></a>
### get_avail_ram/0 ###
`get_avail_ram() -> any()`
<a name="get_maxfiles-0"></a>
### get_maxfiles/0 ###
`get_maxfiles() -> any()`
<a name="get_maxfiles-1"></a>
### get_maxfiles/1 ###
`get_maxfiles(X1) -> any()`
<a name="ideal_max_files-0"></a>
### ideal_max_files/0 ###
`ideal_max_files() -> any()`
<a name="ideal_max_files-1"></a>
### ideal_max_files/1 ###
`ideal_max_files(D) -> any()`
<a name="max_files-1"></a>
### max_files/1 ###
`max_files(X1) -> any()`
<a name="rdb_indexes-0"></a>
### rdb_indexes/0 ###
`rdb_indexes() -> any()`
<a name="rdb_indexes-1"></a>
### rdb_indexes/1 ###
`rdb_indexes(Db) -> any()`
<a name="rdb_tabs-0"></a>
### rdb_tabs/0 ###
`rdb_tabs() -> any()`
<a name="rdb_tabs-1"></a>
### rdb_tabs/1 ###
`rdb_tabs(Db) -> any()`
<a name="write_buffer-1"></a>
### write_buffer/1 ###
`write_buffer(X1) -> any()`
@@ -0,0 +1,99 @@
# Module mrdb_index #
* [Data Types](#types)
* [Function Index](#index)
* [Function Details](#functions)
<a name="types"></a>
## Data Types ##
### <a name="type-index_value">index_value()</a> ###
<pre><code>
index_value() = any()
</code></pre>
### <a name="type-iterator_action">iterator_action()</a> ###
<pre><code>
iterator_action() = <a href="mrdb.md#type-iterator_action">mrdb:iterator_action()</a>
</code></pre>
### <a name="type-ix_iterator">ix_iterator()</a> ###
<pre><code>
ix_iterator() = #mrdb_ix_iter{i = <a href="mrdb.md#type-iterator">mrdb:iterator()</a>, type = set | bag, sub = <a href="mrdb.md#type-ref">mrdb:ref()</a> | pid()}
</code></pre>
### <a name="type-object">object()</a> ###
<pre><code>
object() = tuple()
</code></pre>
<a name="index"></a>
## Function Index ##
<table width="100%" border="1" cellspacing="0" cellpadding="2" summary="function index"><tr><td valign="top"><a href="#iterator-2">iterator/2</a></td><td></td></tr><tr><td valign="top"><a href="#iterator_close-1">iterator_close/1</a></td><td></td></tr><tr><td valign="top"><a href="#iterator_move-2">iterator_move/2</a></td><td></td></tr><tr><td valign="top"><a href="#with_iterator-3">with_iterator/3</a></td><td></td></tr></table>
<a name="functions"></a>
## Function Details ##
<a name="iterator-2"></a>
### iterator/2 ###
<pre><code>
iterator(Tab::<a href="mrdb.md#type-ref_or_tab">mrdb:ref_or_tab()</a>, IxPos::<a href="mrdb.md#type-index_position">mrdb:index_position()</a>) -&gt; {ok, <a href="#type-ix_iterator">ix_iterator()</a>} | {error, term()}
</code></pre>
<br />
<a name="iterator_close-1"></a>
### iterator_close/1 ###
<pre><code>
iterator_close(Mrdb_ix_iter::<a href="#type-ix_iterator">ix_iterator()</a>) -&gt; ok
</code></pre>
<br />
<a name="iterator_move-2"></a>
### iterator_move/2 ###
<pre><code>
iterator_move(Mrdb_ix_iter::<a href="#type-ix_iterator">ix_iterator()</a>, Dir::<a href="#type-iterator_action">iterator_action()</a>) -&gt; {ok, <a href="#type-index_value">index_value()</a>, <a href="#type-object">object()</a>} | {error, term()}
</code></pre>
<br />
<a name="with_iterator-3"></a>
### with_iterator/3 ###
<pre><code>
with_iterator(Tab::<a href="http://www.erlang.org/doc/man/mrdb.html#type-ref_or_tab">mrdb:ref_or_tab()</a>, IxPos::<a href="http://www.erlang.org/doc/man/mrdb.html#type-index_position">mrdb:index_position()</a>, Fun::fun((<a href="#type-ix_iterator">ix_iterator()</a>) -&gt; Res)) -&gt; Res
</code></pre>
<br />
@@ -0,0 +1,30 @@
# Module mrdb_mutex #
* [Function Index](#index)
* [Function Details](#functions)
<a name="index"></a>
## Function Index ##
<table width="100%" border="1" cellspacing="0" cellpadding="2" summary="function index"><tr><td valign="top"><a href="#do-2">do/2</a></td><td></td></tr><tr><td valign="top"><a href="#ensure_tab-0">ensure_tab/0</a></td><td></td></tr></table>
<a name="functions"></a>
## Function Details ##
<a name="do-2"></a>
### do/2 ###
`do(Rsrc, F) -> any()`
<a name="ensure_tab-0"></a>
### ensure_tab/0 ###
`ensure_tab() -> any()`
@@ -0,0 +1,48 @@
# Module mrdb_select #
* [Function Index](#index)
* [Function Details](#functions)
<a name="index"></a>
## Function Index ##
<table width="100%" border="1" cellspacing="0" cellpadding="2" summary="function index"><tr><td valign="top"><a href="#fold-5">fold/5</a></td><td></td></tr><tr><td valign="top"><a href="#rdb_fold-5">rdb_fold/5</a></td><td></td></tr><tr><td valign="top"><a href="#select-1">select/1</a></td><td></td></tr><tr><td valign="top"><a href="#select-3">select/3</a></td><td></td></tr><tr><td valign="top"><a href="#select-4">select/4</a></td><td></td></tr></table>
<a name="functions"></a>
## Function Details ##
<a name="fold-5"></a>
### fold/5 ###
`fold(Ref, Fun, Acc, MS, Limit) -> any()`
<a name="rdb_fold-5"></a>
### rdb_fold/5 ###
`rdb_fold(Ref, Fun, Acc, Prefix, Limit) -> any()`
<a name="select-1"></a>
### select/1 ###
`select(Cont) -> any()`
<a name="select-3"></a>
### select/3 ###
`select(Ref, MS, Limit) -> any()`
<a name="select-4"></a>
### select/4 ###
`select(Ref, MS, AccKeys, Limit) -> any()`
@@ -0,0 +1,250 @@
@author Ulf Wiger <ulf@wiger.net>
@copyright 2013-21 Klarna AB
@title Mnesia Rocksdb - Rocksdb backend plugin for Mnesia
@doc
The Mnesia DBMS, part of Erlang/OTP, supports 'backend plugins', making
it possible to utilize more capable key-value stores than the `dets'
module (limited to 2 GB per table). Unfortunately, this support is
undocumented. Below, some informal documentation for the plugin system
is provided.
== Table of Contents ==
<ol>
<li>{@section Usage}</li>
<ol>
<li>{@section Prerequisites}</li>
<li>{@section Getting started}</li>
<li>{@section Special features}</li>
<li>{@section Customization}</li>
<li>{@section Handling of errors in write operations}</li>
<li>{@section Caveats}</li>
</ol>
<li>{@section Mnesia backend plugins}</li>
<ol>
<li>{@section Background}</li>
<li>{@section Design}</li>
</ol>
<li>{@section Mnesia index plugins}</li>
<li>{@section Rocksdb}</li>
</ol>
== Usage ==
=== Prerequisites ===
<ul>
<li>rocksdb (included as dependency)</li>
<li>sext (included as dependency)</li>
<li>Erlang/OTP 21.0 or newer (https://github.com/erlang/otp)</li>
</ul>
=== Getting started ===
Call `mnesia_rocksdb:register()' immediately after
starting mnesia.
Put `{rocksdb_copies, [node()]}' into the table definitions of
tables you want to be in RocksDB.
=== Special features ===
RocksDB tables support efficient selects on <em>prefix keys</em>.
The backend uses the `sext' module (see
[https://github.com/uwiger/sext]) for mapping between Erlang terms and the
binary data stored in the tables. This provides two useful properties:
<ul>
<li>The records are stored in the Erlang term order of their keys.</li>
<li>A prefix of a composite key is ordered just before any key for which
it is a prefix. For example, ``{x, '_'}'' is a prefix for keys `{x, a}',
`{x, b}' and so on.</li>
</ul>
This means that a prefix key identifies the start of the sequence of
entries whose keys match the prefix. The backend uses this to optimize
selects on prefix keys.
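
As an illustration, the hypothetical function below (not part of `mnesia_rocksdb') selects all
objects whose key is a 2-tuple starting with a given value, using `mnesia:dirty_select/2'. It
assumes that the record name equals the table name and that the table has exactly two attributes.
Only the first element of the key is bound in the match head, so ``{X, '_'}'' acts as a prefix key.

```
-module(prefix_select).
-export([objects_with_prefix/2]).

%% Return all objects in Tab whose key is a 2-tuple {X, _}.
%% Assumes record name == table name and exactly two attributes.
objects_with_prefix(Tab, X) ->
    %% Bind only the first key element; the second is a wildcard,
    %% which turns the key pattern into a prefix key.
    MatchHead = {Tab, {X, '_'}, '_'},
    mnesia:dirty_select(Tab, [{MatchHead, [], ['$_']}]).
'''

For example, with the keys `{x, a}', `{x, b}' and `{y, a}' in an `ordered_set' table,
`objects_with_prefix(Tab, x)' returns the two objects whose keys start with `x', in key order.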
=== Customization ===
RocksDB supports a number of customization options. These can be specified
by providing a `{Key, Value}' list named `rocksdb_opts' under `user_properties',
for example:
```
mnesia:create_table(foo, [{rocksdb_copies, [node()]},
                          ...
                          {user_properties,
                           [{rocksdb_opts, [{max_open_files, 1024}]}]
                          }])
'''
Consult the <a href="https://github.com/facebook/rocksdb/wiki/Setup-Options-and-Basic-Tuning">RocksDB documentation</a>
for information on configuration parameters. Also see the section below on handling write errors.
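
Since these options are stored with the table definition, they can be read back with
`mnesia:table_info/2', for instance to check what a table was created with. The helper below is a
hypothetical sketch (module and function names are made up); it returns `[]' when the table has no
`rocksdb_opts' property.

```
-module(rocksdb_opts_lookup).
-export([table_rocksdb_opts/1]).

%% Return the rocksdb_opts list stored under user_properties for Tab,
%% or [] if the table was created without one.
table_rocksdb_opts(Tab) ->
    Props = mnesia:table_info(Tab, user_properties),
    proplists:get_value(rocksdb_opts, Props, []).
'''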
The default configuration for tables in `mnesia_rocksdb' is:
```
default_open_opts() ->
    [ {create_if_missing, true}
    , {cache_size,
       list_to_integer(get_env_default("ROCKSDB_CACHE_SIZE", "32212254"))}
    , {block_size, 1024}
    , {max_open_files, 100}
    , {write_buffer_size,
       list_to_integer(get_env_default(
                         "ROCKSDB_WRITE_BUFFER_SIZE", "4194304"))}
    , {compression,
       list_to_atom(get_env_default("ROCKSDB_COMPRESSION", "true"))}
    , {use_bloomfilter, true}
    ].
'''
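
The `get_env_default/2' helper used above is not shown here. Assuming it simply reads an OS
environment variable and falls back to the given default, it could be sketched as follows (module
and function names are hypothetical). Under that assumption, setting e.g. `ROCKSDB_CACHE_SIZE' in
the environment of the Erlang node changes the default cache size.

```
-module(env_default_sketch).
-export([get_env_default/2, cache_size/0]).

%% Hypothetical stand-in for the helper used by default_open_opts/0:
%% return the value of OS environment variable Key, or Default if unset.
get_env_default(Key, Default) ->
    os:getenv(Key, Default).

%% Mirrors the cache_size entry of default_open_opts/0.
cache_size() ->
    list_to_integer(get_env_default("ROCKSDB_CACHE_SIZE", "32212254")).
'''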
It is also possible, for larger databases, to produce a tuning parameter file.
This is experimental, and mostly copied from `mnesia_leveldb'. Consult the
source code in `mnesia_rocksdb_tuning.erl' and `mnesia_rocksdb_params.erl'.
Contributions are welcome.
=== Caveats ===
Avoid placing `bag' tables in RocksDB. Although they work, each write
requires additional reads, causing substantial runtime overheads. There
are better ways to represent and process bag data (see above about
<em>prefix keys</em>).
The `mnesia:table_info(T, size)' call always returns zero for RocksDB
tables. RocksDB itself does not track the number of elements in a table, and
although it is possible to make the `mnesia_rocksdb' backend maintain a size
counter, it would incur a high runtime overhead for writes and deletes, since it
would force each of them to first read the key to check whether it exists. If you
depend on having an up-to-date size count at all times, you need to maintain
it yourself. If you only need the size occasionally, you may traverse the
table to count the elements.
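
A minimal sketch of such a traversal, using `mnesia:foldl/3' (the module name is hypothetical).
Its cost grows linearly with the table size. Run inside a transaction, as here, `mnesia:foldl/3'
also takes a read lock on the whole table while it runs.

```
-module(table_count).
-export([count/1]).

%% Count the objects in Tab with one full traversal of the table.
count(Tab) ->
    mnesia:activity(
      transaction,
      fun() ->
              mnesia:foldl(fun(_Obj, N) -> N + 1 end, 0, Tab)
      end).
'''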
== Mnesia backend plugins ==
=== Background ===
Mnesia was initially designed to be a RAM-only DBMS, and Erlang's
`ets' tables were developed for this purpose. In order to support
persistence, e.g. for configuration data, a disk-based version of `ets'
(called `dets') was created. The `dets' API mimics the `ets' API,
and `dets' is quite convenient and fast for (nowadays) small datasets.
However, using a 32-bit bucket system, it is limited to 2GB of data.
It also doesn't support ordered sets. When used in Mnesia, dets-based
tables are called `disc_only_copies'.
To circumvent these limitations, another table type, called `disc_copies',
was added. This is a combination of `ets' and `disk_log', where Mnesia
periodically snapshots the `ets' data to a log file on disk, and meanwhile
maintains a log of updates, which can be applied at startup. These tables
are quite performant (especially on read access), but all data is kept in
RAM, which can become a serious limitation.
A backend plugin system was proposed by Ulf Wiger in 2016, and further
developed with Klarna's support, to finally become included in OTP 19.
Klarna uses a LevelDb backend, but Aeternity, in 2017, instead chose
to implement a Rocksdb backend plugin.
=== Design ===
As backend plugins were added to a mature, long-stable Mnesia code base,
they had to conform to the existing code structure. For this reason,
the plugin callbacks hook into the already present low-level access
API in the `mnesia_lib' module. As a consequence, backend plugins have
the same access semantics and granularity as `ets' and `dets'. This
isn't much of a disadvantage for key-value stores like LevelDb and RocksDB,
but a more serious issue is that the update part of this API is called
<em>after</em> the point of no return. That is, Mnesia does not expect
these updates to fail, and has no recourse if they do. As an aside,
this could also happen if a `disc_only_copies' table exceeds the 2 GB
limit (Mnesia will not check it, and `dets' will not complain, but will simply
drop the update).
== Mnesia index plugins ==
When adding support for backend plugins, index plugins were also added. Unfortunately, they remain undocumented.
An index plugin can be added in one of two ways:
<ol>
<li>When creating a schema, provide `{index_plugins, [{Name, Module, Function}]}' options.</li>
<li>Call the function `mnesia_schema:add_index_plugin(Name, Module, Function)'.</li>
</ol>
`Name' must be an atom wrapped as a 1-tuple, e.g. `{words}'.
The plugin callback is called as `Module:Function(Table, Pos, Obj)', where `Pos=={words}' in
our example. It returns a list of index terms.
<strong>Example</strong>
Given the following index plugin implementation:
```
-module(words).
-export([words_f/3]).

words_f(_,_,Obj) when is_tuple(Obj) ->
    words_(tuple_to_list(Obj)).

words_(Str) when is_binary(Str) ->
    string:lexemes(Str, [$\s, $\n, [$\r,$\n]]);
words_(L) when is_list(L) ->
    lists:flatmap(fun words_/1, L);
words_(_) ->
    [].
'''
We can register the plugin and use it in table definitions:
```
Eshell V12.1.3 (abort with ^G)
1> mnesia:start().
ok
2> mnesia_schema:add_index_plugin({words}, words, words_f).
{atomic,ok}
3> mnesia:create_table(i, [{index, [{words}]}]).
{atomic,ok}
'''
Note that in this case, we had neither a backend plugin, nor even a persistent schema.
Index plugins can be used with all table types. The registered indexing function (arity 3) must exist
as an exported function along the node's code path.
To see what happens when we insert an object, we can turn on call trace.
```
4> dbg:tracer().
{ok,<0.108.0>}
5> dbg:tp(words, x).
{ok,[{matched,nonode@nohost,3},{saved,x}]}
6> dbg:p(all,[c]).
{ok,[{matched,nonode@nohost,60}]}
7> mnesia:dirty_write({i,<<"one two">>, [<<"three">>, <<"four">>]}).
(<0.84.0>) call words:words_f(i,{words},{i,<<"one two">>,[<<"three">>,<<"four">>]})
(<0.84.0>) returned from words:words_f/3 -> [<<"one">>,<<"two">>,<<"three">>,
<<"four">>]
(<0.84.0>) call words:words_f(i,{words},{i,<<"one two">>,[<<"three">>,<<"four">>]})
(<0.84.0>) returned from words:words_f/3 -> [<<"one">>,<<"two">>,<<"three">>,
<<"four">>]
ok
8> dbg:ctp('_'), dbg:stop().
ok
9> mnesia:dirty_index_read(i, <<"one">>, {words}).
[{i,<<"one two">>,[<<"three">>,<<"four">>]}]
'''
(The fact that the indexing function is called twice seems like a performance bug.)
We can observe that the indexing callback is able to operate on the whole object.
It needs to be side-effect free and efficient, since it will be called at least once for each update.
(If an old object exists in the table, the indexing function will be called on it too, before it is
replaced by the new object.)
== Rocksdb ==
@end
@@ -0,0 +1,300 @@
# Using Mnesia Plugins #
Copyright (c) 2017-21 Aeternity Anstalt. All Rights Reserved.
__Authors:__ Ulf Wiger ([`ulf@wiger.net`](mailto:ulf@wiger.net)).
The Mnesia DBMS, part of Erlang/OTP, supports 'backend plugins', making
it possible to utilize more capable key-value stores than the `dets`
module (limited to 2 GB per table). Unfortunately, this support is
undocumented. Below, some informal documentation for the plugin system
is provided.
This user guide illustrates these concepts using `mnesia_rocksdb`
as an example.
We will deal with two types of plugin:
1. backend plugins
2. index plugins
A backend plugin is a module that implements a `mnesia_backend_type`
behavior. Each plugin can support any number of `aliases`, which
combined with the plugin module make up a `backend_type`.
When using `mnesia_rocksdb`, the default alias is `rocksdb_copies`,
and it is registered as a `{rocksdb_copies, mnesia_rocksdb}` pair.
Once registered, the alias can be used just like the built-in
backend types `ram_copies`, `disc_copies`, `disc_only_copies`.
Mnesia asks the plugin module which of the built-in types' semantics
the new type is supposed to mimic: ram-only, ram+disk or disk-only.
This is mainly relevant for how Mnesia checkpoints and backs up
data.
### <a name="Table_of_Contents">Table of Contents</a> ###
1. [Usage](#Usage)
   1. [Prerequisites](#Prerequisites)
   2. [Getting started](#Getting_started)
   3. [New indexing functionality](#New_indexing_functionality)
## Usage
### Prerequisites
- rocksdb (included as dependency)
- sext (included as dependency)
- Erlang/OTP 22.0 or newer (https://github.com/erlang/otp)
### Getting started
For the purposes of this user guide, we assume an unnamed, single-node
mnesia installation. The only place where plugins are affected by
distributed Mnesia is in the table sync callbacks. The simplest way
to get all paths in order for experimentation is to check out
`mnesia_rocksdb`, build it, and then call `rebar3 shell`. Unless
we note otherwise, this is how a node has been started for each example.
> Erlang shell interactions have been slightly beautified by eliding
> some text and by breaking and indenting some lines.
#### Adding a backend type to mnesia
There are three different ways, all undocumented, to register a
backend plugin in mnesia:
1. Add a `backend_types` option when creating the schema, using
   `mnesia:create_schema/2`:
   ```erlang
   Erlang/OTP 22 [erts-10.7] ...
   Eshell V10.7 (abort with ^G)
   1> mnesia:create_schema([node()],
                           [{backend_types,[{rocksdb_copies,mnesia_rocksdb}]}]).
   ok
   2> mnesia:start().
   ok
   3> mnesia_schema:backend_types().
   [ram_copies,disc_copies,disc_only_copies,rocksdb_copies]
   ```
   (In `mnesia_rocksdb`, a shortcut for this exists in `mnesia_rocksdb:create_schema(Nodes)`.)
2. Add it when starting mnesia, using `mnesia:start/1` (undocumented):
   ```erlang
   Eshell V10.7 (abort with ^G)
   1> mnesia:create_schema([node()]).
   ok
   2> mnesia:start([{schema,[{backend_types,
                              [{rocksdb_copies,mnesia_rocksdb}]}]}]).
   ok
   3> mnesia_schema:backend_types().
   [ram_copies,disc_copies,disc_only_copies]
   ```
3. Call `mnesia_schema:add_backend_type/2` when mnesia is running:
   ```erlang
   Eshell V10.7 (abort with ^G)
   1> mnesia:create_schema([node()]).
   ok
   2> mnesia:start().
   ok
   3> mnesia_schema:add_backend_type(rocksdb_copies,mnesia_rocksdb).
   {atomic,ok}
   4> mnesia_schema:backend_types().
   [ram_copies,disc_copies,disc_only_copies,rocksdb_copies]
   ```
In all cases the schema is updated, and other nodes, and subsequently
added nodes, will automatically receive the information.
The function `mnesia_schema:backend_types()` shows which backend plugin
aliases are registered.
The information is also displayed when calling `mnesia:info()`:
```erlang
5> mnesia:info().
---> Processes holding locks <---
---> Processes waiting for locks <---
---> Participant transactions <---
---> Coordinator transactions <---
---> Uncertain transactions <---
---> Active tables <---
schema : with 1 records occupying 443 words of mem
===> System info in version "4.16.3", debug level = none <===
opt_disc. Directory "/.../Mnesia.nonode@nohost" is used.
use fallback at restart = false
running db nodes = [nonode@nohost]
stopped db nodes = []
master node tables = []
backend types = rocksdb_copies - mnesia_rocksdb
remote = []
ram_copies = []
disc_copies = [schema]
disc_only_copies = []
[{nonode@nohost,disc_copies}] = [schema]
2 transactions committed, 0 aborted, 0 restarted, 0 logged to disc
0 held locks, 0 in queue; 0 local transactions, 0 remote
0 transactions waits for other nodes: []
ok
```
To illustrate how mnesia persists the information in the schema:
```erlang
6> mnesia:table_info(schema,user_properties).
[{mnesia_backend_types,[{rocksdb_copies,mnesia_rocksdb}]}]
```
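
Because the registration is stored in the schema, it only needs to happen once. A start-up routine
that may run repeatedly can therefore check `mnesia_schema:backend_types()` first, and call
`mnesia_schema:add_backend_type/2` only for an alias that is missing. The module below is a
hypothetical sketch of that idea (module and function names are made up). It assumes mnesia is
already running, and it does not rely on the exact error returned for an alias that already exists.

```erlang
-module(backend_reg).
-export([ensure_backend_type/2]).

%% Register Alias -> Mod as a backend type unless Alias is already known.
%% Assumes mnesia is running on this node.
ensure_backend_type(Alias, Mod) ->
    case lists:member(Alias, mnesia_schema:backend_types()) of
        true ->
            ok;
        false ->
            case mnesia_schema:add_backend_type(Alias, Mod) of
                {atomic, ok} -> ok;
                Other -> {error, Other}
            end
    end.
```

For example, `backend_reg:ensure_backend_type(rocksdb_copies, mnesia_rocksdb)` can be called on
every start without failing once the alias has been registered.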
#### Rocksdb registration shortcut
Call `mnesia_rocksdb:register()` after starting mnesia.
#### Creating a table
Put `{rocksdb_copies, [node()]}` into the table definitions of
tables you want to be in RocksDB.
```erlang
4> mnesia:create_table(t, [{rocksdb_copies,[node()]}]).
{atomic,ok}
5> rr(mnesia).
[commit,cstruct,cyclic,decision,log_header,mnesia_select,
 tid,tidstore]
6> mnesia:table_info(t,cstruct).
#cstruct{name = t,type = set,ram_copies = [],
         disc_copies = [],disc_only_copies = [],
         external_copies = [{{rocksdb_copies,mnesia_rocksdb},
                             [nonode@nohost]}],
         load_order = 0,access_mode = read_write,majority = false,
         index = [],snmp = [],local_content = false,record_name = t,
         attributes = [key,val],
         user_properties = [],frag_properties = [],
         storage_properties = [],
         cookie = {{1621758137965715000,-576460752303423420,1},
                   nonode@nohost},
         version = {{2,0},[]}}
```
In the example above, we take a peek at the `cstruct`, which is the
internal metadata structure for mnesia tables. The attribute showing
that the table has been created with a `rocksdb_copies` instance is
the `external_copies` attribute. It lists the alias, the callback module
and the nodes where the instances reside.
The table works essentially like one of the built-in table types.
If we want to find out which type, we can query the callback module:
```erlang
8> mnesia_rocksdb:semantics(rocksdb_copies, storage).
disc_only_copies
```
Consult the `mnesia_rocksdb` man page for more info on the
`Mod:semantics/2` function.
### New indexing functionality
With the introduction of backend plugins, a few improvements were made
to mnesia's indexing support.
#### Persistent indexes
In the past, and still with the built-in types, indexes were always
rebuilt on startup. Since backend plugins were introduced mainly in
order to support very large tables, a couple of callback functions
were added in order to detect whether a full rebuild is needed.
> The callback functions are `Mod:is_index_consistent/2` and
> `Mod:index_is_consistent/3`.
> The first function (figuratively) always returns `false` for indexes
> on built-in table types. Backend plugin modules should always return
> `false` if they have no information. After building the index, mnesia
> calls `Mod:index_is_consistent(Alias, IxTab, true)`, and the callback
> is expected to persist this information. `IxTab`, in this case, is
> a logical name for the index 'table': `{Tab, index, PosInfo}`
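
To make this contract concrete, here is a hypothetical sketch of the two callbacks (not how
`mnesia_rocksdb` implements them). It keeps one flag per `{Alias, IxTab}` pair in a term file, whose
name is an assumption, and answers `false` when nothing has been recorded, so that mnesia falls back
to a full rebuild. A real plugin would more likely store the flag together with the table data, so
that the flag and the index cannot get out of step.

```erlang
-module(ix_consistency_sketch).
-export([is_index_consistent/2, index_is_consistent/3]).

%% Hypothetical storage location for the consistency flags.
-define(FILE, "ix_consistency.term").

%% Can the persisted index IxTab be trusted? With no recorded
%% information, answer false so that mnesia rebuilds the index.
is_index_consistent(Alias, IxTab) ->
    maps:get({Alias, IxTab}, read_flags(), false).

%% Called by mnesia (with true) once it has built the index;
%% persist the flag so that it survives a restart.
index_is_consistent(Alias, IxTab, Bool) when is_boolean(Bool) ->
    Flags = maps:put({Alias, IxTab}, Bool, read_flags()),
    ok = file:write_file(?FILE, io_lib:format("~p.~n", [Flags])).

%% Read back all flags; a missing or unreadable file means "no information".
read_flags() ->
    case file:consult(?FILE) of
        {ok, [Flags]} when is_map(Flags) -> Flags;
        _ -> #{}
    end.
```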
#### Ordered indexes
A problem in the past with mnesia indexing has been that indexes with
very large fan-out were inefficient. Indexes were represented as `bag`
tables, and the cost of inserting a secondary key was proportional to
the number of identical secondary keys already in the index.
When backend plugin support was added, support for ordered indexes was
added as well, not least because the first candidate backend, LevelDb,
didn't handle bags well. Ordered indexes turn out to have much more stable
performance for indexes with large fan-out. They also work on all built-in
table types.
When creating an index, you can specify the type of index as `bag` or
`ordered`. If you omit the type, it will default to `bag` for built-in
table types, and for external types, whatever is the first type in the
list of supported index types returned by `Mod:semantics(Alias, index_types)`.
> For `mnesia_rocksdb`, only `ordered` is supported, but a bug in mnesia
> makes it ignore this, and try to create a bag index anyway. The
> `mnesia_rocksdb` plugin rejects this.
> Note that while e.g. mnesia_rocksdb supports regular bag tables, they are not
> efficiently implemented.
Mnesia currently doesn't allow specifying an index type in
`mnesia:add_table_index/2`, so simply indicate the index position,
and let the backend choose the default.
Having ordered indexes opens up some new possibilities, but
there are currently no functions in mnesia such as index_first, index_next
etc., or performing a select in index order.
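
The difference in insertion cost can be illustrated outside mnesia. The sketch below (a hypothetical
module, not how mnesia or `mnesia_rocksdb` implement indexes) models an ordered index as an ETS
`ordered_set` keyed on `{IndexValue, PrimaryKey}`. Every pair is its own key, so adding one more
primary key for a popular index value is an ordinary ordered insert, and all primary keys for one
index value form a contiguous range. It assumes that index values and primary keys contain no
match-spec variables such as `'_'` or `'$1'`.

```erlang
-module(ordered_ix_sketch).
-export([new/0, add/3, delete/3, lookup/2]).

%% One object {{IxVal, Key}} per index entry in an ordered_set.
%% An insert costs O(log N) no matter how many entries share IxVal.
new() ->
    ets:new(ordered_ix, [ordered_set]).

add(Ix, IxVal, Key) ->
    true = ets:insert(Ix, {{IxVal, Key}}),
    ok.

delete(Ix, IxVal, Key) ->
    true = ets:delete(Ix, {IxVal, Key}),
    ok.

%% Primary keys for IxVal, in key order. IxVal is bound and the primary
%% key is a variable, so the pattern is a prefix of the ordered key and
%% ETS can limit the traversal to the matching range.
lookup(Ix, IxVal) ->
    ets:select(Ix, [{{{IxVal, '$1'}}, [], ['$1']}]).
```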
#### Index plugins
Index plugins are a great new feature, also almost entirely undocumented.
An index plugin is a registered indexing function, which can operate
on the entire object, and shall return a list of secondary keys.
When registering an index plugin, it is given an alias, a callback module,
and a function name, not unlike backend plugins. The index plugin alias
must be an atom wrapped inside a 1-tuple, i.e. `{atom()}`.
To illustrate, we use a sample indexing function implemented in
mnesia_rocksdb, which checks all non-key attributes of an object,
and for each value that is a list, makes each list element a secondary
key value.
```erlang
9> mnesia_schema:add_index_plugin({lv}, mnesia_rocksdb, ix_listvals).
{atomic,ok}
10> mnesia:add_table_index(t,{lv}).
{atomic,ok}
11> mnesia:dirty_write({t,1,[a,b]}).
ok
12> mnesia:dirty_write({t,2,[b,c]}).
ok
13> mnesia:dirty_index_read(t,a,{lv}).
[{t,1,[a,b]}]
14> mnesia:dirty_index_read(t,b,{lv}).
[{t,1,[a,b]},{t,2,[b,c]}]
15> mnesia:dirty_index_read(t,c,{lv}).
[{t,2,[b,c]}]
```
For clarity, this is the implementation of the index callback:
```erlang
ix_listvals(_Tab, _Pos, Obj) ->
    lists:foldl(
      fun(V, Acc) when is_list(V) ->
              V ++ Acc;
         (_, Acc) ->
              Acc
      end, [], tl(tuple_to_list(Obj))).
```
Note that the index callback must be a pure function, as it
is also relied upon when deleting objects. That is, it must
always return the same values when called with a specific
set of input arguments.
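
The following hypothetical sketch (not mnesia's actual code) shows why this matters. When an object
is replaced, the index values to remove are obtained by calling the callback on the old object again.
If that call could return something other than what it returned when the object was written, the
stale entries would not be found and would stay in the index. Here `IxFun` is any arity-3 index
callback, such as `ix_listvals/3` above, and the function returns the index values to delete and the
ones to insert.

```erlang
-module(ix_diff_sketch).
-export([index_update/5]).

%% Model of index maintenance when Old is replaced by New in Tab.
%% Returns {ToDelete, ToInsert}; both rely on IxFun being pure.
index_update(IxFun, Tab, Pos, Old, New) ->
    OldVals = lists:usort(IxFun(Tab, Pos, Old)),
    NewVals = lists:usort(IxFun(Tab, Pos, New)),
    {OldVals -- NewVals, NewVals -- OldVals}.
```

For instance, with `ix_listvals/3`, replacing `{t,1,[a,b]}` by `{t,1,[b,c]}` yields `{[a],[c]}`:
only the entry for `a` is removed and only the entry for `c` is added.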
@@ -0,0 +1,55 @@
/* standard EDoc style sheet */
body {
font-family: Verdana, Arial, Helvetica, sans-serif;
margin-left: .25in;
margin-right: .2in;
margin-top: 0.2in;
margin-bottom: 0.2in;
color: #000000;
background-color: #ffffff;
}
h1,h2 {
margin-left: -0.2in;
}
div.navbar {
background-color: #add8e6;
padding: 0.2em;
}
h2.indextitle {
padding: 0.4em;
background-color: #add8e6;
}
h3.function,h3.typedecl {
background-color: #add8e6;
padding-left: 1em;
}
div.spec {
margin-left: 2em;
background-color: #eeeeee;
}
a.module {
text-decoration:none
}
a.module:hover {
background-color: #eeeeee;
}
ul.definitions {
list-style-type: none;
}
ul.index {
list-style-type: none;
background-color: #eeeeee;
}
/*
* Minor style tweaks
*/
ul {
list-style-type: square;
}
table {
border-collapse: collapse;
}
td {
padding: 3px;
}