diff --git a/README.md b/README.md index 1ee59b8..b68cd1b 100644 --- a/README.md +++ b/README.md @@ -16,21 +16,21 @@ is provided. ### Table of Contents ### -1. [Usage](https://github.com/aeternity/mnesia_rocksdb/blob/g3553-refactor-plugin-migration-tmp-220318/doc/README.md#Usage) -1. [Prerequisites](https://github.com/aeternity/mnesia_rocksdb/blob/g3553-refactor-plugin-migration-tmp-220318/doc/README.md#Prerequisites) -1. [Getting started](https://github.com/aeternity/mnesia_rocksdb/blob/g3553-refactor-plugin-migration-tmp-220318/doc/README.md#Getting_started) -1. [Special features](https://github.com/aeternity/mnesia_rocksdb/blob/g3553-refactor-plugin-migration-tmp-220318/doc/README.md#Special_features) -1. [Customization](https://github.com/aeternity/mnesia_rocksdb/blob/g3553-refactor-plugin-migration-tmp-220318/doc/README.md#Customization) -1. [Handling of errors in write operations](https://github.com/aeternity/mnesia_rocksdb/blob/g3553-refactor-plugin-migration-tmp-220318/doc/README.md#Handling_of_errors_in_write_operations) -1. [Caveats](https://github.com/aeternity/mnesia_rocksdb/blob/g3553-refactor-plugin-migration-tmp-220318/doc/README.md#Caveats) +1. [Usage](https://github.com/aeternity/mnesia_rocksdb/blob/uw-mrdb-ms/doc/README.md#Usage) +1. [Prerequisites](https://github.com/aeternity/mnesia_rocksdb/blob/uw-mrdb-ms/doc/README.md#Prerequisites) +1. [Getting started](https://github.com/aeternity/mnesia_rocksdb/blob/uw-mrdb-ms/doc/README.md#Getting_started) +1. [Special features](https://github.com/aeternity/mnesia_rocksdb/blob/uw-mrdb-ms/doc/README.md#Special_features) +1. [Customization](https://github.com/aeternity/mnesia_rocksdb/blob/uw-mrdb-ms/doc/README.md#Customization) +1. [Handling of errors in write operations](https://github.com/aeternity/mnesia_rocksdb/blob/uw-mrdb-ms/doc/README.md#Handling_of_errors_in_write_operations) +1. [Caveats](https://github.com/aeternity/mnesia_rocksdb/blob/uw-mrdb-ms/doc/README.md#Caveats) -1. 
[Mnesia backend plugins](https://github.com/aeternity/mnesia_rocksdb/blob/g3553-refactor-plugin-migration-tmp-220318/doc/README.md#Mnesia_backend_plugins) -1. [Background](https://github.com/aeternity/mnesia_rocksdb/blob/g3553-refactor-plugin-migration-tmp-220318/doc/README.md#Background) -1. [Design](https://github.com/aeternity/mnesia_rocksdb/blob/g3553-refactor-plugin-migration-tmp-220318/doc/README.md#Design) +1. [Mnesia backend plugins](https://github.com/aeternity/mnesia_rocksdb/blob/uw-mrdb-ms/doc/README.md#Mnesia_backend_plugins) +1. [Background](https://github.com/aeternity/mnesia_rocksdb/blob/uw-mrdb-ms/doc/README.md#Background) +1. [Design](https://github.com/aeternity/mnesia_rocksdb/blob/uw-mrdb-ms/doc/README.md#Design) -1. [Mnesia index plugins](https://github.com/aeternity/mnesia_rocksdb/blob/g3553-refactor-plugin-migration-tmp-220318/doc/README.md#Mnesia_index_plugins) +1. [Mnesia index plugins](https://github.com/aeternity/mnesia_rocksdb/blob/uw-mrdb-ms/doc/README.md#Mnesia_index_plugins) -1. [Rocksdb](https://github.com/aeternity/mnesia_rocksdb/blob/g3553-refactor-plugin-migration-tmp-220318/doc/README.md#Rocksdb) +1. [Rocksdb](https://github.com/aeternity/mnesia_rocksdb/blob/uw-mrdb-ms/doc/README.md#Rocksdb) @@ -80,7 +80,7 @@ RocksDB supports a number of customization options. These can be specified by providing a `{Key, Value}` list named `rocksdb_opts` under `user_properties`, for example: -```erlang +``` mnesia:create_table(foo, [{rocksdb_copies, [node()]}, ... {user_properties, @@ -93,7 +93,7 @@ for information on configuration parameters. Also see the section below on handl The default configuration for tables in `mnesia_rocksdb` is: -```erlang +``` default_open_opts() -> [ {create_if_missing, true} , {cache_size, @@ -131,18 +131,6 @@ depend on having an up to date size count at all times, you need to maintain it yourself. If you only need the size occasionally, you may traverse the table to count the elements. 
-When `mrdb` transactions abort, they will return a stacktrace caught -from within the transaction fun, giving much better debugging info. -This is different from how mnesia does it. - -If behavior closer to mnesia's abort returns are needed, say, for backwards -compatibility, this can be controlled by setting the environment variable -`-mnesia_rocksdb mnesia_compatible_aborts true`, or by adding a transaction -option, e.g. `mrdb:activity({tx, #{mnesia_compatible => true}}, fun() ... end)`. -For really performance-critical transactions which may abort often, it might -make a difference to set this option to `true`, since there is a cost involved -in producing stacktraces. - ### Mnesia backend plugins ### @@ -207,7 +195,7 @@ our example. It returns a list of index terms. Given the following index plugin implementation: -```erlang +``` -module(words). -export([words_f/3]). @@ -224,7 +212,7 @@ words_(_) -> We can register the plugin and use it in table definitions: -```erlang +``` Eshell V12.1.3 (abort with ^G) 1> mnesia:start(). ok @@ -240,7 +228,7 @@ as an exported function along the node's code path. To see what happens when we insert an object, we can turn on call trace. -```erlang +``` 4> dbg:tracer(). {ok,<0.108.0>} 5> dbg:tp(words, x). @@ -280,15 +268,17 @@ replaced by the new object.) - - - - - - - - - - -
mnesia_rocksdb
mnesia_rocksdb_admin
mnesia_rocksdb_app
mnesia_rocksdb_lib
mnesia_rocksdb_params
mnesia_rocksdb_sup
mnesia_rocksdb_tuning
mrdb
mrdb_index
mrdb_mutex
mrdb_select
+mnesia_rocksdb +mnesia_rocksdb_admin +mnesia_rocksdb_app +mnesia_rocksdb_lib +mnesia_rocksdb_params +mnesia_rocksdb_sup +mnesia_rocksdb_tuning +mrdb +mrdb_index +mrdb_mutex +mrdb_mutex_serializer +mrdb_select +mrdb_stats diff --git a/doc/README.md b/doc/README.md index 28c149d..0ae22f3 100644 --- a/doc/README.md +++ b/doc/README.md @@ -278,5 +278,7 @@ replaced by the new object.) mrdb mrdb_index mrdb_mutex -mrdb_select +mrdb_mutex_serializer +mrdb_select +mrdb_stats diff --git a/doc/mnesia_rocksdb.md b/doc/mnesia_rocksdb.md index cdb753e..2ac20de 100644 --- a/doc/mnesia_rocksdb.md +++ b/doc/mnesia_rocksdb.md @@ -267,7 +267,7 @@ table_type() = set | ordered_set | bag ### handle_call/3 ### -`handle_call(X1, From, St) -> any()` +`handle_call(M, From, St) -> any()` @@ -359,7 +359,7 @@ where Opts are parameters for the supervised child: ### load_table/4 ### -`load_table(Alias, Tab, LoadReason, Opts) -> any()` +`load_table(Alias, Tab, LoadReason, Props) -> any()` diff --git a/doc/mnesia_rocksdb_admin.md b/doc/mnesia_rocksdb_admin.md index 71ec35d..41b5fe6 100644 --- a/doc/mnesia_rocksdb_admin.md +++ b/doc/mnesia_rocksdb_admin.md @@ -48,7 +48,7 @@ cf() = mrdb:db_ref

-db_ref() = rocksdb:db_handle()
+db_ref() = rocksdb:db_handle()
 
@@ -108,7 +108,17 @@ reply() = any()

-req() = {create_table, table(), properties()} | {delete_table, table()} | {load_table, table()} | {related_resources, table()} | {get_ref, table()} | {add_aliases, [alias()]} | {write_table_property, tabname(), tuple()} | {remove_aliases, [alias()]} | {migrate, [{tabname(), map()}]} | {prep_close, table()} | {close_table, table()}
+req() = {create_table, table(), properties()} | {delete_table, table()} | {load_table, table(), properties()} | {related_resources, table()} | {get_ref, table()} | {add_aliases, [alias()]} | {write_table_property, tabname(), tuple()} | {remove_aliases, [alias()]} | {migrate, [tabname() | {tabname(), map()}], rpt()} | {prep_close, table()} | {close_table, table()} | {clear_table, table() | cf()}
+
+ + + + +### rpt() ### + + +

+rpt() = undefined | map()
 
@@ -146,7 +156,7 @@ tabname() = atom() ## Function Index ## -
add_aliases/1
close_table/2
code_change/3
create_table/3
delete_table/2
ensure_started/0
get_ref/1
get_ref/2
handle_call/3
handle_cast/2
handle_info/2
init/1
load_table/2
meta/0
migrate_standalone/2
prep_close/2
read_info/1
read_info/2
read_info/4
related_resources/2
remove_aliases/1
request_ref/2
start_link/0
terminate/2
write_info/4
write_table_property/3
+
add_aliases/1
clear_table/1
close_table/2
code_change/3
create_table/3
delete_info/3
delete_table/2
ensure_started/0
get_cached_env/2
get_ref/1
get_ref/2
handle_call/3
handle_cast/2
handle_info/2
init/1
load_table/3
meta/0
migrate_standalone/2
migrate_standalone/3
prep_close/2
read_info/1
read_info/2
read_info/4
related_resources/2
remove_aliases/1
request_ref/2
set_and_cache_env/2
start_link/0
terminate/2
write_info/4
write_table_property/3
@@ -159,6 +169,12 @@ tabname() = atom() `add_aliases(Aliases) -> any()` + + +### clear_table/1 ### + +`clear_table(Name) -> any()` + ### close_table/2 ### @@ -177,6 +193,12 @@ tabname() = atom() `create_table(Alias, Name, Props) -> any()` + + +### delete_info/3 ### + +`delete_info(Alias, Tab, K) -> any()` + ### delete_table/2 ### @@ -195,6 +217,12 @@ ensure_started() -> ok
+ + +### get_cached_env/2 ### + +`get_cached_env(Key, Default) -> any()` + ### get_ref/1 ### @@ -240,11 +268,11 @@ handle_info(Msg::any(), St::st()) -> + -### load_table/2 ### +### load_table/3 ### -`load_table(Alias, Name) -> any()` +`load_table(Alias, Name, Props) -> any()` @@ -256,7 +284,21 @@ handle_info(Msg::any(), St::st()) -> alias(), Tabs) -> Res + + + + + + +### migrate_standalone/3 ### + +

+migrate_standalone(Alias::alias(), Tabs, Rpt) -> Res
+
+ + @@ -300,6 +342,12 @@ handle_info(Msg::any(), St::st()) -> + +### set_and_cache_env/2 ### + +`set_and_cache_env(Key, Value) -> any()` + ### start_link/0 ### diff --git a/doc/mrdb.md b/doc/mrdb.md index c27d810..8738d60 100644 --- a/doc/mrdb.md +++ b/doc/mrdb.md @@ -26,8 +26,7 @@ follows: #{ name := , db_ref := , cf_handle := - , batch := - , tx_handle := + , activity := Ongoing batch or transaction, if any (map()) , attr_pos := #{AttrName := Pos} , mode := , properties := @@ -51,6 +50,16 @@ not be replicated. +### activity() ### + + +

+activity() = tx_activity() | batch_activity()
+
+ + + + ### activity_type() ### @@ -91,11 +100,21 @@ attr_pos() = #{atom() => pos()} +### batch_activity() ### + + +

+batch_activity() = #{type => batch, handle => batch_handle()}
+
+ + + + ### batch_handle() ###

-batch_handle() = rocksdb:batch_handle()
+batch_handle() = rocksdb:batch_handle()
 
@@ -105,7 +124,7 @@ batch_handle() = rocksdb:cf_handle() +cf_handle() = rocksdb:cf_handle() @@ -115,7 +134,7 @@ cf_handle() = rocksdb:db_handle() +db_handle() = rocksdb:db_handle() @@ -125,7 +144,7 @@ db_handle() = table(), alias => atom(), vsn => non_neg_integer(), db_ref => db_handle(), cf_handle => cf_handle(), semantics => semantics(), encoding => encoding(), attr_pos => attr_pos(), type => column_family | standalone, status => open | closed | pre_existing, properties => properties(), mode => mnesia, ix_vals_f => fun((tuple()) -> [any()]), batch => batch_handle(), tx_handle => tx_handle(), term() => term()} +db_ref() = #{name => table(), alias => atom(), vsn => non_neg_integer(), db_ref => db_handle(), cf_handle => cf_handle(), semantics => semantics(), encoding => encoding(), attr_pos => attr_pos(), type => column_family | standalone, status => open | closed | pre_existing, properties => properties(), mode => mnesia, ix_vals_f => fun((tuple()) -> [any()]), activity => activity(), term() => term()} @@ -165,7 +184,17 @@ index() = {tab_name(), index, any()}

-index_position() = atom() | pos()
+index_position() = atom() | pos() | plugin_ix_pos()
+
+ + + + +### inner() ### + + +

+inner() = non_neg_integer()
 
@@ -185,7 +214,7 @@ iterator_action() = first | last | next | prev | binary() | {seek, binary()} | {

-itr_handle() = rocksdb:itr_handle()
+itr_handle() = rocksdb:itr_handle()
 
@@ -211,6 +240,26 @@ key_encoding() = raw | sext | term +### match_pattern() ### + + +

+match_pattern() = matchpat_map() | ets:match_pattern()
+
+ + + + +### matchpat_map() ### + + +

+matchpat_map() = #{atom() => term()}
+
+ + + + ### mnesia_activity_type() ### @@ -251,6 +300,26 @@ obj() = tuple() +### outer() ### + + +

+outer() = non_neg_integer()
+
+ + + + +### plugin_ix_pos() ### + + +

+plugin_ix_pos() = {atom()}
+
+ + + + ### pos() ### @@ -265,7 +334,27 @@ pos() = non_neg_integer()

-properties() = #{record_name => atom(), attributes => [atom()], index => [{pos(), bag | ordered}]}
+properties() = #{record_name => atom(), attributes => [atom()], index => [{pos(), bag | ordered}], user_properties => #{propkey() => propvalue()}}
+
+ + + + +### propkey() ### + + +

+propkey() = any()
+
+ + + + +### propvalue() ### + + +

+propvalue() = any()
 
@@ -305,7 +394,7 @@ retainer() = {tab_name(), retainer, any()}

-retries() = non_neg_integer()
+retries() = outer() | {inner(), outer()}
 
@@ -325,7 +414,7 @@ semantics() = bag | set

-snapshot_handle() = rocksdb:snapshot_handle()
+snapshot_handle() = rocksdb:snapshot_handle()
 
@@ -351,11 +440,21 @@ table() = atom() | admin_tab() | tx_activity() ### + + +

+tx_activity() = #{type => tx, handle => tx_handle(), attempt => undefined | retries()}
+
+ + + + ### tx_handle() ###

-tx_handle() = rocksdb:transaction_handle()
+tx_handle() = rocksdb:transaction_handle()
 
@@ -365,7 +464,7 @@ tx_handle() = retries(), no_snapshot => boolean()} +tx_options() = #{retries => retries(), no_snapshot => boolean(), mnesia_compatible => boolean()} @@ -393,8 +492,8 @@ write_options() = [{sync, boolean()} | {disable_wal, boolean()} | {ignore_missin ## Function Index ## -
abort/1Aborts an ongoing activity/2
activity/3Run an activity (similar to //mnesia/mnesia:activity/2).
alias_of/1Returns the alias of a given table or table reference.
as_batch/2Creates a rocksdb batch context and executes the fun F in it.
as_batch/3as as_batch/2, but with the ability to pass Opts to rocksdb:write_batch/2
batch_write/2
batch_write/3
current_context/0
delete/2
delete/3
delete_object/2
delete_object/3
ensure_ref/1
ensure_ref/2
first/1
first/2
fold/3
fold/4
fold/5
get_batch/1
get_ref/1
index_read/3
insert/2
insert/3
iterator/1
iterator/2
iterator_close/1
iterator_move/2
last/1
last/2
match_delete/2
new_tx/1
new_tx/2
next/2
next/3
prev/2
prev/3
rdb_delete/2
rdb_delete/3
rdb_fold/4
rdb_fold/5
rdb_get/2
rdb_get/3
rdb_iterator/1
rdb_iterator/2
rdb_iterator_move/2
rdb_put/3
rdb_put/4
read/2
read/3
read_info/1
read_info/2
release_snapshot/1release a snapshot created by snapshot/1.
select/1
select/2
select/3
snapshot/1Create a snapshot of the database instance associated with the -table reference, table name or alias.
tx_commit/1
tx_ref/2
update_counter/3
update_counter/4
with_iterator/2
with_iterator/3
with_rdb_iterator/2
with_rdb_iterator/3
write_info/3
+
abort/1Aborts an ongoing activity/2
activity/3Run an activity (similar to //mnesia/mnesia:activity/2).
alias_of/1Returns the alias of a given table or table reference.
as_batch/2Creates a rocksdb batch context and executes the fun F in it.
as_batch/3as as_batch/2, but with the ability to pass Opts to rocksdb:write_batch/2
batch_write/2
batch_write/3
clear_table/1
current_context/0
delete/2
delete/3
delete_object/2
delete_object/3
ensure_ref/1
ensure_ref/2
first/1
first/2
fold/3
fold/4
fold/5
fold_reverse/3
fold_reverse/4
fold_reverse/5
get_batch/1
get_ref/1
index_read/3
insert/2
insert/3
iterator/1
iterator/2
iterator_close/1
iterator_move/2
last/1
last/2
match_delete/2
merge/3
merge/4
ms/2Produce a match specification for select(), supporting map-based match patterns.
new_tx/1
new_tx/2
next/2
next/3
prev/2
prev/3
rdb_delete/2
rdb_delete/3
rdb_fold/4
rdb_fold/5
rdb_fold_reverse/4
rdb_fold_reverse/5
rdb_get/2
rdb_get/3
rdb_iterator/1
rdb_iterator/2
rdb_iterator_move/2
rdb_put/3
rdb_put/4
read/2
read/3
read_info/1
read_info/2
release_snapshot/1release a snapshot created by snapshot/1.
select/1
select/2
select/3
select_reverse/2
select_reverse/3
snapshot/1Create a snapshot of the database instance associated with the +table reference, table name or alias.
tx_commit/1
tx_ref/2
update_counter/3
update_counter/4
with_iterator/2Create an iterator on table Tab for the duration of Fun
with_iterator/3Create an iterator on table Tab with ReadOptions for the duration of Fun
with_rdb_iterator/2
with_rdb_iterator/3
write_info/3
@@ -405,7 +504,10 @@ table reference, table name or alias. + +### clear_table/1 ### + +`clear_table(Tab) -> any()` + ### current_context/0 ### @@ -537,7 +653,7 @@ delete(Tab::ref_or_tab(), Key::ref_or_tab()) -> db_ref() +ensure_ref(R::ref_or_tab()) -> db_ref()
@@ -583,6 +699,24 @@ first(Tab::ref_or_tab(), Opts:: + +### fold_reverse/3 ### + +`fold_reverse(Tab, Fun, Acc) -> any()` + + + +### fold_reverse/4 ### + +`fold_reverse(Tab, Fun, Acc, MatchSpec) -> any()` + + + +### fold_reverse/5 ### + +`fold_reverse(Tab, Fun, Acc, MatchSpec, Limit) -> any()` + ### get_batch/1 ### @@ -682,6 +816,50 @@ last(Tab::ref_or_tab(), Opts:: + +### merge/3 ### + +`merge(Tab, Key, MergeOp) -> any()` + + + +### merge/4 ### + +`merge(Tab, Key, MergeOp, Opts) -> any()` + + + +### ms/2 ### + +

+ms(Tab::ref_or_tab(), Pat::[{match_pattern(), [term()], [term()]}]) -> ets:match_spec()
+
+
+ +Produce a match specification for select(), supporting map-based match patterns + +Using record syntax in match patterns tends to conflict with type checking. This +function offers an alternative approach, drawing on the fact that mnesia_rocksdb +keeps the record name and attribute names readily available as persistent terms. + +When using the map-based representation, the match pattern is built by matching +attribute names to map elements; any attribute not found in the map gets set to '_'. +Thus, +``` + [{#balance{key = {Acct,'$1'},_='_'},[{'>=','$1',Height}],['$_']}] +``` + can be +created as +``` + ms(balance,[{#{key => {Acct,'$1'}},[{'>=','$1',Height}],['$_']}]) +``` + +. + +This has the advantage over `ms_transform` that it can handle bound variables +in the match pattern. The function works on all mnesia table types. + ### new_tx/1 ### @@ -760,6 +938,18 @@ prev(Tab::ref_or_tab(), K::ke `rdb_fold(Tab, Fun, Acc, Prefix, Limit) -> any()` + + +### rdb_fold_reverse/4 ### + +`rdb_fold_reverse(Tab, Fun, Acc, Prefix) -> any()` + + + +### rdb_fold_reverse/5 ### + +`rdb_fold_reverse(Tab, Fun, Acc, Prefix, Limit) -> any()` + ### rdb_get/2 ### @@ -855,6 +1045,18 @@ release a snapshot created by [`snapshot/1`](#snapshot-1). `select(Tab, Pat, Limit) -> any()` + + +### select_reverse/2 ### + +`select_reverse(Tab, Pat) -> any()` + + + +### select_reverse/3 ### + +`select_reverse(Tab, Pat, Limit) -> any()` + ### snapshot/1 ### @@ -908,6 +1110,13 @@ with_iterator(Tab::ref_or_tab(), Fun::fun((
+Equivalent to [`with_iterator(Tab, Fun, [])`](#with_iterator-3). + +Create an iterator on table `Tab` for the duration of `Fun` + +The iterator is passed to the provided fun as `Fun(Iterator)`, and is +closed once the fun terminates. +
### with_iterator/3 ### @@ -917,6 +1126,15 @@ with_iterator(Tab::ref_or_tab(), Fun::fun((
+Create an iterator on table `Tab` with `ReadOptions` for the duration of `Fun` + +The iterator is passed to the provided fun as `Fun(Iterator)`, and is +closed once the fun terminates. + +The iterator respects `mnesia_rocksdb` metadata, so accesses through the iterator +will return `{ok, Obj}` where `Obj` is the complete decoded object. +For rocksdb-level iterators, see [`with_rdb_iterator/3`](#with_rdb_iterator-3). +
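A minimal sketch of the iterator lifecycle described for `with_iterator/3`, not taken from the commit itself. It assumes a table `Tab` that already exists under a running `mnesia_rocksdb` backend; the helper name `first_object/1` is hypothetical. It passes `[]` as the read options, which the `with_iterator/2` documentation gives as the equivalent default, and moves the iterator using `first`, one of the documented `iterator_action()` values.

```erlang
-module(with_iterator_sketch).
-export([first_object/1]).

%% Hypothetical helper: position an mrdb iterator on the first entry of Tab.
%% Per the doc text above, a successful move is expected to yield {ok, Obj};
%% the result for an empty table is not described in this diff, so callers
%% should not assume a particular shape for it.
first_object(Tab) ->
    mrdb:with_iterator(
      Tab,
      fun(It) ->
              %% The iterator is only valid inside this fun; it is closed
              %% once the fun terminates.
              mrdb:iterator_move(It, first)
      end,
      []).
```

Because the iterator is closed when the fun returns, the fun should return the data it needs rather than the iterator itself.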
### with_rdb_iterator/2 ### diff --git a/doc/mrdb_index.md b/doc/mrdb_index.md index 5b890a0..a79300f 100644 --- a/doc/mrdb_index.md +++ b/doc/mrdb_index.md @@ -36,7 +36,7 @@ iterator_action() = mrdb:iterator(), type = set | bag, sub = mrdb:ref() | pid()} +ix_iterator() = #mrdb_ix_iter{i = mrdb:mrdb_iterator(), type = set | bag, sub = pid() | mrdb:db_ref()} @@ -54,13 +54,32 @@ object() = tuple() ## Function Index ## -
iterator/2
iterator_close/1
iterator_move/2
with_iterator/3
+
fold/4
index_ref/2
iterator/2
iterator_close/1
iterator_move/2
rev_fold/4
select/3
select/4
select_reverse/3
select_reverse/4
with_iterator/3
## Function Details ## + + +### fold/4 ### + +

+fold(Tab::mrdb:ref_or_tab(), IxPos::mrdb:index_position(), FoldFun::fun((index_value(), object() | [], Acc) -> Acc), Acc) -> Acc
+
+ +
  • Acc = any()
+ + + +### index_ref/2 ### + +

+index_ref(Tab::mrdb:ref_or_tab(), Ix::mrdb:index_position()) -> mrdb:db_ref()
+
+
+ ### iterator/2 ### @@ -88,6 +107,40 @@ iterator_move(Mrdb_ix_iter::ix_iterator(), Dir::
+ + +### rev_fold/4 ### + +

+rev_fold(Tab::mrdb:ref_or_tab(), IxPos::mrdb:index_position(), FoldFun::fun((index_value(), object() | [], Acc) -> Acc), Acc) -> Acc
+
+ +
  • Acc = any()
+ + + +### select/3 ### + +`select(Tab, Ix, MS) -> any()` + + + +### select/4 ### + +`select(Tab, Ix, MS, Limit) -> any()` + + + +### select_reverse/3 ### + +`select_reverse(Tab, Ix, MS) -> any()` + + + +### select_reverse/4 ### + +`select_reverse(Tab, Ix, MS, Limit) -> any()` + ### with_iterator/3 ### diff --git a/doc/mrdb_mutex.md b/doc/mrdb_mutex.md index cc11dfc..0e2996d 100644 --- a/doc/mrdb_mutex.md +++ b/doc/mrdb_mutex.md @@ -9,7 +9,7 @@ ## Function Index ## -
do/2
ensure_tab/0
+
do/2
@@ -22,9 +22,3 @@ `do(Rsrc, F) -> any()` - - -### ensure_tab/0 ### - -`ensure_tab() -> any()` - diff --git a/doc/mrdb_select.md b/doc/mrdb_select.md index 295116d..1dfd187 100644 --- a/doc/mrdb_select.md +++ b/doc/mrdb_select.md @@ -9,25 +9,43 @@ ## Function Index ## -
fold/5
rdb_fold/5
select/1
select/3
select/4
+
continuation_info/2
fold/5
fold_reverse/5
rdb_fold/5
rdb_fold_reverse/5
select/1
select/3
select/4
select_reverse/3
select_reverse/4
## Function Details ## + + +### continuation_info/2 ### + +`continuation_info(Item, C) -> any()` + ### fold/5 ### `fold(Ref, Fun, Acc, MS, Limit) -> any()` + + +### fold_reverse/5 ### + +`fold_reverse(Ref, Fun, Acc, MS, Limit) -> any()` + ### rdb_fold/5 ### `rdb_fold(Ref, Fun, Acc, Prefix, Limit) -> any()` + + +### rdb_fold_reverse/5 ### + +`rdb_fold_reverse(Ref, Fun, Acc, Prefix, Limit) -> any()` + ### select/1 ### @@ -46,3 +64,15 @@ `select(Ref, MS, AccKeys, Limit) -> any()` + + +### select_reverse/3 ### + +`select_reverse(Ref, MS, Limit) -> any()` + + + +### select_reverse/4 ### + +`select_reverse(Ref, MS, AccKeys, Limit) -> any()` + diff --git a/doc/overview.md b/doc/overview.md new file mode 100644 index 0000000..331811d --- /dev/null +++ b/doc/overview.md @@ -0,0 +1,180 @@ +# Mnesia Rocksdb - Rocksdb backend plugin for Mnesia + +Copyright © 2013-21 Klarna AB + +__Authors:__Ulf Wiger ([`ulf@wiger.net`](mailto:ulf@wiger.net)). + +The Mnesia DBMS, part of Erlang/OTP, supports 'backend plugins', making it possible to utilize more capable key-value stores than the `dets` module (limited to 2 GB per table). Unfortunately, this support is undocumented. Below, some informal documentation for the plugin system is provided. + +### Table of Contents + +1. [Usage](#Usage) +1. [Prerequisites](#Prerequisites) +1. [Getting started](#Getting_started) +1. [Special features](#Special_features) +1. [Customization](#Customization) +1. [Handling of errors in write operations](#Handling_of_errors_in_write_operations) +1. [Caveats](#Caveats) + +1. [Mnesia backend plugins](#Mnesia_backend_plugins) +1. [Background](#Background) +1. [Design](#Design) + +1. [Mnesia index plugins](#Mnesia_index_plugins) +1. [Rocksdb](#Rocksdb) + +### Usage + +#### Prerequisites + +* rocksdb (included as dependency) +* sext (included as dependency) +* Erlang/OTP 21.0 or newer (https://github.com/erlang/otp) + +#### Getting started + +Call `mnesia_rocksdb:register()` immediately after starting mnesia. 
+ +Put `{rocksdb_copies, [node()]}` into the table definitions of tables you want to be in RocksDB. + +#### Special features + +RocksDB tables support efficient selects on *prefix keys*. + +The backend uses the `sext` module (see [`https://github.com/uwiger/sext`](https://github.com/uwiger/sext)) for mapping between Erlang terms and the binary data stored in the tables. This provides two useful properties: + +* The records are stored in the Erlang term order of their keys. +* A prefix of a composite key is ordered just before any key for which it is a prefix. For example, `{x, '_'}` is a prefix for keys `{x, a}`, `{x, b}` and so on. + +This means that a prefix key identifies the start of the sequence of entries whose keys match the prefix. The backend uses this to optimize selects on prefix keys. + +#### Customization + +RocksDB supports a number of customization options. These can be specified by providing a `{Key, Value}` list named `rocksdb_opts` under `user_properties`, for example: + +```text +mnesia:create_table(foo, [{rocksdb_copies, [node()]}, + ... + {user_properties, + [{rocksdb_opts, [{max_open_files, 1024}]}] + }]) +``` + +Consult the [RocksDB documentation](https://github.com/facebook/rocksdb/wiki/Setup-Options-and-Basic-Tuning) for information on configuration parameters. Also see the section below on handling write errors. + +The default configuration for tables in `mnesia_rocksdb` is: + +```text +default_open_opts() -> + [ {create_if_missing, true} + , {cache_size, + list_to_integer(get_env_default("ROCKSDB_CACHE_SIZE", "32212254"))} + , {block_size, 1024} + , {max_open_files, 100} + , {write_buffer_size, + list_to_integer(get_env_default( + "ROCKSDB_WRITE_BUFFER_SIZE", "4194304"))} + , {compression, + list_to_atom(get_env_default("ROCKSDB_COMPRESSION", "true"))} + , {use_bloomfilter, true} + ]. +``` + +It is also possible, for larger databases, to produce a tuning parameter file. This is experimental, and mostly copied from `mnesia_leveldb`.
Consult the source code in `mnesia_rocksdb_tuning.erl` and `mnesia_rocksdb_params.erl`. Contributions are welcome. + +#### Caveats + +Avoid placing `bag` tables in RocksDB. Although they work, each write requires additional reads, causing substantial runtime overheads. There are better ways to represent and process bag data (see above about *prefix keys*). + +The `mnesia:table_info(T, size)` call always returns zero for RocksDB tables. RocksDB itself does not track the number of elements in a table, and although it is possible to make the `mnesia_rocksdb` backend maintain a size counter, it incurs a high runtime overhead for writes and deletes since it forces them to first do a read to check the existence of the key. If you depend on having an up-to-date size count at all times, you need to maintain it yourself. If you only need the size occasionally, you may traverse the table to count the elements. + +### Mnesia backend plugins + +#### Background + +Mnesia was initially designed to be a RAM-only DBMS, and Erlang's `ets` tables were developed for this purpose. In order to support persistence, e.g. for configuration data, a disk-based version of `ets` (called `dets`) was created. The `dets` API mimics the `ets` API, and `dets` is quite convenient and fast for (nowadays) small datasets. However, using a 32-bit bucket system, it is limited to 2 GB of data. It also doesn't support ordered sets. When used in Mnesia, dets-based tables are called `disc_only_copies`. + +To circumvent these limitations, another table type, called `disc_copies`, was added. This is a combination of `ets` and `disk_log`, where Mnesia periodically snapshots the `ets` data to a log file on disk, and meanwhile maintains a log of updates, which can be applied at startup. These tables are quite performant (especially on read access), but all data is kept in RAM, which can become a serious limitation.
+ +A backend plugin system was proposed by Ulf Wiger in 2016, and further developed with Klarna's support, to finally become included in OTP 19. Klarna uses a LevelDb backend, but Aeternity, in 2017, instead chose to implement a Rocksdb backend plugin. + +### Design + +As backend plugins were added on a long-since legacy-stable Mnesia, they had to conform to the existing code structure. For this reason, the plugin callbacks hook into the already present low-level access API in the `mnesia_lib` module. As a consequence, backend plugins have the same access semantics and granularity as `ets` and `dets`. This isn't much of a disadvantage for key-value stores like LevelDb and RocksDB, but a more serious issue is that the update part of this API is called on *after* the point of no return. That is, Mnesia does not expect these updates to fail, and has no recourse if they do. As an aside, this could also happen if a `disc_only_copies` table exceeds the 2 GB limit (mnesia will not check it, and `dets` will not complain, but simply drop the update.) + +### Mnesia index plugins + +When adding support for backend plugins, index plugins were also added. Unfortunately, they remain undocumented. + +An index plugin can be added in one of two ways: + +1. When creating a schema, provide `{index_plugins, [{Name, Module, Function}]}` options. +1. Call the function `mnesia_schema:add_index_plugin(Name, Module, Function)` + +`Name` must be an atom wrapped as a 1-tuple, e.g. `{words}`. + +The plugin callback is called as `Module:Function(Table, Pos, Obj)`, where `Pos=={words}` in our example. It returns a list of index terms. + +__Example__ + +Given the following index plugin implementation: + +```text +-module(words). +-export([words_f/3]). + +words_f(_,_,Obj) when is_tuple(Obj) -> + words_(tuple_to_list(Obj)). + +words_(Str) when is_binary(Str) -> + string:lexemes(Str, [$\s, $\n, [$\r,$\n]]); +words_(L) when is_list(L) -> + lists:flatmap(fun words_/1, L); +words_(_) -> + []. 
+``` + +We can register the plugin and use it in table definitions: + +```text +Eshell V12.1.3 (abort with ^G) +1> mnesia:start(). +ok +2> mnesia_schema:add_index_plugin({words}, words, words_f). +{atomic,ok} +3> mnesia:create_table(i, [{index, [{words}]}]). +{atomic,ok} +``` + +Note that in this case, we had neither a backend plugin, nor even a persistent schema. Index plugins can be used with all table types. The registered indexing function (arity 3) must exist as an exported function along the node's code path. + +To see what happens when we insert an object, we can turn on call trace. + +```text +4> dbg:tracer(). +{ok,<0.108.0>} +5> dbg:tp(words, x). +{ok,[{matched,nonode@nohost,3},{saved,x}]} +6> dbg:p(all,[c]). +{ok,[{matched,nonode@nohost,60}]} +7> mnesia:dirty_write({i,<<"one two">>, [<<"three">>, <<"four">>]}). +(<0.84.0>) call words:words_f(i,{words},{i,<<"one two">>,[<<"three">>,<<"four">>]}) +(<0.84.0>) returned from words:words_f/3 -> [<<"one">>,<<"two">>,<<"three">>, + <<"four">>] +(<0.84.0>) call words:words_f(i,{words},{i,<<"one two">>,[<<"three">>,<<"four">>]}) +(<0.84.0>) returned from words:words_f/3 -> [<<"one">>,<<"two">>,<<"three">>, + <<"four">>] +ok +8> dbg:ctp('_'), dbg:stop(). +ok +9> mnesia:dirty_index_read(i, <<"one">>, {words}). +[{i,<<"one two">>,[<<"three">>,<<"four">>]}] +``` + +(The fact that the indexing function is called twice seems like a performance bug.) + +We can observe that the indexing callback is able to operate on the whole object. It needs to be side-effect free and efficient, since it will be called at least once for each update (if an old object exists in the table, the indexing function will be called on it too, before it is replaced by the new object.)
+ +### Rocksdb + +### Usage diff --git a/rebar.config b/rebar.config index e480c14..cafa227 100644 --- a/rebar.config +++ b/rebar.config @@ -34,7 +34,7 @@ {edown, %% Use as `rebar3 as edown do edoc` [ - {deps, [{edown, "0.8.4"}]}, + {deps, [{edown, "0.9.2"}]}, {edoc_opts, [{doclet, edown_doclet}, {app_default, "http://www.erlang.org/doc/man"},