Keeping CALM: When Distributed Consistency is Easy

Eventual consistency.

In Dok, I can mark some action as “not backtrack-able”. The actions backtrackable can be done following the CALM apprach and eventual consistency.

In Bud, that supports this paradigm, instead of queues like in the paradigm/actor-model, there are tables and messages are sent in batch, after each clock cycle.

http://bloom-lang.net/index.html

The CAP theorems says that strong consistency (SSI) and availability are unachievable in the presence of partitions.

FACT A new theorem proves that no consistency model stronger than causal consistency is available in the presence of partitions.

Causal consistency:

• each process writes are seen in order
• writes follow reads: if P read A=5 and write B=10, then another process P’ can not read B=10 and then an older value of A)
• transitive data dependencies hold

Causal consistency is still less than ACID transactions, but a lot of useful code can be written with causal-consistency guarantee, and it is easy to understand and predictable for the programmers.

There are many systems with very fast performances, only marginally slower than eventually-consistent counterpart (< 10%), that guarantee causal-consistency.

TODO see https://github.com/bloom-lang/bud/blob/master/docs/cheat.md for more details about Bloom language

TODO regardnig Calm and Bloom:

FACT Anna seems to favour copy of values also in case of shared RAM because it seems faster in any case. The same for seastars framework. They do not use shared-ram, NUMA. This reduce stress on TLB and cache coherence.

MAYBE transfer complex data between distributed Actors using some incremental compression way.

TODO read again the “Programming in E as Plan Coordination by Miller …”

• a lot of examples
• a lot of communication patterns
• example of error management
• omnicomprehensive: security, error management, reliability
• TODO integrate with Plan9 plumber idea: scriptable reaction to events, in a dbus-like setting, with data transformation
• TODO see if with Plumber integration, the integrated set of rules mantain an OCAP flavour, or it became an ACL (the users that can access to some events)

MAYBE a BUS of events/facts can be used for having CLIPS like generative rules and reacting to certain conditions. So it is a form of CLIPS, TupleSpaces, DashBoard and other similar design patterns.

TODO add also fuzzy rules to dashboard-like design patterns.

TODO probably it can be integrated with tech/syndicate.

TODO probably I can extend using the same approach used for Bloom:

• modules with override
• named rules with meta-annotations
• new facts are in the form of functions returing values and they are not directly added
• I can compile functional and other code to incremental update and rete-like code and the compiler can advise when this is not possible
• a variant of derived facts is compiled returning only new facts added from a certain time-stamp
• global code analysis is performed or understanding which types of queries and new facts are used and for producing only this

TODO integrate discussion on Calm with more papers on DokMelody about distributed systems of the same authors

TODO integrate Calm with capability model ideas in particular when I pass address of external process and resources

TODO in Calm support also :

• periodic
• lattices like max/bool/min/etc..

TODO there must be an analysis phase in which the coordination points must be completed

TODO say in Relational DBMS section how join are converted to Knil code, i.e. using explicit filter and so on

TODO represents STREAMS like Calm Process but give them a reasonable API

TODO support Process address in channels

TODO support local Process address @local_addr

MAYBE an interface is like a Channel with implicit local address

TODO derive graphs like in the Alvaro paper describing the dataflow of the code and where synchronization is needed

TODO @Include(…) can be used for compact data presentation in Knil when there are many repeated fields

TODO use the concept of Tuple* as a table also in Knil

## Why using Calm

In theory DBMS solve the problem of safe and fast transaction processing, because they guarantee ACID properties, but in practice for performance reasons rarely they support full ACID transactions in production, and for distributed DBMS there can be several consistency problems due the CAP theorem.

The maximum level of consistency supported by a distributed DBMS is casual consistency. (TODO specify better).

Current distributed DBMS use usually only eventually-consistency:

• at some point all the nodes will be informed of the writes
• at some point reads will be consistent
• in case of conflicts, there had to be a repair procedure
• good enough in real-world scenario
• usually in practice nodes can update themself in less than 1 second
• usually also banks can adopt repair procedures (compensation), in case of inconsistent coded
• it is possible (and product do) to calculate, or configure, the maximum times within data is not consistent between nodes, and calculate the probability of bad transactions, that are usually very low
• so an optimistic approach work good in this type of applications
• but writing compensation code is difficult and error-prone

Code with monotonic operations is consistent in a eventually-consistent DBMS. Monotonic operations are:

• accumulating values in a set
• testing a threshold conditions

These operations are non-monotonic, and so can cause inconsistency:

• set deletion
• counter resets
• negation conditions (not-exists)

PAPER: Anna: A KVS For Any Scale by Wu, Faleiro, Lin, Hellerstein

They take the idea of Bloom, and implemented a fast and scalable KVS database in C++.

Anna is also a library, and not only a product, because it can be used for implementing different things.

The design:

• one thread per core, without shared memory
• every thread spend all the time working on data, without data synchronization
• data is sharded
• there can be multiple-master: data is replicated for faster parallel access when it is needed, both for reading and writing
• also simple locks using fast atomic operations are very slow in high concurrently scenario, for example also incrementing the counter of next transaction-id (INCREDIBLE)
• threads comunicate using message passing
• threads coordinate the state using “distribuited lattices” that are a theory that enable various types of transaction isolations, also if not fully serializable
• using distribuited lattices they minimize the cost of synchronizations, because the temporal/order differences of messages can be merged in a unique state
• they allow multi-master mode: multiple nodes can have shard of the same data, and then they are merged using distributed lattices, so they remain consistent enough
• distributed lattice allow read committed, and read uncommitted consistency levels, but not lirearizable
• it is in any case more than many distributed KVS, and often in par with many relational DBMS as used in practice

FACT ideas are very similar to seastars approarch with a thread for core and in full control of the CPU and no shared RAM and similar things.

## https://www.reactivemanifesto.org

Reactive systems are:

• responsive: soft realtime, guarantee uptime bound, problems are discovered and signaled in advance, favour fast interaction with the user.
• resilient: the system stays responsive in face of failure, by replication, containement, isolation and delegation. Part of the system can fail, without compromizing the whole system. Parts of the systems are devoted to check wich sub-parts had failed. In practice this mean redundant hardware, and no off-line management.
• elastic: the system is reactive under varying workloads, and it scales linearly according the added resources. They provides relevant live performance measures.
• message driven: they use asynchronous message-passing, obtaining loose coupling, isolation, and location transparency.

cLarge systems are composed of smaller ones, following the same principles. So these rules apply at all level of scales.

So Calm should allows reactive systems:

• loggin of problems
• distributed services
• monitoring