Skip to content
Unbelievably space efficient data structures in Golang.
Branch: master
Clone or download
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.github Squashed .github/CODE_OF_CONDUCT.md master:.github/CODE_OF_CONDUCT.md… Mar 16, 2019
array bits: use universal int types Mar 17, 2019
bits bits: use universal int types Mar 17, 2019
docs/trie add memusage.md Mar 14, 2019
index index: move SlimIndex into a separate sub-package Mar 8, 2019
iohelper Add Makefile of checkings and fix reported minor issues Feb 21, 2019
marshal marshal: add signed int marshallers; fix typo in Makefile Mar 5, 2019
prototype array: refine proto definition Feb 25, 2019
serialize rename and refactor linters warning Feb 23, 2019
strhelper strhelper: add Feb 27, 2019
tools tools: add memory usage report Mar 13, 2019
trie
typehelper typehelper: addit Feb 27, 2019
vendor/github.com add dependence in vendor/ Feb 27, 2019
version array: add GetVersion() Feb 25, 2019
.gitignore gitignore .DS_Store Mar 14, 2019
.gitsubrepo .github: import CODE_OF_CONDUCT.md Mar 16, 2019
.travis.yml travis: tell travis to use makefile to test; test go-1.9 and newer Feb 23, 2019
Gopkg.lock refine trie search benchmark Feb 27, 2019
Gopkg.toml refine trie search benchmark Feb 27, 2019
LICENSE Initial commit Feb 2, 2019
Makefile .github: import CODE_OF_CONDUCT.md Mar 16, 2019
README.md .github: import CODE_OF_CONDUCT.md Mar 16, 2019
appveyor.yml add appveyor.yml for auto build on appveyor.com Feb 20, 2019
git-subrepo Squashed git-subrepo master:git-subrepo 4cf23f478 (2019-02-14 19:18:30) Mar 16, 2019
int-types.md array: converter.go add doc; use int type for Unmarshal() return value Feb 21, 2019

README.md

Slim - surprisingly space efficient data types in Golang

Travis-CI AppVeyor GoDoc Report card Sourcegraph

Slim is collection of surprisingly space efficient data types, with corresponding serialisation APIs to persisting them on-disk or for transport.

Why slim

As data on internet keeps increasing exponentially, the capacity gap between memory and disk becomes greater.

Most of the time, a data itself does not need to be loaded into expensive main memory. Only the much more important information, WHERE-A-DATA-IS, deserve a seat in main memory.

This is what slim does, keeps as little information as possible in main memory, as a minimised index of huge amount external data.

  • SlimIndex: is a common index structure, building on top of SlimTrie.

    GoDoc

  • SlimTrie is the underlying index data structure, evolved from trie.

    GoDoc

    Features:

    • Minimised: requires only 6 bytes per key(even less than an 8-byte pointer!!).

    • Stable: memory consumption is stable in various scenarios. The Worst case converges to average consumption tightly. See benchmark.

    • Unlimited key length: You can have VERY long keys, without bothering yourself with any waste of memory(and money). Do not waste your life writing another prefix compression:). (aws-s3 limits key length to 1024 bytes). Memory consumption only relates to key count, not to key length.

    • Ordered: like btree, keys are stored in alphabetic order, but faster to access. Range-scan is supported!(under development)

    • Fast: time complexity for a get is O(log(n) + k); n: key count; k: key length. With comparison to btree, which is O(log(n) * k)(tree-like). And golang-map is O(k)(hash-table-like).

    • Ready for transport: SlimTrie has no gap between its in-memory layout and its on-disk layout or transport presentation. Unlike other data-structure such as the popular red-black-tree, which is designed for in-memory use only and requires additional work to persist or transport it.

    • Loosely coupled design: index logic and data storage is completely separated. Piece of cake using SlimTrie to index huge data.

Memory overhead

Comparison of SlimTrie and native golang-map.

  • Key length: 512 byte, different key count:
Key count Key length SlimTrie: byte/key Map: byte/key
13107 512 7.3 542.2
16384 512 7.0 554.5
21845 512 7.2 544.9
  • Key count: 16384, different key length:
Key count Key length SlimTrie: byte/key Map: byte/key
16384 512 7.0 554.5
16384 1024 7.0 1066.5

Memory overhead per key is stable with different key length(k) and key count(n):

benchmark-mem-kn-png

SlimTrie memory does not increase when key become longer.

Performance benchmark

Time(in nano second) spent on a get operation with SlimTrie, golang-map and btree by google.

Smaller is better.

benchmark-get-png

Key count Key length SlimTrie Map Btree
1 1024 86.3 5.0 36.9
10 1024 90.7 59.7 99.5
100 1024 123.3 60.1 240.6
1000 1024 157.0 63.5 389.6
1000 512 152.6 40.0 363.0
1000 256 152.3 28.8 332.3

It is about 2.6 times faster than the btree by google.

Time(in nano second) spent on a get with different key count(n) and key length(k):

benchmark-get-kn-png

See: benchmark-get-md.

Status

  • 0.1.0(Current): index structure.

Synopsis

Use SlimIndex to index external data

package main
import (
        "fmt"
        "strings"
        "github.com/openacid/slim/index"
)

type Data string

func (d Data) Read(offset int64, key string) (string, bool) {
        kv := strings.Split(string(d)[offset:], ",")[0:2]
        if kv[0] == key {
                return kv[1], true
        }
        return "", false
}

func main() {
        // `data` is a sample external data.
        // In order to let SlimIndex be able to read data, `data` should have
        // a `Read` method:
        //     Read(offset int64, key string) (string, bool)
        data := Data("Aaron,1,Agatha,1,Al,2,Albert,3,Alexander,5,Alison,8")

        // keyOffsets is a prebuilt index that stores key and its offset in data accordingly.
        keyOffsets := []index.OffsetIndexItem{
                {Key: "Aaron",     Offset: 0},
                {Key: "Agatha",    Offset: 8},
                {Key: "Al",        Offset: 17},
                {Key: "Albert",    Offset: 22},
                {Key: "Alexander", Offset: 31},
                {Key: "Alison",    Offset: 43},
        }

        // Create an index
        st, err := index.NewSlimIndex(keyOffsets, data)
        if err != nil {
                fmt.Println(err)
        }

        // Lookup by SlimIndex
        v, found := st.Get2("Alison")
        fmt.Printf("key: %q\n  found: %t\n  value: %q\n", "Alison", found, v)

        v, found = st.Get2("foo")
        fmt.Printf("key: %q\n  found: %t\n  value: %q\n", "foo", found, v)

        // Output:
        // key: "Alison"
        //   found: true
        //   value: "8"
        // key: "foo"
        //   found: false
        //   value: ""
}

Getting started

Install

go get github.com/openacid/slim

All dependency packages are included in vendor/ dir.

Prerequisites

  • For users (who'd like to build cool stuff with slim):

    Nothing.

  • For contributors (who'd like to make slim better):

    • dep: for dependency management.
    • protobuf: for re-compiling *.proto file if on-disk data structure changes.

    Max OS X:

    brew install dep protobuf

    On other platforms you can read more: dep-install, protoc-install.

Who are using slim

baishancloud

Roadmap

  • 2019 Mar 08: SlimIndex, SlimTrie
  • Marshalling support
  • SlimArray

Feedback and contributions

Feedback and Contributions are greatly appreciated.

At this stage, the maintainers are most interested in feedback centred on:

  • Do you have a real life scenario that slim supports well, or doesn't support at all?
  • Do any of the APIs fulfil your needs well?

Let us know by filing an issue, describing what you did or wanted to do, what you expected to happen, and what actually happened:

Or other type of issue.

Authors

See also the list of contributors who participated in this project.

License

This project is licensed under the MIT License - see the LICENSE file for details.

You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.