Get up and running with CouchDb and C# using MyCouch on Windows

This will show you how easy it is to get CouchDb up and running on a Windows machine. I will show you how to use the builtin admin tool to secure the server as well as the database that we will create. I will then show you how easy it is to interact with it using a C# client called, MyCouch.

Install CouchDb (5min)

1) Download CouchDb from the official site

2) Select the Windows download, I selected: Windows (x86) Erlang/OTP R15B03-1 | Version 1.3.0.

gurwcam-download

3) After download is complete, run it. I went with the provided default options:

gurwcam-install1

gurwcam-install2

4) Test it. Fire up a browser and head over to: http://localhost:5984. You will be presented with something similar to this:

{
    "couchdb": "Welcome",
    "uuid": "c9e34dba2a5cc89e63cbd96be61c7529",
    "version": "1.3.0",
    "vendor": {
        "version": "1.3.0",
        "name": "The Apache Software Foundation"
    }
}

Well done, first task completed!

Use the admin tool to secure the server and our test database

This chapter shows you more than what is required just to test things out, so if you are not interested in the security parts, just ensure you have created the test database before continuing to the next chapter.

CouchDb comes with a web based admin tool called Futon. You find it by navigating to: http://localhost:5984/_utils/

gurwcam-hellofuton

Here you can manage your CouchDb installation like securing it, adding users, creating databases, designing views, managing documents in the database etc. What is important to note is the little text in the right corner:

Welcome to Admin Party!
Everyone is admin. Fix this

If you click the "Fix this" link, you will be presented with a dialog where you can create a server admin. Read up on this.

gurwcam-createserveradmin

Now, log out and then try to create a database:

gurwcam-createdbnotallowed

Nice! At least somewhat more secure. You should also note that CouchDb has support for activating SSL.

Login to Futon again and create a database called "test". And the press the Security icon. You should be presented with:

gurwcam-dbsecurity

Here you can specify specific users or roles by providing either user names or roles for:

  • database admins (NOT server admins).
  • members

The user names and roles should match credentials for users stored in the _users database (open). In there the server admin user already exists. Lets create two new users. One that will be database admin for our test database and one that will be a simple member of it.

gurwcam-createdba

{
   "_id": "org.couchdb.user:danielwertheim1",
   "name": "danielwertheim1",
   "password": "testtest1",
   "roles": ["dba"],
   "type": "user"
}

gurwcam-createmember

{
   "_id": "org.couchdb.user:danielwertheim2",
   "name": "danielwertheim2",
   "password": "testtest2",
   "roles": [],
   "type": "user"
}

The passwords will NOT be stored in clear text, but will be encrypted upon saved. Also note that the first user was assigned a role dba while the second user wasn’t assigned any roles.

Now, head back to the test database and click the security icon and fill in the form:

gurwcam-securethedb

Voila! It’s now secured. But again, you really should look into the link above regarding SSL, since CouchDb uses HTTP as transport protocol, hence credentials will be transmitted unsecured.

HTTP you say

Yes, CouchDb uses HTTP as its transport protocol and it is REST based, hence you could easily interact with it using any HTTP enabled client. My favorite tool for this is: Postman; which is a simple extension to Google Chrome.

Let me show you have to insert a simple document representing an artist using Postman. The JSON for the document looks like this:

{
    "_id":"1",
    "$doctype":"artist",
    "name": "Super duper artist",
    "albums":[
        {"name": "Super duper album #1", "tracks": 12},
        {"name": "Super duper album #2", "tracks": 10}
    ]
}

_id is a CouchDb reserved field that holds the id of the document. I’ve also added a field $doctype which is not a CouchDb field, but something I use to indicate what kind of entity the document represents. This is useful when it comes to authoring views (as we will see later on). Some people include this info in the _id: "_id":"artist:1"; but that will increase the _id size which will have negative effects on sizes of your views etc., since they always outputs the _id.

Time to try it out. Execute a POST against http://localhost:5984/test/ where test addresses the database we created before.

gurwcam-postmanunauthpost

Notice the response. We secured the database hence we need to provide credentials for an user that is a member. I will use basic authentication for this.

gurwcam-postmanbasicauth

Now if we retry the POST it shall succeed.

gurwcam-postmanauthpost

Pay attention to the response. It’s an 201 and it has the _id but also a new reserved CouchDb field, the _rev field. That field will be assigned a new unique value every time the document is changed. And that is how CouchDb handles concurrency checking etc. To update or delete a document, you need to provide the latest matching _rev. If not, CouchDb will not accept your update and will note you about it. Lets try it. Lets do a PUT to add a new album to the previously added artist. Execute a PUT against the following address:

http://localhost:5984/test/1?rev=1-6474d7cfb6040bdc24d697d013c9e868

And provide the following document:

{
    "_id":"1",
    "$doctype":"artist",
    "name": "Super duper artist",
    "albums":[
        {"name": "Super duper album #1", "tracks": 12},
        {"name": "Super duper album #2", "tracks": 10},
        {"name": "Super duper album #3", "tracks": 2}
    ]
}

Note that the address now points to the document with _id being 1 in the database having the name test. We also include the current _rev, and then a new document. Since CouchDb does not work with partial document updates, we have to provide the complete document. The response is an 201 with the following content:

{
    "ok": true,
    "id": "1",
    "rev": "2-8b8e0ed734dad8f7175bde5e6bff1161"
}

There are solutions you can implement to work around the fact that CouchDb updates the whole document. Read about "Update handlers".

If you try and execute the exact same request again, your request will not be accepted. This, since you are not using the latest known _rev. The response will be and 409 Conflict with the following content:

{
    "error": "conflict",
    "reason": "Document update conflict."
}

If you would like to get the document, just execute a GET using the following address: http://localhost:5984/test/1; and you will get the document in response.

gurwcam-postmanget

Views

Lets take a crash course in CouchDb views. For more in depth material I recommend you to go through these resources:

While you were fetching a coffee or reading some Twitter feeds our Facebooking, I took the liberty to add two more artists with some more albums so the database now contains a total of three artists. Now, lets look at them using a view. Before we create our own view, lets use a built in system view called all_docs. Execute a GET against the following address: http://localhost:5984/test/_all_docs. The response will be something like this:

{
    "total_rows": 3,
    "offset": 0,
    "rows": [
        {
            "id": "1",
            "key": "1",
            "value": {
                "rev": "2-8189ea9db5d99febbdff578aaa15f08d"
            }
        },
        {
            "id": "2",
            "key": "2",
            "value": {
                "rev": "1-9dbcefdd8a022731f10e693d7327d3dc"
            }
        },
        {
            "id": "3",
            "key": "3",
            "value": {
                "rev": "1-8efdf5f4c63b3f23ad48250450f13f4f"
            }
        }
    ]
}

There’s a special flag you can include to force the view to also load in the docs with the result. Do the same request and this time append include_docs=true in the query string.

{
    "total_rows": 3,
    "offset": 0,
    "rows": [
        {
            "id": "1",
            "key": "1",
            "value": {
                "rev": "2-8189ea9db5d99febbdff578aaa15f08d"
            },
            "doc": {
                "_id": "1",
                "_rev": "2-8189ea9db5d99febbdff578aaa15f08d",
                "$doctype": "artist",
                "name": "Super duper artist",
                "albums": [
                    {
                        "name": "Super duper album #1",
                        "tracks": 12
                    },
                    {
                        "name": "Super duper album #2",
                        "tracks": 10
                    },
                    {
                        "name": "Super duper album #3",
                        "tracks": 2
                    }
                ]
            }
        },
        {
            "id": "2",
            "key": "2",
            "value": {
                "rev": "1-9dbcefdd8a022731f10e693d7327d3dc"
            },
            "doc": {
                "_id": "2",
                "_rev": "1-9dbcefdd8a022731f10e693d7327d3dc",
                "$doctype": "artist",
                "name": "Super bad artist",
                "albums": [
                    {
                        "name": "Super bad album #1",
                        "tracks": 77
                    }
                ]
            }
        },
        {
            "id": "3",
            "key": "3",
            "value": {
                "rev": "1-8efdf5f4c63b3f23ad48250450f13f4f"
            },
            "doc": {
                "_id": "3",
                "_rev": "1-8efdf5f4c63b3f23ad48250450f13f4f",
                "$doctype": "artist",
                "name": "Awesome artist",
                "albums": [
                    {
                        "name": "Awesome album #1",
                        "tracks": 3
                    },
                    {
                        "name": "Awesome album #2",
                        "tracks": 10
                    }
                ]
            }
        }
    ]
}

You can also access this view via Futon. Just open the following URL in your browser: http://localhost:5984/_utils/database.html?test/_all_docs

gurwcam-futonalldocs

Create a custom view

Lets have a look at creating a simple custom view that only will output the albums of an artist. You can do this using Futon. Open: http://localhost:5984/_utils/database.html?test/_temp_view and type in the following map function.

function(doc) {
  if(doc.$doctype !== 'artist')
    return;

  emit(doc.name, doc.albums);
}

gurwcam-futonnewview

If I chose to run the view, I get presented by the view result. Have in mind though that running a temporary view will traverse all your documents. Doing this is resource intensive, hence only suitable for development environments. For production use, you should store your views in a design document (a design document can hold many views).

Lets say we are happy with this view and want to "publish" it by making it part of a design document. Then all we have to do is to press the Save As....

gurwcam-futonsaveview

The view is now saved as part of the design document but yet not "materialized". The view will be materialized the first time it is queried. This goes for all the vies in the same document:

To change a view or multiple view just alter the design document they are stored in and save it as a new revision. This causes all the views in that design document to be rebuilt on the next access in case the view code has been changed – http://wiki.apache.org/couchdb/HTTPviewAPI

Some other readings in the subject of design documents and views:

Consume the view

Lets just query the view as it is, without limiting the result. Execute a GET against http://localhost:5984/test/_design/artists_albums/_view/albums_by_artist

{
    "total_rows": 3,
    "offset": 0,
    "rows": [
        {
            "id": "3",
            "key": "Awesome artist",
            "value": [
                {
                    "name": "Awesome album #1",
                    "tracks": 3
                },
                {
                    "name": "Awesome album #2",
                    "tracks": 10
                }
            ]
        },
        {
            "id": "2",
            "key": "Super bad artist",
            "value": [
                {
                    "name": "Super bad album #1",
                    "tracks": 77
                }
            ]
        },
        {
            "id": "1",
            "key": "Super duper artist",
            "value": [
                {
                    "name": "Super duper album #1",
                    "tracks": 12
                },
                {
                    "name": "Super duper album #2",
                    "tracks": 10
                },
                {
                    "name": "Super duper album #3",
                    "tracks": 2
                }
            ]
        }
    ]
}

CouchDb lets you "filter" your views in a variety of ways (http://wiki.apache.org/couchdb/HTTPviewAPI). Lets do a quick test. Lets just target one row. The row which has a key being "Super bad artist". Easy, just append key to the query string and provide an URL encoded value. Execute a GET against http://localhost:5984/test/_design/artists_albums/_view/albums_by_artist?key="Super bad artist"

{
    "total_rows": 3,
    "offset": 1,
    "rows": [
        {
            "id": "2",
            "key": "Super bad artist",
            "value": [
                {
                    "name": "Super bad album #1",
                    "tracks": 77
                }
            ]
        }
    ]
}

That was a brief tour on views. And even though we only scratched the surface of views and only used map functions, I hope it was enough for this introductory post, and let us now switch over to using the C# driver.

Introducing MyCouch

MyCouch is a simple async, open sourced, CoucDb client written in C#. It builds on top of the async HttpClient and lets you work with native JSON and/or entities/POCOs. With that said, remember, CouchDb uses HTTP as a protocol, hence if you find something missing or not acting as you want it, you could easily acomplish what you want using pure http-requests. MyCouch tries to keep the domain language of CouchDb and not hiding or abstracting away the fact that it is http based.

Install it

Lets have a quick look at how we would accomplish some of the steps we have done above. For simplicity I have just created a simple Console application using Visual Studio 2012 (there is a sample console app in the GitHub repository). MyCouch is distributed via NuGet. So to install it, open the Package Manager Console and invoke:

PM> Install-Package MyCouch

Now, lets start typing some code. The first task is to add yet another artist. What you use to communicate with CouchDb in MyCouch, is an implementation of MyCouch.IClient. There’s one, MyCouch.Client which accepts either a custom connection or a simple URL/Uri pointing to your database (read more).

static void Main(string[] args)
{
    using (var db = new MyCouch.Client("http://danielwertheim1:testtest1@localhost:5984/test"))
    {
        //Consume client here
    }
}

Since we have secured our database, I somehow need to pass the credentials. Here, I have chosen to do it using Basic http authentication (read more).

Documents

MyCouch lets you work with native JSON documents or entities/POCOs (read more) and you can either run it synchronously or asynchronously (read more). Lets start with inserting an artist using JSON. Inserting is done using a POST or PUT (according to docs, PUT is recommended).

var artistJson = "{\"_id\": \"4\", 
                   \"$doctype\": \"artist\",
                   \"name\": \"Fake artist 1\", 
                   \"albums\":[{\"name\": \"Greatest fakes #1\", \"tracks\": 3}]}";

var response = db.Documents.Post(artistJson);

//for async: db.Documents.PostAsync(artistJson);

Console.Write(response.GenerateToStringDebugVersion());

Every command in MyCouch will provide you with a response object in turn. The response will look a bit differently depending on what command you executed (read more).

RequestUri: http://localhost:5984/test/
RequestMethod: POST
Status: Created(201)
Error:<NULL>
Reason: <NULL>
Id: 4
Rev: 1-2cc0876b59cadf2e0271f03a7ea54c5c

To update the artist, lets say by adding a new album, you invoke a PUT:

var artistJson = "{\"_id\": \"test:4\", 
                   \"$doctype\": \"artist\",
                   \"name\": \"Fake artist 1\", 
                   \"albums\":[
                       {\"name\": \"Greatest fakes #1\", \"tracks\": 3},
                       {\"name\": \"Greatest fakes #2\", \"tracks\": 6}]}";

var response = db.Documents.Put("5", "1-2cc0876b59cadf2e0271f03a7ea54c5c", artistJson);

Console.Write(response.GenerateToStringDebugVersion());

The response:

RequestUri: http://localhost:5984/test/4?rev=1-2cc0876b59cadf2e0271f03a7ea54c5c
RequestMethod: PUT
Status: Created(201)
Error:<NULL>
Reason: <NULL>
Id: 4
Rev: 2-9d756cc83e3937f953648032ef8629bb

How about conflicts

Well, lets try it. Invoke the exact same PUT as we just did and lets inspect parts of the response:

Status: Conflict(409)
Error:conflict
Reason: Document update conflict.

Entities

Before looking at using views to query data, lets have a quick look at the Entity API. To insert a new artist like we did above, you would first need a model (read more about member names for _id and _rev etc).

public class Artist
{
    public string ArtistId { get; set; }
    public string ArtistRev { get; set; }

    public string Name { get; set; }
    public Album[] Albums { get; set; }
}

public class Album
{
    public string Name { get; set; }
    public int Tracks { get; set; }
}

Now, you could just use it to work with entities, e.g doing the same as we did with JSON previously.

var artist = new Artist
{
    ArtistId = "5",
    Name = "Foo bar",
    Albums = new[]
    {
        new Album { Name = "Hello world!", Tracks = 9 }
    }
};

var response = db.Entities.Post(artist);

Console.Write(response.GenerateToStringDebugVersion());

The response:

RequestUri: http://localhost:5984/test/
RequestMethod: POST
Status: Created(201)
Error:<NULL>
Reason: <NULL>
Id: 5
Rev: 1-725e193084247fcf7cc84e4cfa179746
IsEmpty: False
Model: Artist

In the JSON sample that we saw previously, we explicitly specified a meta field $doctype. When you work with entities, MyCouch will handle this for you, and it will be default use the name of the class as the value for $doctype. When working with entities, MyCouch will, on successful operation, extract e.g the _rev from the response and assign it to the entity, which also is returned via the response property: response.Entity.

Views

Apart from using MyCouch to perform simple and fast GET by id requests, you can also use it to query views (read more). There are different ways of doing this, but I will only show one way. But do not fear, it’s documented. But before querying the view, lets look at how we could create the view we previously created using Futon, but this time using MyCouch.

Creating a view

A view is, as previously stated, part of a design document which is a plain document; hence you can create one by just inserting a document. Lets have a look. First lets define it:

public static class Views
{
    public const string ArtistsAlbums =
        "{" +
            "\"_id\": \"_design/artists_albums\"," +
            "\"language\": \"javascript\"," +
            "\"views\": {" +
                "\"albums_by_artist\": {" +
                    "\"map\": \"function(doc) {  if(doc.$doctype !== 'artist') return;  emit(doc.name, doc.albums);}\"" +
                "}" +
            "}" +
        "}";
}

Now we could just use the client and POST or PUT it into the db:

db.Documents.Post(Views.ArtistsAlbums);

Remember, that you will get a conflict error if you didn’t delete the view we created before.

Query the view

Now that the view is in place, lets consume it. Views are accessed via the View API, hence using db.Views. Queries can be executed in two ways, via: RunQuery or Query. The difference in short:

Query<>() method will create a query and run the query for you using RunQuery. RunQuery lets you work freely with any IViewQuery, so that you could run the same query against multiple databases.

You can use a builtin chained-configure API to configure your queries, or directly assign values to query.Options. I’ll use the former:

gurwcam-mycouchrunquery

As you can see, there are a bunch of query capabilities and you can read about them in the documentation. I’ll go for a simple Key filter.

var query = new MyCouch.Querying.ViewQuery("artists_albums", "albums_by_artist");
query.Configure(cfg => cfg.Key("Fake artist 1"));

var response = db.Views.RunQuery<Album[]>(query);

Console.Write(response.GenerateToStringDebugVersion());

The response:

RequestUri: http://localhost:5984/test/_design/artists_albums/_view/albums_by_artist?key="Fake artist 1"
RequestMethod: GET
Status: OK(200)
Error:<NULL>
Reason: <NULL>
IsEmpty: False
TotalRows: 5
RowCount:1
Offset: 1

So you get some meta data that could be used for paging etc, but where are the albums? They are in the response.Rows.

gurwcam-mycouchentityqueryresult

Of course you could get the result as JSON if you want:

gurwcam-mycouchjsonqueryresult

Or as a JSON-array:

gurwcam-mycouchjsonarrayqueryresult

Summary

That’s it for now. I hope it wasn’t to overwhelming. I do realize I probably left you with a bunch of question but I hope you could find some of the answers in the project wiki. If not, just drop a comment here or ping me on Twitter: @danielwertheim or tweet with the hashtag #mycouch.

Cheers,

//Daniel

Introducing MyCouch – a Simple async CouchDb client for .Net

Just got a new NuGet out – “MyCouch – a Simple async CouchDb client for .Net” – which uses the async HTTP-client to interact with the HTTP API of CouchDb. It tries to mimic the domain-language and structure of CouchDb and lets you work with pure JSON or using entities/POCOs. The future will bring builtin support for caching, proxies/sharding, etc. Go grab it now and try it out. It’s on NuGet. Documentation is improved continuously.

It’s simple to get started with

install-package mycouch

using (var client = new Client("http://127.0.0.1:5984/test"))
{
    //Consume here e.g
    var response1 = client.Documents.Post(json);

    //or using the async API
    var response2 = client.Documents.PostAsync(json);

    //Simple get
    var getResponseOfJson = client.Documents.Get("someid");
    var json = getResponseOfJson.Content;

    //or using an entity
    var getResponseOfEntity = client.Documents.Get<MyEntity>("someid");

    //or querying using views
    var viewResponse = client.Views.Query<MyEntity[]>(q => q
        .StartKey("foo")
        .EndKey("bar")
        .Skip(2)
        .Limit(5);

    foreach (var myEntity in viewResponse.Rows) {
        //Do something with your entity.
    }
}

Every action will give you a response in turn. The response gives you information about state etc:

{
    RequestUri: http://127.0.0.1:5984/test/1
    RequestMethod: GET
    Status: OK(200)
    Error:<NULL>
    Reason: <NULL>
    Id: 1
    Rev: 34-e0813c9f99766efcf679468adb02c6ce
    Content: {"_id":"1","_rev":"34-e0813c9f99766efcf679468adb02c6ce","$doctype":"artist","name":"Fake artist 1","albums":[{"name":"Greatest fakes #1"}]}
}

Well that’s it for now. More info will come.

//Daniel

SisoDb – Using DbPipe to add compression

In v16.1.0 of SisoDb I’ve added a new interface SisoDb.IDbPipe which lets you hook into the process of writing and reading the string-data representing a document (default JSON) to and from the DB. You could use that for encryption, compression, ….? Now I will just show you how to add compression.

Lets implement a simple deflate compression using System.IO.Compression.


public class DeflatePipe : IDbPipe
{
    public string Writing(IStructureSchema structureSchema, string data)
    {
        return Compress(data);
    }

    public string Reading(IStructureSchema structureSchema, string data)
    {
        return Decompress(data);
    }

    private static string Compress(string j)
    {
        using (var source = new MemoryStream(Encoding.UTF8.GetBytes(j)))
        {
            source.Position = 0;
            using (var trg = new MemoryStream())
            {
                using (var c = new DeflateStream(trg, CompressionMode.Compress, false))
                {
                    source.CopyTo(c);
                }
                return Convert.ToBase64String(trg.ToArray());
            }
        }
    }

    private static string Decompress(string j)
    {
        using (var source = new MemoryStream(Convert.FromBase64String(j)))
        {
            source.Position = 0;
            using (var trg = new MemoryStream())
            {
                using (var c = new DeflateStream(source, CompressionMode.Decompress, false))
                {
                    c.CopyTo(trg);
                }
                return Encoding.UTF8.GetString(trg.ToArray());
            }
        }
    }
}

I don’t make use of the IStructureSchema but could easily use it to just use compression for a certain document by inspecting:


if(structureSchema.Name == "Customer") {}

or

if(structureSchema.Type.Type == typeof(Customer)) {}

Now lets hook the pipe in:

db.Pipe = new DeflatePipe();

I did one for LZ4 compression as well and lets just have a look at the result.


use [SisoDb.SampleApp]

---5000 items

-- Raw
--select sum(DATALENGTH(Json)) from dbo.CustomerStructure
--4883358 bytes

-- Deflate
--select sum(DATALENGTH(Json)) from dbo.CustomerStructure
--3678256 bytes

-- Lz4
--select sum(DATALENGTH(Json)) from dbo.CustomerStructure
--4845176 bytes

-- Diff
select 
	(4883358 - 3678256) TotalSaveDeflate, --1205102
	(4883358 - 3678256) / 5000 PerItemDeflate, --241
	(4883358 - 4845176) TotalSaveLZ4, --38182
	(4883358 - 4845176) / 5000 PerItemLZ4 --7

You should really evaluate both algorithm and the document type to see if you save anything.

//Daniel

SisoDb – Cacheable queries

This weekend I just release SisoDb v16.1.0 and about a week before that v16.0.0. I thought we might take a look at some new features with regards to caching. From version 16.0.0 of SisoDb you can not only make use of a cache provider for GetById, GetByIds but also for queries. First you need to enable caching. You can easily write one on your own, by implementing SisoDb.Caching.ICacheProvider and SisoDb.Caching.ICache. But for this post we will use an existing one, provided via NuGet.

PM > Install-package SisoDb.MsMemoryCache

Now you can enable it to you ISisoDatabase instance.

db.CacheProvider = new MsMemCacheProvider ();

By default caching is of. You could either activate it per entity/structure or for all:

db.CacheProvider.EnableFor();

or

db.CacheProvider.AutoEnable = true;

This is something you would do while configuring your singleton instance of your ISisoDatabase e.g while bootstraping your app and configuring your IoC.

Now, when the cache is enabled you get implicit cache for:

session.GetById | GetByIds | GetByQuery

but you can also make specific queries cacheable:


session.Query()
       .Where(i => i.Score > minScore)
       .OrderBy(i => i.Score)
       .Cacheable();

That’s it! If you now execute this query two times and have the same value on minScore, it will only hit the Db the first time.

//Daniel

Entity framework, Really do drop create database if model changes and Db is in use

This week I saw a tweet from Julie Lerman (@julielerman) which caught my interest:

tweet

The error report found here, says:

Once you have expanded a database in VS 2012 SQL Server Object Explorer, it is not possible to delete or detach it. This is a big problem for Entity Framework Code First database initialization when code first needs to drop and recreate a database. I have seen this consistently with (localdb)/v11.0 databases. In SSMS, you can solve this problem by expanding tables in a different database. But in SSOE, this won’t work. I have also disconnected the server instance and then reconnected it but the database is still “in use” and cannot be detached or deleted.

I mainly work in Management Studio when I work with SQL Server and the same behavior goes there as well. If you change your model and use the initializer DropCreateDatabaseIfModelChanges, you will be presented with:

SqlException

Since I don’t have these issues with SisoDb that I develop, I thought, “there has to be a good workaround”. Well, if it’s good or not I don’t know, but here it goes. I just had a quick look at Codeplex at the source code for DropCreateDatabaseIfModelChanges and made my own initializer that works.


public class ReallyDoDropCreateDatabaseIfModelChanges 
    : IDatabaseInitializer where TContext : DbContext
{
    protected const string Sql =
        "if (select DB_ID('{0}')) is not null\r\n"
        + "begin\r\n"
        + "alter database [{0}] set offline with rollback immediate;\r\n"
        + "alter database [{0}] set online;\r\n"
        + "drop database [{0}];\r\n"
        + "end";

    public virtual void InitializeDatabase(TContext context)
    {
        if (DbExists(context))
        {
            if (context.Database.CompatibleWithModel(false))
                return;

            DropDatabase(context);
        }

        context.Database.Create();
        Seed(context);
        context.SaveChanges();
    }

    protected virtual bool DbExists(TContext context)
    {
        using (new TransactionScope(TransactionScopeOption.Suppress))
        {
            return context.Database.Exists();
        }
    }

    protected virtual void DropDatabase(TContext context)
    {
        context.Database.ExecuteSqlCommand(
            string.Format(Sql, context.Database.Connection.Database));
    }

    protected virtual void Seed(TContext context) { }
}

This will work even if I have an query window against my db in Management studio. Of course it get’s a bit slower if the db is in use and the model has been updated, but it beats manual workarounds.

This was tested with Entity framework 5.0, SQL Server 2012

//Daniel

SisoDb v16.0.0 is out

I’ve been working on some changes lately on SisoDb. Mostly it’s been about cleaning the code base up but also bring in support for spatial data as well as cachable queries. There’s a bunch of changes and I will cover some of the here soon. You can have a check at the release notes here: http://sisodb.com/wiki/release-notes

While typing this I’m currently running an app that uses these new features, against a SQL Azure database. Once you have deployed the DB, SisoDb seems to work fine. Not tested all the features though.

//Daniel

Don’t force your data-persistence solution to fit every needs

The N+1 is evil. I want referential integrity. I want…

“Hold it right there Mr!” A document-oriented Db (DOD) is not an RDBMS! If you are having an N+1 scenario or feeling a pain with references, you should probably first have a look at your model and your chosen data-persistence solution (DPS). Is it perhaps so that you shouldn’t be using a DOD for the scenario you are having issues with?

The DPS are designed for specific use-cases and it’s likely so, that if you find your self in a somewhat complex domain; you should have more than one DPS, instead of forcing one to fit in every context.

Vendors can bring all kinds of help to the table, generating proxies around your POCOs and let the DPS: track changes, manage references and automatically pull stuff in from the DPS into cache; All kinds of magical stuff could be done by the DPS. But is this really the way to go for a DOD?

If you are experiencing e.g an N+1 scenario as a result of you iterating over a bunch of documents referencing other documents. Could it perhaps be so that you should instead have designed a specific document containing all data for that particular view? Sure, that would mean you would take parts of documents and duplicate the storage of it, but you will get a view representing a certain context. This would benefit read scenarios whilst you will get harder to administer the write scenarios, and if that gets a pain (a real pain), you should start thinking of using another DPS that fits your needs for that use-case.

Just some rambling, nothing scientific. Agree or don’t agree.

//Daniel

SisoDb – Simple tip on how to write more performant queries

Time for a really quick tip when it comes to queries in SisoDb that will give you better performance. When firing of a query with lots of parameters in your expression, move the logic into your model instead. That way the expression only has to lookup one index-value for your structure, and it will probably also state what you are doing. Example

session
    .Query<WorkOrder>()
    .Where(w => w.Status == WoStatus.Done || w.Status == WoStatus.Trashed);

should be

session
    .Query<WorkOrder>()
    .Where(w => w.IsClosed);

This was just a super simple example, but really. Try and get it down to one param. You will gain performance.

//Daniel

Writing own libs for SisoDb – SisoDb.Spatials

I’m in the process of finalizing a v16.0.0 of SisoDb and there’s a first pre-release out on NuGet (SQLCE4, SQL2005, SQL2008, SQL2012). I’m will soon get a pre-release 2 out which have one simple change in code but it allows you to write third party libs to SisoDb. Almost anything can allready be swapped out for your implementations. But what if you just want to provide some new functions via a NuGet? Lets say I would like to create SisoDb.Spatial and provide some simple helpers for spatial data. You should then extend the ISession and make use of the Session.ExecutionContext. It exposes all you need via the Session, but more importantly, exposes two Try methods that you should wrap your methods within. Why? Well SisoDb uses Auto-commit and rollback on exception, but to be sure states are updated correctly, the exception should be caught by that specific try methods.

Lets look at an example. First I extend ISession and pass along the ExecutionContext (available from v16.0.0-pre2)

public static class SessionExtensions
{
    public static ISisoSpatials Spatials(this ISession session)
    {
        return new SqlServerSisoSpatials(session.ExecutionContext);
    }
}

Lets define some operations

public interface ISisoSpatials 
{
    void UpsertFor<T>() where T : class;
    void DropFor<T>() where T : class;
    bool PointExistsInPolygonFor<T>(object id, Tuple<double, double> coords, int srid = SpatialReferenceId.Wsg84) where T : class;
    void DeleteGeoIn<T>(object id) where T : class;
    void InsertPolygonTo<T>(object id, Tuple<double, double>[] coords, int srid = SpatialReferenceId.Wsg84) where T : class;
    void UpdatePolygonIn<T>(object id, Tuple<double, double>[] coords, int srid = SpatialReferenceId.Wsg84) where T : class;
    void SetPolygonIn<T>(object id, Tuple<double, double>[] coords, int srid = SpatialReferenceId.Wsg84) where T : class;
}

Now just implement it and make use of the API exposed by the ExecutionContext.

public class SqlServerSisoSpatials : ISisoSpatials
{
    protected readonly ISessionExecutionContext ExecutionContext;
    protected ISession Session { get { return ExecutionContext.Session; } }
    protected readonly ISqlStatements SqlStatements;

    protected internal SqlServerSisoSpatials(ISessionExecutionContext executionContext)
    {
        ExecutionContext = executionContext;
        SqlStatements = SpatialSqlStatements.Instance;
    }

    public virtual void SetPolygonIn<T>(object id, Tuple<double, double>[] coords, int srid = SpatialReferenceId.Wsg84) where T : class
    {
        ExecutionContext.Try(() =>
        {
            var schema = Session.GetStructureSchema<T>();
            var sidParam = CreateStructureIdParam<T>(id);
            var geoParam = CreatePolygonParam(coords, srid);
            var sql = SqlStatements.GetSql("SetGeo").Inject(GenerateTableName(schema));
            Session.DbClient.ExecuteNonQuery(sql, sidParam, geoParam);
        });
    }

    //...
    //Some stuffe left out.
    //...

    protected virtual string GenerateTableName(IStructureSchema schema)
    {
        return string.Concat(DbSchemaNamingPolicy.GenerateFor(schema.Name), "Spatials");
    }

    protected virtual DacParameter CreateStructureIdParam<T>(object id) where T : class
    {
        return new DacParameter("id", id);
    }

    protected virtual GeographyDacParameter CreatePointParam(Tuple<double, double> coords, int srid)
    {
        return new GeographyDacParameter("geo", SqlGeography.Point(coords.Item1, coords.Item2, srid));
    }
}

That’s pretty much it.

//Daniel

MongoDb for .Net folks – a quick theoretical peak what schema-less means

Now it’s time to have a quick peak on this whole “schema-less” thing that NoSQL products offer. We will of course focus on MongoDB when we compare a document-oriented DB against a more traditional, relational DB commonly found in RDBMS like SQL Server.

Previous articles are:

Schema, MongoDB vs RDBMS

You are probably familiar with some of the parts building up a schema in a relational database of a RDBMS, like SQL Server. Lets just see how the terminology maps to MongoDB world:

Building blocks, RDBMS vs MongoDb

Table --> Collection
Row --> Document
Column --> Field
Join --> Embedded document
Join --> Link/Reference
Index --> Index

Table vs Collection

The Collection is a named bag where you put documents in. It’s created on the fly (upon first use). Collections are schema-less, and being schema-less, there’s nothing keeping you from mixing entities in one certain collection. You could have a collection named Cars and let it hold: Car, Driver, Fruit, ... with completely different members in their documents:

Car: {registrationNo: 'ABC123', make: 'VOLVO', modell: 'XC90'}

vs

Driver: {socSecNo: 12345, firstname: 'Daniel', lastname: 'Wertheim', license: 'AB'}

VS

Fruit: {type: 'APPLE', name: 'Royal Gala', color: 'Red', season: 'Autumn'} 

So in MongoDB there’s NO underlying scheme stating: this collection has the fields: registrationNo, make, modell, socSecNo, firstname, lastname, license, type, name, color, season, with information that e.g ‘firstname’ is ‘nvarchar(50) NULL’. You can of course try and mimic this behavior in a RDBMS, by having key-value tables, like I have in SisoDb.

Schema-less, does that mean I don’t need to care about schemas?

No it doesn’t! In theory, perhaps, but I would say NO! The way you design your documents should be with regards taken to how you will consume them. What is of most importance for the current context? Write- or read performance? Are you working in a statically typed language like C# and you wan’t to have pre-made classes defining your storage model? If, then mixing Cars and Fruits in the same collecton will enforce you to have some sort of application logic when you query the collection. Will document D1 be a Car or a Fruit?

Since you are writing the next “killer app”, you will probably have lots of data. You will then have to select an index for documents in the collection and most likely also a sharding-key. Having documents with different schema in the same collection could cause you trouble, since the sharding-key needs to be in both.

I also thing you should be thinking of maintenance. There’s no schema that you can inspect. So what if your collection would contain documents with very different contracts? The person maintaing your app might not realize this if he just inspects some documents in a certain collection.

Row vs Document & Join vs Embedded document

It’s probably easy to think of both row and document, as representations of an instance of one certain entity. In a traditional normalized DB in a RDBMS, this would only hold true for dead simple entities that has no relations; that is, all data for the entity is contained in one row. Let say we have the entity Order, where one Order can contain many OrderLine instances. In a relational DB this would be expressed by an Order- and an OrderLine-table, with a relation between them, where the OrderLine-table would have a foreign-key to a certain row in the Order-table. In a document oriented DB you would most likely embedd the Order-lines within the Order; hence we would store the data seeing the Order as an aggregate containing Order lines. So in MongoDB, you wouldn’t have to perform a join to build up the complete result of one certain Order.

Relational DB   
Order {#Id, Date, CustomerNo, ...}
--<1:N> --
OrderLine {#OrderId, #ArticleNo, Quantity, Price, Tax }

Document DB:
Order {#_id, date, customerNo, ..., lines: [
    {articleNo, quantity, prixe, tax}, 
    {articleNo, quantity, prixe, tax}]}

So, lets say you need to get all orders that contains a certain article number. When your query will hit the database it will look something like this in a relational database:

select [columns of interest]
from dbo.Order as O
inner join dbo.OrderLine as Ol
    on Ol.OrderId = O.Id
where Ol.ArticleNo = @articleNo;

In MongoDB this would look like:

db.orders.find({'lines.articleNo'=articleNo})

So, what if we would add a Customer into our model? Please, bear with me. I’ll elaborate on this below, under: “Join vs Link/Reference“.

Index vs Index

The concepts of indexes exists in MongoDB as well. The query above could be made faster by applying an index on articleNo:

db.ensureIndex({'lines.articleNo': 1})

We will look into indexing in a future articles. For now, it’s enough to realize indexes exists in MongoDB as well and affects the design of your documents. Also note the difference compared to an RDBMS. You would have to ensure indexes exists for getting an effective join as well as an index on the ArticleNo.

Join vs Link/Reference

Above we dealt a bit if “Join vs Embed document”, which looked a bit of how documents can use embedding to create aggregates and thereby remove the requirements of joins. If you are coming from a RDBMS world you are probably twisting in your chair right now. Thinking about: duplication of data, referencial constraints, etc. You can of course separate data and stor them in different documents and include _id values that you would use to perform application level links/references. So now you have two techniques: embedding vs linking.

When to embed and when to link?

As always: “it depends on usage of the data”. I would say that data in a document-oriented world is stored in a use-case centric way rather then as normalized grid of small data units. A traditional example of this is. If we have a blog-post which in turn could have comments, which in turn could have replies, you could store this as one big document holding blog post information as well as a tree of comments:

blogpost: {
    _id: ..., title: ..., content: ..., ts: ..., author: ...,
    comments: [
        {title: ..., ts: ..., author: ..., comment: ...},
        {title: ..., ts: ..., author: ..., comment: ...,
            replies:[
                {title: ..., ts: ..., author: ...},
                {title: ..., ts: ..., author: ...}
            ]
        }
    ]
}

If you tend to always query and display a single blog-post as a whole unit this could be fine. But what if you get a massive amount of comments? You might hit the document size limit of 16Mb. Alternatives? Linking. You could store the blog-post id in a big document containing all the comments:

blogpost: {
    _id: 'fooId', title: ..., content: ..., ts: ..., author: ...}

comments: {_id: ..., blogpostId: 'fooId', replies: [
        {title: ..., ts: ..., author: ..., comment: ...},
        {title: ..., ts: ..., author: ..., comment: ...}]
}

But again, you might hit the size limit of this new comments document. Or perhaps you want to be able to work with individual comments. So, split it apart.

blogpost: {
    _id: 'fooId', title: ..., content: ..., ts: ..., author: ...,
    comments: ['commentId1', 'commentId2', ...]
}

comment: {_id: 'commentId1', blogpostId: 'fooId'}
comment: {_id: 'commentId2', blogpostId: 'fooId'}

Since we have introduced _ids on both sides we can navigate in both directions. We can work with the blog-post and we can work with single comments as well as list all comments by blog-post id. But, but, but! Again. Will this suite your needs? In this case perhaps. But in similar models the extra queries needed to build up the same “blog-post view” with the post and all comments might hurt your performance. The design should always be context specific. You might also have situations were you actually need to mix, chosing a relational DB and a document-oriented DB? You could also duplicate data and create views specific to certain use-cases. One view holding parts of the blog-post, like: title, ts, top10comments.

A more clear example would be to continue the Order sample. We would probably have a document called Customer; which wouldn’t be embeded in the Order, but parts of the customer would. At least one unique value, so that you could link to the Customer in your application.

order: {_id: ..., customerId: ...m customerNo: ..., ...}

Summary

We have had a quick look at the concept of being “schema-less”. Using MongoDB as an example, I would say, your applications still has a conceptual schema, but your database has no schema enforcing conformity in your collections/sets. And I would like to end this with a traditional quote:

“With great powers comes great responsibility”

That’s all. Lets hope I’ve time to make a more “non theoritical” entry about this soon.