What about tweaking in SisoDb?

This could have been the shortest post I have ever written, since the question can be answered with a simple one-liner:

There are no tweaks

The whole point of SisoDb is to be simple and performant out of the box. It will never target a multitude of different scenarios that each need adaptations to perform well.

With that said, there are things you should keep in mind.

Work effectively with references between documents

A document/structure has no relations. Everything included in it gets serialized and stored. You mark the root by adding one member to it, public Guid|int SisoId { get; set; }, which says: this is a document and I want to be able to store it. You can of course let your document contain other complex types/classes, with or without the SisoId member, but there’s a huge difference between the two.

Given an Order:

public class Order
{
	public Guid SisoId { get; set; }
}

If I add some simple attributes, it’s easy to understand that these will belong to the document.

public class Order
{
	public Guid SisoId { get; set; }
	
	public string OrderNo { get; set; }
	
	public DateTime CreatedAt { get; set; }
	
	public DateTime? ShippedAt { get; set; }
}

But what if I want a Customer linked to my Order? First, look at the Order as a pile of documents with the title “Order”. When you write a document and want to reference someone else’s writing, you add a reference that tells the reader where to find it, e.g. a footnote. The same thing goes here. Include CustomerId in the Order.

public class Order
{
	...
	public Guid CustomerId { get; set; }
	...	
}

public class Customer
{
	public Guid SisoId { get; set; }
	
	public string CustomerNo { get; set; }
	...
	...
}

Now you have made a connection saying: “My Order documents can point to Customer documents”. But what if I want a Customer instance in the Order, since I don’t always want to fire off a new query to fetch the Customer for a certain Order? Easy, just add a property for the Customer.

public class Order
{
	...
	public Guid CustomerId
	{
		get { return Customer.SisoId; }
		private set { Customer.SisoId = value; }
	}
	
	public Customer Customer { get; set; }
	
	public Order()
	{
		Customer = new Customer();
	}
}

Now, when you store an Order document, the Customer will not be stored with it, since SisoDb knows that Customer is a document living on its own: it has the member public Guid SisoId { get; set; }. The CustomerId will get stored, and it can be used when you query orders. You can then say: “Hey, load Orders and include Customers.”

using(var uow = db.CreateUnitOfWork())
{
	var orders = uow.Query<Order>(q =>
				     q.Include<Customer>(order => order.CustomerId));
}

SisoDb will then incorporate the JSON of the included documents/structures in the same query and JSON result. Hence you get no extra roundtrips and you end up with a fully loaded Order.

Control what to make Queryable

By default every simple property is extracted from your documents/structures and made queryable in SisoDb. You can of course control this. If you have a deep graph with lots of members but only query on a few of them, you will get better performance by making only those fields queryable, since the Indexes table for that document type gets smaller.

db.StructureSchemas.StructureTypeFactory.Configurations.NewForType<Customer>().OnlyIndexThis(c => c.CustomerNo);

That’s it for now. Now I’m going to tweak SisoDb for you, so you don’t have to ;-)

//Daniel

SisoDb now lets you query without transactions

Before v2.0 of SisoDb (http://sisodb.com) you could only query using a UnitOfWork. All UnitOfWorks are transactional, and in some cases you might want to perform queries in the same transaction as your inserts, updates and deletes. But if you just want to query, you should use the new QueryEngine class.

using(var qe = database.CreateQueryEngine())
{
    customers = qe.Where<Customer>(c => c.Lastname == "Andersson");
}

//Daniel

SisoDb now supports TransactionScope

When using a UnitOfWork in SisoDb you are using traditional ADO.Net transactions. Let’s have a look at an example:

using(var unitOfWork = dataBase.CreateUnitOfWork())
{
    unitOfWork.InsertMany(customers);
    unitOfWork.Commit();
}

This is all fine, but suppose you want to control several UnitOfWorks. The obvious solution is to use the TransactionScope class. Since we are targeting SQL Server 2008, we also don’t want to escalate to a distributed transaction, even if multiple connections are opened against the same DB using the same connection string. This is now supported. Hence you can do things like:

using(var ts = new TransactionScope())
{
    using(var unitOfWork = dataBase.CreateUnitOfWork())
    {
        unitOfWork.InsertMany(customers);
        unitOfWork.Commit();
    }

    ts.Complete();
}

The UnitOfWork will check if there is an ongoing Transaction from a TransactionScope. If there is, no ADO.Net transaction will be created, and the Commit and Rollback of the UnitOfWork are left to the outer TransactionScope.
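
The check described above can be sketched roughly like this. This is not SisoDb’s actual code: SketchUnitOfWork is a hypothetical stand-in that talks to no database. The only real API it relies on is System.Transactions.Transaction.Current, which is non-null while a TransactionScope is active.

```csharp
using System;
using System.Transactions;

// Hypothetical stand-in for a unit of work; it only shows the decision logic.
public sealed class SketchUnitOfWork : IDisposable
{
    // True when an outer TransactionScope owns commit and rollback.
    public bool UsesAmbientTransaction { get; private set; }

    // True once Commit() has committed a transaction owned by this unit of work.
    public bool CommittedLocally { get; private set; }

    public SketchUnitOfWork()
    {
        // Transaction.Current returns the ambient transaction, or null if none.
        UsesAmbientTransaction = Transaction.Current != null;

        // A real implementation would open its own ADO.Net transaction here,
        // but only when UsesAmbientTransaction is false.
    }

    public void Commit()
    {
        // Inside a TransactionScope the outcome is decided by ts.Complete(),
        // so there is nothing to commit locally.
        if (UsesAmbientTransaction) return;

        // A real implementation would commit its own ADO.Net transaction here.
        CommittedLocally = true;
    }

    public void Dispose()
    {
        // Nothing to release in this sketch.
    }
}
```

Created outside a TransactionScope, the sketch reports UsesAmbientTransaction == false and commits on its own; created inside one, it defers to the scope.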

You can of course have multiple UnitOfWorks as well:

using(var ts = new TransactionScope())
{
    using(var unitOfWork = dataBase.CreateUnitOfWork())
    {
        unitOfWork.InsertMany(customers1);
        unitOfWork.Commit();
    }

    using(var unitOfWork = dataBase.CreateUnitOfWork())
    {
        unitOfWork.InsertMany(customers2);
        unitOfWork.Commit();
    }

    ts.Complete();
}

Hope it helps you.

//Daniel

SisoDb vs Entity Framework 4.1 Code first – Inserts

Before making any performance comparisons I just want to state the following:

I like Entity Framework Code first and I don’t see SisoDb as a complete replacement. SisoDb should be seen as a complement. Both tools have their place. EF being an O/RM and SisoDb being a document-oriented provider.

With that said, let’s continue.

SisoDb – Simple Structure Oriented Db

I will not get into any details of what SisoDb is. If you are interested in knowing more I recommend the following:
SisoDb – Overview
http://sisodb.com/Docs/Doc0

Overview of the internal workings of SisoDb
http://daniel.wertheim.se/2011/04/14/overview-of-the-internal-workings-of-sisodb/

For this post, just keep in mind that SisoDb is a document-oriented (in a NoSQL way of seeing things) storage provider sitting on top of SQL Server. It’s not an O/RM.

Entity framework 4.1 – Code first

It’s being called the magical unicorn, but I think it’s fair to say that EF Code first is nothing more than Microsoft’s first decent O/RM giving you a model first experience in .Net, much like the one NHibernate has been giving for years.

They are not the same

With EF you get a lot of O/RM features, like first-level caching with identity maps, change tracking etc. It gives you a normalized database, so your object graphs get stored in several tables. These tables need to be joined both to execute queries and to return the data needed for reconstructing your entities. This is the normal case for traditional relational database models used in an RDBMS.

With SisoDb you get simplicity. You get object graphs stored as documents, getting rid of all those joins. There are no mappings to provide, and by using JSON you can of course work with base classes, interfaces etc. You can also include/reference other documents, which are returned in the same result set as the main query. You can see each document/structure as an isolated store with no relations.

For dealing with JSON, SisoDb relies on a very fast library from ServiceStack. More info about the performance of this lib can be found here.

Testing environment

Both the application and the database (SQL Server Developer Edition) were run on the same laptop: 6GB RAM, an i7 processor and an SSD disk, on Windows 7 Ultimate 64-bit. The application was executed from within VS2010 in release mode and without the debugger.

Performance – Simple inserts

For these simple inserts we will have a model looking like this:

Model for Simple inserts

In the aggregate root Customer there’s one difference between SisoDb and EF. In SisoDb the property must, by convention, be named “SisoId”. In EF this can be mapped, but to get the convention support I’ll use “Id”.

[Serializable]
public class Customer
{
    //public int SisoId { get; set; }

    //public int Id { get; set; }

    public int CustomerNo { get; set; }

    public string Firstname { get; set; }

    public string Lastname { get; set; }

    public ShoppingIndexes ShoppingIndex { get; set; }

    public DateTime CustomerSince { get; set; }

    public Address BillingAddress { get; set; }

    public Address DeliveryAddress { get; set; }

    public Customer()
    {
        ShoppingIndex = ShoppingIndexes.Level0;
        BillingAddress = new Address();
        DeliveryAddress = new Address();
    }
}

[Serializable]
public class Address
{
    public string Street { get; set; }

    public string Zip { get; set; }

    public string City { get; set; }

    public string Country { get; set; }

    public int AreaCode { get; set; }
}

[Serializable]
public enum ShoppingIndexes
{
    Level0 = 0,
    Level1 = 10,
    Level2 = 20,
    Level3 = 30
}

Scenarios

  • 1.1) Insert 1000 customers – Take 1 – without any optimizations
  • 1.2) Insert 1000 customers – Take 2 – with some optimizations
  • 2.1) Insert 10000 customers – Take 1 – without any optimizations
  • 2.2) Insert 10000 customers – Take 2 – with some optimizations
  • 3.1) Insert 100000 customers – Take 1 – without any optimizations
  • 3.2) Insert 100000 customers – Take 2 – with some optimizations

In Scenario 1.1 and 1.2, I ran it five times and took the best value.
In Scenario 2.1 and 2.2, I ran it two times and took the best value.
In Scenario 3.1 and 3.2, I ran it two times and took the best value.
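
A “run it N times and keep the best value” measurement can be sketched as below. This is a minimal illustration and not necessarily how the benchmark code in the repository does it; the BestOf class and its Measure method are hypothetical names. It uses only System.Diagnostics.Stopwatch.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: times an action several times and keeps the best run.
public static class BestOf
{
    // Runs the action `iterations` times and returns the shortest elapsed time.
    public static TimeSpan Measure(int iterations, Action action)
    {
        if (iterations < 1) throw new ArgumentOutOfRangeException("iterations");
        if (action == null) throw new ArgumentNullException("action");

        var best = TimeSpan.MaxValue;
        for (var i = 0; i < iterations; i++)
        {
            var watch = Stopwatch.StartNew();
            action();
            watch.Stop();

            // Keep the fastest run seen so far.
            if (watch.Elapsed < best) best = watch.Elapsed;
        }

        return best;
    }
}
```

Taking the best run rather than the average filters out one-off costs such as JIT compilation and a cold connection pool, which is probably why it suits a comparison like this one.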

Also note that EF will not handle the enumeration in the model above. There are ways to get around this problem, but that’s not what this post is about.

Performance optimizations

During Scenario 1.2, 2.2 and 3.2 I used some small performance optimizations. I was a little bit gentler to EF by turning off some features.

ctx.Configuration.AutoDetectChangesEnabled = false;
ctx.Configuration.ProxyCreationEnabled = false;
ctx.Configuration.ValidateOnSaveEnabled = false;

By the way, feel free to tell me more things to configure to make it perform better.

For SisoDb there really is no optimization. The point of SisoDb is to be simple and performant out of the box. There is, however, one feature you can take advantage of: you can tell SisoDb what to index (make queryable). In this case I selected only CustomerNo. It’s not an entirely fair comparison, but if you do have a scenario where you don’t plan on searching on just about everything, you can turn indexing off for the rest (http://sisodb.com/docs/doc15).

Simple inserts – Summary

         1000, take 1   1000, take 2   10000, take 1   10000, take 2   100000, take 1   100000, take 2
EF       0.83s          0.42s          50.85s          5.12s           N/A              56.91s
SisoDb   0.09s          0.06s          0.86s           0.55s           8.71s            5.73s

Memory consumption

When inserting 100000 items with EF and the optimizations on, I got around 1GB of memory consumption. With SisoDb I got around 100MB of memory usage.

Source code

The source code for this article is hosted at Github: https://github.com/danielwertheim/SisoDbVs

Summary

This time I covered inserts of simple object graphs. The next post will be about complex inserts as well as reading/querying.

//Daniel

Overview of the internal workings of SisoDb

I think it’s time to give you an overview of how the internals of SisoDb work, so that you get some insight into “performance” considerations.

How is data stored?

Before continuing, let’s give a quick intro to SisoDb. SisoDb is a NoSql-influenced provider giving you a document-oriented solution over Sql-server. It does this by seeing your object graphs as structures (documents, in a NoSql document-oriented database) where public members of simple types (strings, numbers, dates etc.) in the hierarchy are made queryable. By default, every property of the graph in the contract of the passed class or interface is flattened to fit one row in a special “Indexes-table”. This table is there for making queries against your structure. You can easily go in and place indexes on the columns you query a lot. All values are extracted using cached delegates generated with IL-generator emits, hence I don’t rely on dirty, time-consuming reflection calls.
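
To make the flattening idea concrete, here is a minimal sketch that turns an object graph into path/value pairs for its simple members. It is an illustration only: GraphFlattener is a hypothetical name, and it deliberately uses plain reflection instead of the cached IL-emitted delegates mentioned above. It also assumes a graph without collections or cycles.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical helper: flattens simple members of a graph into "A.B" -> value.
public static class GraphFlattener
{
    public static IDictionary<string, object> Flatten(object root)
    {
        var result = new Dictionary<string, object>();
        if (root != null) Walk(root, string.Empty, result);
        return result;
    }

    private static void Walk(object node, string prefix, IDictionary<string, object> result)
    {
        var properties = node.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance);
        foreach (var property in properties)
        {
            // Skip write-only properties and indexers.
            if (!property.CanRead || property.GetIndexParameters().Length > 0) continue;

            var path = prefix.Length == 0 ? property.Name : prefix + "." + property.Name;
            var value = property.GetValue(node, null);

            if (IsSimple(property.PropertyType))
                result[path] = value;          // simple member: one "column"
            else if (value != null)
                Walk(value, path, result);     // complex member: descend one level
        }
    }

    private static bool IsSimple(Type type)
    {
        // Treat Nullable<T> like T.
        type = Nullable.GetUnderlyingType(type) ?? type;
        return type.IsPrimitive || type.IsEnum
            || type == typeof(string) || type == typeof(decimal)
            || type == typeof(DateTime) || type == typeof(Guid);
    }
}
```

For a customer with a billing address, this would yield keys such as CustomerNo and BillingAddress.Street, which is the shape a single indexes row needs.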

Json-serialization

The structure is also stored as Json in the “Structure-table”. This is done to keep an intact, schemaless representation of your structure, so that structures can be reindexed, and to give an effective deserialization process when performing queries.

I’m using one of the fastest Json serializers I know of in the .Net community: ServiceStack.Text. You can read a performance comparison with the popular Json.Net library here: Json.Net vs ServiceStack.Text.

Not making everything queryable?

I’m currently implementing support for this, where you will be able to specify “hey, don’t make everything queryable, since I will only query on these properties”. That way you can boost performance by making the “Indexes-table” much slimmer.

This feature is coming really soon, perhaps it’s already implemented.

Separated entities & sharding

Since data is document-oriented, each structure gets its own tables, and they stand on their own legs without relations to other tables. This is also a mindset you need to have when working with SisoDb: it’s not an O/RM over a relational data model, it’s a document database. You could take advantage of this and shard your model. I’m planning support for this in the future, but right now you will need to manually put up a proxy that accesses different SisoDb instances depending on the type of structure being consumed.

Use replication for readmodels and writemodels

Since I’m targeting SQL Server you get some built-in benefits, such as its support for replicating data between databases. This way you could easily have a write store and a read store, as well as a store whose data you transform, using some ETL tool, into a model more fit for reporting, warehousing etc.

How is data inserted?

When inserting entities, there is a requirement that you have a property named “SisoId”. That is the only demand SisoDb places on your model. The property can return either an integer or a Guid.

Integer identities

In this scenario SisoDb looks at how many entities you are inserting, reserves a range of identities and assigns them to the model before performing the insert to the database. This way no ineffective insert + select has to be made for each row (as with Entity framework or traditional identities in NHibernate).
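
The range reservation can be sketched as follows. This is a simplified illustration, not SisoDb’s implementation: the counter lives in memory here, whereas a real provider would have to keep it in the database, and the names IdentityRangeAllocator and IdentityAssigner are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical allocator: hands out blocks of consecutive integer ids.
public sealed class IdentityRangeAllocator
{
    private long _lastIssued; // the highest id handed out so far

    public IdentityRangeAllocator(int lastIssued)
    {
        _lastIssued = lastIssued;
    }

    // Reserves `count` consecutive ids in one step and returns the first one.
    public int Reserve(int count)
    {
        if (count < 1) throw new ArgumentOutOfRangeException("count");

        // Interlocked.Add returns the new total, i.e. the last id of the block.
        var last = Interlocked.Add(ref _lastIssued, count);
        return checked((int)(last - count + 1));
    }
}

public static class IdentityAssigner
{
    // Reserves one range for the whole batch and assigns ids before any insert.
    public static void Assign<T>(IList<T> items, IdentityRangeAllocator allocator, Action<T, int> setId)
    {
        if (items.Count == 0) return;

        var next = allocator.Reserve(items.Count);
        foreach (var item in items)
            setId(item, next++);
    }
}
```

With this shape, a batch of any size costs a single reservation, and every item already carries its id when the batch is written.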

Sequential Guid identities

SisoDb doesn’t use traditional generated Guids but instead it uses sequential guids mimicking the algorithm used in SQL Server’s sequential guids.
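
To show why sequential guids help, here is a sketch of the well-known “COMB” technique. Note that this is a swapped-in technique for illustration and not the algorithm SisoDb or SQL Server actually uses; CombGuid is a hypothetical name. It overwrites the last six bytes of a random Guid with a timestamp, because SQL Server compares those bytes first when ordering uniqueidentifier values.

```csharp
using System;

// Hypothetical COMB-style generator: random Guid with a timestamp in bytes 10..15.
public static class CombGuid
{
    public static Guid Create()
    {
        return Create(DateTime.UtcNow);
    }

    public static Guid Create(DateTime timestamp)
    {
        var bytes = Guid.NewGuid().ToByteArray();

        // Milliseconds since 0001-01-01; the value fits in 48 bits for current dates.
        var millis = timestamp.ToUniversalTime().Ticks / TimeSpan.TicksPerMillisecond;

        // Store the low 48 bits most-significant byte first in bytes 10..15,
        // so that a later timestamp gives a larger trailing group.
        for (var i = 0; i < 6; i++)
            bytes[15 - i] = (byte)(millis >> (8 * i));

        return new Guid(bytes);
    }
}
```

Guids created at later timestamps then land at the end of a clustered index instead of at random positions. Guids created within the same millisecond are still unique but not ordered among themselves.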

Bulkcopy

I make use of a custom data reader that reads over the structures and is consumed by the SQL bulk copy. That way there are “NO custom generated ad-hoc batch SQL inserts”, just effective inserts using the bulk copy.

Querying

When querying using uow.Where, uow.Query, uow.Get etc., your lambda expressions are translated to parameterized SQL that is executed as a plain select via an ADO.Net command, and NOT as ad-hoc SQL using the EXEC function in SQL Server.
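
The general idea of translating a lambda into parameterized SQL can be sketched with System.Linq.Expressions. This is a deliberately tiny illustration and not SisoDb’s translator: SqlWhere is a hypothetical name, and it only handles comparisons between a direct member of the lambda parameter and a value, combined with && and ||.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical translator: turns a simple predicate into a where clause plus parameters.
public sealed class SqlWhere
{
    public string Sql { get; private set; }
    public IDictionary<string, object> Parameters { get; private set; }

    private SqlWhere(string sql, IDictionary<string, object> parameters)
    {
        Sql = sql;
        Parameters = parameters;
    }

    public static SqlWhere From<T>(Expression<Func<T, bool>> predicate)
    {
        var parameters = new Dictionary<string, object>();
        var sql = Translate(predicate.Body, parameters);
        return new SqlWhere(sql, parameters);
    }

    private static string Translate(Expression node, IDictionary<string, object> parameters)
    {
        var binary = node as BinaryExpression;
        if (binary == null)
            throw new NotSupportedException("Only binary expressions are handled in this sketch.");

        switch (binary.NodeType)
        {
            case ExpressionType.AndAlso:
                return "(" + Translate(binary.Left, parameters) + " and " + Translate(binary.Right, parameters) + ")";
            case ExpressionType.OrElse:
                return "(" + Translate(binary.Left, parameters) + " or " + Translate(binary.Right, parameters) + ")";
            case ExpressionType.Equal: return Compare(binary, "=", parameters);
            case ExpressionType.NotEqual: return Compare(binary, "<>", parameters);
            case ExpressionType.GreaterThan: return Compare(binary, ">", parameters);
            case ExpressionType.LessThan: return Compare(binary, "<", parameters);
            default:
                throw new NotSupportedException("Operator not handled: " + binary.NodeType);
        }
    }

    private static string Compare(BinaryExpression binary, string op, IDictionary<string, object> parameters)
    {
        var member = binary.Left as MemberExpression;
        if (member == null || !(member.Expression is ParameterExpression))
            throw new NotSupportedException("Left side must be a direct member of the lambda parameter.");

        // Evaluate the right side (a literal or a captured variable) to a plain value.
        var value = Expression.Lambda(binary.Right).Compile().DynamicInvoke();

        // The value never ends up in the SQL text, only its parameter name does.
        var name = "@p" + parameters.Count;
        parameters.Add(name, value);
        return "[" + member.Member.Name + "] " + op + " " + name;
    }
}
```

A predicate such as c => c.Lastname == "Andersson" would come out as [Lastname] = @p0, with the value carried separately as a parameter. A real translator also has to deal with nulls, nested members, conversions and method calls.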

Well, that was a short overview of how the internals work. I will be glad to try and answer any questions. There is more information about it here: http://sisodb.com/docs

//Daniel

SisoDb – Getting started

I think it’s high time for providing you with a simple example of how SisoDb (http://www.sisodb.com) can be put to work.

Download the source or the binaries from https://github.com/danielwertheim/SisoDb-Provider. At the time of writing, the version is v0.7.0.1.

In this simple example I will not have a GUI. I will actually use tests instead, and I will use NUnit as my testing framework. The scenarios below will only deal with the data storage strategy and not the business scenario, hence no validation etc.

Register a new customer

Fairly simple. We want to store some simple information about a customer, e.g. CustomerNo, first name, last name, shipping address and billing address.

[Test]
public void RegisterNewCustomer()
{
    var customer = new Customer
    {
        CustomerNo = "Super100",
        Firstname = "Daniel",
        Lastname = "Wertheim",
        BillingAddress =
            new Address
            {
                Street = "The street 1", Zip = "12345", 
                City = "The City", Country = "Sweden"
            }
    };

    //TODO: Connect using SisoDb
}

which gives us a model of:

public class Customer
{
    public Guid Id { get; set; }
    public string CustomerNo { get; set; }
    public string Firstname { get; set; }
    public string Lastname { get; set; }
    public Address ShippingAddress { get; set; }
    public Address BillingAddress { get; set; }
}

public class Address
{
    public string Street { get; set; }
    public string Zip { get; set; }
    public string City { get; set; }
    public string Country { get; set; }
}

Note! There are no interfaces, base classes or other junk. We will however need to add one member to the aggregate root (the class being used when storing structures), Customer, and that is a property containing the Id: public [int|Guid] Id { get; set; }. You can choose between int and Guid; SisoDb will use int as an indicator for identities and Guid as an indicator for, well, Guids. SisoDb will take care of assigning them.

Pause – let’s get some insight into SisoDb and its building blocks

Before continuing and doing what the TODO says, let’s learn a bit about the building blocks of SisoDb. There’s a Database and something called a Unit of work (which is transactional) and then there’s… nothing more! Really, it’s that simple. No mappings, interfaces or base classes. Just you and your Plain Old CLR Objects (POCOs).

The term “Structure” is more or less a synonym for “Document”, a popular kind of NoSql storage entity used in e.g. MongoDb. You can look at it as an object graph with x number of levels and members on each level. SisoDb will serialize this graph and store it as Json. It will also flatten the hierarchy, extract the values of all public properties that have getters and are of simple types (ints, strings, decimals, datetimes etc.) and store them in an index table used for querying.

TODO: Connect using SisoDb

Sorry for the details. There is much more info at: http://www.sisodb.com/docs.

The next steps are easy: connect to a database and let SisoDb generate it.

[Test]
public void RegisterNewCustomer()
{
    ....
    ....
    var cnInfo = new SisoConnectionInfo(
        @"sisodb:provider=Sql2008||plain:Data source=.;
        Initial catalog=SisoDbDemo;Integrated security=SSPI;");
    var db = new SisoDatabase(cnInfo);
    db.EnsureNewDatabase(); //Note! This is a test hence it's ok that the database is recreated.

    //TODO: Insert the customer
}

The database is designed to be long-lived and is something you would store in your IoC container. It caches the structure schemas etc. and keeps track of whether it has generated tables etc.

TODO: Insert the customer

What we have to do is get a unit of work, insert the item and commit the unit of work so that the changes made in the transaction are committed.

[Test]
public void RegisterNewCustomer()
{
    ....
    ....
    Customer refetched = null;
    using(var uow = db.CreateUnitOfWork())
    {
        uow.Insert(customer);
        uow.Commit();

        refetched = uow.Query<Customer>(
            c => c.CustomerNo == "Super100").SingleOrDefault();
    }

    Assert.IsNotNull(refetched);
}

That’s it for a getting started post. I will be covering a lot more in the near future.

//Daniel

I’m alive! It’s just that SisoDb takes all my time

No, I’m not dead. I’m just putting every free second I’ve got into my project SisoDb – Simple-Structure-Oriented-Db >> a NoSql .Net implementation for Sql-Server (http://www.sisodb.com), hence my lack of coverage of Entity framework code first. There are several others doing this, e.g. Scott Guthrie. If you don’t follow his tweets, start following. He publishes nice tips and links to articles about e.g. EF code first.

Back to SisoDb. I will start covering examples here on this blog or on the official blog (http://blog.sisodb.com).

Currently I’m targeting SQL 2008 R2 but there’s also a somewhat “unofficial” version that should work against SQL-Azure. I’m also about to try and get it to work against VistaDb (http://www.vistadb.net)

//Daniel

SisoDb moved to Codeplex

I have now moved the code to http://sisodb.codeplex.com and put up a dedicated blog for SisoDb at http://blog.sisodb.com

I’m currently working on supporting identities instead of just guids. After that I will focus on some documentation and better/more querying support.

If there is any feature you would like to see, please, contact me.

//Daniel