Sitecore xDB Architecture: Session State and the xDB

Introduction to the xDB

  • Sitecore’s Experience Database (xDB) was introduced in Sitecore 7.5 to solve the problem of scaling analytics.

  • The xDB introduces a dependency on MongoDB, but you can also opt for Sitecore’s xDB Cloud service.

  • In the xDB, raw analytics data is written to a MongoDB collection database and processed into formats that are used for analytics reporting. Every component of the xDB can be scaled.

The Role of MongoDB

  • In Sitecore’s xDB, MongoDB is used primarily for collecting data. Information about visitors and their interactions is written to MongoDB as flat JSON, which is then processed by an aggregation pipeline into a format that is used for reporting.

  • You can also use MongoDB as your session state provider if you opt for OutProc session state management. There is also a SQL version of this provider.
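The collect-then-aggregate flow described above can be illustrated with a small sketch. This is not Sitecore's aggregation pipeline: the record fields (`contact_id`, `page`, `value`) and the function `aggregate_interactions` are hypothetical, chosen only to show raw records being rolled up into a shape that suits reporting.

```python
from collections import defaultdict

# Hypothetical raw interaction records, as they might sit in a collection store.
raw_interactions = [
    {"contact_id": "c1", "page": "/home", "value": 0},
    {"contact_id": "c1", "page": "/pricing", "value": 10},
    {"contact_id": "c2", "page": "/home", "value": 0},
]


def aggregate_interactions(records):
    """Roll raw records up into per-page visit counts and total value."""
    report = defaultdict(lambda: {"visits": 0, "value": 0})
    for record in records:
        row = report[record["page"]]
        row["visits"] += 1
        row["value"] += record["value"]
    return dict(report)


report = aggregate_interactions(raw_interactions)
print(report["/home"])
```

The raw records stay untouched; reporting reads only the aggregated rows, which is why the two stores can be scaled separately.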

The xDB Contact

What is a contact?

  • All visitors are stored in the xDB - including anonymous visitors. As soon as you identify yourself by providing Sitecore with an e-mail address, you become a contact.

  • You can view identified contacts and individual visitors in the Experience Profile interface. Note that this interface does not necessarily pull all of its content directly from MongoDB.

Where are contacts stored?

  • The raw contact data is stored in the contacts collection of the analytics MongoDB. MongoDB stores information as JSON-like documents (BSON). Here is a contact that has provided a first name, surname, and e-mail address:

{
    "_id" : LUUID("ad72f93a-a071-1f40-ac68-c165faf21e91"),
    "Identifiers" : {
        "IdentificationLevel" : 2,
        "Identifier" : "extranet\\jill_at_mail_dot_com"
    },
    "Lease" : {
        "ExpirationTime" : ISODate("2015-02-06T11:32:21.381Z"),
        "Owner" : {
            "Type" : 0
        }
    },
    "System" : {
        "Classification" : 0,
        "OverrideClassification" : 0,
        "VisitCount" : 1,
        "Value" : 0
    },
    "Personal" : {
        "FirstName" : "Jill",
        "Surname" : "Bean"
    },
    "Emails" : {
        "Preferred" : "work_email",
        "Entries" : {
            "work_email" : {
                "SmtpAddress" : "jillsmtp@mail.com"
            }
        }
    }
}
  • Contact data is also aggregated down to the reporting database (which is used primarily by the reporting API and Engagement Analytics) and the analytics index, which is used by the Experience Profile search page and Email Experience Manager.
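To make the shape of the contact document a little more concrete, here is a minimal sketch that reads it as a plain Python dictionary. It does not use a MongoDB driver or any Sitecore API; the helpers `preferred_email` and `full_name` are hypothetical and only show how the `Preferred` key points at one of the named entries under `Entries`.

```python
# The contact document shown above, reduced to the fields used here.
contact = {
    "Identifiers": {
        "IdentificationLevel": 2,
        "Identifier": "extranet\\jill_at_mail_dot_com",
    },
    "Personal": {"FirstName": "Jill", "Surname": "Bean"},
    "Emails": {
        "Preferred": "work_email",
        "Entries": {"work_email": {"SmtpAddress": "jillsmtp@mail.com"}},
    },
}


def preferred_email(doc):
    """Return the SMTP address of the preferred e-mail entry, or None."""
    emails = doc.get("Emails", {})
    entry = emails.get("Entries", {}).get(emails.get("Preferred"))
    return entry["SmtpAddress"] if entry else None


def full_name(doc):
    """Join whichever name parts the contact has provided."""
    personal = doc.get("Personal", {})
    parts = (personal.get("FirstName"), personal.get("Surname"))
    return " ".join(part for part in parts if part)


print(full_name(contact), preferred_email(contact))
```

An anonymous visitor would simply lack the `Personal` and `Emails` sections, which is why both helpers fall back to empty values rather than raising an error.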

Session State and the xDB

Where does session state fit in with the xDB?

  • As a visitor browses around your site, information about that visitor and their interaction is stored in session.
  • When the session ends, this information is flushed to the xDB - but for the duration of an interaction, session is solely responsible for storing valuable information about a visitor’s actions on your website.
  • This reduces the number of calls to the collection database, but it means that session management should be as robust as possible.
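The buffering behaviour described above can be sketched as follows. The classes `CollectionStore` and `VisitorSession` are hypothetical stand-ins, not Sitecore types; the point is only that tracking events touches the session, and the collection database sees a single write when the session ends.

```python
class CollectionStore:
    """Stand-in for the collection database; counts write calls."""

    def __init__(self):
        self.writes = 0
        self.interactions = []

    def save(self, interaction):
        self.writes += 1
        self.interactions.append(interaction)


class VisitorSession:
    """Holds a visitor's events until the session ends."""

    def __init__(self, contact_id, store):
        self.contact_id = contact_id
        self.store = store
        self.events = []

    def track(self, event):
        # Events are kept in session only; nothing is written to the store yet.
        self.events.append(event)

    def end(self):
        # Flush the whole interaction to the store in one call.
        self.store.save({"contact_id": self.contact_id, "events": list(self.events)})
        self.events.clear()


store = CollectionStore()
session = VisitorSession("c1", store)
for page in ("/home", "/pricing", "/contact"):
    session.track({"page": page})
writes_during_visit = store.writes
session.end()
print(writes_during_visit, store.writes)
```

If the session were lost before `end()` ran, all three page views would be lost with it, which is the robustness concern raised above.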

Shared vs Private session state

  • The xDB stores two kinds of session information - shared and private.
  • You can think of shared session state as the contact store - it has information about the contact, devices used, and engagement plan states.
  • Private session state contains information about interactions - such as goals triggered. When you install Sitecore on a single machine, both types of session data are stored InProc.
  • If you have a single CD, all session state management can be done InProc. However, as soon as you begin to scale by adding more CDs to your cluster, you have to switch to OutProc session management.
  • InProc is short for ‘In Process’, and means that all session data (both private and shared) is managed in memory.
  • OutProc (‘Out of Process’), conversely, means that session data is stored outside the web server process - for example, in a session database that is written to as the user browses around your site.
  • There are two custom OutProc session providers; one for MongoDB and one for SQL.
  • A default installation of Sitecore uses InProc session management for both shared and private session data.
  • As soon as you scale to multiple CDs within a cluster, you must use OutProc session management for both private and shared session data. This is because shared session data needs to be available to all sessions for a single contact. This is done by storing shared session data in an OutProc session state database.
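One way to picture the split between shared and private session state is as two lookups with different keys: shared state keyed by contact, private state keyed by session. The sketch below assumes exactly that; `SessionStateStore` and its methods are hypothetical and do not mirror the real session providers.

```python
class SessionStateStore:
    """Hypothetical OutProc store that every CD in a cluster can reach."""

    def __init__(self):
        self.shared = {}   # contact id -> contact-level data (shared state)
        self.private = {}  # session id -> interaction data (private state)

    def start_session(self, session_id, contact_id):
        # Shared state is created once per contact and reused by later sessions.
        self.shared.setdefault(contact_id, {"devices": []})
        self.private[session_id] = {"contact_id": contact_id, "goals": []}

    def add_device(self, session_id, device):
        contact_id = self.private[session_id]["contact_id"]
        self.shared[contact_id]["devices"].append(device)

    def trigger_goal(self, session_id, goal):
        self.private[session_id]["goals"].append(goal)


store = SessionStateStore()
store.start_session("s-laptop", "bob")
store.start_session("s-phone", "bob")
store.add_device("s-laptop", "laptop")
store.add_device("s-phone", "phone")
store.trigger_goal("s-phone", "newsletter-signup")
print(store.shared["bob"], store.private["s-laptop"])
```

Both sessions see the same device list because it hangs off the contact, while the goal triggered on the phone stays private to that session.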

Scenario 1: Single CD and InProc

In this example, session is managed in memory by a single CD:

Session and a single CD

  1. The user signs in, thereby becoming a contact in the xDB.
  2. The single CD uses InProc session management, so this is all done in memory.
  3. When the user’s session ends, the data is flushed to the xDB, where it will be processed and aggregated for reporting.


Scenario 2: Multiple CDs, Single Cluster, and OutProc

In this example, there are multiple CDs in a single cluster.


  • Bob’s request is routed to the least busy server via a non-sticky load balancer - because it is non-sticky, he is not attached to this server for the duration of his visit.
  • No matter which CD his request is routed to, his session information is written to a shared session database that all of the CDs have access to.
  • When Bob’s session ends, this data is written to the xDB and eventually disappears from the session database.
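The difference a shared session database makes under a non-sticky load balancer can be simulated in a few lines. Everything here is an assumption made for illustration: `ContentDeliveryServer` and `browse` are hypothetical names, a plain dictionary stands in for the session database, and simple round robin stands in for "least busy" routing.

```python
from itertools import cycle


class ContentDeliveryServer:
    """A CD that keeps sessions in its own memory or in a shared database."""

    def __init__(self, name, session_db=None):
        self.name = name
        # A shared dict models OutProc; a fresh per-server dict models InProc.
        self.sessions = session_db if session_db is not None else {}

    def handle(self, session_id, page):
        self.sessions.setdefault(session_id, []).append(page)
        # Number of page views this CD can see for the session.
        return len(self.sessions[session_id])


def browse(servers, session_id, pages):
    """Send each page request to the next server in turn (non-sticky)."""
    balancer = cycle(servers)
    return [next(balancer).handle(session_id, page) for page in pages]


pages = ["/home", "/pricing", "/contact"]

in_proc = browse([ContentDeliveryServer(f"cd{i}") for i in range(3)], "bob", pages)

shared_db = {}
out_proc = browse(
    [ContentDeliveryServer(f"cd{i}", shared_db) for i in range(3)], "bob", pages
)
print(in_proc, out_proc)
```

With per-server memory each CD only ever sees the one request it handled, whereas with the shared store the visit accumulates as a whole no matter which CD serves each request.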

Scenario 3: Two devices, two concurrent sessions

In this example, there is a sticky load balancer within a cluster. This scenario will not work with the xDB in a real production environment.


Your environment is set up to use three CDs and a sticky load balancer within the cluster - once Bob’s request is directed to a CD, it sticks to it until his session ends.

(B1) Bob logs in on his phone whilst his first session is still going, and information about that visit is not yet in the xDB. (B2) This session sticks to a CD as well - it may or may not be the same CD; there is no way of knowing. (B3) This is where things start to go horribly wrong:
  • When a request comes in to a cluster of CDs, a lock is placed on your contact - ‘you belong to cluster A for the duration of this session’.
  • But Bob has two sessions going at the same time, and if the cluster were using OutProc session state management, those two sessions would know about each other - they would share things like contact and engagement plan information.
  • You now have two conflicting sessions - Bob may have changed his contact data by using a form, or moved into different engagement plan states in either of his two sessions.
  • Which session wins? There is no way for the xDB to work that out, so your data is likely to be incomplete and/or inaccurate.
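The conflict can be sketched with two sessions that each work on their own in-memory copy of Bob's contact. This is a deliberately simplified model: the `xdb` dictionary, the `IsolatedSession` class, and the field names are hypothetical, and the "last flush overwrites" rule is an assumption used to show one way data can be lost, not a description of how the xDB actually resolves the conflict.

```python
import copy

# Hypothetical stand-in for the contact as stored in the xDB.
xdb = {"bob": {"email": "bob@old.example", "plan_state": "new"}}


class IsolatedSession:
    """A session that works on its own in-memory copy of the contact."""

    def __init__(self, contact_id):
        self.contact_id = contact_id
        self.contact = copy.deepcopy(xdb[contact_id])

    def flush(self):
        # Assumption: the flush overwrites whatever is currently stored.
        xdb[self.contact_id] = self.contact


laptop = IsolatedSession("bob")
phone = IsolatedSession("bob")

laptop.contact["email"] = "bob@new.example"   # Bob updates his e-mail in a form
phone.contact["plan_state"] = "engaged"       # Bob moves to a new plan state

laptop.flush()
phone.flush()
print(xdb["bob"])
```

Under this assumption the phone session's stale copy silently discards the e-mail change made on the laptop, which is the kind of incomplete data the scenario warns about.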

Other advantages of OutProc session state management

OutProc session state is never going to be as fast as InProc session state. However, there are a number of advantages that start to pay off as you scale beyond 10 CD instances.

  • You trade some speed for increased reliability
  • You can share information across concurrent sessions on multiple devices
  • No need for sticky sessions (sticky sessions may result in uneven splits of traffic across load-balanced CDs)
