Daily Archives: September 29, 2008

v0.1 of my Visit/PageView Analysis Services Cube

So I created a cube using my the visit/pageview that I recently started collecting. The cube is VERY simple – only 4 dimensions {App, Page, Source, Time} and just a single measure – {Hit}. I created the cube using the 2005 versions of Visual Studio and Microsoft’s Analysis Services. The IDE’s wizards pretty much walk you through the process, which is great because creating an Analysis Services project from scratch is more than a little intimidating. Especially if your a web developer like me and you don’t know a whole lot about querying, let alone designing a cube.

Anyway, like I said, my cube is very simple. The App dimension only contains 2 members: ‘mattberseth’ for this site, and ‘mattberseth2′ for my live demo site. The Page dimension contains all of the unique URLs for both sites as its members, the Source dimension is essentially a bit field for determining if the traffic was the result of a direct hit or from a referring site and finally the Time dimension represents the calendar and is used for counting hits by a time interval (i.e. Days, Weeks, Months, etc…).

In my simple cube, all 4 dimensions and the Hit measure are currently coming from a single table. I have simulated the standard star schema by generating the keys for the different dimensions using the following 4 SQL queries. So the first query in the SQL snippet populates my measure group and the other three are responsible for populating the App, Source and Page dimensions.

-- builds the fact_pageview measure group ***
select
    -- if the referring url is empty than we know the source comes from a direct hit
    (case when ref_url = '' then 1 else 0 end) as source_id,
    -- the app_id - either mattberseth or mattberseth2
    app_id,
    -- turn the page url to an int
    checksum(url) as page_id,
    -- extract the date portion of the datetime
    convert(datetime, floor(convert(float, date))) as date,
    -- each row is a sinlge hit
    1 as hit
from
    -- my visit table
    visit_load with(nolock)

-- builds the app dimension ***
select
    'mattberseth' as app_id,
    'My Blog' as app_name
union
select
    'mattberseth2',
    'My Live Demo Site'

-- builds the source dimension ***
select
    0 as source_id,
    'Referrer' as source_name
union
select
    1,
    'Direct'

-- builds the page dimension ***
select
    checksum(url) as page_id,
    url as url
from
    visit_load with(nolock)

Next, I created views for these 4 queries, let my Analysis Services project know about them and used them as the data source for my cube. Conceptually, this diagram shows how these 4 queries are related.

image

Browsing and Querying the Cube

And amazingly, only after a few minutes of nexting through wizards and drag and drop design work, I deployed and processed the cube to my local Analysis Server instance. And now I can start taking a look at the data.

Browsing the Cube

Once the cube is deployed and processed, you can start browsing it. Below are a couple of screen shots that show the structure of my cube on the left, and hit counts for my two sites (mattberseth.com and mattberseth2.com) segmented by the traffic source (either direct traffic or referring site). The screen shot below that shows these counts as a percentage of the grand total. Looks like direct traffic to my demo site only makes up 2% of my total traffic ;(

image

image

Querying the Cube

And if you can stomach writing a little MDX, you can write custom queries to extract even more useful information. Below is a sample MDX query and result set that shows the average traffic per day for both mattberseth.com and mattberseth2.com segmented by week. The numbers are a little deceiving because only Week 39 consists of a full 7 days, but I think you can get the picture.

with
    -- define the Weekend and Weekday sets
    set [Weekday] as
    {
        [Time].[Day Of Week].[Day 2],
        [Time].[Day Of Week].[Day 3],
        [Time].[Day Of Week].[Day 4],
        [Time].[Day Of Week].[Day 5],
        [Time].[Day Of Week].[Day 6]
    }
    set [Weekend] as
    {
        [Time].[Day Of Week].[Day 1],
        [Time].[Day Of Week].[Day 7]
    }
    -- create a few calculated meausres based that make use of these sets
    member [Measures].[Weekday Average] as avg([Weekday], [Measures].[Hit]), format_string = '#'
    member [Measures].[Weekend Average] as avg([Weekend], [Measures].[Hit]), format_string = '#'
    member [Measures].[Weekly Average] as avg({[Weekday], [Weekend]}, [Measures].[Hit]), format_string = '#'
select
{
    [Measures].[Weekday Average],
    [Measures].[Weekend Average],
    [Measures].[Weekly Average]
} on 0,
non empty
{
    [App].children * [Time].[Week Of Year].children
} on 1
from
    [PageView]

image

What’s Next?

Well, I am pretty excited. I only have a handful of development hours invested in my visit cube (it honestly took longer to write this post than it did to create the cube) and I can already tell I have made the right decision by maintaining my own pageview/visit database. Of course there is still a lot to do …

  • My pageview JavaScript tracking code needs some work. I have been tweaking it over the past 2 weeks to play around with different techniques to keep my tracking request from getting cached. I have finally come within a few percent of what Google Analytics is reporting so I am happy. I think I will clean the handler up and write a quick post describing what I did.
  • My Source dimension on has 2 members – Direct and Referrer. I would like to break down the Referrer further to include Search Engines, Community (DNK, Reddit, Digg, DZone, etc…), and forums (forums.asp.net, stackoverflow, etc…).

  • I need to extract keywords from the Search Engine sources and get them into the cube
  • Look up geography information based on IP
  • IP + User Agent Sessionization. I would like to track time on site, navigation paths, etc…
  • Incorporate additional dimensional data from my Moveable Type database

At some point I plan on sharing the solution: JavaScript tracking code, HttpHandler, OLTP and OLAP databases as well as the Analysis Services Project …

That’s it. Enjoy!

TagScanner Renames and Tags Your Digital Music [Featured Windows Download]


Windows only: Rename the thousands of MP3 files in your digital music library and add or edit tags, lyrics, and album art in one fell swoop with free utility TagScanner. Not only can TagScanner clean up the artist, album, song title, and track number information for your digital music files, it can rename your songs based on a pattern you define (like %artist% - %title%), it can make music playlists, and search online databases like freedb and Amazon to automatically tag music missing information. It includes a built-in player as well so you can listen to tracks while you edit. We’ve recommended Media Monkey to whip your music’s metadata into shape, but TagScanner looks like a solid alternative. TagScanner is a free download for Windows only.

WP Like Button Plugin by Free WordPress Templates