Transitioning from Traditional Data Warehousing to Modern Web Analytics


The journey from traditional data warehousing to modern web analytics has been transformative. Initially, data integration tools were pivotal in extracting and transforming structured data from line-of-business (LOB) operational applications and loading it into data warehouses for business intelligence (BI) reporting and analysis. Today, the analytics arena has expanded to include content analytics, event analytics, and web analytics, with collaborative analytics emerging as a new frontier. This article covers the purpose and functionality of web analytics, the tools and techniques employed, and the future of data analytics in a collaborative environment.
The Evolution of Data Analytics
From Traditional Data Warehousing to Modern Analytics
In the early days, data warehousing was the cornerstone of business intelligence. Data integration tools played a crucial role in extracting and transforming structured data from traditional LOB operational applications. This data was then loaded into a data warehouse, where BI reporting and analysis tools could process it. However, the analytics landscape has since evolved, with various departments now deploying different types of analytics, including content analytics, event analytics, and web analytics.
The Rise of Collaborative Analytics
The development of collaborative and social computing tools is paving the way for collaborative analytics. This shift is significant because many individuals building these analytical solutions may lack deep knowledge of BI and data warehousing. As a result, it is unrealistic for the BI group to assume they can fully integrate this influx of information into a data warehousing environment. Web analytics serves as a prime example of this challenge, as it can be developed by various groups within an organization.
The Purpose of Web Analytics
Understanding Web Analytics
According to the Web Analytics Association (WAA), web analytics involves the collection, measurement, analysis, and reporting of internet data to optimize web usage. The WAA establishes standard metrics that web analytics products should support, such as page views, visits, unique visitors, new visitors, returning visitors, clickthroughs, and conversions. These metrics are essential for identifying website visitors, understanding their behavior, and measuring the success of their visits, such as purchasing a product or service.
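As a rough illustration of how a few of these metrics relate to raw hit data, the sketch below computes page views, visits, and unique visitors from a small in-memory list. The record layout, the `visitor_id` field, and the 30-minute inactivity timeout used to separate visits are assumptions made for this example; they are not taken from the WAA definitions.

```python
from datetime import datetime, timedelta

# Assumed rule: a gap longer than this between hits starts a new visit.
VISIT_TIMEOUT = timedelta(minutes=30)


def summarize(hits):
    """Compute basic metrics from (visitor_id, timestamp, url) tuples."""
    ordered = sorted(hits, key=lambda h: (h[0], h[1]))
    visitors = set()
    visits = 0
    last_seen = {}
    for visitor_id, ts, _url in ordered:
        visitors.add(visitor_id)
        previous = last_seen.get(visitor_id)
        # First hit for this visitor, or a long gap: count a new visit.
        if previous is None or ts - previous > VISIT_TIMEOUT:
            visits += 1
        last_seen[visitor_id] = ts
    return {
        "page_views": len(ordered),
        "visits": visits,
        "unique_visitors": len(visitors),
    }


# Hypothetical sample data: visitor "a" returns after a two-hour gap.
sample_hits = [
    ("a", datetime(2010, 1, 1, 9, 0), "/"),
    ("a", datetime(2010, 1, 1, 9, 5), "/products"),
    ("a", datetime(2010, 1, 1, 11, 0), "/"),
    ("b", datetime(2010, 1, 1, 9, 10), "/"),
]
metrics = summarize(sample_hits)
```

In this sample, visitor "a" accounts for two visits because of the gap between the second and third hits, which shows why visits and unique visitors are reported as separate metrics.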
Tactical and Strategic Decision-Making
WAA metrics provide after-the-fact summaries of past events, primarily intended for tactical and strategic decision-making. However, for operational decision-making, such as fraud detection or real-time marketing campaigns, different tools are required. Identifying visitors can be challenging and often requires the use of cookies or customer relationship management (CRM) tools to gather additional data.
Optimizing Web Performance
Overall, web analytics offers valuable insights into website usage, visitor behavior, and online performance. These insights can help businesses optimize their online presence and improve customer engagement.
How Web Analytics Products Function
Data Collection Methods
Web data can be collected using two primary methods: page tagging and log file analysis.
Page Tagging
Page tagging involves adding code, often written in JavaScript, to a web page so that a third-party server is notified when the page is rendered by a web browser.
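On the receiving side, a tag request commonly arrives as an ordinary HTTP request with page details encoded in the query string. The sketch below shows how a collection server might decode such a request. The host name `collector.example.com`, the `hit.gif` resource, and the parameter names `page`, `ref`, and `vid` are hypothetical and do not correspond to any particular product's tag format.

```python
from urllib.parse import parse_qs, urlsplit


def parse_beacon(request_url):
    """Extract page, referrer, and visitor id from a tag request URL.

    The parameter names are assumptions made for this illustration.
    """
    query = parse_qs(urlsplit(request_url).query)

    def first(name):
        # parse_qs returns a list per key; take the first value if present.
        return query.get(name, [""])[0]

    return {
        "page": first("page"),
        "referrer": first("ref"),
        "visitor_id": first("vid"),
    }


beacon = parse_beacon(
    "https://collector.example.com/hit.gif"
    "?page=%2Fproducts&ref=https%3A%2F%2Fexample.org%2F&vid=abc123"
)
```

A real tag usually sends many more fields (screen size, campaign codes, and so on), but the decoding step follows the same pattern.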
Log File Analysis
Log files generated by the web server hosting a website can be analyzed to gather data. Some products also support network sniffing, which is not discussed here.
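To give a sense of what log file analysis starts from, the sketch below parses one line in the Common Log Format that many web servers can write. It assumes that format specifically; servers configured with a custom log format would need a different pattern, and the field names in the resulting dictionary are chosen for this example.

```python
import re
from datetime import datetime

# Pattern for the Common Log Format:
# host ident user [time] "method path protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict for a Common Log Format line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A size of "-" means no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


line = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
record = parse_line(line)
```

Unlike a page tag, a log line also records requests for images and other static files, so an analysis step normally has to decide which requests count as page views.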
Prominent Web Analytics Tools
Google Analytics
Google Analytics is a prominent example of a product that uses page tagging. It is a free software-as-a-service (SaaS) offering from Google that generates comprehensive visitor metrics for a website and is aimed at marketers rather than webmasters. The product is useful for measuring the effectiveness of marketing campaigns that use Google's AdWords. Websites with fewer than 5 million page views per month can use the service without an AdWords account. The Google Analytics JavaScript code collects visitor data and sends it to Google data collection servers. The servers process the data periodically and generate reports that the website owner can access on demand. Google also provides the fee-based Urchin Software for in-house use.
Other Tools
Other SaaS and in-house products that compete with Google Analytics include Coremetrics, Omniture SiteCatalyst (Omniture was recently acquired by Adobe), Unica NetInsight, WebTrends Analytics, and Yahoo Web Analytics. CMS Watch offers an excellent report for purchase that compares these and other web analytics products, and its website has a free report appendix documenting how these products support WAA metrics.
Considerations for Purchasing Web Analytics Products
When purchasing a web analytics product, it is crucial to consider its ability to handle web pages with dynamic content that includes Rich Internet Applications (RIA) created with technologies such as Ajax and Adobe Flash. The capability to track RSS syndication readership and mobile users may also be essential for some organizations. Some vendors, such as SeeWhy, provide specific applications for web marketing. All of the products mentioned above support page tagging, while a few also support log file processing.
Comparing Tagging and Log File Approaches
The comparison table below highlights the differences between the tagging and log file approaches.
Leveraging Web Data in a Data Warehousing Environment
Integrating Web Data with Enterprise Data
Log files can serve as an ideal data source for a data warehousing environment, allowing web data to be correlated with other types of enterprise data. Given the volume of data involved, some filtering and consolidation may be necessary before loading the log data into a data warehouse. This can be done using standard data integration tools that support flat files or using technologies like Hadoop MapReduce.
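The filtering and consolidation step can be sketched as a map step followed by a reduce step. The example below is plain Python that only mimics that structure; it is not Hadoop code. The record fields, the list of static file suffixes, and the rule of keeping only status 200 responses are assumptions chosen for illustration.

```python
from collections import Counter

# Assumed filter: requests for these file types are not counted as page views.
STATIC_SUFFIXES = (".gif", ".png", ".jpg", ".css", ".js", ".ico")


def map_record(record):
    """Emit ((date, path), 1) for a page request worth keeping, else nothing."""
    if record["status"] != 200:
        return []
    if record["path"].lower().endswith(STATIC_SUFFIXES):
        return []
    return [((record["date"], record["path"]), 1)]


def reduce_counts(pairs):
    """Sum the counts for each (date, path) key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def consolidate(records):
    """Filter raw log records and roll them up to one row per date and path."""
    pairs = (pair for record in records for pair in map_record(record))
    totals = reduce_counts(pairs)
    return [
        {"date": date, "path": path, "page_views": count}
        for (date, path), count in sorted(totals.items())
    ]


# Hypothetical parsed log records.
records = [
    {"date": "2010-01-01", "path": "/", "status": 200},
    {"date": "2010-01-01", "path": "/logo.gif", "status": 200},
    {"date": "2010-01-01", "path": "/", "status": 200},
    {"date": "2010-01-01", "path": "/missing", "status": 404},
    {"date": "2010-01-02", "path": "/products", "status": 200},
]
rows = consolidate(records)
```

Here five raw records reduce to two summary rows, which is the kind of volume reduction that makes the data practical to load alongside other enterprise data.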
Real-Time and Near-Real-Time Web Analytics
When real-time or near-real-time web analytics are required, there are two alternative approaches available from vendors.
Business Activity Monitoring (BAM)
BAM tracks and analyzes business transactions generated by web interactions as they pass through operational systems. BAM is useful for analyzing a continuous stream of business transactions and generating real-time reports and dashboards.
Complex Event Processing (CEP)
For more complex processing of transaction and event streams, products that support complex event processing (CEP) can be used. CEP solutions can analyze and correlate multiple streams of current and historical data, identify patterns and trends, and predict potential outcomes. Examples of vendors in this area include Aleri, IBM (WebSphere Business Events, InfoSphere Streams), Oracle (Oracle CEP), Tibco (BusinessEvents), and Truviso. Note that some vendors use the term business event processing (BEP) instead of CEP, while others use terms such as continuous intelligence and continuous analytics.
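A very small example of the kind of pattern a CEP product might detect is a burst of related events inside a time window. The sketch below flags a visitor who produces several failed logins within 60 seconds. The event fields, the threshold, and the window length are hypothetical, and the code stands in for what a CEP engine would normally express in its own rule or query language.

```python
from collections import defaultdict, deque


class FailedLoginDetector:
    """Flag visitors with `threshold` failed logins inside a sliding window."""

    def __init__(self, threshold=3, window_seconds=60):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # visitor_id -> recent timestamps

    def on_event(self, event):
        """Process one event; return an alert dict or None."""
        if event["type"] != "login_failed":
            return None
        window = self._recent[event["visitor_id"]]
        window.append(event["ts"])
        # Drop timestamps that have fallen out of the window.
        while window and event["ts"] - window[0] > self.window_seconds:
            window.popleft()
        if len(window) >= self.threshold:
            return {
                "alert": "repeated_login_failure",
                "visitor_id": event["visitor_id"],
                "count": len(window),
            }
        return None


# Hypothetical event stream; "ts" is seconds since some starting point.
stream = [
    {"type": "login_failed", "visitor_id": "v1", "ts": 0},
    {"type": "page_view", "visitor_id": "v1", "ts": 5},
    {"type": "login_failed", "visitor_id": "v1", "ts": 20},
    {"type": "login_failed", "visitor_id": "v2", "ts": 25},
    {"type": "login_failed", "visitor_id": "v1", "ts": 45},
    {"type": "login_failed", "visitor_id": "v1", "ts": 200},
]
detector = FailedLoginDetector()
alerts = [a for a in map(detector.on_event, stream) if a is not None]
```

The detector evaluates each event as it arrives, so an alert can be raised while the visitor is still on the site. This differs from the after-the-fact summaries produced by batch reporting.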