Passive Analytics Collection - A Strategy

I’ve come to learn that collecting application metrics is tricky to get right. Developers, product, design, and senior management all need different things, so there is no one-size-fits-all solution to gathering and organizing those metrics. A consistent theme throughout my career has been that while we have the best intentions when collecting UI interaction data, we rarely provide enough depth of data to answer critical business questions from product and design. Furthermore, without a generalized approach to metrics gathering, metrics tend to be logged explicitly, in code, for every desired action, click, and so on. This post explores a generalized strategy for collecting UI metrics that is both clean and comprehensive.

A common methodology I’ve seen for UI metrics collection, especially in the React world, is to gather metrics on the front end much like we gather metrics on the server: logging an individual action or side-effect where the action occurs. Doing this seems OK at first. Take the following piece of code:

import React from 'react';
import Analytics from 'some-analytics-library';

const ContactForm = () => (
  <form>
    <button
      onClick={() => {
        Analytics.logCustom({ name: 'submit-form', value: 'submit-button' });
      }}
    >
      Submit
    </button>
  </form>
);

In this example, a custom event name and value get logged when the user clicks the submit button. This method is reasonable until design and product teams want to know about every button click in the application. Very quickly, the codebase gets polluted with logCustom calls. Furthermore, a button click is just a single type of interaction. As product and design teams consider new experiences or a redesign of the current one, they’ll want to know how customers interact with surrounding or related features: page views, menus, table columns, links, and so on. It doesn’t take long for your codebase to be stuffed full of metrics calls. Metrics emission becomes coupled to application code, which is especially difficult to follow when it is tangled up with things like component lifecycle methods in React. Eventually, particularly on large teams, attribute misspellings and inconsistent conventions creep into the events, much to the consternation of the product and analytics teams trying to query your data.

One alternative to the log pollution is a global click handler running in the background that tracks clicks on any element, or on a set of whitelisted elements. To identify the clicked element, the handler would build an identifier from the element's id attribute, class name, custom attribute, text, or some other identifying information. That identifier can also be concatenated with the IDs of parent elements to form a globally unique ID that is easy to recognize and query. The global click handler would be deployed to the page as a separate script and would turn the above code into:

import React from 'react';

const ContactForm = () => (
  <form id="contact-form">
    <button id="submit">
      Submit
    </button>
  </form>
);

Clicks are listened for passively, and the element ID is used for identification. A structure for storing information about the UI element might look something like this:

{
  "event": {
    "tracker": "Click",
    "type": "click",
    "name": "button[@id='submit']",
    "value": "contact-form:submit:button[@id='submit']",
    "detail": {
      "className": "",
      "id": "submit",
      "tagName": "button",
      "textContent": "Submit"
    }
  },
  "session": "uuid",
  "timestamp": 1587310932364
}
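As a rough sketch of how such a handler could produce the structure above, the code below walks up from the clicked element collecting IDs and builds the payload. The names buildClickEvent, createClickTracker, and send, as well as the list of tracked tags, are assumptions made for illustration and do not come from any particular analytics library.

```javascript
// Sketch only: all function and parameter names here are hypothetical.

// Collect IDs from the outermost ancestor down to the clicked element.
function idPath(element) {
  const ids = [];
  for (let node = element; node; node = node.parentElement) {
    if (node.id) ids.unshift(node.id);
  }
  return ids;
}

// Build the payload for an element that has an id attribute.
function buildClickEvent(element, session, now = Date.now()) {
  const tagName = element.tagName.toLowerCase();
  const name = `${tagName}[@id='${element.id}']`;
  return {
    event: {
      tracker: 'Click',
      type: 'click',
      name,
      value: [...idPath(element), name].join(':'),
      detail: {
        className: element.className || '',
        id: element.id,
        tagName,
        textContent: (element.textContent || '').trim(),
      },
    },
    session,
    timestamp: now,
  };
}

// Return a listener that reports clicks on tracked tags only.
// Elements without an id are skipped in this sketch.
function createClickTracker({ send, session, tags = ['button', 'a'] }) {
  return (domEvent) => {
    const element = domEvent.target;
    if (!element || !element.tagName || !element.id) return;
    if (!tags.includes(element.tagName.toLowerCase())) return;
    send(buildClickEvent(element, session));
  };
}

// In a browser, a single capture-phase listener covers the whole page:
// document.addEventListener(
//   'click',
//   createClickTracker({ send: console.log, session: 'uuid' }),
//   true
// );
```

Because the listener is attached once at the document level, components such as ContactForm need no metrics code of their own. A production handler would likely also need to resolve clicks that land on a child of the tracked element (an icon inside a button, for instance), which this sketch leaves out.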

For elements that do not have explicit IDs or data attributes, other identifying information, such as class names or text, can be used to build an ID. For example, the above button click without an ID attribute on the element might log:

{
  "event": {
    "tracker": "Click",
    "type": "click",
    "name": "button[text()='submit']",
    "value": "contact-form:button[text()='submit']",
    "detail": {
      "className": "",
      "id": null,
      "tagName": "button",
      "textContent": "Submit"
    }
  },
  "session": "uuid",
  "timestamp": 1587310932364
}
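One possible way to express that fallback is an ordered chain: use the id if present, then a custom data attribute, then the text, then the class name. The sketch below assumes a hypothetical data-track attribute and a hypothetical elementName helper, and it lowercases the text because the sample above appears to do so.

```javascript
// Sketch only: elementName and the data-track attribute are hypothetical.
// Values are inserted as-is; text containing a single quote would need escaping.
function elementName(element) {
  const tag = element.tagName.toLowerCase();
  const dataTrack =
    typeof element.getAttribute === 'function'
      ? element.getAttribute('data-track')
      : null;
  const text = (element.textContent || '').trim().toLowerCase();

  if (element.id) return `${tag}[@id='${element.id}']`;
  if (dataTrack) return `${tag}[@data-track='${dataTrack}']`;
  if (text) return `${tag}[text()='${text}']`;
  if (element.className) return `${tag}[@class='${element.className}']`;
  return tag; // nothing else to go on
}
```

The order matters: identifiers that developers set deliberately come first, and text or class names, which tend to change with copy edits and restyling, are only a last resort.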

One challenge with this method is creating a logical strategy for naming elements across your application and then sticking to those conventions. Another is ensuring that developers do not change the identifiers that other teams rely on in their queries, or at least that such changes are communicated to those teams. It isn’t immediately apparent that changing an element ID, or even a custom data attribute, might cause queries to break.

A future blog post will explore other types of UI metrics helpful in understanding how customers interact with your application.