Data Platform/Sessions

From Wikitech

This page documents the various notions of a "session" used in analytics for Wikimedia projects and Foundation products.

Implementations

These are the major implementations of sessions used by analytics instruments in production. This section documents their technical details so that analysts do not have to read instrumentation source code to learn them, and because Slack is not a good platform for documenting the caveats of working with session data.

Feature-specific instrumentation sometimes performs its own session management (e.g. editing sessions in instruments adhering to the EditAttemptStep schema; search sessions in SearchSatisfaction and SearchVue; Help Panel sessions in HelpPanel). Currently, only cross-instrument session management is documented on this page.

Mobile Apps

Android

Sessions in the Wikipedia app for Android reset after 30 minutes of inactivity and are persisted if the app is closed. If the user opens the app within 30 minutes, their previous session is resumed. If it has been more than 30 minutes, that is a new session.

Regarding inactivity:

  • When reading an article, any tap whatsoever keeps the current session going.
  • On any other screen, only entering that screen keeps the current session going.
  • If the app remains open for more than 30 minutes without any interaction at all, then the next interaction will cause a new session to start.
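
The inactivity rule above can be modeled in a few lines. This is a minimal sketch for analysts, not the code in AppSessionEvent.kt: the class name InactivitySession, the touch() method, and the use of secrets.token_hex() as an ID generator are all assumptions made for illustration. A "touch" stands for any interaction that the rules above count as keeping the session going.

```python
import secrets
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # inactivity threshold described above


class InactivitySession:
    """Hypothetical model of an inactivity-based session (not the app's code)."""

    def __init__(self):
        self.session_id = None
        self.last_interaction = None  # assumed to survive app restarts in this model

    def touch(self, now):
        """Record a qualifying interaction at `now` and return the session ID in effect."""
        expired = (
            self.last_interaction is None
            or now - self.last_interaction > SESSION_TIMEOUT
        )
        if expired:
            # Placeholder ID generator: 10 random bytes as hex.
            self.session_id = secrets.token_hex(10)
        self.last_interaction = now
        return self.session_id


session = InactivitySession()
start = datetime(2024, 1, 1, 12, 0)
first = session.touch(start)
# 29 minutes after the previous interaction: the session is resumed.
resumed = session.touch(start + timedelta(minutes=29))
# 31 minutes after the previous interaction: a new session starts.
renewed = session.touch(start + timedelta(minutes=60))
```

In this model, the timeout is measured from the most recent qualifying interaction, so a steady stream of interactions can keep one session alive indefinitely.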

It is implemented within AppSessionEvent.kt. The underlying ID-generating algorithm is the same as mw.user.sessionId()'s and is implemented within EventPlatformClient.kt.

iOS

App version 7.2.2 and below implementation: Sessions in the Wikipedia app for iOS reset after 15 minutes of moving to the background (via explicit backgrounding or if the app is interrupted by a phone call), and reset upon foregrounding (if more than 15 minutes have passed). Unlike the Android app, they are not persisted, so if the user closes the app completely and immediately re-launches it, that will be a new session (regenerated lazily the next time an event is logged). The session ID is also regenerated if the user toggles usage sharing off and then back on.

App version 7.3.0 and above implementation: Sessions in the Wikipedia app for iOS reset after 30 minutes of moving to the background (via explicit backgrounding or if app is interrupted by a phone call), and reset upon foregrounding (if more than 30 minutes have passed). Like Android, session IDs are persisted, so if the user terminates the app completely and immediately re-launches it, it will use the same session ID. If 30+ minutes have passed after terminating, it will generate a new session ID after relaunching. The session ID is also regenerated if the user toggles usage sharing off and then back on.

If the app is in the foreground for 15+ (for 7.2.2 and below) or 30+ (for 7.3.0 and above) minutes and the screen isn't automatically locked, the session ID remains unchanged even if the user is not actively interacting with the app. Therefore it is theoretically possible to have very long (multi-hour, multi-day) sessions without activity between the first and last events recorded in a session.
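
The consequence described above, that idle time in the foreground never resets the session, follows from the reset being tied to background and foreground transitions. The sketch below illustrates that behavior under stated assumptions: BackgroundTimeoutSession and its methods are hypothetical names, the ID is regenerated eagerly on foregrounding (a simplification), and persistence and the usage-sharing toggle are not modeled. It is not the code in EventPlatformClient.swift.

```python
import secrets
from datetime import datetime, timedelta


class BackgroundTimeoutSession:
    """Hypothetical model: the ID is only re-evaluated on returning to the foreground."""

    def __init__(self, timeout_minutes):
        self.timeout = timedelta(minutes=timeout_minutes)
        self.session_id = secrets.token_hex(10)  # placeholder ID generator
        self.backgrounded_at = None

    def on_background(self, now):
        """Remember when the app left the foreground."""
        self.backgrounded_at = now

    def on_foreground(self, now):
        """Start a new session only if the time spent in the background exceeds the timeout."""
        if self.backgrounded_at is not None and now - self.backgrounded_at > self.timeout:
            self.session_id = secrets.token_hex(10)
        self.backgrounded_at = None
        return self.session_id


t0 = datetime(2024, 1, 1, 9, 0)
s = BackgroundTimeoutSession(timeout_minutes=30)  # 15 for app version 7.2.2 and below
original = s.session_id

# Six idle hours in the foreground: no transition happens, so nothing resets the ID.
s.on_background(t0 + timedelta(hours=6))
after_short_break = s.on_foreground(t0 + timedelta(hours=6, minutes=10))

# 45 minutes in the background exceeds the timeout, so a new ID is generated.
s.on_background(t0 + timedelta(hours=7))
after_long_break = s.on_foreground(t0 + timedelta(hours=7, minutes=45))
```

When analyzing iOS session data, it may therefore be worth checking the gaps between consecutive events within a session rather than assuming that a shared session ID implies continuous activity.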

Both the session management and the underlying ID-generating algorithm, which is the same as mw.user.sessionId()'s, are implemented within EventPlatformClient.swift.

Note: Any events produced but not sent (since events are sent in bursts) during the previous session are persisted and are then sent during the new session, but they will (correctly) include the previous session ID.

Web

Sessions on the web are… complicated.

There are two functions that provide session identifiers, mw.user.sessionId() and mw.eventLog.id.getSessionId(), which are defined in MediaWiki Core and the EventLogging MediaWiki extension respectively. Thus, web sessions are only available on pages that are served by MediaWiki, e.g. https://en.wikipedia.org/wiki/Main_Page but not https://www.wikipedia.org.

Both session identifiers have the following properties:

  • They are stored in session cookies, which means:
    • They are deleted when the current session ends.[1] The browser defines when the "current session" ends but usually it's when the browser process has been terminated. However, some browsers like Chrome[2] and Firefox[3] have a "Continue where I left off" / "Restore previous session" feature which restores cookies and data, so a session cookie can stay around for a long time – longer than a month, even![4]
  • They are stored in cookies without a domain,[5][6] so access is restricted to the same host that set the cookie.[1]
    • This results in different session IDs on en.wikipedia.org and he.wikipedia.org, even if the user visits both domains in the same browser at the same time.
    • Because of how our mobile variant sub-domains are configured, session cookies cannot be shared automatically between desktop and mobile. For that to be possible, our mobile domains would need to be formatted like m.en.wikipedia.org.
  • They are hex-encoded 80-bit random integers, generated using crypto.getRandomValues()
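
To make the last two properties concrete, here is a minimal Python analogue. It is not the MediaWiki implementation, which runs in the browser and uses crypto.getRandomValues(). The function names and the dictionary standing in for the browser's cookie store are hypothetical, and the exact string formatting of real IDs (for example, padding) is an assumption.

```python
import secrets
import string


def generate_session_id():
    """Return 80 random bits as a 20-character lowercase hex string."""
    return secrets.token_hex(10)  # 10 bytes = 80 bits


# Hypothetical stand-in for host-only session cookies: one ID per exact host.
cookie_jar = {}


def session_id_for(host):
    """Return the ID stored for `host`, creating one on first use."""
    if host not in cookie_jar:
        cookie_jar[host] = generate_session_id()
    return cookie_jar[host]


en_id = session_id_for("en.wikipedia.org")
he_id = session_id_for("he.wikipedia.org")
en_id_again = session_id_for("en.wikipedia.org")
```

Because the store is keyed by exact host, the same browser ends up with unrelated IDs on en.wikipedia.org and he.wikipedia.org, so session IDs cannot be used to join activity across wikis.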

mw.eventLog.id.getSessionId() is a newer, slightly "smarter" session identifier, which is barely used. This session identifier resets after 30 minutes of inactivity. It does this by subscribing to a session reset event emitted by the SessionTick instrument, which collects data for the SessionLength dataset and is the instrumentation that monitors activity (scrolling, clicking, or typing). However, because session management is done by the SessionTick instrument, this can lead to race conditions during instrument initialization – resulting in instruments using a stale session ID before a refresh is triggered.
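
One way such a stale ID can arise is when an instrument reads the session ID once during initialization and keeps using that copy after a reset. The sketch below illustrates that pattern only; it does not claim to reproduce how any particular instrument or SessionTick behaves. SessionManager, CachingInstrument, and LazyInstrument are hypothetical names.

```python
import secrets


class SessionManager:
    """Hypothetical stand-in for the component that owns the session ID."""

    def __init__(self):
        self._session_id = secrets.token_hex(10)

    def get_session_id(self):
        return self._session_id

    def reset(self):
        """Simulate the reset triggered after 30 minutes of inactivity."""
        self._session_id = secrets.token_hex(10)


class CachingInstrument:
    """Reads the session ID once at initialization, so it can go stale."""

    def __init__(self, manager):
        self.session_id = manager.get_session_id()

    def log(self, name):
        return {"event": name, "session_id": self.session_id}


class LazyInstrument:
    """Reads the session ID each time an event is logged."""

    def __init__(self, manager):
        self.manager = manager

    def log(self, name):
        return {"event": name, "session_id": self.manager.get_session_id()}


manager = SessionManager()
caching = CachingInstrument(manager)
lazy = LazyInstrument(manager)

manager.reset()  # the reset arrives after both instruments have initialized

stale_event = caching.log("click")
fresh_event = lazy.log("click")
```

Here stale_event carries the pre-reset ID while fresh_event carries the current one, which is the kind of mismatch to look for when events from different instruments in the same page view disagree about the session.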

mw.user.sessionId() is implemented within mediawiki.user.html and the instruments using the identifier can be found here: https://codesearch.wmcloud.org/extensions/?q=mw.user.sessionId()&i=nope&files=&excludeFiles=&repos=

mw.eventLog.id.getSessionId() is implemented within core.html and the instruments using the identifier can be found here: https://codesearch.wmcloud.org/extensions/?q=mw.eventLog.id.getSessionId&i=nope&files=&excludeFiles=tests%2F*&repos=

See also

Activity Session and Browser Session glossary entries in Wikimedia's internal data catalog.

References

  1. https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies
  2. https://web.archive.org/web/20230213215842/https://support.google.com/chrome/answer/95314#zippy=
  3. https://web.archive.org/web/20230203063845/https://support.mozilla.org/en-US/kb/restore-previous-session
  4. https://web.archive.org/web/20221128054153/https://textslashplain.com/2019/06/24/surprise-undead-session-cookies/
  5. https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/f6b9381d7f3664a00f9fd4e5597f161155b1f036/resources/src/mediawiki.cookie/index.js
  6. https://codesearch.wmcloud.org/search/?q=wgCookieDomain&i=nope&files=.php&excludeFiles=&repos=