Installing → Upgrade Guide

General Upgrade Notes

The XDMoD OnDemand module should be upgraded at the same time as the main Open XDMoD software. The upgrade procedure is documented on the XDMoD upgrade page. Downloads of RPMs and source packages for the XDMoD OnDemand module are available from GitHub.

Additional 11.0.0 Upgrade Notes

Open XDMoD 11.0.0 fundamentally changes how page loads, sessions, and applications are counted and categorized, as described in detail in the sections below. The changes only apply to newly ingested Open OnDemand web server logs after upgrading to 11.0.0. If you have already ingested logs with a prior version of Open XDMoD, those logs will not be able to be recounted and recategorized using the new methods from 11.0.0 unless you still have copies of the original log files, in which case you can delete the corresponding rows from the modw_ondemand.page_impressions database table and reingest and reaggregate the logs following the instructions below:

  1. Back up the modw_ondemand.page_impressions database table (e.g., using mysqldump).
  2. Run the Bash loop below in the directory containing the log files to find the earliest and latest timestamps in the logs:
     earliest=9999999999;
     latest=0;
     while read line; do
         current=$(date -d "$line" +"%s");
         if [ $current -lt $earliest ]; then
             earliest=$current;
         fi;
         if [ $current -gt $latest ]; then
             latest=$current;
         fi;
     done < <(cat *.log* | cut -d ']' -f 1 | cut -d '[' -f 2 | sed 's#/#-#g' | sed 's/:/ /');
     echo -e "Earliest: $earliest\nLatest: $latest"
    
  3. Run the SQL command below to list the page impressions that will be deleted, replacing :earliest with the earliest timestamp and :latest with the latest timestamp obtained in the previous step:
     SELECT *
     FROM modw_ondemand.page_impressions
     WHERE log_time_ts BETWEEN :earliest AND :latest
    
  4. If that is the correct list of page impressions you want to delete, run the same SQL command from the previous step, replacing SELECT * with DELETE.
  5. Reingest and reaggregate the logs following these instructions.

The sections below explain the details of the changes in 11.0.0.

Using request path instead of Referer header

In previous versions, during ingestion of the Open OnDemand web server logs, the Referer header of each line was used to determine which application was being requested for that line. For example, take the following line from a web server log file:

127.0.0.1 - sfoster [21/Feb/2024:22:30:56 +0000] "GET /pun/sys/dashboard/batch_connect/sys/jupyter/session_contexts/new HTTP/1.1" 200 13058 "https://resource.example.com/pun/sys/dashboard" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

In this example, the Referer header is https://localhost:3443/pun/sys/dashboard, so prior to 11.0.0, this would be counted as a request for the sys/dashboard application. However, the Referer header actually indicates from which page the request originated, not the page that is actually being requested. The actual application being requested on this line is indicated by the request path, /pun/sys/dashboard/batch_connect/sys/jupyter/session_contexts/new, which would be the sys/jupyter application. In version 11.0.0, the request path is now used to determine which application was being requested.

For more information on how applications are categorized, including instructions for how to recategorize them, see this page.

In order to enable parsing of the request path (and the request method, which is not currently displayed in the XDMoD portal but is used for deduplication and may be displayed in a future version), the webserver_format_str defined in portal_settings.d/ondemand.ini must include either %r or both of %m and %U.

11.0.0 also changes the criteria used for deduplicating page impressions. Prior to 11.0.0, only the request time, user, and application were used. In 11.0.0, the resource, request path, request method, reverse proxy host and port (if applicable), browser geolocation (if applicable), browser family, and OS family are also used to deduplicate page impressions.

Removal of -u / --url option from xdmod-ondemand-ingestor

Prior to 11.0.0, logs would only be ingested if the Referer header matched the value of the -u or --url option to xdmod-ondemand-ingestor. For example, if the Referer header were https://resource.example.com/pun/sys/dashboard, then the ingestor would only ingest the line if the -u https://resource.example.com or --url https://resource.example.com option were passed to xdmod-ondemand-ingestor. Now, the Referer header is not used, so the -u / --url option is removed from xdmod-ondemand-ingestor. All page impressions in the file(s) being ingested will be assigned to the resource specified by the -r or --resource option to xdmod-ondemand-ingestor.

Separation of ingestion and aggregation steps

Prior to 11.0.0, xdmod-ondemand-ingestor would always run both ingestion and aggregation. In 11.0.0, ingestion and aggregation can now be run as separate steps. The usage is explained on this page.

Inclusion of reverse proxy hosts and ports

In Open OnDemand, requests can be reverse proxied to other servers such as Jupyter notebook servers, RStudio servers, VNC servers, etc. (details here). Prior to Open XDMoD 11.0.0, such requests were not counted in XDMoD. In 11.0.0, such requests are now counted, and the reverse proxy host and port are preserved in the modw_ondemand.page_impressions table (but the host and port are not currently displayed in the XDMoD portal; they may be in a future version).

Exclusion of non page impressions

In 11.0.0, requests for app icons, images, stylesheets, scripts, datafiles, etc. are not counted as page impressions unless they were being loaded from the OnDemand “Files” or “File Editor” applications. Requests are only ingested if the request path starts with /pun/, /node/, or /rnode/; the request is from an authenticated user (i.e., not "-"); and the request is not for one of the excluded file extensions from the list below (this is defined in the configuration file etl/etl_action_defs.d/ood/normalized.json).

aff, css, dic, eot, gif, ico, jpeg, jpg, js, json, map, mp3, oga, ogg, otf, png, rstheme, svg, ttf, wasm, woff, woff2

Removing ihpc from application names

Prior to 11.0.0, some applications were given a name with (sys/ihpc) at the end. iHPC is an old name for interactive OnDemand apps; this name is no longer used. In 11.0.0, page impressions that are ingested will no longer have (sys/ihpc) in their application names. Applications for page impressions that have already been ingested can be recategorized as explained on this page.

Speeding up person lookup

Prior to 11.0.0, each time ingestion was run, the ingestor would try to match the username of all unknown people from the modw_ondemand.page_impressions table with usernames from the modw.systemaccount table. 11.0.0 still does this, but rather than doing so for all unknown people in the table, it only does it for the page impressions that were ingested during the current run of xdmod-ondemand-ingestor. This speeds up the overall ingestion.

Allowing @ in usernames

The xdmod-ondemand-ingestor now allows the @ character to appear in ingested usernames.

Configuration File Changes

The upgrade renames etl/etl_data.d/ood/application_map.json to etl/etl_data.d/ood/application-map.json and updates it with additional application mappings. See this page for information on how to recategorize applications.

Database Changes

During the upgrade, the modw_ondemand.staging table will have its header_referer column removed and columns added for request_method and request_path.

The modw_ondemand.normalized table will be truncated during the upgrade. It will also receive new columns for id (which is now its primary key), request_path, request_method, reverse_proxy_host, and reverse_proxy_port. Its unique index will be updated to no longer include app and to include request_path, request_method, ua_family, and ua_os_family. In order to fit the new index, the ua_family and ua_os_family columns are downsized from VARCHAR(255) to VARCHAR(32).

During the upgrade, the modw_ondemand.page_impressions table will have its id column updated to use bigint(20) unsigned instead of int(11) to be able to accommodate more than 2,147,483,647 page impressions. It will also have columns added for request_path_id, request_method_id, reverse_proxy_host_id, and reverse_proxy_port_id. Its unique index will be updated to remove app_id and to include resource_id, request_path_id, request_method_id, reverse_proxy_host_id, reverse_proxy_port_id, app_id, location_id, ua_family_id, and ua_os_family_id. Indexes will be added to speed up aggregation, person lookup, and application recategorization.

If the modw_ondemand.location table has a row with unknown as its value for city, state, and country; and Unknown as its value for name; the upgrade will change the values for city, state, and country to NA for that row.

The upgrade will add tables for modw_ondemand.request_method, modw_ondemand.request_path, modw_ondemand.reverse_proxy_host, and modw_ondemand.reverse_proxy_port.

During the upgrade, if the modw_ondemand.location table has a row with unknown as its value for city, state, and country, these will be replaced with the value NA.