Upgrade Guide
General Upgrade Notes
The OnDemand module for Open XDMoD should be upgraded at the same time as the main Open XDMoD software. The upgrade procedure is documented on the Open XDMoD upgrade page. Downloads of RPMs and source packages for the OnDemand module for Open XDMoD are available from GitHub.
11.0.0 Upgrade Notes
Open XDMoD 11.0.0 fundamentally changes how page impressions, sessions, and
applications are counted and categorized, as described in detail in the
sections below. The changes apply only to Open OnDemand web server logs that
are ingested after upgrading to 11.0.0. Logs that were already ingested with a
prior version of Open XDMoD cannot be recounted and recategorized using the new
methods from 11.0.0 unless you still have copies of the original log files. If
you do, you can delete the corresponding rows from the
modw_ondemand.page_impressions database table and then reingest and
reaggregate the logs following the instructions below:
- Back up the modw_ondemand.page_impressions database table (e.g., using
  mysqldump).
- Run the Bash loop below in the directory containing the log files to find
  the earliest and latest timestamps in the logs:
  earliest=9999999999; latest=0; while read -r line; do current=$(date -d "$line" +"%s"); if [ "$current" -lt "$earliest" ]; then earliest=$current; fi; if [ "$current" -gt "$latest" ]; then latest=$current; fi; done < <(cat *.log* | cut -d ']' -f 1 | cut -d '[' -f 2 | sed 's#/#-#g' | sed 's/:/ /'); echo -e "Earliest: $earliest\nLatest: $latest"
- Run the SQL command below to list the page impressions that will be deleted,
  replacing :earliest with the earliest timestamp and :latest with the latest
  timestamp obtained in the previous step:
  SELECT * FROM modw_ondemand.page_impressions WHERE log_time_ts BETWEEN :earliest AND :latest
- If that is the correct list of page impressions you want to delete, run the
  same SQL command from the previous step, replacing SELECT * with DELETE.
- Reingest and reaggregate the logs following these instructions.
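The one-line loop in the steps above can be hard to audit. As a rough sketch, the same logic is laid out below as a Bash function. The name log_time_range is hypothetical and not part of Open XDMoD; the sketch assumes GNU date and log lines whose timestamp appears in square brackets, as in the OnDemand log example on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Open XDMoD): print the earliest and latest
# Unix timestamps found in the given log files. Assumes GNU date and
# timestamps of the form [21/Feb/2024:22:30:56 +0000].
log_time_range() {
    local earliest=9999999999 latest=0 stamp current
    while read -r stamp; do
        # Skip any line whose timestamp GNU date cannot parse.
        current=$(date -d "$stamp" +%s 2>/dev/null) || continue
        if (( current < earliest )); then earliest=$current; fi
        if (( current > latest )); then latest=$current; fi
    done < <(cat -- "$@" | cut -d ']' -f 1 | cut -d '[' -f 2 \
                 | sed -e 's#/#-#g' -e 's/:/ /')
    printf 'Earliest: %s\nLatest: %s\n' "$earliest" "$latest"
}
```

It could be called as log_time_range *.log* from the directory containing the log files; the two printed values are the ones to substitute for :earliest and :latest.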
The sections below explain the details of the changes in 11.0.0.
Using request path instead of Referer header
In previous versions, during ingestion of the Open OnDemand web server logs, the Referer header of each line was used to determine which application was being requested for that line. For example, take the following line from a web server log file:
127.0.0.1 - sfoster [21/Feb/2024:22:30:56 +0000] "GET /pun/sys/dashboard/batch_connect/sys/jupyter/session_contexts/new HTTP/1.1" 200 13058 "https://resource.example.com/pun/sys/dashboard" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
In this example, the Referer header is
https://resource.example.com/pun/sys/dashboard, so prior to 11.0.0, this would
be counted as a request for the sys/dashboard application. However, the Referer
header indicates the page from which the request originated, not the page that
is being requested. The application actually being requested on this line is
indicated by the request path,
/pun/sys/dashboard/batch_connect/sys/jupyter/session_contexts/new, which
corresponds to the sys/jupyter application. In version 11.0.0, the request path
is now used to determine which application was being requested.
For more information on how applications are categorized, including instructions for how to recategorize them, see this page.
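To see the difference between the two fields on a given log line, the sketch below pulls out the request path and the Referer header with awk. The function names are hypothetical, this is not how the Open XDMoD ingestor parses logs, and it assumes a log line laid out like the example above with no escaped double quotes inside the quoted fields.

```shell
# Hypothetical Bash helpers (not part of Open XDMoD). Splitting the line on
# double quotes, field 2 is the request line and field 4 is the Referer.
request_path() {
    # The request line is "METHOD PATH PROTOCOL"; print the PATH part.
    awk -F'"' '{ split($2, req, " "); print req[2] }' <<< "$1"
}

referer() {
    awk -F'"' '{ print $4 }' <<< "$1"
}
```

For the example line above, request_path prints the path under batch_connect/sys/jupyter, while referer prints the dashboard URL.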
In order to enable parsing of the request path (and the request method, which
is not currently displayed in the XDMoD portal but is used for deduplication
and may be displayed in a future version), the webserver_format_str defined in
portal_settings.d/ondemand.ini must include either %r or both of %m and %U.
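Before upgrading, it may help to check a format string against this rule. The sketch below is a hypothetical Bash check, not a tool shipped with Open XDMoD, and it assumes the format string uses Apache LogFormat-style directives and is passed in as a plain string.

```shell
# Hypothetical check (not part of Open XDMoD): succeed if the given log format
# string includes %r, or both %m and %U. The match is case-sensitive, so
# %{Referer}i does not count as %r.
format_has_request_fields() {
    local fmt=$1
    [[ $fmt == *%r* ]] || { [[ $fmt == *%m* ]] && [[ $fmt == *%U* ]]; }
}
```

For instance, the Apache combined format, %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i", passes this check because it contains %r.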
11.0.0 also changes the criteria used for deduplicating page impressions. Prior to 11.0.0, only the request time, user, and application were used. In 11.0.0, the resource, request path, request method, reverse proxy host and port (if applicable), browser geolocation (if applicable), browser family, and OS family are also used to deduplicate page impressions.
Removal of -u / --url option from xdmod-ondemand-ingestor
Prior to 11.0.0, logs would only be ingested if the Referer header matched the
value of the -u or --url option to xdmod-ondemand-ingestor. For example, if the
Referer header were https://resource.example.com/pun/sys/dashboard, then the
ingestor would only ingest the line if the -u https://resource.example.com or
--url https://resource.example.com option were passed to
xdmod-ondemand-ingestor. Now, the Referer header is not used, so the -u / --url
option is removed from xdmod-ondemand-ingestor. All page impressions in the
file(s) being ingested will be assigned to the resource specified by the -r or
--resource option to xdmod-ondemand-ingestor.
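Existing cron entries or wrapper scripts may still pass the removed option. The sketch below is a hypothetical Bash helper, not part of Open XDMoD, that lists such lines; which files to scan is an assumption that depends on how ingestion is scheduled at your site.

```shell
# Hypothetical helper (not part of Open XDMoD): print FILE:LINE:TEXT for each
# line in the given files that runs xdmod-ondemand-ingestor with -u or --url.
find_removed_url_option() {
    grep -HnE -- \
        'xdmod-ondemand-ingestor.*[[:space:]](-u|--url)([[:space:]=]|$)' "$@"
}
```

Any line it reports would need the -u / --url argument removed before running the 11.0.0 ingestor.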
Separation of ingestion and aggregation steps
Prior to 11.0.0, xdmod-ondemand-ingestor would always run both ingestion and
aggregation. In 11.0.0, ingestion and aggregation can now be run as separate
steps. The usage is explained on this page.
Inclusion of reverse proxy hosts and ports
In Open OnDemand, requests can be reverse proxied to other servers such as
Jupyter notebook servers, RStudio servers, VNC servers, etc. (details here).
Prior to Open XDMoD 11.0.0, such requests were not counted in XDMoD. In 11.0.0,
such requests are now counted, and the reverse proxy host and port are
preserved in the modw_ondemand.page_impressions table (but the host and port
are not currently displayed in the XDMoD portal; they may be in a future
version).
Exclusion of non page impressions
In 11.0.0, requests for app icons, images, stylesheets, scripts, data files,
etc. are not counted as page impressions unless they were being loaded from
the OnDemand “Files” or “File Editor” applications. Requests are only ingested
if all of the following hold: the request path starts with /pun/, /node/, or
/rnode/; the request is from an authenticated user (i.e., the user is not "-");
and the request is not for one of the excluded file extensions in the list
below (this list is defined in the configuration file
etl/etl_action_defs.d/ood/normalized.json).
aff, css, dic, eot, gif, ico, jpeg, jpg, js, json, map, mp3, oga, ogg, otf, png, rstheme, svg, ttf, wasm, woff, woff2
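The three conditions above can be sketched as a small Bash predicate. This is only an illustration of the rules as described here, not the actual ETL logic in normalized.json: the function name is hypothetical, it does not model the exception for the “Files” and “File Editor” applications, and stripping the query string and matching extensions case-sensitively are assumptions.

```shell
# Hypothetical sketch (not the actual ETL code) of the 11.0.0 ingestion rules.
EXCLUDED_EXTENSIONS="aff css dic eot gif ico jpeg jpg js json map mp3 oga ogg
otf png rstheme svg ttf wasm woff woff2"

is_ingested() {
    local user=$1
    local path=${2%%\?*}      # assumption: ignore any query string
    local ext=${path##*.}     # text after the last dot, if any
    local excluded
    # Rule 1: the request path must start with /pun/, /node/, or /rnode/.
    case $path in
        /pun/*|/node/*|/rnode/*) ;;
        *) return 1 ;;
    esac
    # Rule 2: the request must come from an authenticated user.
    [[ $user != "-" ]] || return 1
    # Rule 3: the file extension must not be in the excluded list.
    for excluded in $EXCLUDED_EXTENSIONS; do
        [[ $ext != "$excluded" ]] || return 1
    done
    return 0
}
```

For example, is_ingested sfoster /pun/sys/dashboard succeeds, while the same call with user "-" or with a path ending in .css fails.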
Removing ihpc from application names
Prior to 11.0.0, some applications were given a name with (sys/ihpc) at the
end. iHPC is an old name for interactive OnDemand apps; this name is no longer
used. In 11.0.0, page impressions that are ingested will no longer have
(sys/ihpc) in their application names. Applications for page impressions that
have already been ingested can be recategorized as explained on this page.
Speeding up person lookup
Prior to 11.0.0, each time ingestion was run, the ingestor would try to match
the username of all unknown people from the modw_ondemand.page_impressions
table with usernames from the modw.systemaccount table. 11.0.0 still does this,
but rather than doing so for all unknown people in the table, it only does it
for the page impressions that were ingested during the current run of
xdmod-ondemand-ingestor. This speeds up the overall ingestion.
Allowing @ in usernames
xdmod-ondemand-ingestor now allows the @ character to appear in ingested
usernames.
Configuration File Changes
The upgrade renames etl/etl_data.d/ood/application_map.json to
etl/etl_data.d/ood/application-map.json and updates it with additional
application mappings. See this page for information on how to recategorize
applications.
Database Changes
During the upgrade, the modw_ondemand.staging table will have its
header_referer column removed and columns added for request_method and
request_path.
The modw_ondemand.normalized table will be truncated during the upgrade. It
will also receive new columns for id (which is now its primary key),
request_path, request_method, reverse_proxy_host, and reverse_proxy_port. Its
unique index will be updated to no longer include app and to include
request_path, request_method, ua_family, and ua_os_family. In order to fit the
new index, the ua_family and ua_os_family columns are downsized from
VARCHAR(255) to VARCHAR(32).
During the upgrade, the modw_ondemand.page_impressions table will have its id
column updated to use bigint(20) unsigned instead of int(11) to be able to
accommodate more than 2,147,483,647 page impressions. It will also have columns
added for request_path_id, request_method_id, reverse_proxy_host_id, and
reverse_proxy_port. Its unique index will be updated to remove app_id and to
include resource_id, request_path_id, request_method_id, reverse_proxy_host_id,
reverse_proxy_port, app_id, location_id, ua_family_id, and ua_os_family_id.
Indexes will be added to speed up aggregation, person lookup, and application
recategorization.
If the modw_ondemand.location table has a row with unknown as its value for
city, state, and country, and Unknown as its value for name, the upgrade will
change the values for city, state, and country to NA for that row.
The upgrade will add tables for modw_ondemand.request_method,
modw_ondemand.request_path, and modw_ondemand.reverse_proxy_host.
11.0.1 Upgrade Notes
Configuration File Changes
Fixing application mapping of noVNC page impressions
This release updates application-map.json to fix the application mapping of
page impressions for OnDemand applications launched via noVNC, specifically
page impressions whose request paths are of this form:
/pun/sys/dashboard/noVNC-[version]/vnc.html?[params]&commit=Launch+[app]
Previously, these page impressions were mapped to the sys/dashboard
application. This release fixes this to map them to the value of [app]. For
example, a page impression with a request path that has the following form will
be mapped to the application Desktop:
/pun/sys/dashboard/noVNC-1.3.0/vnc.html?[params]&commit=Launch+Desktop
This new mapping will apply to any new page impressions that are ingested into XDMoD. Page impressions that have already been ingested will also be remapped during the upgrade.
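As a rough illustration of the mapping described above, the Bash sketch below extracts the [app] value from a noVNC request path. The function name is hypothetical, and this is not the pattern actually used in application-map.json; it only assumes the path form shown in this section.

```shell
# Hypothetical sketch (not the pattern used in application-map.json): print
# the [app] value from a noVNC request path, or fail if the path does not
# have the form shown above.
novnc_app() {
    local path=$1 app
    [[ $path == /pun/sys/dashboard/noVNC-*/vnc.html\?*commit=Launch+* ]] \
        || return 1
    app=${path##*commit=Launch+}   # text after the last "commit=Launch+"
    app=${app%%&*}                 # stop at a following parameter, if any
    printf '%s\n' "$app"
}
```

Given a path ending in commit=Launch+Desktop, the function prints Desktop; for any path that is not a noVNC launch request, it returns a non-zero status and prints nothing.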
Fixing request path filtering of File Editor page impressions
This release updates request-path-filter.json to fix the request path filter
for categorizing page impressions for requests of the OnDemand File Editor app.
In 11.0.0, if a page impression had a request with a path of the following
form:
/pun/sys/dashboard/files/edit/[path]
it would mistakenly be mapped to this request path instead:
/pun/sys/dashboard/files/[path]
This is fixed in 11.0.1 for any new page impressions that are ingested into XDMoD. Page impressions that have already been ingested will also be remapped during the upgrade.
Database Changes
Release 11.0.0 had bugs in which columns in the modw_ondemand.page_impressions
table were too small to fit the corresponding IDs in the dimension tables.
Specifically, the modw_ondemand.page_impressions.reverse_proxy_port_id column
was of type smallint(5) unsigned, but the corresponding
modw_ondemand.reverse_proxy_port.id column was of type int(11). This meant that
all values of modw_ondemand.reverse_proxy_port.id > 65535 (the maximum value of
smallint(5) unsigned) would have the wrong value stored in
modw_ondemand.page_impressions.reverse_proxy_port_id (they were all stored as
65535).
Similarly, the modw_ondemand.page_impressions.request_method_id column was of
type tinyint, but the corresponding modw_ondemand.request_method.id column was
of type int(11). This meant that all values of modw_ondemand.request_method.id
> 127 (the maximum value of tinyint) would have the wrong value stored in
modw_ondemand.page_impressions.request_method_id (they were all stored as 127).
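The effect described above can be illustrated with a few lines of Bash arithmetic. This is only a toy model of the behavior reported in this section (out-of-range IDs ending up as the column maximum); the function name is hypothetical and the code does not touch the database.

```shell
# Toy model of the 11.0.0 bug: an ID larger than the column's maximum is
# stored as that maximum, so distinct IDs become indistinguishable.
stored_value() {
    local id=$1 column_max=$2
    if (( id > column_max )); then
        echo "$column_max"
    else
        echo "$id"
    fi
}
```

With a maximum of 65535, the IDs 65536 and 70000 both come back as 65535, which is why the original port cannot be recovered for those rows; IDs at or below the maximum are unaffected.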
If you are upgrading directly from 10.5.0 to 11.0.1, this will not be an issue.
However, if you previously upgraded from 10.5.0 to 11.0.0 and are now upgrading from 11.0.0 to 11.0.1, the upgrade will automatically do the following:
- Create a new modw_ondemand.page_impressions.reverse_proxy_port column.
- Fill it in with the corresponding values from
  modw_ondemand.reverse_proxy_port.port (but only for
  modw_ondemand.page_impressions.reverse_proxy_port_id < 65535).
- Drop the modw_ondemand.page_impressions.reverse_proxy_port_id column.
- Drop the modw_ondemand.reverse_proxy_port table.
It will also resize the modw_ondemand.page_impressions.request_method_id column
to int(11).
It will also fix the application mapping of noVNC page impressions and the
request path filtering of File Editor page impressions as explained in the
section above, Configuration File Changes.
Remapping the reverse proxy port IDs
Because the correct mapping of port numbers cannot be determined for
modw_ondemand.page_impressions.reverse_proxy_port_id ≥ 65535, these rows will
have their modw_ondemand.page_impressions.reverse_proxy_port column set to 0
during the upgrade. In order for these rows to be fixed to the correct value,
they will need to be reingested from the original OnDemand web server log
files. The recommended way to do this is as follows.
- Make sure to follow these steps when the automated ingestion and aggregation of OnDemand logs are NOT running.
- Make a backup of the database, specifically the modw_ondemand schema, in
  case you need to recover it later.
- Run the SQL below to delete all the rows from the
  modw_ondemand.page_impressions table whose reverse_proxy_port is 0 and whose
  reverse_proxy_host_id is not -1 (this is important because 0 will also be the
  value of reverse_proxy_port for page impressions for apps that are not
  running on a reverse proxy server, that is, whose reverse_proxy_host_id is
  -1):
  DELETE FROM modw_ondemand.page_impressions WHERE reverse_proxy_port = 0 AND reverse_proxy_host_id != -1 AND reverse_proxy_host_id != ( SELECT id FROM modw_ondemand.reverse_proxy_host WHERE name = '-1' );
- Reingest and aggregate the original log files. You can limit it to just the
  relevant lines by using grep to search for requests for apps running on
  reverse proxy servers and output the relevant lines into new files. The
  following command will do this for log files matching the filename pattern
  *.log* and create new files in the directory /tmp/ood-logs (make sure to
  mkdir it first), which you can then ingest and aggregate.
  for i in *.log*; do grep '/r\?node/' "$i" > "/tmp/ood-logs/$i"; done
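The grep loop above can also be wrapped in a small Bash function that creates the output directory itself and copes with file names containing spaces. This is a sketch under the same assumptions as the original command; the function name is hypothetical and not part of Open XDMoD.

```shell
# Hypothetical wrapper around the grep loop above: copy only the lines that
# mention /node/ or /rnode/ from each log file into an output directory.
filter_reverse_proxy_logs() {
    local outdir=$1 file
    shift
    mkdir -p -- "$outdir"
    for file in "$@"; do
        # grep exits non-zero when nothing matches; the output file is then
        # simply left empty.
        grep -E '/r?node/' -- "$file" > "$outdir/$(basename -- "$file")" || true
    done
}
```

It could be run as filter_reverse_proxy_logs /tmp/ood-logs *.log* in the directory containing the original logs, after which the files in /tmp/ood-logs can be ingested and aggregated.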
Remapping the request method IDs
You may need to run manual SQL to fix the request method IDs of the already ingested page impressions. First run the SQL below and check how many request methods have IDs ≥ 127:
SELECT * FROM modw_ondemand.request_method;
- If there are no rows with ID ≥ 127, you do not need to do anything further to remap the request methods.
- If only one row has ID ≥ 127, you can fix the mapping by doing the following.
  - Make sure to follow these steps when the automated ingestion and
    aggregation of OnDemand logs are NOT running.
  - First make a backup of the database, specifically the modw_ondemand schema,
    in case you need to recover it later.
  - Run the following SQL:
    UPDATE modw_ondemand.page_impressions SET request_method_id = ( SELECT id FROM modw_ondemand.request_method WHERE id >= 127 ) WHERE request_method_id = 127;
- If more than one row has ID ≥ 127, you will need to do the following.
  - Make sure to follow these steps when the automated ingestion and
    aggregation of OnDemand logs are NOT running.
  - First make a backup of the database, specifically the modw_ondemand schema,
    in case you need to recover it later.
  - Run the query below to delete all the rows from the
    modw_ondemand.page_impressions table whose request_method_id ≥ 127:
    DELETE FROM modw_ondemand.page_impressions WHERE request_method_id >= 127;
  - Reingest and aggregate the original log files. You can limit it to just the
    relevant lines by using grep to search for the request method and output
    the relevant lines into new files. The following command will do this for
    log files matching the filename pattern *.log* and create new files in the
    directory /tmp/ood-logs (make sure to mkdir it first), which you can then
    ingest and aggregate. Replace METHOD with the given request method:
    for i in *.log*; do grep '] "METHOD ' "$i" > "/tmp/ood-logs/$i"; done
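When several request methods need to be reingested, the command above has to be repeated once per method. The sketch below wraps it in a hypothetical Bash function (not part of Open XDMoD) that takes the method and the output directory as arguments; using a separate output directory per method is an assumption made here so that the results of one run do not overwrite another.

```shell
# Hypothetical wrapper around the grep loop above: copy only the lines whose
# request line starts with the given method into an output directory.
filter_logs_by_method() {
    local method=$1 outdir=$2 file
    shift 2
    mkdir -p -- "$outdir"
    for file in "$@"; do
        # -F treats the pattern as a fixed string, e.g. '] "POST '.
        grep -F -- "] \"$method " "$file" \
            > "$outdir/$(basename -- "$file")" || true
    done
}
```

For example, filter_logs_by_method PROPFIND /tmp/ood-logs/PROPFIND *.log* would collect only the PROPFIND requests, which can then be ingested and aggregated.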