
UFAL/Matomo fix tracking of the statistics #912

Merged: milanmajchrak merged 8 commits into dtq-dev from ufal/matomo-fix-tracking on Apr 4, 2025

@milanmajchrak
Collaborator

@milanmajchrak milanmajchrak commented Apr 3, 2025

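| Phases     | MP | MM | MB | MR | JM | Total |
|------------|----|----|----|----|----|-------|
| ETA        | 0  | 0  | 0  | 0  | 0  | 0     |
| Developing | 0  | 0  | 0  | 0  | 0  | 0     |
| Review     | 0  | 0  | 0  | 0  | 0  | 0     |
| Total      | -  | -  | -  | -  | -  | 0     |
| ETA est.   |    |    |    |    |    | 0     |
| ETA cust.  | -  | -  | -  | -  | -  | 0     |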

Problem description

Summary by CodeRabbit

  • New Features

    • Enhanced file download tracking to provide more precise analytics for individual file and ZIP archive downloads.
    • Improved download URL processing ensures clearer differentiation between single downloads and ZIP file downloads.
  • Tests

    • Added comprehensive test coverage to validate the updated download tracking functionality, including scenarios for valid and invalid requests.
    • Introduced a new test class specifically for validating the functionality of the enhanced download tracking.

@coderabbitai

coderabbitai bot commented Apr 3, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

This pull request enhances the tracking of bitstream downloads across various components. The ClarinMatomoBitstreamTracker class now extracts a UUID from the action URL, and a new boolean parameter distinguishes between individual and ZIP downloads. A dedicated method has been added to log ZIP downloads. Additionally, a new utility method extracts UUIDs from URLs using a regex with robust error handling. The REST controllers have been updated to utilize the revised tracker, and new tests validate the changes.

Changes

| File(s) | Change Summary |
|---------|----------------|
| dspace-api/.../ClarinMatomoBitstreamTracker.java | Updated preTrack to extract Bitstream UUID from the URL; added an isZip parameter to trackBitstreamDownload; introduced logUserDownloadingZip method; enhanced error logging for null bitstreams and malformed URLs. |
| dspace-api/.../Utils.java | Introduced UUID_PATTERN and added the static method fetchUUIDFromUrl(String urlString) to extract UUIDs from URLs with error handling, including throwing IllegalArgumentException for missing or invalid UUIDs. |
| dspace-api/.../ClarinMatomoBitstreamTrackerTest.java | Added a new test class using JUnit and Mockito to validate bitstream download tracking; covers scenarios for both valid and erroneous URL UUID extraction, ensuring proper logging and URL generation. |
| dspace-server-webapp/.../BitstreamRestController.java, dspace-server-webapp/.../MetadataBitstreamController.java | Modified REST controllers to call the updated trackBitstreamDownload method with the new boolean flag; added dependency injection for the tracker in the MetadataBitstreamController and adjusted the logic to differentiate between single bitstream and ZIP file downloads. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant RESTController
    participant Tracker
    participant Utils
    participant Matomo

    Client->>RESTController: Request download (bitstream or ZIP)
    RESTController->>Tracker: trackBitstreamDownload(context, request, bit, isZip)
    Tracker->>Utils: fetchUUIDFromUrl(actionURL)
    Utils-->>Tracker: Return UUID or error
    alt isZip == true
        Tracker->>Matomo: logUserDownloadingZip(item)
    else
        Tracker->>Matomo: logUserDownloadingBitstream(bit)
    end

Possibly related PRs

  • UFAL/Matomo fix tracking of the statistics #912: The changes in the main PR are directly related to those in the retrieved PR, as both involve modifications to the ClarinMatomoBitstreamTracker class, specifically the trackBitstreamDownload method, which has been updated to include a new parameter in both cases.

Poem

Oh, I’m a rabbit, hopping with delight,
In code-carrots fresh and updates so bright.
Bitstreams now tracked with a zip or a skip,
UUIDs and logs in a clever new script.
Through fields of data, I cheerfully bound—
A happy bunny in code, where wonders are found!


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8756dd3 and fcdea55.

📒 Files selected for processing (1)
  • dspace-api/src/test/java/org/dspace/statistics/ClarinMatomoBitstreamTrackerTest.java (1 hunks)


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
dspace-api/src/main/java/org/dspace/app/statistics/clarin/ClarinMatomoBitstreamTracker.java (1)

80-91: Improved URL construction with proper error handling.

The changes improve how download URLs are constructed by extracting the Bitstream UUID from the action URL. This provides more specific tracking information while maintaining fallback behavior if extraction fails.

I would suggest enhancing the error logging to include the exception details for easier troubleshooting:

- log.error("Cannot get the Bitstream UUID from the URL {}", matomoRequest.getActionUrl());
+ log.error("Cannot get the Bitstream UUID from the URL {}: {}", matomoRequest.getActionUrl(), e.getMessage(), e);
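
To make the fallback behavior discussed in this comment concrete, here is a minimal, self-contained sketch. It is not the code from ClarinMatomoBitstreamTracker: the class name DownloadUrlSketch, the helper names, the regex, and the use of java.util.logging are all assumptions for illustration. The "/bitstream/handle/<handle>/<uuid>" URL shape is borrowed from the test suggested later in this review and may not match the real implementation.

```java
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of "extract the UUID, else fall back to the original URL". */
final class DownloadUrlSketch {

    private static final Logger LOG = Logger.getLogger(DownloadUrlSketch.class.getName());

    // Assumed pattern: canonical 8-4-4-4-12 hexadecimal UUID.
    private static final Pattern UUID_PATTERN = Pattern.compile(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    private DownloadUrlSketch() { }

    /** Returns the first UUID found in the URL or throws IllegalArgumentException. */
    static String extractUuid(String actionUrl) {
        Matcher matcher = UUID_PATTERN.matcher(actionUrl == null ? "" : actionUrl);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No UUID found in URL: " + actionUrl);
        }
        return matcher.group();
    }

    /** Builds the tracked URL; on extraction failure logs the exception and keeps the action URL. */
    static String buildTrackedUrl(String uiBaseUrl, String handle, String actionUrl) {
        try {
            return uiBaseUrl + "/bitstream/handle/" + handle + "/" + extractUuid(actionUrl);
        } catch (IllegalArgumentException e) {
            // Passing the exception keeps the stack trace in the log, as the nitpick suggests.
            LOG.log(Level.SEVERE, "Cannot get the Bitstream UUID from the URL " + actionUrl, e);
            return actionUrl;
        }
    }

    public static void main(String[] args) {
        System.out.println(buildTrackedUrl("http://localhost:4000", "123456789/1",
                "http://example.com/bitstreams/123e4567-e89b-12d3-a456-426614174000/download"));
        System.out.println(buildTrackedUrl("http://localhost:4000", "123456789/1",
                "http://example.com/bitstreams/NOT_EXISTING_UUID/download"));
    }
}
```

Logging the exception object rather than only its message is the point of the suggested change: the original action URL is still tracked, but the failure stays diagnosable.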
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ecfa18 and 25e2978.

📒 Files selected for processing (5)
  • dspace-api/src/main/java/org/dspace/app/statistics/clarin/ClarinMatomoBitstreamTracker.java (6 hunks)
  • dspace-api/src/main/java/org/dspace/core/Utils.java (3 hunks)
  • dspace-api/src/test/java/org/dspace/statistics/ClarinMatomoBitstreamTrackerTest.java (1 hunks)
  • dspace-server-webapp/src/main/java/org/dspace/app/rest/BitstreamRestController.java (1 hunks)
  • dspace-server-webapp/src/main/java/org/dspace/app/rest/MetadataBitstreamController.java (4 hunks)
🧰 Additional context used
🧬 Code Definitions (3)
dspace-server-webapp/src/main/java/org/dspace/app/rest/MetadataBitstreamController.java (1)
dspace-api/src/main/java/org/dspace/app/statistics/clarin/ClarinMatomoBitstreamTracker.java (1)
  • ClarinMatomoBitstreamTracker (41-194)
dspace-api/src/test/java/org/dspace/statistics/ClarinMatomoBitstreamTrackerTest.java (1)
dspace-api/src/main/java/org/dspace/app/statistics/clarin/ClarinMatomoBitstreamTracker.java (1)
  • ClarinMatomoBitstreamTracker (41-194)
dspace-api/src/main/java/org/dspace/app/statistics/clarin/ClarinMatomoBitstreamTracker.java (1)
dspace-api/src/main/java/org/dspace/core/Utils.java (1)
  • Utils (54-581)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: dspace-dependencies / docker-build (linux/amd64, ubuntu-latest, true)
  • GitHub Check: Run Integration Tests
  • GitHub Check: Run Unit Tests
🔇 Additional comments (20)
dspace-server-webapp/src/main/java/org/dspace/app/rest/BitstreamRestController.java (1)

168-168: Updated method call to support the enhanced download tracking.

The trackBitstreamDownload method call now includes a boolean parameter false to indicate this is a single bitstream download rather than a ZIP file. This aligns with the updated method signature in the ClarinMatomoBitstreamTracker class.

dspace-server-webapp/src/main/java/org/dspace/app/rest/MetadataBitstreamController.java (4)

31-31: Appropriately added import for the new dependency.

The import for ClarinMatomoBitstreamTracker is needed to support the new tracking functionality.


75-76: Added dependency injection for tracking functionality.

The autowired tracker instance will be used to track ZIP file downloads.


115-139: Added tracking for the first bitstream in ZIP downloads.

A reference to the first encountered bitstream is stored to be used for statistics tracking. This is a pragmatic approach since we need at least one bitstream to track the download event.


143-143: Added tracking for ZIP file downloads.

The method now tracks ZIP file downloads, passing true as the fourth parameter to indicate this is a ZIP download rather than a single bitstream. This complements the tracking for single bitstreams in BitstreamRestController.
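
The "remember the first bitstream, then track once" approach described for MetadataBitstreamController can be sketched roughly as follows. This is an illustration under stated assumptions, not the controller's code: DownloadTracker is a hypothetical stand-in for the tracker, bitstreams are reduced to plain names, and the ZIP writing itself is replaced by a counter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: one tracking call per ZIP download, keyed on the first bitstream. */
final class ZipTrackingSketch {

    /** Stand-in for the real tracker; only the isZip flag mirrors the PR. */
    interface DownloadTracker {
        void track(String bitstreamName, boolean isZip);
    }

    private ZipTrackingSketch() { }

    /** "Adds" every non-null bitstream to the archive and tracks the download once. */
    static int zipAndTrack(List<String> bitstreamNames, DownloadTracker tracker) {
        String firstBitstream = null;
        int added = 0;
        for (String name : bitstreamNames) {
            if (name == null) {
                continue; // nothing to add for a missing bitstream
            }
            if (firstBitstream == null) {
                firstBitstream = name; // remembered only for statistics
            }
            added++; // placeholder for writing a ZIP entry
        }
        if (firstBitstream != null) {
            tracker.track(firstBitstream, true); // true marks a ZIP download
        }
        return added;
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        int added = zipAndTrack(Arrays.asList("a.txt", null, "b.txt"),
                (name, isZip) -> calls.add(name + ":" + isZip));
        System.out.println(added + " entries, tracked " + calls);
    }
}
```

Tracking after the loop, rather than inside it, is what keeps a ZIP of many files from being counted as many separate downloads.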

dspace-api/src/main/java/org/dspace/core/Utils.java (2)

60-62: Added UUID pattern for regex matching.

A compiled regex pattern to efficiently match UUIDs in standard format has been defined. This pattern will be reused by the new utility method.


556-580: Added robust utility method to extract UUIDs from URLs.

The new fetchUUIDFromUrl method provides a centralized way to extract UUIDs from URLs using regex pattern matching. The implementation includes:

  1. Proper URI parsing
  2. Comprehensive error handling with descriptive error messages
  3. Thorough documentation explaining the method's purpose and expected inputs

This utility method helps standardize UUID extraction across the application, particularly for bitstream tracking purposes.
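
For readers without the diff at hand, a utility of this kind might look like the sketch below. It is an approximation, not the method added to Utils.java: only the names UUID_PATTERN and fetchUUIDFromUrl and the use of IllegalArgumentException come from this PR, while the exact regex, the error messages, and the class name UuidUrlSketch are assumptions.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative stand-in for the UUID helper described above. */
final class UuidUrlSketch {

    // Assumed pattern: canonical 8-4-4-4-12 hexadecimal UUID.
    private static final Pattern UUID_PATTERN = Pattern.compile(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    private UuidUrlSketch() { }

    /** Parses the URL, searches its path for a UUID, and fails loudly if none is present. */
    static UUID fetchUUIDFromUrl(String urlString) {
        if (urlString == null || urlString.isEmpty()) {
            throw new IllegalArgumentException("URL must not be null or empty");
        }
        final String path;
        try {
            path = new URI(urlString).getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Malformed URL: " + urlString, e);
        }
        if (path == null) {
            // Opaque URIs such as "mailto:..." have no path component.
            throw new IllegalArgumentException("URL has no path: " + urlString);
        }
        Matcher matcher = UUID_PATTERN.matcher(path);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No UUID found in URL: " + urlString);
        }
        return UUID.fromString(matcher.group());
    }

    public static void main(String[] args) {
        System.out.println(fetchUUIDFromUrl(
                "http://example.com/bitstreams/123e4567-e89b-12d3-a456-426614174000/download"));
    }
}
```

Callers such as the tracker are then expected to catch IllegalArgumentException and decide on a fallback themselves.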

dspace-api/src/test/java/org/dspace/statistics/ClarinMatomoBitstreamTrackerTest.java (8)

1-41: Well-structured test class setup with appropriate imports and dependencies.

The test file includes all necessary imports and properly sets up mock objects for testing. The use of @Mock and @InjectMocks annotations follows testing best practices.


42-70: Test class properly extends AbstractDSpaceTest with well-defined constants.

The test class is appropriately set up with:

  • Constants for test data like handles and URLs
  • Mock objects for all required dependencies
  • Proper inheritance from the base test class

71-75: Test setup initializes necessary context.

The setUp method establishes a fresh context for each test, following testing best practices.


76-88: Test case for successful bitstream download tracking.

This test verifies that when a valid UUID is present in the URL, the tracker correctly extracts it and uses it in the download URL construction. The test:

  1. Sets up test data with a random UUID
  2. Mocks the necessary objects and behavior
  3. Calls the method under test
  4. Verifies the expected URL format is used

This ensures the tracker works correctly with valid input.


90-103: Test case for tracking with invalid UUID in URL.

This test verifies the tracker's behavior when faced with an invalid UUID in the URL. It ensures that:

  1. The tracker gracefully handles the situation
  2. Falls back to using the original URL when UUID extraction fails

This test increases confidence in the code's error handling capabilities.


105-111: Well-factored helper method for request setup.

The mockRequest method encapsulates the common setup for HTTP requests, making the test cases more readable and maintainable.


113-124: Comprehensive helper method for bitstream and item setup.

The mockBitstreamAndItem method properly sets up all the necessary mock objects and their relationships, including metadata values, which is crucial for testing the tracker functionality.


126-134: Robust verification helper using ArgumentCaptor.

The verifyMatomoRequest method uses ArgumentCaptor to capture and verify the properties of the MatomoRequest sent to the tracker. This ensures the tracking mechanism is functioning correctly with the expected values.

dspace-api/src/main/java/org/dspace/app/statistics/clarin/ClarinMatomoBitstreamTracker.java (5)

27-27: Import addition looks appropriate.

The addition of the Utils import is correctly used to access the new fetchUUIDFromUrl method needed for URL processing.


128-129: Method signature enhancement to distinguish download types.

The addition of the isZip parameter is an effective way to differentiate between individual file downloads and ZIP downloads, allowing for specialized tracking behavior.


139-142: Added defensive null check for bitstream.

Good defensive programming practice by adding a null check for the bitstream before proceeding, with appropriate error logging.


156-162: Implemented conditional logic for download types.

The code correctly implements different logging behaviors based on whether the download is a single file or a ZIP archive, providing more accurate metrics.
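
As a rough picture of the branching described here, the sketch below selects a log message from the isZip flag. The class, method, parameter names, and message wording are all hypothetical; the actual text logged by logUserDownloadingZip and its single-bitstream counterpart is not shown in this review.

```java
/** Hypothetical sketch of choosing the log message based on the download type. */
final class DownloadLogSketch {

    private DownloadLogSketch() { }

    /** Returns an illustrative message; the real wording in the tracker may differ. */
    static String describeDownload(boolean isZip, String userLabel, String itemHandle, String bitstreamId) {
        if (isZip) {
            // ZIP downloads are reported at item level, since they bundle all bitstreams.
            return "User " + userLabel + " is downloading a ZIP of all bitstreams from item " + itemHandle;
        }
        // Single downloads are reported per bitstream.
        return "User " + userLabel + " is downloading bitstream " + bitstreamId + " from item " + itemHandle;
    }

    public static void main(String[] args) {
        System.out.println(describeDownload(true, "anonymous", "123456789/1", "unused"));
        System.out.println(describeDownload(false, "anonymous", "123456789/1", "bitstream-1"));
    }
}
```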


183-193: Well-implemented logging for ZIP downloads.

The new method for logging ZIP downloads maintains consistent formatting with the existing logging method while providing the appropriate context-specific information. The logging pattern clearly identifies the action as downloading a ZIP file containing all bitstreams from an item.

@milanmajchrak milanmajchrak self-assigned this Apr 3, 2025
@milanmajchrak milanmajchrak linked an issue Apr 3, 2025 that may be closed by this pull request
4 tasks

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
dspace-api/src/test/java/org/dspace/statistics/ClarinMatomoBitstreamTrackerTest.java (3)

95-108: Test handles invalid UUID case properly, but contains duplicate Context initialization.

The test correctly verifies the behavior when an invalid UUID is in the URL. However, there's an unnecessary Context initialization that duplicates the one in the setUp method.

@Test
public void testTrackBitstreamDownloadWrongUrl() throws SQLException {
-    context = new Context();
    UUID bitstreamId = UUID.randomUUID();
    mockRequest("/bitstreams/NOT_EXISTING_UUID/download");
    mockBitstreamAndItem(bitstreamId);
    when(matomoTracker.sendRequestAsync(any(MatomoRequest.class)))
            .thenReturn(CompletableFuture.completedFuture(null));

    clarinMatomoBitstreamTracker.trackBitstreamDownload(context, request, bitstream, false);

    String expectedUrl = BASE_URL + "/bitstreams/NOT_EXISTING_UUID/download";
    verifyMatomoRequest(expectedUrl);
}

110-139: Well-implemented helper methods, but missing documentation.

The helper methods effectively encapsulate common testing logic, but would benefit from JavaDoc comments to explain their purpose and parameters.

Add JavaDoc to the helper methods, for example:

+/**
+ * Mock an HttpServletRequest with specific URI and properties.
+ * 
+ * @param requestURI The URI to use in the mocked request
+ */
private void mockRequest(String requestURI) {
    when(request.getRequestURI()).thenReturn(requestURI);
    when(request.getScheme()).thenReturn("http");
    when(request.getServerName()).thenReturn("example.com");
    when(request.getServerPort()).thenReturn(80);
    when(request.getHeader("Range")).thenReturn(null);
}

47-139: Insufficient test coverage for important scenarios.

The tests only cover the isZip=false case for bitstream tracking. There are several important scenarios missing:

  1. Test for when isZip=true to verify ZIP file download tracking
  2. Test when Range header is non-null (should skip tracking per the implementation)
  3. Test when tracking is disabled in configuration
  4. Test error paths (e.g., empty results from findByBitstreamUUID)

Consider adding these test methods:

@Test
public void testTrackZipDownload() throws SQLException {
    UUID bitstreamId = UUID.randomUUID();
    mockRequest("/bitstreams/" + bitstreamId + "/download");
    mockBitstreamAndItem(bitstreamId);
    when(matomoTracker.sendRequestAsync(any(MatomoRequest.class)))
            .thenReturn(CompletableFuture.completedFuture(null));

    clarinMatomoBitstreamTracker.trackBitstreamDownload(context, request, bitstream, true);

    String expectedUrl = LOCALHOST_URL + "/bitstream/handle/" + HANDLE + "/" + bitstreamId;
    verifyMatomoRequest(expectedUrl);
}

@Test
public void testSkipTrackingWithRangeHeader() throws SQLException {
    UUID bitstreamId = UUID.randomUUID();
    mockRequest("/bitstreams/" + bitstreamId + "/download");
    // Override Range header to be non-null
    when(request.getHeader("Range")).thenReturn("bytes=0-1000");
    mockBitstreamAndItem(bitstreamId);

    clarinMatomoBitstreamTracker.trackBitstreamDownload(context, request, bitstream, false);

    // Verify that no tracking request was sent
    verify(matomoTracker, times(0)).sendRequestAsync(any(MatomoRequest.class));
}

@Test
public void testSkipTrackingWhenDisabled() throws SQLException {
    UUID bitstreamId = UUID.randomUUID();
    mockRequest("/bitstreams/" + bitstreamId + "/download");
    mockBitstreamAndItem(bitstreamId);
    // Configure tracking to be disabled
    when(configurationService.getBooleanProperty("matomo.track.enabled")).thenReturn(false);

    clarinMatomoBitstreamTracker.trackBitstreamDownload(context, request, bitstream, false);

    // Verify that no tracking request was sent
    verify(matomoTracker, times(0)).sendRequestAsync(any(MatomoRequest.class));
}

@Test
public void testNoItemFound() throws SQLException {
    UUID bitstreamId = UUID.randomUUID();
    mockRequest("/bitstreams/" + bitstreamId + "/download");
    when(bitstream.getID()).thenReturn(bitstreamId);
    // Return empty list to simulate no items found
    when(clarinItemService.findByBitstreamUUID(context, bitstreamId)).thenReturn(Collections.emptyList());

    clarinMatomoBitstreamTracker.trackBitstreamDownload(context, request, bitstream, false);

    // Verify that no tracking request was sent
    verify(matomoTracker, times(0)).sendRequestAsync(any(MatomoRequest.class));
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4dcca4a and 9c1464c.

📒 Files selected for processing (1)
  • dspace-api/src/test/java/org/dspace/statistics/ClarinMatomoBitstreamTrackerTest.java (1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
dspace-api/src/test/java/org/dspace/statistics/ClarinMatomoBitstreamTrackerTest.java (1)
dspace-api/src/main/java/org/dspace/app/statistics/clarin/ClarinMatomoBitstreamTracker.java (1)
  • ClarinMatomoBitstreamTracker (41-200)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Run Integration Tests
  • GitHub Check: dspace-dependencies / docker-build (linux/amd64, ubuntu-latest, true)
  • GitHub Check: Run Unit Tests
🔇 Additional comments (3)
dspace-api/src/test/java/org/dspace/statistics/ClarinMatomoBitstreamTrackerTest.java (3)

1-47: Well-structured test class with clear purpose.

The test class is well-organized with proper imports, documentation, and mock setup to test the ClarinMatomoBitstreamTracker functionality.


74-79: Good setup method.

The setup method correctly initializes a new Context object before each test.


81-93: Well-implemented test for single file download tracking.

The test correctly verifies that tracking a bitstream download works as expected by:

  1. Creating a random UUID for the bitstream
  2. Mocking the necessary dependencies
  3. Calling the method under test
  4. Verifying the correct URL is sent to the Matomo tracker

@milanmajchrak milanmajchrak requested a review from vidiecan April 4, 2025 08:04
vidiecan previously approved these changes Apr 4, 2025
@milanmajchrak milanmajchrak merged commit 1ed4a2d into dtq-dev Apr 4, 2025
6 of 7 checks passed
milanmajchrak added a commit that referenced this pull request Apr 4, 2025
* Bitstream preview wrong file name according to it's mimetype (#890)

* The owning community was null. (#891)

* The + characted was wrongly encoded in the URL (#893)

* Set limit when splitting key/value using = (#894)

* File preview - Added the method for extracting the file into try catch block (#909)

* Fix parts identifiers resolution (#913)

* Renamed property dspace.url to dspace.ui.url (#906)

* Update clarin-dspace.cfg - handle.plugin.checknameauthority (#897)

* File preview - Return empty list if an error has occured (#915)

* Matomo fix tracking of the statistics (#912)
kosarko added a commit to ufal/clarin-dspace that referenced this pull request Apr 10, 2025
Merging latest dataquest-dev/dspace:dtq-dev

This contains the following commits:

Run build action every 4h for every customer/ branch
UFAL/Do not use not-existing metadatafield `hasMetadata` in the submission-forms-cz (dataquest-dev#888)
UFAL/Created job to generate preview for every item or for a specific one (dataquest-dev#887)
UFAL/bitstream preview wrong file name according to it's mimetype (dataquest-dev#890)
Fixed typo in the error exception
The owning community was null. (dataquest-dev#891)
The `+` characted was wrongly encoded in the URL (dataquest-dev#893)
Set limit when splitting key/value using `=` (dataquest-dev#894)
Ufal/header value could have equals char (dataquest-dev#895)
UFAL/File preview - Added the method for extracting the file into try catch block (dataquest-dev#909)
UFAL/File preview better logs (dataquest-dev#910)
UFAL/File preview - Return empty list if an error has occured (dataquest-dev#915)
UFAL/Matomo fix tracking of the statistics (dataquest-dev#912)
UFAL/Matomo statistics - Use the bitstream name instead of the UUID in the tracking download url (dataquest-dev#917)
UFAL/Matomo bitstream tracker has error when bitstream name was null (dataquest-dev#918)
UFAL/Endpoints leaks private information (dataquest-dev#924)

UFAL/Fix parts identifiers resolution (dataquest-dev#913)
UFAL/Update `clarin-dspace.cfg` - handle.plugin.checknameauthority (dataquest-dev#897)

Creating Legal check (dataquest-dev#863)

import/comment-license-script (dataquest-dev#882)

UFAL/Renamed property dspace.url to dspace.ui.url (dataquest-dev#906)

Development

Successfully merging this pull request may close these issues.

UFAL/Matomo issues

2 participants