Skip to content

Google scholar fixes#868

Merged
tdonohue merged 8 commits intoDSpace:mainfrom
atmire:google-scholar-fixes
Sep 30, 2020
Merged

Google scholar fixes#868
tdonohue merged 8 commits intoDSpace:mainfrom
atmire:google-scholar-fixes

Conversation

@artlowel
Copy link
Copy Markdown
Member

@artlowel artlowel commented Sep 18, 2020

References

Description

This PR adds an option to rewrite URLs for Bitstream downloads and the citation_pdf_url <meta> tag using the UI domain. It also fixes the citation_abstract_html_url field, and adds docs on how to configure the proxy for bitstream downloads using httpd.

Instructions for Reviewers

This PR adds a rewriteDownloadUrls environment property, that defaults to false. When set to true it will change download URLs and the citation_pdf_url <meta> tag on item pages to be based on the UI origin instead of the REST origin.

That way, if you're hosting the rest api and the UI on different domains, download links will still appear to be hosted on the UI.

It also ensures the citation_abstract_html_url <meta> tag always uses the UI origin.

Finally it adds some documentation on how to configure the proxy for bitstream downloads using httpd.

In order to test this PR:

  • Verify that using the default settings, bitstream download URLs and the citation_pdf_url <meta> tag are unaffected: they use the REST origin
  • Verify that if you set rewriteDownloadUrls to true in your environment file, the bitstream download URLs and the citation_pdf_url <meta> tag use the UI origin
  • Verify that either way the citation_abstract_html_url <meta> tag uses the UI origin.

Checklist

  • My PR is small in size (e.g. less than 1,000 lines of code, not including comments & specs/tests), or I have provided reasons as to why that's not possible.
  • My PR passes TSLint validation using yarn run lint
  • My PR includes TypeDoc comments for all new (or modified) public methods and classes. It also includes TypeDoc for large or complex private methods.
  • My PR passes all specs/tests and includes new/updated specs or tests based on the Code Testing Guide.
  • If my PR includes new, third-party dependencies (in package.json), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.

@artlowel artlowel added this to the 7.0beta4 milestone Sep 18, 2020
@artlowel artlowel added the 1 APPROVAL pull request only requires a single approval to merge label Sep 18, 2020
@lgtm-com
Copy link
Copy Markdown

lgtm-com bot commented Sep 18, 2020

This pull request introduces 4 alerts when merging 9757e66 into 9f396f7 - view on LGTM.com

new alerts:

  • 4 for Unused variable, import, function or class

@lgtm-com
Copy link
Copy Markdown

lgtm-com bot commented Sep 21, 2020

This pull request fixes 1 alert when merging 9886ba8 into 9f396f7 - view on LGTM.com

fixed alerts:

  • 1 for Unused variable, import, function or class

Copy link
Copy Markdown
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 @artlowel : Code looks good here, and I've tested and this works as described (UI rewrites the URL when flag is enabled). I've not tested the new docs on setting up the proxy server, but those can be tweaked as needed in the future.

Minor comments/questions inline. But, I think this is a good first implementation which we can have Google Scholar test against.

Indexers such as Google Scholar require that files are hosted on the same domain as the page that links them. In DSpace 7, Bitstreams are served from the REST server. So if you use different servers for the REST api and the UI you'll want to ensure that Bitstream downloads are proxied through the UI server.

In order to achieve this we'll need to do two things:
- **Proxy the Bitstream downloads through the UI server.** You'll need to put a webserver such as httpd or nginx in front of the UI server in order to achieve this. [Below](#apache-http-server-config) you'll find a section explaining how to do it in httpd.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silly question perhaps, but I have to ask.... What is the reason for requiring a separate server-level proxy configuration, instead of just using something like http-proxy-middleware (which we already use to proxy sitemaps)?

I'm assuming the server-level proxy likely results in better performance here?

Just thought I'd ask as I'm sure others will ask this same thing eventually, once they notice we are auto-proxying sitemaps, but not auto-proxying downloads.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is both performance, and the ease with which it would be possible to tweak or add e.g. caching to this config. That isn't supported out of the box for http-proxy-middleware and would require quite a bit of JS code to add.

For sitemaps it would likely also be a better idea to proxy them using httpd, that is something we should consider adding to our installation documentation. But because sitemaps are relatively small files, and because every installation of dspace 7 will need it, we decided to make it available out of the box for people that want a simple setup.

A setup with a rest server and a UI on different hosts is more complicated by definition, so it is likely we can expect it won't have the express server for the UI or the tomcat for rest being the direct access points for clients

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@artlowel : Thanks for clarifying. I wanted to ensure this decision was documented clearly in this PR, which your comment now does. I think this is perfectly OK for now, I just wanted us to clarify why file downloads are (at least currently) proxied differently from sitemaps....and I do agree that auto-proxying sitemaps is much more important here, as we want to ensure those are always enabled by default (for SEO purposes). I'm OK with the extra steps regarding file downloads if you choose a more complex installation (of different hosts for UI vs REST)

@lgtm-com
Copy link
Copy Markdown

lgtm-com bot commented Sep 30, 2020

This pull request fixes 1 alert when merging 764cfc5 into c67d2e6 - view on LGTM.com

fixed alerts:

  • 1 for Unused variable, import, function or class

@tdonohue
Copy link
Copy Markdown
Member

Merging as my minor feedback/questions were addressed above. Thanks @artlowel !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

1 APPROVAL pull request only requires a single approval to merge high priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

citation_abstract_html_url must be based on the Angular UI Hostname citation_pdf_url must be based on the Angular UI Hostname

3 participants