
optimize pullRemote for large repos #985

@coopernetes


Is your feature request related to a problem? Please describe.
On particularly large repositories, and when GitProxy is deployed in a production-like environment (with constrained, or at least defined, upper limits on CPU and memory), the pullRemote step sometimes takes long enough that the git client times out and receives a server error.

We should consider whether the whole history needs to be pulled down every time there is a new push. Passing depth: 1 and singleBranch: true to the git clone call seems to reduce the time the push takes (in our particular instance) and lets it proceed through proxy processing normally.
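For illustration, here is a minimal sketch of the clone options described above. The helper name shallowCloneOptions, its parameters, and the example URL and directory are hypothetical and not taken from GitProxy's code; only depth, singleBranch, ref, url and dir are isomorphic-git clone parameters.

```typescript
// Hypothetical helper: the name and signature are illustrative, not GitProxy's actual code.
interface ShallowCloneOptions {
  url: string;
  dir: string;
  ref: string;
  depth: number;
  singleBranch: boolean;
}

function shallowCloneOptions(url: string, dir: string, branch: string): ShallowCloneOptions {
  return {
    url,
    dir,
    ref: branch,        // the branch being pushed to
    depth: 1,           // fetch only the tip commit instead of the full history
    singleBranch: true, // skip every branch other than `ref`
  };
}

// Placeholder URL and directory, for demonstration only.
const options = shallowCloneOptions("https://github.com/example/repo.git", "/tmp/repo", "main");
console.log(options);

// In the pullRemote step these would be passed together with the fs and http clients, e.g.
//   await git.clone({ fs, http, ...options });
```

Whether a depth of 1 always leaves enough history to compute the diff for a push is an open question this sketch does not answer.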

Describe the solution you'd like

  • Tweak the isomorphic-git call during cloning to the minimum required to produce a working diff.
  • Pull down selective history such as default branches only?
  • Make it configurable?
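One way the "make it configurable" idea could look, as a rough sketch only: the PullRemoteConfig shape and the resolveCloneSettings function are assumptions made for illustration, and GitProxy does not necessarily expose settings like these. The defaults keep today's behaviour (a full clone) unless a shallow clone is explicitly requested.

```typescript
// Hypothetical config shape; these setting names are assumptions, not existing GitProxy options.
interface PullRemoteConfig {
  depth?: number;         // omit to pull the full history
  singleBranch?: boolean; // omit to fetch every branch
}

interface ResolvedCloneSettings {
  depth?: number;
  singleBranch: boolean;
}

function resolveCloneSettings(config: PullRemoteConfig = {}): ResolvedCloneSettings {
  const settings: ResolvedCloneSettings = {
    singleBranch: config.singleBranch ?? false,
  };
  // Accept only a positive integer depth; anything else falls back to a full clone.
  if (config.depth !== undefined && Number.isInteger(config.depth) && config.depth > 0) {
    settings.depth = config.depth;
  }
  return settings;
}

// Full clone by default, shallow clone when configured.
console.log(resolveCloneSettings());
console.log(resolveCloneSettings({ depth: 1, singleBranch: true }));
```

The resolved settings could then be spread into the existing clone call, so deployments that need the full history are unaffected.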

Describe alternatives you've considered
Increasing the server timeout (we have it deployed behind a load balancer) helps marginally, but a full clone can still be very expensive on large repos.

Additional context

This likely affects outdated branches more than anything. When a commit drifts (is left stale for days, weeks or months), the combination of a long clone time and a large diff usually results in server errors.
