Investigate removal of repositories other than maven central from POMs #2680

Closed
GCHQDeveloper314 opened this issue Jun 17, 2022 · 2 comments · Fixed by #3006
Labels
dependencies Updates/changes to Maven or other dependencies enhancement Improvement to existing functionality/feature
Comments

@GCHQDeveloper314
Member

It appears that all the dependencies required by Gaffer are available in Maven central, the default repository used by Maven, although this may not have been the case in the past. When running builds, Maven occasionally tries to check repos.spark-packages.org if it can't find a package in Maven central. This is often because of a mistake with the version.

It's unclear whether this repository (see below, defined in the spark library) is actually required or can be removed. Using a clean installation of Maven with no previously downloaded dependencies, investigate whether it can be removed without causing any missing dependencies.

<repositories>
   <repository>
      <id>Spark Packages</id>
      <url>https://repos.spark-packages.org/</url>
   </repository>
</repositories>
@GCHQDeveloper314 GCHQDeveloper314 added the good first issue Small, lower complexity and doesn't require pre-existing Gaffer knowledge label Jun 17, 2022
@GCHQDeveloper314 GCHQDeveloper314 added this to the v2_backlog milestone Jun 17, 2022
@GCHQDeveloper314
Member Author

At least the module spark-library requires graphframes:graphframes, which is not in Maven central. There doesn't appear to be a way to prevent Maven from also trying to use this repository when looking for other dependencies.

Potentially the problem here is that Maven central is being used as the fallback for the spark modules because it sits below the Spark repository in the repository definitions. Further testing and a look at the Super POM will answer this. If Maven central is also specified explicitly, that may correct the order.

@GCHQDeveloper314 GCHQDeveloper314 added dependencies Updates/changes to Maven or other dependencies and removed good first issue Small, lower complexity and doesn't require pre-existing Gaffer knowledge labels Jul 6, 2023
@GCHQDeveloper314
Member Author

GCHQDeveloper314 commented Jul 6, 2023

Running mvn help:effective-pom -Dverbose -pl :spark-library confirms that the way the spark-packages repo is specified causes it to take precedence over the default central repo:

<repositories>
  <repository>
    <id>Spark Packages</id>  <!-- uk.gov.gchq.gaffer:spark:2.0.1-SNAPSHOT, line 35 -->
    <url>https://repos.spark-packages.org/</url>  <!-- uk.gov.gchq.gaffer:spark:2.0.1-SNAPSHOT, line 36 -->
  </repository>
  <repository>
    <snapshots>
      <enabled>false</enabled>  <!-- org.apache.maven:maven-model-builder:3.8.6:super-pom, line 33 -->
    </snapshots>
    <id>central</id>  <!-- org.apache.maven:maven-model-builder:3.8.6:super-pom, line 28 -->
    <name>Central Repository</name>  <!-- org.apache.maven:maven-model-builder:3.8.6:super-pom, line 29 -->
    <url>https://repo.maven.apache.org/maven2</url>  <!-- org.apache.maven:maven-model-builder:3.8.6:super-pom, line 30 -->
  </repository>
</repositories>

As a result, Maven will check the spark repo ahead of central. See the Maven docs for the priority used. When cloning the project for the first time, this can cause significant delays while Maven tries to fetch from this repo, in some cases only falling back to central after timing out.

The PR to fix this adds central to the POM above spark-packages. This ensures that spark-packages is only used as a fallback, when a package (in practice the single package graphframes:graphframes) is not found on Maven central.
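
For illustration, a minimal sketch of what a repositories section with central declared first could look like. The central id, name, URL and snapshots setting are copied from the Super POM entry in the effective-POM output above, and the Spark Packages entry is the one quoted earlier in this issue. The exact contents of the PR are not shown in this thread, so treat the layout as an assumption rather than the actual change.

```xml
<repositories>
    <!-- Assumption: central is redeclared explicitly so that it is listed first.
         Reusing the id "central" replaces the Super POM entry instead of adding a second one. -->
    <repository>
        <id>central</id>
        <name>Central Repository</name>
        <url>https://repo.maven.apache.org/maven2</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
    <!-- Listed second, so it is consulted only when an artifact
         (for example graphframes:graphframes) is not found on central. -->
    <repository>
        <id>Spark Packages</id>
        <url>https://repos.spark-packages.org/</url>
    </repository>
</repositories>
```

With a layout like this, re-running mvn help:effective-pom -Dverbose -pl :spark-library should list central ahead of Spark Packages.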

@GCHQDeveloper314 GCHQDeveloper314 modified the milestones: Backlog, v2.1.0 Jul 6, 2023
GCHQDeveloper314 added a commit that referenced this issue Jul 20, 2023
* Add central repo ahead of spark-packages
This fixes the issue where Maven uses the spark repo as the default instead of central. It's now used only when a package isn't found on central.
@t92549 t92549 added the enhancement Improvement to existing functionality/feature label Nov 14, 2023