Johtizen

21 February 2021

Analyze Java dependencies with jQAssistant

by JohT

jQAssistant extracts metadata from Java applications and writes it into Neo4j, a native graph database. This post shows how these tools can be used to analyze dependencies between Java jar files, e.g. to assess the impact of version updates.

Table of Contents

  1. Table of Contents
  2. Fast lane
  3. Getting started
  4. Get familiar with the data
  5. Query the data
    1. Query class dependencies
      1. Explanation
    2. Query Artifacts
      1. Explanation
      2. Troubleshooting
  6. Simple Data Refinement
  7. Manual Data Refinement
    1. Create relationships between Artifacts
      1. Explanation
  8. Query refined data
    1. Neo4j Graph Result
  9. Summary
    1. Simple
    2. Manual
  10. Updates
  11. References

Fast lane

(Updated November 2023)

If you’d like to start with ready-to-use reports and a fully automated analysis pipeline, have a look at Code Graph Analysis Pipeline.

The newest versions of the tools are installed automatically, including plugins and configuration. A large number of existing queries and reports also run fully automated and already provide a lot of insight into the code base.

If, on the other hand, you want to start small and learn step by step, continue reading.

Getting started

As described here, all you need to get started with jQAssistant is:

Get familiar with the data

Neo4j is a graph database and uses Cypher as its query language. For those who are used to SQL, “Comparing SQL with Cypher” might be a helpful guideline.

To get familiar with the data, click the database symbol in the top left corner and select any of the node types to see some examples. Nodes can be expanded to show all their relationships. Switching to the table view shows all fields and their contents.
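For a first overview in query form, something like the following sketch could help. It counts the nodes per label combination using the built-in Cypher functions labels() and count(). The aliases nodeLabels and nodeCount are only illustrative names, and the limit of 25 is an arbitrary choice.

```cypher
// Sketch: overview of the scanned data, grouped by label combination.
// nodeLabels and nodeCount are illustrative aliases, not part of the jQAssistant schema.
MATCH (n)
RETURN labels(n) AS nodeLabels, count(*) AS nodeCount
ORDER BY nodeCount DESC
LIMIT 25
```

The label combinations that show up most often are a good starting point for the queries in the next sections.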

Query the data

Query class dependencies

Some dependencies between classes can be obtained with:

 MATCH (dependent:Class)-[:DEPENDS_ON]->(source:Class) 
RETURN source, dependent 
 LIMIT 25

Explanation

MATCH is somewhat comparable to SELECT ... FROM in SQL: it describes the pattern to look for. The variables dependent and source refer to nodes with the label Class. Every pair of classes in which one depends on the other is matched. If there is no relationship between two classes, if there is only a relationship of another type (like “implements”), or if the dependency points in the opposite direction, the pair is filtered out and is not part of the result. Finally, RETURN outputs the matching source and dependent class nodes, limited to at most 25 rows.
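The same pattern can also be aggregated. The following sketch counts how many classes depend on each class, which hints at the most widely used ones. It assumes that Class nodes carry the fqn property that later queries in this post use on Type nodes; the aliases className and dependents are illustrative.

```cypher
// Sketch: classes ordered by the number of classes that depend on them.
// Assumption: Class nodes have an fqn property (fully qualified name).
MATCH (dependent:Class)-[:DEPENDS_ON]->(source:Class)
RETURN source.fqn AS className, count(DISTINCT dependent) AS dependents
ORDER BY dependents DESC
LIMIT 25
```

Returning properties instead of whole nodes makes the result easier to read in the table view.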

Query Artifacts

Artifacts and the types they require can be queried using:

 MATCH (source:Artifact)-[:REQUIRES]->(required:Type) 
RETURN source, required 
 LIMIT 25

Explanation

Except for the node labels and the relationship type, this query is pretty much the same as the one above.

Troubleshooting

The next logical step would be MATCH (source:Artifact)-[:REQUIRES]->(required:Artifact). But this returns no results. The reason becomes clear when reading the jQAssistant User Manual: the scanner only provides “REQUIRES” between an artifact and a type, not between two artifacts.
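When a pattern unexpectedly returns nothing, it can help to check which kinds of nodes a relationship type actually connects. The following sketch does that for REQUIRES by leaving out the labels in the pattern and listing them in the result instead. The aliases are illustrative.

```cypher
// Sketch: which label combinations are connected by REQUIRES?
// sourceLabels, targetLabels and relationships are illustrative aliases.
MATCH (source)-[:REQUIRES]->(target)
RETURN labels(source) AS sourceLabels, labels(target) AS targetLabels, count(*) AS relationships
ORDER BY relationships DESC
```

If no row shows Artifact among the target labels, there are no REQUIRES relationships between two artifacts yet.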

When jar files are scanned, each one is treated separately, so there are no relationships between them. If two jars contain classes that depend on the same external class, that external class appears multiple times in the graph. As a consequence of the way the jars are analyzed, there are multiple “Class” nodes with the same fully qualified class name.
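The duplicated nodes can be made visible with a small aggregation. The following sketch groups Type nodes by their fqn property (the same property used in the refinement query further down) and keeps only the names that occur more than once. The alias occurrences is illustrative.

```cypher
// Sketch: fully qualified names that are represented by more than one Type node.
MATCH (t:Type)
WITH t.fqn AS fqn, count(t) AS occurrences
WHERE occurrences > 1
RETURN fqn, occurrences
ORDER BY occurrences DESC
LIMIT 25
```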

To summarize, the scanned data is not yet ready to answer questions about dependencies between artifacts.

Simple Data Refinement

(Updated December 2022)

The easiest way to close the gap between separately analyzed artifacts is provided by jQAssistant itself. The analyze task can be configured with -concepts to further enrich the nodes and their relationships. As described in Multiple jars with unique types (Stack Overflow), the concept classpath:Resolve can be used to connect the types of the different artifacts. Furthermore, the concept dependency:Artifact adds the DEPENDS_ON relationship between Artifact nodes. Combining both into one command leads to:

bin/jqassistant.sh analyze -concepts classpath:Resolve dependency:Artifact

Further concepts and constraints can be found in the jQAssistant User Manual.

Remember to stop the server before analyzing the data.

Note that Simple Data Refinement leads to a DEPENDS_ON relationship between Artifact nodes. In Manual Data Refinement below we introduce a REQUIRES relationship.
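Once the analyze command with both concepts has been run, the artifact dependencies could be queried as in the following sketch. It assumes that the DEPENDS_ON relationship between Artifact nodes exists as described above and uses the fileName property that also appears in the queries further down.

```cypher
// Sketch: artifact dependencies created by the concept dependency:Artifact.
MATCH (source:Artifact)-[:DEPENDS_ON]->(required:Artifact)
RETURN source.fileName, required.fileName
LIMIT 25
```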

Manual Data Refinement

Compared to classical relational databases, graph databases like Neo4j are by nature very flexible when it comes to relationships between nodes. The MERGE clause can be used to create a new relationship if it does not exist yet. This is done while querying, which makes it possible to refine the data within a query and adapt it to the requirements.

Create relationships between Artifacts

The following query connects artifacts with the artifacts they require, and types that have the same fully qualified name (fqn). It returns a list of artifacts (jars) and their dependencies.

 MATCH (sourceArtifact:Artifact)-[:REQUIRES]->(requiredType:Type) 
      ,(dependencyType:Type)<-[:CONTAINS]-(dependencyArtifact:Artifact)
 WHERE dependencyType.fqn = requiredType.fqn
 MERGE (requiredType)-[:IS_SAME_AS]-(dependencyType)
 MERGE (sourceArtifact)-[:REQUIRES]->(dependencyArtifact)
RETURN sourceArtifact.fileName, dependencyArtifact.fileName

Explanation

The query above leads to a warning message like “This query builds a cartesian product…”. For every result of the first line, all results of the second line will be returned. Without a WHERE clause (or other filter options), this would lead to an enormous amount of data. This is comparable to a cross join in SQL that is narrowed down by a WHERE condition.

Since there is no relationship between types with the same fully qualified name yet, they are queried using two comma-separated patterns within one MATCH clause. The WHERE clause ensures that, for every result of the first line, only those results of the second line remain whose fully qualified name is the same.

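This is where MERGE comes into play. MERGE (requiredType)-[:IS_SAME_AS]-(dependencyType) creates a relationship between types with the same fully qualified name. The pattern is written without a direction, so an existing relationship in either direction is matched; Neo4j still stores a newly created relationship with a direction. Later queries can then follow this relationship directly instead of building a potential cartesian product.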

It is also possible to add another MERGE clause to create one more relationship. (sourceArtifact)-[:REQUIRES]->(dependencyArtifact) adds the expected but missing directed relationship between an artifact and the artifacts it requires.
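With IS_SAME_AS in place, the path from an artifact to the artifact that provides a required type can be written as one connected pattern. The following sketch assumes that the refinement query above has already been run. It uses only labels, relationship types and properties that appear in that query.

```cypher
// Sketch: follow IS_SAME_AS instead of comparing fqn values in a WHERE clause.
// Assumption: the IS_SAME_AS relationships were created by the refinement query above.
MATCH (sourceArtifact:Artifact)-[:REQUIRES]->(requiredType:Type)
      -[:IS_SAME_AS]-(dependencyType:Type)<-[:CONTAINS]-(dependencyArtifact:Artifact)
RETURN DISTINCT sourceArtifact.fileName, dependencyArtifact.fileName
```

DISTINCT collapses the many type-level matches into one row per pair of artifacts.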

Query refined data

After adding relationships, artifacts that require other artifacts can easily be queried using:

 MATCH (source:Artifact)-[:REQUIRES]->(required:Artifact) 
RETURN source.fileName, required.fileName

Variable-length relationships make it possible to query transitively. The following example returns artifacts that could be affected by a version update of, e.g., /org.neo4j-neo4j-cypher-ir-3.5-3.5.14.jar, because they require it directly or indirectly:

 MATCH (changed:Artifact)<-[:REQUIRES*]-(dependent:Artifact)
 WHERE changed.fileName STARTS WITH "/org.neo4j-neo4j-cypher-ir"
RETURN changed, dependent LIMIT 10
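For larger graphs it can be useful to bound the traversal depth and to see how far away each affected artifact is. The following sketch is a variation of the query above. It assumes the REQUIRES relationships between artifacts from the manual refinement; the upper bound of 3 hops and the aliases affectedArtifact and distance are arbitrary, illustrative choices.

```cypher
// Sketch: affected artifacts within at most 3 hops, with their shortest distance.
// Assumption: REQUIRES relationships between Artifact nodes already exist.
MATCH path = (changed:Artifact)<-[:REQUIRES*1..3]-(dependent:Artifact)
WHERE changed.fileName STARTS WITH "/org.neo4j-neo4j-cypher-ir"
RETURN dependent.fileName AS affectedArtifact, min(length(path)) AS distance
ORDER BY distance, affectedArtifact
```

A distance of 1 marks artifacts that require the changed jar directly; higher values mark indirect dependents.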

Neo4j Graph Result

(Figure: Neo4j Artifact Graph)

Summary

Simple

(Updated December 2022)

Here are all steps using the simple data refinement in a nutshell:

Manual

Here are all steps using the manual data refinement in a nutshell:



Updates

References

tags: jqassistant - neo4j - cypher - java - jar - artifact - dependency
