21 February 2021

Analyze java dependencies with jQAssistant

by JohT

jQAssistant extracts metadata of Java applications and writes it into Neo4j, a native graph database. This post shows how these tools can be used to analyze Java jar dependencies, e.g. when planning version updates.

Table of Contents
Fast lane
Getting started
Get familiar with the data
Query the data
1. Query class dependencies
  1. Explanation
2. Query Artifacts
  1. Explanation
  2. Troubleshooting
Simple Data Refinement
Manual Data Refinement
1. Create relationships between Artifacts
  1. Explanation
Query refined data
1. Neo4j Graph Result
Summary
1. Simple
2. Manual
Updates
References

Fast lane

(Updated November 2023)

If you’d like to start with ready-to-use reports and a fully automated analysis pipeline then have a look at Code Graph Analysis Pipeline.

The newest versions of the tools are installed automatically, including plugins and configuration. A large number of existing queries and reports also run fully automated and already provide a lot of insight into the code base.

If, on the other hand, you want to start small and learn step by step, continue reading.

Getting started

As described here, all you need to get started with jQAssistant is:

1. Download jQAssistant
2. Scan, for example, the lib directory of jQAssistant itself using bin\jqassistant.cmd scan -f lib (Windows) or bin/jqassistant.sh scan -f lib (Linux)
3. Start the web application to display and query the collected data using bin\jqassistant.cmd server (Windows) or bin/jqassistant.sh server (Linux)
4. Open http://localhost:7474
5. Sign in without user and password
6. Click on the database symbol in the top left corner

Get familiar with the data

Neo4j is a graph database and uses Cypher as query language. For those who are used to SQL, “Comparing SQL with Cypher” might be a helpful guideline.

To get familiar with the data, click the database symbol in the top left corner and select any of the node labels to get some examples. Nodes can be expanded to show all their relationships. Switching to the table view shows all properties and their values.
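
The same overview can also be queried directly. The following is a minimal sketch that counts the nodes per label combination. It uses only built-in Cypher functions and makes no assumption about the scanned content:

```cypher
// Count nodes per label combination to see what the scanner produced.
MATCH (node)
RETURN labels(node) AS labels, count(*) AS nodes
ORDER BY nodes DESC
LIMIT 25
```

Label combinations containing Artifact, Type or Class should show up in the result, since the queries in this post rely on them.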

Query the data

Query class dependencies

Some dependencies between classes can be obtained with:

 MATCH (dependent:Class)-[:DEPENDS_ON]->(source:Class) 
RETURN source, dependent 
 LIMIT 25

Explanation

MATCH is somewhat comparable to SELECT in SQL: it describes the pattern to search for, while RETURN lists what ends up in the result. The variables dependent and source refer to nodes with the label Class. Every class that depends on another class is matched. If there is no relationship between two classes, if there is only another type of relationship (like “implements”), or if the dependency points in the opposite direction, the pair does not match and is not part of the result. Finally, the matching source and dependent class nodes are returned, limited to at most 25 rows.
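
As a variation, the following sketch aggregates the same pattern to list the classes that most other classes depend on. It assumes that Class nodes carry the fqn property that is used for Type nodes later in this post:

```cypher
// Same pattern as above, aggregated per target class.
// Assumption: Class nodes have an fqn property (fully qualified name).
MATCH (dependent:Class)-[:DEPENDS_ON]->(source:Class)
RETURN source.fqn AS sourceClass, count(DISTINCT dependent) AS dependents
ORDER BY dependents DESC
LIMIT 25
```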

Query Artifacts

Artifacts and the types they require can be queried using:

 MATCH (source:Artifact)-[:REQUIRES]->(required:Type) 
RETURN source, required 
 LIMIT 25

Explanation

Except for the node labels and the relationship type, this query is pretty much the same as the one above.
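
To get a feeling for the size of these requirements, the pattern can be aggregated per artifact. This is only a sketch. It relies on the fileName property of Artifact nodes that is also used further below:

```cypher
// Count the required types per artifact.
MATCH (source:Artifact)-[:REQUIRES]->(required:Type)
RETURN source.fileName AS artifact, count(required) AS requiredTypes
ORDER BY requiredTypes DESC
LIMIT 25
```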

Troubleshooting

The next logical step would be MATCH (source:Artifact)-[:REQUIRES]->(required:Artifact). But this returns no results. The reason becomes clear when reading the jQAssistant User Manual: the scanner only provides “REQUIRES” between an artifact and a type, not between two artifacts.

While analyzing jar files, each one is treated separately. Therefore, there are no relationships between them. If two jars contain classes that depend on the same external class, then that external class appears multiple times. Because of the way the jars are analyzed, there are multiple “Class” nodes with the same fully qualified class name.

To summarize, the data is not yet ready to be used for specific requirements.
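
The duplicated nodes can be made visible with a query like the following. It is a minimal sketch that groups Type nodes by their fqn property and keeps only the names that occur more than once:

```cypher
// List fully qualified names that are represented by more than one Type node.
MATCH (type:Type)
WITH type.fqn AS fqn, count(type) AS occurrences
WHERE occurrences > 1
RETURN fqn, occurrences
ORDER BY occurrences DESC
LIMIT 25
```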

Simple Data Refinement

(Updated December 2022)

The easiest way to close the gap between separately analyzed artifacts is provided by jQAssistant itself. The analyze task can be configured with -concepts to further enrich the nodes and their relationships. As described in “Multiple jars with unique types” (Stack Overflow), the concept classpath:Resolve can be used to connect the types of the different artifacts. Furthermore, the concept dependency:Artifact adds the DEPENDS_ON relationship between Artifact nodes. Combining both into one command leads to:

bin/jqassistant.sh analyze -concepts classpath:Resolve dependency:Artifact

Further concepts and constraints can be found in the jQAssistant User Manual.

ⓘ Remember to stop the server before analyzing the data.

ⓘ Note that Simple Data Refinement leads to a DEPENDS_ON relationship between Artifact nodes. In Manual Data Refinement below we introduce a REQUIRES relationship.
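
Once the analyze command has finished and the server is running again, the result can be checked with a sketch like the following. It assumes that the concept dependency:Artifact was applied successfully, so that DEPENDS_ON relationships between Artifact nodes exist:

```cypher
// Artifacts and the artifacts they depend on, as created by the concept dependency:Artifact.
MATCH (source:Artifact)-[:DEPENDS_ON]->(required:Artifact)
RETURN source.fileName, required.fileName
LIMIT 25
```

If this returns no rows, the concepts have most likely not been applied to the scanned data.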

Manual Data Refinement

Compared to classical relational databases, graph databases like Neo4j are by nature very flexible when it comes to relationships between nodes. The MERGE clause creates a relationship only if it does not exist yet. This is done while querying, which makes it possible to refine the data within a query and adapt it to the requirements.

Create relationships between Artifacts

The following query connects artifacts with the artifacts they require and types with the same fully qualified name (fqn). It returns a list of artifacts (jars) and their dependencies.

 MATCH (sourceArtifact:Artifact)-[:REQUIRES]->(requiredType:Type) 
      ,(dependencyType:Type)<-[:CONTAINS]-(dependencyArtifact:Artifact)
 WHERE dependencyType.fqn = requiredType.fqn
 MERGE (requiredType)-[:IS_SAME_AS]-(dependencyType)
 MERGE (sourceArtifact)-[:REQUIRES]->(dependencyArtifact)
RETURN sourceArtifact.fileName, dependencyArtifact.fileName

Explanation

The query above leads to a warning message like “This query builds a cartesian product…”. For every result of the pattern in the first line, all results of the pattern in the second line will be returned. Without a WHERE clause (or other filter options), this would lead to an enormous amount of data. This is comparable to joins in SQL.

Since there is no relationship between types with the same fully qualified name yet, they are queried using two comma-separated patterns in one MATCH clause. The WHERE clause ensures that a result of the first line is only combined with those results of the second line that have the same fully qualified name.

This is where MERGE comes into play. MERGE (requiredType)-[:IS_SAME_AS]-(dependencyType) creates a relationship between types with the same fully qualified name, unless one already exists in either direction. Later queries can then follow this relationship directly instead of building a potential cartesian product.

A second MERGE clause creates one more relationship: (sourceArtifact)-[:REQUIRES]->(dependencyArtifact) adds the expected but missing directed relationship between an artifact and the artifacts it requires.
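
Once the query above has been run, the IS_SAME_AS relationship can be followed directly. The following sketch returns the same artifact pairs as one connected pattern, which avoids the cartesian product warning. It relies only on the labels and relationships introduced above:

```cypher
// Follow IS_SAME_AS in either direction instead of comparing fqn values.
// Assumption: the MERGE query above has already created the IS_SAME_AS relationships.
MATCH (sourceArtifact:Artifact)-[:REQUIRES]->(:Type)-[:IS_SAME_AS]-(:Type)<-[:CONTAINS]-(dependencyArtifact:Artifact)
RETURN DISTINCT sourceArtifact.fileName, dependencyArtifact.fileName
```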

Query refined data

After adding relationships, artifacts that require other artifacts can easily be queried using:

 MATCH (source:Artifact)-[:REQUIRES]->(required:Artifact) 
RETURN source.fileName, required.fileName

With variable-length relationships it is possible to query recursively. Here is an example that returns all artifacts that could be affected by a version update of, for example, /org.neo4j-neo4j-cypher-ir-3.5-3.5.14.jar:

 MATCH (changed:Artifact)<-[:REQUIRES*]-(dependent:Artifact)
 WHERE changed.fileName STARTS WITH "/org.neo4j-neo4j-cypher-ir"
RETURN changed, dependent LIMIT 10
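
On larger graphs an unbounded variable-length pattern can become expensive. The following sketch limits the depth and reports how far away each dependent artifact is. The upper bound of 3 is an arbitrary value chosen for illustration:

```cypher
// Limit the search to at most 3 hops (arbitrary, illustrative bound)
// and report the shortest distance per dependent artifact.
MATCH path = (changed:Artifact)<-[:REQUIRES*1..3]-(dependent:Artifact)
WHERE changed.fileName STARTS WITH "/org.neo4j-neo4j-cypher-ir"
RETURN dependent.fileName AS dependentFile, min(length(path)) AS distance
ORDER BY distance, dependentFile
```

A distance of 1 marks artifacts that require the changed artifact directly. Higher values mark transitive dependents.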

Neo4j Graph Result

[Figure: Neo4j Artifact Graph]

Summary

Simple

(Updated December 2022)

Here are all steps using the simple data refinement in a nutshell:

1. Download jQAssistant
2. Scan a directory (e.g. “lib”) with jars: bin/jqassistant.sh scan -f lib (POSIX)
3. Analyze with concepts: bin/jqassistant.sh analyze -concepts classpath:Resolve dependency:Artifact (POSIX)
4. Start the web application: bin/jqassistant.sh server (POSIX)
5. Open http://localhost:7474
6. Sign in without user or password

Query the graph of all artifacts that would be affected by a version update of a given artifact:

 MATCH (changed:Artifact)<-[:DEPENDS_ON*]-(dependent:Artifact)
 WHERE changed.fileName STARTS WITH "/org.neo4j-neo4j-cypher-ir"
RETURN changed, dependent LIMIT 10

Manual

Here are all steps using the manual data refinement in a nutshell:

1. Download jQAssistant
2. Scan a directory (e.g. “lib”) with jars: bin/jqassistant.sh scan -f lib (POSIX)
3. Start the web application: bin/jqassistant.sh server (POSIX)
4. Open http://localhost:7474
5. Sign in without user or password

Create relationships between artifacts and types and return a list of dependent jars with the following query:

 MATCH (sourceArtifact:Artifact)-[:REQUIRES]->(requiredType:Type) 
      ,(dependencyType:Type)<-[:CONTAINS]-(dependencyArtifact:Artifact)
 WHERE dependencyType.fqn = requiredType.fqn
 MERGE (requiredType)-[:IS_SAME_AS]-(dependencyType)
 MERGE (sourceArtifact)-[:REQUIRES]->(dependencyArtifact)
RETURN sourceArtifact.fileName, dependencyArtifact.fileName

Query the graph of all artifacts that would be affected by a version update of a given artifact:

 MATCH (changed:Artifact)<-[:REQUIRES*]-(dependent:Artifact)
 WHERE changed.fileName STARTS WITH "/org.neo4j-neo4j-cypher-ir"
RETURN changed, dependent LIMIT 10

Updates

2022-12-19: Add jQAssistant command for simple data refinement to Java JAR Dependencies Blog
2023-11-19: Reference Code Graph Analysis Pipeline

References

tags: jqassistant - neo4j - cypher - java - jar - artifact - dependency

