symfinder Demos

This site references all demos of the symfinder toolchain.

Mapping process

symfinder can automatically map the vp-s and variants it identifies in a codebase to feature traces, when such traces are available.

This mapping is done in three steps:

First, the files containing traces are parsed and their traces are normalized to the class level (additional details on the format of the traces are available here);
Then, the JSON file output by symfinder, which contains information on the identified symmetries, is parsed, and the mapping is done by exact matching of class names;
Finally, the precision and recall measures are calculated.

1. Data normalization

Before executing the mapping process on ArgoUML and Sat4j, we normalized the granularity of the traces of their domain features to the granularity of their potential vp-s with variants, so that both share a common class-level granularity. This normalization is necessary for two reasons. First, potential vp-s with variants currently relate only to structural elements in the code assets of a system, such as classes or methods. In contrast, features in the ArgoUML ground truth are mostly traced to refinements: about 73% of their traces are at the statement level. Second, although all feature traces in Sat4j point to structural elements in code, a small share of them (less than 4%) are at the method and field levels. Such a normalization also enables us to compare the observations made on both systems and to draw more general conclusions. Specifically, whenever a feature in the ground truth had a trace to a class refinement (i.e., statements within a single class), a complete method, or a method refinement (i.e., statements within a single method), we simplified that trace to the whole class. For example, feature Sequence in the ArgoUML ground truth has the following trace in https://github.com/but4reuse/argouml-spl-benchmark/blob/master/ArgoUMLSPLBenchmark/groundTruth/STATEDIAGRAM.txt

org.argouml.uml.diagram.DiagramFactory DiagramFactory() Refinement

This trace is at the statement level, within the method DiagramFactory(). In such a case, we truncated the trace to the whole class org.argouml.uml.diagram.DiagramFactory. Similarly, feature Deletion in the Sat4j ground truth has the following trace in https://deathstar3.github.io/symfinder-demo/JRN20-files/Features.pdf.

METHOD fixedSize(int) org.sat4j.minisat.core.Solver deletion/expert

This is a trace to the method fixedSize(int) within the Solver class. As with ArgoUML's method-level traces, such a trace is simplified to the whole org.sat4j.minisat.core.Solver class. We thus still consider all feature traces; only their granularity changes to the class level. Among the potential vp-s with variants in ArgoUML and Sat4j, we considered those at the class level; these also cover all potential vp-s with variants at the method level, which symfinder reports on the classes that contain them.
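The class-level normalization can be sketched in Python, the language of the mapping scripts. This is a minimal illustration rather than the toolchain's actual code: the function names are hypothetical, and the parsing assumes whitespace-separated trace lines laid out as in the examples above (for ArgoUML, the class comes first; for Sat4j, it is the second token of a CLASS trace and the second-to-last token of a METHOD trace, a layout we also assume for FIELD traces).

```python
def normalize_argouml_trace(line: str) -> str:
    """Reduce an ArgoUML ground-truth trace to its fully qualified class.

    Assumption: the class is the first whitespace-separated token, e.g.
    'org.argouml.uml.diagram.DiagramFactory DiagramFactory() Refinement'.
    """
    return line.split()[0]


def normalize_sat4j_trace(line: str) -> str:
    """Reduce a Sat4j trace to its fully qualified class.

    Assumed layouts, inferred from the examples in the text:
      CLASS  <class>  <package> <feature>
      METHOD <member> <class>   <feature>   (FIELD assumed identical)
    """
    tokens = line.split()
    if tokens[0] == "CLASS":
        return tokens[1]
    # Member-level trace: the owning class is the second-to-last token,
    # which still works if the member signature itself contains spaces.
    return tokens[-2]


if __name__ == "__main__":
    print(normalize_argouml_trace(
        "org.argouml.uml.diagram.DiagramFactory DiagramFactory() Refinement"))
    print(normalize_sat4j_trace(
        "METHOD fixedSize(int) org.sat4j.minisat.core.Solver deletion/expert"))
```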

Moreover, in our earlier study of ArgoUML [1], we noticed that several of the packages we considered are not completely annotated. These sparsely annotated packages affected the precision and recall calculated for our tooled approach. Therefore, we decided to restrict our study of ArgoUML to its main argouml-app package, as it is the only package that seems to be largely annotated: it contains over 94% of the feature traces. After this data normalization, the ArgoUML ground truth has 8 of its 11 features, with 672 traces, and 1,272 potential vp-s with variants, whereas Sat4j has 12 of its 13 features, with 113 traces, and 225 potential vp-s with variants.

2. Mapping features traces

We automated the entire mapping process between the domain features and the potential vp-s with variants of a system. The mapping solution is a set of Python scripts deployed as a separate Docker container within the symfinder toolchain. After symfinder's main execution, the mapping process consists of three steps. First, all feature traces are simplified, that is, normalized to class-level granularity. For example, the Sat4j ground truth contains the following two traces, for the Solver and Unit Clause Provider features respectively.

CLASS org.sat4j.tools.ManyCore org.sat4j.tools solver/user
METHOD provideUnitClauses(org.sat4j.specs.UnitPropagationListener) org.sat4j.tools.ManyCore unitclauseprovider/expert

The second trace, which has method granularity, is simplified to class granularity, that is, to org.sat4j.tools.ManyCore, which makes it identical to the first trace. We then use exact string matching, provided by Python's string equality, to map each feature trace in the ground truth of ArgoUML or Sat4j to the potential vp-s or variants in the corresponding JSON file (the same one used for the visualization). Specifically, the JSON file is parsed to find a node whose name is exactly that of the given feature trace. For example, both feature traces in Sat4j are matched to the following node, which is the v_ManyCore variant visualized below.

[Figure: sat4j-vis02, symfinder visualization of Sat4j showing the v_ManyCore variant]
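The exact-matching step can be sketched as a dictionary lookup keyed by node name, which only succeeds on string equality. This sketch assumes the JSON document has a top-level "nodes" array whose entries carry "name" and "types" keys, as in the excerpt of symfinder's output shown in this section; index_nodes and match_trace are hypothetical names, not functions of the toolchain.

```python
from __future__ import annotations

import json


def index_nodes(symfinder_json: str) -> dict[str, list[str]]:
    """Map each node name to its list of types.

    Assumption: a top-level "nodes" array whose entries have "name" and
    "types" keys, as in the excerpt of symfinder's JSON output.
    """
    data = json.loads(symfinder_json)
    return {node["name"]: node.get("types", []) for node in data["nodes"]}


def match_trace(class_name: str, nodes: dict[str, list[str]]) -> list[str] | None:
    """Return the types of the node whose name equals class_name, or None."""
    # A dict lookup only succeeds on an exactly equal string key.
    return nodes.get(class_name)


example = json.dumps({"nodes": [{
    "name": "org.sat4j.tools.ManyCore",
    "types": ["CLASS", "METHOD_LEVEL_VP", "VARIANT", "HOTSPOT"],
}]})
nodes = index_nodes(example)
print(match_trace("org.sat4j.tools.ManyCore", nodes))
print(match_trace("ManyCore", nodes))  # no partial matches
```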

Finally, the potential vp-s, variants, and feature traces are recorded, whether they have a mapping or not. A node in the JSON file that is labeled with the VP or VARIANT type and maps to a feature trace is an actual vp or variant, and counts as a true positive. Traces mapped to nodes that lack a VP or VARIANT label are false negatives, as they are not linked to any vp or variant, whereas nodes that have a VP or VARIANT label but no mapping are false positives.

// Excerpt from symfinder's JSON output
"nodes": [
{
"name": "org.sat4j.tools.ManyCore",
"types": ["CLASS","METHOD_LEVEL_VP","VARIANT","HOTSPOT"],
// omitted
}, // omitted
]
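The classification into true positives, false positives, and false negatives can be expressed with set operations. This is a hypothetical sketch: it treats any type ending in VP (such as METHOD_LEVEL_VP) or equal to VARIANT as the VP or VARIANT label, and it deduplicates traces into a set of class names, so that TP + FN counts traces and TP + FP counts vp-s and variants, as in the mapping report of the next section. The nodes example.Plain and example.Unmapped are made up.

```python
from __future__ import annotations


def is_vp_or_variant(types: list[str]) -> bool:
    # Assumption: "VP", "METHOD_LEVEL_VP", ... and "VARIANT" are the relevant labels.
    return any(t.endswith("VP") or t == "VARIANT" for t in types)


def classify(traces: set[str], nodes: dict[str, list[str]]) -> tuple[set[str], set[str], set[str]]:
    """Split class-level traces and labelled nodes into TP, FP and FN sets."""
    labelled = {name for name, types in nodes.items() if is_vp_or_variant(types)}
    tp = traces & labelled  # vp-s/variants mapped to a trace
    fp = labelled - traces  # vp-s/variants with no trace
    fn = traces - labelled  # traces with no vp/variant (unmatched or unlabelled node)
    return tp, fp, fn


nodes = {
    "org.sat4j.tools.ManyCore": ["CLASS", "METHOD_LEVEL_VP", "VARIANT", "HOTSPOT"],
    "example.Plain": ["CLASS"],           # hypothetical node without a vp/variant label
    "example.Unmapped": ["CLASS", "VP"],  # hypothetical vp with no trace
}
traces = {"org.sat4j.tools.ManyCore", "example.Plain"}
tp, fp, fn = classify(traces, nodes)
print(len(tp), len(fp), len(fn))
```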

Then, based on this mapping, precision and recall are automatically calculated and reported as outputs.

3. Calculating precision and recall

Here is an example of the mapping output:

Mapping on all vp-s
Number of VPs and variants linked to features (TP): 113
Number of VPs and variants not linked to features (FP): 112
Number of features traces not linked to any VP nor variant (FN): 0
Number of traces (TP + FN): 113
Number of VPs / variants (TP + FP): 225
Precision = TP / (TP + FP): 0.5022222222222222
Recall = TP / (TP + FN): 1.0

Mapping on hotspots only
Number of VPs and variants linked to features (TP): 48
Number of VPs and variants not linked to features (FP): 25
Number of features traces not linked to any VP nor variant (FN): 65
Number of traces (TP + FN): 113
Number of VPs / variants (TP + FP): 73
Precision = TP / (TP + FP): 0.6575342465753424
Recall = TP / (TP + FN): 0.4247787610619469
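The two formulas in the report can be checked with a few lines of Python; feeding in the counts of the example output reproduces its precision and recall values. The zero-denominator guard is a choice made for this sketch, not necessarily what the toolchain does.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP) and Recall = TP / (TP + FN)."""
    # Returning 0.0 on an empty denominator is a choice made for this sketch.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Counts copied from the example mapping output.
print(precision_recall(tp=113, fp=112, fn=0))  # mapping on all vp-s
print(precision_recall(tp=48, fp=25, fn=65))   # mapping on hotspots only
```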

Two mappings are performed. The first takes into account all the potential vp-s and variants identified by symfinder, whereas the second only considers nodes in zones of high density of symmetries.

Zones of high density of symmetries correspond to an aggregation of symmetries, i.e., vp-s at the class or method level with a particularly high number of variants. When analysing a project, the user can set the nbVariantsThreshold parameter in symfinder's configuration file to define the number of variants above which a vp and its variants are considered a zone of high density of symmetries.
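The hotspot-only mapping can be sketched as the same set comparison restricted to nodes in zones of high density of symmetries. The sketch assumes that symfinder has already applied nbVariantsThreshold and flags those nodes with the HOTSPOT type seen in the JSON excerpt, and that any type ending in VP or equal to VARIANT marks a vp or variant; hotspot_counts and the example.ColdVariant node are hypothetical. Traces that fall outside hotspots become false negatives, which is why TP + FN stays equal to the total number of traces in the example output.

```python
from __future__ import annotations


def hotspot_counts(traces: set[str], nodes: dict[str, list[str]]) -> tuple[int, int, int]:
    """TP, FP and FN when only vp-s/variants inside hotspots are considered.

    Assumptions: symfinder has already applied nbVariantsThreshold and tags the
    corresponding nodes with "HOTSPOT"; vp/variant labels end in "VP" or equal "VARIANT".
    """
    labelled = {
        name for name, types in nodes.items()
        if "HOTSPOT" in types and any(t.endswith("VP") or t == "VARIANT" for t in types)
    }
    tp = len(traces & labelled)
    fp = len(labelled - traces)
    fn = len(traces - labelled)  # traces outside hotspots count as false negatives
    return tp, fp, fn


nodes = {
    "org.sat4j.tools.ManyCore": ["CLASS", "METHOD_LEVEL_VP", "VARIANT", "HOTSPOT"],
    "example.ColdVariant": ["CLASS", "VARIANT"],  # hypothetical variant outside any hotspot
}
traces = {"org.sat4j.tools.ManyCore", "example.ColdVariant"}
print(hotspot_counts(traces, nodes))
```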

Implementation

The mapping solution is a set of Python scripts deployed in a Docker container. The sources are available here, and the code is organized as follows:

[Figure: architecture of the mapping solution]

the common directory contains the sources for the actual mapping, which are common to all types of traces, and is part of the deathstar3/features-extractor image built from the Dockerfile;
the argoUML-files directory contains the traces of ArgoUML as well as the sources to parse them, and is part of the deathstar3/features-extractor-argouml image built from the Dockerfile-ArgoUML, using the deathstar3/features-extractor image as a parent;
the sat4j-files directory contains the JAR used to extract the traces from the Sat4j code base as well as the sources to parse them, and is part of the deathstar3/features-extractor-sat4j image built from the Dockerfile-Sat4j, using the deathstar3/features-extractor image as a parent.

Adapting symfinder for your own traces format

In order to parse other types of traces, one needs to:

create another directory for the new type of trace, and adapt the features_extractor.py script to the new syntax;
create a new Dockerfile similar to ArgoUML’s or Sat4j’s, using as parent deathstar3/features-extractor;
add the build of the image in the build.sh script;
adapt one of the shell scripts at the root of the repository that run the mapping, so that it runs symfinder on the project and then a container from the newly built image.
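As an illustration of the first step, here is a parser for a hypothetical trace format in which each line reads feature;Class or feature;Class#member. Neither the format nor the extract_features function comes from the repository, and the sketch does not reproduce the actual interface of features_extractor.py; it only shows how a new syntax can be reduced to class-level traces grouped by feature.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict


def extract_features(text: str) -> dict[str, set[str]]:
    """Parse the hypothetical 'feature;Class[#member]' format into class-level traces."""
    traces: dict[str, set[str]] = defaultdict(set)
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row:
            continue  # csv.reader yields an empty list for blank lines
        feature, location = row[0].strip(), row[1].strip()
        # Normalize to class level by dropping the optional '#member' suffix.
        traces[feature].add(location.split("#", 1)[0])
    return dict(traces)


sample = """\
solver;org.sat4j.tools.ManyCore
unitclauseprovider;org.sat4j.tools.ManyCore#provideUnitClauses(org.sat4j.specs.UnitPropagationListener)

"""
print(extract_features(sample))
```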

References

[1] Johann Mortara, Xhevahire Tërnava, and Philippe Collet. 2020. Mapping Features to Automatically Identified Object-Oriented Variability Implementations: The case of ArgoUML-SPL. In Proceedings of the 14th International Working Conference on Variability Modelling of Software-Intensive Systems (VaMoS ’20), February 5–7, 2020, Magdeburg, Germany. ACM, New York, NY, USA, 9 pages.