Mapping process
symfinder can automatically map the vp-s and variants it identifies in the codebase to feature traces, when such traces are available.
This mapping is done in three steps:
- First, the files containing traces are parsed and their traces are normalized to the class level (additional details on the format of the traces are available here);
- Then, the JSON file output by symfinder, which contains information on the identified symmetries, is parsed, and the mapping is done by exact matching on class names;
- Finally, the precision and recall measures are calculated.
1. Data normalization
Before executing the mapping process on ArgoUML and Sat4j, we normalized the granularity of the traces of their domain features and that of their potential vp-s with variants, so that both share a common class-level granularity. This normalization is necessary for two reasons. First, potential vp-s with variants currently relate only to structural elements in the code assets of a system, such as classes or methods, whereas features in ArgoUML's ground truth are mostly traced to their refinements: about 73% of these traces are at the statement level. Second, even though all feature traces in Sat4j point only to structural elements of the code, a small share of them (less than 4%) are at the method and field levels rather than the class level. Such a normalization also enables us to compare the observations made on both systems and to draw more general conclusions. Specifically, whenever a feature in the ground truth had one of its traces to a class refinement (i.e., referencing statements within a single class), a complete method, or a method refinement (i.e., referencing statements within a single method), we simplified that trace to the whole class. For example, one of the trace links of feature Sequence in ArgoUML's ground truth (https://github.com/but4reuse/argouml-spl-benchmark/blob/master/ArgoUMLSPLBenchmark/groundTruth/STATEDIAGRAM.txt) is:
org.argouml.uml.diagram.DiagramFactory DiagramFactory() Refinement
This is a trace at the statement level within the method DiagramFactory(). In such a case, we truncated the trace to the whole class org.argouml.uml.diagram.DiagramFactory. Similarly, one of the trace links of feature Deletion in Sat4j's ground truth (https://deathstar3.github.io/symfinder-demo/JRN20-files/Features.pdf) is:
METHOD fixedSize(int) org.sat4j.minisat.core.Solver deletion/expert
This is a trace to the method fixedSize(int) within the Solver class. As with ArgoUML's method-level traces, such a trace is simplified to the whole org.sat4j.minisat.core.Solver class. We therefore still consider all feature traces; we only change their granularity to the class level.
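The class-level normalization described above can be sketched as a small Python function. This is a minimal sketch, not the actual symfinder scripts: the function name normalize_trace is hypothetical, and it only handles the trace shapes quoted in this section (ArgoUML lines starting with the class name, and Sat4j CLASS and METHOD lines). Sat4j field-level traces, whose exact syntax is not shown here, are not covered.

```python
def normalize_trace(line: str) -> str:
    """Return the fully qualified class name targeted by a trace line.

    Hypothetical helper: only the trace shapes quoted in this section are handled.
    """
    tokens = line.split()
    if tokens[0] == "CLASS":
        # Sat4j: CLASS <class> <package> <feature/level>
        return tokens[1]
    if tokens[0] == "METHOD":
        # Sat4j: METHOD <signature> <class> <feature/level>; the class is the
        # second-to-last token, even if the signature itself contains spaces
        return tokens[-2]
    # ArgoUML: <class> [<member>] [Refinement]; the class always comes first
    return tokens[0]


if __name__ == "__main__":
    traces = [
        "org.argouml.uml.diagram.DiagramFactory DiagramFactory() Refinement",
        "METHOD fixedSize(int) org.sat4j.minisat.core.Solver deletion/expert",
    ]
    for trace in traces:
        print(normalize_trace(trace))
```

Reading the class from the end of a Sat4j METHOD line, rather than from a fixed position, keeps the sketch working if a signature with several parameters contains spaces.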
Among the potential vp-s with variants in ArgoUML and Sat4j, we considered those at the class level; these also cover all potential vp-s with variants at the method level, which are attributed to their enclosing class.
Moreover, in our earlier study of ArgoUML [1], we noticed that several of the packages we considered were not completely annotated. These sparsely annotated packages distorted the precision and recall computed by our tooled approach. We therefore restricted our study of ArgoUML to its main argouml-app package, the only one that appears to be largely annotated: it contains over 94% of the feature traces.
After this data normalization, the ground truth of ArgoUML covers 8 of its 11 features, with 672 traces, and 1,272 potential vp-s with variants, whereas that of Sat4j covers 12 of its 13 features, with 113 traces, and 225 potential vp-s with variants.
2. Mapping feature traces
We automated the entire mapping process between domain features and potential
vp-s with variants of a system. The mapping solution is a set of
Python scripts deployed as a separate Docker container within the symfinder
toolchain.
After symfinder's main execution, the mapping process consists of three steps. First, all feature traces are simplified, that is, normalized to class-level granularity. For example, Sat4j's ground truth contains the following two traces for the Solver and Unit Clause Provider features, respectively:
CLASS org.sat4j.tools.ManyCore org.sat4j.tools solver/user
METHOD provideUnitClauses(org.sat4j.specs.UnitPropagationListener) org.sat4j.tools.ManyCore unitclauseprovider/expert
The second trace, which has method granularity, is simplified to class granularity, that is, to org.sat4j.tools.ManyCore, and thus points to the same class as the first trace. We then use exact string matching, based on Python's string equality, to map each feature trace in the ground truth of ArgoUML or Sat4j to the potential vp-s or variants in the respective JSON file (the same one used for the visualization). Specifically, the JSON file is parsed to find a node whose name is exactly the given feature trace.
feature trace. For example, the two features traces in Sat4j are matched to the
following node, which is the v_ManyCore
variant visualized below.
Finally, the potential vp-s, variants, and features traces, whether they have a
mapping or not, are recorded. If a node in the JSON file is labeled with VP
or VARIANT
type and maps to the feature trace, it indicates that the node is
an actual vp or variant and is considered as a true positive. Other nodes that
are mapped but miss a VP
or VARIANT
label are false negatives, whereas those
without a mapping but have a VP
or VARIANT
label are false positives.
// Excerpt from symfinder's JSON output
nodes: [
{
"name": "org.sat4j.tools.ManyCore",
"types": ["CLASS","METHOD_LEVEL_VP","VARIANT","HOTSPOT"],
// ommitted
},// ommitted
]
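The matching and classification steps can be sketched in Python as follows. This is an illustrative sketch rather than the actual mapping scripts: it assumes the JSON output is an object whose "nodes" list holds entries with "name" and "types" fields, as in the excerpt, treats VP, METHOD_LEVEL_VP, and VARIANT as the labels marking a potential vp or variant, and counts on distinct class names. The function classify is hypothetical.

```python
import json

# Labels assumed to mark a potential vp or variant (VP and VARIANT are named in
# the text, METHOD_LEVEL_VP appears in the JSON excerpt); this is an assumption.
VP_TYPES = {"VP", "METHOD_LEVEL_VP", "VARIANT"}


def classify(traced_classes: set[str], json_text: str) -> tuple[set[str], set[str], set[str]]:
    """Return (true positives, false positives, false negatives) as class names."""
    nodes = json.loads(json_text)["nodes"]
    # Names of nodes carrying at least one vp/variant label
    vp_names = {node["name"] for node in nodes if VP_TYPES & set(node["types"])}
    tp = traced_classes & vp_names  # exact string equality on class names
    fp = vp_names - traced_classes  # vp-s/variants that no feature trace reaches
    fn = traced_classes - vp_names  # traces that reach no vp/variant
    return tp, fp, fn


if __name__ == "__main__":
    sample = json.dumps({"nodes": [
        {"name": "org.sat4j.tools.ManyCore",
         "types": ["CLASS", "METHOD_LEVEL_VP", "VARIANT", "HOTSPOT"]},
    ]})
    tp, fp, fn = classify({"org.sat4j.tools.ManyCore"}, sample)
    print(len(tp), len(fp), len(fn))  # 1 0 0
```

Using Python sets makes the exact matching a plain intersection, and the set differences directly give the false positives and false negatives.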
Then, based on this mapping, precision and recall are automatically calculated and reported as outputs.
3. Calculating precision and recall
Here is an example of mapping output, obtained on Sat4j:
Mapping on all vp-s
Number of VPs and variants linked to features (TP): 113
Number of VPs and variants not linked to features (FP): 112
Number of features traces not linked to any VP nor variant (FN): 0
Number of traces (TP + FN): 113
Number of VPs / variants (TP + FP): 225
Precision = TP / (TP + FP): 0.5022222222222222
Recall = TP / (TP + FN): 1.0
Mapping on hotspots only
Number of VPs and variants linked to features (TP): 48
Number of VPs and variants not linked to features (FP): 25
Number of features traces not linked to any VP nor variant (FN): 65
Number of traces (TP + FN): 113
Number of VPs / variants (TP + FP): 73
Precision = TP / (TP + FP): 0.6575342465753424
Recall = TP / (TP + FN): 0.4247787610619469
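As a quick check of the two formulas printed above, the following Python sketch recomputes precision and recall from the reported TP, FP, and FN counts. The function name precision_recall is hypothetical, and returning 0.0 when a denominator is zero is a choice made for this sketch only.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP) and recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # (TP, FP, FN) counts taken from the example output above
    for label, counts in [("all vp-s", (113, 112, 0)), ("hotspots only", (48, 25, 65))]:
        p, r = precision_recall(*counts)
        print(f"{label}: precision={p:.4f} recall={r:.4f}")
    # all vp-s: precision=0.5022 recall=1.0000
    # hotspots only: precision=0.6575 recall=0.4248
```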
Two mappings are performed. The first takes into account all the potential vp-s and variants identified by symfinder, whereas the second only considers nodes in zones of high density of symmetries.
Zones of high density of symmetries correspond to aggregations of symmetries, i.e., vp-s with a particularly high number of variants, at either the class or the method level.
When analysing a project, the user can define the number of variants above which a vp and its variants are considered a zone of high density of symmetries, by setting the nbVariantsThreshold parameter in symfinder's configuration file.
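Restricting the mapping to zones of high density of symmetries amounts to filtering the candidate nodes before matching. The sketch below assumes, based on the JSON excerpt, that such nodes carry the HOTSPOT type; the helper mapping_counts and the VP_TYPES label set are hypothetical. The feature traces themselves are unchanged, which is why TP + FN is the same (113) in both parts of the example output, while only the candidate vp-s and variants shrink.

```python
# Labels assumed to mark a potential vp or variant (an assumption)
VP_TYPES = {"VP", "METHOD_LEVEL_VP", "VARIANT"}


def mapping_counts(traced: set[str], nodes: list[dict], hotspots_only: bool) -> tuple[int, int, int]:
    """Return (TP, FP, FN) over all vp-s/variants or over hotspot nodes only."""
    candidates = set()
    for node in nodes:
        types = set(node["types"])
        if not VP_TYPES & types:
            continue  # not a potential vp or variant
        if hotspots_only and "HOTSPOT" not in types:
            continue  # outside a zone of high density of symmetries
        candidates.add(node["name"])
    return (len(traced & candidates),  # TP
            len(candidates - traced),  # FP
            len(traced - candidates))  # FN


if __name__ == "__main__":
    nodes = [
        {"name": "x.A", "types": ["CLASS", "VP", "HOTSPOT"]},
        {"name": "x.B", "types": ["CLASS", "VARIANT"]},
        {"name": "x.C", "types": ["CLASS", "VARIANT", "HOTSPOT"]},
    ]
    traced = {"x.A", "x.B"}
    print(mapping_counts(traced, nodes, hotspots_only=False))  # (2, 1, 0)
    print(mapping_counts(traced, nodes, hotspots_only=True))   # (1, 1, 1)
```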
Implementation
The mapping solution is a set of Python scripts deployed in a Docker container. Sources are available here, and the code is organized as follows:
- the common directory contains the sources for the actual mapping, which are common to all types of traces, and is part of the deathstar3/features-extractor image built from the Dockerfile;
- the argoUML-files directory contains the traces of ArgoUML as well as the sources to parse them, and is part of the deathstar3/features-extractor-argouml image built from Dockerfile-ArgoUML, using the deathstar3/features-extractor image as a parent;
- the sat4j-files directory contains the JAR used to extract the traces from the Sat4j code base as well as the sources to parse them, and is part of the deathstar3/features-extractor-sat4j image built from Dockerfile-Sat4j, using the deathstar3/features-extractor image as a parent.
Adapting symfinder to your own trace format
To parse other types of traces, you need to:
- create another directory for the new type of trace, and adapt the features_extractor.py script to the new syntax;
- create a new Dockerfile similar to ArgoUML's or Sat4j's, using deathstar3/features-extractor as parent;
- add the build of the new image to the build.sh script;
- adapt one of the shell scripts used to run the mapping, already present at the root of the repository, so that it runs symfinder on the project and then the container from the newly built image.
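As an illustration of the first step, here is a minimal sketch of a parser for a hypothetical trace format: a CSV file with feature and element columns, where an element is either a fully qualified class name or a class name followed by # and a member. The format, the column names, and the function extract_features are all assumptions; the actual interface expected by features_extractor.py may differ. The parser normalizes every trace to the class level, since the mapping matches on class names.

```python
import csv
import io
from collections import defaultdict


def extract_features(csv_text: str) -> dict[str, set[str]]:
    """Map each feature to the classes its traces point to (class-level granularity).

    Hypothetical format: CSV with "feature" and "element" columns.
    """
    features: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only the fully qualified class name, dropping any "#member" suffix
        class_name = row["element"].strip().split("#", 1)[0]
        features[row["feature"].strip()].add(class_name)
    return dict(features)


if __name__ == "__main__":
    sample = (
        "feature,element\n"
        "Deletion,org.sat4j.minisat.core.Solver#fixedSize(int)\n"
        "Solver,org.sat4j.tools.ManyCore\n"
    )
    print(extract_features(sample))
```

Storing classes in a set per feature deduplicates traces that collapse onto the same class after normalization, as happens with the two ManyCore traces discussed earlier.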
References
[1] Johann Mortara, Xhevahire Tërnava, and Philippe Collet. 2020. Mapping Features to Automatically Identified Object-Oriented Variability Implementations: The case of ArgoUML-SPL. In Proceedings of the 14th International Working Conference on Variability Modelling of Software-Intensive Systems (VaMoS ’20), February 5–7, 2020, Magdeburg, Germany. ACM, New York, NY, USA, 9 pages.