Аннотация:Reliable evaluation of network mining tools implies significance and scalability testing. This is usually achieved by picking several graphs of various size from different domains. However, graph properties and thus evaluation results could be dramatically different from one domain to another. Hence the necessity of aggregating results over a multitude of graphs within each domain. The paper introduces an approach to automatically learn features of a directed graph from any domain and generate similar graphs while scaling input graph size with a real-valued factor. Generating multiple
graphs with similar size allows significance testing, while scaling graph size makes scalability evaluation possible. The proposed method relies on embedding an input graph into low-dimensional space, thus encoding graph features in a set of node vectors. Edge weights and node communities
could be imitated as well in optional steps. We demonstrate that embedding-based approach ensures variability of synthetic graphs while keeping degree and subgraphs distributions close to the original graphs. Therefore, the method could make significance and scalability testing of network algorithms more reliable without the need to collect additional data. We also show that embedding-based approach preserves various features in generated graphs which can’t be achieved by other generators imitating a given graph.