分类: 计算机科学 >> 计算机科学的集成理论 提交时间: 2022-11-28 合作期刊: 《数据智能(英文)》
摘要: In Canonical Workflow Framework for Research (CWFR) packages are relevant in two different directions. In data science, workflows are in general being executed on a set of files which have been aggregated for specific purposes, such as for training a model in deep learning. We call this type of package a data collection and its aggregation and metadata description is motivated by research interests. The other type of packages relevant for CWFR are supposed to represent workflows in a self-describing and self-contained way for later execution. In this paper, we will review different packaging technologies and investigate their usability in the context of CWFR. For this purpose, we draw on an exemplary use case and show how packaging technologies can support its realization. We conclude that packaging technologies of different flavors help on providing inputs and outputs for workflow steps in a machine-readable way, as well as on representing a workflow and all its artifacts in a self-describing and self-contained way
分类: 计算机科学 >> 计算机科学的集成理论 提交时间: 2022-11-18 合作期刊: 《数据智能(英文)》
摘要: In this paper, we report upon our recent work aimed at improving and adapting machine learning algorithms to automatically classify nanoscience images acquired by the Scanning Electron Microscope (SEM). This is done by coupling supervised and unsupervised learning approaches. We first investigate supervised learning on a ten-category data set of images and compare the performance of the different models in terms of training accuracy. Then, we reduce the dimensionality of the features through autoencoders to perform unsupervised learning on a subset of images in a selected range of scales (from 1 m to 2 m). Finally, we compare different clustering methods to uncover intrinsic structures in the images.