Malware Detection by Semi-supervised Learning


u3003679 - Posted on 23 May 2016

Project Description: 

Because of the huge number of malware samples, malware detection is now typically approached with machine learning tools. The problem is that current works usually verify their models on only small labelled datasets, because labelling data is expensive in both money and labour. The resulting models tend to overfit and may not be effective in real-world settings.
The situation is that we have a huge amount of data, yet only a small part of it is labelled. So how can we use the unlabelled data to obtain a more robust model for classifying malware?
A potential answer is semi-supervised learning (SSL), which learns from both labelled and unlabelled data. SSL is a well-studied field of machine learning with sound theoretical foundations, so resorting to it is expected to give us a more robust model.
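Self-training is one of the simplest SSL techniques and shows how unlabelled data can contribute. A classifier trained on the labelled set pseudo-labels the unlabelled samples it is confident about, adds them to the training set, and retrains. The sketch below is a minimal illustration under stated assumptions, not the method of this project: the numeric feature vectors, the nearest-centroid base classifier, the margin-based confidence score, and the names `self_train`, `threshold` and `max_rounds` are all hypothetical choices.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    # Component-wise mean of a non-empty list of equal-length vectors.
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def fit_centroids(X: List[Vector], y: List[int]) -> Dict[int, List[float]]:
    # Hypothetical base classifier: one centroid per class label.
    return {c: centroid([x for x, lbl in zip(X, y) if lbl == c]) for c in sorted(set(y))}


def predict_with_margin(model: Dict[int, List[float]], x: Vector) -> Tuple[int, float]:
    # Predict the nearest centroid; confidence is the gap between the
    # nearest and second-nearest centroid distances (an assumed score).
    dists = sorted((math.dist(x, c), label) for label, c in model.items())
    best_dist, best_label = dists[0]
    margin = dists[1][0] - best_dist if len(dists) > 1 else float("inf")
    return best_label, margin


def self_train(X_lab: List[Vector], y_lab: List[int], X_unlab: List[Vector],
               threshold: float = 1.0, max_rounds: int = 10):
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(max_rounds):
        model = fit_centroids(X, y)
        confident, remaining = [], []
        for x in pool:
            label, margin = predict_with_margin(model, x)
            # Only pseudo-label samples the current model is sure about.
            (confident if margin >= threshold else remaining).append((x, label))
        if not confident:
            break  # nothing new to learn from; stop early
        for x, label in confident:
            X.append(x)
            y.append(label)
        pool = [x for x, _ in remaining]
    # Final model plus the samples that stayed unlabelled.
    return fit_centroids(X, y), pool
```

For example, with hypothetical 2-D features where 0 means benign and 1 means malware, `self_train([[0.0, 0.0], [10.0, 10.0]], [0, 1], [[1.0, 0.5], [9.0, 9.5], [2.0, 1.0], [8.0, 8.0], [5.0, 5.0]])` absorbs the four clear-cut samples and leaves the ambiguous `[5.0, 5.0]` unlabelled. The threshold is the key design choice: set too low, wrong pseudo-labels enter the training set and reinforce themselves.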

Researcher name: 
WangXin
Researcher department: 
Department of Computer Science
Researcher email: 
Research Project Details
Project Duration: 
20/05/2016 to 20/08/2016