Abstract:
Phishing is the practice of luring individuals into harmful attacks by
manipulating them into visiting fake sites and entering confidential information such as
credit-card numbers, usernames and passwords. These fake sites look very similar to the original
ones, making it very difficult for individuals to notice the difference. Phishing is like a
game in which con artists try to obtain personal data from unsuspecting users. Phishing messages
often seem strikingly genuine, as do the web pages, in which the requests for
users to enter their personal data appear legitimate. Phishing relates to fishing, but
rather than catching fish, phishers try to illegally obtain data from users. That is why
it is very important to find a way to automatically detect such dangerous sites. The
first step towards solving this problem is the collection of phishing images, from which
different features can be extracted to classify websites correctly as dangerous or
non-dangerous. Two algorithms for feature extraction from the image
dataset were used: SIFT and SURF. Once the data is extracted, it is
structured using OCR so that it can be used for classification with machine learning
algorithms. Logistic Regression, SVM, and Random Forest classifiers were applied
to the data consisting of the features extracted from the images. Accuracy metrics were
used to quantify the classification performance of the algorithms. Random Forest
achieved the best performance among the classifiers, with 84.15% correct predictions
and a maximum F1 score of 89.93%.
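
To make the two reported metrics concrete, the following is a minimal sketch of how the share of correct predictions (accuracy) and the F1 score can be computed from a classifier's outputs. It is an illustration only, not the evaluation code used in this work: the function names, the label convention (1 = phishing, 0 = legitimate), and the toy label lists are all assumptions introduced for the example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels: 1 = phishing, 0 = legitimate (not data from this study).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # 0.75
print(f"F1       = {f1_score(y_true, y_pred):.2f}")  # 0.80
```

Because F1 ignores true negatives, it can exceed accuracy when the positive class dominates the data, as in the toy example here.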