Fault diagnosis of gearboxes using wavelet support vector machine, least square support vector machine and wavelet packet transform
Mohammad Heidari^{1}, Hadi Homaei^{2}, Hossein Golestanian^{3}, Ali Heidari^{4}
^{1, 2, 3, 4}Faculty of Engineering, Shahrekord University, P.O. Box 115, Shahrekord, Iran
^{2}Corresponding author
Journal of Vibroengineering, Vol. 18, Issue 2, 2016, p. 860-875.
Received 16 July 2015; received in revised form 2 September 2015; accepted 15 September 2015; published 31 March 2016
This work presents a method that experimentally recognizes gearbox faults using the wavelet packet transform and two support vector machine models. Two wavelet selection criteria are used. Statistical features are extracted from the wavelet packet coefficients of the vibration signals. The optimal wavelet decomposition level is selected based on the Maximum Energy to Shannon Entropy ratio criterion. In addition, the Energy and Shannon Entropy of the wavelet coefficients are used as two new features, along with other statistical parameters, as input to the classifier. The gearbox faults are then classified by feeding these statistical features to a least square support vector machine (LSSVM) and a wavelet support vector machine (WSVM). Several kernel functions, and a multi-kernel function as a new method, are used with three strategies for multi-class classification of gearbox faults. The classification results demonstrate that the WSVM identifies the gearbox fault categories more accurately and has better diagnosis performance than the LSSVM.
Keywords: gearbox, fault diagnosis, wavelet, support vector machine.
1. Introduction
Fault diagnosis of gearboxes is one of the most common and intricate challenges in plants. Analysis of the vibration signal is a principal method for gearbox fault diagnosis. The procedure for fault diagnosis of a gearbox can be stated in several steps: data acquisition, signal processing, feature selection and diagnostics [1, 2]. To analyze vibration signals, methods in the time [3, 4], frequency [5], and time-frequency domains [6] have been investigated. Among these, the wavelet transform [7-10] has progressed in the last two decades and outperforms the other time-frequency methods, although it is lacking in a few aspects as well. The discrete wavelet transform is primarily considered an efficient tool for vibration-based signal processing for fault detection. Wavelet analysis can provide local features in both the time and frequency domains and has a multi-scale property, which enables it to distinguish the abrupt components of the vibration signal [11]. The foundations of Support Vector Machines (SVM) were developed by Vapnik [12, 13], and SVMs have been applied to both pattern recognition [14-18] and regression forecasting [19-24]. The effectiveness of wavelet-based features for fault diagnosis of gears using SVM and proximal support vector machines has been shown by Saravanan et al. [25]. Qu and Zuo [26] utilized an SVM to identify the wear degree of a slurry pump. Sun et al. [27] predicted the remaining life of a bearing by establishing an SVR-based model. Hou and Li [28] optimised the parameters of SVR through an evolution strategy and formulated an SVR-based short-term fault prediction strategy. Shen et al. [29] presented a novel intelligent gear fault diagnosis model based on empirical mode decomposition and a multi-class transductive support vector machine. Xian and Zeng [30] developed an intelligent fault diagnosis procedure based on wavelet packet transform (WPT) and hybrid SVM.
Zamanian and Ohadi [31] presented a method for feature extraction based on exact wavelet analysis to improve the fault diagnosis of gears. In their study, feature extraction was based on maximization of the local Gaussian correlation function of wavelet coefficients. They used a linear support vector machine to classify the feature sets extracted with the presented method.
The rest of this paper is outlined as follows. Section 2 briefly describes the fundamental theory of wavelet packet decomposition and two wavelet selection criteria. The proposed new machine health status identification method is presented in Section 3, followed by the experimental verification tests using both bearing and gearbox datasets as stated in Section 4. In Section 5, the effect of different wavelet basis functions on the performance of the proposed scheme is discussed. Conclusions are drawn in Section 6.
2. Theoretical background
2.1. The review of wavelet packet transform
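Wavelet packet transform is an extension of discrete wavelet transform. The signals are decomposed into a hierarchical structure of details and approximations at limited levels as follows:
$$f(t)=\sum_{i=1}^{j}D_{i}(t)+A_{j}(t), \tag{1}$$
where ${D}_{i}\left(t\right)$ denotes the wavelet detail and ${A}_{j}\left(t\right)$ stands for the wavelet approximation at the $j$th level [1]. A wavelet packet is a function with three integer indices $i$, $j$ and $k$, which are the modulation, scale and translation parameters, respectively:
$$\psi_{j,k}^{i}(t)=2^{j/2}\psi^{i}(2^{j}t-k),\quad i=1,2,3,\dots \tag{2}$$
The wavelet functions ${\psi}^{i}$ are determined as follows:
$$\psi^{2i}(t)=\sqrt{2}\sum_{k=-\infty}^{\infty}h(k)\psi^{i}(2t-k), \tag{3}$$
$$\psi^{2i+1}(t)=\sqrt{2}\sum_{k=-\infty}^{\infty}g(k)\psi^{i}(2t-k), \tag{4}$$
with $h(k)$ and $g(k)$ the low-pass and high-pass quadrature mirror filters. The original signal $f\left(t\right)$ is defined after $j$ levels of decomposition as follows:
$$f(t)=\sum_{i=1}^{2^{j}}f_{j}^{i}(t), \tag{5}$$
while the wavelet packet component signals ${f}_{j}^{i}\left(t\right)$ are expressed as a linear combination of wavelet packet functions ${\psi}_{j,k}^{i}\left(t\right)$ as follows:
$$f_{j}^{i}(t)=\sum_{k=-\infty}^{\infty}c_{j,k}^{i}(t)\,\psi_{j,k}^{i}(t), \tag{6}$$
where the wavelet packet coefficients ${c}_{j,k}^{i}\left(t\right)$ are calculated by:
$$c_{j,k}^{i}(t)=\int_{-\infty}^{\infty}f(t)\,\psi_{j,k}^{i}(t)\,dt, \tag{7}$$
provided that the wavelet packet functions satisfy the orthogonality condition:
$$\psi_{j,k}^{m}(t)\,\psi_{j,k}^{n}(t)=0\quad\text{if } m\ne n. \tag{8}$$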
Two wavelet selection criteria are used and compared to select a suitable wavelet for feature extraction of the problem.
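The hierarchical decomposition described in this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: it uses the Haar filter pair for brevity (not the db44 or Meyer wavelets considered in this paper), it assumes the signal length is divisible by $2^{level}$, and the function names `haar_split` and `wavelet_packet` are hypothetical.

```python
import math

def haar_split(signal):
    """One analysis step with the orthonormal Haar filters.

    Returns the low-pass (approximation) and high-pass (detail) halves.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_packet(signal, level):
    """Full wavelet packet tree: both halves are split again at every level.

    Returns the 2**level coefficient lists of the last level (natural order).
    """
    if len(signal) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    nodes = [list(signal)]
    for _ in range(level):
        # Unlike the plain DWT, the detail branch is decomposed as well.
        nodes = [half for node in nodes for half in haar_split(node)]
    return nodes

# Toy signal with two tones (an assumed example, not measured data).
signal = [math.sin(2 * math.pi * 5 * t / 64) + 0.5 * math.sin(2 * math.pi * 20 * t / 64)
          for t in range(64)]
nodes = wavelet_packet(signal, 3)
```

With three levels the 64-sample toy signal yields 8 nodes of 8 coefficients each; at the eighth level the same scheme would give 256 nodes. Because the Haar filters are orthonormal, the total energy of the coefficients equals the energy of the signal.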
2.2. Maximum relative wavelet energy criterion
Relative wavelet energy gives information about the relative energy associated with frequency bands and can detect the degree of similarity between segments of a signal [32, 33]. The energy content of the signal at each resolution level $n$ is estimated by:
$$E(n)=\sum_{i=1}^{m}{\left|C_{n,i}\right|}^{2}, \tag{9}$$
where $m$ is the number of wavelet coefficients and ${C}_{n,i}$ is the $i$th wavelet coefficient of the $n$th scale. The total energy can be calculated as follows:
$$E_{total}=\sum_{n}E(n)=\sum_{n}\sum_{i=1}^{m}{\left|C_{n,i}\right|}^{2}. \tag{10}$$
The distribution of energy probability is defined as follows [33]:
$$p_{n}=\frac{E(n)}{E_{total}}, \tag{11}$$
where $\sum _{n}{p}_{n}=\text{1}$, and the distribution, ${p}_{n}$, is considered as a time scale density. The Total Energy is calculated for each scale and for vibration signals at different rotor speed and for different loading conditions using healthy and faulty gearbox conditions.
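As a minimal sketch of the relative wavelet energy computation described above, the following Python function takes one list of coefficients per scale and returns the per-scale energies, the total energy and the distribution $p_n$. The function name and the toy coefficients are assumptions made for illustration.

```python
def relative_wavelet_energy(coeffs_by_scale):
    """Return (energies, total_energy, p) for a list of coefficient lists."""
    # Energy of each scale: sum of squared coefficients.
    energies = [sum(c * c for c in coeffs) for coeffs in coeffs_by_scale]
    total = sum(energies)
    if total == 0:
        raise ValueError("all coefficients are zero")
    # Relative energy of each scale; the values sum to 1.
    p = [e / total for e in energies]
    return energies, total, p

# Toy coefficients for three scales (assumed values).
energies, total, p = relative_wavelet_energy([[3, 4], [0, 5], [5, 5]])
```

For the toy input the energies are 25, 25 and 50, so the third scale carries half of the total energy. Under the maximum relative wavelet energy criterion, the candidate wavelet giving the largest such value is preferred.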
2.3. Maximum energy to Shannon entropy ratio criterion
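A suitable base wavelet is one that extracts the maximum amount of energy while minimizing the Shannon entropy of the corresponding wavelet coefficients. The combination of the energy and Shannon entropy of a signal’s wavelet coefficients is expressed by the Energy to Shannon Entropy ratio [34], given as:
$$\zeta(n)=\frac{E(n)}{S_{entropy}(n)}, \tag{12}$$
where $E(n)$ is the energy of the wavelet coefficients at level $n$. In Eq. (12), the entropy of the signal wavelet coefficients is given as follows:
$$S_{entropy}(n)=-\sum_{i=1}^{m}p_{i}\,{\mathrm{log}}_{2}\,p_{i}. \tag{13}$$
The energy probability distribution of the wavelet coefficients (${p}_{i}$) is given by:
$$p_{i}=\frac{{\left|C_{n,i}\right|}^{2}}{E(n)}, \tag{14}$$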
with $\sum _{i=1}^{m}{p}_{i}=\text{1}$, and ${p}_{i}{\mathrm{log}}_{2}{p}_{i}=\text{0}$ if ${p}_{i}=\text{0}$.
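The Energy to Shannon Entropy ratio of a set of coefficients can be computed as in the sketch below, which also picks the node with the largest ratio. The function name, the node labels and the toy coefficients are hypothetical; the convention $p_i \log_2 p_i = 0$ for $p_i = 0$ follows the text, and returning infinity when the entropy is zero is an assumption of this sketch.

```python
import math

def energy_entropy_ratio(coeffs):
    """Return (energy, shannon_entropy, ratio) of a coefficient list."""
    energy = sum(c * c for c in coeffs)
    if energy == 0:
        raise ValueError("all coefficients are zero")
    entropy = 0.0
    for c in coeffs:
        p = c * c / energy          # energy probability of this coefficient
        if p > 0:                   # p*log2(p) is taken as 0 when p == 0
            entropy -= p * math.log2(p)
    # All energy in one coefficient gives zero entropy (assumed: ratio = inf).
    ratio = math.inf if entropy == 0 else energy / entropy
    return energy, entropy, ratio

# Two toy nodes: a flat one and a concentrated one (assumed values).
candidates = {"node0": [1, 1, 1, 1], "node1": [3, 1, 0, 0]}
ratios = {name: energy_entropy_ratio(c)[2] for name, c in candidates.items()}
best = max(ratios, key=ratios.get)
```

The flat node spreads its energy evenly (entropy of 2 bits, ratio 2), while the concentrated node has higher energy and lower entropy, so it is selected.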
3. Review of machine learning techniques
3.1. Multi class support vector machine
The SVM is a supervised learning method based on statistical learning theory formulated by Vapnik [12]. The SVM maps the low dimensional data to a high dimensional feature space, and aims to solve a binary problem by searching for an optimal hyperplane which can separate two datasets with the largest margin in the high dimensional space. The optimal hyperplane is established through a set of support vectors from the original datasets, and these subsets form the boundary between the two classes. The classification function can be described as follows:
$$f(x)=\mathrm{sign}\left(w^{T}\Phi(x)+b\right),$$
where the nonlinear mapping function $\Phi\left(x\right)$ maps the input feature vector into a higher dimensional feature space, $b$ is the bias, and $w$ is the weight vector. $b$ and $w$ are used to determine the position of the separating hyperplane. Some problems of multi-class classification have been researched [20, 21]. As described above, the SVM is inherently a binary classifier. However, rotating machinery usually suffers from more than two types of fault. To tackle this problem, three strategies are used in this paper: one-against-one (OAO), one-against-all (OAA) and one against others (OAOT) [35].
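The way binary classifiers are combined in the OAA and OAO strategies can be sketched as follows. This is an illustrative sketch only: the binary "classifiers" are toy distance-based functions on a hypothetical one-dimensional feature, the class centers are invented, and the OAOT strategy is not covered because its details are not given in this section.

```python
from itertools import combinations

def predict_one_against_all(x, scorers):
    """scorers: label -> function giving a real score for 'label vs the rest'.

    The class whose binary classifier returns the largest score wins.
    """
    return max(scorers, key=lambda label: scorers[label](x))

def predict_one_against_one(x, pairwise):
    """pairwise: (label_a, label_b) -> function; > 0 votes for label_a.

    Each of the k(k-1)/2 pairwise classifiers casts one vote.
    """
    votes = {}
    for (a, b), decide in pairwise.items():
        winner = a if decide(x) > 0 else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

# Hypothetical class centers on a single feature axis.
centers = {"good": 0.0, "chipped": 1.0, "eccentric": 2.0}
scorers = {lab: (lambda x, c=c: -abs(x - c)) for lab, c in centers.items()}
pairwise = {
    (a, b): (lambda x, ca=centers[a], cb=centers[b]: abs(x - cb) - abs(x - ca))
    for a, b in combinations(centers, 2)
}

label_oaa = predict_one_against_all(1.2, scorers)
label_oao = predict_one_against_one(1.2, pairwise)
```

For $k$ classes, OAA trains $k$ binary classifiers and OAO trains $k(k-1)/2$; with the three toy classes both counts happen to be 3.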
3.2. Least square support vector machine
LSSVM is a reformulation of the standard SVM which was proposed by Suykens and Vandewalle [36]. In contrast to SVM, the LSSVM uses a least squares cost function and involves equality constraints instead of inequalities in the problem formulation. Consider the training set ${\left\{({x}_{i},{y}_{i})\right\}}_{i=1}^{l}$ with ${x}_{i}\in {R}^{n}$ and ${y}_{i}\in \{-1,1\}$. To classify the training set, LSSVM has to find the optimal (maximum margin) separating hyperplane so that it has good generalization ability. All of the separating hyperplanes have the following representation in the feature space: $y\left(x\right)={\omega}^{T}\Phi\left(x\right)+b$, where $\omega $ is the normal vector of the separating hyperplane. Margin maximization is obtained by minimizing the squared norm of $\omega $ while also minimizing the fitting error ${\zeta}_{i}$ of the training set. The resulting optimization problem of LSSVM can be formulated in the following form:
$$\min_{\omega,b,\zeta}\ \frac{1}{2}{\omega}^{T}\omega+\frac{\acute{\gamma}}{2}\sum_{i=1}^{l}{\zeta}_{i}^{2}\quad\text{subject to}\quad {y}_{i}\left[{\omega}^{T}\Phi({x}_{i})+b\right]=1-{\zeta}_{i},\quad i=1,\dots,l,$$
where $\acute{\gamma}$ is the regularization parameter. The Lagrangian comes in the form:
$$L(\omega,b,\zeta,\alpha)=\frac{1}{2}{\omega}^{T}\omega+\frac{\acute{\gamma}}{2}\sum_{i=1}^{l}{\zeta}_{i}^{2}-\sum_{i=1}^{l}{\alpha}_{i}\left\{{y}_{i}\left[{\omega}^{T}\Phi({x}_{i})+b\right]-1+{\zeta}_{i}\right\},$$
where ${\alpha}_{i}$ is the Lagrange multiplier. According to the optimality conditions, the following equations must be satisfied: $\partial L/\partial \omega =0$; $\partial L/\partial b=0$; $\partial L/\partial {\alpha}_{i}=0$; and $\partial L/\partial {\zeta}_{i}=0$. A linear system for classification and regression can then be obtained from the Karush-Kuhn-Tucker conditions [37]. Its solution is found by solving the system of linear equations expressed in matrix form as follows:
$$\begin{bmatrix}0 & {Q}^{T}\\ Q & P{P}^{T}+{\acute{\gamma}}^{-1}I\end{bmatrix}\begin{bmatrix}b\\ \alpha\end{bmatrix}=\begin{bmatrix}0\\ \overrightarrow{1}\end{bmatrix},$$
where $P=\left[\Phi({x}_{1})^{T}{y}_{1};\dots;\Phi({x}_{l})^{T}{y}_{l}\right]$ (one row per training sample), $\overrightarrow{1}={[1,\dots ,1]}^{T}$, $Q={[{y}_{1},\dots ,{y}_{l}]}^{T}$, $\alpha ={[{\alpha}_{1},\dots ,{\alpha}_{l}]}^{T}$, and $I$ is the identity matrix.
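Because LSSVM training reduces to a linear system, it can be sketched without any optimization library. The code below is a minimal sketch, not the authors' implementation: it builds the system from the optimality conditions ($\sum_i \alpha_i y_i = 0$ and $y_i b + \sum_j \alpha_j y_i y_j K(x_i, x_j) + \alpha_i/\acute{\gamma} = 1$), solves it by Gaussian elimination, and evaluates the resulting decision value. The RBF kernel, the values of `gamma` and `sigma`, the toy data and all function names are assumptions.

```python
import math

def rbf(u, v, sigma=1.0):
    """RBF kernel (assumed form exp(-||u - v||^2 / (2 sigma^2)))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2.0 * sigma ** 2))

def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def train_lssvm(X, y, gamma=10.0, kernel=rbf):
    """Assemble and solve the LSSVM classifier system; returns (b, alpha)."""
    l = len(X)
    A = [[0.0] + [float(v) for v in y]]            # first row: sum(alpha_i*y_i) = 0
    for i in range(l):
        row = [float(y[i])]
        for j in range(l):
            reg = 1.0 / gamma if i == j else 0.0   # regularization on the diagonal
            row.append(y[i] * y[j] * kernel(X[i], X[j]) + reg)
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * l)
    return sol[0], sol[1:]

def decision(x, X, y, b, alpha, kernel=rbf):
    """Decision value; its sign is the predicted class."""
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X)) + b

# Two well-separated toy clusters (assumed data).
X = [[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]]
y = [-1, -1, 1, 1]
b, alpha = train_lssvm(X, y)
```

Solving a single linear system instead of a quadratic program is the main practical difference from the standard SVM; the price is that all $\alpha_i$ are generally non-zero, so the solution is not sparse.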
Then the regression function of LSSVM is obtained:
$$y(x)=\sum_{i=1}^{l}{\alpha}_{i}{y}_{i}K({x}_{i},x)+b,$$
where the kernel function can be given by $K\left({x}_{i},x\right)={\Phi}^{T}\left({x}_{i}\right)\Phi\left(x\right)$ and it meets Mercer’s condition. In the process of fault diagnosis, it is very important to choose a reasonable kernel function for the support vector machine. Different kernel functions yield different decision functions and thus determine the performance of the support vector machine. Generally, two kinds of kernels, i.e. local kernels and global kernels, are utilized to construct the decision functions [38]. A typical local kernel is the radial basis function (RBF) kernel, which is defined as follows:
$$K_{r}({x}_{i},x)=\mathrm{exp}\left(-\frac{{\Vert x-{x}_{i}\Vert}^{2}}{2{\sigma}^{2}}\right),$$
where $\sigma $ is the width of the RBF kernel. A typical global kernel is the polynomial kernel, which is defined as follows:
$$K_{p}({x}_{i},x)={\left({x}_{i}^{T}x+1\right)}^{d},$$
where $d$ denotes the kernel parameter. In order to improve the classification performance and generalization ability of LSSVM, a multi-kernel $\left({K}_{m}\right)$ support vector machine (MSVM) is constructed in this study by a controlled parameter $\beta $ based on the local kernel function ${K}_{r}$ and the global kernel function ${K}_{p}$:
$$K_{m}=\beta K_{r}+(1-\beta)K_{p},$$
where $0<\beta <1$ is the controlled parameter. To be an admissible kernel in SVM, a kernel must satisfy Mercer’s Theorem. Since ${K}_{r}$ and ${K}_{p}$ both satisfy Mercer’s Theorem, a convex combination of them also satisfies it. In the MSVM model, there are four parameters: the weight parameter $\beta $, the penalty constant $C$, and the kernel parameters $\sigma $ and $d$. The weight parameter assigns the weights of the different kernel functions. The penalty constant applies to the samples misclassified by the optimal separating plane, and its role is to strike a proper balance between the calculation complexity and the separating error. The kernel function parameters $\sigma $ and $d$ reflect the characteristics of the training data. All these parameters affect the generalization of MSVM and exert a considerable influence on its performance. However, it is not known beforehand which parameters are best for a given problem. In this work, the parameters of the multi-kernel SVM are randomly selected. The LSSVM was initially proposed to deal with binary classification problems. Multi-class problems can also be solved by combining a number of binary LSSVMs using any of a number of strategies, such as one-versus-one, one-versus-all and one against others. In this study, the OAO, OAA and OAOT methods are used.
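The local, global and multi-kernel functions discussed in this section can be sketched as below. Several details are assumptions of this sketch rather than statements of the paper: the RBF kernel is written as $\exp(-\Vert u-v\Vert^2/2\sigma^2)$, the polynomial kernel as $(u^Tv+1)^d$, and the weight $\beta$ is attached to the RBF kernel with $1-\beta$ on the polynomial kernel. Since $\beta$ ranges over $(0, 1)$, swapping the two weights describes the same family of kernels.

```python
import math

def rbf_kernel(u, v, sigma):
    """Local kernel: decays with the distance between u and v."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def poly_kernel(u, v, d):
    """Global kernel: depends on the inner product of u and v."""
    return (sum(a * b for a, b in zip(u, v)) + 1.0) ** d

def multi_kernel(u, v, beta, sigma, d):
    """Convex combination of the local and global kernels (0 < beta < 1)."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return beta * rbf_kernel(u, v, sigma) + (1.0 - beta) * poly_kernel(u, v, d)

u, v = [1.0, 0.0], [0.0, 1.0]
k_r = rbf_kernel(u, v, sigma=1.0)        # exp(-1)
k_p = poly_kernel(u, v, d=3)             # (0 + 1)**3 = 1
k_m = multi_kernel(u, v, beta=0.5, sigma=1.0, d=3)
```

A non-negative weighted sum of two Mercer kernels is again a Mercer kernel, which is why the combined function can be used directly in place of a single kernel.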
3.3. Wavelet support vector machine
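The wavelet function group can be defined as:
$$h_{a,c}(x)={\left|a\right|}^{-1/2}\psi\left(\frac{x-c}{a}\right),$$
where $x$, $a$, $c\in R$, $a$ is a dilation factor, and $c$ is a translation factor. Assuming that $\psi \left(x\right)$ is a one-dimensional wavelet function, the multidimensional wavelet function can be defined using tensor theory as:
$$\psi_{N}(x)=\prod_{i=1}^{N}\psi({x}_{i}),$$
where $x=\left({x}_{1},{x}_{2},\dots ,{x}_{N}\right)\in {R}^{N}$ and $N$ is the dimension number. Let $\psi \left(x\right)$ denote a mother kernel function. Then the dot-product wavelet kernels are:
$$K(x,x')=\prod_{i=1}^{N}\psi\left(\frac{{x}_{i}-{c}_{i}}{a}\right)\psi\left(\frac{{x'}_{i}-{c'}_{i}}{a}\right),$$
with translation factors ${c}_{i}$ and ${c'}_{i}$. The decision function for classification is [39]:
$$f(x)=\mathrm{sgn}\left(\sum_{i=1}^{l}{\alpha}_{i}{y}_{i}\prod_{j=1}^{N}\psi\left(\frac{{x}^{j}-{x}_{i}^{j}}{a}\right)+b\right),$$
where ${x}_{i}^{j}$ denotes the $j$th component of the $i$th training example. The Mexican hat mother wavelet is $\psi \left(x\right)=\left(1-{x}^{2}\right)\mathrm{exp}(-{x}^{2}/2)$, and the corresponding wavelet kernel function is:
$$K(x,x')=\prod_{i=1}^{N}\left[1-\frac{{({x}_{i}-{x'}_{i})}^{2}}{{a}^{2}}\right]\mathrm{exp}\left(-\frac{{({x}_{i}-{x'}_{i})}^{2}}{2{a}^{2}}\right).$$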
Similar to Mexican hat wavelet kernel function, Morlet wavelet kernel is also an admissible SV kernel function. The Morlet function is defined as follows:
And the corresponding wavelet kernel function is:
In this paper, four wavelet kernel functions are used: Morlet, Mexican hat, Gaussian and Shannon. The OAA, OAO and OAOT multi-class classification strategies are applied with each of these wavelet kernel functions.
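A translation-invariant wavelet kernel is the product, over the feature dimensions, of the mother wavelet evaluated at the scaled coordinate differences. The sketch below illustrates this for the Mexican hat and Morlet mother wavelets. It is a hedged illustration: the Morlet frequency constant 1.75 is a commonly used value assumed here rather than taken from this paper, the feature vectors are invented, and only the dilation value $a = 0.83$ is borrowed from the Mexican hat setting reported in Table 4.

```python
import math

def mexican_hat(u):
    """Mexican hat mother wavelet: (1 - u^2) * exp(-u^2 / 2)."""
    return (1.0 - u * u) * math.exp(-u * u / 2.0)

def morlet(u, omega0=1.75):
    """Morlet mother wavelet; omega0 = 1.75 is an assumed constant."""
    return math.cos(omega0 * u) * math.exp(-u * u / 2.0)

def wavelet_kernel(x, z, a, mother=mexican_hat):
    """Product of mother((x_i - z_i) / a) over all feature dimensions."""
    value = 1.0
    for xi, zi in zip(x, z):
        value *= mother((xi - zi) / a)
    return value

x = [0.3, -1.2]          # hypothetical feature vectors
z = [0.1, -0.9]
k_mexh = wavelet_kernel(x, z, a=0.83)
k_morl = wavelet_kernel(x, z, a=0.83, mother=morlet)
```

Both mother wavelets equal 1 at the origin and are even functions, so the kernel of a vector with itself is 1 and the kernel is symmetric in its two arguments.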
4. Experimental validation of the proposed intelligent machine fault diagnosis scheme
Rolling element bearings and gears are the most common and important components used in rotating machinery such as gearboxes. Faults occurring on the surface of these components could cause unexpected machine breakdown. Therefore, it is necessary to develop an effective intelligent gearbox fault diagnosis method. To verify the effectiveness of the proposed method, new gearbox datasets provided by Ottawa University in collaboration with the Prognostics and Health Management Society, and datasets collected on the experimental test rig at Shahrekord University, are analyzed.
4.1. Case 1. Ottawa gearbox vibration datasets
Data collected in this section come from the Ottawa University gearbox under the Prognostics and Health Management Society [40]. Data were sampled synchronously from accelerometers mounted on both the input and output shaft retaining plates of the gearbox. An attached tachometer generates one pulse per revolution, providing very accurate zero crossing information. Data were collected at different shaft speeds under high and low loading. The test runs include seven different combinations of faults and one fault-free reference run. The signals were sampled at 66.666 kHz, and each record was 4 s long.
4.2. Case 2. Shahrekord experimental setup
The experimental setup used at Shahrekord University to collect the dataset consists of a one-stage gearbox with spur gears, a flywheel and an electrical motor. The test rig is shown in Fig. 1. Vibration signals are obtained in the radial direction by mounting the accelerometer on the top of the gearbox. The “Easy Viber” data collector and its software, “SpectraPro”, are used for data acquisition. The sensitivity and dynamic range of the accelerometer probe are 100 mV/g and ±50 g, respectively. The signals are sampled at 16000 Hz for 4 s. In the present study, four pinion wheels are used. The vibration signal from the accelerometer is captured for the following conditions: good gear, gear with tooth breakage, chipped tooth gear and eccentric gear. For bearing vibration signal acquisition, five self-aligning ball bearings (1209 K) are used. One new bearing is considered as the good bearing. In the other four bearings, defects are created; the bearings are then installed in turn and the raw vibration signals are acquired on the bearing housing. The vibration signals are thus captured for the following conditions: good bearing, bearing with spall on inner race, bearing with spall on outer race, bearing with spall on ball and bearing with combined defect.
Fig. 1. Fault simulator set up in Shahrekord University
5. Result and discussion
Based on Table 1, the Daubechies wavelet (db44) and the Meyer wavelet are selected as the best base wavelets among those considered, according to the Maximum Relative Wavelet Energy and Maximum Energy to Shannon Entropy criteria, respectively. The wavelet packet coefficients of all signals are calculated with db44 and Meyer at the eighth level of decomposition. After WPT, 2304 statistical features are extracted from the 256 nodes at the eighth decomposition level. When applying the wavelet transform to a signal, a minimum Shannon entropy at a particular scale indicates that a major defect frequency component exists at that scale. In the present study, however, out of the 256 scales considered, the scale with the maximum Energy to Shannon Entropy ratio for the healthy condition is selected, and the statistical features of the wavelet packet coefficients corresponding to the selected level are calculated.
Table 1. Comparison of parameters for wavelet selection

| Wavelet type | PHM gearbox dataset: maximum relative wavelet energy | Shahrekord gearbox dataset: energy to Shannon entropy ratio |
|---|---|---|
| Meyer | 0.011569 | 101.54 |
| symlet 16 | 0.013278 | 90.19 |
| coif5 | 0.016934 | 67.90 |
| rbio6.8 | 0.017341 | 60.73 |
| bior6.8 | 0.021121 | 58.63 |
| db44 | 0.104178 | 48.55 |

Statistical moments such as kurtosis, skewness and standard deviation are descriptors of the shape of the amplitude distribution of vibration data, and have some advantages over traditional time and frequency analysis, such as their lower sensitivity to variations of load and speed. In the present paper, the authors use standard deviation, crest factor, absolute mean amplitude value, variance, kurtosis, skewness and fourth central moment as features to effectively indicate early faults occurring in rolling element bearings and gears. In addition, the energy and Shannon entropy of the wavelet coefficients are used as two new features along with the other statistical parameters as input to the classifier. These statistical features are fed as input to soft computing techniques such as SVM for fault classification. Two cases of input data and feature sets are considered for classification. In case A, the statistical parameters of the wavelet packet transform are considered (for each type of gearbox fault). In case B, the statistical features at the optimal level, extracted based on the Maximum Energy to Shannon Entropy ratio criterion, are considered (for each type of gearbox fault); in addition, the energy and Shannon entropy are used as two new features in this case. Table 2 shows the results of gearbox classification with the Maximum Energy to Shannon Entropy criterion. In case B (Table 2), the correctly classified instances on the test set are 91.11 % and 95 % for LSSVM and WSVM, respectively, while with 10-fold cross validation the average classification accuracies are 90.55 % and 93.88 % for LSSVM and WSVM, respectively.
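The feature vector of a single wavelet packet node can be sketched as follows. The exact definitions used by the authors are not given in the text, so the formulas below are assumptions: population (not sample) moments, crest factor as peak over RMS, kurtosis and skewness as standardized fourth and third moments, and Shannon entropy of the normalized squared coefficients. The function name is hypothetical.

```python
import math

def node_features(coeffs):
    """Nine features of one wavelet packet node (assumed definitions)."""
    n = len(coeffs)
    mean = sum(coeffs) / n
    variance = sum((c - mean) ** 2 for c in coeffs) / n
    if variance == 0:
        raise ValueError("constant coefficients: skewness and kurtosis undefined")
    std = math.sqrt(variance)
    m3 = sum((c - mean) ** 3 for c in coeffs) / n      # third central moment
    m4 = sum((c - mean) ** 4 for c in coeffs) / n      # fourth central moment
    energy = sum(c * c for c in coeffs)
    rms = math.sqrt(energy / n)
    probs = [c * c / energy for c in coeffs]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return {
        "standard_deviation": std,
        "crest_factor": max(abs(c) for c in coeffs) / rms,
        "absolute_mean": sum(abs(c) for c in coeffs) / n,
        "variance": variance,
        "kurtosis": m4 / variance ** 2,
        "skewness": m3 / std ** 3,
        "fourth_central_moment": m4,
        "energy": energy,
        "shannon_entropy": entropy,
    }

features = node_features([1.0, -1.0, 1.0, -1.0])
```

Seven statistical parameters plus energy and Shannon entropy give nine features per node, which matches the 2304 features reported for 256 nodes (256 × 9 = 2304).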
Table 2. Classification performance (maximum energy to Shannon entropy criterion)

| Parameters | Feature set | LSSVM, test set | LSSVM, 10-fold cross validation | WSVM, test set | WSVM, 10-fold cross validation |
|---|---|---|---|---|---|
| Correctly classified | Case A | 160 (88.88 %) | 156 (86.66 %) | 168 (93.33 %) | 164 (91.11 %) |
| Correctly classified | Case B | 164 (91.11 %) | 163 (90.55 %) | 171 (95 %) | 169 (93.88 %) |
| Incorrectly classified | Case A | 20 (11.11 %) | 24 (13.33 %) | 12 (6.66 %) | 16 (8.88 %) |
| Incorrectly classified | Case B | 16 (8.88 %) | 17 (9.44 %) | 9 (5 %) | 11 (6.11 %) |
| Total number of instances | | 180 | 180 | 180 | 180 |

| Training time (s) | LSSVM | WSVM |
|---|---|---|
| Case A | 37.05 | 137.41 |
| Case B | 15.47 | 84.73 |

Table 3 shows the accuracy associated with each technique for fault classification with the Maximum Relative Wavelet Energy criterion. The correctly classified instances using the test set for LSSVM and WSVM are 87.77 % and 92.22 %, respectively, with the two new features. For 10-fold cross validation, the average classification accuracies for LSSVM and WSVM are 86.11 % and 90.55 %, respectively, which is slightly less than in the previous case.
From Tables 2 and 3, we found that the Maximum Energy to Shannon Entropy criterion with two new features is better for fault classification of gearbox with respect to Maximum Relative Wavelet Energy criterion.
Table 3. Classification performance (maximum relative wavelet energy criterion)

| Parameters | Feature set | LSSVM, test set | LSSVM, 10-fold cross validation | WSVM, test set | WSVM, 10-fold cross validation |
|---|---|---|---|---|---|
| Correctly classified | Case A | 154 (85.55 %) | 150 (83.33 %) | 162 (90 %) | 160 (88.88 %) |
| Correctly classified | Case B | 158 (87.77 %) | 155 (86.11 %) | 166 (92.22 %) | 163 (90.55 %) |
| Incorrectly classified | Case A | 26 (14.44 %) | 30 (16.66 %) | 18 (10 %) | 20 (11.11 %) |
| Incorrectly classified | Case B | 22 (12.22 %) | 25 (13.88 %) | 14 (7.77 %) | 17 (9.44 %) |
| Total number of instances | | 180 | 180 | 180 | 180 |

| Training time (s) | LSSVM | WSVM |
|---|---|---|
| Case A | 40.94 | 144.28 |
| Case B | 17.79 | 94.05 |

Table 4. The classified result of experiment data using WSVM with three methods
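Fault classification accuracy based on SVM with kernel (%)

| Operating condition | Strategy | Morlet ($c=$ 29.7, $a=$ 0.74) | Mexican hat ($c=$ 38.7, $a=$ 0.83) | Gaussian | Shannon |
|---|---|---|---|---|---|
| Outer race fault | OAOT | 95 | 94.50 | 93.10 | 88.40 |
| Outer race fault | OAA | 94.55 | 93.65 | 92.35 | 83.40 |
| Outer race fault | OAO | 90.50 | 85.60 | 85.60 | 82.40 |
| Inner race fault | OAOT | 95.10 | 95.33 | 92.10 | 90.15 |
| Inner race fault | OAA | 94.50 | 94.50 | 91.65 | 87.12 |
| Inner race fault | OAO | 91.50 | 88.55 | 88.50 | 85.50 |
| Roller fault | OAOT | 97.20 | 96.50 | 93.25 | 84.45 |
| Roller fault | OAA | 95.50 | 93.50 | 92.50 | 83.52 |
| Roller fault | OAO | 91.60 | 90.45 | 90.50 | 82.60 |
| Combined fault | OAOT | 96.10 | 95.15 | 93.35 | 85.00 |
| Combined fault | OAA | 96.50 | 94.50 | 91.50 | 84.74 |
| Combined fault | OAO | 92.75 | 92.40 | 92.40 | 82.15 |
| Average accuracy (bearing) | OAOT | 95.85 | 95.37 | 92.95 | 87.00 |
| Average accuracy (bearing) | OAA | 95.26 | 94.03 | 92.00 | 84.69 |
| Average accuracy (bearing) | OAO | 91.58 | 89.25 | 89.25 | 83.16 |
| Chipped tooth gear | OAOT | 97.80 | 96.60 | 96.60 | 85.56 |
| Chipped tooth gear | OAA | 97.50 | 91.85 | 91.44 | 85.50 |
| Chipped tooth gear | OAO | 86.01 | 85.52 | 85.00 | 82.50 |
| Eccentric gear | OAOT | 93.55 | 92.36 | 91.53 | 86.90 |
| Eccentric gear | OAA | 92.83 | 91.52 | 90.88 | 84.51 |
| Eccentric gear | OAO | 91.50 | 90.89 | 90.63 | 81.52 |
| Broken-tooth gear | OAOT | 91.60 | 90.05 | 88.74 | 85.40 |
| Broken-tooth gear | OAA | 90.63 | 89.90 | 86.88 | 83.49 |
| Broken-tooth gear | OAO | 88.90 | 86.60 | 84.67 | 80.50 |
| Good gearbox | OAOT | 93.65 | 93.30 | 92.44 | 89.42 |
| Good gearbox | OAA | 93.30 | 93.15 | 90.78 | 88.50 |
| Good gearbox | OAO | 92.80 | 91.70 | 90.60 | 86.77 |
| Average accuracy (gear) | OAOT | 94.15 | 93.07 | 92.32 | 86.82 |
| Average accuracy (gear) | OAA | 93.56 | 91.60 | 89.99 | 85.50 |
| Average accuracy (gear) | OAO | 89.80 | 88.67 | 87.72 | 82.82 |
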
Furthermore, the accuracy comparison of WSVM with the OAOT, OAA and OAO strategies under the Maximum Energy to Shannon Entropy criterion is listed in Table 4. From Table 4, it is clear that the proposed method based on the wavelet support vector machine with the Morlet wavelet kernel has improved the classification accuracy by 9.97 % with respect to the Haar wavelet kernel. In this case, the overall average classification accuracy is 99.67 %. Table 4 also shows that, on average, the classification accuracy with the OAOT strategy is better than with OAA and OAO. The classification accuracy with LSSVM and the Maximum Energy to Shannon Entropy criterion is shown in Table 5, from which we find that, on average, the multi-kernel with OAOT gives better classification accuracy than the RBF and polynomial kernels.
Table 5. The classified result of experiment data using LSSVM with three methods
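Fault classification accuracy based on LSSVM with kernel (%)

| Operating condition | Strategy | Polynomial ($d$ = 3) | RBF ($C$ = 30, $\gamma $ = 2) | Multi kernel |
|---|---|---|---|---|
| Outer race fault | OAOT | 86.45 | 87.55 | 88.10 |
| Outer race fault | OAA | 84.35 | 85.36 | 87.38 |
| Outer race fault | OAO | 82.47 | 83.50 | 86.50 |
| Inner race fault | OAOT | 91.05 | 93.45 | 95.40 |
| Inner race fault | OAA | 86.15 | 90.50 | 91.62 |
| Inner race fault | OAO | 86.03 | 88.42 | 90.55 |
| Roller fault | OAOT | 84.23 | 85.01 | 87.10 |
| Roller fault | OAA | 83.40 | 85.14 | 90.50 |
| Roller fault | OAO | 82.54 | 83.08 | 87.52 |
| Combined fault | OAOT | 88.77 | 90.49 | 92.27 |
| Combined fault | OAA | 85.60 | 88.50 | 90.50 |
| Combined fault | OAO | 84.46 | 86.60 | 88.53 |
| Average accuracy (bearing) | OAOT | 87.62 | 89.12 | 90.71 |
| Average accuracy (bearing) | OAA | 84.87 | 87.37 | 90.00 |
| Average accuracy (bearing) | OAO | 83.87 | 85.40 | 88.27 |
| Chipped tooth gear | OAOT | 91.00 | 92.54 | 93.10 |
| Chipped tooth gear | OAA | 90.10 | 90.25 | 91.10 |
| Chipped tooth gear | OAO | 85.00 | 87.57 | 89.51 |
| Eccentric gear | OAOT | 90.25 | 91.18 | 91.70 |
| Eccentric gear | OAA | 88.20 | 88.75 | 89.55 |
| Eccentric gear | OAO | 85.44 | 87.47 | 89.52 |
| Broken-tooth gear | OAOT | 85.55 | 86.82 | 87.10 |
| Broken-tooth gear | OAA | 85.42 | 86.00 | 86.50 |
| Broken-tooth gear | OAO | 85.46 | 85.60 | 88.33 |
| Good gearbox | OAOT | 92.50 | 93.56 | 94.15 |
| Good gearbox | OAA | 91.22 | 92.58 | 93.20 |
| Good gearbox | OAO | 90.50 | 91.53 | 92.07 |
| Average accuracy (gear) | OAOT | 89.82 | 91.02 | 91.51 |
| Average accuracy (gear) | OAA | 88.73 | 89.39 | 90.08 |
| Average accuracy (gear) | OAO | 86.60 | 88.04 | 89.85 |
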
Figs. 2 and 3 show the testing time and training time of WSVM and LSSVM with the three strategies. The training time with OAA is longer than with OAO and OAOT under all kernel functions. As shown in Fig. 2, the performance of the Morlet kernel for machinery fault diagnosis is acceptable: the Morlet kernel has the lowest testing and training times of the kernel functions considered. It is clear from Fig. 3 that the multi-kernel has the lowest training and testing times with the OAOT algorithm. Therefore, the OAOT strategy is better than OAO and OAA for this problem.
For the polynomial kernel, $d$ is the important parameter, and it is not known beforehand which value of $d$ is best for the classification problem. A 10-fold cross-validation is used to find the best value of $d$, and the one with the lowest cross-validation error is picked. We study values of $d$ in the range $d=${1, 2,…, 8}; the accuracy of the three strategies for multi-class classification is compared in Fig. 4. From Fig. 4, we can see that with the OAOT algorithm the classification accuracy reaches its highest value (88.72 %) at $d=$3 and its lowest at $d=$1. As $d$ grows, overfitting or underfitting occurs and the recognition rate degrades. Generally, the OAOT algorithm is better than the OAO and OAA algorithms under the same value of $d$; the best classification rates of OAO and OAA are 85.23 % and 86.80 %, respectively. Therefore, the optimal value of the polynomial kernel parameter is $d=$3.
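The selection of $d$ by 10-fold cross-validation can be sketched as below. To keep the example self-contained, a very simple kernel classifier (the sign of the label-weighted sum of polynomial kernel values) stands in for the LSSVM; this stand-in, the deterministic interleaved folds, the one-dimensional toy data and all function names are assumptions of the sketch, not the procedure of the paper.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k interleaved folds (deterministic)."""
    return [list(range(i, n, k)) for i in range(k)]

def poly_kernel(u, v, d):
    """Polynomial kernel (assumed form (u.v + 1)**d)."""
    return (sum(a * b for a, b in zip(u, v)) + 1.0) ** d

def predict(x, X_train, y_train, d):
    """Stand-in classifier: sign of the label-weighted kernel sum."""
    score = sum(yi * poly_kernel(xi, x, d) for xi, yi in zip(X_train, y_train))
    return 1 if score >= 0 else -1

def cv_error(X, y, d, k=10):
    """Fraction of held-out samples misclassified over k folds."""
    wrong = 0
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        X_tr = [X[i] for i in range(len(X)) if i not in held]
        y_tr = [y[i] for i in range(len(X)) if i not in held]
        wrong += sum(predict(X[i], X_tr, y_tr, d) != y[i] for i in held_out)
    return wrong / len(X)

# Toy data: the class depends on |x|, so a linear kernel (d = 1) is a poor fit.
X = [[-2.0 + 0.1 * i] for i in range(41)]
y = [1 if abs(x[0]) > 1.0 else -1 for x in X]

errors = {d: cv_error(X, y, d) for d in range(1, 9)}   # d = 1, 2, ..., 8
best_d = min(errors, key=errors.get)                   # lowest CV error wins
```

Each sample is held out exactly once, so the cross-validation error is an average over all samples; the candidate with the lowest error is kept.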
Fig. 2. Training time and testing time for WSVM
a) Training time for WSVM
b) Testing time for WSVM
Fig. 3. Training time and testing time for LSSVM
a) Training time for LSSVM
b) Testing time for LSSVM
Fig. 5 shows that the accuracy of LSSVM using OAOT algorithm with the RBF kernel reaches the highest point (90.07 %) with $C=$30 and $\gamma =$2. Similarly, when we apply the RBF kernel to OAO algorithm and OAA algorithm, the best classification ratio is 86.72 % and 88.38 %, respectively.
From Table 5, for the multi-kernel LSSVM, we observe that the highest accuracy is 91.11 % with OAOT. Fig. 6 shows that the accuracy of WSVM using the OAOT algorithm with the Mexican hat kernel reaches its highest value (94.22 %) with $c=$38.7 and $a=$0.83. Similarly, when we apply the Mexican hat kernel to the OAO and OAA algorithms, the best classification rates are 88.96 % and 92.81 %, respectively. Fig. 7 shows that the accuracy of WSVM using the OAOT algorithm with the Morlet kernel function reaches its highest value (95 %) with $c=$29.7 and $a=$0.74. Similarly, when we apply the Morlet kernel to the OAO and OAA algorithms, the best classification rates with the same $a$ and $c$ are 90.69 % and 94.41 %, respectively. Fig. 8 shows that the accuracy of WSVM using the OAOT algorithm with the Shannon kernel reaches its highest value (86.91 %) with $C=$50 and number of vanishing moment ($a=$0.4). Similarly, when we apply the Shannon kernel to the OAO and OAA algorithms, the best classification rates are 82.99 % and 85.09 %, respectively.
Fig. 4. Comparison of accuracy of three algorithms based on WPT feature extraction with different $d$ for polynomial kernel
Fig. 5. Comparison of accuracy using OAOT algorithm based on WPT feature extraction with RBF kernel in different ($C$, $\gamma $)
Fig. 6. Comparison of accuracy using OAOT algorithm based on WPT feature extraction with Mexican hat kernel in different ($c$, $a$)
Fig. 9 shows that the accuracy of WSVM using the OAOT algorithm with the Gaussian kernel reaches its highest value (92.63 %) with $C=$100 and $a=$0.5. Also, when we apply the Gaussian kernel to the OAO and OAA algorithms, the best classification rates are 88.48 % and 90.99 %, respectively.
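As a consistency check on the figures quoted in this section, the peak OAOT accuracies reported in the text appear to coincide with the mean of the bearing and gear averages listed in Tables 4 and 5. This is an observation drawn from the numbers, not a procedure stated by the authors; the short script below reproduces the arithmetic, allowing for the two-decimal truncation used in the tables.

```python
# (bearing average, gear average) for the OAOT strategy, from Tables 4 and 5.
oaot_averages = {
    "Morlet": (95.85, 94.15),
    "Mexican hat": (95.37, 93.07),
    "Gaussian": (92.95, 92.32),
    "Shannon": (87.00, 86.82),
    "Polynomial": (87.62, 89.82),
    "RBF": (89.12, 91.02),
    "Multi kernel": (90.71, 91.51),
}

# Peak OAOT accuracies quoted in the text (%).
reported = {
    "Morlet": 95.00,
    "Mexican hat": 94.22,
    "Gaussian": 92.63,
    "Shannon": 86.91,
    "Polynomial": 88.72,
    "RBF": 90.07,
    "Multi kernel": 91.11,
}

# Mean of the two averages for every kernel.
overall = {name: (bearing + gear) / 2.0
           for name, (bearing, gear) in oaot_averages.items()}

# Largest deviation between the recomputed and the reported values.
max_gap = max(abs(overall[name] - reported[name]) for name in reported)
```

The largest gap is about 0.005 percentage points, which is within the truncation of the reported values.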
Fig. 7. Comparison of accuracy using OAOT algorithm based on WPT feature extraction with Morlet kernel in different ($c$, $a$)
Fig. 8. Comparison of accuracy using OAOT algorithm based on WPT feature extraction with Shannon kernel in different ($C$, $a$)
Fig. 9. Comparison of accuracy using OAOT algorithm based on WPT feature extraction with Gaussian kernel in different ($C$, $a$)
The authors declare that they do not have any conflict of interests in their submitted paper.
6. Conclusions
This study presents a methodology for the detection of gearbox faults by classifying them using two SVM models, WSVM and LSSVM. First, the wavelet packet transform is applied to the signal, employing six mother wavelets. Two wavelet selection criteria, Maximum Energy to Shannon Entropy ratio and Maximum Relative Wavelet Energy, are used and compared to select an appropriate wavelet for feature extraction. Results obtained from the two criteria show that the wavelet selected using the Maximum Energy to Shannon Entropy ratio criterion gives better classification efficiency. Both soft computing methods performed well, but the fault classification results with WSVM are better than those with LSSVM. To find efficient features for classification, the Maximum Energy to Shannon Entropy ratio was employed to search for the optimal decomposition level of the wavelet packet, and consequently the number of features was reduced. In addition, the Morlet, Mexican hat, Gaussian and Shannon wavelet kernel functions are used to construct the WSVM algorithms. The results show that the Morlet kernel is more accurate and faster than the other wavelet kernel functions for fault classification of gearboxes. As a new idea, energy and Shannon entropy have been applied as two new features along with statistical parameters as input to the SVM. The obtained results indicate that the accuracy of the classifier increased by 1 to 4 percentage points when these two features are considered, but the training time of the SVM increased with optimal level decomposition and two new features.
Acknowledgements
The authors are grateful to the Shahrekord University of Iran for supporting the experimental tests of this research.
References
[1] Tran V. T., Yang B. S. An intelligent condition-based maintenance platform for rotating machinery. Expert Systems with Applications, Vol. 39, 2012, p. 2977-2988.
[2] Melter G., Dien N. P. Fault diagnosis in gears operating under non-stationary rotational speed using polar wavelet amplitude. Mechanical Systems and Signal Processing, Vol. 18, Issue 5, 2004, p. 985-992.
[3] McFadden P. D. A revised model for the extraction of periodic waveforms by time domain averaging. Mechanical Systems and Signal Processing, Vol. 7, 1993, p. 193-203.
[4] Combet F., Gelman L. An automated methodology for performing time synchronous averaging of a gearbox signal without speed sensor. Mechanical Systems and Signal Processing, Vol. 21, 2007, p. 2590-2606.
[5] Minamihara H., Nishimura M., Takakuwa Y., Ohta M. A method of detection of the correlation function and frequency power spectrum for random noise or vibration with amplitude limitation. Journal of Sound and Vibration, Vol. 141, Issue 3, 1990, p. 425-434.
[6] Wang W. J., McFadden P. D. Early detection of gear failure by vibration analysis I. Calculation of the time-frequency distribution. Mechanical Systems and Signal Processing, Vol. 3, Issue 7, 1993, p. 193-203.
[7] Staszewski W. J., Tomlinson G. R. Application of the wavelet transform to fault detection in a spur gear. Mechanical System and Signal Processing, Vol. 8, 1994, p. 289-307.
[8] Paya B. A., Esat I. I. Artificial neural network based fault diagnostics of rotating machinery using wavelet transforms as a preprocessor. Mechanical Systems and Signal Processing, Vol. 11, Issue 5, 1997, p. 751-765.
[9] Tse P. W., Yang W. X., Tam H. Y. Machine fault diagnosis through an effective exact wavelet analysis. Journal of Sound and Vibration, Vol. 277, 2004, p. 1005-1024.
[10] Wu J. D., Liu C. H. An expert system for fault diagnosis in internal combustion engines using wavelet packet transform and neural network. Expert Systems with Applications, Vol. 36, Issue 3, 2009, p. 4278-4286.
[11] Cheng J., Yang Y., Yang Y. A rotating machinery fault diagnosis method based on local mean decomposition. Digital Signal Processing, Vol. 22, 2012, p. 356-366.
[12] Vapnik V. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.
[13] Cortes C., Vapnik V. Support vector networks. Machine Learning, Vol. 20, 1995, p. 273-297.
[14] Bicego M., Figueiredo M. A. T. Soft clustering using weighted one-class support vector machines. Pattern Recognition, Vol. 42, Issue 1, 2009, p. 27-32.
[15] Cao X. B., Xu Y. W., Chen D., Qiao H. Associated evolution of a support vector machine-based classifier for pedestrian detection. Information Sciences, Vol. 179, Issue 8, 2009, p. 1070-1077.
[16] Lingras P., Butz C. Rough set based 1-v-1 and 1-v-r approaches to support vector machine multi-classification. Information Sciences, Vol. 177, Issue 18, 2007, p. 3782-3798.
 Zhou S. M., Gan J. Q., Sepulved F. Classifying mental tasks based on features of higherorder statistics from EEG signals in braincomputer interface. Information Sciences, Vol. 178, Issue 6, 2008, p. 16291640. [Search CrossRef]
 Zhou S. M., John R. I., Wang X. Y., Garibaldi J. M. Compact fuzzy rules induction and feature extraction using SVM with particle swarms for breast cancer treatments. Proceedings of 2008 IEEE Congress on Evolutionary Computation (CEC), 2008, p. 14691475. [Search CrossRef]
 Bloch G., Lauer F., Colin G., Chamaillard Y. Support vector regression from simulation data and few experimental samples. Information Sciences, Vol. 178, Issue 20, 2008, p. 38133827. [Search CrossRef]
 Chuang C. C. Extended support vector interval regression networks for interval inputoutput data. Information Sciences, Vol. 178, Issue 3, 2008, p. 871891. [Search CrossRef]
 Jayadeva, Khemchandani R., Chandra S. Regularized least squares support vector regression for the simultaneous learning of a function and its derivatives. Information Sciences, Vol. 178, Issue 17, 2008, p. 34023414. [Search CrossRef]
 Wong W. T., Shih F. Y., Liu J. Shapebased image retrieval using support vector machines, Fourier descriptors and selforganizing maps. Information Sciences, Vol. 177, Issue 8, 2007, p. 18781891. [Search CrossRef]
 Yuan S. F., Chu F. L. Fault diagnostics based on particle swarm optimization and support vector machines. Mechanical Systems and Signal Processing, Vol. 21, Issue 4, 2007, p. 17871798. [Search CrossRef]
 Zhang J., Wang Y. A rough margin based support vector machine. Information Sciences, Vol. 178, Issue 9, 2008, p. 22042214. [Search CrossRef]
 Saravanan N., Kumar Siddabattuni V. N. S., Ramachandran K. I. A comparative study on classification of features by SVM and PSVM extracted using Morlet wavelet for fault diagnosis. Expert Systems with Applications, Vol. 35, 2008, p. 13511366. [Search CrossRef]
 Qu J., Zuo M. J. Support vector machine based data processing algorithm for wear degree classification of slurry pump systems. Measurement, Vol. 43, 2010, p. 781791. [Search CrossRef]
 Sun C., Zhang Z. S., He Z. J. Research on bearing life prediction based on support vector machine and its application. Journal of Physics: Conference Series, Vol. 305, 2011, p. 012028. [Search CrossRef]
 Hou S., Li Y. Shortterm fault prediction based on support vector machines with parameter optimization by evolution strategy. Expert Systems with Applications, Vol. 36, 2009, p. 1238312391. [Search CrossRef]
 Shen Z., Chen X., Zhang X., He Z. A novel intelligent gear fault diagnosis model based on EMD and multiclass TSVM. Measurement, Vol. 45, 2012, p. 3040. [Search CrossRef]
 Xian G. M., Zeng B. Q. An intelligent fault diagnosis method based on wavelet packer analysis and hybrid support vector machines. Expert Systems with Applications, Vol. 36, 2009, p. 1213112136. [Search CrossRef]
 Zamanian A. H., Ohadi A. Gear fault diagnosis based on Gaussian correlation of vibrations signals and wavelet coefficients. Applied Soft Computing, Vol. 11, 2011, p. 48074819. [Search CrossRef]
 Rosso O. A., Figliola A. Order/disorder in brain electrical activity. Revista Mexicana De Fisica, Vol. 50, 2004, p. 149155. [Search CrossRef]
 Rosso O. A., Blanco S., Yordanova J., Kolev V., Figliola A., Schurmann M., Basar E. Wavelet entropy: a new tool for analysis of short duration brain electrical signals. Journal of Neuroscience Methods, Vol. 105, 2001, p. 6575. [Search CrossRef]
 Yan R. Base Wavelet Selection Criteria for NonStationary Vibration Analysis in Bearing Health Diagnosis. Electronic Doctoral Dissertations for UMass Amherst, Paper AAI3275786, http://scholarworks.umass.edu/dissertations/AAI3275786, 2007. [Search CrossRef]
 Widodo A., Yang B. S. Support vector machine in machine condition monitoring and fault diagnosis. Mechanical Systems and Signal Processing, Vol. 21, 2007, p. 25602574. [Search CrossRef]
 Suykens J. A. K., Vandewalle J. Multiclass least squares support vector machines. Proceedings of the International Joint Conference on Neural Networks (IJCNN99), Washington, DC, 2002, p. 900903. [Search CrossRef]
 Zhao S. L., Zhang Y. C. SVM classifier based fault diagnosis of the satellite attitude control system. International Conference on Intelligent Computation Technology and Automation, 2008, p. 907911. [Search CrossRef]
 Long B., Xian W., Li M., Wang H. Improved diagnostics for the incipient faults in analog circuits using LSSVM based on PSO algorithm with Mahalanobis distance. Neurocomputing, Vol. 133, 2014, p. 237248. [Search CrossRef]
 Liu Z., Cao H., Chen X., He Z., Shen Z. Multifault classification based on wavelet SVM with PSO algorithm to analyze vibration signals from rolling element bearings. Neurocomputing, Vol. 99, 2013, p. 399410. [Search CrossRef]
 Data Analysis Competition 2009. Prognostics and Health Management Society, http://www.phmsociety.org/competition/PHM/09/apparatus, 2012. [Search CrossRef]