
References

1
Aizerman, M., Braverman, E. and Rozonoer, L. (1964). Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25: 821-837.

2
Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans. Automat. Control, 19(6): 716-723.

3
Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68: 337-404.

4
Barron, A., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probability Theory and Related Fields, 113: 301-415.

5
Bartlett, P., Bousquet, O. and Mendelson, S. (2002). Localized Rademacher complexities. In Kivinen, J. and Sloan, R.H. (eds), Proceedings COLT, volume 2375 of Lecture Notes in Computer Science, pp. 44-58. Springer, Heidelberg.

6
Bartlett, P.L., Long, P.M. and Williamson, R.C. (1996). Fat-shattering and the learnability of real-valued functions. Journal of Computer and System Sciences, 52(3): 434-452.

7
Bartlett, P.L. and Mendelson, S. (2002). Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3: 463-482.

8
Bennett, K.P. and Mangasarian, O.L. (1992). Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1: 23-34.

9
Bertsekas, D.P. (1995). Nonlinear Programming. Athena Scientific, Belmont, MA.

10
Bishop, C.M. (1995). Neural Networks for Pattern Recognition. Oxford University Press.

11
Blankertz, B., Curio, G. and Müller, K.-R. (2002). Classifying single trial EEG: Towards brain computer interfacing. In Dietterich, T.G., Becker, S. and Ghahramani, Z. (eds), Advances in Neural Information Processing Systems, 14: 157-164.

12
Boser, B.E., Guyon, I.M. and Vapnik, V.N. (1992). A training algorithm for optimal margin classifiers. In Haussler, D. (ed), Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pp. 144-152.

13
Bottou, L., Cortes, C., Denker, J.S., Drucker, H., Guyon, I., Jackel, L.D., LeCun, Y.A., Müller, U.A., Säckinger, E., Simard, P.Y. and Vapnik, V.N. (1994). Comparison of classifier methods: a case study in handwritten digit recognition. In Proceedings of the 12th International Conference on Pattern Recognition and Neural Networks, Jerusalem, pp. 77-87, IEEE Computer Society Press.

14
Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984). Classification and Regression Trees. Wadsworth.

15
Brown, M.P.S., Grundy, W.N., Lin, D., Cristianini, N., Sugnet, C., Furey, T.S., Ares, M. and Haussler, D. (2000). Knowledge-based analysis of microarray gene expression data using support vector machines. Proceedings of the National Academy of Sciences, 97(1): 262-267.

16
Cauwenberghs, G. and Poggio, T. (2001). Incremental and decremental support vector machine learning. In Leen, T.K., Dietterich, T.G. and Tresp, V. (eds), Advances in Neural Information Processing Systems, 13: 409-415, MIT Press.

17
Cortes, C. and Vapnik, V.N. (1995). Support-vector networks. Machine Learning, 20: 273-297.

18
Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines. Cambridge University Press, Cambridge, UK.

19
DeCoste, D. and Schölkopf, B. (2002). Training invariant support vector machines. Machine Learning, 46: 161-190. Also: Technical Report JPL-MLTR-00-1, Jet Propulsion Laboratory, Pasadena.

20
Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Number 31 in Applications of Mathematics. Springer, New York.

21
Donoho, D., Johnstone, I., Kerkyacharian, G. and Picard, D. (1996). Density estimation by wavelet thresholding. Annals of Statistics, 24: 508-539.

22
Drucker, H., Schapire, R. and Simard, P.Y. (1993). Boosting performance in neural networks. International Journal of Pattern Recognition and Artificial Intelligence, 7: 705-719.

23
Duda, R.O., Hart, P.E. and Stork, D.G. (2001). Pattern Classification. John Wiley & Sons, second edition.

24
Freund, Y. and Schapire, R.E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1): 119-139.

25
Girosi, F. (1998). An equivalence between sparse approximation and support vector machines. Neural Computation, 10: 1455-1480.

26
Girosi, F., Jones, M. and Poggio, T. (1993). Priors, stabilizers and basis functions: From regularization to radial, tensor and additive splines. Technical Report A.I. Memo No. 1430, Massachusetts Institute of Technology.

27
Graepel, T., Herbrich, R. and Shawe-Taylor, J. (2000). Generalization error bounds for sparse linear classifiers. In Proc. COLT, pp. 298-303, Morgan Kaufmann, San Francisco.

28
Haussler, D. (1999). Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, UC Santa Cruz.

29
Herbrich, R., Graepel, T. and Campbell, C. (2001). Bayes point machines. Journal of Machine Learning Research, 1: 245-279.

30
Jaakkola, T.S., Diekhans, M. and Haussler, D. (2000). A discriminative framework for detecting remote protein homologies. Journal of Computational Biology, 7: 95-114.

31
Joachims, T. (1997). Text categorization with support vector machines: Learning with many relevant features. Technical Report 23, LS VIII, University of Dortmund.

32
Joachims, T. (1999). Making large-scale SVM learning practical. In Schölkopf, B., Burges, C.J.C. and Smola, A.J. (eds), Advances in Kernel Methods - Support Vector Learning, pp. 169-184, Cambridge, MA, MIT Press.

33
Kivinen, J., Smola, A.J. and Williamson, R.C. (2001). Online learning with kernels. In Dietterich, T.G., Becker, S. and Ghahramani, Z. (eds), Advances in Neural Information Processing Systems (NIPS 2001), pp. 785-792.

34
Kolmogorov, A.N. (1941). Stationary sequences in Hilbert spaces. Moscow University Mathematics, 2: 1-40.

35
Laskov, P. (2002). Feasible direction decomposition algorithms for training support vector machines. Machine Learning, 46: 315-349.

36
LeCun, Y.A., Jackel, L.D., Bottou, L., Brunot, A., Cortes, C., Denker, J.S., Drucker, H., Guyon, I., Müller, U.A., Säckinger, E., Simard, P.Y. and Vapnik, V.N. (1995). Comparison of learning algorithms for handwritten digit recognition. In Fogelman-Soulié, F. and Gallinari, P. (eds), Proceedings ICANN'95 - International Conference on Artificial Neural Networks, II: 53-60, Nanterre, France, EC2.

37
Lin, C.-J. (2001). On the convergence of the decomposition method for support vector machines. IEEE Trans. on Neural Networks, 12(6): 1288-1298.

38
Luenberger, D.G. (1973). Introduction to Linear and Nonlinear Programming. Addison-Wesley, Reading, MA.

39
Mallows, C.L. (1973). Some comments on C_p. Technometrics, 15: 661-675.

40
Mercer, J. (1909). Functions of positive and negative type and their connection with the theory of integral equations. Philos. Trans. Roy. Soc. London, A 209: 415-446.

41
Mika, S. (2002). Kernel Fisher Discriminants. PhD thesis, University of Technology, Berlin, Germany.

42
Moody, J. and Darken, C. (1989). Fast learning in networks of locally-tuned processing units. Neural Computation, 1(2): 281-294.

43
Morozov, V.A. (1984). Methods for Solving Incorrectly Posed Problems. Springer.

44
Müller, K.-R., Mika, S., Rätsch, G., Tsuda, K. and Schölkopf, B. (2001). An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 12(2): 181-201.

45
Osuna, E., Freund, R. and Girosi, F. (1997a). An improved training algorithm for support vector machines. In Principe, J., Giles, L., Morgan, N. and Wilson, E. (eds), Neural Networks for Signal Processing VII - Proceedings of the 1997 IEEE Workshop, pp. 276-285, New York, IEEE.

46
Osuna, E., Freund, R. and Girosi, F. (1997b). Training support vector machines: An application to face detection. In Proceedings CVPR'97.

47
Parzen, E. (1962). On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33: 1065-1076.

48
Platt, J. (1999). Fast training of support vector machines using sequential minimal optimization. In Schölkopf, B., Burges, C.J.C. and Smola, A.J. (eds), Advances in Kernel Methods - Support Vector Learning, pp. 185-208, MIT Press, Cambridge, MA.

49
Ralaivola, L. and d'Alché-Buc, F. (2001). Incremental support vector machine learning: A local approach. Lecture Notes in Computer Science, 2130: 322-329. URL citeseer.nj.nec.com/ralaivola01incremental.html.

50
Rätsch, G. (1998). Ensemble learning methods for classification. Master's thesis, Department of Computer Science, University of Potsdam, Germany.

51
Rätsch, G. (2001). Robust Boosting via Convex Optimization. PhD thesis, University of Potsdam, Neues Palais 10, 14469 Potsdam, Germany.

52
Rätsch, G., Mika, S., Schölkopf, B. and Müller, K.-R. (2002). Constructing boosting algorithms from SVMs: an application to one-class classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(9): 1184-1199. An earlier version appeared as GMD Technical Report No. 119 (2000).

53
Rüping, S. (2002). Incremental learning with support vector machines. Technical Report TR-18, Universität Dortmund, SFB475.

54
Schölkopf, B., Burges, C.J.C. and Vapnik, V.N. (1995). Extracting support data for a given task. In Fayyad, U.M. and Uthurusamy, R. (eds), Proceedings, First International Conference on Knowledge Discovery & Data Mining, AAAI Press, Menlo Park, CA.

55
Schölkopf, B. (2001). The kernel trick for distances. In Leen, T.K., Dietterich, T.G. and Tresp, V. (eds), Advances in Neural Information Processing Systems 13. MIT Press.

56
Schölkopf, B., Platt, J., Shawe-Taylor, J., Smola, A.J. and Williamson, R.C. (2001). Estimating the support of a high-dimensional distribution. Neural Computation, 13(7): 1443-1471.

57
Schölkopf, B., Simard, P.Y., Smola, A.J. and Vapnik, V.N. (1998a). Prior knowledge in support vector kernels. In Jordan, M., Kearns, M. and Solla, S. (eds), Advances in Neural Information Processing Systems, 10: 640-646, MIT Press, Cambridge, MA.

58
Schölkopf, B., Smola, A., Williamson, R.C. and Bartlett, P.L. (2000). New support vector algorithms. Neural Computation, 12: 1207-1245. Also: NeuroCOLT Technical Report NC-TR-1998-031.

59
Schölkopf, B. and Smola, A.J. (2002). Learning with Kernels. MIT Press, Cambridge, MA.

60
Schölkopf, B., Smola, A.J. and Müller, K.-R. (1998b). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10: 1299-1319.

61
Shawe-Taylor, J., Bartlett, P.L. and Williamson, R.C. (1998). Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44(5): 1926-1940.

62
Simard, P.Y., LeCun, Y.A., Denker, J.S. and Victorri, B. (1998). Transformation invariance in pattern recognition - tangent distance and tangent propagation. In Orr, G. and Müller, K.-R. (eds), Neural Networks: Tricks of the Trade, LNCS 1524: 239-274. Springer.

63
Smola, A.J., Schölkopf, B. and Müller, K.-R. (1998). The connection between regularization operators and support vector kernels. Neural Networks, 11: 637-649.

64
Sonnenburg, S., Rätsch, G., Jagota, A. and Müller, K.-R. (2002). New methods for splice-site recognition. In Dorronsoro, J.R. (ed), Proceedings of the International Conference on Artificial Neural Networks (ICANN'02), pp. 329-336, LNCS 2415, Springer, Berlin.

65
Stitson, M., Gammerman, A., Vapnik, V.N., Vovk, V., Watkins, C. and Weston, J. (1997). Support vector regression with ANOVA decomposition kernels. Technical Report CSD-97-22, Royal Holloway, University of London.

66
Tax, D. and Laskov, P. (2003). Online SVM learning: from classification to data description and back. In Molina, C. et al. (eds), Proc. NNSP, pp. 499-508.

67
Tax, D.M.J. and Duin, R.P.W. (2001). Uniform object generation for optimizing one-class classifiers. Journal of Machine Learning Research, pp. 155-173.

68
Tikhonov, A.N. and Arsenin, V.Y. (1977). Solutions of Ill-posed Problems. W.H. Winston, Washington, D.C.

69
Tsuda, K., Kawanabe, M., Rätsch, G., Sonnenburg, S. and Müller, K.-R. (2002). A new discriminative kernel from probabilistic models. Neural Computation, 14: 2397-2414.

70
Vapnik, V.N. (1982). Estimation of Dependences Based on Empirical Data. Springer, Berlin.

71
Vapnik, V.N. (1998). Statistical Learning Theory. Wiley, New York.

72
Vapnik, V.N. and Chervonenkis, A.Y. (1974). Theory of Pattern Recognition. Nauka, Moscow (in Russian).

73
Vapnik, V.N. and Chervonenkis, A.Y. (1991). The necessary and sufficient conditions for consistency in the empirical risk minimization method. Pattern Recognition and Image Analysis, 1(3): 283-305.

74
Wahba, G. (1980). Spline bases, regularization, and generalized cross-validation for solving approximation problems with large quantities of noisy data. In Proceedings of the International Conference on Approximation Theory. Academic Press, Austin, Texas.

75
Warmuth, M.K., Liao, J., Rätsch, G., Mathieson, M., Putta, S. and Lemmen, C. (2003). Support vector machines for active learning in the drug discovery process. Journal of Chemical Information and Computer Sciences, 43(2): 667-673.

76
Watkins, C. (2000). Dynamic alignment kernels. In Smola, A.J., Bartlett, P.L., Schölkopf, B. and Schuurmans, D. (eds), Advances in Large Margin Classifiers, pp. 39-50, MIT Press, Cambridge, MA.

77
Weston, J., Gammerman, A., Stitson, M., Vapnik, V.N., Vovk, V. and Watkins, C. (1999). Support vector density estimation. In Schölkopf, B., Burges, C.J.C. and Smola, A.J. (eds), Advances in Kernel Methods - Support Vector Learning, pp. 293-305, MIT Press, Cambridge, MA.

78
Williamson, R.C., Smola, A.J. and Schölkopf, B. (1998). Generalization performance of regularization networks and support vector machines via entropy numbers of compact operators. NeuroCOLT Technical Report NC-TR-98-019, Royal Holloway College, University of London, UK.

79
Zien, A., Rätsch, G., Mika, S., Schölkopf, B., Lengauer, T. and Müller, K.-R. (2000). Engineering support vector machine kernels that recognize translation initiation sites. Bioinformatics, 16(9): 799-807.

80
Zoutendijk, G. (1960). Methods of Feasible Directions. Elsevier.


