[1] Aizerman, M., Braverman, E. and Rozonoer, L. (1964). Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25: 821-837.
[2] Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6): 716-723.
[3] Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68: 337-404.
[4] Barron, A., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probability Theory and Related Fields, 113: 301-415.
[5] Bartlett, P., Bousquet, O. and Mendelson, S. (2002). Localized Rademacher complexities. In Kivinen, J. and Sloan, R.H. (eds), Proceedings COLT, volume 2375 of Lecture Notes in Computer Science, pp. 44-58. Springer, Heidelberg.
[6] Bartlett, P.L., Long, P.M. and Williamson, R.C. (1996). Fat-shattering and the learnability of real-valued functions. Journal of Computer and System Sciences, 52(3): 434-452.
[7] Bartlett, P.L. and Mendelson, S. (2002). Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3: 463-482.
[8] Bennett, K.P. and Mangasarian, O.L. (1992). Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1: 23-34.
[9] Bertsekas, D.P. (1995). Nonlinear Programming. Athena Scientific, Belmont, MA.
[10] Bishop, C.M. (1995). Neural Networks for Pattern Recognition. Oxford University Press.
[11] Blankertz, B., Curio, G. and Müller, K.-R. (2002). Classifying single trial EEG: Towards brain computer interfacing. In Dietterich, T.G., Becker, S. and Ghahramani, Z. (eds), Advances in Neural Information Processing Systems, 14: 157-164.
[12] Boser, B.E., Guyon, I.M. and Vapnik, V.N. (1992). A training algorithm for optimal margin classifiers. In Haussler, D. (ed), Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pp. 144-152.
[13] Bottou, L., Cortes, C., Denker, J.S., Drucker, H., Guyon, I., Jackel, L.D., LeCun, Y.A., Müller, U.A., Säckinger, E., Simard, P.Y. and Vapnik, V.N. (1994). Comparison of classifier methods: a case study in handwritten digit recognition. In Proceedings of the 12th International Conference on Pattern Recognition and Neural Networks, Jerusalem, pp. 77-87, IEEE Computer Society Press.
[14] Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984). Classification and Regression Trees. Wadsworth.
[15] Brown, M.P.S., Grundy, W.N., Lin, D., Cristianini, N., Sugnet, C., Furey, T.S., Ares, M. and Haussler, D. (2000). Knowledge-based analysis of microarray gene expression data using support vector machines. Proceedings of the National Academy of Sciences, 97(1): 262-267.
[16] Cauwenberghs, G. and Poggio, T. (2001). Incremental and decremental support vector machine learning. In Leen, T.K., Dietterich, T.G. and Tresp, V. (eds), Advances in Neural Information Processing Systems, 13: 409-415, MIT Press.
[17] Cortes, C. and Vapnik, V.N. (1995). Support-vector networks. Machine Learning, 20: 273-297.
[18] Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines. Cambridge University Press, Cambridge, UK.
[19] DeCoste, D. and Schölkopf, B. (2002). Training invariant support vector machines. Machine Learning, 46: 161-190. Also: Technical Report JPL-MLTR-00-1, Jet Propulsion Laboratory, Pasadena.
[20] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Number 31 in Applications of Mathematics. Springer, New York.
[21] Donoho, D., Johnstone, I., Kerkyacharian, G. and Picard, D. (1996). Density estimation by wavelet thresholding. Annals of Statistics, 24: 508-539.
[22] Drucker, H., Schapire, R. and Simard, P.Y. (1993). Boosting performance in neural networks. International Journal of Pattern Recognition and Artificial Intelligence, 7: 705-719.
[23] Duda, R.O., Hart, P.E. and Stork, D.G. (2001). Pattern Classification. John Wiley & Sons, second edition.
[24] Freund, Y. and Schapire, R.E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1): 119-139.
[25] Girosi, F. (1998). An equivalence between sparse approximation and support vector machines. Neural Computation, 10: 1455-1480.
[26] Girosi, F., Jones, M. and Poggio, T. (1993). Priors, stabilizers and basis functions: From regularization to radial, tensor and additive splines. Technical Report A.I. Memo No. 1430, Massachusetts Institute of Technology.
[27] Graepel, T., Herbrich, R. and Shawe-Taylor, J. (2000). Generalization error bounds for sparse linear classifiers. In Proceedings COLT, pp. 298-303, Morgan Kaufmann, San Francisco.
[28] Haussler, D. (1999). Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, UC Santa Cruz.
[29] Herbrich, R., Graepel, T. and Campbell, C. (2001). Bayes point machines. Journal of Machine Learning Research, 1: 245-279.
[30] Jaakkola, T.S., Diekhans, M. and Haussler, D. (2000). A discriminative framework for detecting remote protein homologies. Journal of Computational Biology, 7: 95-114.
[31] Joachims, T. (1997). Text categorization with support vector machines: Learning with many relevant features. Technical Report 23, LS VIII, University of Dortmund.
[32] Joachims, T. (1999). Making large-scale SVM learning practical. In Schölkopf, B., Burges, C.J.C. and Smola, A.J. (eds), Advances in Kernel Methods - Support Vector Learning, pp. 169-184, MIT Press, Cambridge, MA.
[33] Kivinen, J., Smola, A.J. and Williamson, R.C. (2001). Online learning with kernels. In Dietterich, T.G., Becker, S. and Ghahramani, Z. (eds), Advances in Neural Information Processing Systems (NIPS 01), pp. 785-792.
[34] Kolmogorov, A.N. (1941). Stationary sequences in Hilbert spaces. Moscow University Mathematics, 2: 1-40.
[35] Laskov, P. (2002). Feasible direction decomposition algorithms for training support vector machines. Machine Learning, 46: 315-349.
[36] LeCun, Y.A., Jackel, L.D., Bottou, L., Brunot, A., Cortes, C., Denker, J.S., Drucker, H., Guyon, I., Müller, U.A., Säckinger, E., Simard, P.Y. and Vapnik, V.N. (1995). Comparison of learning algorithms for handwritten digit recognition. In Fogelman-Soulié, F. and Gallinari, P. (eds), Proceedings ICANN'95 - International Conference on Artificial Neural Networks, II: 53-60, Nanterre, France, EC2.
[37] Lin, C.-J. (2001). On the convergence of the decomposition method for support vector machines. IEEE Transactions on Neural Networks, 12(6): 1288-1298.
[38] Luenberger, D.G. (1973). Introduction to Linear and Nonlinear Programming. Addison-Wesley, Reading, MA.
[39] Mallows, C.L. (1973). Some comments on Cp. Technometrics, 15: 661-675.
[40] Mercer, J. (1909). Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London, A 209: 415-446.
[41] Mika, S. (2002). Kernel Fisher Discriminants. PhD thesis, University of Technology, Berlin, Germany.
[42] Moody, J. and Darken, C. (1989). Fast learning in networks of locally-tuned processing units. Neural Computation, 1(2): 281-294.
[43] Morozov, V.A. (1984). Methods for Solving Incorrectly Posed Problems. Springer.
[44] Müller, K.-R., Mika, S., Rätsch, G., Tsuda, K. and Schölkopf, B. (2001). An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 12(2): 181-201.
[45] Osuna, E., Freund, R. and Girosi, F. (1997a). An improved training algorithm for support vector machines. In Principe, J., Giles, L., Morgan, N. and Wilson, E. (eds), Neural Networks for Signal Processing VII - Proceedings of the 1997 IEEE Workshop, pp. 276-285, New York, IEEE.
[46] Osuna, E., Freund, R. and Girosi, F. (1997b). Training support vector machines: An application to face detection. In Proceedings CVPR'97.
[47] Parzen, E. (1962). On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33: 1065-1076.
[48] Platt, J. (1999). Fast training of support vector machines using sequential minimal optimization. In Schölkopf, B., Burges, C.J.C. and Smola, A.J. (eds), Advances in Kernel Methods - Support Vector Learning, pp. 185-208, MIT Press, Cambridge, MA.
[49] Ralaivola, L. and d'Alché-Buc, F. (2001). Incremental support vector machine learning: A local approach. Lecture Notes in Computer Science, 2130: 322-329. URL citeseer.nj.nec.com/ralaivola01incremental.html.
[50] Rätsch, G. (1998). Ensemble learning methods for classification. Master's thesis, Department of Computer Science, University of Potsdam, Germany.
[51] Rätsch, G. (2001). Robust Boosting via Convex Optimization. PhD thesis, University of Potsdam, Germany.
[52] Rätsch, G., Mika, S., Schölkopf, B. and Müller, K.-R. (2002). Constructing boosting algorithms from SVMs: an application to one-class classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(9): 1184-1199. An earlier version appeared as GMD Technical Report No. 119 (2000).
[53] Rüping, S. (2002). Incremental learning with support vector machines. Technical Report TR-18, Universität Dortmund, SFB475.
[54] Schölkopf, B., Burges, C.J.C. and Vapnik, V.N. (1995). Extracting support data for a given task. In Fayyad, U.M. and Uthurusamy, R. (eds), Proceedings, First International Conference on Knowledge Discovery & Data Mining, AAAI Press, Menlo Park, CA.
[55] Schölkopf, B. (2001). The kernel trick for distances. In Leen, T.K., Dietterich, T.G. and Tresp, V. (eds), Advances in Neural Information Processing Systems 13, MIT Press.
[56] Schölkopf, B., Platt, J., Shawe-Taylor, J., Smola, A.J. and Williamson, R.C. (2001). Estimating the support of a high-dimensional distribution. Neural Computation, 13(7): 1443-1471.
[57] Schölkopf, B., Simard, P.Y., Smola, A.J. and Vapnik, V.N. (1998a). Prior knowledge in support vector kernels. In Jordan, M., Kearns, M. and Solla, S. (eds), Advances in Neural Information Processing Systems, 10: 640-646, MIT Press, Cambridge, MA.
[58] Schölkopf, B., Smola, A., Williamson, R.C. and Bartlett, P.L. (2000). New support vector algorithms. Neural Computation, 12: 1207-1245. Also: NeuroCOLT Technical Report NC-TR-1998-031.
[59] Schölkopf, B. and Smola, A.J. (2002). Learning with Kernels. MIT Press, Cambridge, MA.
[60] Schölkopf, B., Smola, A.J. and Müller, K.-R. (1998b). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10: 1299-1319.
[61] Shawe-Taylor, J., Bartlett, P.L. and Williamson, R.C. (1998). Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44(5): 1926-1940.
[62] Simard, P.Y., LeCun, Y.A., Denker, J.S. and Victorri, B. (1998). Transformation invariance in pattern recognition - tangent distance and tangent propagation. In Orr, G. and Müller, K.-R. (eds), Neural Networks: Tricks of the Trade, LNCS 1524: 239-274. Springer.
[63] Smola, A.J., Schölkopf, B. and Müller, K.-R. (1998). The connection between regularization operators and support vector kernels. Neural Networks, 11: 637-649.
[64] Sonnenburg, S., Rätsch, G., Jagota, A. and Müller, K.-R. (2002). New methods for splice-site recognition. In Dorronsoro, J.R. (ed), Proceedings of the International Conference on Artificial Neural Networks (ICANN'02), pp. 329-336, LNCS 2415, Springer, Berlin.
[65] Stitson, M., Gammerman, A., Vapnik, V.N., Vovk, V., Watkins, C. and Weston, J. (1997). Support vector regression with ANOVA decomposition kernels. Technical Report CSD-97-22, Royal Holloway, University of London.
[66] Tax, D. and Laskov, P. (2003). Online SVM learning: from classification to data description and back. In Molina, C. et al. (eds), Proceedings NNSP, pp. 499-508.
[67] Tax, D.M.J. and Duin, R.P.W. (2001). Uniform object generation for optimizing one-class classifiers. Journal of Machine Learning Research, pp. 155-173.
[68] Tikhonov, A.N. and Arsenin, V.Y. (1977). Solutions of Ill-posed Problems. W.H. Winston, Washington, D.C.
[69] Tsuda, K., Kawanabe, M., Rätsch, G., Sonnenburg, S. and Müller, K.-R. (2002). A new discriminative kernel from probabilistic models. Neural Computation, 14: 2397-2414.
[70] Vapnik, V.N. (1982). Estimation of Dependences Based on Empirical Data. Springer, Berlin.
[71] Vapnik, V.N. (1998). Statistical Learning Theory. Wiley, New York.
[72] Vapnik, V.N. and Chervonenkis, A.Y. (1974). Theory of Pattern Recognition. Nauka, Moscow (in Russian).
[73] Vapnik, V.N. and Chervonenkis, A.Y. (1991). The necessary and sufficient conditions for consistency in the empirical risk minimization method. Pattern Recognition and Image Analysis, 1(3): 283-305.
[74] Wahba, G. (1980). Spline bases, regularization, and generalized cross-validation for solving approximation problems with large quantities of noisy data. In Proceedings of the International Conference on Approximation Theory, Academic Press, Austin, Texas.
[75] Warmuth, M.K., Liao, J., Rätsch, G., Mathieson, M., Putta, S. and Lemmen, C. (2003). Support vector machines for active learning in the drug discovery process. Journal of Chemical Information and Computer Sciences, 43(2): 667-673.
[76] Watkins, C. (2000). Dynamic alignment kernels. In Smola, A.J., Bartlett, P.L., Schölkopf, B. and Schuurmans, D. (eds), Advances in Large Margin Classifiers, pp. 39-50, MIT Press, Cambridge, MA.
[77] Weston, J., Gammerman, A., Stitson, M., Vapnik, V.N., Vovk, V. and Watkins, C. (1999). Support vector density estimation. In Schölkopf, B., Burges, C.J.C. and Smola, A.J. (eds), Advances in Kernel Methods - Support Vector Learning, pp. 293-305, MIT Press, Cambridge, MA.
[78] Williamson, R.C., Smola, A.J. and Schölkopf, B. (1998). Generalization performance of regularization networks and support vector machines via entropy numbers of compact operators. NeuroCOLT Technical Report NC-TR-98-019, Royal Holloway College, University of London, UK.
[79] Zien, A., Rätsch, G., Mika, S., Schölkopf, B., Lengauer, T. and Müller, K.-R. (2000). Engineering support vector machine kernels that recognize translation initiation sites. Bioinformatics, 16(9): 799-807.
[80] Zoutendijk, G. (1960). Methods of Feasible Directions. Elsevier.