Bibliography
Download the BibTeX version of the bibliography (139kB).
Download the PDF version of the bibliography with numbering (250kB).
- H. D. I. Abarbanel.
Analysis of Observed Chaotic Data.
Springer Verlag, New York, 1996.
- F. Aidu and V. Vapnik.
Estimation of probability density on the basis of the method of stochastic
regularization.
Avtomatika i Telemekhanika, 4:84-97, April 1989.
- M. Aizerman,
E. Braverman, and L. Rozonoer.
Theoretical foundations of the potential function method in pattern recognition
learning.
Automation and Remote Control, 25:821-837, 1964.
- H. Akaike.
A new look at the statistical model identification.
IEEE Trans. Automat. Control, 19(6):716-723, 1974.
- A. Albert.
Regression and the Moore-Penrose pseudoinverse.
Academic Press, New York, NY, 1972.
- K. S. Alexander.
Probability inequalities for empirical processes and a law of the iterated
logarithm.
Annals of Probability, 12:1041-1067, 1984.
- N. Alon, S. Ben-David,
N. Cesa-Bianchi, and D. Haussler.
Scale-sensitive Dimensions, Uniform Convergence, and Learnability.
Journal of the ACM, 44(4):615-631, 1997.
- E. Amaldi and V. Kann.
The complexity and approximability of finding maximum feasible subsystems of
linear relations.
Theoretical Computer Science, 147:181-210, 1995.
- E. Amaldi and V. Kann.
On the approximability of minimizing nonzero variables or unsatisfied relations
in linear systems.
Theoretical Computer Science, 1998.
To appear.
- S. Amari,
N. Murata, K.-R. Müller, M. Finke, and H. Yang.
Asymptotic statistical theory of overtraining and cross-validation.
IEEE Trans. on Neural Networks, 8(5):985-996, 1997.
- J. K. Anlauf and
M. Biehl.
The adatron: an adaptive perceptron algorithm.
Europhys. Letters, 10:687-692, 1989.
- M. Anthony and
P. Bartlett.
Neural Network Learning: Theoretical Foundations.
Cambridge University Press, 1999.
To appear.
- M. Anthony and
N. Biggs.
Computational Learning Theory, volume 30 of Cambridge Tracts
in Theoretical Computer Science.
Cambridge University Press, 1992.
- M. Anthony and
J. Shawe-Taylor.
A result of Vapnik with applications.
Discrete Applied Mathematics, 47:207-217, 1993.
- M. Anthony.
Probabilistic analysis of learning in artificial neural networks: The PAC
model and its variants.
Neural Computing Surveys, 1:1-47, 1997.
http://www.icsi.berkeley.edu/~jagota/NCS.
- I.A. Antonov and V.M.
Saleev.
missing.
USSR Computational Mathematics and Mathematical Physics, pages
86-112, 1979.
- András Antos, Luc
Devroye, and L. Györfi.
Lower bounds for Bayes error estimation.
IEEE Transactions on Pattern Analysis and Machine Intelligence,
21:643-645, 1999.
- N. Aronszajn.
Theory of reproducing kernels.
Transactions of the American Mathematical Society, 68:337-404,
1950.
- R. Ash.
Information Theory.
Interscience Publishers, New York, 1965.
- H. Baird.
Document image defect models.
In Proceedings, IAPR Workshop on Syntactic and Structural Pattern
Recognition, pages 38-46, Murray Hill, NJ, 1990.
- A. Barron and T. Cover.
Minimum complexity density estimation.
IEEE Transactions on Information Theory, 37(4), 1991.
- A. Barron.
Predicted squared error: a criterion for automatic model selection.
In S. Farlow, editor, Self-organizing Methods in Modeling. Marcel
Dekker, New York, 1984.
- A. R. Barron.
Universal approximation bounds for superpositions of a sigmoidal function.
IEEE Transactions on Information Theory, 39(3):930-945, May 1993.
- P. Bartlett and
J. Shawe-Taylor.
Generalization performance of support vector machines and other pattern
classifiers.
In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances
in Kernel Methods --- Support Vector Learning, pages 43-54,
Cambridge, MA, 1999. MIT Press.
- P. L. Bartlett
and R. C. Williamson.
The VC dimension and pseudodimension of two-layer neural networks with
discrete inputs.
Neural Computation, 8(3):625-628, 1996.
- P. L. Bartlett,
P. Long, and R. C. Williamson.
Fat-Shattering and the Learnability of Real-Valued Functions.
Journal of Computer and System Sciences, 52(3):434-452,
1996.
- P. L. Bartlett, S. R.
Kulkarni, and S. E. Posner.
Covering numbers for real-valued function classes.
IEEE Transactions on Information Theory, 43(5):1721-1724, 1997.
- P. L. Bartlett.
For valid generalization the size of the weights is more important than the
size of the network.
In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural
Information Processing Systems, volume 9, page 134, Cambridge, MA,
1997. MIT Press.
- P. L. Bartlett.
The sample complexity of pattern classification with neural networks: the size
of the weights is more important than the size of the network.
IEEE Transactions on Information Theory, 44(2):525-536, 1998.
- E. B. Baum and
D. Haussler.
What size net gives valid generalization?
Neural Computation, 1:151-160, 1989.
- Y. Bengio, Y. LeCun, and
D. Henderson.
Globally trained handwritten word recognizer using spatial representation,
convolutional neural networks and hidden Markov models.
In J. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural
Information Processing Systems, volume 5, pages 937-944, 1994.
- K. P. Bennett and J. A.
Blue.
A support vector machine approach to decision trees.
In Proceedings of IJCNN'98, pages 2396-2401, Anchorage, Alaska,
1998.
- K. P. Bennett
and E. J. Bredensteiner.
A parametric optimization method for machine learning.
INFORMS Journal on Computing, 9(3):311-318, 1997.
- K. P. Bennett
and E. J. Bredensteiner.
Geometry in learning.
In C. Gorini, E. Hart, W. Meyer, and T. Phillips, editors, Geometry at
Work, Washington, D.C., 1998. Mathematical Association of America.
Available from http://www.math.rpi.edu/~bennek/geometry2.ps.
- K. P. Bennett and
A. Demiriz.
Semi-supervised support vector machines.
Unpublished manuscript based on talk given at Machines That Learn Conference,
Snowbird, 1998.
- K. P. Bennett and
O. L. Mangasarian.
Robust linear programming discrimination of two linearly inseparable sets.
Optimization Methods and Software, 1:23-34, 1992.
- K. P. Bennett and
O. L. Mangasarian.
Multicategory separation via linear programming.
Optimization Methods and Software, 3:27-39, 1993.
- K. P. Bennett and
O. L. Mangasarian.
Serial and parallel multicategory discrimination.
SIAM Journal on Optimization, 4(4):722-734, 1994.
- K. P. Bennett, D. Hui,
and L. Auslender.
On support vector decision trees for database marketing.
Department of Mathematical Sciences Math Report No. 98-100, Rensselaer
Polytechnic Institute, Troy, NY 12180, March 1998.
http://www.math.rpi.edu/~bennek/.
- K. P. Bennett, D. H. Wu,
and L. Auslender.
On support vector decision trees for database marketing.
R.P.I. Math Report No. 98-100, Rensselaer Polytechnic Institute,
Troy, NY, 1998.
- K. P. Bennett.
Decision tree construction via linear programming.
In M. Evans, editor, Proceedings of the 4th Midwest Artificial
Intelligence and Cognitive Science Society Conference, pages 97-101,
Utica, Illinois, 1992.
- L. Bernhardt.
Zur Klassifizierung vieler Musterklassen mit wenigen Merkmalen.
In H. Kazmierczak, editor, 5. DAGM Symposium: Mustererkennung
1983, pages 255-260, Berlin, 1983. VDE-Verlag.
- D. P. Bertsekas.
Nonlinear Programming.
Athena Scientific, Belmont, MA, 1995.
- R. Bhatia.
Matrix Analysis.
Springer Verlag, New York, 1997.
- M. Bierlaire, Ph.
Toint, and D. Tuyttens.
On iterative algorithms for linear least squares problems with bound
constraints.
Linear Algebra Appl., pages 111-143, 1991.
- L. Birgé.
On estimating a density using Hellinger distance and some other strange facts.
Probability Theory and Related Fields, 71:271-291, 1986.
- C. M. Bishop.
Neural Networks for Pattern Recognition.
Clarendon Press, Oxford, 1995.
- C. M. Bishop.
Training with noise is equivalent to Tikhonov regularization.
Neural Computation, 7:108-116, 1995.
- V. Blanz,
B. Schölkopf, H. Bülthoff, C. Burges, V. Vapnik, and T. Vetter.
Comparison of view-based object recognition algorithms using realistic 3D
models.
In C. von der Malsburg, W. von Seelen, J. C. Vorbrüggen, and B. Sendhoff,
editors, Artificial Neural Networks --- ICANN'96, pages
251-256, Berlin, 1996. Springer Lecture Notes in Computer Science,
Vol. 1112.
- J. A. Blue.
A Hybrid of Tabu Search and Local Descent Algorithms with Applications in
Artificial Intelligence.
PhD thesis, Rensselaer Polytechnic Institute, 1998.
- A. Blumer,
A. Ehrenfeucht, D. Haussler, and M. K. Warmuth.
Learnability and the Vapnik-Chervonenkis dimension.
Journal of the ACM, 36(4):929-965, 1989.
- S. Bochner.
Vorlesungen über Fouriersche Integrale.
Akademische Verlagsgesellschaft, Leipzig, 1932.
- W. M. Boothby.
An introduction to differentiable manifolds and Riemannian
geometry.
Academic Press, 2nd edition, 1986.
- B. E. Boser, I. M. Guyon,
and V. N. Vapnik.
A training algorithm for optimal margin classifiers.
In D. Haussler, editor, Proceedings of the 5th Annual ACM Workshop on
Computational Learning Theory, pages 144-152, Pittsburgh, PA, July
1992. ACM Press.
- L. Bottou and V. N.
Vapnik.
Local learning algorithms.
Neural Computation, 4(6):888-900, 1992.
- L. Bottou, C. Cortes, J. S.
Denker, H. Drucker, I. Guyon, L. D. Jackel, Y. LeCun, U. A. Müller,
E. Säckinger, P. Simard, and V. Vapnik.
Comparison of classifier methods: a case study in handwritten digit
recognition.
In Proceedings of the 12th International Conference on Pattern
Recognition and Neural Networks, Jerusalem, pages 77-87. IEEE
Computer Society Press, 1994.
- H. Bourlard and
N. Morgan.
A continuous speech recognition system embedding MLP into HMM.
In D. S. Touretzky, editor, Advances in Neural Information Processing
Systems, volume 2, pages 186-193. Morgan Kaufmann, San Mateo, CA,
1990.
- P. S. Bradley
and O. L. Mangasarian.
Feature selection via concave minimization and support vector machines.
In J. Shavlik, editor, Machine Learning Proceedings of the Fifteenth
International Conference (ICML '98), pages 82-90, San Francisco, CA,
1998. Morgan Kaufmann.
ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-03.ps.Z.
- P. S. Bradley
and O. L. Mangasarian.
Massive data discrimination via linear support vector machines.
Technical Report Mathematical Programming Technical Report 98-05, University of
Wisconsin-Madison, 1998.
Submitted for publication.
- P. S. Bradley, O. L.
Mangasarian, and W. N. Street.
Feature selection via mathematical programming.
Technical Report 95-21, Computer Sciences Department, University of Wisconsin,
Madison, Wisconsin, 1995.
To appear in INFORMS Journal on Computing 10, 1998.
- P. S. Bradley, U. M.
Fayyad, and O. L. Mangasarian.
Data mining: Overview and optimization opportunities.
Technical Report Mathematical Programming Technical Report 98-01, University of
Wisconsin-Madison, 1998.
Submitted for publication.
- E. J.
Bredensteiner and K. P. Bennett.
Feature minimization within decision trees.
Computational Optimization and Applications, 10:110-126, 1997.
- E. J.
Bredensteiner and K. P. Bennett.
Multicategory classification by support vector machines.
Computational Optimization and Applications, 1998.
To appear.
- E. J. Bredensteiner.
Optimization Methods in Data Mining and Machine Learning.
PhD thesis, Department of Mathematical Sciences, Rensselaer Polytechnic
Institute, Troy, NY, 1997.
- C. Bregler and
M. Omohundro.
Surface learning with applications to lipreading.
In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural
Information Processing Systems 6, San Mateo, CA, 1994. Morgan Kaufmann
Publishers.
- L. M. Bregman.
The relaxation method of finding the common point of convex sets and its
application to the solution of problems in convex programming.
USSR Computational Mathematics and Mathematical Physics,
7:200-217, 1967.
- L. Breiman, J. Friedman,
R. Olshen, and C. Stone.
Classification and Regression Trees.
Wadsworth International, California, 1984.
- L. Breiman.
Bagging predictors.
Technical Report 421, Department of Statistics, UC Berkeley, 1994.
ftp://ftp.stat.berkeley.edu/pub/tech-reports/421.ps.Z.
- L. Breiman.
Bagging predictors.
Machine Learning, 26(2):123-140, 1996.
- J. Bromley and
E. Säckinger.
Neural-network and k-nearest-neighbor classifiers.
Technical Report 11359-910819-16TM, AT&T, 1991.
- R. Brown, P. Bryant, and
H. D. I. Abarbanel.
Computing the lyapunov spectrum of a dynamical system from observed
time-series.
Phys. Rev. Lett., 43(6):2787-2806, 1991.
- J. M. Buhmann.
Data clustering and learning.
In M. A. Arbib, editor, The Handbook of Brain Theory and Neural
Networks, pages 278-281. MIT Press, 1995.
- J. R. Bunch and
L. Kaufman.
Some stable methods for calculating inertia and solving symmetric linear
systems.
Mathematics of Computation, 31:163-179, 1977.
- C. J. C. Burges
and B. Schölkopf.
Improving the accuracy and speed of support vector learning machines.
In M. Mozer, M. Jordan, and T. Petsche, editors, Advances in Neural
Information Processing Systems 9, pages 375-381, Cambridge, MA, 1997.
MIT Press.
- C. J. C. Burges and
V. Vapnik.
A new method for constructing artificial neural networks: Interim technical
report, ONR contract N00014-94-C-0186.
Technical report, AT&T Bell Laboratories, 1995.
- C. J. C. Burges.
Simplified support vector decision rules.
In L. Saitta, editor, Proceedings, 13th Intl. Conf. on Machine
Learning, pages 71-77, San Mateo, CA, 1996. Morgan Kaufmann.
- C. J. C. Burges.
A tutorial on support vector machines for pattern recognition.
Data Mining and Knowledge Discovery, 2(2):121-167, 1998.
- C.J.C. Burges.
Geometry and invariance in kernel based methods.
In B. Schölkopf, C.J.C. Burges, and A.J. Smola, editors, Advances in
Kernel Methods: Support Vector Learning. MIT Press, 1999.
- C. Saunders, A. Gammerman,
and V. Vovk.
Ridge regression learning algorithm in dual variables.
In J. Shavlik, editor, Machine Learning Proceedings of the Fifteenth
International Conference (ICML '98), San Francisco, CA, 1998. Morgan
Kaufmann.
- S. Canu.
Régularisation et l'apprentissage.
Work in progress; available from http://www.hds.utc.fr/WEB/scanu/regul.ps,
October 1996.
- B. Carl and
I. Stephani.
Entropy, compactness, and the approximation of operators.
Cambridge University Press, Cambridge, UK, 1990.
- B. Carl.
Inequalities of Bernstein-Jackson-type and the degree of compactness of
operators in Banach spaces.
Annales de l'Institut Fourier, 35(3):79-118, 1985.
- Y. Censor and A. Lent.
An iterative row-action method for interval convex programming.
J. Optimization Theory and Applications, 34(3):321-353, 1981.
- Y. Censor.
Row-action methods for huge and sparse systems and their applications.
SIAM Review, 23(4):444-467, 1981.
- E. I. Chang and R. L.
Lippmann.
A boundary hunting radial basis function classifier which allocates centers
constructively.
In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in
Neural Information Processing Systems 5, San Mateo, CA, 1993. Morgan
Kaufmann.
- S. Chen, D. Donoho, and
M. Saunders.
Atomic decomposition by basis pursuit.
Technical Report 479, Department of Statistics, Stanford University, May 1995.
- S. Chen.
Basis Pursuit.
PhD thesis, Department of Statistics, Stanford University, November 1995.
- V. Cherkassky and
F. Mulier.
Learning from Data --- Concepts, Theory and Methods.
John Wiley & Sons, New York, 1998.
- H. Chernoff.
A measure of asymptotic efficiency of tests of a hypothesis based on the sum of
observations.
Annals of Mathematical Statistics, 23:493-507, 1952.
- C. R. Chester.
Techniques in Partial Differential Equations.
McGraw Hill, 1971.
- E. T. Copson.
Metric Spaces.
Cambridge University Press, 1968.
- C. Cortes and
V. Vapnik.
Support-vector networks.
Machine Learning, 20:273-297, 1995.
- C. Cortes, L. D. Jackel,
S. A. Solla, V. Vapnik, and J. S. Denker.
Learning curves: Asymptotic values and rate of convergence.
In Jack D. Cowan, Gerald Tesauro, and Joshua Alspector, editors, Advances
in Neural Information Processing Systems, volume 6, pages 327-334.
Morgan Kaufmann Publishers, Inc., 1994.
- R. Courant and
D. Hilbert.
Methods of Mathematical Physics, volume 1.
Interscience Publishers, Inc, New York, 1953.
- T. M. Cover and P. E.
Hart.
Nearest neighbor pattern classification.
IEEE Transactions on Information Theory, 13(1):21-27, 1967.
- T. M. Cover.
Geometrical and statistical properties of systems of linear inequalities with
applications in pattern recognition.
IEEE Trans. Elect. Comp., 14:326-334, 1965.
- D. Cox and
F. O'Sullivan.
Asymptotic analysis of penalized likelihood and related estimators.
Ann. Statist., 18:1676-1695, 1990.
- CPLEX Optimization Incorporated,
Incline Village, Nevada.
Using the CPLEX Callable Library, 1994.
- P. Craven and G. Wahba.
Smoothing noisy data with spline functions: estimating the correct degree of
smoothing by the method of generalized cross-validation.
Numer. Math., 31:377-403, 1979.
- N. Cristianini,
J. Shawe-Taylor, and P. Sykacek.
Bayesian classifiers are large margin hyperplanes in a Hilbert space.
In J. Shavlik, editor, Machine Learning: Proceedings of the Fifteenth
International Conference, San Francisco, CA, 1998. Morgan Kaufmann.
- S. B. Davis and
P. Mermelstein.
Comparison of parametric representations for monosyllabic word recognition in
continuously spoken sentences.
IEEE Transactions on Acoustics, Speech, and Signal Processing,
ASSP-28(4):357-366, 1980.
- A.P. Dempster, N.M.
Laird, and D.B. Rubin.
Maximum Likelihood from Incomplete Data via the EM Algorithm.
Journal of the Royal Statistical Society B, 39(1):1-22, 1977.
- Luc Devroye and
Gábor Lugosi.
Lower bounds in pattern recognition and learning.
Pattern Recognition, 28, 1995.
- L. Devroye,
L. Györfi, and G. Lugosi.
A Probabilistic Theory of Pattern Recognition.
Number 31 in Applications of Mathematics. Springer, New York, 1996.
- K. I. Diamantaras
and S. Y. Kung.
Principal Component Neural Networks.
Wiley, New York, 1996.
- R. Dietrich, M. Opper,
and H. Sompolinsky.
Statistical mechanics of support vector networks.
Physical Review Letters, 82(14):2975-2978, 1999.
- T. G. Dietterich.
Approximate statistical tests for comparing supervised classification learning
algorithms.
Neural Computation, 10:1895-1924, 1998.
- S. P. Dirkse and M. C.
Ferris.
The PATH solver: A non-monotone stabilization scheme for mixed
complementarity problems.
Optimization Methods and Software, 5:123-156, 1995.
ftp://ftp.cs.wisc.edu/tech-reports/reports/93/tr1179.ps.
- K. Dodson and
T. Poston.
Tensor Geometry.
Springer-Verlag, 2nd edition, 1991.
- H. Drucker,
R. Schapire, and P. Simard.
Boosting performance in neural networks.
International Journal of Pattern Recognition and Artificial
Intelligence, 7:705-719, 1993.
- H. Drucker,
C. J. C. Burges, L. Kaufman, A. Smola, and V. Vapnik.
Support vector regression machines.
In M. Mozer, M. Jordan, and T. Petsche, editors, Advances in Neural
Information Processing Systems 9, Cambridge, MA, 1997. MIT Press.
- H. Drucker.
Improving regressors using boosting techniques.
In Proc. 14th International Conference on Machine Learning, pages
107-115. Morgan Kaufmann, 1997.
- R. O. Duda and P. E. Hart.
Pattern Classification and Scene Analysis.
John Wiley & Sons, 1973.
- R. M. Dudley.
Real analysis and probability.
Mathematics Series. Wadsworth and Brooks/Cole, Pacific Grove, CA, 1989.
- S. Dumais.
Using SVMs for text categorization.
IEEE Intelligent Systems, 13(4), 1998.
In: M.A. Hearst, B. Schölkopf, S. Dumais, E. Osuna, and J. Platt: Trends
and Controversies --- Support Vector Machines.
- N. Dunford and J. T.
Schwartz.
Linear Operators Part II: Spectral Theory, Self Adjoint Operators in
Hilbert Space.
Number VII in Pure and Applied Mathematics. John Wiley & Sons, New York, 1963.
- N. Dunkin,
J. Shawe-Taylor, and P. Koiran.
A new incremental learning technique.
In Neural Nets, WIRN Vietri-96, Proceedings of the 8th Italian
Workshop on Neural Nets, 1997.
- N. Dyn.
Interpolation of scattered data by radial functions.
In C. K. Chui, L. L. Schumaker, and F. I. Utreras, editors, Topics in
multivariate approximation. Academic Press, New York, 1987.
- J. P. Eckmann and
D. Ruelle.
Ergodic theory of chaos and strange attractors.
Rev. Modern Phys., 57(3):617-656, 1985.
- K. Efetov.
Supersymmetry in Disorder and Chaos.
Cambridge University Press, Cambridge, 1997.
- A. Ehrenfeucht,
D. Haussler, M. Kearns, and L. Valiant.
A general lower bound on the number of examples needed for learning.
Information and Computation, 82:247-261, 1989.
- L. Elden and
L. Wittmeyer-Koch.
Numerical Analysis: An Introduction.
Academic Press, Cambridge, 1990.
- M. Ferraro and T. M.
Caelli.
Lie transformation groups, integral transforms, and invariant pattern
recognition.
Spatial Vision, 8:33-44, 1994.
- M. C. Ferris and T. S.
Munson.
Interfaces to PATH 3.0: Design, implementation and usage.
Computational Optimization and Applications, 12:207-227, 1999.
ftp://ftp.cs.wisc.edu/math-prog/tech-reports/97-12.ps.
- V. Fontaine, H. Leich,
and J. Hennebert.
Influence of vector quantization on isolated word recognition.
In M. J. J. Holt, C. F. N. Cowan, P. M. Grant, and W. A. Sandham, editors,
Signal Processing VII, Theories and Applications. Proceedings of
EUSIPCO-94. Seventh European Signal Processing Conference, volume 1,
pages 115-118, Lausanne, Switzerland, 1994. Eur. Assoc. Signal Process.
- National Institute for Standards and Technology.
Speaker recognition workshop.
Technical report, Maritime Institute of Technology, March 1996.
- R. Fourer, D. Gay, and
B. Kernighan.
AMPL: A Modeling Language for Mathematical Programming.
Boyd and Frazer, Danvers, Massachusetts, 1993.
- M. Frank and P. Wolfe.
An algorithm for quadratic programming.
Naval Research Logistics Quarterly, 3:95-110, 1956.
- Y. Freund and R. E.
Schapire.
A decision-theoretic generalization of on-line learning and an application to
boosting.
In Computational Learning Theory: Eurocolt '95, pages 23-37.
Springer-Verlag, 1995.
- Y. Freund and R. E.
Schapire.
Experiments with a new boosting algorithm.
In Proc. 13th International Conference on Machine Learning, pages
148-156. Morgan Kaufmann, 1996.
- Y. Freund and R.E.
Schapire.
Large margin classification using the perceptron algorithm.
In J. Shavlik, editor, Machine Learning: Proceedings of the Fifteenth
International Conference, San Francisco, CA, 1998. Morgan Kaufmann.
- J. H. Friedman and
W. Stuetzle.
Projection pursuit regression.
J. American Statistical Association, 76(376):817-823, December
1981.
- J. Friedman,
T. Hastie, and R. Tibshirani.
Additive logistic regression: a statistical view of boosting.
Technical report, Stanford University, 1998.
- J. H. Friedman.
Multivariate adaptive regression splines.
Annals of Statistics, 19(1):1-141, 1991.
- J. H. Friedman.
Another approach to polychotomous classification.
Technical report, Department of Statistics and Stanford Linear Accelerator
Center, Stanford University, 1996.
- J. Friedman.
Greedy function approximation: a gradient boosting machine.
Technical report, Stanford University, 1999.
- T.-T. Frieß,
N. Cristianini, and C. Campbell.
The kernel adatron algorithm: A fast and simple learning procedure for support
vector machines.
In 15th Intl. Conf. Machine Learning. Morgan Kaufmann Publishers,
1998.
- T. Frieß.
Personal communication, 1998.
- E. Gardner.
The space of interactions in neural networks.
Journal of Physics A, 21:257-270, 1988.
- P. E. Gill, W. Murray, and
M. H. Wright.
Practical Optimization.
Academic Press, 1981.
- P. E. Gill, W. Murray, and
M. A. Saunders.
SNOPT: An SQP algorithm for large-scale constrained optimization.
Technical Report NA-97-2, Dept. of Mathematics, U.C. San Diego, 1997.
- D. Girard.
Asymptotic optimality of the fast randomized versions of GCV and
CL in ridge regression and regularization.
Ann. Statist., 19:1950-1963, 1991.
- D. Girard.
Asymptotic comparison of (partial) cross-validation, GCV and randomized GCV
in nonparametric regression.
Ann. Statist., 26:315-334, 1998.
- F. Girosi and
T. Poggio.
Representation properties of networks: Kolmogorov's theorem is irrelevant.
Neural Computation, 1(4):465-469, 1989.
- F. Girosi, M. Jones, and
T. Poggio.
Priors, stabilizers and basis functions: From regularization to radial, tensor
and additive splines.
A.I. Memo No. 1430, MIT, 1993.
- F. Girosi, M. Jones, and
T. Poggio.
Regularization theory and neural networks architectures.
Neural Computation, 7(2):219-269, 1995.
- F. Girosi.
An equivalence between sparse approximation and support vector machines.
Neural Computation, 10(6):1455-1480, 1998.
- H. Gish and M. Schmidt.
Text-independent speaker identification.
IEEE Signal Processing Magazine, pages 18-32, 1994.
- F. Glover.
Improved linear programming models for discriminant analysis.
Decision Sciences, 21:771-785, 1990.
- W. Gochet, A. Stam,
V. Srinivasan, and S. Chen.
Multigroup discriminant analysis using linear programming.
Operations Research, 45(2):213-559, 1997.
- H. Goldstein.
Classical Mechanics.
Addison-Wesley, Reading, MA, 1986.
- M. Golea, P. L.
Bartlett, W. S. Lee, and L. Mason.
Generalization in decision trees and DNF: Does size matter?
In Advances in Neural Information Processing Systems 10, 1998.
- G. Golub and U. von
Matt.
Generalized cross-validation for large-scale problems.
J. Comput. Graph. Statist., 6:1-34, 1997.
- J. Gong, G. Wahba,
D. Johnson, and J. Tribbia.
Adaptive tuning of numerical weather prediction models: simultaneous estimation
of weighting, smoothing and physical parameters.
Monthly Weather Review, 125:210-231, 1998.
- Y. Gordon,
H. König, and C. Schütt.
Geometric and probabilistic estimates for entropy and approximation numbers of
operators.
Journal of Approximation Theory, 49:219-239, 1987.
- R. P. Gorman and
T. J. Sejnowski.
Analysis of hidden units in a layered network trained to classify sonar
targets.
Neural Networks, 1, 1988.
- T. Graepel and
K. Obermayer.
Fuzzy topographic kernel clustering.
In W. Brauer, editor, Proceedings of the 5th GI Workshop Fuzzy Neuro
Systems '98, pages 90-97, 1998.
- R. E. Greene.
Isometric Embeddings of Riemannian and Pseudo-Riemannian
Manifolds.
American Mathematical Society, 1970.
- C. Gu and G. Wahba.
Semiparametric analysis of variance with tensor product thin plate splines.
J. Royal Statistical Soc. Ser. B, 55:353-368, 1993.
- L. Gurvits.
A note on a scale-sensitive dimension of linear bounded functionals in Banach
spaces.
In Proceedings of Algorithmic Learning Theory, ALT-97, pages
352-363. Springer Verlag, 1997.
- I. Guyon, V. Vapnik,
B. Boser, L. Bottou, and S. A. Solla.
Structural risk minimization for character recognition.
In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in
Neural Information Processing Systems 4. Proceedings of the 1991
Conference, pages 471-479, San Mateo, CA, 1992. Morgan Kaufmann.
- I. Guyon, B. Boser, and
V. Vapnik.
Automatic capacity tuning of very large VC-dimension classifiers.
In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in
Neural Information Processing Systems, volume 5, pages 147-155.
Morgan Kaufmann, San Mateo, CA, 1993.
- I. Guyon, N. Matic,
and V. Vapnik.
Discovering informative patterns and data cleaning.
In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors,
Advances in Knowledge Discovery and Data Mining, pages
181-203. MIT Press, Cambridge, MA, 1996.
- I. Guyon, J. Makhoul,
R. Schwartz, and V. Vapnik.
What size test set gives good error rate estimates?
IEEE Pattern Analysis and Machine Intelligence, 20(1):52-64,
1998.
- M. Hamermesh.
Group theory and its applications to physical problems.
Addison Wesley, Reading, MA, 2nd edition, 1962.
Reprint by Dover, New York, NY.
- F.R. Hampel, E.M. Ronchetti, P.J. Rousseeuw, and W.A. Stahel.
Robust Statistics.
Wiley, New York, NY, 1986.
- J. B. Hampshire and
A. Waibel.
A novel objective function for improved phoneme recognition using time-delay
neural networks.
IEEE Trans. Neural Networks, 1:216-228, 1990.
- W. Härdle.
Applied Nonparametric Regression.
Cambridge University Press, Cambridge, 1990.
- D. Harrison and
D. L. Rubinfeld.
Hedonic prices and the demand for clean air.
J. Environ. Economics & Management, 5:81-102, 1978.
Original source of the Boston Housing data; the data set is available from
ftp://ftp.ics.uci.edu/pub/machine-learning-databases/housing.
- T. Hastie and
W. Stuetzle.
Principal curves.
JASA, 84:502-516, 1989.
- T. J. Hastie and
R. J. Tibshirani.
Generalized Additive Models, volume 43 of Monographs on
Statistics and Applied Probability.
Chapman & Hall, London, 1990.
- T. Hastie,
R. Tibshirani, and A. Buja.
Flexible discriminant analysis.
JASA, 89:1255-1270, 1994.
- S. Haykin,
S. Puthusserypady, and P. Yee.
Reconstruction of underlying dynamics of an observed chaotic process.
Technical Report 353, Comm. Res. Lab., McMaster University, 1997.
- S. Haykin.
Neural Networks : A Comprehensive Foundation.
Macmillan, New York, 1994.
- C. Hildreth.
A quadratic programming procedure.
Naval Research Logistics Quarterly, 4:79-85, 1957.
- T. K. Ho and E. Kleinberg.
Building projectable classifiers of arbitrary complexity.
In Proceedings of the 13th International Conference on Pattern
Recognition, Vienna, pages 880-885, 1996.
- W. Hoeffding.
Probability inequalities for sums of bounded random variables.
Journal of the American Statistical Association, 58:13-30, 1963.
- A. E. Hoerl and R. W.
Kennard.
Ridge regression: Biased estimation for nonorthogonal problems.
Technometrics, 12(1):55-67, 1970.
- T. Hofmann and J. M.
Buhmann.
Pairwise data clustering by deterministic annealing.
IEEE Transactions on Pattern Analysis and Machine Intelligence,
19(1):1-25, 1997.
- R. A. Horn and C. R. Johnson.
Matrix Analysis.
Cambridge University Press, Cambridge, 1985.
- H. Hotelling.
Analysis of a complex of statistical variables into principal components.
Journal of Educational Psychology, 24:417-441 and 498-520, 1933.
- P. J. Huber.
Robust statistics: a review.
Ann. Statist., 43:1041, 1972.
- P. J. Huber.
Robust Statistics.
John Wiley and Sons, New York, 1981.
- A. M. Hughes.
The Complete Database Marketer.
Irwin Prof. Publishing, Chicago, 1996.
- M. Hutchinson.
A stochastic estimator for the trace of the influence matrix for Laplacian
smoothing splines.
Commun. Statist.-Simula., 18:1059-1076, 1989.
- IBM Corporation.
IBM optimization subroutine library guide and reference.
IBM Systems Journal, 31, 1992.
SC23-0519.
- T. S. Jaakkola and
D. Haussler.
Probabilistic kernel regression models.
In Proceedings of the 1999 Conference on AI and Statistics, 1999.
- T. Joachims.
Text categorization with support vector machines: Learning with many relevant
features.
Technical Report 23, LS VIII, University of Dortmund, 1997.
- T. Joachims.
Text categorization with support vector machines.
In European Conference on Machine Learning (ECML), 1998.
- T. Joachims.
Making large-scale SVM learning practical.
In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances
in Kernel Methods --- Support Vector Learning, pages 169-184,
Cambridge, MA, 1999. MIT Press.
- I. T. Jolliffe.
Principal Component Analysis.
Springer-Verlag, New York, 1986.
- B.-H. Juang, W. Chou, and
C.-H. Lee.
Minimum Classification Error Rate Methods for Speech Recognition.
IEEE Transactions on Speech and Audio Processing, 5(3):257-265,
1997.
- J. Karhunen and
J. Joutsensalo.
Generalizations of principal component analysis, optimization problems, and
neural networks.
Neural Networks, 8(4):549-562, 1995.
- K. Karhunen.
Zur Spektraltheorie stochastischer Prozesse.
Ann. Acad. Sci. Fenn., 34, 1946.
- Marek Karpinski
and Angus Macintyre.
Polynomial bounds for VC dimension of sigmoidal and general Pfaffian neural
networks.
Journal of Computer and System Sciences, 54(1):169-176, February
1997.
- W. Karush.
Minima of functions of several variables with inequalities as side constraints.
Master's thesis, Dept. of Mathematics, Univ. of Chicago, 1939.
- Y. Katznelson.
An introduction to harmonic analysis.
John Wiley and Sons, New York, 1968.
- L. Kaufman.
Solving the quadratic programming problem arising in support vector
classification.
Technical report, January 1997.
- L. Kaufman.
Solving the quadratic programming problem arising in support vector
classification.
In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances
in Kernel Methods --- Support Vector Learning, pages 147-168,
Cambridge, MA, 1999. MIT Press.
- M. J. Kearns, R. E.
Schapire, and L. M. Sellie.
Toward efficient agnostic learning.
Machine Learning, 17(2):115-141, 1994.
- M. Kearns.
A bound on the error of cross validation using the approximation and estimation
rates, with consequences for the training-test split.
Neural Computation, 9(5):1143-1161, 1997.
- M. Kennel, R. Brown, and
H. D. I. Abarbanel.
Determining embedding dimension for phase-space reconstruction using a
geometrical construction.
Phys. Rev. A., 45:3403-3411, 1992.
- L.G. Khachiyan and
M.J. Todd.
On the complexity of approximating the maximal inscribed ellipsoid for a
polytope.
Mathematical Programming, 61:137-159, 1993.
- G. Kimeldorf and
G. Wahba.
A correspondence between Bayesian estimation of stochastic processes and
smoothing by splines.
Ann. Math. Statist., 41:495-502, 1970.
- G. Kimeldorf and
G. Wahba.
Some results on Tchebycheffian spline functions.
J. Math. Anal. Applic., 33:82-95, 1971.
- M. Kirby and
L. Sirovich.
Application of the Karhunen-Loève procedure for the characterization of
human faces.
IEEE Transactions on Pattern Analysis and Machine Intelligence,
12(1):103-108, January 1990.
- N. Klasner and H. U.
Simon.
From noise-free to noise-tolerant and from on-line to batch learning.
In Proceedings of the Eighth Annual Conference on Computational Learning
Theory, pages 250-257, Santa Cruz, CA, July 1995. ACM Press.
- J. Kohlmorgen, K.-R.
Müller, and K. Pawelzik.
Analysis of drifting dynamics with neural network hidden Markov models.
In M. Jordan, M. Kearns, and S. Solla, editors, Advances in Neural
Information Processing Systems 10, Cambridge, MA, 1998. MIT Press.
- T. Kohonen.
Self-organized formation of topologically correct feature maps.
Biological Cybernetics, 43:59 -- 69, 1982.
- P. Koiran and E. D.
Sontag.
Neural networks with quadratic VC dimension.
Journal of Computer and System Sciences, 54(1):190-198, 1997.
- A. N. Kolmogorov and
S. V. Fomin.
Introductory Real Analysis.
Prentice-Hall, Inc., 1970.
- A. N. Kolmogorov.
Three approaches to the quantitative definition of information.
Problems of Information Transmission, 1:1 -- 7, 1965.
- H. König.
Eigenvalue Distribution of Compact Operators.
Birkhäuser, Basel, 1986.
- W. Krauth and M. Mézard.
Learning algorithms with optimal stability in neural networks.
J. Phys. A, 20:L745-L752, 1987.
- U. Kreßel and J. Schürmann.
Pattern classification techniques based on function approximation.
In H. Bunke and P. S. P. Wang, editors, Handbook on Optical Character
Recognition and Document Analysis, pages 49 -- 78. World Scientific
Publishing Company, Singapore, 1997.
- U. Kreßel.
The impact of the learning-set size in handwritten-digit recognition.
In T. Kohonen et al., editors, Artificial Neural Networks ---
ICANN'91, pages 1685 -- 1689, Amsterdam, 1991. North-Holland.
- U. Kreßel.
Polynomial classifiers and support vector machines.
In W. Gerstner et al., editors, Artificial Neural Networks ---
ICANN'97, pages 397 -- 402, Berlin, 1997. Springer Lecture Notes in
Computer Science, Vol. 1327.
- H. W. Kuhn and A. W.
Tucker.
Nonlinear programming.
In Proc. 2nd Berkeley Symposium on Mathematical Statistics and
Probability, pages 481-492, Berkeley, 1951. University of
California Press.
- R. Kühn and J.L.
van Hemmen.
Collective phenomena in neural networks.
In E. Domany, J. L. van Hemmen, and K. Schulten, editors, Physics of Neural
Networks I. Springer Verlag, New York, 1996.
- P.F. Lambert.
Designing pattern categorizers with extremal paradigm information.
In S. Watanabe, editor, Methodologies of Pattern Recognition,
pages 359-391, New York, NY, 1969. Academic Press.
- Large-Scale Numerical
Optimization, Philadelphia, Pennsylvania, 1990. SIAM.
- J. Larsen and L. K.
Hansen.
Linear unlearning for cross-validation.
Advances in Computational Mathematics, 5:269-280, 1996.
- Y. LeCun.
MNIST handwritten digit database.
Available at http://www.research.att.com/~yann/ocr/mnist/.
- Y. LeCun, B. Boser, J. S.
Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. J. Jackel.
Backpropagation applied to handwritten zip code recognition.
Neural Computation, 1:541 -- 551, 1989.
- Y. LeCun, L. D. Jackel,
L. Bottou, A. Brunot, C. Cortes, J. S. Denker, H. Drucker, I. Guyon, U. A.
Müller, E. Säckinger, P. Simard, and V. Vapnik.
Comparison of learning algorithms for handwritten digit recognition.
In F. Fogelman-Soulié and P. Gallinari, editors, Proceedings
ICANN'95 --- International Conference on Artificial Neural Networks,
volume II, pages 53 -- 60, Nanterre, France, 1995. EC2.
The MNIST benchmark data is available from
http://www.research.att.com/~yann/ocr/mnist/.
- W. S. Lee, P. L. Bartlett,
and R. C. Williamson.
On efficient agnostic learning of linear combinations of basis functions.
In Proc. 8th Annu. Workshop on Comput. Learning Theory, pages
369-376. ACM Press, New York, NY, 1995.
- W. S. Lee, P. L. Bartlett,
and R. C. Williamson.
The importance of convexity in learning with squared loss.
IEEE Transactions on Information Theory, 1998.
to appear.
- K. C. Li.
Asymptotic optimality of C_L and generalized cross validation in
ridge regression with application to spline smoothing.
Ann. Statist., 14:1101-1112, 1986.
- W. Liebert,
K. Pawelzik, and H. G. Schuster.
Optimal embeddings of chaotic attractors from topological considerations.
Europhys. Lett., 14:521 -- 526, 1991.
- B. Lillekjendlie,
D. Kugiumtzis, and N. Christophersen.
Chaotic time series. Part II: System identification and prediction.
Modeling, Identification and Control, 15(4):225-243, 1994.
- N. Littlestone
and M. Warmuth.
Relating data compression and learnability.
Technical report, University of California Santa Cruz, 1986.
- P.M. Long.
The complexity of learning according to two models of a drifting environment.
In Proceedings of the 11th Annual Conference on Computational Learning
Theory, pages 116-125. ACM Press, 1998.
- E. N. Lorenz.
Deterministic nonperiodic flow.
J. Atmos. Sci., 20:130-141, 1963.
- G. Loy and P. L.
Bartlett.
Generalization and the size of the weights: an experimental study.
In Proceedings of the Eighth Australian Conference on Neural
Networks, pages 60-64, 1997.
- D. G. Luenberger.
Introduction to Linear and Nonlinear Programming.
Addison-Wesley, Reading, MA, 1973.
- C.-S. Liu, C.-H. Lee,
W. Chou, B.-H. Juang, and A. E. Rosenberg.
A study on minimum error discriminative training for speaker recognition.
Journal of the Acoustical Society of America, 97(1):637-648,
1995.
- D. J. C. MacKay.
Bayesian interpolation.
Neural Computation, 4:415-447, 1992.
- D. J. C. MacKay.
The evidence framework applied to classification networks.
Neural Computation, 4:720-736, 1992.
- D. J. C. MacKay.
A practical Bayesian framework for backprop networks.
Neural Computation, 4:448-472, 1992.
- M. C. Mackey and
L. Glass.
Oscillation and chaos in physiological control systems.
Science, 197:287-289, 1977.
- S. Mallat.
A Wavelet Tour of Signal Processing.
Academic Press, 1998.
- O. L. Mangasarian
and R. Meyer.
Nonlinear perturbations of linear programs.
SIAM Journal on Control and Optimization, 17(6):745-752, 1979.
- O. L. Mangasarian,
R. Setiono, and W. H. Wolberg.
Pattern recognition via linear programming: Theory and application to medical
diagnosis.
In Large-Scale Numerical Optimization, pages 22-31, Philadelphia,
Pennsylvania, 1990. SIAM.
- O. L. Mangasarian.
Multi-surface method of pattern separation.
IEEE Transactions on Information Theory, IT-14:801-807, 1968.
- O. L. Mangasarian.
Misclassification minimization.
J. Global Optimization, 5:309-323, 1994.
- O. L. Mangasarian.
Nonlinear Programming.
SIAM, Philadelphia, PA, 1994.
- O. L. Mangasarian.
Mathematical programming in data mining.
Data Mining and Knowledge Discovery, 42(1):183-201, 1997.
- O. L. Mangasarian.
Linear and nonlinear separation of patterns by linear programming.
Operations Research, 13:444-452, 1965.
- M. Marchand, M. Golea,
and P. Ruján.
Convergence theorem for sequential learning in two layer perceptrons.
Europhysics Letters, 11:487-492, 1989.
- MATLAB.
User's Guide.
The MathWorks, Inc., Natick, MA 01760, 1992.
- J. Mercer.
Functions of positive and negative type and their connection with the theory of
integral equations.
Philos. Trans. Roy. Soc. London, A 209:415-446, 1909.
- C. J. Merz and P. M.
Murphy.
UCI repository of machine learning databases, 1998.
[http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University
of California, Department of Information and Computer Science.
- M. Mézard,
G. Parisi, and M.G. Virasoro.
Spin Glass Theory and Beyond.
World Scientific, Singapore, 1987.
- C. A. Micchelli.
Interpolation of scattered data: distance matrices and conditionally positive
definite functions.
Constructive Approximation, 2:11-22, 1986.
- S. Mika,
B. Schölkopf, A. Smola, K.-R. Müller, M. Scholz, and G. Rätsch.
Kernel PCA and de-noising in feature spaces.
In Advances in Neural Information Processing Systems 11, 1998.
- J. Moody and C. Darken.
Fast learning in networks of locally-tuned processing units.
Neural Computation, 1(2):281-294, 1989.
- J. Moody.
The effective number of parameters: An analysis of generalization and
regularization in non-linear learning systems.
In J. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural
Information Processing Systems 4, pages 847-854, San Mateo, CA,
1992. Morgan Kaufmann.
- J. Moré and G. Toraldo.
On the solution of large quadratic programming problems with bound
constraints.
SIAM Journal on Optimization, 1(1):93-113, 1991.
- V. A. Morozov.
Methods for Solving Incorrectly Posed Problems.
Springer Verlag, 1984.
- F. Mosteller and
R. Rourke.
Sturdy Statistics.
Addison-Wesley, Reading, MA, 1973.
- S. Mukherjee,
E. Osuna, and F. Girosi.
Nonlinear prediction of chaotic time series using a support vector machine.
In J. Principe, L. Gile, N. Morgan, and E. Wilson, editors, Neural
Networks for Signal Processing VII --- Proceedings of the 1997 IEEE
Workshop, pages 511 -- 520, New York, 1997. IEEE.
- B. Müller
and J. Reinhardt.
Neural Networks: An Introduction.
Springer Verlag, 1990.
- K.-R. Müller,
J. Kohlmorgen, and K. Pawelzik.
Analysis of switching dynamics with competing neural networks.
IEICE Transactions on Fundamentals of Electronics, Communications and
Computer Sciences, E78-A(10):1306-1315, 1995.
- K.-R.
Müller, A. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen, and
V. Vapnik.
Predicting time series with support vector machines.
In W. Gerstner, A. Germond, M. Hasler, and J.-D. Nicoud, editors,
Artificial Neural Networks --- ICANN'97, pages 999 -- 1004,
Berlin, 1997. Springer Lecture Notes in Computer Science, Vol. 1327.
- P. M. Murphy and D. W.
Aha.
UCI repository of machine learning databases.
Department of Information and Computer Science, University of California,
Irvine, California, 1992.
- B. A. Murtagh and
M. A. Saunders.
MINOS 5.4 user's guide.
Technical Report SOL 83.20, Stanford University, 1993.
- S. Murthy, S. Kasif, and
S. Salzberg.
A system for induction of oblique decision trees.
Journal of Artificial Intelligence Research, 2:1-32, 1994.
- K. G. Murty.
Linear Programming.
John Wiley & Sons, New York, NY, 1983.
- I. Nagayama and
N. Akamatsu.
Approximation of chaotic behavior by using neural network.
IEICE Trans. Inf. & Syst., E77-D(4), 1994.
- J. Nash.
The embedding problem for Riemannian manifolds.
Annals of Mathematics, 63:20 -- 63, 1956.
- R. Neal.
Priors for infinite networks.
Technical Report CRG-TR-94-1, Dept. of Computer Science, University of Toronto,
1994.
- R. Neal.
Bayesian Learning in Neural Networks.
Springer Verlag, 1996.
- N. J. Nilsson.
Learning machines: Foundations of Trainable Pattern Classifying
Systems.
McGraw-Hill, 1965.
- P. Niyogi and
F. Girosi.
On the relationship between generalization error, hypothesis complexity, and
sample complexity for radial basis functions.
Neural Computation, 8:819-842, 1996.
- S. Odewahn,
E. Stockwell, R. Pennington, R. Humphreys, and W. Zumach.
Automated star/galaxy discrimination with neural networks.
Astronomical Journal, 103(1):318-331, 1992.
- University of California, Irvine.
Machine learning repository, 1998.
http://www.ics.uci.edu/~mlearn/MLRepository.html.
- E. Oja.
A simplified neuron model as a principal component analyzer.
J. Math. Biology, 15:267 -- 273, 1982.
- E. Oja.
Subspace methods of pattern recognition.
John Wiley, New York, NY, 1983.
- M. Okamoto.
Some inequalities relating to the partial sum of binomial probabilities.
Annals of the Institute of Statistical Mathematics, 10:29-35,
1958.
- P. J. Olver.
Applications of Lie Groups to Differential Equations.
Springer-Verlag, 1986.
- M. Opper and
D. Haussler.
Generalization performance of Bayes optimal classification algorithm for
learning a perceptron.
Physical Review Letters, 66:2677, 1991.
- M. Opper and W. Kinzel.
Physics of generalization.
In E. Domany, J. L. van Hemmen, and K. Schulten, editors, Physics of Neural
Networks III. Springer Verlag, New York, 1996.
- M. Opper and
O. Winther.
Gaussian processes for classification.
Submitted to Neural Computation, 1999.
- M. Opper and
O. Winther.
Mean field methods for classification with Gaussian processes.
In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural
Information Processing Systems 11, Cambridge, MA, 1999. MIT Press.
- M. Opper, P. Kuhlmann,
and A. Mietzner.
Convexity, internal representations and the statistical mechanics of neural
networks.
Europhysics Letters, 37(1):31-36, 1997.
- M. Opper.
Learning in neural networks: Solvable dynamics.
Europhysics Letters, 8(4):389-392, 1989.
- M. Oren,
C. Papageorgiou, P. Sinha, E. Osuna, and T. Poggio.
Pedestrian detection using wavelet templates.
In Proc. Computer Vision and Pattern Recognition, pages 193-199,
Puerto Rico, June 16-20 1997.
- E. Osuna and F. Girosi.
Reducing run-time complexity in SVMs.
In Proceedings of the 14th Int'l Conf. on Pattern Recognition, Brisbane,
Australia, 1998.
To appear.
- E. Osuna, R. Freund, and
F. Girosi.
An improved training algorithm for support vector machines.
In J. Principe, L. Gile, N. Morgan, and E. Wilson, editors, Neural
Networks for Signal Processing VII --- Proceedings of the 1997 IEEE
Workshop, pages 276 -- 285, New York, 1997. IEEE.
- E. Osuna, R. Freund, and
F. Girosi.
Support vector machines: Training and applications.
AI Memo 1602, Massachusetts Institute of Technology, 1997.
- E. Osuna, R. Freund,
and F. Girosi.
Training support vector machines: An application to face detection.
In Proc. Computer Vision and Pattern Recognition '97, pages
130-136, 1997.
- N. H. Packard, J. P.
Crutchfield, J. D. Farmer, and R. S. Shaw.
Geometry from a time series.
Phys. Rev. Lett., 45:712-716, 1980.
- E. Parzen.
An approach to time series analysis.
Ann. Math. Statist., 32:951-989, 1962.
- E. Parzen.
On estimation of a probability density function and mode.
Annals of Mathematical Statistics, 33(3):1065-1076, 1962.
- E. Parzen.
Statistical inference on time series by RKHS methods.
In R. Pyke, editor, Proceedings 12th Biennial Seminar, pages 1-37,
Montreal, 1970. Canadian Mathematical Congress.
- K. Pawelzik,
J. Kohlmorgen, and K.-R. Müller.
Annealed competition of experts for a segmentation and classification of
switching dynamics.
Neural Computation, 8(2):342-358, 1996.
- K. Pawelzik, K.-R.
Müller, and J. Kohlmorgen.
Prediction of mixtures.
In C. von der Malsburg, W. von Seelen, J. C. Vorbrüggen, and B. Sendhoff,
editors, Artificial Neural Networks --- ICANN'96, pages
127-133, Berlin, 1996. Springer Lecture Notes in Computer Science,
Vol. 1112.
- K. Pearson.
On lines and planes of closest fit to points in space.
Philosophical Magazine, 2 (sixth series):559-572, 1901.
- J. C. Platt.
A resource-allocating network for function interpolation.
Neural Computation, 3(2):213-225, 1991.
- J. C. Platt.
Sequential minimal optimization: A fast algorithm for training support vector
machines.
Technical Report MSR-TR-98-14, Microsoft Research, 1998.
- J. Platt.
Fast training of support vector machines using sequential minimal optimization.
In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances
in Kernel Methods --- Support Vector Learning, pages 185-208,
Cambridge, MA, 1999. MIT Press.
- T. Poggio and
F. Girosi.
Networks for approximation and learning.
Proceedings of the IEEE, 78(9), September 1990.
- T. Poggio and
F. Girosi.
Regularization algorithms for learning that are equivalent to multilayer
networks.
Science, 247:978-982, 1990.
- T. Poggio.
On optimal nonlinear associative recall.
Biological Cybernetics, 19:201-209, 1975.
- M. Pontil and A. Verri.
Properties of support vector machines.
Neural Computation, 10:955-974, 1997.
- M. J. D. Powell.
Radial basis functions for multivariable interpolation: A review.
In J. C. Mason and M. G. Cox, editors, Algorithms for Approximation,
pages 143-167. Oxford Clarendon Press, 1987.
- W. H. Press, S. A.
Teukolsky, W. T. Vetterling, and B. P. Flannery.
Numerical Recipes in C: The Art of Scientific Computing (2nd
ed.).
Cambridge University Press, Cambridge, 1992.
- J. C. Principe and J. M.
Kuo.
Dynamic modeling of chaotic time series with neural networks.
In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural
Information Processing Systems 7, San Mateo, CA, 1995. Morgan Kaufmann
Publishers.
- R. T. Prosser.
The ε-Entropy and ε-Capacity of Certain
Time-Varying Channels.
Journal of Mathematical Analysis and Applications,
16:553-573, 1966.
- J. R. Quinlan.
C4.5: Programs for Machine Learning.
Morgan Kaufmann, 1993.
- J. R. Quinlan.
Bagging, boosting, and C4.5.
In Proceedings of the Thirteenth National Conference on Artificial
Intelligence and the Eighth Innovative Applications of Artificial
Intelligence Conference, pages 725-730, Menlo Park, 1996. AAAI Press
/ MIT Press.
- C. Rasmussen.
Evaluation of Gaussian Processes and Other Methods for Non-Linear
Regression.
PhD thesis, Department of Computer Science, University of Toronto, 1996.
ftp://ftp.cs.toronto.edu/pub/carl/thesis.ps.gz.
- G. Rätsch.
Ensemble learning for classification.
Master's thesis, University of Potsdam, 1998.
In German.
- M. Reed and B. Simon.
Methods of Modern Mathematical Physics. Vol. 1: Functional
Analysis.
Academic Press, San Diego, 1980.
- D. Reilly, L. N. Cooper,
and C. Elbaum.
A neural model for category learning.
Biol. Cybern., 45:35 -- 41, 1982.
- D. A. Reynolds.
A Gaussian Mixture Modeling Approach to Text-Independent Speaker
Identification.
PhD thesis, Georgia Institute of Technology, 1992.
- D. A. Reynolds.
Speaker identification and verification using Gaussian mixture speaker
models.
Speech Communication, 17:91-108, 1995.
- D. A. Reynolds.
Comparison of background normalization methods for text-independent speaker
verification.
In Proc. Eurospeech '97, pages 963-966, Rhodes, Greece, September
1997.
- G. Ridgeway,
D. Madigan, and T. Richardson.
Boosting methodology for regression problems.
In Proc. 15th International Conference on Machine Learning.
Morgan Kaufmann, 1998.
- B. D. Ripley.
Pattern Recognition and Neural Networks.
Cambridge University Press, Cambridge, 1996.
- J. Rissanen.
Modeling by shortest data description.
Automatica, 14:465-471, 1978.
- H. J. Ritter, T. M.
Martinetz, and K. J. Schulten.
Neuronale Netze: Eine Einführung in die Neuroinformatik
selbstorganisierender Abbildungen.
Addison-Wesley, Munich, Germany, 1990.
- F. Rosenblatt.
The perceptron: A probabilistic model for information storage and organization
in the brain.
Psychological Review, 65(6):386-408, 1958.
- A. Roy and
S. Mukhopadhyay.
Iterative generation of higher-order nets in polynomial time using linear
programming.
IEEE Transactions on Neural Networks, 8(2):402-412, 1997.
- A. Roy, L. S. Kim, and
S. Mukhopadhyay.
A polynomial time algorithm for the construction and training of a class of
multilayer perceptrons.
Neural Networks, 6:535-545, 1993.
- A. Roy, S. Govil, and
R. Miranda.
An algorithm to generate radial basis function (RBF)-like nets for
classification problems.
Neural Networks, 8(2):179-202, 1995.
- P. Ruján and
M. Marchand.
Learning by minimizing resources in neural networks.
Complex Systems, 3:229-242, 1989.
- P. Ruján.
Playing billiard in version space.
Neural Computation, 9:197-238, 1996.
- D. E. Rumelhart,
G. E. Hinton, and R. J. Williams.
Learning internal representations by error propagation.
In D. Rumelhart and J. McClelland, editors, Parallel Distributed
Processing, volume 1, pages 318-362. MIT Press, Cambridge, MA, 1986.
- D. E. Rumelhart,
G. E. Hinton, and R. J. Williams.
Learning representations by back-propagating errors.
Nature, 323(9):533-536, October 1986.
- D. E. Rumelhart,
G. E. Hinton, and R. J. Williams.
Parallel Distributed Processing.
MIT Press, Cambridge, MA, 1986.
- D.E. Rumelhart,
J.L. McClelland, and the PDP Research Group.
Parallel Distributed Processing: Explorations in the Microstructure of
Cognition: Foundations, volume 1.
MIT Press, Cambridge, MA, 1986.
- S. S. Chen, D. L. Donoho, and M. A. Saunders.
Atomic decomposition by basis pursuit.
Technical report, Dept. of Statistics, Stanford University,
1996.
- S. Saitoh.
Theory of Reproducing Kernels and its Applications.
Longman Scientific & Technical, Harlow, England, 1988.
- T. D. Sanger.
Optimal unsupervised learning in a single-layer linear feedforward network.
Neural Networks, 2:459-473, 1989.
- T. Sauer, J. A. Yorke,
and M. Casdagli.
Embedology.
J. Stat. Phys., 65:579-616, 1991.
- C. Saunders,
M. O. Stitson, J. Weston, L. Bottou, B. Schölkopf, and A. Smola.
Support vector machine reference manual.
Technical Report CSD-TR-98-03, Department of Computer Science, Royal Holloway,
University of London, Egham, TW20 0EX, UK, 1998.
TR available as
http://www.dcs.rhbnc.ac.uk/research/compint/areas/comp_learn/sv/pub/
report98-03.ps; SVM available at http://svm.dcs.rhbnc.ac.uk/.
- R. J. Schalkoff.
Digital Image Processing and Computer Vision.
John Wiley and Sons, Inc., 1989.
- R. Schapire and
Y. Singer.
Improved boosting algorithms using confidence-rated predictions.
In Proc. 11th Annual Conference on Computational Learning Theory,
pages 80-91, New York, NY, 1998. ACM Press.
- R. Schapire,
Y. Freund, P. Bartlett, and W. S. Lee.
Boosting the margin: A new explanation for the effectiveness of voting
methods.
Annals of Statistics, 1998.
(To appear. An earlier version appeared in: D.H. Fisher, Jr. (ed.),
Proceedings ICML97, Morgan Kaufmann.).
- R. Schapire.
The strength of weak learnability.
Machine Learning, 5(2):197-227, 1990.
- M. Schmidt and H. Gish.
Speaker identification via support vector classifiers.
In Proc. ICASSP '96, pages 105-108, Atlanta, GA, May 1996.
- M. Schmidt.
Identifying speakers with support vector networks.
In Proceedings of Interface '96, Sydney, July 1996.
- M. Schmidt.
Private communication, 1998.
- I. Schoenberg.
Positive definite functions on spheres.
Duke Math. J., 9:96-108, 1942.
- B. Schölkopf, C. Burges, and V. Vapnik.
Extracting support data for a given task.
In U. M. Fayyad and R. Uthurusamy, editors, Proceedings, First
International Conference on Knowledge Discovery & Data Mining. AAAI
Press, Menlo Park, CA, 1995.
- B. Schölkopf, C. Burges, and V. Vapnik.
Incorporating invariances in support vector learning machines.
In C. von der Malsburg, W. von Seelen, J. C. Vorbrüggen, and B. Sendhoff,
editors, Artificial Neural Networks --- ICANN'96, pages
47 -- 52, Berlin, 1996. Springer Lecture Notes in Computer Science,
Vol. 1112.
- B. Schölkopf, A. Smola, and K.-R. Müller.
Nonlinear component analysis as a kernel eigenvalue problem.
Technical Report 44, Max-Planck-Institut für biologische Kybernetik, 1996.
- B. Schölkopf, K. Sung, C. Burges, F. Girosi, P. Niyogi,
T. Poggio, and V. Vapnik.
Comparing support vector machines with Gaussian kernels to radial basis
function classifiers.
A.I. Memo No. 1599, Massachusetts Institute of Technology, 1996.
- B. Schölkopf, A. Smola, and K.-R. Müller.
Kernel principal component analysis.
In W. Gerstner, A. Germond, M. Hasler, and J.-D. Nicoud, editors,
Artificial Neural Networks --- ICANN'97, pages 583 -- 588,
Berlin, 1997. Springer Lecture Notes in Computer Science, Vol. 1327.
- B. Schölkopf, K. Sung, C. Burges, F. Girosi, P. Niyogi,
T. Poggio, and V. Vapnik.
Comparing support vector machines with Gaussian kernels to radial basis
function classifiers.
IEEE Trans. Sign. Processing, 45:2758 -- 2765, 1997.
- B. Schölkopf, P. Bartlett, A. Smola, and R. Williamson.
Shrinking the tube: a new support vector regression algorithm.
In Advances in Neural Information Processing Systems 11, 1998.
- B. Schölkopf, P. Bartlett, A. Smola, and R. Williamson.
Support vector regression with automatic accuracy control.
In L. Niklasson, M. Bodén, and T. Ziemke, editors, Proceedings of the
8th International Conference on Artificial Neural Networks,
Perspectives in Neural Computing, pages 147 -- 152, Berlin, 1998. Springer
Verlag.
- B. Schölkopf,
P. Knirsch, A. Smola, and C. Burges.
Fast approximation of support vector kernel expansions, and an interpretation
of clustering as approximation in feature spaces.
In P. Levi, M. Schanz, R.-J. Ahlers, and F. May, editors, Mustererkennung
1998 --- 20. DAGM-Symposium, Informatik aktuell, pages 124 -- 132,
Berlin, 1998. Springer.
- B. Schölkopf, S. Mika, A. Smola, G. Rätsch, and K.-R.
Müller.
Kernel PCA pattern reconstruction via approximate pre-images.
In L. Niklasson, M. Bodén, and T. Ziemke, editors, Proceedings of the
8th International Conference on Artificial Neural Networks,
Perspectives in Neural Computing, pages 147 -- 152, Berlin, 1998. Springer
Verlag.
- B. Schölkopf, P. Simard, A. Smola, and V. Vapnik.
Prior knowledge in support vector kernels.
In M. Jordan, M. Kearns, and S. Solla, editors, Advances in Neural
Information Processing Systems 10, pages 640 -- 646, Cambridge, MA,
1998. MIT Press.
- B. Schölkopf, A. Smola, and K.-R. Müller.
Nonlinear component analysis as a kernel eigenvalue problem.
Neural Computation, 10:1299-1319, 1998.
- B. Schölkopf, A. Smola, K.-R. Müller, C. Burges, and
V. Vapnik.
Support vector methods in learning and feature extraction.
In T. Downs, M. Frean, and M. Gallagher, editors, Proceedings of the
Ninth Australian Conference on Neural Networks, pages 72 -- 78,
Brisbane, Australia, 1998. University of Queensland.
- B. Schölkopf, A. Smola, K.-R. Müller, C. Burges, and
V. Vapnik.
Support vector methods in learning and feature extraction.
Australian Journal on Intelligent Information Processing Systems,
1998.
Special issue with selected papers of ACNN'98; accepted for publication.
- B. Schölkopf, A. Smola, R. Williamson, and P. Bartlett.
New support vector algorithms.
NeuroCOLT Technical Report NC-TR-98-031, Royal Holloway College,
University of London, UK, 1998.
- B. Schölkopf, C. J. C. Burges, and A. J. Smola.
Advances in Kernel Methods --- Support Vector Learning.
MIT Press, Cambridge, MA, 1999.
- B. Schölkopf, J. Shawe-Taylor, A. J. Smola, and R. C.
Williamson.
Generalization bounds via eigenvalues of the Gram matrix.
Submitted to COLT99, February 1999.
- B. Schölkopf.
Künstliches Lernen.
In S. Bornholdt and P. H. Feindt, editors, Komplexe adaptive Systeme
(Forum für Interdisziplinäre Forschung Bd. 15), pages
93 -- 117. Röll, Dettelbach, 1996.
- B. Schölkopf.
Support Vector Learning.
R. Oldenbourg Verlag, Munich, 1997.
- B. Schölkopf.
Support-Vektor-Lernen.
In G. Hotz, H. Fiedler, P. Gorny, W. Grass, S. Hölldobler, I. O. Kerner,
and R. Reischuk, editors, Ausgezeichnete Informatikdissertationen
1997, pages 135 -- 150. Teubner, Stuttgart, 1998.
- B. Schölkopf.
SVMs --- a practical consequence of learning theory.
IEEE Intelligent Systems, 13:18 -- 21, 1998.
In: Trends and Controversies --- Support Vector Machines.
- J. C. Schouten,
F. Takens, and C. M. van den Bleek.
Estimation of the dimension of a noisy attractor.
Physical Review E, 50(3):1851-1860, 1994.
- J. Schürmann.
Pattern Classification.
Wiley Interscience, New York, NY, 1996.
- J. Schürmann.
Pattern Classification: A Unified View of Statistical and Neural
Approaches.
Wiley, New York, 1996.
- H. Schwenk and
Y. Bengio.
Training methods for adaptive boosting of neural networks.
In M. I. Jordan, M. J. Kearns, and S. A. Solla, editors,
Advances in Neural Information Processing Systems, volume 10.
The MIT Press, 1998.
- D. W. Scott.
Multivariate Density Estimation.
Wiley-Interscience, New York, 1992.
- J. Segman,
J. Rubinstein, and Y. Y. Zeevi.
The canonical coordinates method for pattern deformation: theoretical and
computational considerations.
IEEE Transactions on Pattern Analysis and Machine Intelligence,
14:1171 -- 1183, 1992.
- H. S. Seung,
H. Sompolinsky, and N. Tishby.
Statistical mechanics of learning from examples.
Physical Review A, 45(8):6056-6091, 1992.
- J. Shawe-Taylor and N. Cristianini.
Data-dependent structural risk minimisation for perceptron decision trees.
In Advances in Neural Information Processing Systems 10, 1998.
- J. Shawe-Taylor and N. Cristianini.
Robust bounds on the generalization from the margin distribution.
NeuroCOLT Technical Report NC-TR-98-029, Royal Holloway College,
University of London, UK, 1998.
- J. Shawe-Taylor and N. Cristianini.
Robust bounds on generalization from the margin distribution.
NeuroCOLT Technical Report NC-TR-1998-020, ESPRIT NeuroCOLT2 Working
Group, http://www.neurocolt.com, 1998.
- J. Shawe-Taylor and N. Cristianini.
Margin distribution bounds on generalization.
In Proceedings of the European Conference on Computational Learning
Theory, EuroCOLT'99, 1999.
- J. Shawe-Taylor, P. Bartlett, R. Williamson, and M. Anthony.
A framework for structural risk minimization.
In COLT, 1996.
- J. Shawe-Taylor, P. L. Bartlett, R. C. Williamson, and
M. Anthony.
Structural risk minimization over data-dependent hierarchies.
IEEE Transactions on Information Theory, 44(5):1926-1940, 1998.
- P. Simard,
B. Victorri, Y. LeCun, and J. Denker.
Tangent prop --- a formalism for specifying selected invariances in an adaptive
network.
In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in
Neural Information Processing Systems 4, pages 895-903, San Mateo,
CA, 1992. Morgan Kaufmann.
- P. Simard, Y. LeCun,
and J. Denker.
Efficient pattern recognition using a new transformation distance.
In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in
Neural Information Processing Systems 5, pages 50-58, San Mateo, CA,
1993. Morgan Kaufmann.
- H. U. Simon.
General bounds on the number of examples needed for learning probabilistic
concepts.
J. of Comput. Syst. Sci., 52(2):239-254, 1996.
Earlier version in 6th COLT, 1993.
- A. Skorokhod and
M. Yadrenko.
On absolute continuity of measures corresponding to homogeneous Gaussian
fields.
Theory of Probability and its Applications, XVIII:27-40, 1973.
- F. W. Smith.
Pattern classifier design by linear programming.
IEEE Transactions on Computers, C-17:367-372, 1968.
- A. Smola and
B. Schölkopf.
On a kernel-based method for pattern recognition, regression, approximation and
operator inversion.
Technical Report 1064, GMD, 1997.
- A. Smola and
B. Schölkopf.
From regularization operators to support vector kernels.
In M. Jordan, M. Kearns, and S. Solla, editors, Advances in Neural
Information Processing Systems 10, pages 343 -- 349, Cambridge, MA,
1998. MIT Press.
- A. Smola and
B. Schölkopf.
On a kernel-based method for pattern recognition, regression, approximation and
operator inversion.
Algorithmica, 22:211 -- 231, 1998.
- A. Smola and
B. Schölkopf.
A tutorial on support vector regression.
Statistics and Computing, 1998.
Invited paper, in press.
- A. Smola, T. Frieß,
and B. Schölkopf.
Semiparametric support vector and linear programming machines.
In Advances in Neural Information Processing Systems 11, 1998.
- A. Smola, N. Murata,
B. Schölkopf, and K.-R. Müller.
Asymptotically optimal choice of ε-loss for support vector
machines.
In L. Niklasson, M. Bodén, and T. Ziemke, editors, Proceedings of the
8th International Conference on Artificial Neural Networks,
Perspectives in Neural Computing, Berlin, 1998. Springer Verlag.
In press.
- A. Smola,
B. Schölkopf, and K.-R. Müller.
The connection between regularization operators and support vector kernels.
Neural Networks, 11:637-649, 1998.
- A. Smola,
B. Schölkopf, and K.-R. Müller.
Convex cost functions for support vector regression.
In L. Niklasson, M. Bodén, and T. Ziemke, editors, Proceedings of the
8th International Conference on Artificial Neural Networks,
Perspectives in Neural Computing, Berlin, 1998. Springer Verlag.
- A. Smola,
B. Schölkopf, and K.-R. Müller.
General cost functions for support vector regression.
In T. Downs, M. Frean, and M. Gallagher, editors, Proc. of the Ninth
Australian Conf. on Neural Networks, pages 79 -- 83, Brisbane,
Australia, 1998. University of Queensland.
- A. Smola, R. Williamson,
and B. Schölkopf.
Generalization bounds for convex combinations of kernel functions.
In Advances in Neural Information Processing Systems 11, 1998.
Submitted.
- A. J. Smola.
Regression estimation with support vector learning machines.
Diplomarbeit, Technische Universität München, 1996.
- A. J. Smola.
Learning with Kernels.
PhD thesis, Technische Universität Berlin, 1998.
- J. Stewart.
Positive definite functions and generalizations, an historical survey.
Rocky Mountain Journal of Mathematics, 6(3):409-434, 1978.
- M. O. Stitson, J. A. E.
Weston, A. Gammerman, V. Vovk, and V. Vapnik.
Theory of support vector machines.
Technical Report CSD-TR-96-17, Royal Holloway, University of London, December
1996.
- M. Stitson,
A. Gammerman, V. Vapnik, V. Vovk, C. Watkins, and J. Weston.
Support vector regression with ANOVA decomposition kernels.
Technical Report CSD-TR-97-22, Royal Holloway, University of London, 1997.
- D. L. Swets and J. Weng.
Using discriminant Eigenfeatures for image retrieval.
IEEE Transactions on Pattern Analysis and Machine Intelligence,
18:831 -- 836, 1996.
- F. Takens.
Detecting strange attractors in fluid turbulence.
In D. Rand and L. S. Young, editors, Dynamical Systems and
Turbulence, pages 366-381. Springer-Verlag, Berlin, 1981.
- M. Talagrand.
Sharper bounds for Gaussian and empirical processes.
Annals of Probability, 22:28-76, 1994.
- M. Talagrand.
The Glivenko--Cantelli problem, ten years later.
Journal of Theoretical Probability, 9(2):371-384, 1996.
- W. Thomas.
Database marketing: Dual approach outdoes response modeling.
Database Marketing News, page 26, June 1996.
- R. Tibshirani.
Regression shrinkage and selection via the lasso.
Technical report, Department of Statistics, University of Toronto, June 1994.
ftp://utstat.toronto.edu/pub/tibs/lasso.ps.
- A. N. Tikhonov and
V. Y. Arsenin.
Solutions of Ill-posed Problems.
W. H. Winston, Washington, D.C., 1977.
- R. Vanderbei.
LOQO: An interior point code for quadratic programming.
Technical Report SOR 94-15, Princeton University, 1994.
- R. J. Vanderbei.
LOQO user's manual -- version 3.10.
Technical Report SOR-97-08, Princeton University, Statistics and Operations
Research, 1997.
Code available at http://www.princeton.edu/~rvdb/.
- V. Vapnik and
A. Chervonenkis.
A note on one class of perceptrons.
Automation and Remote Control, 25, 1964.
- V. Vapnik and
A. Chervonenkis.
Uniform convergence of frequencies of occurrence of events to their
probabilities.
Dokl. Akad. Nauk SSSR, 181:915 -- 918, 1968.
- V. Vapnik and
A. Chervonenkis.
On the uniform convergence of relative frequencies of events to their
probabilities.
Theory of Probability and its Applications, 16(2):264-280, 1971.
- V. Vapnik and
A. Chervonenkis.
Theory of Pattern Recognition [in Russian].
Nauka, Moscow, 1974.
(German Translation: W. Wapnik & A. Tscherwonenkis, Theorie der
Zeichenerkennung, Akademie-Verlag, Berlin, 1979).
- V. Vapnik and
A. Chervonenkis.
Necessary and sufficient conditions for the uniform convergence of means to
their expectations.
Theory of Probability and its Applications, 26(3):532-553, 1981.
- V. Vapnik and
A. Chervonenkis.
The necessary and sufficient conditions for consistency in the empirical risk
minimization method.
Pattern Recognition and Image Analysis, 1(3):283-305, 1991.
- V. Vapnik and
A. Lerner.
Pattern recognition using generalized portrait method.
Automation and Remote Control, 24, 1963.
- V. Vapnik, E. Levin, and
Y. Le Cun.
Measuring the VC-dimension of a learning machine.
Neural Computation, 6(5):851-876, 1994.
- V. Vapnik, S. Golowich,
and A. Smola.
Support vector method for function approximation, regression estimation, and
signal processing.
In M. Mozer, M. Jordan, and T. Petsche, editors, Advances in Neural
Information Processing Systems 9, pages 281-287, Cambridge, MA, 1997.
MIT Press.
- V. Vapnik.
Estimation of Dependences Based on Empirical Data [in Russian].
Nauka, Moscow, 1979.
(English translation: Springer Verlag, New York, 1982).
- V. Vapnik.
Inductive principles of statistics and learning theory.
In P. Smolensky, M. C. Mozer, and D. E. Rumelhart, editors, Mathematical
Perspectives on Neural Networks. Lawrence Erlbaum, Mahwah, NJ, 1995.
- V. Vapnik.
The Nature of Statistical Learning Theory.
Springer Verlag, New York, 1995.
- V. Vapnik.
Structure of statistical learning theory.
In A. Gammerman, editor, Computational and Probabilistic
Reasoning, chapter 1. Wiley, Chichester, 1996.
- V. Vapnik.
Statistical Learning Theory.
Wiley, New York, 1998.
forthcoming.
- T. Vetter, T. Poggio,
and H. Bülthoff.
The importance of symmetry and virtual views in three-dimensional object
recognition.
Current Biology, 4:18 -- 23, 1994.
- M. Vidyasagar.
A Theory of Learning and Generalization.
Springer, New York, 1997.
- N. Ya. Vilenkin.
Special Functions and the Theory of Group Representations,
volume 22 of Translations of Mathematical Monographs.
American Mathematical Society Press, Providence, RI, 1968.
- M. Villalobos and
G. Wahba.
Inequality constrained multivariate smoothing splines with application to the
estimation of posterior probabilities.
J. Am. Statist. Assoc., 82:239-248, 1987.
- G. Wahba, Y. Wang, C. Gu,
R. Klein, and B. Klein.
Structured machine learning for `soft' classification with smoothing spline
ANOVA and stacked tuning, testing and evaluation.
In J. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural
Information Processing Systems 6, pages 415-422. Morgan Kaufmann,
1994.
- G. Wahba, D. Johnson,
F. Gao, and J. Gong.
Adaptive tuning of numerical weather prediction models: randomized GCV in
three and four dimensional data assimilation.
Mon. Wea. Rev., 123:3358-3369, 1995.
- G. Wahba, Y. Wang, C. Gu,
R. Klein, and B. Klein.
Smoothing spline ANOVA for exponential families, with application to the
Wisconsin Epidemiological Study of Diabetic Retinopathy.
Ann. Statist., 23:1865-1895, 1995.
- G. Wahba.
Convergence rates of certain approximate solutions to Fredholm integral
equations of the first kind.
Journal of Approximation Theory, 7:167 -- 185, 1973.
- G. Wahba.
Improper priors, spline smoothing and the problem of guarding against model
errors in regression.
J. Roy. Stat. Soc. Ser. B, 40:364-372, 1978.
- G. Wahba.
Spline interpolation and smoothing on the sphere.
SIAM J. Sci. Stat. Comput., 2:5-16, 1981.
- G. Wahba.
Constrained regularization for ill posed linear operator equations, with
applications in meteorology and medicine.
In S. Gupta and J. Berger, editors, Statistical Decision Theory and
Related Topics, III, Vol.2, pages 383-418. Academic Press, 1982.
- G. Wahba.
Erratum: Spline interpolation and smoothing on the sphere.
SIAM J. Sci. Stat. Comput., 3:385-386, 1982.
- G. Wahba.
A comparison of GCV and GML for choosing the smoothing parameter in the
generalized spline smoothing problem.
Ann. Statist., 13:1378-1402, 1985.
- G. Wahba.
Multivariate thin plate spline smoothing with positivity and other linear
inequality constraints.
In E. Wegman and D. dePriest, editors, Statistical Image Processing and
Graphics, pages 275-290. Marcel Dekker, 1985.
- G. Wahba.
Spline Models for Observational Data, volume 59 of CBMS-NSF
Regional Conference Series in Applied Mathematics.
SIAM, Philadelphia, 1990.
- G. Wahba.
Multivariate function and operator estimation, based on smoothing splines and
reproducing kernels.
In M. Casdagli and S. Eubank, editors, Nonlinear Modeling and
Forecasting, SFI Studies in the Sciences of Complexity, Proc. Vol XII,
pages 95-112. Addison-Wesley, 1992.
- G. Wahba.
Support vector machines, reproducing kernel Hilbert spaces and the randomized
GACV.
Technical Report 984, Department of Statistics, University of Wisconsin,
Madison, 1997.
NIPS 97 Workshop on Support Vector Machines.
- G. Wahba.
The bias-variance tradeoff and the randomized GACV.
In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural
Information Processing Systems, volume 11. MIT Press, Cambridge, MA,
1999.
To appear.
- G. Wahba.
Support vector machines, reproducing kernel Hilbert spaces and the randomized
GACV.
In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances
in Kernel Methods --- Support Vector Learning, pages 69-88,
Cambridge, MA, 1999. MIT Press.
- T. L. H. Watkin, A. Rau,
and M. Biehl.
The statistical mechanics of learning a rule.
Reviews of Modern Physics, 65:499-556, 1993.
- T. Watkin.
Optimal learning with a neural network.
Europhysics Letters, 21:871, 1993.
- A. S. Weigend and N. A. Gershenfeld (Eds.).
Time Series Prediction: Forecasting the Future and Understanding the
Past.
Addison-Wesley, 1994.
Santa Fe Institute Studies in the Sciences of Complexity.
- P. Werbos.
Beyond Regression: New Tools for Prediction and Analysis in the
Behavioral Sciences.
PhD thesis, Harvard, 1974.
- J. Werner.
Optimization - Theory and Applications.
Vieweg, 1984.
- J. Weston and
C. Watkins.
Multi-class support vector machines.
Technical Report CSD-TR-98-04, Department of Computer Science, Royal Holloway,
University of London, Egham, TW20 0EX, UK, 1998.
- J. Weston, A. Gammerman,
M. Stitson, V. Vapnik, V. Vovk, and C. Watkins.
Density estimation using support vector machines.
Technical Report CSD-TR-97-23, Royal Holloway, University of London, 1997.
- H. Widom.
Asymptotic behaviour of eigenvalues of certain integral operators.
Archive for Rational Mechanics and Analysis, 17:215-229, 1964.
- R. A. Wilkinson,
J. Geist, S. Janet, P. J. Grother, C. J. C. Burges, R. Creecy, B. Hammond,
J. J. Hull, N. J. Larsen, T. P. Vogl, and C. L. Wilson.
The first census optical character recognition system conference.
Technical Report NISTIR 4912, National Institute of Standards and Technology
(NIST), Gaithersburg, 1992.
- C. K. I. Williams.
Computation with infinite networks.
In M. Mozer, M. Jordan, and T. Petsche, editors, Advances in Neural
Information Processing Systems 9, pages 295-301, Cambridge, MA, 1997.
MIT Press.
- C. K. I. Williams.
Prediction with Gaussian processes: From linear regression to linear prediction
and beyond.
In M. I. Jordan, editor, Learning and Inference in Graphical
Models. Kluwer, 1998.
To appear. Also: Technical Report NCRG/97/012, Aston University.
- R. Williamson,
A. Smola, and B. Schölkopf.
Entropy numbers, operators and support vector kernels.
In Advances in Neural Information Processing Systems 11, 1998.
Submitted.
- R. C. Williamson,
A. J. Smola, and B. Schölkopf.
Generalization bounds for regularization networks and support vector machines
via entropy numbers of compact operators.
IEEE Transactions on Information Theory, 1998.
Submitted.
- R. C. Williamson,
A. J. Smola, and B. Schölkopf.
Generalization performance of regularization networks and support vector
machines via entropy numbers of compact operators.
NeuroCOLT Technical Report NC-TR-98-019, Royal Holloway College,
University of London, UK, 1998.
- R. C. Williamson,
A. J. Smola, and B. Schölkopf.
A Maximum Margin Miscellany.
Typescript, March 1998.
- W. H. Wolberg and
O. L. Mangasarian.
Multisurface method of pattern separation for medical diagnosis applied to
breast cytology.
Proceedings of the National Academy of Sciences, U.S.A.,
87:9193-9196, 1990.
- A. Wolf, J. B. Swift,
H. L. Swinney, and J. A. Vastano.
Determining Lyapunov exponents from a time series.
Physica D, 16:285-317, 1985.
- P. Wolfe.
A duality theorem for nonlinear programming.
Quarterly of Applied Mathematics, 19:239-244, 1961.
- D. Xiang and G. Wahba.
A generalized approximate cross validation for smoothing splines with
non-Gaussian data.
Statistica Sinica, 6:675-692, 1996.
- D. Xiang and G. Wahba.
Approximate smoothing spline methods for large data sets in the binary case.
Technical Report 982, Department of Statistics, University of Wisconsin,
Madison WI, 1997.
To appear in the Proceedings of the 1997 ASA Joint Statistical Meetings,
Biometrics Section, pp 94-98 (1998).
- D. Xiang.
Model Fitting and Testing for Non-Gaussian Data with a Large Data
Set.
PhD thesis, Technical Report 957, University of Wisconsin-Madison, Madison WI,
1996.
- P. V. Yee.
Regularized Radial Basis Function Networks: Theory and Applications to
Probability Estimation, Classification, and Time Series Prediction.
PhD thesis, Dept. of ECE, McMaster University, Hamilton, Canada, 1998.
- A. Yuille and
N. Grzywacz.
The motion coherence theory.
In Proceedings of the International Conference on Computer Vision,
pages 344-354, Washington, D.C., December 1988. IEEE Computer Society Press.
- E. C. Zachmanoglou
and D. W. Thoe.
Introduction to Partial Differential Equations with Applications.
Dover, Mineola, N.Y., 1986.
- X. Zhang and
J. Hutchinson.
Simple architectures on fast machines: practical issues in nonlinear time
series prediction.
In A. S. Weigend and N. A. Gershenfeld, editors, Time Series Prediction:
Forecasting the Future and Understanding the Past. Santa Fe Institute,
Addison-Wesley, 1994.
- G. Zoutendijk.
Methods of Feasible Directions: A Study in Linear and Non-linear
Programming.
Elsevier, 1970.