
Training Dynamics of Deep Networks using Stochastic Gradient Descent via Neural Tangent Kernel
Stochastic Gradient Descent (SGD) is widely used to train deep neural ne...

Polynomial Convergence of Gradient Descent for Training One-Hidden-Layer Neural Networks
We analyze Gradient Descent applied to learning a bounded target functio...

How implicit regularization of Neural Networks affects the learned function – Part I
Today, various forms of neural networks are trained to perform approxima...

Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima
We consider the problem of learning a one-hidden-layer neural network wi...

Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes
Linear interpolation between initial neural network parameters and conve...

Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process
We consider deep networks, trained via stochastic gradient descent to mi...

A Comparative Analysis of the Optimization and Generalization Properties of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics
A fairly comprehensive analysis is presented for the gradient descent dy...
The Dynamics of Gradient Descent for Overparametrized Neural Networks
We consider the dynamics of gradient descent (GD) in overparameterized single-hidden-layer neural networks with a squared loss function. Recently, it has been shown that, under some conditions, the parameter values obtained using GD achieve zero training error and generalize well if the initial conditions are chosen appropriately. Here, through a Lyapunov analysis, we show that the neural network weights under GD converge to a point close to the minimum-norm solution among those achieving zero training error for the linear approximation to the neural network. To illustrate this result, we show that GD converges to a prediction function that generalizes well, thereby providing an alternative proof of the generalization results in Arora et al. (2019).
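The implicit-bias phenomenon the abstract describes can be illustrated on the linearized model it refers to. The following numpy sketch is purely illustrative (the dimensions, step size, and random-feature matrix are hypothetical stand-ins for the network's linear approximation): GD on a squared loss, started from zero in an overparameterized linear model, interpolates the data and lands on the minimum-norm interpolating solution.

```python
import numpy as np

# Hypothetical setup: an overparameterized *linear* model standing in for
# the linear (NTK-style) approximation of a one-hidden-layer network.
rng = np.random.default_rng(0)

n, p = 10, 200                        # n samples, p >> n features
Phi = rng.standard_normal((n, p))     # feature matrix (illustrative)
y = rng.standard_normal(n)            # targets (illustrative)

# Gradient descent on the squared loss, starting from theta = 0 so that
# every iterate stays in the row space of Phi.
theta = np.zeros(p)
lr = 1e-3                             # below 2 / lambda_max(Phi^T Phi) here
for _ in range(5000):
    theta -= lr * Phi.T @ (Phi @ theta - y)

# The minimum-norm interpolating solution, via the pseudoinverse.
theta_min = np.linalg.pinv(Phi) @ y

assert np.allclose(Phi @ theta, y, atol=1e-6)    # zero training error
assert np.allclose(theta, theta_min, atol=1e-6)  # GD reached min-norm solution
```

Because the iterates never leave the row space of Phi, the zero-training-error point GD converges to is exactly the minimum-norm one; the paper's Lyapunov analysis shows the nonlinear network's weights stay close to this linearized picture.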