## Posts

### DNN Training with Frank–Wolfe [research]

*TL;DR: This is an informal discussion of our recent paper Deep Neural Network Training with Frank–Wolfe by Sebastian Pokutta, Christoph Spiegel, and Max Zimmer, where we study the general efficacy of using Frank–Wolfe methods for the training of Deep Neural Networks with constrained parameters. Summarizing the results, we (1) show the general feasibility of this markedly different approach for first-order based training of Neural Networks, (2) demonstrate that the particular choice of constraints can have a drastic impact on the learned representation, and (3) show that through appropriate constraints one can achieve performance exceeding that of unconstrained stochastic Gradient Descent, matching state-of-the-art results relying on $L^2$-regularization.*### Projection-Free Adaptive Gradients for Large-Scale Optimization [research]

*TL;DR: This is an informal summary of our recent paper Projection-Free Adaptive Gradients for Large-Scale Optimization by Cyrille Combettes, Christoph Spiegel, and Sebastian Pokutta. We propose to improve the performance of state-of-the-art stochastic Frank-Wolfe algorithms via a better use of first-order information. This is achieved by blending in adaptive gradients, a method for setting entry-wise step-sizes that automatically adjust to the geometry of the problem. Computational experiments on convex and nonconvex objectives demonstrate the advantage of our approach.*### Accelerating Domain Propagation via GPUs [research]

*TL;DR: This is an informal discussion of our recent paper Accelerating Domain Propagation: an Efficient GPU-Parallel Algorithm over Sparse Matrices by Boro Sofranac, Ambros Gleixner, and Sebastian Pokutta. In the paper, we present a new algorithm to perform domain propagation of linear constraints on GPUs efficiently. The results show that efficient implementations of Mixed-integer Programming (MIP) methods are possible on GPUs, even though the success of using GPUs in MIPs has traditionally been limited. Our algorithm is capable of performing domain propagation on the GPU exclusively, without the need for synchronization with the CPU, paving the way for the usage of this algorithm in a new generation of MIP methods that run on GPUs.*### Join CO@Work and EWG-POR – online and for free! [news]

*TL;DR: Announcement for CO@WORK and EWG-POR. Fully online and participation is free.*### Projection-Free Optimization on Uniformly Convex Sets [research]

*TL;DR: This is an informal summary of our recent paper Projection-Free Optimization on Uniformly Convex Sets by Thomas Kerdreux, Alexandre d’Aspremont, and Sebastian Pokutta. We present convergence analyses of the Frank-Wolfe algorithm in settings where the constraint sets are uniformly convex. Our results generalize different analyses of [P], [DR], [D], and [GH] when the constraint sets are strongly convex. For instance, the $\ell_p$ balls are uniformly convex for all $p > 1$, but strongly convex for $p\in]1,2]$ only. We show in these settings that uniform convexity of the feasible region systematically induces accelerated convergence rates of the Frank-Wolfe algorithm (with short steps or exact line-search). This shows that the Frank-Wolfe algorithm is not just adaptive to the sharpness of the objective [KDP] but also to the feasible region.*### Second-order Conditional Gradient Sliding [research]

*TL;DR: This is an informal summary of our recent paper Second-order Conditional Gradient Sliding by Alejandro Carderera and Sebastian Pokutta, where we present a second-order analog of the Conditional Gradient Sliding algorithm [LZ] for smooth and strongly-convex minimization problems over polytopes. The algorithm combines Inexact Projected Variable-Metric (PVM) steps with independent Away-step Conditional Gradient (ACG) steps to achieve global linear convergence and local quadratic convergence in primal gap. The resulting algorithm outperforms other projection-free algorithms in applications where first-order information is costly to compute.*### On the unreasonable effectiveness of the greedy algorithm [research]

*TL;DR: This is an informal summary of our recent paper On the Unreasonable Effectiveness of the Greedy Algorithm: Greedy Adapts to Sharpness with Mohit Singh, and Alfredo Torrico, where we adapt the sharpness concept from convex optimization to explain the effectiveness of the greedy algorithm for submodular function maximization.*### An update on SCIP [news]

*TL;DR: A quick update on what is on the horizon for SCIP.*### Psychedelic Style Transfer [research]

*TL;DR: We point out how to make psychedelic animations from discarded instabilities in neural style transfer. This post builds upon a remark we made in our recent paper Interactive Neural Style Transfer with Artists. In this paper, we questioned several simple evaluation aspects of neural style transfer methods. Also, it is our second series of interactive painting experiments where style transfer outputs constantly influence a painter, see the other series here. See also our medium post.*### Boosting Frank-Wolfe by Chasing Gradients [research]

*TL;DR: This is an informal summary of our recent paper Boosting Frank-Wolfe by Chasing Gradients by Cyrille Combettes and Sebastian Pokutta, where we propose to speed-up the Frank-Wolfe algorithm by better aligning the descent direction with that of the negative gradient. This is achieved by chasing the negative gradient direction in a matching pursuit-style, while still remaining projection-free. Although the idea is reasonably natural, it produces very significant results.*### Non-Convex Boosting via Integer Programming [research]

*TL;DR: This is an informal summary of our recent paper IPBoost – Non-Convex Boosting via Integer Programming with Marc Pfetsch, where we present a non-convex boosting procedure that relies on integer programing. Rather than solving a convex proxy problem, we solve the actual classification problem with discrete decisions. The resulting procedure achieves performance at par or better than Adaboost however it is robust to label noise that can defeat convex potential boosting procedures.*### Approximate Carathéodory via Frank-Wolfe [research]

*TL;DR: This is an informal summary of our recent paper Revisiting the Approximate Carathéodory Problem via the Frank-Wolfe Algorithm with Cyrille W Combettes. We show that the Frank-Wolfe algorithm constitutes an intuitive and efficient method to obtain a solution to the approximate Carathéodory problem and that it also provides improved cardinality bounds in particular scenarios.*### SCIP x Raspberry Pi: SCIP on Edge [random]

*TL;DR: Running SCIP on a Raspberry Pi 4 with relatively moderate performance losses (compared to a standard machine) of a factor of 3-5 brings Integer Programming into the realm of Edge Computing.*### Universal Portfolios: how to (not) get rich [research]

*TL;DR: How to (not) get rich? Running Universal Portfolios online with Online Convex Optimization techniques.*### Toolchain Tuesday No. 6 [random]

*TL;DR: Part of a series of posts about tools, services, and packages that I use in day-to-day operations to boost efficiency and free up time for the things that really matter. This time around will be about privacy tools. Use at your own risk - happy to answer questions. For the full, continuously expanding list so far see here.*### Conditional Gradients and Acceleration [research]

*TL;DR: This is an informal summary of our recent paper Locally Accelerated Conditional Gradients [CDP] with Alejandro Carderera and Jelena Diakonikolas, showing that although optimal global convergence rates cannot be achieved for Conditional Gradients [CG], acceleration can be achieved after a burn-in phase independent of the accuracy $\epsilon$, giving rise to asymptotically optimal rates, accessing the feasible region only through a linear minimization oracle.*### Cheat Sheet: Acceleration from First Principles [research]

*TL;DR: Cheat Sheet for a derivation of acceleration from optimization first principles.*### Blended Matching Pursuit [research]

*TL;DR: This is an informal summary of our recent paper Blended Matching Pursuit with Cyrille W. Combettes, showing that the blending approach that we used earlier for conditional gradients can be carried over also to the Matching Pursuit setting, resulting in a new and very fast algorithm for minimizing convex functions over linear spaces while maintaining sparsity close to full orthogonal projection approaches such as Orthogonal Matching Pursuit.*### Sharpness and Restarting Frank-Wolfe [research]

*TL;DR: This is an informal summary of our recent paper Restarting Frank-Wolfe with Alexandre D’Aspremont and Thomas Kerdreux, where we show how to achieve improved convergence rates under sharpness through restarting Frank-Wolfe algorithms.*### Cheat Sheet: Subgradient Descent, Mirror Descent, and Online Learning [research]

*TL;DR: Cheat Sheet for non-smooth convex optimization: subgradient descent, mirror descent, and online learning. Long and technical.*### Mixing Frank-Wolfe and Gradient Descent [research]

*TL;DR: This is an informal summary of our recent paper Blended Conditional Gradients with Gábor Braun, Dan Tu, and Stephen Wright, showing how mixing Frank-Wolfe and Gradient Descent gives a new, very fast, projection-free algorithm for constrained smooth convex minimization.*### The Zeroth World [random]

*TL;DR: On the impact of AI on society and economy and its potential to enable a zeroth world with unprecedented economic output.*### Toolchain Tuesday No. 5 [random]

*TL;DR: Part of a series of posts about tools, services, and packages that I use in day-to-day operations to boost efficiency and free up time for the things that really matter. Use at your own risk - happy to answer questions. For the full, continuously expanding list so far see here.*### Cheat Sheet: Smooth Convex Optimization [research]

*TL;DR: Cheat Sheet for smooth convex optimization and analysis via an idealized gradient descent algorithm. While technically a continuation of the Frank-Wolfe series, this should have been the very first post and this post will become the Tour d’Horizon for this series. Long and technical.*### Toolchain Tuesday No. 4 [random]

*TL;DR: Part of a series of posts about tools, services, and packages that I use in day-to-day operations to boost efficiency and free up time for the things that really matter. Use at your own risk - happy to answer questions. For the full, continuously expanding list so far see here.*### Emulating the Expert [research]

*TL;DR: This is an informal summary of our recent paper An Online-Learning Approach to Inverse Optimization with Andreas Bärmann, Alexander Martin, and Oskar Schneider, where we show how methods from online learning can be used to learn a hidden objective of a decision-maker in the context of Mixed-Integer Programs and more general (not necessarily convex) optimization problems.*### Toolchain Tuesday No. 3 [random]

*TL;DR: Part of a series of posts about tools, services, and packages that I use in day-to-day operations to boost efficiency and free up time for the things that really matter. Use at your own risk - happy to answer questions. For the full, continuously expanding list so far see here.*### Cheat Sheet: Hölder Error Bounds for Conditional Gradients [research]

*TL;DR: Cheat Sheet for convergence of Frank-Wolfe algorithms (aka Conditional Gradients) under the Hölder Error Bound (HEB) condition, or how to interpolate between convex and strongly convex convergence rates. Continuation of the Frank-Wolfe series. Long and technical.*### Toolchain Tuesday No. 2 [random]

### Cheat Sheet: Linear convergence for Conditional Gradients [research]

*TL;DR: Cheat Sheet for linearly convergent Frank-Wolfe algorithms (aka Conditional Gradients). What does linear convergence mean for Frank-Wolfe and how to achieve it? Continuation of the Frank-Wolfe series. Long and technical.*### Training Neural Networks with LPs [research]

*TL;DR: This is an informal summary of our recent paper Principled Deep Neural Network Training through Linear Programming with Dan Bienstock and Gonzalo Muñoz, where we show that the computational complexity of approximate Deep Neural Network training depends polynomially on the data size for several architectures by means of constructing (relatively) small LPs.*### Toolchain Tuesday No. 1 [random]

### Cheat Sheet: Frank-Wolfe and Conditional Gradients [research]

*TL;DR: Cheat Sheet for Frank-Wolfe and Conditional Gradients. Basic mechanics and results; this is a rather long post and the start of a series of posts on this topic.*### Tractability limits of small treewidth [research]

*TL;DR: This is an informal summary of our recent paper New Limits of Treewidth-based tractability in Optimization with Yuri Faenza and Gonzalo Muñoz, where we provide almost matching lower bounds for extended formulations that exploit small treewidth to obtain smaller formulations. We also show that treewidth in some sense is the only graph-theoretic notion that appropriately captures sparsity and tractability in a broader algorithmic setting.*### On the relevance of AI and ML research in academia [random]

*TL;DR: Is AI and ML research in academia relevant and necessary? Yes.*### Collaborating online, in real-time, with math-support and computations [random]

*TL;DR: Using atom + teletype + markdown as real-time math collaboration environment.*

subscribe via RSS