One trivial observation at a time

Posts

Jul 9, 2026
Neural Concept Verifiers: Proving with Concepts, not Pixels [research]
TL;DR: This is a short summary of our paper Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings by Berkant Turan, Suhrab Asadulla, David Steinmann, Kristian Kersting, Wolfgang Stammer, and Sebastian Pokutta. The paper was accepted as a spotlight at ICML 2026 (top ~2.2% of submissions). In a nutshell, we move Prover-Verifier Games from pixel space to concept space: a prover selects a sparse set of human-readable concepts, a verifier must classify using only those concepts, and an adversarial prover tests whether the verifier can be fooled. The result is a scalable route to verifiable, nonlinear concept-based classification on datasets such as CLEVR-Hans, CIFAR-100, ImageNet-1k, and COCOLogic.
Jun 12, 2026
The Zeroth World: A Seven-Year Update [random]
TL;DR: Seven years ago I argued that AI could enable a zeroth world: economies operating at a multiple of first-world productivity, the way the first world operates at a multiple of the third. The update: the technology arrived faster than I expected, the productivity gains are real but unevenly distributed, and the zeroth world is being built right now, in the US and in China. Europe is perfectly on track to watch from the sidelines, which is particularly bitter because, demographically, Europe needs it more than anyone else. The newest twist: frontier cyber-defense capability is now allocated by invitation and Europe is mostly not on the list.
May 14, 2026
The Hidden Cost of Tokenization [research]
TL;DR: This is a short summary of our position paper The Hidden Cost of Tokenization: Why (most) Non-English Speakers Pay More for Less by Jennifer Haase and Sebastian Pokutta. The basic point is simple: tokenization is not a neutral preprocessing step. It determines how much users pay, how much context they get, how much compute is burned, and potentially how well a model can reason in a language. The same semantic content can require 1.3x, 5x, or even more than 10x as many tokens depending on the language-tokenizer pairing.
Apr 23, 2026
Not every discovery needs an LLM [research]
TL;DR: AI-driven scientific discovery with systems such as AlphaTensor, AlphaEvolve, etc is en vogue and we are also heavily invested in that space. However, these LLM- and RL-driven approaches, while impressive often in themselves, are not always the right tool to get the job done. In this post I will talk about three recent projects from our group that take different angles on LLM- and RL-driven mathematical discovery systems. Two of them revisit classical problems made famous through these AI systems and show that classical structured search matches or beats them on these specific problem classes; with a tiny fraction of the compute. The third example goes the other way around: AI not as discovery engine, but an AI-accelerated subroutine sits inside a classical structured search and lets us settle a thirty-year-old conjecture on real algebraic plane curves of degree seven. This points to an interesting shift in the recent AI vs. classical discussion: it is not so much about whether or not to use AI, but where in the stack AI belongs.
Apr 13, 2026
An uncommon approach to lower bounds for Frank-Wolfe on strongly convex sets [research]
TL;DR: This is a short summary of our paper Lower Bounds for Frank-Wolfe on Strongly Convex Sets by Jannis Halbey, Daniel Deza, Max Zimmer, Christophe Roux, Bartolomeo Stellato, and Sebastian Pokutta. We prove a matching $\Omega(1/\sqrt{\varepsilon})$ lower bound for Frank-Wolfe on strongly convex sets, showing that Garber and Hazan’s 2015 upper bound is tight. The construction of the lower bound deviates from the standard route quite a bit: instead of searching for worst-case initializations, we build them backward from the optimum.
Mar 18, 2026
The Agentic Researcher [research]
TL;DR: This is a summary of our recent paper The Agentic Researcher: A Practical Guide to AI-Assisted Research in Mathematics and Machine Learning by Max Zimmer, Nico Pelleriti, Christophe Roux, and Sebastian Pokutta augmented with some personal perspectives and thoughts. The main point is not that AI can now “do research” in some vague science-fiction sense. The more useful point is that, with the right workflow, general CLI coding agents can already act like research associates: they can write proofs, formulate conjectures, implement ideas, run experiments, document failures, verify intermediate claims, and keep going for hours, while the researcher remains responsible for idea generation, creativity, direction, judgment, and final verification.
Mar 10, 2026
The Box You Can't Open Twice: Newcomb's Paradox Through Four Mathematical Lenses [random]
TL;DR: Newcomb’s paradox — should you take one box or two? — splits rational decision-makers almost evenly. There are four natural mathematical frameworks (causal inference, algorithmic self-reference, statistical counterfactuals, and online learning) that give different answers, and the disagreement reveals deep structural tensions in what it means to choose rationally.
Mar 6, 2026
Do LLM Outputs Mirror Their Internal Semantic Maps? A Large-Scale Behavioral Probing Study [research]
TL;DR: How faithfully does an LLM’s text output reflect the semantic geometry encoded in its hidden states? Forced-choice behavioral probing recovers substantially more internal similarity structure than open-ended generation, and behavioral features improve prediction of unseen hidden-state similarities above lexical and cross-model baselines.
Dec 15, 2025
Between Theory and Reality: How Schools Grapple with Heterogeneity and Where AI Fits [research]
TL;DR: Rising classroom heterogeneity and workload make AI in schools inevitable; with FACET we explore how evidence-based, teacher-centered AI can support meaningful differentiation and AI literacy without replacing human judgment.
Dec 2, 2025
SCIP Optimization Suite 10.0: Exact Solving, Better Decompositions, and a More Productive Ecosystem [research]
TL;DR: SCIP Optimization Suite 10.0 brings a numerically exact solving mode for rational MILPs, noticeable performance gains for MILP/MINLP, stronger presolving and symmetry handling, better heuristics and conflict analysis, IIS detection, and major updates to GCG, PaPILO, PySCIPOpt, and MIP-DD.
Oct 14, 2025
2025 Nobel Prize in Economics: Innovation, Creative Destruction, and Sustainable Growth — and What It Means for Germany [random]
TL;DR: The 2025 Nobel Prize in Economics honors Joel Mokyr, Philippe Aghion, and Peter Howitt for explaining how innovation drives sustained growth. Mokyr identifies the historical preconditions that allow innovation to accumulate; Aghion and Howitt formalize how creative destruction underpins modern growth. Together, their work clarifies why growth cannot be taken for granted—and what kinds of policies Germany now needs to secure its economic future.
Sep 28, 2025
Committing to Secrets via Hashing [random]
TL;DR: Cryptography is often thought of in terms of encryption (hiding messages) or signatures (proving authenticity). But there’s another fundamental building block that underpins many protocols, from digital lotteries to zero-knowledge proofs: the commitment. A commitment is a way to promise something now without revealing it, yet in such a way that you cannot change your mind later. It can be efficiently implemented via hashing.
Sep 2, 2025
Little’s Law and Conference Reviewing: the Queueing Perspective [random]
TL;DR: This is the queueing model perspective of the “paper pool” conference reviewing model with math and numbers based on Little’s Law. Think of it as supplementary material to the post on David’s blog on current ML/AI conferences; you might want to read that one first.
Aug 14, 2025
Why the hell does nobody build more affordable housing in Berlin?! [random]
TL;DR: High construction costs and interest rates create tight margins for new housing in Berlin, making affordable projects barely viable even without profits—minimum rents required often strain affordability, explaining parts of the building slowdown.
May 16, 2025
A New Default Open-Loop Step-Size for Frank-Wolfe? [research]
TL;DR: In our recent paper Adaptive Open-Loop Step-Sizes for Accelerated Convergence Rates of the Frank-Wolfe Algorithm with Elias Wirth and Javier Peña, we explore a new “log-adaptive” open-loop step-size for the Frank-Wolfe algorithm, $\eta_t = \frac{2 + \log(t+1)}{t+2 + \log(t+1)}$, which is adaptable both to favorable function properties and the feasible region in terms of convergence rates. In particular, it matches and often surpasses traditional fixed-parameter open-loop step-sizes across various settings, without needing prior knowledge of problem parameters, both in theory and computations.
Feb 10, 2025
FrankWolfe.jl: An Update on the Julia Package [research]
TL;DR: This blog is a brief update on recent developments in $\texttt{FrankWolfe.jl}$, our open-source Julia package for Frank-Wolfe and Conditional Gradient methods. The latest version introduces new algorithmic variants, improved step-size strategies, and expanded applications across constrained optimization problems. We also briefly discuss its integration with companion libraries and use cases in real-world problems.
Feb 3, 2025
Estimating Rare Probabilities [random]
TL;DR: When dealing with rare events (e.g., surgical fatalities) where no failures have been observed (so far), traditional statistical methods fail. This post explores practical approaches to estimate these probabilities.
Nov 19, 2024
On the Byzantine-Resilience of Distillation-Based Federated Learning [research]
TL;DR: This is an informal summary of our recent paper On the Byzantine-Resilience of Distillation-Based Federated Learning by Christophe Roux, Max Zimmer and Sebastian Pokutta. We analyze the byzantine robustness of FedDistill, a federated learning paradigm where clients share predictions on a public dataset instead of model parameters. Our findings reveal that FedDistill is remarkably resilient to byzantine attacks, in which clients send malicious predictions to disrupt the training process. We introduce two new, more effective attacks in this context and propose a novel defense mechanism to enhance the robustness of knowledge distillation-based FL. Additionally, we offer a general framework to obfuscate attacks, making them harder to detect.
Nov 8, 2024
The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses [research]
TL;DR: This post examines when backdoor-based watermarks, adversarial defenses, or transferable attacks are possible in machine learning. We demonstrate that for all learning tasks, at least one of these schemes exists: a watermark, an adversarial defense, or a transferable attack—the latter tied to cryptography and able to bypass defenses while mimicking legitimate data. For a deeper dive, see our recent paper on arXiv.
Oct 14, 2024
A short intro to Special Relativity [research]
TL;DR: This is a brief intro to special relativity for non-physicists.
Sep 10, 2024
A Secant Method Line Search for Frank-Wolfe algorithms [research]
TL;DR: This is a brief post about a line search based on the secant method for Frank-Wolfe algorithms. This line search is now available in the FrankWolfe.jl package.
Jul 28, 2024
Extending the Continuum of Six-Colorings [research]
TL;DR: This is an informal summary of our recent paper Extending the Continuum of Six-Colorings by Konrad Mundinger, Sebastian Pokutta, Christoph Spiegel, and Max Zimmer. We present two novel six-colorings of the Euclidean plane that avoid monochromatic pairs of points at unit distance in five colors and monochromatic pairs at another specified distance in the sixth color. Our results significantly expand the known range for these colorings, representing the first improvement in 30 years. The constructions were derived using a custom machine learning approach.
Jan 6, 2024
ZIB's Anniversary: Celebrating 40 Years of Innovation in Mathematics and Computer Science [news]
The Zuse Institute Berlin (ZIB) is celebrating its 40th anniversary in 2024. This is an informal heads-up. Stay tuned for more detailed information and save-the-dates on ZIB’s homepage in the weeks to come.
Aug 17, 2023
Accelerated and Sparse Algorithms for Approximate Personalized PageRank and Beyond [research]
TL;DR: This is an informal discussion of our recent paper Accelerated and Sparse Algorithms for Approximate Personalized PageRank and Beyond by David Martinez-Rubio, Elias Wirth, and Sebastian Pokutta. We answer the open question to the COLT 2022 open problem by Fountoulakis and Yang [FY] in the affirmative, presenting sparse accelerated optimization algorithms for the $\ell_1$-regularized personalized PageRank problem.
Aug 5, 2023
Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging [research]
TL;DR: This is an informal summary of our recent paper Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging by Max Zimmer, Christoph Spiegel, and Sebastian Pokutta. Recent work [WM] demonstrated that generalization performance can be significantly improved by merging multiple models fine-tuned from the same base model. However, averaging sparse models leads to a decrease in overall sparsity, due to different sparse connectivities. We discovered that by exploring a single retraining phase of Iterative Magnitude Pruning (IMP) [HPTD] with various hyperparameters, we can create models suitable for averaging while retaining the same sparse connectivity from the pruned base model. Building on this idea, we introduce Sparse Model Soups (SMS), an IMP-variant where every prune-retrain cycle starts with the averaged model from the previous phase. SMS not only maintains the sparsity pattern but also offers full modularity and parallelizability, and substantially enhances the performance of IMP.
Jul 21, 2023
How I Learned to Stop Worrying and Love Retraining [research]
TL;DR: This is an informal summary of our recent ICLR2023 paper How I Learned to Stop Worrying and Love Retraining by Max Zimmer, Christoph Spiegel, and Sebastian Pokutta, in which we reassess Iterative Magnitude Pruning (IMP) [HPTD]. Recent works [RFC, LH] demonstrate how the learning rate schedule during retraining crucially influences recovery from pruning-induced model degradation. We extend these findings, proposing a linear learning rate schedule with an adaptively chosen initial value, which we find to significantly improve post-retraining performance. We also challenge commonly held beliefs of IMP’s performance and efficiency by introducing BIMP, a budgeted IMP-variant. Our study not only enhances understanding of the retraining phase, but it also questions the prevailing belief that we should strive to avoid the need for retraining.
Jul 4, 2023
Alternating Linear Minimization [research]
TL;DR: This is an informal overview of our recent paper Alternating Linear Minimization: Revisiting von Neumann’s alternating projections by Gábor Braun, Sebastian Pokutta, and Robert Weismantel where we replace the projections in von Neumann’s alternating projections algorithm with linear minimizations, which are much cheaper, while maintaining essentially the same convergence rate.
May 5, 2023
Improved local models and new Bell inequalities via Frank-Wolfe algorithms [research]
TL;DR: This is an informal overview of our recent paper Improved local models and new Bell inequalities via Frank-Wolfe algorithms by Sébastien Designolle, Gabriele Iommazzo, Mathieu Besançon, Sebastian Knebel, Patrick Gelß, and Sebastian Pokutta, where we use Frank-Wolfe algorithms to improve several non-locality constants.
Dec 29, 2022
Sh**t you can do with the euclidean norm [research]
TL;DR: Some of my favorite arguments all following from a simple expansion of the euclidean norm and averaging.
Nov 27, 2022
Monograph on Conditional Gradients and Frank-Wolfe methods [research]
TL;DR: Finally we finished our monograph on Conditional Gradients and Frank-Wolfe methods.
Aug 27, 2022
Boscia.jl - a new Mixed-Integer Convex Programming (MICP) solver [research]
TL;DR: This is an informal overview of our new Mixed-Integer Convex Programming (MICP) solver Julia package Boscia.jl and associated preprint Convex integer optimization with Frank-Wolfe methods by Deborah Hendrych, Hannah Troppens, Mathieu Besançon, and Sebastian Pokutta.
Jul 5, 2022
Acceleration of Frank-Wolfe algorithms with open loop step-sizes [research]
TL;DR: This is an informal discussion of our recent paper Acceleration of Frank-Wolfe algorithms with open loop step-sizes by Elias Wirth, Thomas Kerdreux, and Sebastian Pokutta. In the paper, we study accelerated convergence rates for the Frank-Wolfe algorithm (FW) with open loop step-size rules, characterize settings for which FW with open loop step-size rules is non-asymptotically faster than FW with line search or short-step, and provide a partial answer to an open question in kernel herding.
May 21, 2022
Pairwise Conditional Gradients without Swap Steps [research]
TL;DR: This is an informal summary of our recent article Sparser Kernel Herding with Pairwise Conditional Gradients without Swap Steps by Kazuma Tsuji, Ken’ichiro Tanaka, and Sebastian Pokutta, which was accepted at ICML 2022. In this article we present a modification of the pairwise conditional gradient algorithm that removes the dreaded swap steps. The resulting algorithm can even be applied to infinite dimensional feasible regions and promotes high levels of sparsity, which is useful in applications such as e.g., kernel herding.
May 7, 2022
Quantum Computing for the Uninitiated: The Basics [research]
TL;DR: Cheat Sheet for Quantum Computing. Target audience is non-physicists and no physics background required. This is the very first post in the series presenting the very basics to get started. Long and technical.
Feb 20, 2022
Conditional Gradients for the Approximately Vanishing Ideal [research]
TL;DR: This is an informal discussion of our recent paper Conditional Gradients for the Approximately Vanishing Ideal by Elias Wirth and Sebastian Pokutta. In the paper, we present a new algorithm, the Conditional Gradients Approximately Vanishing Ideal algorithm (CGAVI), for the construction of a set of generators of the approximately vanishing ideal of a finite data set $X \subseteq \mathbb{R}^n$. The novelty of our approach is that CGAVI constructs the set of generators by solving instances of convex optimization problems with the Pairwise Frank-Wolfe algorithm (PFW).
Dec 1, 2021
Fast algorithms for fair packing and its dual [research]
TL;DR: This is an informal summary of our recent article Fast Algorithms for Packing Proportional Fairness and its Dual by Francisco Criado, David Martínez-Rubio, and Sebastian Pokutta. In this article we present a distributed, accelerated and width-independent algorithm for the $1$-fair packing problem, which is the proportional fairness problem $\max_{x\in\mathbb{R}^n_{\geq 0}} \sum_i \log(x_i)$ under positive linear constraints, also known as the _packing proportional fairness_ problem. We improve over the previous best solution [DFO20] by means of acceleration. We also study the dual of this problem, and give a Multiplicative Weights (MW) based algorithm making use of the geometric particularities of the problem. Finally, we study a connection to the Yamnitsky-Levin simplices algorithm for general purpose linear feasibility and linear programming.
Oct 9, 2021
Simple steps are all you need [research]
TL;DR: This is an informal summary of our recent paper, to appear in NeurIPS’21, Simple steps are all you need: Frank-Wolfe and generalized self-concordant functions by Alejandro Carderera, Mathieu Besançon, and Sebastian Pokutta, where we present a monotonous version of the Frank-Wolfe algorithm, which together with the simple step size $\gamma_t = 2/(2+t)$ achieves a $\mathcal{O}\left( 1/t \right)$ convergence in primal gap and Frank-Wolfe gap when minimizing Generalized Self-Concordant (GSC) functions over compact convex sets.
Jun 18, 2021
New(!!) NeurIPS 2021 competition: Machine Learning for Discrete Optimization (ML4CO) [news]
TL;DR: This year at NeurIPS 2021 there is brand new competition: improving integer programming solvers by machine learning. The learning problems are quite different than your usual suspects requiring non-standard learning approaches.
May 6, 2021
Learning to Schedule Heuristics in Branch and Bound [research]
TL;DR: This is an informal discussion of our recent paper Learning to Schedule Heuristics in Branch and Bound by Antonia Chmiela, Elias Khalil, Ambros Gleixner, Andrea Lodi, and Sebastian Pokutta. In this paper, we propose the first data-driven framework for scheduling heuristics in a MIP solver. By learning from data describing the performance of primal heuristics, we obtain a problem-specific schedule of heuristics that collectively find many solutions at minimal cost. We provide a formal description of the problem and propose an efficient algorithm for computing such a schedule.
Apr 20, 2021
FrankWolfe.jl: A high-performance and flexible toolbox for Conditional Gradients [research]
TL;DR: We present $\texttt{FrankWolfe.jl}$, an open-source implementation in Julia of several popular Frank-Wolfe and Conditional Gradients variants for first-order constrained optimization. The package is designed with flexibility and high-performance in mind, allowing for easy extension and relying on few assumptions regarding the user-provided functions. It supports Julia’s unique multiple dispatch feature, and interfaces smoothly with generic linear optimization formulations using $\texttt{MathOptInterface.jl}$.
Apr 3, 2021
Linear Bandits on Uniformly Convex Sets [research]
TL;DR: This is an informal summary of our recent paper Linear Bandit on Uniformly Convex Sets by Thomas Kerdreux, Christophe Roux, Alexandre d’Aspremont, and Sebastian Pokutta. We show that the strong convexity of the action set $\mathcal{K}\subset\mathbb{R}^n$ in the context of linear bandits leads to a gain of a factor of $\sqrt{n}$ in the pseudo-regret bounds. This improvement was previously known in only two settings: when $\mathcal{K}$ is the simplex or an $\ell_p$ ball with $p\in]1,2]$ [BCY]. When the action set is $q$-uniformly convex (with $q\geq 2$) but not necessarily strongly convex, we obtain pseudo-regret bounds of the form $\mathcal{O}(n^{1/q}T^{1/p})$ (with $1/p+1/q=1$), i.e., with a dimension dependency smaller than $\sqrt{n}$.
Jan 16, 2021
CINDy: Conditional gradient-based Identification of Non-linear Dynamics [research]
TL;DR: This is an informal summary of our recent paper CINDy: Conditional gradient-based Identification of Non-linear Dynamics – Noise-robust recovery by Alejandro Carderera, Sebastian Pokutta, Christof Schütte and Martin Weiser where we propose the use of a Conditional Gradient algorithm (more concretely the Blended Conditional Gradients [BPTW] algorithm) for the sparse recovery of a dynamic. In the presence of noise, the proposed algorithm presents superior sparsity-inducing properties, while ensuring a higher recovery accuracy, compared to other existing methods in the literature, most notably the popular SINDy [BPK] algorithm, based on a sequentially-thresholded least-squares approach.
Nov 11, 2020
DNN Training with Frank–Wolfe [research]
TL;DR: This is an informal discussion of our recent paper Deep Neural Network Training with Frank–Wolfe by Sebastian Pokutta, Christoph Spiegel, and Max Zimmer, where we study the general efficacy of using Frank–Wolfe methods for the training of Deep Neural Networks with constrained parameters. Summarizing the results, we (1) show the general feasibility of this markedly different approach for first-order based training of Neural Networks, (2) demonstrate that the particular choice of constraints can have a drastic impact on the learned representation, and (3) show that through appropriate constraints one can achieve performance exceeding that of unconstrained stochastic Gradient Descent, matching state-of-the-art results relying on $L^2$-regularization.
Oct 21, 2020
Projection-Free Adaptive Gradients for Large-Scale Optimization [research]
TL;DR: This is an informal summary of our recent paper Projection-Free Adaptive Gradients for Large-Scale Optimization by Cyrille Combettes, Christoph Spiegel, and Sebastian Pokutta. We propose to improve the performance of state-of-the-art stochastic Frank-Wolfe algorithms via a better use of first-order information. This is achieved by blending in adaptive gradients, a method for setting entry-wise step-sizes that automatically adjust to the geometry of the problem. Computational experiments on convex and nonconvex objectives demonstrate the advantage of our approach.
Sep 20, 2020
Accelerating Domain Propagation via GPUs [research]
TL;DR: This is an informal discussion of our recent paper Accelerating Domain Propagation: an Efficient GPU-Parallel Algorithm over Sparse Matrices by Boro Sofranac, Ambros Gleixner, and Sebastian Pokutta. In the paper, we present a new algorithm to perform domain propagation of linear constraints on GPUs efficiently. The results show that efficient implementations of Mixed-integer Programming (MIP) methods are possible on GPUs, even though the success of using GPUs in MIPs has traditionally been limited. Our algorithm is capable of performing domain propagation on the GPU exclusively, without the need for synchronization with the CPU, paving the way for the usage of this algorithm in a new generation of MIP methods that run on GPUs.
Aug 30, 2020
Join CO@Work and EWG-POR – online and for free! [news]
TL;DR: Announcement for CO@WORK and EWG-POR. Fully online and participation is free.
Jul 27, 2020
Projection-Free Optimization on Uniformly Convex Sets [research]
TL;DR: This is an informal summary of our recent paper Projection-Free Optimization on Uniformly Convex Sets by Thomas Kerdreux, Alexandre d’Aspremont, and Sebastian Pokutta. We present convergence analyses of the Frank-Wolfe algorithm in settings where the constraint sets are uniformly convex. Our results generalize different analyses of [P], [DR], [D], and [GH] when the constraint sets are strongly convex. For instance, the $\ell_p$ balls are uniformly convex for all $p > 1$, but strongly convex for $p\in]1,2]$ only. We show in these settings that uniform convexity of the feasible region systematically induces accelerated convergence rates of the Frank-Wolfe algorithm (with short steps or exact line-search). This shows that the Frank-Wolfe algorithm is not just adaptive to the sharpness of the objective [KDP] but also to the feasible region.
Jun 20, 2020
Second-order Conditional Gradient Sliding [research]
TL;DR: This is an informal summary of our recent paper Second-order Conditional Gradient Sliding by Alejandro Carderera and Sebastian Pokutta, where we present a second-order analog of the Conditional Gradient Sliding algorithm [LZ] for smooth and strongly-convex minimization problems over polytopes. The algorithm combines Inexact Projected Variable-Metric (PVM) steps with independent Away-step Conditional Gradient (ACG) steps to achieve global linear convergence and local quadratic convergence in primal gap. The resulting algorithm outperforms other projection-free algorithms in applications where first-order information is costly to compute.
Jun 3, 2020
On the unreasonable effectiveness of the greedy algorithm [research]
TL;DR: This is an informal summary of our recent paper On the Unreasonable Effectiveness of the Greedy Algorithm: Greedy Adapts to Sharpness with Mohit Singh, and Alfredo Torrico, where we adapt the sharpness concept from convex optimization to explain the effectiveness of the greedy algorithm for submodular function maximization.
May 15, 2020
An update on SCIP [news]
TL;DR: A quick update on what is on the horizon for SCIP.
Apr 9, 2020
Psychedelic Style Transfer [research]
TL;DR: We point out how to make psychedelic animations from discarded instabilities in neural style transfer. This post builds upon a remark we made in our recent paper Interactive Neural Style Transfer with Artists. In this paper, we questioned several simple evaluation aspects of neural style transfer methods. Also, it is our second series of interactive painting experiments where style transfer outputs constantly influence a painter, see the other series here. See also our medium post.
Mar 16, 2020
Boosting Frank-Wolfe by Chasing Gradients [research]
TL;DR: This is an informal summary of our recent paper Boosting Frank-Wolfe by Chasing Gradients by Cyrille Combettes and Sebastian Pokutta, where we propose to speed-up the Frank-Wolfe algorithm by better aligning the descent direction with that of the negative gradient. This is achieved by chasing the negative gradient direction in a matching pursuit-style, while still remaining projection-free. Although the idea is reasonably natural, it produces very significant results.
Feb 13, 2020
Non-Convex Boosting via Integer Programming [research]
TL;DR: This is an informal summary of our recent paper IPBoost – Non-Convex Boosting via Integer Programming with Marc Pfetsch, where we present a non-convex boosting procedure that relies on integer programing. Rather than solving a convex proxy problem, we solve the actual classification problem with discrete decisions. The resulting procedure achieves performance at par or better than Adaboost however it is robust to label noise that can defeat convex potential boosting procedures.
Nov 30, 2019
Approximate Carathéodory via Frank-Wolfe [research]
TL;DR: This is an informal summary of our recent paper Revisiting the Approximate Carathéodory Problem via the Frank-Wolfe Algorithm with Cyrille W Combettes. We show that the Frank-Wolfe algorithm constitutes an intuitive and efficient method to obtain a solution to the approximate Carathéodory problem and that it also provides improved cardinality bounds in particular scenarios.
Sep 29, 2019
SCIP x Raspberry Pi: SCIP on Edge [random]
TL;DR: Running SCIP on a Raspberry Pi 4 with relatively moderate performance losses (compared to a standard machine) of a factor of 3-5 brings Integer Programming into the realm of Edge Computing.
Aug 29, 2019
Universal Portfolios: how to (not) get rich [research]
TL;DR: How to (not) get rich? Running Universal Portfolios online with Online Convex Optimization techniques.
Aug 19, 2019
Toolchain Tuesday No. 6 [random]
TL;DR: Part of a series of posts about tools, services, and packages that I use in day-to-day operations to boost efficiency and free up time for the things that really matter. This time around will be about privacy tools. Use at your own risk - happy to answer questions. For the full, continuously expanding list so far see here.
Jul 4, 2019
Conditional Gradients and Acceleration [research]
TL;DR: This is an informal summary of our recent paper Locally Accelerated Conditional Gradients [CDP] with Alejandro Carderera and Jelena Diakonikolas, showing that although optimal global convergence rates cannot be achieved for Conditional Gradients [CG], acceleration can be achieved after a burn-in phase independent of the accuracy $\epsilon$, giving rise to asymptotically optimal rates, accessing the feasible region only through a linear minimization oracle.
Jun 10, 2019
Cheat Sheet: Acceleration from First Principles [research]
TL;DR: Cheat Sheet for a derivation of acceleration from optimization first principles.
May 27, 2019
Blended Matching Pursuit [research]
TL;DR: This is an informal summary of our recent paper Blended Matching Pursuit with Cyrille W. Combettes, showing that the blending approach that we used earlier for conditional gradients can be carried over also to the Matching Pursuit setting, resulting in a new and very fast algorithm for minimizing convex functions over linear spaces while maintaining sparsity close to full orthogonal projection approaches such as Orthogonal Matching Pursuit.
May 2, 2019
Sharpness and Restarting Frank-Wolfe [research]
TL;DR: This is an informal summary of our recent paper Restarting Frank-Wolfe with Alexandre D’Aspremont and Thomas Kerdreux, where we show how to achieve improved convergence rates under sharpness through restarting Frank-Wolfe algorithms.
Feb 27, 2019
Cheat Sheet: Subgradient Descent, Mirror Descent, and Online Learning [research]
TL;DR: Cheat Sheet for non-smooth convex optimization: subgradient descent, mirror descent, and online learning. Long and technical.
Feb 18, 2019
Mixing Frank-Wolfe and Gradient Descent [research]
TL;DR: This is an informal summary of our recent paper Blended Conditional Gradients with Gábor Braun, Dan Tu, and Stephen Wright, showing how mixing Frank-Wolfe and Gradient Descent gives a new, very fast, projection-free algorithm for constrained smooth convex minimization.
Feb 6, 2019
The Zeroth World [random]
TL;DR: On the impact of AI on society and economy and its potential to enable a zeroth world with unprecedented economic output.
Dec 23, 2018
Toolchain Tuesday No. 5 [random]
TL;DR: Part of a series of posts about tools, services, and packages that I use in day-to-day operations to boost efficiency and free up time for the things that really matter. Use at your own risk - happy to answer questions. For the full, continuously expanding list so far see here.
Dec 7, 2018
Cheat Sheet: Smooth Convex Optimization [research]
TL;DR: Cheat Sheet for smooth convex optimization and analysis via an idealized gradient descent algorithm. While technically a continuation of the Frank-Wolfe series, this should have been the very first post and this post will become the Tour d’Horizon for this series. Long and technical.
Dec 4, 2018
Toolchain Tuesday No. 4 [random]
TL;DR: Part of a series of posts about tools, services, and packages that I use in day-to-day operations to boost efficiency and free up time for the things that really matter. Use at your own risk - happy to answer questions. For the full, continuously expanding list so far see here.
Nov 26, 2018
Emulating the Expert [research]
TL;DR: This is an informal summary of our recent paper An Online-Learning Approach to Inverse Optimization with Andreas Bärmann, Alexander Martin, and Oskar Schneider, where we show how methods from online learning can be used to learn a hidden objective of a decision-maker in the context of Mixed-Integer Programs and more general (not necessarily convex) optimization problems.
Nov 20, 2018
Toolchain Tuesday No. 3 [random]
TL;DR: Part of a series of posts about tools, services, and packages that I use in day-to-day operations to boost efficiency and free up time for the things that really matter. Use at your own risk - happy to answer questions. For the full, continuously expanding list so far see here.
Nov 12, 2018
Cheat Sheet: Hölder Error Bounds for Conditional Gradients [research]
TL;DR: Cheat Sheet for convergence of Frank-Wolfe algorithms (aka Conditional Gradients) under the Hölder Error Bound (HEB) condition, or how to interpolate between convex and strongly convex convergence rates. Continuation of the Frank-Wolfe series. Long and technical.
Oct 23, 2018
Toolchain Tuesday No. 2 [random]
TL;DR: Part of a series of posts about tools, services, and packages that I use in day-to-day operations to boost efficiency and free up time for the things that really matter. Use at your own risk - happy to answer questions. For the full, continuously expanding list so far see here.
Oct 19, 2018
Cheat Sheet: Linear convergence for Conditional Gradients [research]
TL;DR: Cheat Sheet for linearly convergent Frank-Wolfe algorithms (aka Conditional Gradients). What does linear convergence mean for Frank-Wolfe and how to achieve it? Continuation of the Frank-Wolfe series. Long and technical.
Oct 12, 2018
Training Neural Networks with LPs [research]
TL;DR: This is an informal summary of our recent paper Principled Deep Neural Network Training through Linear Programming with Dan Bienstock and Gonzalo Muñoz, where we show that the computational complexity of approximate Deep Neural Network training depends polynomially on the data size for several architectures by means of constructing (relatively) small LPs.
Oct 9, 2018
Toolchain Tuesday No. 1 [random]
TL;DR: Part of a series of posts about tools, services, and packages that I use in day-to-day operations to boost efficiency and free up time for the things that really matter. Use at your own risk - happy to answer questions. For the full, continuously expanding list so far see here.
Oct 5, 2018
Cheat Sheet: Frank-Wolfe and Conditional Gradients [research]
TL;DR: Cheat Sheet for Frank-Wolfe and Conditional Gradients. Basic mechanics and results; this is a rather long post and the start of a series of posts on this topic.
Sep 22, 2018
Tractability limits of small treewidth [research]
TL;DR: This is an informal summary of our recent paper New Limits of Treewidth-based tractability in Optimization with Yuri Faenza and Gonzalo Muñoz, where we provide almost matching lower bounds for extended formulations that exploit small treewidth to obtain smaller formulations. We also show that treewidth in some sense is the only graph-theoretic notion that appropriately captures sparsity and tractability in a broader algorithmic setting.
Sep 15, 2018
On the relevance of AI and ML research in academia [random]
TL;DR: Is AI and ML research in academia relevant and necessary? Yes.
Aug 20, 2018
Collaborating online, in real-time, with math-support and computations [random]
TL;DR: Using atom + teletype + markdown as real-time math collaboration environment.