# Upcoming ISE Seminars

This talk describes a new class of splitting methods for monotone operator problems developed by the author and P. Combettes. It can solve problems involving any finite number of operators and can operate in an asynchronous parallel manner. It has a unique feature for a decomposition algorithm: it does not need to visit every monotone operator at each iteration. The coordination step is based on the synchronous projective splitting technique developed by B. F. Svaiter and the author. One application of the method is an algorithm resembling the multi-block alternating direction method of multipliers (ADMM), but capable of highly asynchronous operation without restrictive assumptions on the problem or proximal parameters. Time permitting, we may also discuss an application to stochastic programming.

We present a branch-and-cut algorithm for solving discrete bilevel linear programs where the upper-level variables are binary and the lowerlevel variables are either pure integer or pure binary. This algorithm performs local search to find improved bilevel-feasible solutions. We strengthen the relaxed node subproblems in the branch-and-cut search tree by generating cuts to eliminate all of the upper level integer feasible solutions explored during the local search step. Furthermore, to avoid repeatedly solving similar bilevel integer problems in the local search, we derive structural properties of the value functions of bilevel integer programs. Most notably, we generalize the integer complementary slackness theorem to bilevel integer programs. We also propose three efficient approaches for computing the bilevel programming value functions. Our computational results show that the proposed branch-and-cut approach enhanced with the implementation of value functions and local search can solve discrete bilevel programming instances with up to 200 variables and 150 constraints in a reasonable amount of time.

We present a framework for a class of sequential decision-making problems in the context of max-min bilevel programming, where a leader and a follower repeatedly interact. At each period, the leader allocates resources to disrupt the performance of the follower (e.g., as in defender-attacker or interdiction problems), who in turn minimizes some cost function over a set of activities that depends on the leader’s decision. While the follower has complete knowledge of his problem, the leader has only partial information, and needs to learn about the cost parameters, available resources, and the follower’s activities from the feedback generated by the follower’s actions. We measure policy performance in terms of its time-stability, defined as the number of periods it takes for the leader to match the actions of an oracle with complete information. In particular, we propose a class of greedy and robust policies and show that these policies are weakly optimal, eventually match the oracle's actions, and provide a real-time certificate of optimality. We also study a lower bound on any policy performance based on the notion of a semi-oracle. Our numerical experiments demonstrate that the proposed policies consistently outperform reasonable benchmark, and perform fairly close to the semi-oracle. This is a joint work with Juan Borrero (Oklahoma State University) and Denis Saure (University of Chile).

For convex optimization problems deterministic first order methods have linear convergence provided that the objective function is smooth (Lipschitz continuous gradient) and strongly convex. Moreover, under the same conditions – smoothness and strong convexity – stochastic first order methods have sublinear convergence rates. However, in many applications (machine learning, statistics, control, signal processing) the smoothness/strong convexity conditions do not hold; but the objective function still has a special structure (e.g. composition of a strongly convex function with a linear map). In this talk we replace the smoothness/strong convexity assumptions with several other conditions, that are less conservative, for which we prove that several (stochastic) first order methods are converging linearly. We also provide necessary conditions for linear convergence of (stochastic) gradient method. Finally, we provide examples of several functional classes satisfying our new conditions and discuss several applications of these results (Lasso problem, linear systems, linear programming, convex feasibility, etc).

This technical talk will show live calculations in Mathematica 11 and other Wolfram technologies relevant to courses and research. Specific topics include:* Visualize data, functions, surfaces, and more in 2D or 3D* Store and share documents locally or in the Wolfram Cloud* Use the Predictive Interface to get suggestions for the next useful calculation or function options* Access trillions of bits of on-demand data* Easily turn static examples into mouse-driven, dynamic applications* Get deep support for specialized areas including machine learning, time series, image processing, parallelization, and control systems, and 3D printing with no add-ons required* Compute in the Wolfram CloudCurrent users will benefit from seeing the many improvements and new features of Mathematica 11 (https://www.wolfram.com/mathematica/new-in-11/), but prior knowledge of Mathematica is not required.

Empirical risk minimization (ERM) problems express optimal classifiers as solutions of optimization problems in which the objective is the sum of a very large number of sample costs. Established approaches to solve ERM rely on computing stochastic gradient directions by accessing a single summand at each iteration. Despite the efficiency of individual iterations, these methods can be slow to converge and have convergence rates that are linear at best. In this talk we discuss approaches to adapt Newton and quasi-Newton methods for ERM problems. In the incremental quasi-Newton method we exploit memory to store curvature approximation matrices. We show that these curvature approximations succeed in approximating the Hessian and thereby lead to superlinear convergence. In the Adaptive Newton method we consider subsets of training samples that are augmented geometrically by a factor of two. Each time the training set is augmented we perform a single Newton step. We show that it is possible to achieve statistical accuracy with just two passes over the dataset.

We present a model for clustering which combines two criteria: Given a collection of objects with pairwise similarity measure, the problem is to find a cluster that is as dissimilar as possible from the complement, while having as much similarity as possible within the cluster. The two objectives are combined either as a ratio or with linear weights. The ratio problem, and its linear weighted version, are solved by a combinatorial algorithm within the complexity of a single minimum s,t-cut algorithm. This problem (HNC) is closely related to the NP-hard problem of normalized cut that is often used in image segmentation and for which heuristic solutions are generated with the eigenvector technique (spectral method).

It is demonstrated that HNC is a form of relaxation of normalized cut and its generalization to "q-normalized cut". We study the relationship between this HNC relaxation and the spectral method and demonstrate a number of advantages for the combinatorial algorithm. These advantages include a better approximation, in practice, of the normalized cut objective for image segmentation benchmark problems.

HNC can be utilized as a supervised or unsupervised machine learning technique. It has been used for data mining, and its comparison to leading machine learning techniques on datasets selected from the UCI data mining benchmark and other benchmarks indicates that its use of pairwise comparisons is powerful in improving performance accuracy. HNC is currently the leading neuron segmentation algorithm in the Neurofinder benchmark for cell identification in calcium imaging movies. Time permitting, we will discuss our recently developed methods employed to make the family of HNC algorithms scalable for massive scale data sets.

Please join the ISE Department at the annual ISE Banquet! This year's cocktail reception & banquet will be held in the ASA Packer Dining Room, University Center. The Cocktail Reception will begin at 5:30 pm and then the banquet at 6:30 pm!

If you are interested in attending, please register here: https://www.eventville.com/catalog/eventregistration1.asp?eventid=1012138.

MOSEK is a software package for solving large scale sparse optimization problems. To be precise MOSEK is capable of solving linear, convex quadratic and conic quadratic optimization problems possibly having some integer constrained variables. In addition MOSEK can solve continuous semidefinite optimization problems.

In this presentation we will review what is new and improved in the recently released version 8. We will also present computational results that documents upgrading to MOSEK version 8 provides a genuine enhancement of the numerical stability and performance.

The results also documents that MOSEK is one of the best if not the best semidefinite optimizer.

Product and content personalization is now ubiquitous in e-commerce. Available transactional data is typically too sparse for this task. As such, companies today seek to use a variety of information on the interactions between a product and a customer to drive personalization decisions. We formalize this problem as one of recovering a large-scale matrix, with side information in the form of additional matrices of conforming dimension. Viewing the matrix we seek to recover and the side information we have as slices of a tensor, we consider the problem of Slice Recovery, which is to recover specific slices of ‘simple’ tensors from noisy observations of the entire tensor. We propose a definition of simplicity that on the one hand elegantly generalizes a standard generative model for our motivating problem, and on the other subsumes low-rank tensors for a variety of existing definitions of tensor rank. We provide an efficient algorithm for slice recovery that is practical for massive datasets and provides a significant performance improvement over state of the art incumbent approaches to tensor recovery. Further, we establish near-optimal recovery guarantees that in an important regime represent an order improvement over the best available results for this problem. Experiments on data from a music streaming service demonstrate the performance and scalability of our algorithm.