Jeff M. Phillips
Assistant Professor



Journal

2016

  • The Robustness of Estimator Composition. (Project Website)
    By Pingfan Tang,   Jeff M. Phillips,   
    Vol.abs/1609.01226, CoRR (CORR 2016),  2016.
    Abstract
  • Relative Error Embeddings for the Gaussian Kernel Distance. (Project Website)
    By Di Chen,   Jeff M. Phillips,   
    Vol.abs/1602.05350, CoRR (CORR 2016),  2016.
    Abstract
  • Approximate Distribution of L1 Median on Uncertain Data. (Project Website)
    By Jeff M. Phillips,   Pingfan Tang,   
    Vol.abs/1601.00630, CoRR (CORR 2016),  2016.
    Abstract
  • EIC Editorial. (Project Website)
    By Jian Pei,   Leman Akoglu,   Hongrae Lee,   Justin J. Levandoski,   Xuelong Li,   Rosa Meo,   Carlos Ordonez,   Jeff M. Phillips,   Barbara Poblete,   K. Selçuk Candan,   Meng Wang,   Ji-Rong Wen,   Li Xiong ,   Wenjie Zhang,   
    Vol.28, Pages 2535-2537, IEEE Trans. Knowl. Data Eng. (TKDE 2016),  2016.
    Abstract
2015

  • Improved Practical Matrix Sketching with Guarantees
    By Mina Ghashami,    Amey Desai,    and Jeff M. Phillips
Vol.28, Pages 1678-1690, Transactions on Knowledge and Data Engineering (IEEE TKDE),  January,  2015.
    Abstract

Matrices have become essential data representations for many large-scale problems in data analytics, and hence matrix sketching is a critical task. Although much research has focused on improving the error/size tradeoff under various sketching paradigms, the many forms of error bounds make these approaches hard to compare in theory and in practice. This paper attempts to categorize and compare most known methods under row-wise streaming updates with provable guarantees, and then to tweak some of these methods to gain practical improvements while retaining guarantees. For instance, we observe that a simple heuristic iSVD, with no guarantees, tends to outperform all known approaches in terms of size/error trade-off. We modify the best performing method with guarantees, FrequentDirections, to match the performance of iSVD under the size/error trade-off while retaining its guarantees. We also demonstrate some adversarial datasets where iSVD performs quite poorly. In comparing techniques in the time/error trade-off, techniques based on hashing or sampling tend to perform better. In this setting we modify the most studied sampling regime to retain error guarantees but obtain dramatic improvements in the time/error trade-off. Finally, we provide easy replication of our studies on APT, a new testbed which makes available not only code and datasets, but also a computing platform with fixed environmental settings.

  • Frequent Directions: Simple and Deterministic Matrix Sketching.
    By Mina Ghashami,    Edo Liberty,    Jeff M. Phillips and David P. Woodruff
Vol.0, SIAM Journal on Computing (SICOMP),  January,  2015.
    Abstract

We describe a new algorithm called Frequent Directions for deterministic matrix sketching in the row-updates model. The algorithm is presented an arbitrary input matrix A ∈ R^{n x d} one row at a time. It performs O(dl) operations per row and maintains a sketch matrix B ∈ R^{l x d} such that for any k < l, ||A^T A − B^T B||_2 <= ||A − A_k||_F^2 / (l − k) and ||A − pi_{B_k}(A)||_F^2 <= (1 + k/(l − k)) ||A − A_k||_F^2. Here, A_k stands for the minimizer of ||A − X||_F over all rank k matrices X (similarly B_k), and pi_{B_k}(A) is the rank k matrix resulting from projecting A on the row span of B_k. We show both of these bounds are the best possible for the space allowed. The summary is mergeable, and hence trivially parallelizable. Moreover, Frequent Directions outperforms exemplar implementations of existing streaming algorithms in the space-error tradeoff.
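The update in the abstract above is compact enough to sketch directly. Below is a minimal NumPy sketch of Frequent Directions (the function name and the simple one-row-at-a-time buffering are my own choices, not the authors' reference code): whenever the l-row buffer fills, an SVD is taken and all squared singular values are shrunk by the smallest one, freeing a row for the next update.

```python
import numpy as np

def frequent_directions(rows, ell):
    """Maintain an ell x d sketch B of a stream of d-dimensional rows.

    Satisfies ||A^T A - B^T B||_2 <= ||A - A_k||_F^2 / (ell - k) for k < ell.
    """
    rows = iter(rows)
    first = next(rows)
    B = np.zeros((ell, first.shape[0]))
    B[0] = first
    filled = 1
    for row in rows:
        if filled == ell:
            # Shrink step: subtract the smallest squared singular value
            # from all squared singular values; the last row becomes zero.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            s2 = np.maximum(s**2 - s[-1] ** 2, 0.0)
            B = np.diag(np.sqrt(s2)) @ Vt
            filled = ell - 1
        B[filled] = row
        filled += 1
    return B
```

Because each shrink removes at least ell times the shrink amount from the Frobenius mass, the covariance error is bounded by ||A||_F^2 / ell (the k = 0 case of the bound above).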

2014

• ε-Kernel Coresets for Stochastic Points. (Project Website)
    By Jian Li,   Jeff M. Phillips,   Haitao Wang,   
    Vol.abs/1411.0194, CoRR (CORR 2014),  2014.
    Abstract
2013

  • (Approximate) Uncertain Skylines
    By Peyman Afshani,    Pankaj K. Agarwal,    Lars Arge,    Kasper Green Larsen and Jeff M. Phillips
Vol.0, To Appear, Theory of Computing Systems (ICDT),  January,  2013.
    Abstract

Given a set of points with uncertain locations, we consider the problem of computing the probability of each point lying on the skyline, that is, the probability that it is not dominated by any other input point. If each point's uncertainty is described as a probability distribution over a discrete set of locations, we improve the best known exact solution. We also suggest why we believe our solution might be optimal. Next, we describe simple, near-linear time approximation algorithms for computing the probability of each point lying on the skyline. In addition, some of our methods can be adapted to construct data structures that can efficiently determine the probability of a query point lying on the skyline.
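Since exact and near-linear approximate computation are the paper's contributions, a Monte Carlo baseline is the natural thing to sketch against. The following is a hypothetical, simplified estimator for the discrete-locations model described above (uniform weights over each point's locations are an assumption made for brevity; a minimization skyline is used):

```python
import numpy as np

def skyline_probability(points, trials=2000, rng=None):
    """Monte Carlo estimate of each uncertain point's skyline probability.

    points: list of (k_i x d) arrays; each row is an equally likely location.
    q dominates p when q <= p coordinatewise with at least one strict inequality.
    """
    rng = np.random.default_rng(rng)
    n = len(points)
    on_skyline = np.zeros(n)
    for _ in range(trials):
        # Instantiate one location per uncertain point, then test domination.
        locs = np.array([pts[rng.integers(len(pts))] for pts in points])
        for i in range(n):
            dominated = any(
                j != i and np.all(locs[j] <= locs[i]) and np.any(locs[j] < locs[i])
                for j in range(n))
            on_skyline[i] += not dominated
    return on_skyline / trials
```

Its O(trials * n^2) cost per estimate is exactly what the paper's exact and near-linear algorithms avoid.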

  • Geometric Inference on Kernel Density Estimates. (Project Website)
    By Jeff M. Phillips,   Bei Wang,   Yan Zheng,   
    Vol.abs/1307.7760, CoRR (CORR 2013),  2013.
    Abstract
  • Relative Errors for Deterministic Low-Rank Matrix Approximations. (Project Website)
    By Mina Ghashami,   Jeff M. Phillips,   
    Vol.abs/1307.7454, CoRR (CORR 2013),  2013.
    Abstract
  • Rethinking Abstractions for Big Data: Why, Where, How, and What. (Project Website)
    By Mary W. Hall,   Robert M. Kirby,   Feifei Li ,   Miriah D. Meyer,   Valerio Pascucci,   Jeff M. Phillips,   Robert Ricci,   Jacobus E. van der Merwe,   Suresh Venkatasubramanian,   
    Vol.abs/1306.3295, CoRR (CORR 2013),  2013.
    Abstract
2012

  • Chernoff-Hoeffding Inequality and Applications (Project Website)
    By Jeff M. Phillips,   
    Vol.abs/1209.6396, CoRR (CORR 2012),  2012.
    Abstract
  • Geometric Computations on Indecisive and Uncertain Points (Project Website)
    By Allan Jørgensen,   Maarten Löffler,   Jeff M. Phillips,   
    Vol.abs/1205.0273, CoRR (CORR 2012),  2012.
    Abstract
2011

  • ε-Samples of Kernels (Project Website)
    By Jeff M. Phillips,   
    Vol.abs/1112.4105, CoRR (CORR 2011),  2011.
    Abstract
  • A Gentle Introduction to the Kernel Distance (Project Website)
    By Jeff M. Phillips,   Suresh Venkatasubramanian,   
    Vol.abs/1103.1625, CoRR (CORR 2011),  2011.
    Abstract
2010

  • Matching Shapes Using the Current Distance (Project Website)
    By Sarang C. Joshi,   Raj Varma Kommaraju,   Jeff M. Phillips,   Suresh Venkatasubramanian,   
    Vol.abs/1001.0591, CoRR (CORR 2010),  2010.
    Abstract
  • A Unified Algorithmic Framework for Multi-Dimensional Scaling (Project Website)
    By Arvind Agarwal,   Jeff M. Phillips,   Suresh Venkatasubramanian,   
    Vol.abs/1003.0529, CoRR (CORR 2010),  2010.
    Abstract
2009

  • Computing Hulls And Centerpoints In Positive Definite Space (Project Website)
    By P. Thomas Fletcher,   John Moeller,   Jeff M. Phillips,   Suresh Venkatasubramanian,   
    Vol.abs/0912.1580, CoRR (CORR 2009),  2009.
    Abstract
2008

  • Algorithms for eps-approximations of Terrains (Project Website)
    By Jeff M. Phillips,   
    Vol.abs/0801.2793, CoRR (CORR 2008),  2008.
    Abstract
  • An Efficient Algorithm for 2D Euclidean 2-Center with Outliers (Project Website)
    By Pankaj K. Agarwal,   Jeff M. Phillips,   
    Vol.abs/0806.4326, CoRR (CORR 2008),  2008.
    Abstract
Conference

2016

  • Scalable Spatial Scan Statistics through Sampling (Project Website), Talk
    By Michael Matheny,    Raghvendra Singh,    Kaiqiang Wang,    Liang Zhang and Jeff M. Phillips
    In Proceedings of ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL),  pages ??-??,  November,  2016.
    Abstract

Finding anomalous regions within spatial data sets is a central task for biosurveillance, homeland security, policy making, and many other important areas. These communities have mainly settled on spatial scan statistics as a rigorous way to discover regions where a measured quantity (e.g., crime) is statistically significant in its difference from a baseline population. However, most common approaches are inefficient and thus can only be run with very modest data sizes (a few thousand data points) or make assumptions on the geographic distributions of the data. We address these challenges by designing, exploring, and analyzing sample-then-scan algorithms. These algorithms randomly sample data at two scales, one to define regions and the other to approximate the counts in these regions. Our experiments demonstrate that these algorithms are efficient and accurate independent of the size of the original data set, and our analysis explains why this is the case. For the first time, these sample-then-scan algorithms allow spatial scan statistics to run on a million or more data points without making assumptions on the spatial distribution of the data. Moreover, our experiments and analysis give insight into when it is appropriate to trust the various types of spatial anomalies when the data is modeled as a random sample from a larger but unknown data set.
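A heavily simplified sketch of the two-scale idea follows. This is not the paper's algorithm: it scores regions with a plain linear discrepancy |m - b| rather than a spatial scan statistic such as Kulldorff's, and it defines candidate disks by a sampled center and rim point; all names are mine.

```python
import numpy as np

def scan_disks(measured, baseline, n_regions=64, n_count=512, rng=None):
    """Two-scale sample-then-scan sketch over disk regions.

    Scale 1: a small sample of points defines candidate disks.
    Scale 2: a larger sample approximates measured/baseline fractions inside.
    Returns the best (center, radius) and its |m - b| discrepancy score.
    """
    rng = np.random.default_rng(rng)
    pts = np.vstack([measured, baseline])
    labels = np.r_[np.ones(len(measured)), np.zeros(len(baseline))]
    # Scale 1: candidate disks from a small sample (center + boundary point).
    centers = pts[rng.integers(len(pts), size=n_regions)]
    rims = pts[rng.integers(len(pts), size=n_regions)]
    radii = np.linalg.norm(centers - rims, axis=1)
    # Scale 2: a larger sample stands in for exact counts.
    idx = rng.integers(len(pts), size=n_count)
    sample, slab = pts[idx], labels[idx]
    best_score, best = -1.0, None
    for c, r in zip(centers, radii):
        inside = np.linalg.norm(sample - c, axis=1) <= r
        m = slab[inside].sum() / max(slab.sum(), 1)
        b = (inside & (slab == 0)).sum() / max((slab == 0).sum(), 1)
        if abs(m - b) > best_score:
            best_score, best = abs(m - b), (c, r)
    return best, best_score
```

The key property the paper analyzes is that both sample sizes, and hence the runtime, are independent of the original data size.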

  • Efficient Frequent Directions Algorithm for Sparse Matrices
    By Mina Ghashami,    Edo Liberty,    and Jeff M. Phillips
In Proceedings of ACM Conference on Knowledge Discovery and Data Mining (KDD 2016),  pages ??-??,  February,  2016.
    Abstract

This paper describes Sparse Frequent Directions, a variant of Frequent Directions for sketching sparse matrices. It resembles the original algorithm in many ways: both receive the rows of an input matrix A ∈ R^{n x d} one by one in the streaming setting and compute a small sketch B ∈ R^{l x d}. Both share the same strong (provably optimal) asymptotic guarantees with respect to the space-accuracy tradeoff in the streaming setting. However, unlike Frequent Directions which runs in O(ndl) time regardless of the sparsity of the input matrix A, Sparse Frequent Directions runs in O~(nnz(A) l + n l^2) time. Our analysis loosens the dependence on computing the Singular Value Decomposition (SVD) as a black box within the Frequent Directions algorithm. Our bounds require recent results on the properties of fast approximate SVD computations. Finally, we empirically demonstrate that these asymptotic improvements are practical and significant on real and synthetic data.

  • The Robustness of Estimator Composition. (Project Website)
    By Pingfan Tang,   Jeff M. Phillips,   
    In Proceedings of NIPS (NIPS 2016),  pages 929-937,  2016.
    Abstract
  • epsilon-Kernel Coresets for Stochastic Points. (Project Website)
    By Lingxiao Huang,   Jian Li,   Jeff M. Phillips,   Haitao Wang,   
    In Proceedings of ESA (ESA 2016),  pages 50:1-50:18,  2016.
    Abstract
2015

  • Streaming Kernel Principal Component Analysis
    By Mina Ghashami,    Daniel Perry,    and Jeff M. Phillips
    In Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS),  pages ??-??,  December,  2015.
    Abstract

Kernel principal component analysis (KPCA) provides a concise set of basis vectors which capture non-linear structures within large data sets, and is a central tool in data analysis and learning. To allow for non-linear relations, typically a full n x n kernel matrix is constructed over n data points, but this requires too much space and time for large values of n. Techniques such as the Nystrom method and random feature maps can help towards this goal, but they do not explicitly maintain the basis vectors in a stream and take more space than desired. We propose a new approach for streaming KPCA which maintains a small set of basis elements in a stream, requiring space only logarithmic in n, and also improves the dependence on the error parameter. Our technique combines random feature maps with recent advances in matrix sketching; it has guaranteed spectral norm error bounds with respect to the original kernel matrix, and it compares favorably in practice to state-of-the-art approaches.
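The random-feature half of this pipeline is standard and easy to sketch. Below is a minimal Rahimi-Recht style random Fourier feature map for the Gaussian kernel; the rows it produces could then be fed to any row-update matrix sketch, which is the combination the abstract describes (parameter names are mine, not the paper's):

```python
import numpy as np

def rff_features(X, n_features=256, gamma=1.0, rng=None):
    """Random Fourier features z(x) with E[z(x).z(y)] = exp(-gamma ||x-y||^2).

    Drawing w ~ N(0, 2*gamma*I) makes E[cos(w.(x-y))] the Gaussian kernel;
    the cos(.)/sqrt(n_features/2) scaling makes inner products unbiased.
    """
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```

In the streaming setting, each arriving data point is lifted to its feature row and handed to the sketch, so the n x n kernel matrix is never materialized.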

  • Geometric Inference on Kernel Density Estimates. (Project Website)
    By Jeff M. Phillips,   Bei Wang,   Yan Zheng,   
    In Proceedings of Symposium on Computational Geometry (COMPGEOM 2015),  pages 857-871,  2015.
    Abstract
  • Subsampling in Smoothed Range Spaces. (Project Website)
    By Jeff M. Phillips,   Yan Zheng,   
    In Proceedings of ALT (ALT 2015),  pages 224-238,  2015.
    Abstract
  • L∞ Error and Bandwidth Selection for Kernel Density Estimates of Large Data. (Project Website)
    By Yan Zheng,   Jeff M. Phillips,   
    In Proceedings of KDD (KDD 2015),  pages 1533-1542,  2015.
    Abstract
2014

  • Relative Errors for Deterministic Low-Rank Matrix Approximations. (Project Website)
    By Mina Ghashami,   Jeff M. Phillips,   
    In Proceedings of SODA (SODA 2014),  pages 707-717,  2014.
    Abstract
2013

  • Nearest Neighbor Searching Under Uncertainty II
    By Pankaj K. Agarwal,    Boris Aronov,    Sariel Har-Peled,    Jeff M. Phillips,    Ke Yi,    and Wuzhou Zhang
    (To Appear) In Proceedings of 32nd ACM Symposium on Principles of Database Systems (PODS),  pages ??-??,  June,  2013.
    Abstract

Nearest-neighbor search (NN), which returns the nearest neighbor of a query point in a set of points, is an important and widely studied problem in many fields, and it has a wide range of applications. In many of them, such as sensor databases, location-based services, face recognition, and mobile data, the location of data is imprecise. We therefore study nearest neighbor queries in a probabilistic framework in which the location of each input point is specified as a probability density function. We present efficient algorithms for (i) computing all points that are nearest neighbors of a query point with nonzero probability; (ii) estimating, within a specified additive error, the probability of a point being the nearest neighbor of a query point; (iii) using these estimates to return the point that maximizes the probability of being the nearest neighbor, or all points whose probability of being the NN exceeds a given threshold. We also present a few experimental results to demonstrate the effectiveness of our approach.
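A brute-force Monte Carlo version of query (ii) for discretely distributed locations is easy to state and useful as a correctness baseline. This is not the paper's efficient algorithm; uniform location weights are assumed for brevity.

```python
import numpy as np

def nn_probabilities(uncertain_points, query, trials=5000, rng=None):
    """Estimate the probability each uncertain point is the NN of `query`.

    uncertain_points: list of (k_i x d) arrays, rows equally likely locations.
    Returns an array of per-point probabilities summing to 1.
    """
    rng = np.random.default_rng(rng)
    wins = np.zeros(len(uncertain_points))
    for _ in range(trials):
        # Instantiate every point, then record which one is closest.
        dists = [np.linalg.norm(pts[rng.integers(len(pts))] - query)
                 for pts in uncertain_points]
        wins[np.argmin(dists)] += 1
    return wins / trials
```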

  • Range Counting Coresets for Uncertain Data
    By Amirali Abdullah,    Samira Daruki,    and Jeff M. Phillips
    (To Appear) In Proceedings of 29th Annual ACM Symposium on Computational Geometry (SOCG),  pages ??-??,  June,  2013.
    Abstract

We study coresets for various types of range counting queries on uncertain data. In our model each uncertain point has a probability density describing its location, sometimes defined as k distinct locations. Our goal is to construct a subset of the uncertain points, including their locational uncertainty, so that range counting queries can be answered by just examining this subset. We study three distinct types of queries. RE queries return the expected number of points in a query range. RC queries return the number of points in the range with probability at least a threshold. RQ queries return the probability that fewer than some threshold fraction of the points are in the range. In both RC and RQ coresets the threshold is provided as part of the query. And for each type of query we provide coreset constructions with approximation-size tradeoffs. We show that random sampling can be used to construct each type of coreset, and we also provide significantly improved bounds using discrepancy-based approaches on axis-aligned range queries.
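As the abstract notes, random sampling already yields an RE coreset. A minimal sketch of that sampling baseline for axis-aligned boxes (uniform weights per location and the helper name are my choices):

```python
import numpy as np

def re_query_estimate(uncertain_points, lo, hi, sample_size=None, rng=None):
    """Estimate the expected number of uncertain points in the box [lo, hi].

    uncertain_points: list of (k_i x d) arrays, rows equally likely locations.
    With sample_size set, a random subset is used and the result rescaled,
    i.e. the subset acts as a sampling-based RE coreset.
    """
    rng = np.random.default_rng(rng)
    n = len(uncertain_points)
    if sample_size is None or sample_size >= n:
        chosen, scale = uncertain_points, 1.0
    else:
        idx = rng.choice(n, size=sample_size, replace=False)
        chosen, scale = [uncertain_points[i] for i in idx], n / sample_size
    total = 0.0
    for pts in chosen:
        inside = np.all((pts >= lo) & (pts <= hi), axis=1)
        total += inside.mean()  # this point's probability of lying in the box
    return scale * total
```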

  • Quality and Efficiency for Kernel Density Estimates in Large Data, Talk
    By Yan Zheng,    Jeffrey Jestes,    Jeff M. Phillips,    Feifei Li
    In Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD 2013),  pages 433-444,  June,  2013.
    Abstract

Kernel density estimates are important for a broad variety of applications including media databases, pattern recognition, computer vision, data mining, and the sciences. Their construction has been well-studied, but existing techniques are expensive on massive datasets and/or only provide heuristic approximations without theoretical guarantees. We propose randomized and deterministic algorithms with quality guarantees which are orders of magnitude more efficient than previous algorithms. Our algorithms do not require knowledge of the kernel or its bandwidth parameter and are easily parallelizable. We demonstrate how to implement our ideas in a centralized setting and in MapReduce, although our algorithms are applicable to any large-scale data processing framework. Extensive experiments on large real datasets demonstrate the quality, efficiency, and scalability of our techniques.
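A tiny experiment in the spirit of this line of work: measure the worst-case gap, over a query set, between the KDE of the full data and the KDE of a random subset. Plain random sampling is used here as the simplest baseline, not the paper's improved constructions; function names are mine.

```python
import numpy as np

def kde(data, queries, bandwidth):
    """Gaussian kernel density estimate of `data` evaluated at `queries`."""
    d2 = ((queries[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth**2)).mean(axis=1)

def kde_sample_error(data, sample_size, queries, bandwidth, rng=None):
    """Max gap over `queries` between full-data KDE and a random-subset KDE.

    This is the quantity that coreset constructions for KDEs bound.
    """
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(data), size=sample_size, replace=False)
    return np.abs(kde(data, queries, bandwidth)
                  - kde(data[idx], queries, bandwidth)).max()
```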

  • Radio Tomographic Imaging and Tracking of Stationary and Moving People via Kernel Distance
    By Yang Zhao,    Neal Patwari,    Jeff M. Phillips,    and Suresh Venkatasubramanian
    (To Appear) In Proceedings of 12th ACM-IEEE Conference on Information Processing in Sensor Networks (IPSN),  pages ??-??,  April,  2013.
    Abstract

    Network radio frequency (RF) environment sensing (NRES) systems pinpoint and track people in buildings using changes in the signal strength measurements made by a wireless sensor network. It has been shown that such systems can locate people who do not participate in the system by wearing any radio device, even through walls, because of the changes that moving people cause to the static wireless sensor network. However, many such systems cannot locate stationary people. We present and evaluate a system which can locate stationary or moving people, without calibration, by using kernel distance to quantify the difference between two histograms of signal strength measurements. From five experiments, we show that our kernel distance-based radio tomographic localization system performs better than the state-of-the-art NRES systems in different non line-of-sight environments.

  • eps-Samples for Kernels
    By Jeff M. Phillips
In Proceedings of 24th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA),  pages 1622-1632,  January,  2013.
    Abstract

    We study the worst case error of kernel density estimates via subset approximation. A kernel density estimate of a distribution is the convolution of that distribution with a fixed kernel (e.g. Gaussian kernel). Given a subset (i.e. a point set) of the input distribution, we can compare the kernel density estimates of the input distribution with that of the subset and bound the worst case error. If the maximum error is eps, then this subset can be thought of as an eps-sample (aka an eps-approximation) of the range space defined with the input distribution as the ground set and the fixed kernel representing the family of ranges. Interestingly, in this case the ranges are not binary, but have a continuous range (for simplicity we focus on kernels with range of [0,1]); these allow for smoother notions of range spaces. It turns out, the use of this smoother family of range spaces has an added benefit of greatly decreasing the size required for eps-samples. For instance, in the plane the size is O((1/eps^{4/3}) log^{2/3}(1/eps)) for disks (based on VC-dimension arguments) but is only O((1/eps) sqrt{log (1/eps)}) for Gaussian kernels and for kernels with bounded slope that only affect a bounded domain. These bounds are accomplished by studying the discrepancy of these "kernel" range spaces, and here the improvement in bounds are even more pronounced. In the plane, we show the discrepancy is O(sqrt{log n}) for these kernels, whereas for balls there is a lower bound of Omega(n^{1/4}).

2012

  • Efficient Protocols for Distributed Classification and Optimization
    By Hal Daume III,    Jeff M. Phillips,    Avishek Saha,    and Suresh Venkatasubramanian
    In Proceedings of 23rd International Conference on Algorithmic Learning Theory (ALT),  pages ??-??,  October,  2012.
    Abstract

In distributed learning, the goal is to perform a learning task over data distributed across multiple nodes with minimal (expensive) communication. Prior work (Daumé III et al., 2012) proposes a general model that bounds the communication required for learning classifiers while allowing for ε training error on linearly separable data adversarially distributed across nodes. In this work, we develop key improvements and extensions to this basic model. Our first result is a two-party multiplicative-weight-update based protocol that uses O(d^2 log 1/ε) words of communication to classify distributed data in arbitrary dimension d, ε-optimally. This readily extends to classification over k nodes with O(k d^2 log 1/ε) words of communication. Our proposed protocol is simple to implement and is considerably more efficient than the baselines we compare against, as demonstrated by our empirical results. In addition, we illustrate general algorithm design paradigms for doing efficient learning over distributed data. We show how to solve fixed-dimensional and high-dimensional linear programming efficiently in a distributed setting where constraints may be distributed across nodes. Since many learning problems can be viewed as convex optimization problems where constraints are generated by individual points, this models many typical distributed learning scenarios. Our techniques make use of a novel connection from multipass streaming, as well as adapting the multiplicative-weight-update framework more generally to a distributed setting. As a consequence, our methods extend to the wide range of problems solvable using these techniques.

  • Ranking Large Temporal Data, Talk
By Jeffrey Jestes,    Jeff M. Phillips,    Feifei Li,    Mingwang Tang
In Proceedings of 38th International Conference on Very Large Databases (VLDB 2012),  pages 1412-1423,  Istanbul, Turkey,  August,  2012.
    Abstract

Ranking temporal data has not been studied until recently [14], even though ranking is an important operator (being promoted as a first-class citizen) in database systems [8]. However, only the instant top-k queries on temporal data were studied in [14], where objects with the k highest scores at a query time instance t are to be retrieved. The instant top-k definition clearly comes with limitations (sensitive to outliers, difficult to choose a meaningful query time t). A more flexible and general ranking operation is to rank objects based on the aggregation of their scores in a query interval, which we dub the aggregate top-k query on temporal data. For example, return the top-10 weather stations having the highest average temperature from 10/01/2010 to 10/07/2010; find the top-20 stocks having the largest total transaction volumes from 02/05/2011 to 02/07/2011. This work presents a comprehensive study of this problem by designing both exact and approximate methods (with approximation quality guarantees). We also provide theoretical analysis on the construction cost, the index size, the update and the query costs of each approach. Extensive experiments on large real datasets clearly demonstrate the efficiency, the effectiveness, and the scalability of our methods compared to the baseline methods.
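For a sum (or average) aggregate, the exact aggregate top-k reduces to per-object prefix sums, which is the natural in-memory baseline that indexing approaches like the paper's improve on for large or disk-resident data. A minimal sketch (function name mine):

```python
import numpy as np

def aggregate_topk(scores, t1, t2, k):
    """Exact aggregate (sum) top-k over the query interval [t1, t2].

    scores: (n_objects x n_timestamps) array of per-timestamp scores.
    Returns [(object_id, aggregate_score), ...] for the top k objects.
    """
    prefix = np.cumsum(scores, axis=1)
    # Interval sum via two prefix-sum lookups per object.
    totals = prefix[:, t2] - (prefix[:, t1 - 1] if t1 > 0 else 0)
    top = np.argsort(totals)[::-1][:k]
    return list(zip(top.tolist(), totals[top].tolist()))
```

The O(1)-per-object query after O(n * T) preprocessing illustrates why approximate indexes are needed only once the number of objects or timestamps becomes very large.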

  • Mergeable Summaries
    By Pankaj K. Agarwal,    Graham Cormode,    Zengfeng Huang,    Jeff M. Phillips,    Zhewei Wei,    and Ke Yi
(To Appear) In Proceedings of ACM Symposium on Principles of Database Systems (PODS 2012),  pages ??-??,  May,  2012.
    Abstract

We study the mergeability of data summaries. Informally speaking, mergeability requires that, given two summaries on two data sets, there is a way to merge the two summaries into a summary on the two data sets combined together, while preserving the error and size guarantees. This property means that the summary can be treated like other algebraic objects such as sum and max, which is especially useful for computing summaries on massive distributed data. Many data summaries are trivially mergeable by construction, most notably those based on linear transformations. But some other fundamental ones like those for heavy hitters and quantiles, are not (known to be) mergeable. In this paper, we demonstrate that these summaries are indeed mergeable or can be made mergeable after appropriate modifications. Specifically, we show that for eps-approximate heavy hitters, there is a deterministic mergeable summary of size O(1/eps); for eps-approximate quantiles, there is a deterministic summary of size O((1/eps) log(eps n)) that has a restricted form of mergeability, and a randomized one of size O((1/eps) log^{3/2}(1/eps)) with full mergeability. We also extend our results to geometric summaries such as eps-approximations and eps-kernels.
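The heavy-hitters result is concrete enough to sketch: Misra-Gries summaries merge by adding the counters and then subtracting the (k+1)-st largest count, which is the mergeability argument this paper makes. A minimal dictionary-based sketch (helper names mine):

```python
from collections import Counter

def mg_update(summary, item, k):
    """Misra-Gries summary with at most k counters.

    Estimates any item's count within an additive n/(k+1) undercount.
    """
    summary[item] = summary.get(item, 0) + 1
    if len(summary) > k:
        # Decrement all counters by the minimum and drop the empties.
        dec = min(summary.values())
        for key in list(summary):
            summary[key] -= dec
            if summary[key] <= 0:
                del summary[key]
    return summary

def mg_merge(s1, s2, k):
    """Merge two MG summaries: add counters, subtract the (k+1)-st largest."""
    merged = Counter(s1) + Counter(s2)
    counts = sorted(merged.values(), reverse=True)
    dec = counts[k] if len(counts) > k else 0
    return {key: c - dec for key, c in merged.items() if c - dec > 0}
```

After the subtraction at most k counters stay positive, and the total undercount of the merged summary remains bounded by n/(k+1) for the combined stream length n, which is exactly the preserved-error property mergeability demands.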

  • Efficient Threshold Monitoring for Distributed Probabilistic Data, Talk
    By Mingwang Tang,    Feifei Li,    Jeff M. Phillips,    Jeffrey Jestes
    In Proceedings of 28th IEEE International Conference on Data Engineering (ICDE 2012),  pages 1120-1131,  Washington DC,  April,  2012.
    Abstract

In distributed data management, a primary concern is monitoring the distributed data and generating an alarm when a user specified constraint is violated. A particularly useful instance is the threshold-based constraint, which is commonly known as the distributed threshold monitoring problem. This work extends this useful and fundamental study to distributed probabilistic data that emerge in many applications, where uncertainty naturally exists when massive amounts of data are produced at multiple sources in distributed, networked locations. Examples include distributed observing stations, large sensor fields, geographically separate scientific institutes/units and many more. When dealing with probabilistic data, there are two thresholds involved, the score and the probability thresholds. One must monitor both simultaneously; as such, techniques developed for deterministic data are no longer directly applicable. This work presents a comprehensive study of this problem. Our algorithms have significantly outperformed the baseline method in terms of both the communication cost (number of messages and bytes) and the running time, as shown by an extensive experimental evaluation using several real, large datasets.

  • Protocols for Learning Classifiers on Distributed Data
    By Hal Daume III,    Jeff M. Phillips,    Avishek Saha,    Suresh Venkatasubramanian
In Proceedings of 15th International Conference on Artificial Intelligence and Statistics (AISTATS 2012),  pages ??-??,  La Palma, Canary Islands,  April,  2012.
    Abstract

    We consider the problem of learning classifiers for labeled data that has been distributed across several nodes. Our goal is to find a single classifier, with small approximation error, across all datasets while minimizing the communication between nodes. This setting models real-world communication bottlenecks in the processing of massive distributed datasets. We present several very general sampling-based solutions as well as some two-way protocols which have a provable exponential speed-up over any one-way protocol. We focus on core problems for noiseless data distributed across two or more nodes. The techniques we introduce are reminiscent of active learning, but rather than actively probing labels, nodes actively communicate with each other, each node simultaneously learning the important data from another node.

  • Lower Bounds for Number-in-Hand Multiparty Communication Complexity, Made Easy
    By Jeff M. Phillips,    Elad Verbin,    and Qin Zhang
In Proceedings of 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2012),  pages ??-??,  January,  2012.
    Abstract

We consider a model for multiparty communication complexity which we call private message multiparty communication complexity, essentially multiparty communication complexity with the card-in-hand model but no blackboard. That is, any communication between two players is not witnessed by any other players. Techniques developed for this model provide simpler lower bounds in other models as well. Furthermore, this model has applications for proving lower bounds for the distributed sensing framework.

  • Uncertainty Visualization in HARDI based on Ensembles of ODFs
    By Fangxiang Jiao,    Jeff M. Phillips,    Yaniv Gur,    and Chris R. Johnson
    In Proceedings of 5th IEEE Pacific Visualization Symposium (PACIFICVIS),  pages ??-??,  Songdo, Korea,  February,  2012.
    Abstract

    In this paper, we propose a new and accurate technique for uncertainty analysis and uncertainty visualization based on fiber orientation distribution function (ODF) glyphs, associated with high angular resolution diffusion imaging (HARDI). Our visualization applies volume rendering techniques to an ensemble of 3D ODF glyphs, which we call SIP functions of diffusion shapes, to capture their variability due to underlying uncertainty. This rendering elucidates the complex heteroscedastic structural variation in these shapes. Furthermore, we quantify the extent of this variation by measuring the fraction of the volume of these shapes, which is consistent across all noise levels, the certain volume ratio. Our uncertainty analysis and visualization framework is then applied to synthetic data, as well as to HARDI human-brain data, to study the impact of various image acquisition parameters and background noise levels on the diffusion shapes.

2011

  • Geometric Computations on Indecisive Points
By Allan G. Jørgensen,    Maarten Löffler,    Jeff M. Phillips
In Proceedings of 12th Algorithms and Data Structure Symposium (WADS 2011),  pages 536-547,  New York City, New York, USA,  August,  2011.
    Abstract

We study computing with indecisive point sets. Such points have spatial uncertainty where the true location is one of a finite number of possible locations. This data arises from probing distributions a few times or when the location is one of a few locations from a known database. In particular, we study computing distributions of geometric functions such as the radius of the smallest enclosing ball and the diameter. Surprisingly, we can compute the distribution of the radius of the smallest enclosing ball exactly in polynomial time, but computing the same distribution for the diameter is #P-hard. We generalize our polynomial-time algorithm to all LP-type problems. We also utilize our indecisive framework to deterministically and approximately compute on a more general class of uncertain data where the location of each point is given by a probability distribution.
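For intuition, the diameter distribution can be brute-forced over all instantiations, and the #P-hardness result above suggests this exponential enumeration cannot be avoided efficiently in general. A tiny sketch of the exhaustive computation (uniform location weights assumed; names mine):

```python
from itertools import product

import numpy as np

def diameter_distribution(indecisive):
    """Distribution of the diameter over all instantiations.

    indecisive: list of (k_i x d) arrays, rows equally likely locations.
    Enumerates every instantiation (exponential), so only for tiny inputs.
    """
    diams = []
    for choice in product(*[range(len(p)) for p in indecisive]):
        locs = np.array([p[c] for p, c in zip(indecisive, choice)])
        d2 = ((locs[:, None, :] - locs[None, :, :]) ** 2).sum(-1)
        diams.append(np.sqrt(d2.max()))
    return np.sort(np.array(diams))  # each outcome equally likely
```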

  • Horoball Hulls and Extents in Positive Definite Space
    By P. Thomas Fletcher,    John Moeller,    Jeff M. Phillips,    Suresh Venkatasubramanian
In Proceedings of 12th Algorithms and Data Structure Symposium (WADS 2011),  pages 386-398,  New York City, New York, USA,  August,  2011.
    Abstract

The space of positive definite matrices P(n) is a Riemannian manifold with variable nonpositive curvature. It includes Euclidean space and hyperbolic space as submanifolds, and poses significant challenges for the design of algorithms for data analysis. In this paper, we develop foundational geometric structures and algorithms for analyzing collections of such matrices. A key technical contribution of this work is the use of horoballs, a natural generalization of halfspaces for non-positively curved Riemannian manifolds. We propose generalizations of the notion of a convex hull and a center point and approximations of these structures using horoballs and based on novel decompositions of P(n). This leads to an algorithm for approximate hulls using a generalization of extents.

  • Comparing Distributions and Shapes Using the Kernel Distance
    By Sarang Joshi,    Raj Varma Kommaraju,    Jeff M. Phillips,    Suresh Venkatasubramanian
    In Proceedings of 27th Annual Symposium on Computational Geometry (SOCG 2011),  pages 47-56,  Paris, France,  June,  2011.
    Abstract

    Starting with a similarity function between objects, it is possible to define a distance metric (the kernel distance) on pairs of objects, and more generally on probability distributions over them. These distance metrics have a deep basis in functional analysis and geometric measure theory, and have a rich structure that includes an isometric embedding into a Hilbert space. They have recently been applied to numerous problems in machine learning and shape analysis. In this paper, we provide the first algorithmic analysis of these distance metrics. Our main contributions are as follows: We present fast approximation algorithms for computing the kernel distance between two point sets P and Q that run in near-linear time in the size of P ∪ Q (an explicit calculation would take quadratic time). We present polynomial-time algorithms for approximately minimizing the kernel distance under rigid transformation; they run in time O(n poly(1/eps, log n)). We provide several general techniques for reducing complex objects to convenient sparse representations (specifically to point sets or sets of point sets) which approximately preserve the kernel distance. In particular, this allows us to reduce problems of computing the kernel distance between various types of objects such as curves, surfaces, and distributions to computing the kernel distance between point sets.
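The explicit quadratic-time computation that the abstract contrasts with its near-linear algorithms follows directly from the definition of the kernel distance; a minimal sketch, using a Gaussian similarity with a hypothetical bandwidth parameter `sigma`:

```python
import math

def gauss_kernel(p, q, sigma=1.0):
    """Gaussian similarity K(p, q) = exp(-||p - q||^2 / sigma^2)."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-d2 / sigma ** 2)

def avg_similarity(P, Q, sigma=1.0):
    """kappa(P, Q): average pairwise similarity between two point sets."""
    return sum(gauss_kernel(p, q, sigma) for p in P for q in Q) / (len(P) * len(Q))

def kernel_distance(P, Q, sigma=1.0):
    """D_K(P, Q)^2 = kappa(P,P) + kappa(Q,Q) - 2 kappa(P,Q); a metric on point sets."""
    d2 = (avg_similarity(P, P, sigma) + avg_similarity(Q, Q, sigma)
          - 2 * avg_similarity(P, Q, sigma))
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding error
```

This is the brute-force baseline only; the paper's contribution is approximating it in near-linear time.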

  • Spatially-Aware Comparison and Consensus for Clusterings
    By Jeff M. Phillips,    Parasaran Raman,    and Suresh Venkatasubramanian
    In Proceedings of 10th SIAM International Conference on Data Mining (SDM 2011),  pages 307-318,  Mesa, Arizona, USA,   April,  2011.
    Abstract

    This paper proposes a new distance metric between clusterings that incorporates information about the spatial distribution of points and clusters. Our approach builds on the idea of a Hilbert space-based representation of clusters as a combination of the representations of their constituent points. We use this representation and the underlying metric to design a spatially-aware consensus clustering procedure. This consensus procedure is implemented via a novel reduction to Euclidean clustering, and is both simple and efficient. All of our results apply to both soft and hard clusterings. We accompany these algorithms with a detailed experimental evaluation that demonstrates the efficiency and quality of our techniques.

  • (Approximate) Uncertain Skylines
    By Peyman Afshani,    Pankaj K. Agarwal,    Lars Arge,    Kasper Green Larsen,    and Jeff M. Phillips
    In Proceedings of 14th International Conference on Database Theory (ICDT 2011),  pages 186-196,  Uppsala, Sweden,  March,  2011.
    Abstract

    Given a set of points with uncertain locations, we consider the problem of computing the probability of each point lying on the skyline, that is, the probability that it is not dominated by any other input point. If each point's uncertainty is described as a probability distribution over a discrete set of locations, we improve the best known exact solution. We also suggest why we believe our solution might be optimal. Next, we describe simple, near-linear time approximation algorithms for computing the probability of each point lying on the skyline. In addition, some of our methods can be adapted to construct data structures that can efficiently determine the probability of a query point lying on the skyline.
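The brute-force baseline that the paper improves on can be sketched directly from the definition; here each point carries a discrete location distribution, points are assumed independent, and domination follows the minimization convention (conventions vary):

```python
def skyline_prob(points):
    """points[i] = list of (prob, (x, y)) pairs: a discrete location distribution.
    Returns Pr[point i lies on the skyline], assuming independent points.
    Minimization convention: q dominates p when q.x <= p.x, q.y <= p.y, q != p."""
    def dominates(q, p):
        return q[0] <= p[0] and q[1] <= p[1] and q != p

    out = []
    for i, Pi in enumerate(points):
        prob = 0.0
        for pr, loc in Pi:
            survive = pr
            for j, Pj in enumerate(points):
                if j == i:
                    continue
                # probability that point j does NOT dominate this location
                survive *= sum(q_pr for q_pr, q in Pj if not dominates(q, loc))
            prob += survive
        out.append(prob)
    return out
```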

  • 2010

  • Johnson-Lindenstrauss Dimensionality Reduction on the Simplex
    By Rasmus J. Kyng,    Jeff M. Phillips,    Suresh Venkatasubramanian
    In Proceedings of the 20th Fall Workshop on Computational Geometry (FWCG 2010),  pages 1-4,  NY, USA,  October,  2010.
    Abstract

    We propose an algorithm for dimensionality reduction on the simplex, mapping a set of high-dimensional distributions to a space of lower-dimensional distributions, whilst approximately preserving pairwise Hellinger distance between distributions. By introducing a restriction on the input data to distributions that are in some sense quite smooth, we can map $n$ points on the $d$-simplex to the simplex of $O(eps^{-2}log n)$ dimensions with $eps$-distortion with high probability. The techniques used rely on a classical result by Johnson and Lindenstrauss on dimensionality reduction for Euclidean point sets and require the same number of random bits as non-sparse methods proposed by Achlioptas for database-friendly dimensionality reduction.
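The underlying Johnson-Lindenstrauss step is a dense random-sign projection in the style of Achlioptas; a minimal sketch of that classical building block (not of this paper's simplex-specific mapping), where the choice of `k = O(eps^-2 log n)` is left to the caller:

```python
import math
import random

def jl_project(points, k, seed=0):
    """Project d-dimensional points to k dimensions with a random +/-1 matrix
    scaled by 1/sqrt(k) (Achlioptas-style). Pairwise Euclidean distances are
    preserved up to (1 +/- eps) with high probability for k = O(eps^-2 log n)."""
    rng = random.Random(seed)
    d = len(points[0])
    # One random sign matrix, shared by all points.
    R = [[rng.choice((-1.0, 1.0)) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [[scale * sum(r[j] * p[j] for j in range(d)) for r in R] for p in points]
```

The projection is linear, so scaling an input point scales its image identically.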

  • Stability of epsilon-Kernels
    By Pankaj K. Agarwal,    Jeff M. Phillips,    Hai Yu
    In Proceedings of 18th Annual European Symposium on Algorithms (ESA 2010),  pages 487-499,  September,  2010.
    Abstract

    Given a set P of n points in R^d, an eps-kernel K subset P approximates the directional width of P in every direction within a relative (1-eps) factor. In this paper we study the stability of eps-kernels under dynamic insertion and deletion of points to P and by changing the approximation factor eps. In the first case, we say an algorithm for dynamically maintaining an eps-kernel is stable if at most O(1) points change in K as one point is inserted or deleted from P. We describe an algorithm to maintain an eps-kernel of size O(1/eps^{(d-1)/2}) in O(1/eps^{(d-1)/2} log n) time per update. Not only does our algorithm maintain a stable eps-kernel, its update time is faster than that of any known algorithm that maintains an eps-kernel of size O(1/eps^{(d-1)/2}). Next, we show that if there is an eps-kernel of P of size k, which may be dramatically less than O(1/eps^{(d-1)/2}), then there is an (eps/2)-kernel of P of size O(min {1/eps^{(d-1)/2}, k^{floor(d/2)} log^{d-2} (1/eps)}). Moreover, there exists a point set P in R^d and a parameter eps > 0 such that if every eps-kernel of P has size at least k, then any (eps/2)-kernel of P has size Omega(k^{floor(d/2)}).
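The eps-kernel property itself is easy to spot-check in the plane by comparing directional widths of K and P over sampled directions. A sketch of such a certificate check (this verifies the property on sampled directions only; it is not the paper's maintenance algorithm):

```python
import math
import random

def width(P, u):
    """Directional width of P along direction u: max <p,u> - min <p,u>."""
    dots = [sum(a * b for a, b in zip(p, u)) for p in P]
    return max(dots) - min(dots)

def is_eps_kernel(K, P, eps, ndirs=200, seed=1):
    """Spot-check width(K, u) >= (1 - eps) * width(P, u) over random unit
    directions in the plane; a sampled certificate, not a proof."""
    rng = random.Random(seed)
    for _ in range(ndirs):
        t = rng.uniform(0.0, 2.0 * math.pi)
        u = (math.cos(t), math.sin(t))
        if width(K, u) < (1.0 - eps) * width(P, u):
            return False
    return True
```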

  • Universal Multi-Dimensional Scaling
    By Arvind Agarwal,    Jeff M. Phillips,    Suresh Venkatasubramanian
    In Proceedings of 16th Annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD),  pages ??-??,  Washington, DC , USA,  August,  2010.
    Abstract

    In this paper, we propose a unified algorithmic framework for solving many known variants of MDS. Our algorithm is a simple iterative scheme with guaranteed convergence, and is modular; by changing the internals of a single subroutine in the algorithm, we can switch cost functions and target spaces easily. In addition to the formal guarantees of convergence, our algorithms are accurate; in most cases, they converge to better quality solutions than existing methods in comparable time. Moreover, they have a small memory footprint and scale effectively for large data sets. We expect that this framework will be useful for a number of MDS variants that have not yet been studied. Our framework extends to embedding high-dimensional points lying on a sphere to points on a lower dimensional sphere, preserving geodesic distances. As a complement to this result, we also extend the Johnson-Lindenstrauss Lemma to this spherical setting, by showing that projecting to a random O((1/epsilon) log n)-dimensional sphere causes only an epsilon distortion in the geodesic distances.

  • Lipschitz Unimodal and Isotonic Regression on Paths and Trees
    By Pankaj K. Agarwal,    Jeff M. Phillips,    Bardia Sadri
    In Proceedings of 9th Latin American Theoretical Informatics Symposium (LATIN 2010),  pages 384-396,  Oaxaca, Mexico,  April,  2010.
    Abstract

    We describe algorithms for finding the regression of t, a sequence of values, to the closest sequence s by mean squared error, so that s is always increasing (isotonicity) and so the values of two consecutive points do not increase by too much (Lipschitz). The isotonicity constraint can be replaced with a unimodality constraint, where there is exactly one local maximum in s. These algorithms are generalized from sequences of values to trees of values. For each scenario we describe near-linear time algorithms.
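The plain L2 isotonic regression that this paper builds on is the classic pool-adjacent-violators algorithm; a minimal sketch (without the Lipschitz constraint the paper adds):

```python
def isotonic_l2(t):
    """Pool Adjacent Violators: the closest nondecreasing sequence to t under
    mean squared error. Each block stores [sum, count]; adjacent blocks merge
    while their means violate monotonicity."""
    blocks = []  # list of [total, count]
    for v in t:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the last block's mean
        # (compared via cross-multiplication to avoid division).
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)  # each merged block is replaced by its mean
    return out
```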

  • 2009

  • Shape Fitting on Point Sets with Probability Distributions
    By Maarten Loffler,    Jeff M. Phillips
    In Proceedings of 17th Annual European Symposium on Algorithms (ESA 2009),  pages 313-324,  Copenhagen, Denmark,  September,  2009.
    Abstract

    We consider problems on data sets where each data point has uncertainty described by an individual probability distribution. We develop several frameworks and algorithms for calculating statistics on these uncertain data sets. Our examples focus on geometric shape fitting problems. We prove approximation guarantees for the algorithms with respect to the full probability distributions. We then empirically demonstrate that our algorithms are simple and practical, solving for a constant hidden by asymptotic analysis so that a user can reliably trade speed and size for accuracy.

  • 2008

  • An Efficient Algorithm for Euclidean 2-Center with Outliers
    By Pankaj K. Agarwal,    Jeff M. Phillips
    In Proceedings of 16th Annual European Symposium on Algorithms (ESA 2008),  pages 64-75,  Universität Karlsruhe, Germany,  September,  2008.
    Abstract

    For a set P of n points in R^2, the Euclidean 2-center problem computes a pair of congruent disks of the minimal radius that cover P. We extend this to the (2,k)-center problem where we compute the minimal radius pair of congruent disks to cover n-k points of P. We present a randomized algorithm with O(nk^7 log^3 n) expected running time for the (2,k)-center problem. We also study the (p,k)-center problem in R^2 under the l_infty metric. We give solutions for p = 4 in O(k^{O(1)} n log n) time and for p = 5 in O(k^{O(1)} n log^5 n) time.

  • Algorithms for epsilon-Approximations of Terrains
    By Jeff M. Phillips
    In Proceedings of 35th International Colloquium on Automata, Languages, and Programming (ICALP),  pages 447-458,  Reykjavik, Iceland,  July ,  2008.
    Abstract

    Consider a point set D with a measure function mu : D -> R. Let A be the set of subsets of D induced by containment in a shape from some geometric family (e.g. axis-aligned rectangles, half planes, balls, k-oriented polygons). We say a range space (D, A) has an epsilon-approximation P if max_{R in A} | mu(R cap P)/mu(P) - mu(R cap D)/mu(D) | <= epsilon. We describe algorithms for deterministically constructing discrete epsilon-approximations for continuous point sets such as distributions or terrains. Furthermore, for certain families of subsets A, such as those described by axis-aligned rectangles, we reduce the size of the epsilon-approximations by almost a square root from O(1/epsilon^2 log(1/epsilon)) to O((1/epsilon) polylog(1/epsilon)). This is often the first step in transforming a continuous problem into a discrete one for which combinatorial techniques can be applied. We describe applications of this result in geo-spatial analysis, biosurveillance, and sensor networks.

  • Spatial Scan Statistics for Graph Clustering
    By Bei Wang,    Jeff M. Phillips,    Robert Schreiber,    Dennis Wilkinson,    Nina Mishra,    Robert Tarjan
    In Proceedings of 8th SIAM International Conference on Data Mining (SDM 2008),  pages 727-738,  Atlanta, Georgia, USA,  April,  2008.
    Abstract

    In this paper, we present a measure associated with detection and inference of statistically anomalous clusters of a graph based on the likelihood test of observed and expected edges in a subgraph. This measure is adapted from spatial scan statistics for point sets and provides quantitative assessment for clusters. We discuss some important properties of this statistic and its relation to modularity and Bregman divergences. We apply a simple clustering algorithm to find clusters with large values of this measure in a variety of real-world data sets, and we illustrate its ability to identify statistically significant clusters of selected granularity.

  • 2007

  • Value-Based Notification Conditions in Large-Scale Publish/Subscribe Systems
    By Badrish Chandramouli,    Jeff M. Phillips,    Jun Yang
    In Proceedings of 33rd Intenational Conference on Very Large Data Bases (VLDB 2007),  pages 878-889,  Vienna, Austria,  September,  2007.
    Abstract

    We address the problem of providing scalable support for subscriptions with personalized value-based notification conditions in wide-area publish/subscribe systems. Notification conditions can be fine-tuned by subscribers, allowing precise and flexible control of when events are delivered to the subscribers. For example, a user may specify that she should be notified if and only if the price of a particular stock moves outside a "radius" around her last notified value. Naive techniques for handling notification conditions are not scalable. It is challenging to share subscription processing and notification dissemination of subscriptions with personalized value-based notification conditions, because two subscriptions may see two completely different sequences of notifications even if they specify the same radius. We develop and experimentally evaluate scalable processing and dissemination techniques for these subscriptions. Our approach uses standard network substrates for notification dissemination, and avoids pushing complex application processing into the network. Compared with other alternatives, our approach generates orders of magnitude lower network traffic, and incurs lower server processing cost.

  • Outlier Robust ICP for Minimizing Fractional RMSD
    By Jeff M. Phillips,    Ran Liu,    Carlo Tomasi
    In Proceedings of 6th International Conference on 3-D Digital Imaging and Modeling (3DIM),  pages 427-434,  Montreal, Canada,  August,  2007.
    Abstract

    We describe a variation of the iterative closest point (ICP) algorithm for aligning two point sets under a set of transformations. Our algorithm is superior to previous algorithms because (1) in determining the optimal alignment, it identifies and discards likely outliers in a statistically robust manner, and (2) it is guaranteed to converge to a locally optimal solution. To this end, we formalize a new distance measure, fractional root mean squared distance (FRMSD), which incorporates the fraction of inliers into the distance function. Our framework can easily incorporate most techniques and heuristics from modern registration algorithms. We experimentally validate our algorithm against previous techniques on 2- and 3-dimensional data exposed to a variety of outlier types.
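Given the per-correspondence distances from one ICP iteration, the inlier fraction can be chosen by minimizing FRMSD over prefixes of the sorted distances; a sketch under the form FRMSD(f) = (1/f^lam) * RMSD of the f*n closest pairs, where the exponent value `lam=2.0` used here is an assumption, not the paper's setting:

```python
import math

def best_fraction(dists, lam=2.0):
    """Choose the inlier fraction f minimizing FRMSD(f) over prefixes of the
    sorted per-correspondence distances. Returns (f, FRMSD(f))."""
    d = sorted(dists)
    n = len(d)
    best_f, best_val, sq = 1.0, float("inf"), 0.0
    for k in range(1, n + 1):
        sq += d[k - 1] ** 2          # running sum of squared distances
        f = k / n
        val = (1.0 / f ** lam) * math.sqrt(sq / k)
        if val < best_val:
            best_f, best_val = f, val
    return best_f, best_val
```

The 1/f^lam penalty discourages trivially small inlier sets; a full ICP loop would re-estimate the transform from the chosen inliers and repeat.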

  • 2006

  • Spatial Scan Statistics: Approximations and Performance Study
    By Deepak Agarwal,    Andrew McGregor,    Jeff M. Phillips,    Suresh Venkatasubramanian,    Zhengyuan Zhu
    In Proceedings of 12th Annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006),  pages 24-33,  Philadelphia, USA,  August,  2006.
    Abstract

    Spatial scan statistics are used to determine hotspots in spatial data, and are widely used in epidemiology and biosurveillance. In recent years, there has been much effort invested in designing efficient algorithms for finding such "high discrepancy" regions, with methods ranging from fast heuristics for special cases, to general grid-based methods, and to efficient approximation algorithms with provable guarantees on performance and quality. In this paper, we make a number of contributions to the computational study of spatial scan statistics. First, we describe a simple exact algorithm for finding the largest discrepancy region in a domain. Second, we propose a new approximation algorithm for a large class of discrepancy functions (including the Kulldorff scan statistic) that improves the approximation versus runtime trade-off of prior methods. Third, we extend our simple exact and our approximation algorithms to data sets which lie naturally on a grid or are accumulated onto a grid. Fourth, we conduct a detailed experimental comparison of these methods with a number of known methods, demonstrating that our approximation algorithm has far superior performance in practice to prior methods, and exhibits a good performance-accuracy trade-off. All extant methods (including those in this paper) are suitable for data sets that are modestly sized; if data sets are of the order of millions of data points, none of these methods scale well. For such massive data settings, it is natural to examine whether small-space streaming algorithms might yield accurate answers. Here, we provide some negative results, showing that any streaming algorithms that even provide approximately optimal answers to the discrepancy maximization problem must use space linear in the input.
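The Kulldorff discrepancy targeted by the approximation algorithm can be written directly from the Poisson likelihood ratio; a sketch in normalized form, where m and b are the fractions of measured and baseline mass inside a candidate region:

```python
import math

def kulldorff(m, b):
    """Normalized Kulldorff scan statistic (Poisson likelihood ratio).
    m = fraction of measured points in the region, b = fraction of baseline
    mass in the region. Larger values indicate a more anomalous region;
    only excesses over the baseline (m > b) count."""
    if not (0.0 < b < 1.0) or m <= b:
        return 0.0
    if m >= 1.0:
        return math.log(1.0 / b)  # limit as m -> 1
    return m * math.log(m / b) + (1.0 - m) * math.log((1.0 - m) / (1.0 - b))
```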

  • Bipartite Matching under the RMS Distance
    By Pankaj K. Agarwal,    Jeff M. Phillips
    In Proceedings of 18th Canadian Conference on Computational Geometry (CCCG 2006),  pages ??-??,  Ontario, Canada,  August ,  2006.
    Abstract

    Given two sets A and B of n points each in R^2, we study the problem of computing a matching between A and B that minimizes the root mean square (rms) distance of matched pairs. We can compute an optimal matching in O(n^{2+delta}) time, for any delta > 0, and an epsilon-approximation in time O((n/epsilon)^{3/2} log^6 n). If the set B is allowed to move rigidly to minimize the rms distance, we can compute a rigid motion of B and a matching in O((n^4/epsilon^{5/2}) log^6 n) time whose cost is within a (1+epsilon) factor of the optimal one.
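For small inputs, the minimum-RMS matching can be checked by exhaustive search over permutations, which is useful as a correctness oracle for the faster algorithms the abstract describes; a sketch:

```python
import itertools
import math

def min_rms_matching(A, B):
    """Exhaustive minimum-RMS bipartite matching between equal-size planar
    point sets; O(n!) time, so feasible for tiny n only. Returns (rms, perm)
    where A[i] is matched to B[perm[i]]."""
    n = len(A)
    best_cost, best_perm = float("inf"), None
    for perm in itertools.permutations(range(n)):
        cost = sum((A[i][0] - B[perm[i]][0]) ** 2 + (A[i][1] - B[perm[i]][1]) ** 2
                   for i in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return math.sqrt(best_cost / n), best_perm
```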

  • The Hunting of the Bump: On Maximizing Statistical Discrepancy
    By Deepak Agarwal,    Jeff M. Phillips,    Suresh Venkatasubramanian
    In Proceedings of 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2006),  pages 1137-1146,   Miami, USA,  January ,  2006.
    Abstract

    Anomaly detection has important applications in bio-surveillance and environmental monitoring. When comparing measured data to data drawn from a baseline distribution, merely finding clusters in the measured data may not actually represent true anomalies. These clusters may likely be the clusters of the baseline distribution. Hence, a discrepancy function is often used to examine how different measured data is from baseline data within a region. An anomalous region is thus defined to be one with high discrepancy. In this paper, we present algorithms for maximizing statistical discrepancy functions over the space of axis-parallel rectangles. We give provable approximation guarantees, both additive and relative, and our methods apply to any convex discrepancy function. Our algorithms work by connecting statistical discrepancy to combinatorial discrepancy; roughly speaking, we show that in order to maximize a convex discrepancy function over a class of shapes, one needs only maximize a linear discrepancy function over the same set of shapes. We derive general discrepancy functions for data generated from a one-parameter exponential family. This generalizes the widely-used Kulldorff scan statistic for data from a Poisson distribution. We present an algorithm running in O((1/epsilon) n^2 log^2 n) time that computes the maximum discrepancy rectangle to within additive error epsilon, for the Kulldorff scan statistic. Similar results hold for relative error and for discrepancy functions for data coming from Gaussian, Bernoulli, and gamma distributions. Prior to our work, the best known algorithms were exact and ran in time O(n^4).
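The exact, polynomial-time baseline the abstract mentions enumerates axis-parallel rectangles spanned by input coordinates; a brute-force sketch, parameterized by an arbitrary discrepancy function over the measured fraction m and baseline fraction b:

```python
def max_discrepancy_rect(measured, baseline, disc):
    """Brute-force the maximum-discrepancy axis-parallel rectangle. Candidate
    rectangles are spanned by coordinates of the input points; disc(m, b)
    scores a rectangle from the measured and baseline fractions inside it."""
    pts = measured + baseline
    xs = sorted(set(p[0] for p in pts))
    ys = sorted(set(p[1] for p in pts))
    M, B = len(measured), len(baseline)
    best, best_rect = 0.0, None
    for i, x0 in enumerate(xs):
        for x1 in xs[i:]:
            for j, y0 in enumerate(ys):
                for y1 in ys[j:]:
                    inside = lambda p: x0 <= p[0] <= x1 and y0 <= p[1] <= y1
                    m = sum(1 for p in measured if inside(p)) / M
                    b = sum(1 for p in baseline if inside(p)) / B
                    d = disc(m, b)
                    if d > best:
                        best, best_rect = d, (x0, x1, y0, y1)
    return best, best_rect
```

With a linear discrepancy such as `disc = lambda m, b: m - b`, this is the combinatorial subproblem the paper's convex-to-linear reduction repeatedly solves.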

  • 2004

  • Guided Expansive Spaces Trees: A Search Strategy for Motion- and Cost-Constrained State Spaces
    By Jeff M. Phillips,    Nazareth Bedrossian,    and Lydia E. Kavraki
    In Proceedings of IEEE International Conference on Robotics and Automation (ICRA),  pages ??-??,  Barcelona, Spain,  April ,  2004.
    Abstract

    Motion planning for systems with constraints on controls or the need for relatively straight paths for real-time actions presents challenges for modern planners. This paper presents an approach which addresses these types of systems by building on existing motion planning approaches. Guided Expansive Spaces Trees are introduced to search for a low cost and relatively straight path in a space with motion constraints. Path Gradient Descent, which builds on the idea of Elastic Strips, finds the locally optimal path for an existing path. These techniques are tested on simulations of rendezvous and docking of the space shuttle to the International Space Station and of a 4-foot fan-controlled blimp in a factory setting.

  • 2003

  • Spacecraft Rendezvous and Docking with Real-Time, Randomized Optimization
    By Jeff M. Phillips,    Lydia E. Kavraki,    and Nazareth Bedrossian
    In Proceedings of AIAA Guidance, Navigation, and Control (AIAA),  pages ??-??,  August ,  2003.
    Abstract

    This paper presents a probabilistic approach to solve optimal control problems with application to spacecraft proximity operations. The 6 degree-of-freedom rendezvous and docking problem, using impulsive control, and avoidance of known obstacles and plume impingement is solved. Our solution is then extended to real-time obstacle avoidance. The space is searched by expanding from the start location by applying only feasible controls and coasts, reducing by nearly 50% the variables perturbed in the search. A randomized expansion technique explores the search space. A gradient descent approach smoothes the path and avoids new obstacles in real-time by 'stretching' the best pre-computed path in a locally optimal manner.

  • Probabilistic Optimization Applied to Spacecraft Rendezvous and Docking
    By Jeff M. Phillips,    Lydia E. Kavraki,    and Nazareth Bedrossian
    In Proceedings of AAS/AIAA Space Flight Mechanics Meeting,  pages ??-??,  Ponce, Puerto Rico,  February,  2003.
    Abstract

    This paper presents a probabilistic approach to solve optimal control problems with application to spacecraft proximity operations. The 6 degree-of-freedom rendezvous and docking problem, using impulsive control, and avoidance of known obstacles and plume impingement is solved. Our solution is then extended to real-time obstacle avoidance. The space is searched by expanding from the start location by applying only feasible controls and coasts, reducing by nearly 50% the variables perturbed in the search. A randomized A* expansion technique explores the search space. A gradient descent approach smoothes the path and avoids new obstacles in real-time by 'stretching' the best pre-computed path in a locally optimal manner.

  • 2002

  • Simulated Knot Tying
    By Jeff M. Phillips,    Andrew M. Ladd,    Lydia E. Kavraki
    In Proceedings of IEEE International Conference on Robotics and Automation (ICRA),  pages 841-846,  Washington, DC, USA,  May ,  2002.
    Abstract

    Applications such as suturing in medical simulations require the modeling of knot tying in physically realistic rope. This paper describes the design and implementation of such a system. Our model uses a spline of linear springs, adaptive subdivision, and a dynamics simulation. Collisions are handled by discrete-event simulation and follow the impulse model. Although some care must be taken to maintain stable knots, we demonstrate that our simple model is sufficient for this task. In particular, we do not use friction or explicit constraints to maintain the knot. As examples, we tie an overhand knot and a reef knot.

  • Workshop

    2012

  • Sensor Network Localization for Moving Sensors
    by Arvind Agarwal,    Hal Daume III,    Jeff M. Phillips,    and Suresh Venkatasubramanian
    In Proceedings of the 2nd IEEE ICDM International Workshop on Data Mining in Networks (DaMNet), in conjunction with ICDM,  December,  2012.
    Abstract

    Sensor network localization (SNL) is the problem of determining the locations of the sensors given sparse and usually noisy inter-communication distances among them. In this work we propose an iterative algorithm named PLACEMENT to solve the SNL problem. This iterative algorithm requires an initial estimate of the locations and, in each iteration, is guaranteed to reduce the cost function. The proposed algorithm is able to take advantage of a good initial estimate of sensor locations, making it suitable for localizing moving sensors and for refining the results produced by other algorithms. Our algorithm is very scalable. We have experimented with a variety of sensor networks and have shown that the proposed algorithm outperforms existing algorithms both in terms of speed and accuracy in almost all experiments. Our algorithm can embed 120,000 sensors in less than 20 minutes.

  • 2011

  • Generating a Diverse Set of High-Quality Clusterings
    by Jeff M. Phillips,    Parasaran Raman,    Suresh Venkatasubramanian
    In Proceedings of the 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings (MultiClust 2011), in conjunction with ECML/PKDD 2011, pages ??-??, Athens, Greece,  September,  2011.
    Abstract

    We provide a new framework for generating multiple good quality partitions (clusterings) of a single data set. Our approach decomposes this problem into two components, generating many high-quality partitions, and then grouping these partitions to obtain k representatives. The decomposition makes the approach extremely modular and allows us to optimize various criteria that control the choice of representative partitions.

  • 2010

  • Metrics for Uncertainty Analysis and Visualization of Diffusion Tensor Images
    by Fangxiang Jiao,    Jeff M. Phillips,    Jeroen Stinstra,    Jens Krueger,    Raj Varma Kummaraju,    Edward Hsu,    Julie Korenberg,    Chris R. Johnson
    In Proceedings of the 5th International Workshop on Medical Imaging and Augmented Reality (MIAR), in conjunction with MICCAI 2010,  Beijing, China,  September,  2010.  Springer Lecture Notes in Computer Science (LNCS) series.
    Abstract

    In this paper, we propose three metrics to quantify the differences between the results of diffusion tensor magnetic resonance imaging (DT-MRI) fiber tracking algorithms: the area between corresponding fibers of each bundle, the Earth Mover's Distance (EMD) between two fiber bundle volumes, and the current distance between two fiber bundle volumes. We also discuss an interactive fiber track comparison visualization toolkit we have developed based on the three proposed fiber difference metrics and have tested on six widely-used fiber tracking algorithms. To show the effectiveness and robustness of our metrics and visualization toolkit, we present results on both synthetic data and high resolution monkey brain DT-MRI data. Our toolkit can be used for testing the noise effects on fiber tracking analysis and visualization, to quantify the difference between any pair of DT-MRI techniques, or to compare single subjects within an image atlas.

  • Incremental Multi-Dimensional Scaling
    by Arvind Agarwal,    Jeff M. Phillips,    Hal Daume III,    Suresh Venkatasubramanian
    In The Learning Workshop at Snowbird,  April,  2010.
    Abstract

    Multi-Dimensional Scaling (MDS) is a widely used method for embedding a given distance matrix into a low dimensional space, used both as a preprocessing step for many machine learning problems, as well as a visualization tool in its own right. In this paper, we present an incremental version of MDS (iMDS). In iMDS, d-dimensional data points are presented in a stream, and the task is to embed the current d-dimensional data point into a k-dimensional space for k < d such that distances from the current point to the previous points are preserved. Let {x_1, ..., x_{t-1}} in R^k be the data points at time step t that have already been embedded into a k-dimensional space, and let {r_1, ..., r_{t-1}} be the given distances computed in R^d; then the objective of iMDS is to find the point x_t = argmin_{p in R^k} sum_{i=1}^{t-1} (||p - x_i|| - r_i)^2.
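Placing a single new point against fixed earlier points is an unconstrained nonlinear least-squares problem, which plain gradient descent can sketch. A minimal version, assuming the objective sum_i (||p - x_i|| - r_i)^2 (the step size, iteration count, and centroid initialization are arbitrary choices, not the paper's method):

```python
import math

def place_point(X, r, steps=500, lr=0.05):
    """Embed one new point by gradient descent on sum_i (||p - x_i|| - r_i)^2,
    keeping earlier embedded points X fixed. X: list of k-dim tuples,
    r: target distances from the new point to each x_i."""
    k = len(X[0])
    p = [sum(x[j] for x in X) / len(X) for j in range(k)]  # start at centroid
    for _ in range(steps):
        grad = [0.0] * k
        for x, ri in zip(X, r):
            diff = [p[j] - x[j] for j in range(k)]
            dist = math.sqrt(sum(c * c for c in diff)) or 1e-12  # avoid /0
            coef = 2.0 * (dist - ri) / dist
            for j in range(k):
                grad[j] += coef * diff[j]
        p = [p[j] - lr * grad[j] for j in range(k)]
    return p
```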

  • 2006

  • Segmenting Motifs in Protein-Protein Interface Surfaces
    by Jeff M. Phillips,    Johannes Rudolph,    Pankaj K. Agarwal
    In Proceedings of the 6th Workshop on Algorithms in Bioinformatics (WABI),  Zurich, Switzerland,  September,  2006.
    Abstract

    Protein-protein interactions form the basis for many intercellular events. In this paper we develop a tool for understanding the structure of these interactions. Specifically, we define a method for identifying a set of structural motifs on protein-protein interface surfaces. These motifs are secondary structures, akin to alpha-helices and beta-sheets in protein structure; they describe how multiple residues form knob-into-hole features across the interface. These motifs are generated entirely from geometric properties and are easily annotated with additional biological data. We point to the use of these motifs in analyzing hotspot residues.