publications
2026
- SIGMOD 2026AeroSketch: Near-Optimal Time Matrix Sketch Framework for Persistent, Sliding Window, and Distributed StreamsProceedings of the ACM on Management of Data, Apr 2026
Many real-world matrix datasets arrive as high-throughput vector streams, making it impractical to store or process them in their entirety. To enable real-time analytics under limited computational, memory, and communication resources, matrix sketching techniques have been developed over recent decades to provide compact approximations of such streaming data. Some algorithms have achieved optimal space and communication complexity. However, these approaches often require frequent time-consuming matrix factorization operations. In particular, under tight approximation error bounds, each matrix factorization computation incurs cubic time complexity, thereby limiting their update efficiency. In this paper, we introduce AeroSketch, a novel matrix sketching framework that leverages recent advances in randomized numerical linear algebra (RandNLA). AeroSketch achieves optimal communication and space costs while delivering near-optimal update time complexity (within logarithmic factors) across persistent, sliding window, and distributed streaming scenarios. Extensive experiments on both synthetic and real-world datasets demonstrate that AeroSketch consistently outperforms state-of-the-art methods in update throughput. In particular, under tight approximation error constraints, AeroSketch reduces the cubic time complexity to the quadratic level. Meanwhile, it maintains comparable approximation quality while retaining optimal communication and space costs.
@article{10.1145/3786622, author = {Yin, Hanyan and Wen, Dongxie and Li, Jiajun and Wei, Zhewei and Zhang, Xiao and Zhao, Peng and Zhou, Zhi-Hua}, title = {AeroSketch: Near-Optimal Time Matrix Sketch Framework for Persistent, Sliding Window, and Distributed Streams}, year = {2026}, issue_date = {February 2026}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, volume = {4}, number = {1}, url = {https://doi.org/10.1145/3786622}, doi = {10.1145/3786622}, journal = {Proceedings of the ACM on Management of Data}, month = apr, articleno = {8}, numpages = {26}, keywords = {streaming data, optimal time complexity, matrix sketch, persistent sketch, sliding window, distributed sketch}, } - ICLR 2026Revisiting Matrix Sketching in Linear Bandits: Achieving Sublinear Regret via Dyadic Block SketchingIn The Fourteenth International Conference on Learning Representations, Feb 2026
Linear bandits have become a cornerstone of online learning and sequential decision-making, providing solid theoretical foundations for balancing exploration and exploitation. Within this domain, matrix sketching serves as a critical component for achieving computational efficiency, especially when confronting high-dimensional problem instances. The sketch-based approaches reduce per-round complexity from \(Ω(d^2)\)to \(O(dl)\), where is the dimension and \(l<d\)is the sketch size. However, this computational efficiency comes with a fundamental pitfall: when the streaming matrix exhibits heavy spectral tails, such algorithms can incur vacuous linear regret. In this paper, we revisit the regret bounds and algorithmic design for sketch-based linear bandits. Our analysis reveals that inappropriate sketch sizes can lead to substantial spectral error, severely undermining regret guarantees. To overcome this issue, we propose Dyadic Block Sketching, a novel multi-scale matrix sketching approach that dynamically adjusts the sketch size during the learning process. We apply this technique to linear bandits and demonstrate that the new algorithm achieves sublinear regret bounds without requiring prior knowledge of the streaming matrix properties. It establishes a general framework for efficient sketch-based linear bandits, which can be integrated with any matrix sketching method that provides covariance guarantees. Comprehensive experimental evaluation demonstrates the superior utility-efficiency trade-off achieved by our approach.
@inproceedings{wen2026revisiting, title = {Revisiting Matrix Sketching in Linear Bandits: Achieving Sublinear Regret via Dyadic Block Sketching}, author = {Wen, Dongxie and Yin, Hanyan and Zhang, Xiao and Zhao, Peng and Zhang, Lijun and Wei, Zhewei}, year = {2026}, booktitle = {The Fourteenth International Conference on Learning Representations}, month = feb }
2025
- IEEE TONPredictive Configuration on DHCP in WLANsPei Zhang, Yunzhe Wang, Hanyan Yin, Botong Wu, Qi Wang, Xiaohong Huang, Yan Ma, Jilong Wang, and Congcong Miao*IEEE Transactions on Networking, Jul 2025
DHCP is widely deployed in WLANs to automatically assign IP addresses to WiFi devices when users connect to the WLANs. However, frequent user mobility brings big challenges to the DHCP performance. Recently proposed IP configuration (e.g., IP lease time, size of IP address pool) decisions on DHCP are based on traditional models to study user mobility patterns which lead to poor DHCP performance since the online time of individuals varies due to their personal pReferences and the number of crowds differs spatially and temporally. In this paper, we propose PredHCP, a predictive configuration framework on DHCP to improve the DHCP performance. Specifically, PredHCP utilizes an attention-based recurrent neural network (ARNN) to learn sequential patterns of individual mobility and accurately predicts user online time to ensure the effective IP lease time configuration. Meanwhile, PredHCP introduces a spatio-temporal graph neural network (STGNN) to learn both spatial and temporal dependencies of crowd migration and accurately predict crowd size in each area to ensure effective IP pool configuration. We conduct comprehensive experiments on real network traces for a month to evaluate the performance of PredHCP. Experimental results show that PredHCP can accurately predict user mobility patterns by achieving lower prediction errors. By accurately modeling mobility patterns, PredHCP makes effective IP configuration to ensure high DHCP performance. Large-scale simulation results show that PredHCP can save up to 69% IP addresses and the IP efficiency is 41% which outperforms existing methods by 6%.
@article{11076090, author = {Zhang, Pei and Wang, Yunzhe and Yin, Hanyan and Wu, Botong and Wang, Qi and Huang, Xiaohong and Ma, Yan and Wang, Jilong and Miao, Congcong}, journal = {IEEE Transactions on Networking}, title = {Predictive Configuration on DHCP in WLANs}, year = {2025}, pages = {1-16}, keywords = {IP networks;Servers;Wireless fidelity;Recurrent neural networks;Buildings;Performance evaluation;Smart phones;Portable computers;Graph neural networks;Protocols;Dynamic host configuration protocol;IP address;individual mobility prediction;crowd mobility prediction;deep learning model}, doi = {10.1109/TON.2025.3585618}, issn = {2998-4157}, month = jul, }
2024
- VLDB 2024Optimal Matrix Sketching over Sliding WindowsIn 50th International Conference on Very Large Databases, Guangzhou, China, Aug 2024
Best Paper Nomination

Matrix sketching, aimed at approximating a matrix \(\boldsymbol{A} ∈\mathbb{R}^{N \times d}\)consisting of vector streams of length \(N\)with a smaller sketching matrix \(\boldsymbol{B}∈\mathbb{R}^{\ell\times d}\), \(\ell ≪N\), has garnered increasing attention in fields such as large-scale data analytics and machine learning. A well-known deterministic matrix sketching method is the Frequent Directions algorithm, which achieves the optimal \(O\left({d\over \varepsilon}\right)\)space bound and provides a covariance error guarantee of \(\varepsilon=\lVert \boldsymbol{A}^⊤\boldsymbol{A}-\boldsymbol{B}^⊤\boldsymbol{B}\rVert_2/\lVert \boldsymbol{A}\rVert_F^2\). The matrix sketching problem becomes particularly interesting in the context of sliding windows, where the goal is to approximate the matrix \(\boldsymbol{A}_W\), formed by input vectors over the most recent \(N\)time units. However, despite recent efforts, whether achieving the optimal \(O\left({d\over\varepsilon}\right)\)space bound on sliding windows is possible has remained an open question. In this paper, we introduce the DS-FD algorithm, which achieves the optimal \(O\left({d\over\varepsilon}\right)\)space bound for matrix sketching over row-normalized, sequence-based sliding windows. We also present matching upper and lower space bounds for time-based and unnormalized sliding windows, demonstrating the generality and optimality of DS-FD across various sliding window models. This conclusively answers the open question regarding the optimal space bound for matrix sketching over sliding windows. Furthermore, we conduct extensive experiments with both synthetic and real-world datasets, validating our theoretical claims and thus confirming the correctness and effectiveness of our algorithm, both theoretically and empirically.
@inproceedings{yin2024optimal, title = {Optimal Matrix Sketching over Sliding Windows}, author = {Yin, Hanyan and Wen, Dongxie and Li, Jiajun and Wei, Zhewei and Zhang, Xiao and Huang, Zengfeng and Li, Feifei}, booktitle = {50th International Conference on Very Large Databases}, doi = {10.14778/3665844.3665847}, year = {2024}, month = aug, volume = {17}, issue = {9}, pages = {2149--2161}, }