<p><strong>Where</strong>: University of Washington, Seattle.<br>
Allen School of Computer Science and Engineering.<br>
Paul G. Allen Center, CSE 291</p>
<p><strong>When</strong>:
Friday, January 19th, 2024, 2:30pm-3:30pm</p>
<p><strong>Title</strong>:
Towards End-to-end Data Pipeline for Effective Data Science
</p>
<p><strong>Abstract</strong>:
Data-driven approaches have become a mainstream research methodology in multiple communities. To support effective and scalable data science applications on ever-growing datasets, researchers in both academia and industry have made great efforts to build end-to-end data pipelines. In this talk, I will present my efforts to improve two essential components of an end-to-end data pipeline: data preparation and data processing. First, I will present a unified self-supervised learning paradigm that can improve the performance of a variety of data preparation tasks, such as dataset discovery, table annotation, and entity matching. Next, I will introduce my work on optimizing parallel recursive queries to support analytical workloads in data processing. Finally, I will conclude with a vision for future work on data pipelines.
</p>
<p><strong>Bio</strong>:
Jin Wang is a research scientist and research lead at Megagon Labs. Before that, he obtained his PhD in Computer Science from the University of California, Los Angeles in July 2020. His research interests lie in the broad area of data management and data science. In particular, his research focuses on database systems, Datalog, data integration, and table representation learning. His work appears in leading data management conferences and journals such as SIGMOD, VLDB, ICDE, and the VLDB Journal.