Y. Matias, Data Streams & Cloud Computing for Massive Scalability, L018
Faculty of Engineering and Natural Sciences
FENS SEMINARS
Algorithms Day at Sabancı University: Massive Data Processing and the Web
March 17, 2008, 13:40, FMAN L018
Data Streams and Cloud Computing for Massive Scalability
Yossi Matias
Director, Google R&D Center, Tel Aviv
With the proliferation of data-intensive applications, it has become necessary to develop new techniques to handle massive data sets. Traditional algorithmic techniques and data structures are not always suited to the volume of data involved, or to the fact that the data often streams by and cannot be accessed again. A field of research established over the past decade addresses this by handling massive data sets using data synopses and by developing algorithmic techniques for data stream models. We will discuss some of the research work that has been done in the field.
We will also highlight the emerging technological landscape of Cloud Computing, which allows computing services over the web, and which enables unprecedented scalability in terms of content, users, queries, and applications.
Google processes 20 petabytes (20,000,000 gigabytes) of raw data every day in the form of crawled documents, web request logs, etc., to compute various kinds of derived data such as inverted indices, representations of the graph structure of Web documents, summaries of crawled pages, and the set of most frequent queries. Because the input data is extremely large, the computational tasks are distributed across thousands of machines, and even then very fast algorithms are required to complete these tasks in a reasonable amount of time. A key computing paradigm that provides the necessary level of performance is sublinearity: it is often possible to obtain a short "sketch" of a data set, especially in the form of a stream, in sublinear processing time and space.
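One way to picture such a sketch is the Misra-Gries frequent-items summary, shown here as a minimal Python example. It is offered only as an illustration of the idea and is not claimed to be a technique covered in the talks; the function name `misra_gries`, the parameter `k`, and the sample query stream are all assumptions made for this example. The summary reads the stream once and keeps at most k - 1 counters, yet any item occurring more than n/k times in a stream of length n is guaranteed to be among the surviving keys.

```python
def misra_gries(stream, k):
    """Summarize a stream in one pass using at most k - 1 counters.

    Any item that occurs more than n / k times in a stream of length n
    is guaranteed to appear among the returned keys. Each returned count
    is a lower bound on the true frequency, off by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: count this occurrence.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free counter is available: start tracking the item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the
            # ones that reach zero. The new item itself is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical query stream, invented for illustration only.
queries = ["maps", "news", "maps", "mail", "maps",
           "video", "maps", "news", "maps"]
summary = misra_gries(queries, k=3)
```

Because the summary can also retain items that are not truly frequent, its keys are best read as candidates; where a second pass over the data is possible, it can confirm the exact counts.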
The Algorithms Day at Sabancı University will focus on data processing possibilities offered by the sublinear computing paradigm and the related research at Google.
Distinguished Speaker: Yossi Matias (director, Google Tel Aviv R&D Center)
Yossi Matias is the director of the Google Tel Aviv R&D Center. He is on leave from the School of Computer Science at Tel Aviv University, where he is a Professor. Earlier, he was a research scientist at Bell Laboratories. Matias received his Ph.D. with distinction from Tel Aviv University. Since then he has authored over 100 research papers and has been the inventor of over 20 patents in the areas of data analysis, algorithms for massive data sets, data streams, data synopses, parallel computation, data compression, data and information management systems, data security and privacy, video scrambling, and Internet technologies.
Matias is a frequent speaker at, and has served on the program committees of, numerous international scientific and technology conferences. He has also been heavily involved in the high-tech industry and in technology and product development: he founded the Lucent Personalized Web Assistant project (1996), developing one of the early Internet privacy and anti-spam technologies; he co-founded and led Zapper Technologies (1999), developing advanced contextual and personalized search technologies; and he was the CTO and Chief Scientist of Hyperroll, leading the technology strategy of its high-performance data aggregation software, which is at the core of data warehouses and Business Intelligence applications of multiple Fortune 100 enterprises. Yossi Matias is a recipient of the 2005 ACM-EATCS Gödel Prize in Theoretical Computer Science "for the profound impact on the theory and practice of the analysis of data streams".
Local Speaker: S. Cenk Şahinalp (Simon Fraser University, Canada and Sabancı University)
Sahinalp is a Canada Research Chair and Professor of Computing Science at Simon Fraser University, Burnaby, BC, Canada. His research focuses on problems in string/sequence algorithms, discrete metric embeddings, massive data sets, data compression and computational molecular biology. Sahinalp received a B.S. in Electrical Engineering from Bilkent University and a Ph.D. in Computer Science from the University of Maryland, College Park. Later, he was first a postdoc at Bell Labs, Murray Hill (with Yossi Matias) and then a research associate at the University of Pennsylvania. During this time he also held a faculty position at the University of Warwick and was a frequent visitor to DIMACS and AT&T Research. Before moving to SFU, Sahinalp was a faculty member at the Departments of EECS and Genetics at Case Western Reserve University. Sahinalp has published over 65 papers in some of the flagship venues in computer science and computational biology and has been an invited speaker and PC member at many conferences and workshops. He was the PC chair of the 14th Combinatorial Pattern Matching Conference and has received an NSF Career Award, a BC Advanced Systems Institute Fellowship and a Michael Smith Foundation Scholarship.
Local Speaker: Funda Ergun (Simon Fraser University, Canada and Sabancı University)
Ergun is an associate professor at Simon Fraser University in Burnaby, BC, Canada. Her interests lie in the area of massive data computation, in particular sublinear algorithms, property testing, and streaming computation, as well as high-speed networks in the context of algorithmic aspects of sensor networks and quality of service. She received her B.S. degree in computer engineering from Bilkent University and her Ph.D. in computer science from Cornell University. After that she was a postdoctoral fellow at the University of Pennsylvania, then a member of technical staff at Bell Labs, Murray Hill, NJ, and later a faculty member at Case Western Reserve University before joining SFU. She has also been a long-term visitor at MIT, NEC Research in Princeton, NJ, and DIMACS.