Papers
2015 – 2016
2015 – Facebook – One Trillion Edges: Graph Processing at Facebook-Scale.
2013 – 2014
2014 – Stanford – Mining of Massive Datasets.
2013 – AMPLab – Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices.
2013 – AMPLab – MLbase: A Distributed Machine-learning System.
2013 – AMPLab – Shark: SQL and Rich Analytics at Scale.
2013 – AMPLab – GraphX: A Resilient Distributed Graph System on Spark.
2013 – Google – HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm.
2013 – Microsoft – Scalable Progressive Analytics on Big Data in the Cloud.
2013 – Metamarkets – Druid: A Real-time Analytical Data Store.
2013 – Google – Online, Asynchronous Schema Change in F1.
2013 – Google – F1: A Distributed SQL Database That Scales.
2013 – Google – MillWheel: Fault-Tolerant Stream Processing at Internet Scale.
2013 – Facebook – Scuba: Diving into Data at Facebook.
2013 – Facebook – Unicorn: A System for Searching the Social Graph.
2013 – Facebook – Scaling Memcache at Facebook.
2011 – 2012
2012 – Twitter – The Unified Logging Infrastructure for Data Analytics at Twitter.
2012 – AMPLab – Blink and It’s Done: Interactive Queries on Very Large Data.
2012 – AMPLab – Fast and Interactive Analytics over Hadoop Data with Spark.
2012 – AMPLab – Shark: Fast Data Analysis Using Coarse-grained Distributed Memory.
2012 – Microsoft – Paxos Replicated State Machines as the Basis of a High-Performance Data Store.
2012 – Microsoft – Paxos Made Parallel.
2012 – AMPLab – BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data.
2012 – Google – Processing a trillion cells per mouse click.
2012 – Google – Spanner: Google’s Globally-Distributed Database.
2011 – AMPLab – Scarlett: Coping with Skewed Content Popularity in MapReduce Clusters.
2011 – AMPLab – Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center.
2011 – Google – Megastore: Providing Scalable, Highly Available Storage for Interactive Services.
2001 – 2010
2010 – Facebook – Finding a needle in Haystack: Facebook’s photo storage.
2010 – AMPLab – Spark: Cluster Computing with Working Sets.
2010 – Google – Storage Architecture and Challenges.
2010 – Google – Pregel: A System for Large-Scale Graph Processing.
2010 – Google – Large-scale Incremental Processing Using Distributed Transactions and Notifications (the basis of Percolator and Caffeine).
2010 – Google – Dremel: Interactive Analysis of Web-Scale Datasets.
2010 – Yahoo – S4: Distributed Stream Computing Platform.
2009 – HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.