Topics

The following is a list of the available topics. The literature given is intended as a starting point; you are expected to research further literature independently.

Topic Block 1: Data Infrastructure

Verteilte Dateisysteme
- Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, pages 137-150, Berkeley, CA, USA, 2004. USENIX Association.
- Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google File System. SIGOPS Oper. Syst. Rev., 37(5):29, 2003.
- Hadoop File System. http://hadoop.apache.org/common/docs/current/hdfs_design.html
Cluster Scheduling
- Vavilapalli et al., Apache Hadoop YARN: Yet Another Resource Negotiator, https://www.cs.cmu.edu/~garth/15719/papers/yarn.pdf
- Hindman et al., Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, Technical Report, University of California, Berkeley, 2010
- Verma et al., Large-scale cluster management at Google with Borg, Proceedings of the European Conference on Computer Systems (EuroSys), ACM, 2015
- Kubernetes, https://kubernetes.io/
- Burns et al., Borg, Omega, and Kubernetes, ACM Queue, 2016
Distributed Processing Engines
- Dean et al., MapReduce: Simplified Data Processing on Large Clusters, 2004
- Zaharia et al., Spark: Cluster Computing with Working Sets, HotCloud, 2010
- Carbone et al., Apache Flink: Stream and Batch Processing in a Single Engine, IEEE Data Engineering Bulletin, 2015
- Rocklin, Dask: Parallel Computation with Blocked algorithms and Task Scheduling, Proceedings of the 14th Python in Science Conference, 2015
SQL Engines
- Thusoo et al., Hive: A Warehousing Solution Over a Map-Reduce Framework, 2009.
- Armbrust et al., Spark SQL: Relational Data Processing in Spark, 2015.
- Kornacker et al., Impala: A Modern, Open-Source SQL Engine for Hadoop, 2015
Stream Processing
- Kafka, https://kafka.apache.org/
- Twitter Storm, http://storm-project.net/
- Zaharia et al., Discretized Streams: Fault-Tolerant Streaming Computation at Scale, SOSP, 2013
- Carbone et al., Apache Flink: Stream and Batch Processing in a Single Engine, IEEE Data Engineering Bulletin, 2015
Cloud Services for Data

Storage and SQL Services

Ramakrishnan et al., Azure Data Lake Store: A Hyperscale Distributed File Service for Big Data Analytics, SIGMOD, 2017
Corbett et al., Spanner: Google's Globally-Distributed Database, OSDI, 2012

Higher-Level Services

Amazon Elastic MapReduce
Hadoop on Azure: HDInsight: Managed Open Source Big Data Analytics

Topic Block 2: Data Science and Machine Learning

Data Science Overview: Methods and Frameworks

Peng, Matsui, The Art of Data Science, 2017
Hey et al., The Fourth Paradigm: Data-Intensive Scientific Discovery, 2009
Cao, Data Science: Challenges and Directions, Communications of the ACM, 2017
Donoho, 50 Years of Data Science, 2015
Scikit-Learn, 2017
Machine Learning Frameworks (Github), 2017

Machine Learning Best Practices

Domingos, A Few Useful Things to Know about Machine Learning, Communications of the ACM, 2012
Sculley et al., Machine Learning: The High Interest Credit Card of Technical Debt, NIPS, 2014

Distributed Machine Learning

Dean et al., Large Scale Distributed Deep Networks, 2012
Li et al., Scaling Distributed Machine Learning with the Parameter Server, OSDI, 2014
Xing et al., Petuum: A New Platform for Distributed Machine Learning on Big Data, KDD, 2015
Meng et al., MLlib: Machine Learning in Apache Spark, Journal of Machine Learning Research, 2016
H2O, H2O version 3, 2017.

Model Deployment

Cloud

Crankshaw et al., Clipper: A Low-Latency Online Prediction Serving System, NSDI, 2017
TensorFlow Serving, https://www.tensorflow.org/serving/, 2017

Edge Inference

Apple CoreML, https://developer.apple.com/documentation/coreml, 2017
Nvidia, TensorRT, https://developer.nvidia.com/tensorrt, 2017

Topic Block 3: Deep Learning

Deep Learning: Convolutional Neural Networks

LeCun, Bengio, Hinton, Deep Learning, Nature, 2015
Krizhevsky et al., ImageNet Classification with Deep Convolutional Neural Networks, 2012
Ujjwal Karn, An intuitive explanation of convolutional neural networks, 2016
Alex Krizhevsky, One weird trick for parallelizing convolutional neural networks, 2014
Stanford, CS231n: Convolutional Neural Networks, 2017

Deep Learning Frameworks: Caffe, Torch/PyTorch, TensorFlow, CNTK, MXNet

Jia et al., Caffe: Convolutional Architecture for Fast Feature Embedding, MM, 2014.
Torch, 2017.
PyTorch, 2017.
Abadi et al., TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, White Paper, 2015.
Seide et al., CNTK: Microsoft's Open Source Deep-Learning Toolkit, White Paper, 2016.
Chen et al., MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems, 2015.

Last Change: Mon, 11 Dec 2023 07:35:29 +0100
Copyright © MNM-Team http://www.mnm-team.org