
Tutorials

April 26-27, 2019 at the University of Rochester, New York


EIGHTH ANNUAL CONFERENCE OF THE UPSTATE CHAPTERS OF THE AMERICAN STATISTICAL ASSOCIATION



Registration for tutorials is open.

Registered conference attendees will receive an email with a link to the registration site. For questions, email Dr. Love at tanzy_love@urmc.rochester.edu

There is no additional fee for tutorials; however, we want to make sure that we have enough seats in the rooms.

The registration site for tutorials will remain open as long as there are seats available.

 

Friday, April 26

 

10:00-11:45 (Helen Wood Hall)                  Tutorials 1-3

  1. Analyzing Bitcoin: A Theoretical and Empirical Analysis of How the Bitcoin Protocol Impacts the Market for Bitcoin -- Dr Andrea Podhorsky [1W-304]
  2. Teaching Statistics with Teamwork -- Dr Carol Marchetti [2W-213]
  3. Machine learning applications for molecular modeling and dynamics and a tutorial with free and open source software -- Dr Gregory Babbitt [2W-216]

12:00-1:45 (Helen Wood Hall)                  Tutorials 4-5

  4. How to approach a Big Analysis with R -- Mr Donald Harrington [1W-304]
  5. Recent Gaussian Process Evolution with Large Spatial-Temporal Data, A Concise Tutorial -- Mr Bohan Liu [2W-213]

12:00-1:00 (Helen Wood Hall)                  Tutorial 6

  6. Introduction to Microsoft AI: An interactive tutorial on how to use Microsoft's new platform to solve a Software Engineering Problem -- Mr Hussein Talib Al-Rubaye and Mrs. Deema AlShoaibi [2W-216]

2:00-3:45 (Helen Wood Hall)                  Tutorials 7 & 9

  7. Analyzing Public LIGO and Virgo Data from the Gravitational Wave Open Science Center -- Dr John Whelan [2W-213]
  9. High Performance Computing for Statistical Machine Learning and Data Science: A Discovery of Methods, Techniques and Tools with R -- Dr Ernest Fokoue [1W-304]

2:00-3:00 (Helen Wood Hall)                  Tutorial 8

  8. Statistics for Cybersecurity Measurement & Assessment -- Dr Josephine Wolff [2W-216]

 

Tutorial 1 - 10:00-11:45 (Helen Wood Hall [1W-304])

 

Presenter

 

Dr Andrea Podhorsky

Department of Economics, York University

and

visiting the School of Individualized Study at the Rochester Institute of Technology

 

Title

Analyzing Bitcoin: A Theoretical and Empirical Analysis of How the Bitcoin Protocol Impacts the Market for Bitcoin

Abstract

This tutorial establishes the basics of Bitcoin and examines a model for its valuation. I use the model to show that bitcoin's volatile price path and its inefficiency are related, and that both are a consequence of the protocol's difficulty adjustment mechanism. I demonstrate how to parse data from the Bitcoin blockchain and how to apply the generalized supremum augmented Dickey-Fuller (GSADF) test based on Phillips et al. (2015a, 2015b) to determine whether the fundamentals can explain the boom-and-bust cycles evident in bitcoin price data. I find that, in conjunction with the model, the GSADF test provides strong evidence that the explosive behavior apparent in bitcoin price data is not a bubble caused by investor behavior but rather a consequence of the economic functioning of the Bitcoin protocol.
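For readers who want to see the mechanics, here is a minimal Python sketch of the statistic behind the GSADF test. It assumes the simplest ADF regression, with an intercept and no lagged differences. The helper names adf_tstat, sadf and gsadf are illustrative; this is not the presenter's code. The sketch also does not compute critical values, which are normally simulated under the unit-root null.

```python
import numpy as np


def adf_tstat(y):
    """ADF t-statistic on beta in dy_t = alpha + beta * y_{t-1} + e_t (no lagged differences).
    The window must hold at least 4 observations so the regression has residual degrees of freedom."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])  # intercept and lagged level
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])  # residual variance
    se_beta = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se_beta


def sadf(y, min_window):
    """Sup ADF: the window starts at the first observation and only the end point moves."""
    y = np.asarray(y, dtype=float)
    return max(adf_tstat(y[:end]) for end in range(min_window, len(y) + 1))


def gsadf(y, min_window):
    """Generalized sup ADF: both the start and the end point of the window move."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    return max(
        adf_tstat(y[start:end])
        for end in range(min_window, n + 1)
        for start in range(0, end - min_window + 1)
    )


rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(80))  # unit-root benchmark series
print("SADF:", sadf(walk, 12), "GSADF:", gsadf(walk, 12))
```

The GSADF search includes every window the SADF search uses (those starting at the first observation), so its value can never be smaller than the SADF value. The extra freedom in the start point is what lets it detect repeated boom-and-bust episodes rather than a single one.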

 

Tutorial 2 - 10:00-11:45 (Helen Wood Hall [2W-213])

 

Presenter

Dr Gregory Babbitt
School of Life Sciences
Rochester Institute of Technology

 

Title

Machine learning applications for molecular modeling and dynamics and a tutorial with free and open source software

Abstract

I will give a brief introduction to our lab's recent work applying machine learning to simulations of molecular motion, or ‘dynamics’. Bring a laptop for the subsequent hands-on tutorial, in which I will demonstrate UCSF Chimera, a popular software package for molecular modeling of protein structures deposited in the Protein Data Bank. Using well-known examples of proteins and disease, I will lead the class through Chimera download and installation, various graphical representations, and electron density and electrostatic charge modeling. We will then progress to molecular docking simulations of drug-target interactions in Chimera, in conjunction with signaling pathway explorations using the KEGG relational knowledgebase. We will finish by setting up some computationally intensive simulations of molecular mechanics/dynamics in implicit and explicit solvent conditions. Attendees will leave with a new appreciation of how the proteins in their bodies interact with small-molecule medications at single-atom resolution, and with user-friendly software tools that will let them find out more.

 

Tutorial 3 - 10:00-11:45 (Helen Wood Hall [2W-216])

 

Presenter

 

Dr Carol Marchetti

Professor of Statistics

School of Mathematical Sciences

Rochester Institute of Technology

 

Title

Teaching Statistics with Teamwork

 

Abstract

 

I love the theme of this year’s conference: Understanding our Connectedness through Statistics. In this session, we will flip the phrase and examine “Understanding Statistics through our Connectedness”; in other words, helping students learn statistics by working together effectively. Research shows that by working together in small groups, students can develop critical thinking skills, exchange knowledge, share expertise, increase motivation, and improve their attitudes toward learning. Yet, when asked, many students describe negative experiences with group work and may even declare that they prefer individual work. In this session, we’ll discuss ways to implement student teams in a variety of environments and for a variety of tasks. We’ll address approaches to creating student teams, instructional supports, tools for communication, and the instructor’s role in managing teamwork.


 

Tutorial 4 - 12:00-1:45 (Helen Wood Hall [1W-304])

 

Presenter

Donald Harrington
Department of Biostatistics and Computational Biology
University of Rochester 

 

Title 

How to approach a Big Analysis with R. 

 

Abstract

In R, doing a small analysis is straightforward. But how would you deal with a large analysis requiring hundreds or even thousands of regressions (or t-tests or other techniques)? I am not talking about Big Data. I am referring to a big, detailed analysis with project restrictions requiring full control over many analysis parameters. For example, what if you need control over the variables, dataset, exclusions, transformations, and method for each model? With so many restrictions, models, and results, the analysis can become a Big Mess.

I have made some big messes. In the process, I learned a few tricks in R and leveraged some of my work experience in banking, engineering, software development, and statistics to clean them up. Right now, my largest analysis has about 2,500 regressions using over 20 datasets, with many transformations and exclusions and a host of other minute details. Managing and summarizing the results is one of the more difficult issues in a project of this size.

In this lecture-style tutorial, I will provide a strategy and show a few key R functions that make it possible to do a large analysis in a relatively short period of time without losing your mind.
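One way to keep a large analysis from becoming a Big Mess is to describe every model as a row of data and drive all fitting from that table. The sketch below illustrates the idea. It is in Python rather than R and is not the presenter's strategy. ModelSpec, run_spec and build_specs are hypothetical names, and ordinary least squares stands in for whatever method each model actually needs.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Optional, Tuple

import numpy as np


def identity(v):
    return v


@dataclass(frozen=True)
class ModelSpec:
    """One row of a (hypothetical) analysis plan: everything needed to fit one model."""
    name: str
    dataset: str                        # key into a dict of datasets
    outcome: str
    predictors: Tuple[str, ...]
    exclude: Optional[Callable] = None  # returns a boolean mask of rows to drop
    transform: Callable = identity      # applied to the outcome before fitting


def run_spec(spec, datasets):
    """Fit one ordinary least squares model exactly as the spec describes."""
    data = datasets[spec.dataset]
    keep = np.ones(len(data[spec.outcome]), dtype=bool)
    if spec.exclude is not None:
        keep &= ~spec.exclude(data)
    y = spec.transform(data[spec.outcome][keep])
    X = np.column_stack([np.ones(keep.sum())] + [data[p][keep] for p in spec.predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    terms = ("intercept",) + spec.predictors
    return {"model": spec.name, "n": int(keep.sum()),
            **{f"b_{t}": float(c) for t, c in zip(terms, coef)}}


def build_specs(datasets, outcomes, predictor_sets):
    """Cross every dataset, outcome and predictor set into a list of named specs."""
    return [
        ModelSpec(f"{d}:{o}~{'+'.join(p)}", d, o, p)
        for d, o, p in product(datasets, outcomes, predictor_sets)
    ]


rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 200))
datasets = {"site_a": {"x1": x1, "x2": x2, "y": 2 + 3 * x1}}
specs = build_specs(datasets, ["y"], [("x1",), ("x1", "x2")])
results = [run_spec(s, datasets) for s in specs]  # one summary row per model
```

Every result row carries the model name and sample size. The list of results can therefore be turned into a single summary table, and rerunning one model only requires its spec.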

 

 

Tutorial 5 - 12:00-1:45 (Helen Wood Hall [2W-213])

 

Presenter

 

Mr Bohan Liu

Data Scientist
Columbus, Ohio

 

Title

 

Recent Gaussian Process Evolution with Large Spatial-Temporal Data, A Concise Tutorial

 

Abstract

 

In this tutorial we provide an introductory overview of various alternatives to the full Gaussian process (GP), with applications. The GP is a well-established method for spatial-temporal data analysis, and how to adapt it to ever-growing data sizes has been studied for decades. Recent research has focused on a few directions:

- constructing processes from a lower-dimensional space (low-rank models);
- injecting sparsity (tapered GP);
- conditionally factorizing the joint density (NNGP);
- using weighted sums of compactly supported basis functions (multiresolution models);
- exploiting parallel computing power with Bayesian approximation (meta-kriging).

Our tutorial will concentrate mainly on the intuition, methodology, advantages, and weaknesses of each method, comparing their predictive performance and scalability on both simulated and real datasets.
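To make the "conditional factorization of joint density" idea concrete, here is a minimal Python sketch of the nearest-neighbor (Vecchia-type) likelihood on which NNGP is built, shown next to the full GP likelihood. It assumes a zero-mean GP with an exponential covariance and a small nugget. The names exp_cov, exact_loglik and vecchia_loglik are illustrative. Points are used in their given order rather than a coordinate-based ordering.

```python
import numpy as np


def exp_cov(a, b, sigma2=1.0, rho=0.3):
    """Exponential covariance sigma2 * exp(-dist / rho) between two sets of 2-D locations."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / rho)


def exact_loglik(y, locs, nugget=1e-4):
    """Full GP log-likelihood: O(n^3) because of the dense n x n solve."""
    K = exp_cov(locs, locs) + nugget * np.eye(len(y))
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (y @ np.linalg.solve(K, y) + logdet + len(y) * np.log(2 * np.pi))


def vecchia_loglik(y, locs, m, nugget=1e-4):
    """Sum of log p(y_i | up to m nearest earlier points); each term's solve is only m x m."""
    total = 0.0
    for i in range(len(y)):
        var = exp_cov(locs[i:i + 1], locs[i:i + 1])[0, 0] + nugget  # marginal variance
        mean = 0.0
        if i > 0 and m > 0:
            prev = np.arange(i)
            dist = np.linalg.norm(locs[prev] - locs[i], axis=1)
            nbrs = prev[np.argsort(dist)[:m]]  # conditioning set among earlier points
            K_nn = exp_cov(locs[nbrs], locs[nbrs]) + nugget * np.eye(len(nbrs))
            k_in = exp_cov(locs[i:i + 1], locs[nbrs])[0]
            w = np.linalg.solve(K_nn, k_in)    # kriging weights
            mean = w @ y[nbrs]
            var = var - w @ k_in
        total += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total


rng = np.random.default_rng(0)
locs = rng.uniform(size=(40, 2))
K = exp_cov(locs, locs) + 1e-4 * np.eye(40)
y = np.linalg.cholesky(K) @ rng.standard_normal(40)  # one simulated GP draw
print(exact_loglik(y, locs), vecchia_loglik(y, locs, m=5), vecchia_loglik(y, locs, m=39))
```

When m is at least n - 1, every point conditions on all earlier points, and the factorization reproduces the exact likelihood. A smaller m trades some accuracy for a much cheaper computation.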

 

 

Tutorial 6 - 12:00-1:00 (Helen Wood Hall [2W-216])

 

Presenters

 

Mr Hussein Talib Al-Rubaye and Mrs. Deema AlShoaibi

 

Title

 

Introduction to Microsoft AI: An interactive tutorial on how to use Microsoft's new platform to solve a Software Engineering Problem.

 

Abstract

 

In recent years, the growth of open source software repositories has opened up a range of research on mining these repositories to support software development and help programmers perform their engineering tasks better. In this tutorial, I will introduce a novel software engineering problem and design a solution to it using Microsoft AI. The tutorial will show all the necessary steps, from inputting a dataset all the way to training and testing models. The tutorial is interactive: I will share the project link with the audience so that they can contribute to the solution. Together, we will empirically evaluate various learners for this problem. The tutorial will not only help bridge the gap between the software engineering and statistics communities but also allow attendees to gain some hands-on experience with Microsoft AI.

 

 

 

Tutorial 7 - 2:00-3:45 (Helen Wood Hall [2W-213])

 

Presenter

 

Dr John Whelan

School of Mathematical Sciences

Rochester Institute of Technology

 

Title

 

Analyzing Public LIGO and Virgo Data from the Gravitational Wave Open Science Center

 

Abstract

 

During their first two observing runs, from 2015 to 2017, the LIGO and Virgo gravitational wave detectors observed gravitational waves from ten binary black hole mergers and one binary neutron star inspiral. Data from those observations were released by the Gravitational Wave Open Science Center (GWOSC) in conjunction with the related scientific publications. The GWOSC has now released the full data from the first two observing runs. In this tutorial, I will show how to use these data and other resources to replicate analyses such as the matched filter used to establish the detections.

 

Software needed to participate in the tutorial: Python and Jupyter Notebook. (I'd like to send participants a link more than one day beforehand so that they can download the data and specialized tools.)
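The matched filter mentioned above can be illustrated with a toy Python sketch. It assumes white Gaussian noise and a synthetic chirp, and toy_chirp and matched_filter are hypothetical helpers. The sketch does not use GWOSC data or tools. Real gravitational-wave searches also whiten the data by the detector noise spectrum and compare it against banks of physically modeled waveforms.

```python
import numpy as np


def toy_chirp(n, f0=0.05, f1=0.2):
    """Hann-windowed linear chirp sweeping f0 -> f1 cycles/sample, scaled to unit norm."""
    t = np.arange(n)
    phase = 2 * np.pi * (f0 * t + 0.5 * (f1 - f0) / n * t ** 2)
    h = np.hanning(n) * np.sin(phase)
    return h / np.linalg.norm(h)


def matched_filter(data, template):
    """Slide the template along the data. With unit-variance white noise and a
    unit-norm template, each output value is the signal-to-noise ratio at that lag."""
    return np.correlate(data, template, mode="valid")


rng = np.random.default_rng(1)
template = toy_chirp(512)
data = rng.standard_normal(4096)   # white Gaussian noise, sigma = 1
data[1000:1512] += 40 * template   # hidden signal starting at sample 1000
snr = matched_filter(data, template)
print(int(np.argmax(snr)), float(snr.max()))
```

The lag with the largest output recovers where the chirp was hidden. Summing over all 512 template samples is what lifts the signal well above the noise.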

 

 

 

Tutorial 8 - 2:00-3:00 (Helen Wood Hall [2W-216])

 

Presenter

Dr Josephine Wolff
College of Liberal Arts
Rochester Institute of Technology

Title

Statistics for Cybersecurity Measurement & Assessment

Abstract

This tutorial will go over some of the ways that statistical methods have been employed to develop cybersecurity metrics around the improvements in computer systems delivered by new policies, technical controls, and other safeguards. The session will cover some of the challenges of working with cybersecurity data, the types of research that have been done to date in this field, and future areas for applying statistical methods to cybersecurity. We will cover statistics used for the assessment of cybersecurity policies, including data breach notification policies, as well as those used to measure progress in reducing spam, malware rates, and data breach costs.


 

Tutorial 9 - 2:00-3:45 (Helen Wood Hall [1W-304])

 

Presenter

Dr Ernest Fokoue
Data Science Research Group
School of Mathematical Sciences
Rochester Institute of Technology

 

Title

High Performance Computing for Statistical Machine Learning and Data Science: A Discovery of Methods, Techniques and Tools with R

 

Abstract

Anyone who has flirted with modern data science and statistical machine learning has no doubt heard of the crucial importance of scalability on the one hand and real-time computing on the other. It turns out that those crucial goals require high performance computing. In this tutorial, I will use the mighty R platform to give the audience a gentle introduction to some of the foundational ideas and concepts of high performance computing. The focus will be on the crucial mix of a deep understanding of modern computer hardware architectures and a thorough comprehension of the mathematical, statistical, and algorithmic aspects of computational complexity that help speed up learning machines. I will touch on basic yet powerful tricks, such as exploiting vectorized operations to increase processing speed, but will also cover multicore parallel computation, MPI computation, and parallel distributed computing in general.

Keywords: High performance computing, complexity, algorithms, Multicore processing, MPI, Rmpi, Scalability
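Two of the ideas above, vectorization and splitting work into independent pieces, can be sketched briefly. The example is in Python/NumPy rather than R, and the helper names are illustrative. The chunked sum runs serially here; in practice each chunk could be sent to a separate core or MPI process, for example with R's parallel package or Rmpi.

```python
import numpy as np


def pairwise_sq_dist_loop(X):
    """Squared Euclidean distances with explicit Python loops: n^2 interpreted steps."""
    n = X.shape[0]
    D = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            diff = X[i] - X[j]
            D[i, j] = diff @ diff
    return D


def pairwise_sq_dist_vectorized(X):
    """Same result via ||a||^2 + ||b||^2 - 2 a.b, pushed into compiled matrix code."""
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.maximum(D, 0.0)  # clip tiny negatives caused by floating-point cancellation


def chunked_sum_of_squares(x, n_chunks):
    """Map-reduce pattern: independent partial sums per chunk, then one combine step.
    The chunks run serially here; each could instead go to its own core or MPI rank."""
    partials = [float(c @ c) for c in np.array_split(x, n_chunks)]
    return sum(partials)


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
x = rng.standard_normal(1_000_000)
print(np.allclose(pairwise_sq_dist_loop(X), pairwise_sq_dist_vectorized(X)))
print(chunked_sum_of_squares(x, 8), x @ x)
```

The vectorized version replaces n^2 interpreted loop iterations with a few matrix operations, which is typically much faster. Timing both versions on larger n, for example with Python's timeit module, shows the gap.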