Master in City & Technology – Term 3
Seminar Name: Big Data Strategies
Total Hours: 20 hours
Faculty: Diego Pajarito

MaCT Big Data Strategies // Diego Pajarito

Syllabus

Data, Big Data, and Data Science are buzzwords in almost any academic or professional community nowadays (these terms also attract high interest in web search engines). However, the number of researchers or companies able to perform big data analysis appears to be small, despite aggressive marketing by technology suppliers. The seminar on “Big data strategies” aims to present a path for designing analysis strategies in this still-evolving and rapidly expanding discipline by offering hands-on sessions for creating new data sets for urban analytics.
To what extent does a project belong to a big data strategy? When could a study be labeled as big data? How can data science help urban design? There are no straightforward answers to these questions, not only because of the confusing definition of big data based on v-words (i.e., Volume, Velocity, Variety, Variability, Veracity, Visualization and Value) (Khan et al., 2018) but also because of the blurry connections between big data, data science and urban analytics. The relationship between urban design and data science inspires this seminar to take a historical perspective on big data, from the origins of written language to today’s hyper-connected society. The seminar will examine the evolution of data analysis or, in other words, the ways in which people, scientists, and practitioners use digital technology, measurement, and observation to support decision making today.
Currently, data is often seen as a commodity, a strategic resource or even as a religion to embrace despite any technical or conceptual limitation. However, running big data projects strongly depends on a set of technical details. Most implementations start from a checklist of the hardware needed to store and analyze files sized in gigabytes, terabytes or petabytes. Non-technical teams are easily confused by trendy terms such as cloud computing or by product names such as Watson, Azure or Amazon. Team plans can change radically when tech jargon such as sequential or parallel computing, CPUs, GPUs or distributed systems is on the table. Finally, some teams end up deploying big data frameworks and processing algorithms on modern computers only to handle a few tens of gigabytes. On top of the hardware discussion, teams need to set up tasks and an efficient workflow. Here, decisions mostly involve selecting formulas, statistical methods or other mathematical procedures. Although this is still a conceptual discussion, the final decision often comes down to black boxes or plug-and-play technologies that claim to solve almost any problem with artificial intelligence or machine learning. Hence the seminar proposes a critical and rational analysis that avoids any blind selection of tools.
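Whether a data set needs a big data framework at all can often be settled with simple arithmetic before any hardware is chosen. The following is only a back-of-the-envelope sketch, assuming a dense table of 64-bit numeric values and an arbitrary rule of keeping the table under half of the available RAM to leave room for intermediate copies; the function names and example sizes are hypothetical.

```python
# Hypothetical back-of-the-envelope estimate of how much memory a table needs
# once loaded. Assumes every value is a 64-bit (8-byte) number.
BYTES_PER_VALUE = 8


def table_size_gb(rows: int, columns: int, bytes_per_value: int = BYTES_PER_VALUE) -> float:
    """Approximate in-memory size of a dense table, in gigabytes (10**9 bytes)."""
    return rows * columns * bytes_per_value / 1e9


def fits_in_memory(rows: int, columns: int, ram_gb: float, headroom: float = 0.5) -> bool:
    """True if the table would use at most `headroom` (a fraction) of the available RAM.

    The 0.5 default is an assumed rule of thumb, leaving space for the
    intermediate copies that most analysis steps create.
    """
    return table_size_gb(rows, columns) <= ram_gb * headroom


if __name__ == "__main__":
    # 100 million hypothetical GPS records with 10 numeric attributes each
    print(f"{table_size_gb(100_000_000, 10):.1f} GB")  # 8.0 GB
    print(fits_in_memory(100_000_000, 10, ram_gb=32))  # True
    print(fits_in_memory(100_000_000, 10, ram_gb=8))   # False
```

Under these assumptions, a table of a few tens of gigabytes is within reach of a well-equipped workstation, which is the kind of check that can prevent an unnecessary cluster deployment.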
From the stakeholder’s perspective, the best-known outcome of data science is probably data visualization, which condenses the insights, trends, distributions and patterns hidden in the data into a single view. Unfortunately, despite the many tools and methods available, most visualizations remain limited to a series of well-known, standardized plots, maps and graphs. Urban analysts need to explore and experiment further with visual resources, developing alternative ways to create small data sets that better match their spatial and temporal context.
In a nutshell, the main goal of data science and, therefore, of this seminar is to get into practical tools able to turn big data into small data (i.e., data sets that a human being can manipulate, or “smart” graphs that summarize ideas or messages). After the seminar, students will have improved their big data “skills” by facing real-world scenarios, various data sets and multiple analysis tools in order to learn by doing, and to explore and put into practice strategies for tackling current data science challenges.
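As a minimal illustration of turning big data into small data, the sketch below collapses individual trip records into one count per district and hour of the day. The record layout, the district names and the function `trips_per_district_hour` are hypothetical; in pandas the same step would usually be a `groupby`, but only the Python standard library is used here so the example runs anywhere.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw records: one (start time, start district) tuple per bike-sharing trip.
# A real source would hold millions of these rows.
trips = [
    ("2019-06-03T08:05:00", "Eixample"),
    ("2019-06-03T08:40:00", "Eixample"),
    ("2019-06-03T08:55:00", "Sants"),
    ("2019-06-03T18:10:00", "Eixample"),
    ("2019-06-04T08:20:00", "Sants"),
]


def trips_per_district_hour(records) -> Counter:
    """Collapse individual trips into one count per (district, hour of day)."""
    counts = Counter()
    for timestamp, district in records:
        hour = datetime.fromisoformat(timestamp).hour
        counts[(district, hour)] += 1
    return counts


if __name__ == "__main__":
    # Print the small summary table, sorted by district and then by hour
    for (district, hour), n in sorted(trips_per_district_hour(trips).items()):
        print(f"{district:<10} {hour:02d}h {n}")
```

However many trips the input holds, the output has at most one row per district and hour, a size a person can read and compare.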
The skills developed during the seminar can encourage participants to move into diverse fields of research, not only within urban design or urban analytics but also in data journalism, quantitative analysis, and visualization. Participants can also explore alternative data sources, such as user-generated data or linked open data, to complement their current work.

Faculty

Diego Pajarito earned his PhD in Geoinformatics as part of a Marie Curie ITN Action, a joint doctorate between the Universities of Münster, Universitat Jaume I and Universidade Nova de Lisboa (2018), and his MSc in Information and Communication Sciences from Universidad Distrital de Bogotá (2014). He has conducted research on the spatiotemporal analysis of sustainable transport and other urban systems, as well as on data collection techniques using mobile devices and crowdsourcing. Diego’s interests include simplifying data collection and analysis for non-expert audiences, particularly within a geographical context. He has lectured courses on spatial analysis, big data and spatial databases in Colombia at Universidad Distrital (2010-2015) and Universidad Autonoma de Bucaramanga (2014). He has also been a consultant in geospatial analysis and high-performance computing for several Colombian agencies, such as the Ministry of Agriculture (2010-2015), the Institute of Environmental and Meteorological Affairs (2010, 2012, 2014), the Ministry of Justice (2014) and the Geographical Institute (2007-2010), among others. Diego is the “Data Scientist” of IAAC’s Advanced Architecture Group and seminar faculty of the MaCT.

Links
Twitter: @diegopajarito
LinkedIn: Diego Pajarito G.
ResearchGate: Diego Pajarito

Course Structure

Monday 3 Jun 2019

  • 12.00 – 12.20: Introduction and Methodology
  • 12.20 – 12.40: Daily presentation “From data to Big Data”
  • 12.40 – 15.45: Challenge 1: Describing large sets of data
  • 15.45 – 17.30: Challenge 2
  • 17.30 – 18.00: Daily pitch

Wed 5 Jun 2019

  • 12.00 – 12.20: Stand-up meeting
  • 12.20 – 13.00: Daily presentation “Algorithms, computers, and processors”
  • 13.00 – 15.45: Challenge 1
  • 15.45 – 17.30: Challenge 2
  • 17.30 – 18.00: Daily pitch

Fri 7 Jun 2019

  • 12.00 – 12.20: Stand-up meeting
  • 12.20 – 13.00: Daily presentation “Data visualization”
  • 13.00 – 15.45: Challenge 1
  • 15.45 – 17.30: Challenge 2
  • 17.30 – 18.00: Daily pitch

Tues 11 Jun 2019

  • 15.00 – 15.20: Stand-up meeting
  • 15.20 – 17.30: Scenario-building for a city
  • 17.30 – 18.00: Daily pitch

Wed 12 Jun 2019

  • 15.30 – 17.30: Final presentations

Hardware / Software requirements
The seminar will use Robot Ignite Academy from TheConstruct, together with:
  • PyCharm or another IDE
  • Numerical and geospatial analysis libraries for Python (pandas, GeoPandas, NumPy, TensorFlow)
  • QGIS and other GIS tools

Exercise

When it comes to urban analytics, stakeholders face differing definitions and a wide variety of tools and technologies for analyzing the issues of cities. They need to properly understand the steps of data preparation and the iterative generation of analysis outcomes. Therefore, the seminar will offer students a realistic scenario in which to perform this general procedure in practice. Real-world examples and constant reflection on the implications of adopting quantitative approaches for the needs of advanced architecture will feed the sessions, which are structured around the agile processes widely adopted in modern software development teams.

  • Unstructured data sources / Natural language processing
  • MapReduce / GPU processing / Parallel processing
  • Semantic Web (ontologies, RDF)
  • Machine learning and Artificial Intelligence
  • Data visualization
  • Ethics and privacy implications (from the algorithmic approach to the need for protecting people’s privacy)
  • Open and big data: the main players (companies, governments, and citizens); spatial data infrastructures (SDI) and the World Bank strategy (why do we need to create open data?)
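The MapReduce topic can be previewed without any cluster. The sketch below is a single-machine word count written as explicit map, shuffle and reduce phases; the input reports and the function names are hypothetical, and a framework such as Hadoop would distribute the same phases across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str) -> list:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs) -> dict:
    """Shuffle: group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict) -> dict:
    """Reduce: combine all values of one key; for a word count, add them up."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents) -> dict:
    # Each map_phase call depends only on its own document, which is what lets a
    # framework run the map phase in parallel; here it simply runs in sequence.
    mapped = chain.from_iterable(map(map_phase, documents))
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Hypothetical citizen reports
    reports = ["Noise at night", "noise near the park", "broken light near the park"]
    for word, count in sorted(word_count(reports).items()):
        print(word, count)
```

Keeping the map and reduce functions free of shared state is the design choice that makes the pattern scale: any machine can process any document, and any machine can reduce any key.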

Deliverables

  • A daily pitch presentation summarizing each day’s results
  • A strategy for adopting big data technology/ies as a tool for advanced architecture, covering:
      • Data source description (size, temporal and spatial resolution)
      • Hardware requirements (architecture, processing capacity)
      • Analysis tool (technology, language, operational costs)
      • Expected analysis outcomes (format, frequency, final user)
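For the data source description, some of the requested figures can be computed directly from the records. The sketch below assumes hypothetical point observations stored as `(timestamp, latitude, longitude)` tuples; it reports the size, the median interval between consecutive observations as a proxy for temporal resolution, and the bounding box. The bounding box describes spatial extent rather than resolution; for gridded data, the cell size would be the natural measure of spatial resolution.

```python
from datetime import datetime
from statistics import median


def describe_source(records) -> dict:
    """Summarize size, temporal resolution and spatial extent of point records.

    Each record is a hypothetical (iso_timestamp, latitude, longitude) tuple.
    """
    if not records:
        raise ValueError("no records to describe")
    times = sorted(datetime.fromisoformat(t) for t, _, _ in records)
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    lats = [lat for _, lat, _ in records]
    lons = [lon for _, _, lon in records]
    return {
        "records": len(records),
        "start": times[0].isoformat(),
        "end": times[-1].isoformat(),
        # Median gap between consecutive observations: a proxy for temporal resolution
        "median_interval_s": median(gaps) if gaps else None,
        # (min_lon, min_lat, max_lon, max_lat): spatial extent, not resolution
        "bbox": (min(lons), min(lats), max(lons), max(lats)),
    }


if __name__ == "__main__":
    # Hypothetical sensor readings
    readings = [
        ("2019-06-03T12:00:00", 41.3963, 2.1944),
        ("2019-06-03T12:00:30", 41.3970, 2.1950),
        ("2019-06-03T12:01:00", 41.3981, 2.1962),
        ("2019-06-03T12:02:00", 41.3990, 2.1971),
    ]
    print(describe_source(readings))
```

The median is used instead of the mean so that a single long gap, such as a sensor outage, does not distort the reported resolution.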

Students must submit all material to the IAAC Google Drive folder, and a blog post must be curated on iaacblog.net for each project no later than one week after the end of the seminar.

Grading System
  • 0 – 4.9: Fail (submission of supplementary work by May)
  • 5.0 – 6.9: Pass
  • 7.0 – 8.9: Good
  • 9.0 – 10: Excellent/Distinction

On this basis, students will be evaluated on the following components:
  • Group daily pitch: 40%
  • Final presentation: 40%
  • Blog posts: 20%
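The weights above can be combined into a final mark with a simple weighted sum. The sketch below assumes each component is scored on the same 0–10 scale and that the result is rounded to one decimal before being mapped to the grading bands; neither assumption is stated in the syllabus, and the dictionary keys are hypothetical.

```python
# Weights from the evaluation breakdown; component scores assumed on a 0-10 scale.
WEIGHTS = {"daily_pitch": 0.40, "final_presentation": 0.40, "blog_posts": 0.20}

# Lower bound of each grading band, checked from highest to lowest.
BANDS = [(9.0, "Excellent/Distinction"), (7.0, "Good"), (5.0, "Pass"), (0.0, "Fail")]


def final_grade(scores: dict) -> float:
    """Weighted sum of the component scores, rounded to one decimal (an assumption)."""
    return round(sum(weight * scores[name] for name, weight in WEIGHTS.items()), 1)


def grade_band(grade: float) -> str:
    """Map a 0-10 grade to its band label."""
    if not 0.0 <= grade <= 10.0:
        raise ValueError(f"grade out of range: {grade}")
    return next(label for lower_bound, label in BANDS if grade >= lower_bound)


if __name__ == "__main__":
    grade = final_grade({"daily_pitch": 8, "final_presentation": 7, "blog_posts": 9})
    print(grade, grade_band(grade))  # 7.8 Good
```

Rounding before banding avoids values such as 4.95 falling between the published “0 – 4.9” and “5.0 – 6.9” ranges.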

References / Bibliography (optional)

  1. ROS tutorials: http://wiki.ros.org/ROS/Tutorials
  2. ROS books: http://wiki.ros.org/Books
  3. Siegwart, R., Nourbakhsh, I. R., and Scaramuzza, D. Introduction to Autonomous Mobile Robots.
  4. Siciliano, B., and Khatib, O. (eds.). Springer Handbook of Robotics.
  5. https://dl.acm.org/citation.cfm?id=3206166
  6. Full study programmes:
      • Centre for Interdisciplinary Methodologies (University of Warwick): https://warwick.ac.uk/fac/cross_fac/cim/apply-to-study/masters-programmes/urban-analytics/
      • NYU: https://wagner.nyu.edu/education/degrees/master-urban-planning/urban-analytics#