Master in City & Technology 2021/22 – Term III
Seminar Name: Digital Tools & Big Data III – Collecting, processing and sharing big data across the web
Total Hours: 20 hours
Faculty: Diego Pajarito

US Army map of ocean currents using stream graphs. CC BY 4.0 Public Domain


Course Abstract

Once data is available, researchers, urban designers and other stakeholders can take advantage of digital technologies to work with it. Traditional, web and non-traditional sources feed existing computing workflows and constantly enhance methods and applications for advanced architecture. Data science and big data are two digital approaches to the analysis of massive data sources. They help define workflows that arrange basic programming pieces to deal with tasks such as data discovery and cleaning, descriptive statistics, visualisation and other data management tasks. The challenge is, therefore, to organise these tasks to deliver understandable outcomes. Since there is no one-size-fits-all strategy, data science builds on exploration and testing across big data tools.

The goal of this course is to provide students with experience in handling common tasks in big data, data science and data analytics. The course offers an environment in which students can practise activities that are common in urban analytics. From data collection and ingestion to analysis and visualisation, students will experience the whole workflow while getting hands-on practice processing data available on the web and visualising existing flows in both geospatial and non-spatial contexts.

The course has seven practical sessions in which students interact directly with large data sets to develop the technical skills in high demand in big data projects. The sessions start by discovering big data sources, performing descriptive analytics and plotting different data sets to identify trends and correlations. The course then moves towards the spatial and temporal dimensions of big data sets and ways to graphically represent features across these dimensions. The last part of the course deals with data management tasks such as splitting, aggregating, merging and summarising datasets to improve analysis and visualisation.
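As a rough illustration of the data management tasks named above (splitting, aggregating, merging and summarising), the following minimal sketch uses only the Python standard library. The records, the zone names and the lookup table are hypothetical placeholders, not course data, and the tools actually used in the sessions may differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (zone identifier, trip duration in minutes).
trips = [
    ("zone_a", 12.0),
    ("zone_a", 18.0),
    ("zone_b", 9.0),
    ("zone_b", 15.0),
    ("zone_b", 21.0),
]

# Hypothetical lookup table to merge with the summarised trips.
zone_type = {"zone_a": "residential", "zone_b": "commercial"}


def split_by_key(records):
    """Split (key, value) records into one list of values per key."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return dict(groups)


def summarise(groups):
    """Aggregate each group into a count and a mean."""
    return {k: {"count": len(v), "mean": mean(v)} for k, v in groups.items()}


def merge(summary, lookup, field):
    """Attach a lookup attribute to each summary row (inner join on key)."""
    return {
        k: {**stats, field: lookup[k]}
        for k, stats in summary.items()
        if k in lookup
    }


groups = split_by_key(trips)
summary = summarise(groups)
merged = merge(summary, zone_type, "type")

for zone, row in merged.items():
    print(zone, row)
```

The same split, aggregate and merge pattern is what data-frame libraries expose as group-by and join operations; the plain-Python version is shown only to make each step explicit.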

This third-term course aims to create an environment in which students can train and strengthen their data analysis skills. Through hands-on sessions, students will gain experience in extracting data from existing web APIs (i.e., application programming interfaces) and non-traditional data sources (e.g., semi-structured data, linked data and OpenStreetMap). Additionally, students will use these resources to perform basic processing tasks such as data aggregation and network analysis (e.g., shortest path estimation, routing, service area). Finally, the course will offer a set of resources to map and visualise flows and to share these and other visual resources on the web. Students will prepare a short academic paper describing goals, methods and initial outcomes. They will also deliver a documented repository with the data, source code and visual outcomes generated for creating the poster.
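To give a sense of what shortest path estimation involves, the following is a minimal sketch of Dijkstra's algorithm on a small weighted graph, using only the Python standard library. The graph `streets`, its node names and its edge weights are hypothetical, and the function name `shortest_path` is chosen for illustration; real street networks (for example, from OpenStreetMap) would normally be handled with dedicated libraries.

```python
import heapq


def shortest_path(graph, source, target):
    """Return (cost, path) of the cheapest route using Dijkstra's algorithm.

    graph maps each node to a dict of {neighbour: non-negative weight}.
    Returns (inf, []) when the target cannot be reached.
    """
    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue  # stale queue entry, a shorter route was already found
        visited.add(node)
        if node == target:
            break
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    if target not in dist:
        return float("inf"), []
    # Walk back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


# Hypothetical directed street graph; weights could be minutes or metres.
streets = {
    "A": {"B": 4.0, "C": 1.0},
    "B": {"D": 1.0},
    "C": {"B": 2.0, "D": 5.0},
    "D": {},
}

cost, route = shortest_path(streets, "A", "D")
print(cost, route)
```

A service area can be approximated with the same search by dropping the target and keeping every node whose distance stays under a chosen threshold.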

Faculty



Diego Pajarito obtained his PhD in Geoinformatics as part of a Marie Curie ITN Action – a joint doctorate between the University of Münster, Universitat Jaume I and Universidade Nova de Lisboa (2018), and his MSc in Information and Communication Sciences from Universidad Distrital de Bogotá (2014). He has conducted research on the spatio-temporal analysis of sustainable transport and other urban systems, as well as on data collection techniques using mobile devices and crowdsourcing. Diego’s interests lie in simplifying data collection and analysis for non-expert audiences, particularly within a geographical context. He has lectured courses on spatial analysis, big data and spatial databases in Colombia at Universidad Distrital (2010-2015) and Universidad Autonoma de Bucaramanga (2014). He has also been a consultant on geospatial analysis and high-performance computing for different agencies in Colombia, such as the Ministry of Agriculture (2010-2015), Institute of Environmental, meteorological affairs (2010, 2012, 2014), Ministry of Justice (2014) and Geographical Institute (2007-2010), among others. Diego is the “Data Scientist” of IAAC’s Advanced Architecture Group and seminar faculty of the MaCT.