Hello, world!
Hey there, my name is Nhu Ngoc, or simply Ngoc 👋 I'm a graduate student interested in machine learning, artificial intelligence,
and developing data-intensive applications.
Education
Erasmus Mundus Joint Master in Big Data Management and Analytics
2024 - 2026
I'm currently a master's student pursuing the Erasmus Mundus Joint Master Degree in Big Data Management
and Analytics. I spent the first semester at Université Libre de Bruxelles (Belgium), and am currently spending
my second semester at
Universitat Politècnica de Catalunya (Spain). I will be spending my second year at
Università degli Studi di Padova (Italy)
to pursue the specialization in Statistics and Deep Learning for Data Analytics.
Coursework at Université Libre de Bruxelles (Fall 2024)
- Advanced Databases
- Database Systems Architecture
- Data Warehouses
- Management of Data Science and Business Workflows
- Data Mining
- Language: French
Coursework at Universitat Politècnica de Catalunya (Spring 2025)
- Big Data Management
- Semantic Data Management
- Machine Learning
- Viability of Business Projects
- Big Data Seminar
- Debates on Ethics of Big Data
- Language: Spanish
Bachelor of Science in Computer Science
2019 - 2023
In 2023, I graduated from New York University Abu Dhabi with a major in Computer Science and two minors
in Applied Mathematics and Interactive Media. My undergraduate coursework and research experiences
explored a wide range of topics in computer science but focused largely on machine learning and big data.
Coursework for major in Computer Science
NYU Abu Dhabi (United Arab Emirates)
- Introduction to Computer Science
- Discrete Mathematics
- Data Structures
- Algorithms
- Computer Systems Organization
- Software Engineering
- Operating Systems
- Computational Social Science
NYU (United States)
- Artificial Intelligence
- Processing Big Data for Analytics Applications
- Natural Language Processing
- Computer Networks
NYU Paris (France, taken remotely)
- Introduction to Machine Learning
Coursework for minor in Applied Mathematics
NYU Abu Dhabi (United Arab Emirates)
- Calculus with Applications to Science and Engineering
- Multivariable Calculus with Applications to Science and Engineering
- Linear Algebra
- Probability and Statistics
Coursework for minor in Interactive Media
NYU Abu Dhabi (United Arab Emirates)
- Introduction to Interactive Media
- Communications Lab
- Politics of Code
- Art Intel
NYU London (United Kingdom)
- Immersive Storytelling and the Art of Making the Virtual a Reality
Academic research
NYU Abu Dhabi's Center for Quantum and Topological Systems
2023
Upon graduating from NYU Abu Dhabi in 2023, I carried out a two-month research assistantship at the
Center for Quantum and Topological Systems, supervised by
Prof. Hisham Sati, studying the use of the
dependently typed programming language Agda in formalizing braid groups as automorphisms of the free
groups.
Capstone project in Computer Science
2022 - 2023
My Capstone project, submitted in fulfillment of my Bachelor’s degree at NYU Abu Dhabi, was conducted under
the mentorship of
Prof. Talal Rahwan and
Dr. Sid Ahmed Benabderrahmane. In this project, I assessed the
use of AutoEncoders as an anomaly detection method by reconstructing system-level provenance data from
DARPA's Transparent Computing program to identify anomalous patterns in system activities indicative of
advanced persistent threats. The research on my Capstone has since been
published.
NYU Tandon's Behavioral Urban Informatics, Logistics, and Transport Laboratory
2022
NYU Abu Dhabi's Center for Global Sea Level Change
2021
My time at the
Center for Global Sea Level Change in 2021 was my first research assistantship, conducted under the supervision of
Prof. David Holland and Dr. Daiane Gracieli Faller. I worked on
developing a probabilistic model for predicting long-term changes of the Abu Dhabi shoreline given different
geomorphological settings under a set of sea level rise scenarios.
Industry experiences
In early 2023, I worked at Wio Bank
(Abu Dhabi, UAE) as an Application Security Intern.
From October 2023 to August 2024, I worked in R&D at LG Electronics Development Vietnam as a Software Engineer.
My work focused on developing over-the-air firmware update solutions for in-vehicle infotainment (IVI) systems.
Honors & scholarships
Erasmus Mundus Joint Master - Partner Country Scholarship
2024
I was awarded a full-ride scholarship by the European Commission to pursue the Erasmus Mundus Joint Master in
Big Data Management and Analytics (BDMA). The scholarship is worth a total of €49,000, covering tuition fees,
living expenses, and transportation costs for the two-year program.
New York University Founder's Day Award
2023
This is an award for top-ranking graduating students at New York University.
Miscellaneous works: publications & hackathon
-
Hack Me If You Can: Aggregating AutoEncoders for Countering Persistent Access Threats Within Highly
Imbalanced Data
2024
Journal paper
Read abstract
Advanced Persistent Threats (APTs) are sophisticated, targeted cyberattacks designed to gain
unauthorized access to systems and remain undetected for extended periods. To evade
detection, APT cyberattacks deceive defense layers with breaches and exploits, thereby complicating
exposure by traditional anomaly detection-based security methods. The challenge of detecting APTs
with machine learning is compounded by the rarity of relevant datasets and the significant
imbalance in the data, which makes the detection process highly burdensome. We present AE-APT, a deep
learning-based tool for APT detection that features a family of AutoEncoder methods ranging
from a basic one to a Transformer-based one. We evaluated our tool on a suite of provenance trace
databases produced by the DARPA Transparent Computing program, where APT-like attacks constitute as
little as 0.004% of the data. The datasets span multiple operating systems, including Android, Linux,
BSD, and Windows, and cover two attack scenarios. The outcomes showed that AE-APT has significantly
higher detection rates compared to its competitors, indicating superior performance in detecting
and ranking anomalies.
[published paper]
-
NY Statewide Behavioral Equity Impact Decision Support Tool with Replica
2023
Technical report
Technical report I contributed to during my research assistantship at NYU Tandon's BUILT Lab.
Read abstract
A NY statewide mode choice model is developed to deterministically fit heterogeneous
coefficients for trips along each census block-group OD pair conducted by each population segment within
a random-utility-consistent framework. The proposed approach is to use inverse optimization
(IO) to derive coefficients for each OD pair times population segment as an agent. This is only
possible with ubiquitous population data. We call this a group-level agent-based mixed logit (g-AMXL)
model, which is an extension of the AMXL model proposed by Ren and Chow (2022). The significance of
g-AMXL is as follows. First, g-AMXL takes OD level (instead of individual level) trip data as
inputs, which is efficient in dealing with ubiquitous datasets containing millions of observations.
Second, preference heterogeneities are based on non-parametric aggregation of coefficients per agent instead of
having to assume a distributional fit. Third, since each agent’s representative utility function is
fully specified, g-AMXL can be directly integrated into system design optimization models as constraints
instead of dealing with simulation-based approaches required by mixed logit (MXL) models.
[published report]
-
A Vietnamese Named Entity Recognition System for COVID-19 Articles
2022
Conference paper
Research originally conducted for the class CSCI 480 Natural Language Processing
at NYU Courant, Fall 2021. Paper later accepted at the 2022 IEEE MIT Undergraduate
Research Technology Conference, Massachusetts, United States. Due to visa issues,
I could not present my paper in person and instead gave the presentation via Zoom.
DOI: 10.1109/URTC56832.2022.10002170.
Read abstract
This paper presents a named entity recognition system for the specific domain of Vietnamese
COVID-19 news articles. By incorporating manually selected and domain-specific features into
a simple deep learning architecture, the system can identify a wide range of custom named
entities relevant in the context of COVID-19 and future epidemics. Using high-dimensional
embedding vectors in combination with part-of-speech tags and additional features, the
system achieves an F score of about 90.41%, surpassing or coming close to results by other
models that are more complicated or pre-trained and fine-tuned.
[published paper]
[pre-recorded presentation]
-
qSa'id (كيو-ساعِد) - Quantum-ML-Assisted Diagnostic Treatment Access Platform for Autism
2022
Hackathon
A two-fold project including a hybrid quantum-classical machine learning screening tool and an
approach to the quadratic unconstrained binary optimization (QUBO) problem formulated as optimizing
placements of specialized treatment centers in a healthcare system. Created for NYU Abu Dhabi
Hackathon for Social Good in the Arab World 2022 - Focusing on Quantum Computing.
[repo]
[press coverage]
An assortment of projects
Big data
ElevatEd - Integrated Student Success Monitoring Platform
A big data pipeline presenting historical analytics and an academic counselor recommendation system
for student success monitoring and intervention at higher education institutions. Integrated project
for the classes Big Data Management, Semantic Data Management, and Viability of Business Projects at
Universitat Politècnica de Catalunya, Spring 2025.
Tools/languages: Delta Lake, DuckDB, MongoDB, Neo4J, Pinecone, Apache Spark, Apache Airflow
[repo]
New York City Vehicle Collisions: A Study of Brooklyn
An exploratory data analytics project that examines the relationship between weather conditions and
vehicle collisions in Brooklyn. Conducted for the class CSCI 476 Processing Big Data for
Analytics Applications at NYU Courant, Fall 2021.
Tools/languages: Java, Scala, Spark SQL, Spark MLlib
[blog post] [repo]
Machine learning
Federated Graph Learning
A state-of-the-art survey on the topic of federated learning with graph-structured data.
Created for the Thirteenth European Big Data Management & Analytics Summer School
(eBISS 2025) from June 30 to July 4, 2025 in Brussels, Belgium.
[poster]
A Vietnamese Named Entity Recognition System for COVID-19 Articles
Assessed the use of a Long Short-Term Memory model in a named entity recognition system for the
specific domain of Vietnamese COVID-19 news articles. Final project for the class CSCI 480
Natural Language Processing at NYU Courant, Fall 2021.
Tools/languages: TensorFlow
[paper] [presentation]
Diffusing Bohemian Rhapsody
A visual project exploring the textual latent space of Stable Diffusion and its application in
video interpolation. The result is a continuously morphing video for the song Bohemian Rhapsody by
Queen. Created for the class IM 3312 Art Intel at NYU Abu Dhabi, Spring 2023.
[documentation]
[video]
The Neural Mirror
A visual project exploring the image latent space of Stable Diffusion. The result is an
interactive, generative artwork showing a continuous stream of images generated by Stable Diffusion
using webcam capture as the guidance input. Final project for the class IM 3312 Art Intel at
NYU Abu Dhabi, Spring 2023.
[documentation]
[exhibition]
[presentation]
X-Ray Images Classification
Assessed the use of convolutional neural networks and dimensionality reduction methods in classifying
COVID-19 radiography images. Final project for the class CSCI-UA 9473 Introduction to Machine Learning at
NYU Paris (taken remotely), Spring 2021.
Tools/languages: TensorFlow, scikit-learn
[presentation]
Databases
DNA Sequence Database Extension
A PostgreSQL extension for storing and analyzing DNA sequences focusing on k-mer analysis.
Final project for the class INFO-H417 Database Systems Architecture at
Université Libre de Bruxelles, Fall 2024.
Tools/languages: SQL, C
[repo]
iOS development
MyMovies
An iOS app for searching and keeping movies in collections. Final project for CodePath's iOS
Development course.
Tools/languages: Swift
[repo]
Uptify
An iOS app for displaying statistics on a user's Spotify usage. Personal project.
Tools/languages: Swift
[repo]
Parstagram
A simple Instagram clone with a custom Parse backend that allows a user to post photos, view a
global photos feed, and add comments. Created for CodePath's iOS Development course.
Tools/languages: Swift
[repo]
Tweeter
A basic iOS Twitter clone app to view, compose, favorite, and retweet tweets. Created for
CodePath's iOS Development course.
Tools/languages: Swift
[repo]
Web development
Upbase
A web app for inventory tracking. Created for Shopify Backend Developer Intern Challenge 2022.
Tools/languages: NodeJS, MongoDB
[repo]
Spacey
A webpage showing images pulled randomly from NASA's Astronomy Picture of the Day API. Created for
Shopify Frontend Developer Intern Challenge 2022.
Tools/languages: React
[repo]
Computational social science
Hacking the Box Office
A project exploring whether the demographic data of the cast, such as ethnicity, gender,
age, and star power, has an effect on the global box office of movies. Final project for the class
CS-UH 2219E Computational Social Science at NYU Abu Dhabi, Fall 2022.
Tools/languages: pandas, NumPy, Matplotlib, seaborn
[report]
[supplemental materials]
Digital humanities
Mapping and filling the gap: a study of Zanzibar
A digital humanities project examining the use of computational methods in extracting insights from
historical records. Assignment for the class CDAD 1033EQ Data and
Human Space at NYU Abu Dhabi, Fall 2022.
[storymap]
A quantitative look into global linguistic landscapes
A digital humanities project exploring linguistic landscapes using crowdsourced images. Assignment
for the class CDAD 1033EQ Data and Human Space at NYU Abu Dhabi, Fall 2022.
[storymap]
Personal
I come from Quảng Trị Province in Central Vietnam. I also spent three years in Hồ Chí Minh City during which
I attended Lê Hồng Phong High School for the Gifted. Since my undergraduate years,
I have lived, studied, and conducted research in many cities around the world, including Abu Dhabi, New York,
Brussels, and Barcelona.
I dabble in the arts and generally anything and everything that fascinates me. I enjoy playing the flute,
taking amateur photographs,
creating digital art, editing videos, and learning languages.
Unless you speak fluent Vietnamese, it's highly likely that you won't be able to pronounce my name correctly.
I'm giving you blanket approval to pronounce it /knock/ :)