Samu Syrjänen

I’m a Data Science Master’s student at the University of Helsinki, and a research assistant at Aalto University with one year of experience. My research assistant role has mainly consisted of Data Science related tasks such as developing ML algorithms, analyzing data, and making automated pipelines for various data processing tasks. My background is mainly in Computer Science, Data Engineering, Machine Learning, and Data Analysis. I have also studied Physics (33 cr) and Mathematics (65 cr), and know a thing or two about sailboats and tanks, and how to lead their crews.

I'm looking for long-term work opportunities to gain experience and develop more specialized skills. Future career interests include working with data architecture, pipelines, analytics, cloud platforms, and machine learning to provide solutions for product development, marketing, and business intelligence problems. Besides technical roles, I'm also able to work in more hands-on or business administration positions, where a tech-heavy background can be beneficial.

Strengths
- Enjoy cleaning and organizing data, tasks, and resources to facilitate productivity
- Education and work experience revolved around data and software since 2019
- Practical experience with databases, algorithms, and data pipelines
- Degree structure has a strong emphasis on machine learning, mathematics, and statistics
- Practical experience with sailboat engines and electric systems, as well as leading sailboat and tank crews

Weaknesses
- Only 1 year's worth of paid work experience
- No expert-level talent in any specific niche yet
- A lack of credible business experience
- Prefer to focus on a single project at a time

I'm currently looking for work and would prefer to start after graduating, which is expected in early 2026. I'm especially interested in international employers, and I want to relocate from my current home, Helsinki.

Open CV

Career

Dec 2024 - Current

Research Assistant

Aalto University

As part of Jaan Praks's research group at Aalto University and ESA's Hera space mission, I'm responsible for creating a pipeline that cleans, calibrates, and analyzes hyperspectral image data and turns it into final data products. The data is received from the ASPECT Hyperspectral Imager on the Hera/Milani space probe. The mission's objective is to gather data about the Didymos binary asteroid system, whose moon, Dimorphos, was struck by the DART spacecraft in 2022 in an attempt to redirect it, and to study the effects of that impact. The larger aim is to research this kind of asteroid redirection as a means of planetary defence.

My work entails plenty of data analysis and cross-national coordination between the teams working on this project. After the calibration, my pipeline analyzes the hyperspectral images with a convolutional neural network, which will allow us to gain insights into the mineral composition of the target asteroids, Didymos and Dimorphos. The data will be further analyzed with various methods and documented in scientific literature. The images and derived information will additionally be used to build a reconstructed 3D model containing all related information.

Related to this position, I attended a PDS4 workshop held in ESAC, Madrid, that gave a deep dive into the Planetary Data System (PDS4) archiving format. PDS4 is the latest standard used in NASA's and ESA's scientific data archives.

Sep 2023 - Current

Master's Degree in Data Science

University of Helsinki

Transcript of Records

May 2024 - Aug 2024

Research Assistant

University of Helsinki

As part of this space weathering research project, I created a Gaussian Process algorithm enhanced with a Convolutional Neural Network for estimating asteroid surface age. The algorithm uses asteroid hyperspectral reflectance spectra to give an age estimate. It proved surprisingly flexible at predicting outcomes even with a sparse training set such as ours. (More details below.)

Job Certificate

Sep 2019 - Dec 2023

Bachelor's Degree in Computer Science

University of Helsinki

Products

May 2024 - Aug 2024

I was hired as a Research Assistant to develop a Gaussian Process (GP) ML model to estimate the age of S/Sq/Q-type asteroid surfaces. The GP model is trained with hyperspectral reflectance data acquired from various laboratory tests. This algorithm has the potential to be used to analyze hyperspectral asteroid images from future space missions.

A GP model was selected for this task because it performs well even with a relatively sparse training dataset like ours. I conducted a comprehensive study to select the most suitable model structure, which eventually consisted of a Convolutional Neural Network (CNN) based Feature Extractor and Hadamard Multitask Regression, both built on top of the base GP model. The GP model uses a Constant Mean and a Matern Kernel as its base components.

The Hadamard Multitask functionality was useful in our case because it can be used to train a separate initial model for each of the different asteroid types, and to combine those models into one single algorithm, which is optimized to benefit from the possible similarities between the asteroid types.
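As a rough illustration of the base components named above (not the project's actual code), the following NumPy sketch shows GP regression with a constant mean and a Matern 5/2 kernel. The smoothness value 5/2, the fixed hyperparameters, and the function names are assumptions made for the example. The CNN feature extractor and the Hadamard multitask layer are deliberately left out.

```python
import numpy as np


def matern52(xa, xb, lengthscale=1.0, outputscale=1.0):
    """Matern 5/2 kernel matrix between two sets of 1-D inputs."""
    d = np.abs(xa[:, None] - xb[None, :]) / lengthscale
    s = np.sqrt(5.0) * d
    return outputscale * (1.0 + s + s**2 / 3.0) * np.exp(-s)


def gp_predict(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and variance of a GP with a constant mean.

    The constant mean is set to the sample mean of the targets here;
    a real model would usually learn it together with the kernel
    hyperparameters.
    """
    mean_const = y_train.mean()
    k_tt = matern52(x_train, x_train) + noise * np.eye(len(x_train))
    k_st = matern52(x_test, x_train)
    k_ss = matern52(x_test, x_test)

    # Solve with a Cholesky factor instead of an explicit inverse.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - mean_const))
    mean = mean_const + k_st @ alpha

    v = np.linalg.solve(chol, k_st.T)
    var = np.diag(k_ss) - np.sum(v**2, axis=0)
    return mean, var
```

Far away from the training data the prediction falls back to the constant mean and the variance returns to the prior value, which is part of what makes a GP informative when training data is sparse.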

I'm proud of the results this model was able to achieve. The results were comparable with an ensemble model that uses CNN, Gradient-Boosting regression, K-Nearest neighbor, Extra-Tree Regression, and Random Forest algorithms. More information will be available once our report paper is published. In the meantime, there is a link to my Github repository below.

I learned:

Mathematical Understanding of the Underlying GP and CNN Models
PyTorch/GPyTorch
Data Cleaning Pipeline
ML Optimization/Testing

Project Github Job Certificate

Feb 2025 - Current

This work is still ongoing.

The thesis aims to explore some different ways to build data streaming pipelines in the cloud. I develop an end-to-end streaming pipeline that uses real-time stock market data and derives analytics from it. The data is ingested with Kafka, processed with Spark/Databricks, and visualized with PowerBI. All services are built on scalable cloud compute that is able to handle massive volumes of data.
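To make the layered idea behind such a pipeline concrete, here is a plain-Python stand-in for the bronze, silver, and gold stages of a medallion-style flow. It is only a sketch: the record fields (`symbol`, `ts`, `price`), the 60-second window, and both function names are hypothetical, and a real pipeline would run these steps in Spark on data ingested from Kafka rather than on in-memory lists.

```python
from collections import defaultdict


def to_silver(bronze_records):
    """Validate and type raw tick records; drop malformed rows."""
    silver = []
    for rec in bronze_records:
        try:
            price = float(rec["price"])
            symbol = str(rec["symbol"]).upper()
            ts = int(rec["ts"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or unparsable fields
        if price > 0:
            silver.append({"symbol": symbol, "ts": ts, "price": price})
    return silver


def to_gold(silver_records, window_seconds=60):
    """Average price per symbol and tumbling time window."""
    buckets = defaultdict(list)
    for rec in silver_records:
        window_start = rec["ts"] - rec["ts"] % window_seconds
        buckets[(rec["symbol"], window_start)].append(rec["price"])
    return {key: sum(prices) / len(prices) for key, prices in buckets.items()}
```

The raw input plays the role of the bronze layer, the cleaned records form the silver layer, and the windowed averages are the kind of aggregate a dashboard would read from the gold layer.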

I have learned so far:

AWS
Databricks
Spark
ETL/ELT Pipelines
Data Cleaning/Validation
Medallion Architecture

Jan 2024 - May 2024

In this project, our team developed a building façade recognition algorithm based on semantic segmentation. Multiple methodologies needed to be implemented and combined into one fast algorithm to make it robust against visual variations caused by weather, time of day, seasons, and foliage. Additionally, the algorithm had to be as efficient as possible so that it could run seamlessly on a mobile device.

The structure of the algorithm consists of the following filtering methodologies: number of stories, feature counting, wall/door color, text matching, roof shape, and wall texture. I specialized in the texture analysis and developed a Local Binary Pattern algorithm that filters images based on texture similarity. The results were deemed satisfactory by the customer. More details can be seen in the project report.
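For readers unfamiliar with Local Binary Patterns, the sketch below shows the basic 8-neighbour variant in NumPy together with a chi-square distance between two LBP histograms. It illustrates the general technique only. The project's own implementation, its neighbourhood size, and its similarity measure may differ, and the function names here are made up for the example.

```python
import numpy as np

# Neighbour offsets (row, column), clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(gray):
    """Normalized histogram of 8-neighbour LBP codes of a 2-D grayscale array."""
    gray = np.asarray(gray, dtype=float)
    rows, cols = gray.shape
    center = gray[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = gray[1 + dy : rows - 1 + dy, 1 + dx : cols - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the center.
        codes += (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


def texture_distance(image_a, image_b):
    """Chi-square distance between LBP histograms (0 means identical)."""
    ha, hb = lbp_histogram(image_a), lbp_histogram(image_b)
    denom = ha + hb
    mask = denom > 0
    return 0.5 * np.sum((ha[mask] - hb[mask]) ** 2 / denom[mask])
```

A filter built on this idea would keep candidate images whose distance to the query texture falls below a chosen threshold.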

I learned:

Strong Problem Solving
Understanding Customer Needs
OpenCV Computer Vision
Local Binary Pattern (LBP) Algorithm
Scrum

Project Report Course Website

Jan 2023 - May 2023

The app tracks the user's mobile device and helps prevent users from getting lost in Finnish forests while picking berries. Users can see their own and their friends' routes. If the internet connection is lost, you and your friends can still see each other's latest logged location on the map.

Our team used scrum and agile software development methodologies to facilitate the development of the product. It was essential to understand the customer's needs well enough to be able to adapt and find the best possible solutions for each problem without the need to continuously bother the customer with overly technical details and questions.

Our team resumed the work of a previous development team. The full-stack development consisted of Python for the backend and TypeScript for the frontend. The app is built using React Native and Expo, and we were able to set it up for every popular mobile operating system. I'm proud of our solutions to many problems regarding the compatibility of dependencies and other technologies used, such as React Native, Expo, map tiles, and caching.

I learned:

Deploying the app to the university server
A working test version on a real phone (in addition to emulators)
Code refactoring for quality
A script that enables a fast setup for a new developer to begin the development in a new environment
Resolving Dependency Conflicts
All user data is encrypted and safe in the database
In-app functionality to add, track, and discard multiple friends at the same time (+ settings and UI for it)
The map source can be changed
Ability to switch language
UI overhaul
Various quality of life features for users and developers

Project Website Course Website

Mar 2022 - Apr 2022

A bare-bones forum website similar to Reddit, where users can discuss topics. Users can create accounts, post threads, and like and comment on threads and other comments. Users can also send private messages to each other and search for other users.

I learned:

Frontend & Backend software architectures and development
Python
SQL
Web Development
Testing
Databases
Software Architecture

Project Github Course Website

Oct 2023 - Dec 2023

In this project, our task was to predict the saturation vapor pressure of different molecules based on their chemical and physical features. To achieve this, we experimented with multiple machine learning models and techniques to find and train the best possible model for the task. The project report contains a demonstration of exploratory data analysis, principal component analysis, different ML models, hyperparameter optimization, feature selection, and our conclusions on all of the above.

The course staff organized a Kaggle competition regarding the project (link below), where each group posted their results. Our group's performance was average: our R-squared score was 0.6729, while the winning group had a score of 0.7275.
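For reference, the R-squared score can be computed as in the short sketch below. It assumes the competition used the usual coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares. The function name is illustrative.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A score of 1 means perfect predictions, while a model that always predicts the mean of the targets scores 0.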

I learned:

Exploratory Data Analysis (EDA)
Feature Selection
Hyperparameter Optimization
Data Cleaning Pipeline
Linear Regression
Decision Trees
Random Forests
Gradient Boosting

Project Report Kaggle Competition Course Website

May 2023 - Jun 2023

This app clusters similar texts into groups. It is built from the ground up, using no premade algorithms from external libraries. The app uses a dataset of old BBC news articles and takes additional texts as input from the user. It sorts those articles into groups based on text similarity using a TF-IDF matrix and the K-means clustering algorithm.
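The approach can be sketched with the Python standard library alone, in the same spirit as the app. This is an illustration rather than the app's code: whitespace tokenization, the smoothed IDF formula, and the deterministic farthest-point choice of starting centroids are all simplifying assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return L2-normalized TF-IDF vectors and the sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(len(docs) / doc_freq[tok]) + 1.0 for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[tok] / max(len(toks), 1) * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors, vocab


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Cluster vectors into k groups and return one label per vector."""
    # Deterministic start: first vector, then repeatedly the farthest one.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

For example, calling `tfidf_vectors` on a handful of finance and sports headlines and passing the vectors to `kmeans` with `k=2` groups the headlines by topic, as long as headlines on the same topic share vocabulary.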

I learned:

K-Means Unsupervised Machine Learning Algorithm (With TF-IDF Matrices)
Data Cleaning Pipeline
Testing
Software Architecture

Project Github Course Website

Skills

Python
Scrum
Github
Data Cleaning and Processing
Databricks
[ML] Gaussian Process
SQL
GPyTorch
ML training/Hyperparameter Optimization
[ML] Local Binary Pattern
Agile Development Methodologies
[ML] K-Means Clustering
HTML & CSS
Medallion Architecture
ETL/ELT Pipelines
Spark
[ML] Convolutional Neural Networks (CNN)
PyTorch
Databases
Software Architectures
Automatic Testing
Data Encryption
Exploratory Data Analysis (EDA)
AWS
[ML] Feature Selection Methods
[ML] Linear Regression
[ML] Decision Trees
[ML] Random Forests
[ML] Gradient Boosting
OpenCV (Computer Vision)
JavaScript/TypeScript

Languages

English CEFR C1
Finnish Native
Japanese Beginner

Contact Me

samu.syrjanen@gmail.com