# Pastebin SWm3q5z8

Hey @mayhem, after studying this for the past few weeks, I have narrowed it down to two model classes that can be used for the task:

1. CNNs
   a. TempoCNN paper
   b. Deep Rhythm paper
2. Transformers
   a. Fine-tuning
   b. Training from scratch

For each of these I have explored some codebases and written some code as well.

Literature review I did on the latest techniques: https://docs.google.com/spreadsheets/d/125o9eUXl5Lbrg5fl3kUzMb001X7lVeNjCd0f3Aw0CRY/edit?usp=sharing

Pipeline for fine-tuning an audio classifier model: https://colab.research.google.com/drive/1FNNFQjZU8I7SKmOKcjY_PxQLB2wHzdep?usp=sharing

I believe I can implement the two CNN models from the papers in 175h. If we include transformers, we can extend the project to 350h. I am still learning how to implement them from scratch, but I can definitely try fine-tuning pre-existing models.

What do you suggest for the proposal? I would also need more clarity on the integration part: how to integrate the model, build an API, etc.
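
For context on the CNN option: as far as I understand the two papers, both treat tempo estimation as classification over BPM bins rather than regression. Below is a minimal sketch of that label mapping, assuming 256 one-BPM-wide classes covering 30 to 285 BPM (my recollection of the TempoCNN setup, worth checking against the paper). The helper names `bpm_to_class`, `class_to_bpm`, and `predict_bpm` are hypothetical and not taken from either codebase.

```python
from typing import Sequence

# Assumption: 256 classes, one per integer BPM, starting at 30 BPM (so 30..285).
MIN_BPM = 30
NUM_CLASSES = 256


def bpm_to_class(bpm: float) -> int:
    """Map a tempo in BPM to the nearest class index, clamped to the valid range."""
    idx = round(bpm) - MIN_BPM
    return max(0, min(NUM_CLASSES - 1, idx))


def class_to_bpm(idx: int) -> int:
    """Map a class index back to the BPM value it represents."""
    if not 0 <= idx < NUM_CLASSES:
        raise ValueError(f"class index {idx} is outside 0..{NUM_CLASSES - 1}")
    return idx + MIN_BPM


def predict_bpm(scores: Sequence[float]) -> int:
    """Return the BPM of the highest-scoring class (argmax over model outputs)."""
    if len(scores) != NUM_CLASSES:
        raise ValueError(f"expected {NUM_CLASSES} scores, got {len(scores)}")
    best = max(range(NUM_CLASSES), key=lambda i: scores[i])
    return class_to_bpm(best)
```

Whichever model we pick (CNN or fine-tuned transformer) would only need to output one score per class, and `predict_bpm` would turn that into a tempo value.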