DEV Community

Mike Young

Building indie hacker stuff in my free time, focusing on AI. Launching https://aimodels.fyi - find the right AI model for your project!

Location: Washington, DC
Personal website: https://aimodels.fyi

Education

Purdue

Work

Indie hacking stuff!

Twenty Constructionist Things to Do with Artificial Intelligence and Machine Learning

4 min read

A decoder-only foundation model for time-series forecasting

4 min read
LLM Agents can Autonomously Exploit One-day Vulnerabilities

4 min read
Efficient Sentiment Analysis: A Resource-Aware Evaluation of Feature Extraction Techniques, Ensembling, and Deep Learning Models

4 min read
A Closer Look at AUROC and AUPRC under Class Imbalance

4 min read
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

4 min read
What are human values, and how do we align AI to them?

4 min read
The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text

4 min read
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

4 min read
Confidential Federated Computations

4 min read
Born With a Silver Spoon? Investigating Socioeconomic Bias in Large Language Models

4 min read
Long-form music generation with latent diffusion

4 min read
Chinchilla Scaling: A replication attempt

3 min read
AutoCodeRover: Autonomous Program Improvement

3 min read
The Illusion of State in State-Space Models

4 min read
Zero-shot Building Age Classification from Facade Image Using GPT-4

4 min read
H2O-Danube-1.8B Technical Report

4 min read
Dataset Reset Policy Optimization for RLHF

4 min read
Manipulating Large Language Models to Increase Product Visibility

3 min read
Recommender Systems in the Era of Large Language Models (LLMs)

4 min read
Large Language Models as Optimizers

4 min read
Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?

4 min read
CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs

3 min read
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck

4 min read
SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation

4 min read
BooookScore: A systematic exploration of book-length summarization in the era of LLMs

4 min read
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

4 min read
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying

4 min read
The Curse of Recursion: Training on Generated Data Makes Models Forget

4 min read
TransformerFAM: Feedback attention is working memory

4 min read
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models

4 min read
Algorithmic Collective Action in Recommender Systems: Promoting Songs by Reordering Playlists

4 min read
GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications

4 min read
Show Your Work with Confidence: Confidence Bands for Tuning Curves

4 min read
SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes

4 min read
Rho-1: Not All Tokens Are What You Need

4 min read
Vision Transformers Need Registers

4 min read
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

3 min read
The Expressive Power of Transformers with Chain of Thought

4 min read
CodecLM: Aligning Language Models with Tailored Synthetic Data

4 min read
Generalization in diffusion models arises from geometry-adaptive harmonic representations

4 min read
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

4 min read
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

3 min read
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

3 min read
JetMoE: Reaching Llama2 Performance with 0.1M Dollars

4 min read
ChatGPT Can Predict the Future when it Tells Stories Set in the Future About the Past

4 min read
Chapter: Vulnerability of Quantum Information Systems to Collective Manipulation

4 min read
The Impact of Depth on Compositional Generalization in Transformer Language Models

4 min read
Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping

4 min read
LLMs are secretly good at regression calculations

9 min read
The Optimal Choice of Hypothesis Is the Weakest, Not the Shortest

4 min read
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

4 min read
From Model-centered to Human-Centered: Revision Distance as a Metric for Text Evaluation in LLMs-based Applications

4 min read
The Topos of Transformer Networks

4 min read
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

4 min read
Impact of Extensions on Browser Performance: An Empirical Study on Google Chrome

3 min read
Increased LLM Vulnerabilities from Fine-tuning and Quantization

4 min read
The Use of Generative Search Engines for Knowledge Work and Complex Tasks

3 min read
94% on CIFAR-10 in 3.29 Seconds on a Single GPU

3 min read
Accelerating Recommender Model Training by Dynamically Skipping Stale Embeddings

4 min read