AI Learning YouTube News & Videos | MachineBrain

Building a 35,000-Sample Wall Street Bets Dataset: Fine-Tuning, Nvidia Giveaway, and Data Access

Image copyright YouTube

Welcome back to another thrilling episode on the sentdex channel, where we dive deep into the exhilarating world of Reddit datasets. Today, the team embarks on a quest to construct a dataset comprising 35,000 samples sourced from the dynamic realm of the Wall Street Bets subreddit. This dataset isn't just your run-of-the-mill collection; it intricately captures conversations between multiple speakers and the calculated responses from a sophisticated bot, mirroring the intricate tapestry of real-life interactions. As they fine-tune on this dataset, the team aims to surpass previous iterations, setting their sights on achieving unparalleled results in this data-driven odyssey.
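The multi-speaker-plus-bot-reply shape described above can be sketched as a simple JSONL builder. This is a minimal illustration, not the video's actual schema: the `prompt`/`completion` field names, the `build_sample` helper, and the example exchange are all assumptions.

```python
import json

def build_sample(comment_chain, bot_reply):
    """Flatten a chain of Reddit comments into one fine-tuning sample.

    Each element of `comment_chain` is a (speaker, text) tuple; the
    bot's reply becomes the completion the model is trained to produce.
    Field names here are illustrative, not the video's exact format.
    """
    prompt = "\n".join(f"{speaker}: {text}" for speaker, text in comment_chain)
    return {"prompt": prompt, "completion": bot_reply}

# Hypothetical WSB-style exchange between two human speakers.
chain = [
    ("user_1", "Is NVDA still a buy after earnings?"),
    ("user_2", "Depends on your horizon, it already ran up a lot."),
]
sample = build_sample(chain, "Long term, data-center demand is the thing to watch.")
line = json.dumps(sample)  # one JSONL line of the 35,000-sample set
```

Serializing one dict per line keeps the dataset streamable, which matters once the sample count grows into the tens of thousands.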

But wait, there's more excitement in store! In a generous gesture, Nvidia steps in to offer a jaw-dropping giveaway of an RTX 4080 Super at the GPU Technology Conference. Attendees are in for a treat as they delve into the cutting-edge advancements in technology and witness the fusion of robotics with generative AI, promising a future brimming with innovation and endless possibilities. As the channel founder reminisces about their encounter with Reddit data from years past, they shed light on the evolution of chatbots and language models, showcasing the remarkable progress in the field over the years.

Venturing into the vast landscape of Reddit data sources, the team explores avenues such as torrents, archive.org, and BigQuery, unearthing a treasure trove of terabytes' worth of comments dating up to 2019. With meticulous attention to detail, they navigate the process of exporting this valuable data to Google Cloud Storage, opting for a JSON format with compression to streamline handling. Their ultimate goal? To make this dataset accessible to all by uploading it to Hugging Face, ensuring that this wealth of information is readily available for enthusiasts and researchers alike. Amidst the whirlwind of data processing and fine-tuning, the team strategizes on pre-decompressing files to enhance processing speed and efficiency, a crucial step in their relentless pursuit of data perfection.
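The pre-decompression step mentioned above can be sketched with the standard library: unpack each gzipped shard once, so every later processing pass reads plain files instead of re-inflating on the fly. The newline-delimited JSON layout and the file names here are assumptions (BigQuery exports newline-delimited JSON, but the video's exact layout isn't specified).

```python
import gzip
import json
import shutil
from pathlib import Path
from tempfile import TemporaryDirectory

def predecompress(src: Path, dst: Path) -> Path:
    """Decompress one .json.gz shard so later passes read it directly."""
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst

# Demo on a tiny stand-in for one exported comment shard.
with TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    comments = [{"author": "user_1", "body": "to the moon"}]

    # Write a gzipped, newline-delimited JSON shard (the assumed export format).
    gz_path = tmp / "comments.json.gz"
    with gzip.open(gz_path, "wt") as f:
        for c in comments:
            f.write(json.dumps(c) + "\n")

    # Decompress once up front, then parse the plain file.
    raw_path = predecompress(gz_path, tmp / "comments.json")
    rows = [json.loads(line) for line in raw_path.read_text().splitlines()]
```

Paying the decompression cost once is a net win whenever the same shard is scanned more than once, at the price of extra disk space for the inflated copies.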


Watch Building an LLM fine-tuning Dataset on YouTube

Viewer Reactions for Building an LLM fine-tuning Dataset

Viewer expresses gratitude for learning ML and coding from the channel

Viewer shares experience of creating a binary classification model about breast cancer

Viewer mentions starting a tech startup for logistics

Viewer shares creating a scraper for WSB comments

Viewer appreciates the fun learning process provided by the channel

Viewer discusses cleaning and uploading conversation data for others to use

Viewer points out potential issues with data filtering for model training

Viewer plans on creating a comprehensive guide for ComfyUI and Flux

Viewer inquires about completing the Python from scratch series

Viewer requests a video on meta-learning with examples

sentdex

Unleashing LongNet: Revolutionizing Large Language Models

Explore the limitations of large language models due to context length constraints on sentdex. Discover Microsoft's LongNet and its potential to revolutionize models with billion-token capacities. Uncover the challenges and promises of dilated attention in expanding context windows for improved model performance.

sentdex

Revolutionizing Programming: Function Calling and AI Integration

Explore sentdex's latest update on groundbreaking function calling capabilities and API enhancements, revolutionizing programming with speed and intelligence integration. Learn how to define functions and parameters for optimal structured data extraction and seamless interactions with GPT-4.

sentdex

Unleashing Falcon 40B: Practical Applications and Comparative Analysis

Explore the Falcon 40B Instruct model on sentdex, a powerful large language model with 40 billion parameters. Discover its practical applications, use cases, and comparison to other models like GPT-3.5 and GPT-4. Unleash the potential of Falcon in natural language generation, math problem-solving, and understanding human emotions. Get insights on running the model locally, its licensing, and the AI team behind its development. Join the AI revolution with Falcon 40B Instruct!

sentdex

Revolutionizing Sentiment Analysis: KNN vs. BERT with Gzip Compression

Explore how a text classification method on sentdex challenges BERT in sentiment analysis using k-nearest neighbors and gzip compression. Learn about the process, implementation, efficiency improvements, and promising results of this innovative approach.