Data Crossroads #31
The Rabbit R1 is an AI-powered gadget that can use your apps for you
David Pierce writing for The Verge:
Jesse Lyu, the CEO and founder of an AI startup called Rabbit, says he doesn’t want to replace your smartphone. At least not right away. His company’s new gadget, a $199 standalone AI device called the R1, is so staggeringly ambitious that Lyu seems to think he can’t help but replace your phone at some point. Just not quite yet.
(Update January 10th, 4:45PM ET: Rabbit announced its initial 10,000-unit run of the R1 has already sold out, and now it’s taking pre-orders for a second shipment in the spring.)
The R1 looks a little like a Playdate console or maybe a modernized version of one of those ’90s-era handheld TVs. It’s a standalone gadget about half the size of an iPhone with a 2.88-inch touchscreen, a rotating camera for taking photos and videos, and a scroll wheel / button you press to navigate around or talk to the device’s built-in assistant. It has a 2.3GHz MediaTek processor, 4GB of memory, and 128GB of storage, all inside a rounded body designed in collaboration with the design firm Teenage Engineering. All Rabbit says about the battery is that it lasts “all day.”
I spent a few minutes with the R1 after Rabbit’s launch event, and it’s an impressive piece of hardware. Only one device (Lyu’s) was actually functional, and even that one couldn’t do much because of spotty hotel Wi-Fi. But the R1 is surprisingly light and feels much nicer than it looks in pictures. Its buttons are clicky and satisfying, which is no surprise from Teenage Engineering, and the whole thing fits nicely in my grip. It’s definitely a fingerprint magnet, though.
I’m still unable to understand why this can’t be an app, official explanations notwithstanding.
RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
Angels Balaguer et al on Arxiv:
There are two common ways in which developers are incorporating proprietary and domain-specific data when building applications of Large Language Models (LLMs): Retrieval-Augmented Generation (RAG) and Fine-Tuning. RAG augments the prompt with the external data, while fine-tuning incorporates the additional knowledge into the model itself. However, the pros and cons of both approaches are not well understood. In this paper, we propose a pipeline for fine-tuning and RAG, and present the tradeoffs of both for multiple popular LLMs, including Llama2-13B, GPT-3.5, and GPT-4. Our pipeline consists of multiple stages, including extracting information from PDFs, generating questions and answers, using them for fine-tuning, and leveraging GPT-4 for evaluating the results. We propose metrics to assess the performance of different stages of the RAG and fine-tuning pipeline. We conduct an in-depth study on an agricultural dataset. Agriculture as an industry has not seen much penetration of AI, and we study a potentially disruptive application - what if we could provide location-specific insights to a farmer? Our results show the effectiveness of our dataset generation pipeline in capturing geographic-specific knowledge, and the quantitative and qualitative benefits of RAG and fine-tuning. We see an accuracy increase of over 6 p.p. when fine-tuning the model and this is cumulative with RAG, which increases accuracy by 5 p.p. further. In one particular experiment, we also demonstrate that the fine-tuned model leverages information from across geographies to answer specific questions, increasing answer similarity from 47% to 72%. Overall, the results point to how systems built using LLMs can be adapted to respond and incorporate knowledge across a dimension that is critical for a specific industry, paving the way for further applications of LLMs in other industrial domains.
Common questions answered. One of the relatively less technical papers (translation: no math).
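The RAG half of the paper's pipeline boils down to "retrieve relevant chunks, prepend them to the prompt." A minimal sketch of that step, with a deliberately crude word-overlap retriever and made-up agricultural chunks standing in for the paper's actual embedding-based retrieval:

```python
# Toy RAG step: rank document chunks by word overlap with the query,
# then build an augmented prompt. The chunks and the scoring function
# are illustrative stand-ins, not the paper's real retriever.

def score(query, chunk):
    """Crude relevance score: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query, chunks, k=2):
    """Return the top-k chunks ranked by overlap with the query."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]

def build_prompt(query, chunks, k=2):
    """Augment the prompt with retrieved context (the 'RAG' step)."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

chunks = [
    "Wheat in region A is planted in October and harvested in June.",
    "Corn requires warm soil and is sensitive to frost.",
    "Region A soils are clay-heavy and retain water well.",
]
prompt = build_prompt("When is wheat harvested in region A?", chunks)
```

Fine-tuning, by contrast, bakes this kind of location-specific knowledge into the weights, which is why the paper finds the two approaches complementary.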
ChatGPT does Advent of Code 2023
Advent of Code (henceforth AoC) is an annual programming "event", held by Eric Wastl, that takes place during the first 25 days of December. Each day at midnight a problem unlocks, consisting of an input file and a description of the required solution (either a number or a sequence of letters and numbers) to be determined by processing the input file. To solve the problem you have to submit the correct solution to the website. Once you do, part 2 of the problem unlocks, usually a harder version of part 1. You don't have to submit any code, so in theory you could solve everything by hand; however, this is usually intractable, and writing a program to do the work for you is the only easy way to solve the problem.
There's also a leaderboard where participants are scored based on how fast they submitted a solution.
Problems start very easy on day 1 (sometimes as easy as just asking for a program that sums all numbers in the input) and progress towards more difficult ones, but they never get very hard: a CS graduate should be able to solve all problems, except maybe 1 or 2, in a couple of hours each.
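A day-1-style problem of the kind described, summing all numbers in the input, might be solved in a few lines (the input here is a made-up example, not a real AoC puzzle file):

```python
# Toy solution in the spirit of an early AoC day: sum every number
# in the puzzle input, one integer per line.

puzzle_input = """\
12
7
-3
42
"""

def solve(text):
    return sum(int(line) for line in text.splitlines() if line.strip())

answer = solve(puzzle_input)  # 58
```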
And it concludes thus:
Overall my subjective impression is that not much has changed: it can't solve anything that requires something more complicated than just following instructions, and it's bad at following instructions unless they are very simple.
It could be that LLMs have reached their plateau. Or maybe Q* or Bard Ultra or Grok Extra will wipe the floor next year, like GPT-4 was supposed to do this year. It's hard not to feel jaded about the hype cycle.
I have a bunch of observations about the performance of ChatGPT on AoC, which I will report here in no particular order.
What’s fascinating is that for LLMs, it looks like their reasoning abilities are just a byproduct of them having looked at large amounts of text. That’s also the reason why their reasoning abilities are pretty nascent at this time.
Understanding and Coding Self-Attention, Multi-Head Attention, Cross-Attention, and Causal-Attention in LLMs
This article will teach you about self-attention mechanisms used in transformer architectures and large language models (LLMs) such as GPT-4 and Llama. Self-attention and related mechanisms are core components of LLMs, making them a useful topic to understand when working with these models.
However, rather than just discussing the self-attention mechanism, we will code it in Python and PyTorch from the ground up. In my opinion, coding algorithms, models, and techniques from scratch is an excellent way to learn!
A more practical study on self-attention from a gifted author.
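The article builds the PyTorch version from scratch; the same math, scaled dot-product self-attention with an optional causal mask, can be sketched in plain NumPy (the random weight matrices stand in for learned parameters):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv, causal=False):
    """X: (seq_len, d_model). Returns (output, attention_weights)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # (seq_len, seq_len)
    if causal:
        # Mask out future positions so token i only attends to <= i.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    weights = softmax(scores)              # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 6
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, w = self_attention(X, Wq, Wk, Wv, causal=True)
```

Multi-head attention runs several such heads in parallel on lower-dimensional projections and concatenates the results; cross-attention is the same computation with Q coming from one sequence and K, V from another.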
Other interesting links
Saving 99% of costs by saving output from GPT-4, using it to fine-tune Mixtral 8x7B, and switching to Mixtral 8x7B
Many AI Safety Orgs Have Tried to Criminalize Currently-Existing Open-Source AI
New material found by AI could reduce lithium use in batteries
WhisperSpeech: An Open Source text-to-speech system built by inverting Whisper.