Data Crossroads #63

Sep 02, 2024

Andy Jassy on Amazon Q, their GenAI assistant for software development

Amazon Q, our GenAI assistant for software development, is trying to bring some light to this heaviness. We have a new code transformation capability, and here’s what we found when we integrated it into our internal systems and applied it to our needed Java upgrades:

- The average time to upgrade an application to Java 17 plummeted from what’s typically 50 developer-days to just a few hours. We estimate this has saved us the equivalent of 4,500 developer-years of work (yes, that number is crazy but, real).

- In under six months, we've been able to upgrade more than 50% of our production Java systems to modernized Java versions at a fraction of the usual time and effort. And, our developers shipped 79% of the auto-generated code reviews without any additional changes.

- The benefits go beyond how much effort we’ve saved developers. The upgrades have enhanced security and reduced infrastructure costs, providing an estimated $260M in annualized efficiency gains.

One of the very few horizontally monetizable use cases for LLMs currently are coding-related copilots.

Creating calendar entries from an image using Anthropic Claude 3.5

From Greg Wilson’s Tech Blog:

A few days ago, my jazz piano teacher sent me the new fall/winter schedule for my private jazz piano lessons -- 13 different dates -- as a JPG (mine are outlined in green marker):
I was too lazy to go make 13 entries in Google Calendar, so I decided to see if Claude could help me out:
I first uploaded the jpg to Claude 3.5 Sonnet)...
My prompt: List the dates that are outlined in green
The dates outlined in green in the image are:
Friday, 13-Sep-24 Friday, 27-Sep-24 Friday, 11-Oct-24 Friday, 8-Nov-24 Friday, 6-Dec-24 Friday, 20-Dec-24 Friday, 17-Jan-25 Friday, 31-Jan-25 Friday, 28-Feb-25 Friday, 14-Mar-25 Friday, 11-Apr-25 Friday, 25-Apr-25 Friday, 23-May-25
These dates correspond to the Friday column for students who have their first every-other-week lesson starting the week of Tuesday Sept 10, 2024.
Cool - that was easy and accurate. Now I need to get it into my calendar, so I asked it to create an ics file...

We’re happy when AI saves us $260 million (like we observed from the first entry in this week’s newsletter). But, we’re also happy when we save time and when not having to do annoying work! Ah, vision models.

Anthropic publishes the ‘system prompts’ that make Claude tick

Kyle Wiggers writing for Techcrunch:

vendors usually keep system prompts close to the chest — presumably for competitive reasons, but also perhaps because knowing the system prompt may suggest ways to circumvent it. The only way to expose GPT-4o‘s system prompt, for example, is through a prompt injection attack. And even then, the system’s output can’t be trusted completely.
However, Anthropic, in its continued effort to paint itself as a more ethical, transparent AI vendor, has published the system prompts for its latest models (Claude 3 Opus, Claude 3.5 Sonnet and Claude 3 Haiku) in the Claude iOS and Android apps and on the web.
[…]
The latest prompts, dated July 12, outline very clearly what the Claude models can’t do — e.g. “Claude cannot open URLs, links, or videos.” Facial recognition is a big no-no; the system prompt for Claude Opus tells the model to “always respond as if it is completely face blind” and to “avoid identifying or naming any humans in [images].”
But the prompts also describe certain personality traits and characteristics — traits and characteristics that Anthropic would have the Claude models exemplify.
The prompt for Claude 3 Opus, for instance, says that Claude is to appear as if it “[is] very smart and intellectually curious,” and “enjoys hearing what humans think on an issue and engaging in discussion on a wide variety of topics.” It also instructs Claude to treat controversial topics with impartiality and objectivity, providing “careful thoughts” and “clear information” — and never to begin responses with the words “certainly” or “absolutely.”

There were a bunch of chuckles around Apple’s prompts that included things like “Do not hallucinate”. It’s always good to look at pro-level prompts.

Judge dismisses majority of GitHub Copilot copyright claims

Ryan Daws writing for Developer Tech:

Judge Jon Tigar’s ruling, unsealed last week, leaves only two claims standing: one accusing the companies of an open-source license violation and another alleging breach of contract. This decision marks a substantial setback for the developers who argued that GitHub Copilot, which uses OpenAI’s technology and is owned by Microsoft, unlawfully trained on their work.
The court’s dismissal primarily focused on the accusation that GitHub Copilot violates the Digital Millennium Copyright Act (DMCA) by suggesting code without proper attribution. An amended version of the complaint had taken issue with GitHub’s duplication detection filter, which allows users to “detect and suppress” Copilot suggestions matching public code on GitHub.
The developers argued that turning off this filter would “receive identical code” and cited a study showing how AI models can “memorise” and reproduce parts of their training data, potentially including copyrighted code.
However, Judge Tigar found these arguments unconvincing. He determined that the code allegedly copied by GitHub was not sufficiently similar to the developers’ original work. The judge also noted that the cited study itself mentions that GitHub Copilot “rarely emits memorised code in benign situations.”
As a result, Judge Tigar dismissed this allegation with prejudice, meaning the developers cannot refile the claim. Additionally, the court dismissed requests for punitive damages and monetary relief in the form of unjust enrichment.

AI: 1. Human: 0.

With 10x growth since 2023, Llama is the leading engine of AI innovation

From Meta’s AI blog:

Llama models are approaching 350 million downloads to date (more than 10x the downloads compared to this time last year), and they were downloaded more than 20 million times in the last month alone, making Llama the leading open source model family.
Llama usage by token volume across our major cloud service provider partners has more than doubled in just three months from May through July 2024 when we released Llama 3.1.
Monthly usage (token volume) of Llama grew 10x from January to July 2024 for some of our largest cloud service providers.

Llama has the resources behind it that very few other LLMs have.

Building LLMs from the Ground Up: A 3-hour Coding Workshop

Sebastian Raschka on his newsletter:

Below, you'll find a table of contents to get an idea of what this video covers (the video itself has clickable chapter marks, allowing you to jump directly to topics of interest):
0:00 – Workshop overview
2:17 – Part 1: Intro to LLMs
9:14 – Workshop materials
10:48 – Part 2: Understanding LLM input data
23:25 – A simple tokenizer class
41:03 – Part 3: Coding an LLM architecture
45:01 – GPT-2 and Llama 2
1:07:11 – Part 4: Pretraining
1:29:37 – Part 5.1: Loading pretrained weights
1:45:12 – Part 5.2: Pretrained weights via LitGPT
1:53:09 – Part 6.1: Instruction finetuning
2:08:21 – Part 6.2: Instruction finetuning via LitGPT
02:26:45 – Part 6.3: Benchmark evaluation
02:36:55 – Part 6.4: Evaluating conversational performance
02:42:40 – Conclusion

If you have the time (and the inclination), you should definitely take this.

Data Crossroads