Hi everyone-
Well, we’ve seen snow for the first time in a while in the UK… winter coat and big jumper definitely required! High time to curl up in the warm with some thought provoking AI and Data Science reading materials…
Following is the December edition of our Royal Statistical Society Data Science and AI Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity. NOTE: If the email doesn’t display properly (it is pretty long…) click on the “Open Online” link at the top right.
As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science, ML and AI practitioners. And if you are not signed up to receive these automatically you can do so here
handy new quick links: committee; ethics; research; generative ai; applications; practical tips; big picture ideas; fun; reader updates; jobs
Committee Activities
For anyone in and around London our Christmas social is on the 5th Dec …would be great to see you!
We are continuing our work with the Alliance for Data Science professionals on expanding the previously announced individual accreditation (Advanced Data Science Professional certification) into university course accreditation. Remember also that the RSS is now accepting applications for the Advanced Data Science Professional certification- more details here.
Some great recent content published by Brian Tarran on our sister blog, Real World Data Science:
Deduplicating and linking large datasets using Splink, with Robin Linacre, UK Ministry of Justice
'I would like modellers to be less ambitious in developing monster models that are impossible to inspect': with Andrea Saltelli, co-editor of 'The Politics of Modelling'
Learning from failure: ‘Red flags’ in body-worn camera data, with Noah Wright
How to ‘open science’: A brief guide to principles and practices: with Isabel Sassoon
Our Chair, Janet Bastiman, Chief Data Scientist at Napier AI, won the Women in Technology award for software and services provider at the Banking Tech Awards 2023 from FinTech Futures: many congratulations Janet, thoroughly well deserved!
Giles Pavey, Global Director Data Science at Unilever, co-wrote an article in the MIT Sloan Management Review on AI assurance: AI Ethics at Unilever: From Policy to Process
Martin Goodson, CEO and Chief Scientist at Evolution AI, and Piers Stobbs, VP Science at Deliveroo, started a podcast… AI Unfiltered!
The first edition- discussing the recent UK AI Safety Summit as well as the US Executive Order on AI- is out now and can be found here (as well as on the main podcast platforms like Apple, Spotify, Amazon Music etc - just search for ‘AI Unfiltered - with Martin and Piers’)
Martin also continues to run the excellent London Machine Learning meetup and is very active with events. The next event is on December 6th, when Yuandong Tian, Research Scientist and Senior Manager in Meta AI Research (FAIR), will be presenting "Efficient Inference of LLMs with Long Context Support”. Videos are posted on the meetup youtube channel - and future events will be posted here.
This Month in Data Science
Lots of exciting data science and AI going on, as always!
Ethics and more ethics...
Bias, ethics, diversity and regulation continue to be hot topics in data science and AI...
Talk about a rollercoaster ride at OpenAI… the sequence of events starting on November 17th (with the sacking of the CEO Sam Altman by the company board) and ending with his reinstatement on November 21st (via a brief stint at Microsoft…) generated innumerable emergency podcasts and articles…
Good summary here from the Economist talking through events and implications - ‘With Sam Altman’s return, a shift in AI from idealism to pragmatism‘- also here in the Guardian
The trigger for the events is still unclear, although insiders point to internal concerns about new capabilities underpinned by something called Q*- ‘OpenAI researchers warned board of AI breakthrough ahead of CEO ouster, sources say’
Although Yann LeCun is not impressed!
Interestingly, Ilya Sutskever, the esteemed Chief Scientist at OpenAI, must have been involved in the initial firing as he was a swing vote on the board (although he subsequently regretted his actions), so it’s worth looking back on his recent interview with MIT Technology Review
"It’s this train of thought that has led Sutskever to make the biggest shift of his career. Together with Jan Leike, a fellow scientist at OpenAI, he has set up a team that will focus on what they call superalignment. Alignment is jargon that means making AI models do what you want and nothing more. Superalignment is OpenAI’s term for alignment applied to superintelligence."
Elsewhere in the world of AI Safety… 161 pages on the state of play in China!
"In the international arena, China has recently intensified its efforts to position AI as a domain for international cooperation. In October 2023, President Xi Jinping announced the new Global AI Governance Initiative at the Third Belt and Road Forum for International Cooperation, setting out China’s core positions on international cooperation on AI.4 The Chinese government has also indicated interest in maintaining human control over AI systems and preventing their misuse by extremist groups. However, successful cooperation with China on AI safety hinges on selecting the right international fora for exchanges, as China has expressed a clear preference for holding AI-related discussions under the aegis of the UN."
Lots of follow-up on the US Executive Order on AI and the UK AI Safety Summit, as we discussed last month
The Mozilla foundation, staunch open source advocates, made the case for openness driving safety in AI
"Yes, openly available models come with risks and vulnerabilities — AI models can be abused by malicious actors or deployed by ill-equipped developers. However, we have seen time and time again that the same holds true for proprietary technologies — and that increasing public access and scrutiny makes technology safer, not more dangerous. The idea that tight and proprietary control of foundational AI models is the only path to protecting us from society-scale harm is naive at best, dangerous at worst."
Useful take from Scientific American on Biden’s Executive Order- ‘a Good Start, Experts Say, but Not Enough’
"If the AI Safety Summit is to be judged a success — or at least on the right path to creating consensus on AI safety, regulation, and ethics — then the UK government must strive to create an even playing field for all parties to discuss the future use cases for the technology,” Zenil said. “The Summit cannot be dominated by those corporations with a specific agenda and narrative around their commercial interests, otherwise this week’s activities will be seen as an expensive and misleading marketing exercise.”
And Steven Sinofsky points to the dangers of regulatory capture (as highlighted in our AI Unfiltered podcast…)
"The President’s Executive Order on Artificial Intelligence is a premature and pessimistic political solution to unknown technical problems and a clear case of regulatory capture at a time when the world would be best served by optimism and innovation"
And some good commentary on the current version of the EU AI Act
"The EU AI Act now proposes to regulate “foundational models”, i.e. the engine behind some AI applications. We cannot regulate an engine devoid of usage. We don’t regulate the C language because one can use it to develop malware. Instead, we ban malware and strengthen network systems (we regulate usage). Foundational language models provide a higher level of abstraction than the C language for programming computer systems; nothing in their behaviour justifies a change in the regulatory framework."
Key takeaways from the UK AI Safety Summit
"Existential risk is divisive, but short-term risk is not. The possibility that AI can wipe out humanity – a view held by less hyperbolic figures than Musk – remains a divisive one in the tech community. That difference of opinion was not healed by two days of debate in Buckinghamshire. But if there is a consensus on risk among politicians, executives and thinkers, then it focuses on the immediate fear of a disinformation glut. There are concerns that elections in the US, India and the UK next year could be affected by malicious use of generative AI.
This fear of a disinformation glut is very much borne out in public perception: “85% of people worry about online disinformation, global survey finds”
And in real actions… Is Argentina the First A.I. Election?
"Javier Milei, the other candidate in Sunday’s runoff election, has struck back by sharing what appear to be A.I. images depicting Mr. Massa as a Chinese communist leader and himself as a cuddly cartoon lion. They have been viewed more than 30 million times."
On a positive note, we should applaud Meta/Facebook for trying to do something about this- “Meta bars political advertisers from using generative AI ads tools“
But as always with Meta, it’s complicated… “Meta disbanded its Responsible AI team“
Finally, some positive stories about investment in open source, which we are strong advocates of… the only downside from a UK perspective is that they are based in France, which feels like a significant missed opportunity for the UK Government….
Meta taps Hugging Face for startup accelerator to spur adoption of open source AI models; see also kyutai
"Facebook parent Meta is teaming up with Hugging Face and European cloud infrastructure company Scaleway to launch a new AI-focused startup program at the Station F startup megacampus in Paris. The underlying goal of the program is to promote a more “open and collaborative” approach to AI development across the French technology world."
Developments in Data Science and AI Research...
As always, lots of new developments on the research front and plenty of arXiv papers to read...
While technical GenAI research is cool and exciting, it’s always nice to see research from other areas!
A bit of Neuroscience… “Trajectories through semantic spaces in schizophrenia and the relationship to ripple bursts” … I know, not the most accessible title, but an interesting idea- they use language models to help diagnose schizophrenia
A bit of Economics - The Short-Term Effects of Generative Artificial Intelligence on Employment: Evidence from an Online Labor Market
"We find that freelancers in highly affected occupations suffer from the introduction of generative AI, experiencing reductions in both employment and earnings. We find similar effects studying the release of other image-based, generative AI models. Exploring the heterogeneity by freelancers’ employment history, we do not find evidence that high-quality service, measured by their past performance and employment, moderates the adverse effects on employment. In fact, we find suggestive evidence that top freelancers are disproportionately affected by AI"
A bit of Computer Science
And some non-GenAI data science! I’m always on the lookout for approaches that beat the tried and tested statistical methods for time series, and this looks promising: Time Series Anomaly Detection using Diffusion-based Models… although it’s hard to avoid GenAI completely- here’s a GPT-inspired approach: TimeGPT
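For context, here is a minimal sketch of the kind of ‘tried and tested’ statistical baseline these newer methods are competing against- a simple rolling z-score detector (my own toy illustration, not the approach from either paper above):

```python
# Toy anomaly detection baseline: flag points far from a rolling mean,
# measured in rolling-standard-deviation units. A sketch for intuition only.
import numpy as np
import pandas as pd

def rolling_zscore_anomalies(series: pd.Series, window: int = 30, threshold: float = 3.0) -> pd.Series:
    mean = series.rolling(window, min_periods=window).mean()
    std = series.rolling(window, min_periods=window).std()
    z = (series - mean) / std
    return z.abs() > threshold  # boolean mask of anomalous points

# Toy usage: a noisy sine wave with one injected spike at index 250
t = np.arange(500)
y = pd.Series(np.sin(t / 20) + np.random.normal(0, 0.1, 500))
y.iloc[250] += 3.0
print(y[rolling_zscore_anomalies(y)].index.tolist())  # should pick up the spike
```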
Of course GenAI is still very popular! First of all, improving Large Language Model efficiency (a toy quantisation sketch follows this set of papers):
QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models
"In this paper, we address the general quantization problem, where both weights and activations should be quantized. We show, for the first time, that the majority of inference computations for large generative models such as LLaMA, OPT, and Falcon can be performed with both weights and activations being cast to 4 bits”
Orca 2: Teaching Small Language Models How to Reason
although as Martin pointed out, not entirely clear that RegEx rules are a breakthrough!
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
"Previous methods start by training a reward model that aligns with human preferences, then leverage RL techniques to fine-tune the underlying models. However, crafting an efficient reward model demands extensive datasets, optimal architecture, and manual hyperparameter tuning, making the process both time and cost-intensive. The direct preference optimization (DPO) method, effective in fine-tuning large language models, eliminates the necessity for a reward model"
Plenty of interesting approaches to more specific use cases
Improving LLM’s approach to vision: CogVLM: Visual Expert for Pretrained Language Models
Improving Retrieval Augmented use cases (RAG - pointing LLMs at specific data sets): Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models
LLMs for Recommender systems: Collaborative Large Language Model for Recommender Systems
"the semantic gap between natural language and recommendation tasks is still not well addressed, leading to multiple issues such as spuriously-correlated user/item descriptors, ineffective language modeling on user/item contents, and inefficient recommendations via auto-regression, etc. In this paper, we propose CLLM4Rec, the first generative RS that tightly integrates the LLM paradigm and ID paradigm of RS, aiming to address the above challenges simultaneously"
LLMs for coding…
On the Concerns of Developers When Using GitHub Copilot
"Our results reveal that (1) Usage Issue and Compatibility Issue are the most common problems faced by Copilot users, (2) Copilot Internal Issue, Network Connection Issue, and Editor/IDE Compatibility Issue are identified as the most frequent causes, and (3) Bug Fixed by Copilot, Modify Configuration/Setting, and Use Suitable Version are the predominant solutions"
And then we have the downsides of LLMs, and how to deal with them…
"We demonstrate a situation in which Large Language Models, trained to be helpful, harmless, and honest, can display misaligned behavior and strategically deceive their users about this behavior without being instructed to do so"
Digging into ‘Red Teaming’ approaches - Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild
And lots on Hallucinations…
Correction with Backtracking Reduces Hallucination in Summarization
"In this paper, we introduce a simple yet efficient technique, CoBa, to reduce hallucination in abstractive summarization. The approach is based on two steps: hallucination detection and mitigation. We show that the former can be achieved through measuring simple statistics about conditional word probabilities and distance to context words"
Teaching Language Models to Hallucinate Less with Synthetic Tasks
"Our method, SynTra, first designs a synthetic task where hallucinations are easy to elicit and measure. It next optimizes the LLM's system message via prefix-tuning on the synthetic task, and finally transfers the system message to realistic, hard-to-optimize tasks"
And measurement and benchmarking
Finally, this somewhat innocuous sounding paper (Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models) caused quite a stir about whether it says anything about LLMs’ ability to generalise (see here and here)!
Generative AI ... oh my!
Still such a hot topic it feels in need of its own section, for all things DALLE, IMAGEN, Stable Diffusion, ChatGPT...
Despite OpenAI’s tumultuous weekend (discussed above) they still managed to announce a plethora of new capabilities at their developer day
Suddenly end users can create their own customised ‘ChatGPT’ utilising their own data, all through the browser- and the results are already impressive
"Each GPT can be granted access to web browsing, DALL-E, and OpenAI’s Code Interpreter tool for writing and executing software. There’s also a “Knowledge” section in the builder interface for uploading custom data, like the DevDay event schedule. With another feature called Actions, OpenAI is letting GPTs hook into external services for accessing data like emails, databases, and more, starting with Canva and Zapier."
They also released a new version of GPT-4 (called GPT-4 turbo) including a massive 128k context window (roughly 300 pages of a book…). Some interesting early results here, with an impressive comparison against Claude 2.1, another large context window LLM
Meanwhile over at Google…
Rumours are circulating that their next generation AI model, Gemini, is delayed…
Perhaps hedging their bets… Google agrees to invest up to $2 billion in OpenAI rival Anthropic
Still focused on turning AI into features: New features to help merchants stand out this holiday season
And still churning out innovation - DeepMind and YouTube release Lyria, a gen-AI model for music, and Dream Track to build AI tunes (more details here)
The competitive space is definitely getting more crowded
Elon Musk and X/Twitter entered the ring with ‘Grok’, an apparently more humorous and opinionated AI chat bot… to mixed reviews
Inflection announced the next version of their LLM, catchily titled Inflection-2
Kai-Fu Lee looks to build the OpenAI of China
"I think necessity is the mother of innovation, and there’s clearly a huge necessity in China,” Lee, who’s 61 and leading 01.AI as CEO, told TechCrunch in an interview, explaining the motive behind starting the company. “Unlike the rest of the world, China doesn’t have access to OpenAI and Google because those two companies did not make their products available in China, so I think many doing LLM are trying to do their part in creating a solution for a market that really needs this.”
Meanwhile over at Meta
What Meta learned from Galactica, the doomed model launched two weeks before ChatGPT
Still forging the open source route- a partnership with Dell for on-premises hosting, and useful WhatsApp chatbot tutorials
And still innovating- Introducing Emu Video and Emu Edit, our latest generative AI research milestones (in fact, video generation in general is getting increasingly impressive… I2VGen-XL, VideoGen from Baidu, Stable Video Diffusion)
Some increasingly impressive commercial products out there now: Runway for video generation, Replit for app development, tome for presentations
And some cool new applications from chip design to education, to data labelling
If you’re interested in coding assistants but don’t want to use a commercial player, there are now quite a few options available:
First a primer - Creating your own code writing agent. How to get results fast and avoid the most common pitfalls (a bare-bones sketch of the loop follows below)
Then the options: DeepSeekCoder, Cody, Tabby, and fauxpilot
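For a flavour of what such an agent boils down to, here is a bare-bones sketch of the generate-run-fix loop the primer describes. The call_llm helper is a hypothetical placeholder for whichever model API you prefer- this is an illustration, not the primer’s code:

```python
# Bare-bones "code writing agent" loop: ask a model for code, run it,
# feed any errors back, and retry. call_llm() is a hypothetical placeholder.
import re
import subprocess
import tempfile

def extract_code(reply: str) -> str:
    """Pull the first ```python ...``` block out of a model reply."""
    match = re.search(r"```(?:python)?\n(.*?)```", reply, re.DOTALL)
    return match.group(1) if match else reply

def run_code(code: str) -> subprocess.CompletedProcess:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
    return subprocess.run(["python", f.name], capture_output=True, text=True, timeout=30)

def code_agent(task: str, max_attempts: int = 3) -> str:
    prompt = f"Write a Python script that {task}. Reply with one code block."
    for _ in range(max_attempts):
        code = extract_code(call_llm(prompt))  # call_llm is hypothetical
        result = run_code(code)
        if result.returncode == 0:
            return code
        # feed the traceback back to the model and try again
        prompt = f"The script failed with:\n{result.stderr}\nPlease fix it:\n```python\n{code}\n```"
    raise RuntimeError("Agent failed to produce working code")
```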
GPT-4 Vision has been out for a month or two now, so we are getting more usage examples which are pretty amazing (see the minimal API-call sketch after these examples)
GPT-4 with Vision: Complete Guide and Evaluation - pretty mind-blowing!
Excellent tutorial incorporating GPT4V with Streamlit
Some great example applications here, including processing webcam images, insurance adjustment, and my favourite - ruskin, an art copilot
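If you want to try it yourself, a minimal call looks roughly like the sketch below (assuming the openai>=1.0 Python client and the gpt-4-vision-preview model name in use at the time of writing- check the current docs before relying on either):

```python
# Minimal sketch of calling GPT-4 with Vision via the OpenAI Python client.
# Model name and message format are as documented at the time of writing.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    max_tokens=300,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is happening in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/some-image.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```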
Finally, back to hallucinations and, more importantly, how we measure them- more background and primers, and a new library that shows promise: FacTool: Factuality Detection in Generative AI
Real world applications and how to guides
Lots of practical examples and tips and tricks this month
This is really great to see, published in Nature: Prospective implementation of AI-assisted screen reading to improve early detection of breast cancer
"The results showed that, compared to double reading, implementing the AI-assisted additional-reader process could achieve 0.7–1.6 additional cancer detection per 1,000 cases, with 0.16–0.30% additional recalls, 0–0.23% unnecessary recalls and a 0.1–1.9% increase in positive predictive value (PPV) after 7–11% additional human reads of AI-flagged cases (equating to 4–6% additional overall reading workload). The majority of cancerous cases detected by the AI-assisted additional-reader process were invasive (83.3%) and small-sized (≤10 mm, 47.0%). This evaluation suggests that using AI as an additional reader can improve the early detection of breast cancer with relevant prognostic features, with minimal to no unnecessary recalls"
Google has been working hard on the weather!
First, improvements in 24-hour forecasts with MetNet-3
Then very impressive improvements in extreme weather forecasting with GraphCast (paper here)
"In research published in Science today, Google DeepMind’s model, GraphCast, was able to predict weather conditions up to 10 days in advance, more accurately and much faster than the current gold standard. GraphCast outperformed the model from the European Centre for Medium-Range Weather Forecasts (ECMWF) in more than 90% of over 1,300 test areas. And on predictions for Earth’s troposphere—the lowest part of the atmosphere, where most weather happens—GraphCast outperformed the ECMWF’s model on more than 99% of weather variables, such as rain and air temperature"
Good applied work at the big tech players
And an interesting take on the role of economists in Data Science - The Economics Team at Instacart
"At Instacart, our Econ Team is situated in a somewhat different role than in most other tech firms. The Econ Team is a dedicated unit of primarily academically trained economists that sits within our broader machine learning organization. Members of the Econ Team operate as full-fledged machine learning engineers, focusing on problems at the intersection between economics and machine learning while also doing the leg work of productionizing their models. Instead of being assigned to a single class of problems or product area, we are a horizontal team and form tight and lasting partnerships with product partners across the company."
Lots of great tutorials and how to guides as always
What it says on the tin… NumPy for Numpties!
Interested in reinforcement learning but not sure where to start? Hands-on Deep Q-Learning
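Before the ‘deep’ part, it can help to see the core Q-learning update on its own. Here is a toy tabular version on gymnasium’s FrozenLake- a sketch for intuition, not the tutorial’s code:

```python
# Minimal tabular Q-learning illustrating the core update rule:
# Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
import numpy as np
import gymnasium as gym

env = gym.make("FrozenLake-v1", is_slippery=False)
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1

for episode in range(2000):
    state, _ = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # the Q-learning update
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state
```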
A couple of packages worth checking out
CausalPy: focussing on causal inference in quasi-experimental settings
This looks worth exploring from Amazon - Fortuna, A Library for Uncertainty Quantification
"Fortuna is a library for uncertainty quantification that makes it easy for users to run benchmarks and bring uncertainty to production systems. Fortuna provides calibration and conformal methods starting from pre-trained models written in any framework, and it further supports several Bayesian inference methods"
Random Forests in 2023: Modern Extensions of a Powerful Method
For those delving into the world of LLMs…
Start here with Karpathy’s Intro to LLMs
Delve into the world of reasoning and different prompting approaches- excellent tutorial
Have some fun playing around with finetuning
This is pretty amazing - Building an AI Video Editor Prototype in 100 Days(ish)
I enjoyed this link between theory and practice- Guesses as compressed probability distributions
Finally, a look at optimisation, a topic less often studied by data scientists but incredibly useful- Four Kinds of Optimisation
"Premature optimisation might be the root of all evil, but overdue optimisation is the root of all frustration. No matter how fast hardware becomes, we find it easy to write programs which run too slow. Often this is not immediately apparent. Users can go for years without considering a program’s performance to be an issue before it suddenly becomes so — often in the space of a single working day."
Practical tips
How to drive analytics and ML into production
First of all, The Future of ML Deployment: Trends and Predictions
Insight from DoorDash on their MLOps approach: Strategy & Principles to Scale and Evolve MLOps @DoorDash
(Another) new MLOps tool - Huggingface Model to Sagemaker Endpoint: Automating MLOps with ZenML
Useful primer on Feature Stores
This looks well worth checking out- a new testing framework that covers everything from traditional ML models to LLMs - giskard
"Scan AI models to detect risks of biases, performance issues and errors. In 4 lines of code. Giskard is a Python library that automatically detects vulnerabilities in AI models, from tabular models to LLM, including performance biases, data leakage, spurious correlation, hallucination, toxicity, security issues and many more."
Good tips on building AI products- Don’t Build AI Products The Way Everyone Else Is Doing It
Some interesting insights on LLM architecture from the engineering team at github
And a slightly different perspective… LLM Apps Are Mostly Data Pipelines
"When I think about a summary of what most of these LLM apps are doing, I’d bucket them like this: Data extraction – e.g. pull message text from the slack API Data cleansing – e.g. remove certain characters, extra spaces, encoding, etc. Data enrichment – embedding Data loading – write to vector databases Application UX i.e. prompt chaining, retrieval, inference, memory, chat UI, etc."
Bigger picture ideas
Longer thought provoking reads - lean back and pour a drink! ...
My North Star for the Future of AI from Fei-Fei Li
"Whatever academics like me thought artificial intelligence was, or what it might become, one thing is now undeniable: It is no longer ours to control. As a computer science professor at Stanford, it had been a private obsession of mine—a layer of thoughts that superimposed itself quietly over my view of the world. By the mid-2010s, however, the cultural preoccupation with AI had become deafeningly public"
The dawn of the omnistar from the Economist
“Stars may worry that ai is stealing their work and giving less talented performers the skills to snatch their audience. In fact, the famous folk complaining the loudest about the new technology are the ones who stand to benefit the most. Far from diluting star power, ai will make the biggest celebrities bigger than ever, by allowing them to be in all markets, in all formats, at all times. Put your hands together—or insert your earplugs if you prefer—for the rise of the omnistar.”
Then again… When A.I. Comes for the Elites from Peter Turchin
"So much for history. What about the future? The rise of intelligent machines will undermine social stability in a far greater way, than previous technological shifts, because now A.I. threatens elite workers—those with advanced degrees. But highly educated people tend to acquire skills and social connections that enable them to organize effectively and challenge the existing power structures. Overproduction of youth with advanced degrees has been the main force driving revolutions from the Springtime of Nations in 1848 to the Arab Spring of 2011"
Oops! We Automated Bullshit from the University of Cambridge Computer Science department
"Since my fieldwork in Africa, I’ve learned to ask different questions about AI, and in recent months, I’ve started to feel like the boy who questions the emperor’s new clothes. The problem I see, apparently not reported in coverage of Sunak’s AI Summit, is that AI literally produces bullshit. (Do I mean “literally”? My friends complain that I take everything literally, but I’m not a kleptomaniac)."
Can We Stop LLMs from Hallucinating? from Juras Juršėnas
"As such, LLMs run into a challenge. The set of all possible outputs will always have some number of synthetic statements, but to the model, all of them are truth-value agnostic. In simple terms, “Julius Caesar’s assassin was Brutus” (there were many, but for this case, it doesn’t matter) and “Julius Caesar’s assassin was Abraham Lincoln” are equivalent to a model."
AI is about to completely change how you use computers from Bill Gates
"In the next five years, this will change completely. You won’t have to use different apps for different tasks. You’ll simply tell your device, in everyday language, what you want to do. And depending on how much information you choose to share with it, the software will be able to respond personally because it will have a rich understanding of your life. In the near future, anyone who’s online will be able to have a personal assistant powered by artificial intelligence that’s far beyond today’s technology."
Fun Practical Projects and Learning Opportunities
A few fun practical projects and topics to keep you occupied/distracted:
Always a treat - Information is Beautiful Awards
Auto Generated Agent Chat: Chess Game Playing While Chitchatting by GPT-4 Agents - fun easy to follow tutorial in colab!
Introducing the Multi-Chord Diagram: Visualizing Complex Set Relationships
Awesome data set to play with - Maxar's Open Satellite Feed
Great fun - The Poet’s Journey: Visualizing Dante’s Divine Comedy Characters
Who wouldn’t want this? Unauthorized “David Attenborough” AI clone narrates developer’s life, goes viral
Updates from Members and Contributors
On 14th December, Stephen Haben and colleagues at the Energy Systems Catapult are launching the AI for Decarbonisation’s Virtual Centre for Excellence (ADViCE), together with Digital Catapult and The Alan Turing Institute. Registration link to the webinar.
Lucas Franca has a fully funded PhD studentship (fees+stipend) at the Department of Computer and Information Sciences at Northumbria University- check here for details
Jobs and Internships!
The job market is a bit quiet - let us know if you have any openings you'd like to advertise
Evolution AI are looking to hire someone for applied deep learning research. Must like a challenge. Any background but needs to know how to do research properly. Remote. Apply here
Data Internships - a curated list
Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here
- Piers
The views expressed are our own and do not necessarily represent those of the RSS