The Conference for Machine Learning Innovation November 27 - 30, Berlin.
I had the great pleasure of taking part in MLCon 2023 (The Conference for Machine Learning Innovation. November 27 - 30, 2023. Berlin).
Below you will find impressions from the conference, and links for further reading.
The MLCon 2023 conference was held at the Maritim proArte Hotel Berlin in Berlin, Germany.
Not that far away from Unter den Linden, and the Humboldt-Universität.
I tried to follow as many talks as possible. But, well, these notes are, of course, in
no way, shape or form complete...
Rather, these notes were written on conference nights, as my way of
keeping track of the events that I attended at the conference. And as a way of storing links and references for future reference.
1. Impressions from Tuesday, November 28th.
1.1. AI beyond the Hype.
The conference started with talks by Sebastian Meyen and Katleen Gabriels on ''AI, and the current AI hype''.
I.e.
Since the 1950s, artificial intelligence (AI) has gone through several so-called AI summers (hype), followed by AI winters (realism).
Putting stickers on street signs can still, potentially, confuse autonomous vehicles...(realism).
And still, at the same time, everyone also gets a daily media dose (hype) about the (inevitable) arrival, in the not-so-distant future, of potentially ''unsafe'' AGIs
(that can master just about everything...).
Certainly, safety is important. And, well, a dose of realism is probably also a good idea (Moving beyond the hype).
Great opening talks!
Clearly, Mat Velloso's 2018 comment is still relevant...
1.2. Pair-Programming with AI.
Alexander Ebbes and Moritz Schrauth, Xyna.AI, talked about ''Pair-Programming with AI''.
GPT models can not only answer questions amazingly well; GPT models trained for it are also well suited to generating or completing code.
But, integrating GPT-generated code with existing project-code, and making sure that it all actually works, and is safe...
Is still not an entirely ''automatic'' process...
Still, steps towards more automated work processes are, of course, interesting, and something where pros and cons should be carefully investigated.
1.3. Integrating Large Language Models (LLMs) & Knowledge Graphs (KGs).
Jörg Schad, head of Machine Learning at ArangoDB, talked about ''Integrating Large Language Models (LLMs) & Knowledge Graphs (KGs)''.
I.e.
Large language models (LLMs), such as ChatGPT and GPT4, are making new waves in the field of natural language processing and artificial intelligence, due to their emergent ability and generalizability. However, LLMs are black-box models, which often fall short of capturing and accessing factual knowledge [1].
And:
Knowledge Graphs (KGs), are structured knowledge models that explicitly store rich factual knowledge. KGs can enhance LLMs by providing external knowledge for inference and interpretability [1].
In the talk, Schad explored:
The integration of Knowledge Graphs (KGs) and Large Language Models (LLM), to harness their combined power for improved natural language understanding.
I.e. By leveraging KGs' structured knowledge and language models' text comprehension abilities, we can use domain-specific (and potentially sensitive) data together with the general knowledge of LLMs.
''ChatGPT is not Enough: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling'':
Enhancing data-driven (pre-trained language models) PLMs with knowledge-based KGs to incorporate explicit factual knowledge into PLMs, thus improving their performance to generate texts requiring factual knowledge and providing more informed responses to user queries [2].
KG-enhanced LLMs.
Interestingly, the talk also:
Examined how language models can enhance KGs through knowledge extraction and refinement
(Entity discovery, relation extraction, co-reference resolution, end-to-end KG construction).
Finally, we got a code link, on how to ''Use LLMs to provide a natural language interface to an ArangoDB database'' [3]
(with the help of Langchain).
Langchain (developing applications powered by language models).
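As a toy illustration of the KG-enhanced LLM idea (this is not ArangoDB's or LangChain's actual API; the triples and helper functions are made up for the sketch), one can retrieve facts from a small knowledge graph and prepend them to the prompt:

```python
# Toy knowledge graph as subject-predicate-object triples (illustrative data).
KG = [
    ("Berlin", "capital_of", "Germany"),
    ("Berlin", "hosts", "MLCon 2023"),
    ("MLCon 2023", "held_at", "Maritim proArte Hotel"),
]

def facts_about(entity, kg=KG):
    """Return all triples mentioning the entity, rendered as plain sentences."""
    return [f"{s} {p.replace('_', ' ')} {o}."
            for (s, p, o) in kg if entity in (s, o)]

def grounded_prompt(question, entity):
    """Prepend retrieved KG facts as factual context for the language model."""
    context = "\n".join(facts_about(entity))
    return f"Context:\n{context}\n\nQuestion: {question}"

prompt = grounded_prompt("Where was MLCon 2023 held?", "MLCon 2023")
```

The real integration would replace the triple list with AQL queries against ArangoDB, but the grounding step is the same: fetch structured facts, put them in the prompt.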
Indeed, a super interesting talk!
1.4. AI beyond the mainstream.
Alexander Ebbes, chief architect of Xyna.AI, talked about ''AI beyond the mainstream''.
According to Ebbes, ''we want people to become makers, not users''.
The focus of this talk is on the hidden, but emerging, world of open AI components, which not only put transparency and control in the hands of users, but also offer the opportunity to move from user to maker, from passive consumer to active technical designer.
Indeed, [gen]AI can help us become makers.
But we should, of course, be aware that [gen]AI (so far) has no idea about what it is generating.
An image generator might generate an image of a guitar player that makes no sense (e.g. a [gen]AI might give the guitar player three or four fingers on each hand).
Or you might ask for a cafe in a certain area. But with no cafes in that area, the large language model might just invent (hallucinate) a cafe there... Not good.
Obviously, we are not yet at the point where all such problems are solved. Still, we
want our systems to be based on virtues like (with AI components that have) ''transparency'', ''reliability'', ''executability'', ''independence''.
We hope that our systems will be ''creative'', with the power to work with ''unstructured and noisy data''.
But we also hope that our systems will be ''Transparent and understandable'', ''Calculable'' etc.
LLM-like, but also ''Wolfram''-like...
LLM-like:
Unstructured and noisy data
Creative but unreproducible
Unexplainable
Unpredictable behaviour
''Wolfram''-like:
Structured and clean data
Reproducible but non-creative
Transparent and understandable
Calculable
Challenging, indeed.
1.5. An Introduction to Embedding Vectors.
Christoph Henkelmann, DIVISIO GmbH, talked about ''An Introduction to Embedding Vectors''.
Embedding vectors, also known as activations, feature vectors, or just embeddings are an essential tool for deep learning. They are indispensable when encoding non-numerical input data but also when making the most of the output of a neural network.
Word vectors and computational linguistics.
The ''King – Man + Woman = Queen'' example.
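A toy sketch of that analogy (the 3-dimensional vectors are hand-made so the arithmetic works out; real word vectors have hundreds of dimensions and are learned from data):

```python
import math

# Hand-made toy word vectors: dimensions loosely mean (royalty, male, female).
vectors = {
    "king":   [1.0, 1.0, 0.0],
    "queen":  [1.0, 0.0, 1.0],
    "man":    [0.0, 1.0, 0.0],
    "woman":  [0.0, 0.0, 1.0],
    "child":  [0.0, 0.5, 0.5],
    "palace": [1.0, 0.2, 0.2],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def analogy(a, b, c):
    """Return the word closest to vector(a) - vector(b) + vector(c)."""
    target = [x - y + z
              for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))
```

With these vectors, `analogy("king", "man", "woman")` lands on "queen", which is exactly the computational-linguistics party trick.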
Embeddings:
Embeddings can be the output(s) of an NN/layer.
Embeddings can be the input(s) of an NN/layer.
There are networks to build good embeddings
(Word2Vec, SBert, bottleneck of an autoencoder).
Building a few-shot document classifier:
All transformers and RNN architectures use token embeddings
as input and output.
Variant A: Use a specialized network. E.g. SBert.
Variant B: Use any transformer, add token embeddings, divide by token count.
Autoencoders. The middle, bottleneck, layer gives an encoding, an embedding.
- Throw away the decoder, after training, and the output of the encoder is an embedding.
One could then, for example, attach a (new) classification layer to the system.
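Variant B above (sum the token embeddings, divide by token count) can be sketched like this, with a hand-made toy embedding table standing in for a real transformer's token embeddings:

```python
import math

# Toy token-embedding table (2-d, hand-made; a real model would supply these).
EMB = {
    "invoice": [1.0, 0.0], "payment": [0.9, 0.1], "due": [0.8, 0.2],
    "match":   [0.0, 1.0], "goal":    [0.1, 0.9], "team": [0.2, 0.8],
}

def embed(tokens):
    """Mean-pool token embeddings: sum per dimension, divide by token count."""
    dims = len(next(iter(EMB.values())))
    summed = [sum(EMB[t][d] for t in tokens) for d in range(dims)]
    return [x / len(tokens) for x in summed]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Few-shot: one labeled example document per class already gives a classifier.
examples = {"finance": embed(["invoice", "payment"]),
            "sports":  embed(["team", "goal"])}

def classify(tokens):
    doc = embed(tokens)
    return max(examples, key=lambda label: cosine(doc, examples[label]))
```

New documents are then labeled by whichever example embedding they are closest to, with no training step at all.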
RAG (Retrieval Augmented Generation).
Turn the query into an embedding vector.
Find text files, from the document store, that fit the query.
Add those documents to LLM query (i.e. provide additional context to the LLM).
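The three steps above, as a minimal sketch (a bag-of-words overlap score stands in for a real embedding model, and the final LLM call is omitted; the point is how retrieved text is folded into the prompt):

```python
documents = [
    "ControlNet adds spatial conditioning controls to diffusion models.",
    "Knowledge graphs store facts as subject-predicate-object triples.",
    "Embedding vectors map text into a numeric vector space.",
]

def embed(text):
    """Stand-in for an embedding model: the set of lowercase words."""
    return set(text.lower().replace(".", "").replace("?", "").split())

def retrieve(query, docs, k=1):
    """Step 2: return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: len(q & embed(d)), reverse=True)[:k]

def augmented_prompt(query):
    """Step 3: fold the retrieved documents into the prompt as context."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

prompt = augmented_prompt("What does ControlNet add to diffusion models?")
```

A production system would swap the word-overlap score for dense embeddings plus an approximate-nearest-neighbor index, but the prompt-assembly step stays the same.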
Retrieval Augmented Generation Inference Engines with LangChain:
A hallmark of RAG is in the intelligent retrieval of additional information from relevant data sources, typically vector databases using algorithms like similarity search. The retrieved data is combined with the user’s query, enriching the input provided to the generative model [4].
RAG enhances the capabilities of LLMs by integrating contextual data relevant to specific prompts. This process allows LLMs to tailor their responses, drawing from specialized resources like proprietary corporate materials or detailed technical manuals—content beyond their initial training scope [5].
Powering up LLaMa 2 with retrieval-augmented generation (RAG) lets it seek out and use information from Wikipedia. I.e. creating a LLaMa 2 agent empowered with Wikipedia knowledge [6].
What problems does RAG solve:
It provides the most important context for the large language model to take into consideration when producing a response.
By retrieving useful information, it gives the model a way to avoid hallucinations, as the required information is provided in the prompt.
It gives LLMs infinite context windows to provide useful output.
Provides a conversational interface to unstructured data [7].
1.6. Generative Pre-trained Transformers. From Understanding to Applications.
Jochen Emig, ONSEI GmbH, talked about ''Generative Pre-trained Transformers: From Understanding to Applications''.
Where:
GPTs learn from large corpora, which enables them to acquire language knowledge and semantic understanding, using a multi-layered architecture with self-attention mechanisms that allows them to capture the intricate patterns and dependencies in the text data.
Trained on a lot of data, the attention mechanism eventually learns which (other) words in a sentence a word refers to.
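A minimal sketch of that attention mechanism (toy, hand-made vectors; in a real transformer the queries, keys and values are learned projections of the token embeddings):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product scores of one query vector against all keys,
    turned into attention weights that sum to one."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    return softmax(scores)

# The query vector resembles the first key, so most attention goes there.
keys = [[1.0, 0.0],   # e.g. "cat"
        [0.3, 0.3],   # e.g. "sat"
        [0.0, 1.0]]   # e.g. "mat"
weights = attention_weights([0.9, 0.1], keys)
```

The weights say how strongly the querying word attends to each other word; the output of the layer is then a weighted sum of value vectors using exactly these weights.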
Stay tuned, for more in coming months about:
Multi modal models
(Combining vision and text).
Efficiency improvements in training.
But for now the focus is on getting up-and-running with:
VectorDB.
Model.
Frontend.
Again, a good overview talk.
1.7. Human-like Lifelong Learning in Every AI Machine?
Pankaj Gupta, DRIMCo GmbH, talked about ''Human-like Lifelong Learning in Every AI Machine''.
Static vs. dynamic ML:
Static models are easier to build and test.
Dynamic models adapt to changing data.
The world is a highly changeable place. Sales predictions built from last year's
data are unlikely to successfully predict next year's results [8]
Updating a model with new information is, of course, good, but not if the model
forgets important things it has learned earlier, so-called ''catastrophic forgetting'':
Catastrophic interference, also known as catastrophic forgetting, is the tendency of an artificial neural network to abruptly and drastically forget previously learned information upon learning new information [9].
So, in order to make ''lifelong learning'' work, some kind of ''data replay'' mechanism needs to be used, so that important data points are not forgotten.
Which will eventually help the ML:
Learn continuously. Accumulate the knowledge learned in the past.
Adapt and become more capable continuously.
All important, obviously.
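The ''data replay'' idea can be sketched as a simple replay buffer that mixes stored samples from earlier tasks into each new training batch (a toy sketch, not DRIMCo's implementation):

```python
import random

class ReplayBuffer:
    """Keep a bounded store of past training samples for replay."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.samples = []

    def store(self, sample):
        # Evict a random old sample once the buffer is full.
        if len(self.samples) >= self.capacity:
            self.samples.pop(random.randrange(len(self.samples)))
        self.samples.append(sample)

    def training_batch(self, new_samples, replay_fraction=0.5):
        """Mix new-task samples with replayed samples from earlier tasks,
        so earlier knowledge keeps appearing in the training signal."""
        n_old = min(len(self.samples),
                    int(len(new_samples) * replay_fraction))
        return list(new_samples) + random.sample(self.samples, n_old)

buffer = ReplayBuffer()
for old in ["task-A sample 1", "task-A sample 2"]:
    buffer.store(old)
batch = buffer.training_batch(["task-B sample 1", "task-B sample 2"])
```

Because every batch for task B still contains task-A samples, gradient updates keep being anchored to the old data, which is the basic defense against catastrophic forgetting.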
1.8. Unexpected Forces Shaping AI.
Jeremy Wilken, NVIDIA, talked about ''Unexpected Forces Shaping AI''.
By being aware of the drivers changing the landscape of AI trends, it becomes easier to navigate in the world.
Changes can come in many forms:
Social (changes).
Technical (changes).
Economic (changes).
Political (changes).
Environmental (changes).
Value (changes).
Which can lead to new trends, when it comes to things like:
Oversight.
Information availability.
Uneven regulation.
AI or human.
Globalizing accountability.
Hearts & minds.
Resource scarcity.
Value creation.
Nostalgia.
Collective imagination.
Indeed, most things change.
E.g. ''Information availability'' is not a static thing:
Markets are supposed to reflect the collective wisdom of future value.
But, how does AI change the speed of processing? Visibility of information?
And understanding of value?
For ''Global accountability'', one can ask:
A tiny island nation might want to hold the rest of the world accountable for
rising sea levels. But how?
Indeed, how will (the creation of) AI in one country impact life in other countries?
All in all, this means that every moment in the past has a cone that starts there and moves forward
(from left to right, in the figure below),
out to some desirable futures and out to some not-so-desirable futures.
We should be aware of as much as possible of what is going on around us, and try to direct our
trajectory in the future cone towards the desirable futures.
Taking all of these ''unexpected'' forces and trends into account might even make the future a little less ''unexpected''?
Indeed, many good points and reflections in the talk.
A vector database, or vector store, is a database that can store vectors (fixed-length lists of numbers) along with other data items. Vector databases typically implement one or more Approximate Nearest Neighbor (ANN) algorithms, so that one can search the database with a query vector to retrieve the closest matching database records.
Vector databases are often used to implement Retrieval-Augmented Generation (RAG), a method to improve domain-specific responses of large language models
(See section 1.5.1.).
With vector databases, LLM answers can be:
More precise.
Quote sources.
2.2. ControlNet: A Game Changer for AI Image Generation.
Lars Gregori, SAP CX, talked about ''ControlNet: A Game Changer for AI Image Generation''.
We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models.
...
We test various conditioning controls, e.g., edges, depth, segmentation, human pose [10].
I.e.
ControlNet is a family of neural networks fine-tuned on Stable Diffusion that allows us to have more structural and artistic control over image generation. It can enhance the default Stable Diffusion models with task specific conditions [11].
It did indeed appear to give some extra control over the system's image generation.
So, pretty cool.
2.3. From Crapbot to ChatGPT - How we finally made chatbots work.
Pieter Buteneers, Transfo.energy, talked about ''From Crapbot to ChatGPT - How we finally made chatbots work''.
Indeed, it all started with ELIZA: A very basic Rogerian psychotherapist chatbot [12].
Eliza worked by recognizing keywords in a user's statement, and then reflecting them back in the answer to the user [13].
Moving on from Eliza, later generations of chatbots could take all the words in a sentence,
make embeddings for these words, and then calculate the average of those embeddings.
This average would then represent the ''meaning'' of the sentence.
One could then use some ML algorithm (e.g. logistic regression) to classify it as a certain intent, or not.
With transformers and attention layers,
it became possible for chatbots to ''understand'' how the words in a sentence are connected to each other.
And, soon after, the whole world started talking about chatbots passing bar exams etc.
We document our experimental evaluation of the performance of OpenAI’s text-davinci-003 model, often-referred to as GPT-3.5, on the multistate multiple choice (MBE) section of the exam [14].
An amazing journey indeed.
2.4. The LLM-Year in Review and an Outlook Into the Future.
Dominik Meissner, 169 Labs GmbH, talked about ''LLM-Year in Review and an Outlook Into the Future''.
The year has seen the hype of large language models across all industries.
In this talk, we will reflect on milestones, challenges, and advancements.
And, we, the audience, were asked which LLMs we had looked into in the last 12 months:
But what about LLM hallucinations then?
Here, the Galileo hallucination index gives some insights into
which models to use:
Choosing a good model, then becomes a question of balancing hallucinations, cost and latency.
Again, a good overview, indeed.
2.5. Functional Deep Learning in Haskell.
Raoul Schlotterbeck, Active Group GmbH, talked about ''Functional Deep Learning in Haskell''.
Mathematically, neural networks are parameterized functions; optimization methods for training neural networks are higher-order functions that optimize parameters based on training data. (Sadly) Common deep learning frameworks contaminate the mathematical model with their implementation details.
In the programming language Haskell, thanks to Conal Elliott's compiler plugin "ConCat", the indirect, graph-based programming model is completely eliminated, as is the restriction to numeric arrays. Essential concepts become visible: functions, optimization, and differentiability.
Indeed, it all looked rather simple in Haskell.
Clever.
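The talk's framing, that a model is just a parameterized function and training is a higher-order function over it, can be sketched (here in Python rather than Haskell, and with numeric rather than compiler-derived differentiation):

```python
# A "model" is a parameterized function; training is a higher-order function
# that takes the loss function and returns improved parameters.

def model(w, x):
    return w * x  # a one-parameter linear model

def loss(w, data):
    """Mean squared error of the model over (x, y) pairs."""
    return sum((model(w, x) - y) ** 2 for x, y in data) / len(data)

def gradient_descent(loss_fn, w, data, lr=0.1, steps=200, eps=1e-6):
    """Minimize loss_fn in w, using a central-difference numeric gradient."""
    for _ in range(steps):
        grad = (loss_fn(w + eps, data) - loss_fn(w - eps, data)) / (2 * eps)
        w -= lr * grad
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
w = gradient_descent(loss, 0.0, data)
```

The ConCat approach replaces the numeric gradient with a compiler-derived exact derivative, but the shape of the program, a higher-order optimizer applied to a plain function, is the same.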
2.6. Utilizing Generative AI for Business Process Automation.
Ruben Bösche, Data Assessment Solutions, talked about ''Utilizing Generative AI for Business Process Automation''.
Based on the example of automation of project staffing (when management makes a resource request), we present how various applications of generative AI models are used independently in different tasks of the process.
And they showed how LLMs can be used to parse CVs (15x faster than by hand),
giving consistent and more comparable results.
Also for CV matching, embeddings can be useful.
First, CV texts are converted into high-dimensional vectors,
where semantically similar texts are close to each other in the vector space.
We can then use nearest-neighbor (NN) search
(to find matching CVs).
Advantages:
No human decision bias.
Faster way to scan all profiles.
Potentially, better skills assessments.
Next, we could potentially create an HR chatbot that companies can talk to in order to find the right candidates
(based on CVs).
But it could also work the other way around, where LLMs create CVs based on natural language
(here, the LLM will decide how to display projects and skills in the CV).
Useful, indeed.
2.7. Conference Raffle.
Hopeful participants in the conference raffle:
A lottery with a little help from ChatGPT.
ChatGPT finds the winner:
More conference impressions... E.g. see ...
2.8. A Recommender System for an Audio-on-Demand Platform.
Mirza Klimenta, Falcony AI GmbH, talked about ''A Recommender System for an Audio-on-Demand Platform''.
A recommendation system suggests, or recommends, additional products to consumers. These suggestions can be based on various criteria, including past purchases, search history, demographic information, and other factors.
In this talk, we got some insight into how:
User-item interactions can be treated as interconnected nodes within a graph. Eventually, forming the basis for a recommender system.
For more about ''Graph Neural Network (GNN)'' Architectures for Recommendation Systems, see here: [15].
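A toy sketch of the graph idea (plain co-occurrence counting over a tiny user-item interaction graph; a real system would learn node embeddings with a trained GNN):

```python
# Users and items are nodes; each interaction is an edge in a bipartite graph.
interactions = {  # user -> set of items interacted with (illustrative data)
    "alice": {"podcast-jazz", "podcast-ai"},
    "bob":   {"podcast-ai", "podcast-history"},
    "carol": {"podcast-jazz", "podcast-cooking"},
}

def recommend(user):
    """Rank unseen items by how many graph neighbors (users who share at
    least one item with `user`) interacted with them - a two-hop walk."""
    seen = interactions[user]
    scores = {}
    for other, items in interactions.items():
        if other == user or not (seen & items):
            continue  # no shared items -> not a neighbor in the graph
        for item in items - seen:
            scores[item] = scores.get(item, 0) + 1
    return sorted(scores, key=scores.get, reverse=True)

recs = recommend("alice")
```

GNN-based recommenders generalize exactly this neighborhood aggregation: instead of counting, they propagate learned embeddings along the same user-item edges.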
Interesting, indeed.
2.9. End of conference. Goodbye & see you next year.