Navigating the OCI Generative AI Professional Certification: Strategies and Personal Insights

Raji Rai
7 min read · May 31, 2024


I recently cleared the Oracle Cloud Infrastructure (OCI) Generative AI Professional certification exam and thought I would share the knowledge and insights I gained along the way.

The Oracle Cloud Infrastructure 2024 Generative AI Professional exam (two free attempts until July 2024) is designed for developers and machine learning/AI engineers who have a basic understanding of machine learning and deep learning concepts and familiarity with Python and OCI. In this learning path, apart from gaining strong conceptual knowledge, you will learn how to build and deploy a sample LLM application. Many demos are also part of the course.

I would recommend taking the Oracle Cloud Infrastructure 2023 AI Foundations Associate exam (free) before this one, as it helps you understand the foundational concepts related to AI. You can refer to this cheat sheet to understand all the key concepts. By taking both exams, you will gain valuable knowledge and expertise in AI and generative AI.

Large Language Models

A Large Language Model (LLM) is a type of artificial intelligence model designed to understand and generate human-like text based on the data it has been trained on. These models are characterized by their vast size, typically consisting of billions or even trillions of parameters, which are the adjustable weights that help the model make predictions or generate text.

LLM architectures

Encoder: Encoder models are designed to encode text, that is, to produce embeddings, and are based on the transformer architecture. By embedding text, we generally mean converting a sequence of words into a single vector or a sequence of vectors. An embedding is a numeric representation of the text that typically tries to capture its semantics or meaning.

Decoder: Decoder models are designed to decode or generate text. The input to a text generation model is a sequence of words, and the output is a generated sequence of words. A decoder only produces a single token at a time.

Encoder-Decoder: The encoder processes the input sequence and compresses it into a fixed-size context vector, capturing the essential information. The decoder then uses this context vector to generate the output sequence step-by-step. Encoders and decoders can come in all different kinds of sizes. Size refers to the number of trainable parameters that the model has.
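
To make the distinction concrete, here is a minimal sketch, assuming the Hugging Face transformers library is available; the model names are only examples. The encoder turns a sentence into an embedding vector, while the decoder generates new tokens from a prompt.

```python
# Minimal sketch: encoder produces embeddings, decoder generates text.
# Assumes the Hugging Face transformers library; model names are examples only.
from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM
import torch

# Encoder: turn a sentence into a fixed-size embedding vector.
enc_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
inputs = enc_tok("Large language models are powerful.", return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, 768)
embedding = hidden.mean(dim=1)                     # one vector for the whole sentence
print(embedding.shape)                             # torch.Size([1, 768])

# Decoder: generate text one token at a time from a prompt.
dec_tok = AutoTokenizer.from_pretrained("gpt2")
decoder = AutoModelForCausalLM.from_pretrained("gpt2")
prompt = dec_tok("Large language models are", return_tensors="pt")
out = decoder.generate(**prompt, max_new_tokens=20)
print(dec_tok.decode(out[0], skip_special_tokens=True))
```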

Prompting

Prompting, in the context of LLMs like GPT-4, refers to the process of providing an initial input or “prompt” to the model to generate a response or complete a task. The prompt acts as a starting point or instruction, guiding the model to produce relevant and coherent text based on its training data. Prompt engineering is the process of iteratively refining the model input in an attempt to induce a probability distribution over the vocabulary that suits a particular task; the inputs are modified to get closer and closer to the response we want.

Types of prompting

In-context learning: Constructing a prompt that has demonstrations of the task that the model is meant to complete.

K-shot prompting: Including k examples of the task that you want the model to complete in the prompt. In Zero-shot prompting, no examples are provided within the prompt.

Chain-of-thought prompting: Prompt the model to break down the steps of solving the problem into small chunks.

Least to most prompting: Solve simpler problems first, and use the solutions to the simple problems to solve more difficult problems.

Step-back prompting: Identify high-level concepts pertinent to a task.
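
To make these styles concrete, here are a few illustrative prompt templates; the tasks and wording are my own examples, not from the course.

```python
# Illustrative prompt templates for the styles above; wording is an example only.

zero_shot = (
    "Classify the sentiment of this review as positive or negative:\n"
    "Review: The battery lasts two full days.\nSentiment:"
)

few_shot = (  # k-shot: k = 2 worked examples before the real query
    "Review: The screen cracked in a week. Sentiment: negative\n"
    "Review: Setup took less than a minute. Sentiment: positive\n"
    "Review: The battery lasts two full days. Sentiment:"
)

chain_of_thought = (
    "Q: A pack has 12 pens and I buy 3 packs, then give away 5 pens. "
    "How many pens do I have? Let's think step by step."
)

least_to_most = (
    "First, how many pens are in 3 packs of 12? "
    "Then, using that answer, how many remain after giving away 5?"
)
```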

Issues with prompting

Prompting can also be used to elicit unintended or even harmful behavior from a model. In prompt injection, the prompt is designed to elicit a response from the model that is not intended by the developer. In prompt leaking, the model is coaxed into revealing the data it has been trained on or other sensitive information.

Hallucination refers to a phenomenon where a model generates content that is not grounded in reality or lacks factual accuracy. The threat of hallucination is one of the biggest challenges to safely deploying LLMs.

Training

Sometimes prompting is insufficient. For example, when a model is trained on data from one domain and you want to use it for a new domain, prompting alone may not work. Training helps in such cases: the parameters of the model themselves are changed.

Fine-tuning: In fine-tuning, a pre-trained model (for example, BERT) is trained on a labeled dataset to perform the task, altering all of its parameters.

Parameter efficient fine-tuning: In PEFT, only a very small set of the model’s parameters is trained, or a handful of new parameters are added to the model. For example, in LoRA (Low-Rank Adaptation) the original parameters of the model are not altered; instead, additional low-rank matrices are added and trained.

Soft prompting: In soft prompting, a set of learnable parameters (soft prompt vectors) is prepended to the input of the model to steer it toward specific tasks. This is another economical training option.

Continual pretraining: This is similar to fine-tuning in that all the parameters of the model are changed. However, continual pretraining uses unlabeled data.
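
As a concrete illustration of PEFT, here is a minimal LoRA setup, assuming the Hugging Face transformers and peft libraries; the base model and hyperparameters are only examples.

```python
# Minimal LoRA sketch, assuming the Hugging Face transformers and peft libraries;
# the base model and hyperparameters below are illustrative, not a recipe.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor for the update
    target_modules=["c_attn"],  # which layers receive the extra parameters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# Only the small added LoRA matrices are trainable; the original weights stay frozen.
```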

Decoding

There are various decoding techniques:

Greedy decoding: In this approach, at each step of the sequence generation process, the model selects the token (word or character) with the highest probability as its next output. This process continues until an end-of-sequence token is produced, or the sequence reaches a predefined maximum length.

Nucleus sampling: Also known as top-p sampling, this is a more sophisticated decoding strategy. Unlike greedy decoding, which always selects the most probable token, nucleus sampling samples from the smallest set of top tokens whose cumulative probability reaches p, allowing for more nuanced and varied text generation.

Beam search: An extension of the greedy decoding approach that aims to improve the quality of generated sequences by considering a set of candidate sequences (beams) instead of just the single most probable one.
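
A toy sketch of a single decoding step may help; the next-token distribution below is made up purely for illustration.

```python
# Toy sketch of greedy decoding vs. nucleus (top-p) sampling for one step,
# using a made-up next-token probability distribution.
import numpy as np

vocab = ["the", "cat", "sat", "mat", "ran"]
probs = np.array([0.45, 0.25, 0.15, 0.10, 0.05])   # model's next-token probabilities

# Greedy decoding: always pick the most probable token.
greedy_token = vocab[int(np.argmax(probs))]         # -> "the"

# Nucleus (top-p) sampling: keep the smallest set of tokens whose cumulative
# probability reaches p, renormalize, and sample from that set.
p = 0.8
order = np.argsort(probs)[::-1]
cumulative = np.cumsum(probs[order])
cutoff = int(np.searchsorted(cumulative, p)) + 1    # number of tokens in the nucleus
nucleus = order[:cutoff]
nucleus_probs = probs[nucleus] / probs[nucleus].sum()
sampled_token = vocab[int(np.random.choice(nucleus, p=nucleus_probs))]

print(greedy_token, sampled_token)
```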

Retrieval Augmented Generation (RAG)

RAG is an approach in natural language processing (NLP) that combines elements of both retrieval-based and generative models to produce high-quality, contextually relevant text. In this approach, a generative model (such as a language model) is augmented with a retrieval mechanism that retrieves relevant information from a large external knowledge source, such as a database or a corpus of text. This retrieved information is then used to guide or enhance the generation process of the model, resulting in more informed and contextually rich outputs.

There are two ways to implement RAG: the sequence model and the token model. The RAG sequence model retrieves a set of relevant documents for the input query and uses the same documents to generate the entire output sequence, such as a paragraph or a full answer. The RAG token model, on the other hand, operates at the token level and can draw on different retrieved documents for different parts of the output, giving finer-grained control for tasks such as text completion, question answering, or dialogue generation.
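
Here is a rough, self-contained sketch of the retrieve-then-augment flow. The toy bag-of-words embedding stands in for a real embedding model, and the final augmented prompt would be sent to an LLM for generation.

```python
# Rough RAG sketch: retrieve relevant passages, then augment the prompt.
# The bag-of-words "embedding" is a toy stand-in so the example runs end to end.
import numpy as np

documents = [
    "OCI Generative AI offers pretrained Cohere and Meta models.",
    "Fine-tuning runs on dedicated AI clusters.",
    "Vector databases store high-dimensional embeddings for retrieval.",
]
vocab = sorted({w for doc in documents for w in doc.lower().split()})

def embed(text):
    # Toy bag-of-words vector; a real system would call an embedding model here.
    vec = np.array([text.lower().split().count(w) for w in vocab], dtype=float)
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(query, docs, k=2):
    scores = [float(embed(doc) @ embed(query)) for doc in docs]  # cosine similarity
    top = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]
    return [docs[i] for i in top]

def build_rag_prompt(query):
    context = "\n".join(retrieve(query, documents))
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

# The augmented prompt would then be sent to a generative model.
print(build_rag_prompt("Which pretrained models does OCI Generative AI offer?"))
```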

Vector Databases

Vector databases, also known as vector stores, are specialized databases designed to efficiently store, manage, and query high-dimensional vector data. These databases are particularly well-suited for applications involving machine learning, natural language processing, computer vision, recommendation systems, and other domains where data is represented as vectors. Many vector databases use a distributed architecture to handle the storage and computational demands of large-scale, high-dimensional data, which allows horizontal scaling and improves performance and storage capacity.
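
As a simple illustration, the FAISS library can serve as a lightweight stand-in for a vector store; real vector databases add persistence, metadata filtering, and distributed scaling on top of this basic store-and-query idea.

```python
# Storing and querying vectors, using FAISS as a stand-in for a vector database.
import faiss
import numpy as np

dim = 128
index = faiss.IndexFlatL2(dim)                               # exact nearest-neighbor index

doc_vectors = np.random.rand(1000, dim).astype("float32")    # stand-in document embeddings
index.add(doc_vectors)                                       # store the vectors

query = np.random.rand(1, dim).astype("float32")             # stand-in query embedding
distances, ids = index.search(query, 5)                      # retrieve the 5 closest vectors
print(ids[0], distances[0])
```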

Semantic Search

Semantic search is an advanced information retrieval technique that aims to improve the accuracy and relevance of search results by understanding the meaning (semantics) behind the search query and the documents being searched. Unlike traditional keyword-based search, which relies solely on matching keywords, semantic search takes into account the intent, context, and semantics of both the query and the documents to return more precise and contextually relevant results.
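
A small sketch of the difference, assuming the sentence-transformers library; the model name is just an example. The query shares no keywords with the relevant document, yet the embeddings still match it.

```python
# Keyword matching vs. semantic search; assumes the sentence-transformers library.
from sentence_transformers import SentenceTransformer, util

docs = [
    "Steps to recover your account credentials.",
    "Pricing tiers for the enterprise plan.",
]
query = "How do I reset my password?"

# Keyword search: no shared words, so the relevant document is missed.
keyword_hits = [d for d in docs if any(w in d.lower().split() for w in query.lower().split())]
print(keyword_hits)   # []

# Semantic search: embeddings capture meaning, so the paraphrase still matches.
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, convert_to_tensor=True)
query_vec = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_vec, doc_vecs)[0]
print(docs[int(scores.argmax())])   # "Steps to recover your account credentials."
```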

OCI Generative AI Service

Oracle Cloud Infrastructure (OCI) Generative AI service is a fully-managed platform that allows you to leverage generative AI models for various text-based tasks. Here’s a breakdown of its key features:

Pre-trained Models: OCI Generative AI provides access to state-of-the-art large language models (LLMs) from Cohere and Meta. You can use these for tasks like summarization, text generation, translation and information extraction.

Fine-Tuning Capabilities: The service allows you to fine-tune these pre-trained models on your own data. This customization can significantly improve the model’s performance on specific tasks relevant to your business needs.

Dedicated Resources: OCI Generative AI utilizes isolated AI clusters for both fine-tuning and hosting custom models. This ensures security and optimal performance for your workloads.

Flexibility and Control: The service offers control over your models. You can create endpoints, update them, or even delete them as needed. Additionally, you can manage the compute resources allocated to your custom models.
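
For a feel of how the service is called programmatically, here is a rough sketch using the OCI Python SDK. Treat the class and field names as assumptions to verify against the current SDK documentation; the compartment OCID, service endpoint, and model ID are placeholders.

```python
# Rough sketch of a text-generation call via the OCI Python SDK.
# Class/field names are assumptions based on oci.generative_ai_inference;
# verify against the SDK docs. OCIDs, endpoint, and model ID are placeholders.
import oci

config = oci.config.from_file()   # reads ~/.oci/config
client = oci.generative_ai_inference.GenerativeAiInferenceClient(
    config,
    service_endpoint="https://inference.generativeai.<region>.oci.oraclecloud.com",
)

details = oci.generative_ai_inference.models.GenerateTextDetails(
    compartment_id="ocid1.compartment.oc1..example",
    serving_mode=oci.generative_ai_inference.models.OnDemandServingMode(
        model_id="cohere.command"   # one of the pretrained Cohere models
    ),
    inference_request=oci.generative_ai_inference.models.CohereLlmInferenceRequest(
        prompt="Summarize the key features of OCI Generative AI in two sentences.",
        max_tokens=200,
        temperature=0.5,
    ),
)

response = client.generate_text(details)
print(response.data)
```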

Overall, by taking the OCI Generative AI Professional certification, you gain a strong understanding of LLM architecture, the OCI Generative AI service offerings, and how to build and deploy LLM applications. The course, including mock tests, can be completed in under 7 hours, after which you can schedule and take the online exam. It’s a multiple-choice exam comprising 40 questions to be completed in 90 minutes, and the passing score is 65%. Happy learning!
