Data Services

Generative AI

With the rising popularity of Artificial Intelligence (AI) and Large Language Models (LLMs), understanding these technologies is more important than ever. This webpage will guide you through the basics of AI, including its definition, ethical considerations, and practical guidelines for its use.

The fast pace of GenAI development can be overwhelming, but we're here to help. The Claude Moore Health Sciences Library currently offers workshops on the following GenAI topics: basic guidelines & prompt engineering, code assistance, data analysis, and literature searching. View the current listing of workshops here. If you're unable to attend these workshops and would like to schedule a similar session for your department, lab, or other group, contact us at hsl-rdas@virginia.edu or through our Ask Us form.

GenAI Overview

Artificial intelligence (AI) can refer to many different types of analyses and tools. Let's start by defining AI and the technologies that fall within that domain.

What is artificial intelligence (AI)?

Artificial intelligence (AI) is a broad field concerned with building intelligent machines that can help solve problems. Within that field, AI can be broken down into more specific layers:

Artificial intelligence: The broadest category, encompassing all areas involved in creating intelligent machines.
Machine learning: A large category that focuses on pattern recognition in structured data, often stored in tabular format, and frequently employs statistical methods. For instance, linear regression is a supervised machine learning technique.
Deep learning: A subset of machine learning that uses artificial neural networks, a computation method inspired by the human brain. Due to its complexity, deep learning excels at handling unstructured data types such as images, text, or audio.
Large language models: Deep learning models designed specifically for text. AI chatbots like ChatGPT are built on LLMs trained on particular data sets. LLMs are also a form of generative AI, as they can produce new content.
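To make the machine learning layer more concrete, here is a minimal sketch of supervised learning with simple linear regression in Python. The function name fit_line and the data values are made up for illustration; a real analysis would normally rely on a statistics package rather than hand-written code.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled training data: hours studied (input) and exam score (label).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)

# Use the learned pattern to predict the label for an unseen input.
predicted = slope * 6 + intercept
print(f"slope={slope:.1f}, intercept={intercept:.1f}, prediction={predicted:.1f}")
```

The model is "supervised" because it learns from examples where the correct answer (the score) is already known, and then applies that pattern to new inputs.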

What are large language models (LLMs)?

Large language models (LLMs) are advanced deep learning models designed to generate human-like language. Broadly, LLMs are trained to take a series of words as input and predict the next word as output. To be able to do this, they are trained on vast amounts of text, often sourced from publicly available internet resources. The training process has two main stages. First, LLMs are pre-trained on large generic datasets to predict the next word in a sentence. Then, they undergo fine-tuning with specific knowledge bases to enhance their performance or focus on specific tasks.
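The idea of predicting the next word from training text can be illustrated with a toy example. The sketch below is not how LLMs work internally (they use neural networks and consider far more context than one word); it only counts which word most often follows another in a tiny, made-up corpus. The function names and the corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count which word follows each word in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        following[current_word][next_word] += 1
    return following


def predict_next_word(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical miniature "training corpus".
corpus = (
    "the patient has a fever . "
    "the patient has a cough . "
    "the patient needs rest ."
)

model = train_bigram_model(corpus)
print(predict_next_word(model, "patient"))
print(predict_next_word(model, "has"))
```

Even this simple model shows why training data matters: it can only predict words it has seen, and its predictions reflect whatever patterns (and biases) the corpus contains.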

Supervised Finetuning on LLMs. Source: Neo4j

GenAI Best Practices

If you are new to working with GenAI, it can feel overwhelming. Here are a few key principles to help you L.E.A.R.N. how to safely use GenAI.

L: Lock Down Data

Protect sensitive data and respect intellectual property when using GenAI tools

E: Engineer Prompts

Clear instructions, context, and role are key to unlocking GenAI's best responses

A: Audit Results

Human expertise is crucial to ensure GenAI's accuracy and fairness

R: Refine Through Interaction

Enhance GenAI outputs through conversation and collaboration

N: Note, Credit, Cite

Transparently acknowledge GenAI's role in your work

1. Lock Down Data

When using GenAI, it is important to be mindful of the information you upload. Depending on the specific model you are using, information from your chat prompts may be stored and used for future model training. Even if the company does not use that data for training, the storage of previous chat sessions for future reference still presents a potential risk of a data breach. The current recommendation from UVA is to use UVA-licensed AI tools such as Microsoft Copilot Chat, which have additional security protections, whenever possible.

The UVA University Data Protections Standards (UDPS) sorts data into four categories: Highly Sensitive, Sensitive, Internal Use, and Public. UVA-licensed GenAI tools have been approved for data up to the Sensitive level, which is the default classification for information that is not considered Highly Sensitive. Highly Sensitive data includes personal information that could lead to identity theft as well as protected health information (PHI). Highly Sensitive data should never be uploaded to any AI tool. More details can be found in the GenAI Responsible Use Guidelines from Information Technology Services, and examples of UDPS data classifications are provided below.

Highly Sensitive Data (HSD): Social Security numbers, credit card or debit card numbers, account passwords, medical records numbers, medical history or any other PHI
Sensitive Data: FERPA-protected student information (that does not include HSD), University records, University ID numbers, student data, research data (that does not include HSD)
Internal Use: Salary information, contracts, emails, financial reports, human resources information
Public: Published datasets, public websites, published research

*Public data may be subject to intellectual property or copyright restrictions
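As a rough illustration of the rule described above, the sketch below encodes the four UDPS categories in Python and checks whether a category falls within the level approved for UVA-licensed GenAI tools. The names DataClass and approved_for_uva_licensed_tool are hypothetical, and the check covers only the classification level; it does not address copyright or intellectual property restrictions, and it is not a substitute for the official guidelines.

```python
from enum import IntEnum


class DataClass(IntEnum):
    """UDPS categories, ordered from least to most restricted."""
    PUBLIC = 1
    INTERNAL_USE = 2
    SENSITIVE = 3
    HIGHLY_SENSITIVE = 4


def approved_for_uva_licensed_tool(data_class):
    """Hypothetical helper: True if the category is at or below Sensitive.

    Highly Sensitive data should never be uploaded to any AI tool.
    """
    return data_class <= DataClass.SENSITIVE


for category in DataClass:
    status = "approved" if approved_for_uva_licensed_tool(category) else "do not upload"
    print(f"{category.name}: {status}")
```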

When uploading information to GenAI, you must also consider the potential for intellectual property or copyright concerns. The Terms of Use outlined in UVA’s Generative AI Use Guidelines specifically prohibit users from infringing upon the intellectual property of others through their use of GenAI tools. This means you need to think carefully about the information you upload to GenAI tools because you may violate restrictions around copyrighted material or commit plagiarism unintentionally.

2. Engineer Prompts

A prompt is how you start and continue your conversation with GenAI. While chat-based GenAI tools allow you to use natural language, the way you prompt GenAI will affect the quality of its responses. There are many different prompting frameworks, but most include the few key components described below. In particular, the amount of context provided about a task can drastically change the response from GenAI. Similarly, assigning a role for the AI to model its responses after can also improve results.

Instruction: A specific task for the model to perform
Context: Additional details or information you can provide the model for better responses
- Role: Defining a role or persona the AI should model its response after (medical librarian, statistician, manuscript reviewer)
Input: Any input data (text or data to be analyzed)
Output format: How the output should be formatted (formal text, bullet points, tables, etc.)
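The components above can be combined into a single prompt. The following Python sketch shows one possible way to assemble them; the function build_prompt, the section labels, and the example values are all illustrative assumptions, not a required template.

```python
def build_prompt(instruction, context="", role="", input_text="", output_format=""):
    """Assemble a prompt from the listed components, skipping empty ones."""
    parts = []
    if role:
        parts.append(f"Act as a {role}.")
    if context:
        parts.append(f"Context: {context}")
    parts.append(f"Task: {instruction}")
    if input_text:
        parts.append(f"Input:\n{input_text}")
    if output_format:
        parts.append(f"Output format: {output_format}")
    return "\n\n".join(parts)


# Hypothetical example values for each component.
prompt = build_prompt(
    instruction="Summarize the main finding of the abstract below.",
    context="I am preparing a journal club handout for first-year medical students.",
    role="medical librarian",
    input_text="<paste abstract text here>",
    output_format="Three bullet points in plain language.",
)
print(prompt)
```

Writing the components out separately like this makes it easier to notice when one is missing, for example when a prompt gives an instruction but no context.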

3. Audit Results

Although GenAI can be a helpful assistant, it's essential to always keep a human in the loop to verify results. GenAI may produce false or biased results and the user is ultimately responsible for any final products or decisions made. This means that the user must already have the expertise to validate those results or take additional steps to confirm information provided by GenAI. You should never copy and paste output from GenAI without validating it first.

The United Nations Educational, Scientific and Cultural Organization (UNESCO) released a quick start guide on GenAI in higher education, which included the flow chart below. It reiterates the importance of understanding that GenAI may produce false results and that human expertise is still required to verify information.

Understanding this responsibility also means recognizing the different types of bias that can appear in GenAI outputs and how they may affect your work. For example, there may be gender, racial, or religious bias in the outputs from GenAI. To illustrate, given a scenario with a doctor and a nurse, GenAI may assume the doctor is a man and the nurse is a woman. For more examples of bias in GenAI, review the Types of Bias section in ChatGPT in STEM Teaching: An Introduction to using LLM-based tools in Higher Ed (Hyzyk & Misanchuk, 2024).

When is it safe to use ChatGPT?

(modified figure from Sabzalieva & Valentini, 2023)

4. Refine Through Interaction

After auditing the output from GenAI, refinements are often necessary. Results can be refined by either directly editing the output from GenAI or by re-prompting GenAI with additional requests. During this process, it's important to remember that GenAI should be used as a collaborative tool.

Think of using GenAI like working with a helpful colleague (a potential role in your prompt). To make this interaction more natural, maintain a conversational tone by asking follow-up questions or making suggestions. Additionally, asking GenAI if it needs any more information from you before completing its task allows it to ‘prompt’ you back. This, again, leads to a more collaborative interaction and relevant results.

5. Note, Credit, Cite

GenAI is a rapidly evolving field and we’re all still learning how to best use it. As GenAI tools become more integrated into our work, it’s important to be transparent about their use. According to UVA’s Generative AI Use Guidelines, users must acknowledge GenAI assistance when sharing content that has been partially or fully generated by GenAI. The level of acknowledgment required will depend on the extent to which GenAI was used. If you’re unsure about needing to acknowledge GenAI use, think about whether you’d acknowledge a colleague for similar help. For example, minor editing assistance may not require acknowledgment, but using GenAI to write entire sections of text would.

Additionally, consider the venue where the content will be submitted and published. Many academic journals now provide guidelines on when and how to acknowledge the use of GenAI. Similarly, class syllabi should be consulted for any specific guidelines on GenAI usage for class assignments. One set of guidelines from the Association for Computing Machinery is paraphrased below. Additional guidelines from publishers can be found on the Shannon Library GenAI Guide.

Disclose the use of generative AI tools (e.g., ChatGPT, Jasper) for generating text, images, tables, code, etc. in the acknowledgements or prominently in the Work, based on the proportion of content generated.
If entire sections (tables, graphs, images) are AI-generated, specify the tools, versions, prompts, and any post-generation edits in an Appendix or Supplementary Material.
Note that the allowable amount/type of AI-generated text varies by section type. For example, using such tools to generate portions of a Related Work section is fundamentally different than generating novel results or interpretations.
For small amounts of generated text (limited to phrases or sentences), add a footnote and a general disclaimer in the Acknowledgements.
No disclosure is needed for using AI tools to edit and improve existing text, similar to using Grammarly for spelling and grammar.

Additional Resources

UVA Arts & Sciences Learning Design and Technology: Getting Started with GenAI

This page from the UVA Arts & Sciences Learning Design and Technology group provides practical guidance on how to begin using GenAI as well as discussing data security and ethical concerns. Example prompts for text generation, image generation, and coding/data analysis assistance are provided.

UVA Shannon Library: GenAI Guide

This page from the UVA Shannon Library provides a comprehensive introduction to GenAI at UVA. Topics covered include a general overview, ethical and copyright concerns, citing GenAI, and more.

Office of the Vice President for Research: AI and Research

This page from the Office of the Vice President for Research discusses the importance of research integrity while using AI. It provides scenarios where AI could be misused as well as acceptable use cases.

Provost's Office: Guidance for Faculty & Students on GenAI

This page from the Provost's Office contains FAQs to provide guidance on the appropriate use of GenAI tools in teaching and learning at UVA.

Teaching Hub: Generative AI in Teaching and Learning

This page from the UVA Teaching Hub has a collection of guides and tutorials on using GenAI for teaching and learning in higher education.

UVA Advancement Hub: AI Resource Center

This site from the UVA Advancement Hub provides guidance on how to use Copilot Chat in your work. While this site is designed for UVA Advancement Hub employees, much of the guidance and information provided is applicable to anyone using GenAI.

UVA Darden's LaCross Institute for Ethical Artificial Intelligence in Business

The LaCross Institute is designed to explore the use and impact of GenAI on business and education. They also collaborate with the UVA School of Data Science.

GenAI Coding Checklist

With the increasing use of generative AI chatbots, people are now using them to assist with programming and data analysis. However, there are some important things to keep in mind when using GenAI for programming, especially if you're still a beginner. To help make sure you end up with the correct code, follow the GenAI Coding Checklist below.

You often have to provide more context in your prompt than you would think. Below is an example prompt that could assist a student learning how to program in R.