01. Document summary

This tutorial will take a look at how to do a document summary.

Below is the main overview of the tutorial.

Stuff: summary of the entire document at once
Map-Reduce: batch merge after split summary
Map-Refine: gradual merging after split summary
Chain of Density: Runs N times repeatedly, complements missing entity, and improves summary
Clustering-Map-Refine: Divide the document's Chunk into N clusters, and Refine summary for a summary of documents close to the center point in each cluster.

Representatively known summary method

The central question when building a summary is how to deliver the document to LLM's context window. Here are some known ways to do this:

Stuff : It's simply a way to "put" all documents in a single prompt. This is the simplest approach.
Map-reduce : This is how each document is summarized individually in the "map" step, and then the summary in the "reduce" step is combined into the final summary.
Refine : Organize responses by traversing input documents and repeatedly updating answers. For each document, you get a new answer by entering all the inscriptions, the current document, and the latest intermediate answer to chain.

Copy

# Configuration file for managing API KEY as environment variable.
from dotenv import load_dotenv

# Load API KEY information
load_dotenv()

Copy

 True

Copy

# LangSmith Set up tracking. https://smith.langchain.com
# !pip install langchain-teddynote
from langchain_teddynote import logging

# Enter a project name.
logging.langsmith("Summary")

Copy

 Start tracking LangSmith. 
[Project name] 
Summary

Stuff

stuff documents chain ("stuff" means "fill" or "fill") is the simplest way in the document chain. Take the document list, insert it all into the prompt, and then forward the prompt to LLM.

This chain is suitable for applications where documents are small and only a few are delivered to most calls.

Load data.

Copy

from langchain_community.document_loaders import TextLoader

# Load news data
loader = TextLoader("data/news.txt")
docs = loader.load()
print(f"Total number of characters: {len(docs[0].page_content)}")
print("\n========= Preview of the front part =========\n")
print(docs[0].page_content[:500])

Copy

 Total letters: 1708 

========= Preview of the front ========= 

title:  
AI2 launches'real' open source LLM'olmo' free to commercial use 

Contents: 
AlanAI Institute (AI2) has released a complete open source large language model (LLM)'OLMo' ’. The entire process of data collection, learning, and distribution is transparently disclosed, and it is an open source LLM in a true sense that even commercial use has been allowed. 
Venture Beat reported that AI2, a non-profit private AI research institute on the 1st (local time), launched ‘Olmo’, which introduced ‘the first true open source LLM and framework ’.  
According to this, Olmo provides not only model code and model weights, but also training codes, training data, related toolkits and evaluation toolkits. This allows you to better understand how the model was built, how LLM works and how it generates responses.  
The Olmo Framework provides a 4-digit variant model with ‘Olmo 7B’ with 70 billion parameters and a ‘Olmo 1B’ model with 100 billion parameters. Models include code that generates training data.

Below is a prompt with the phrase to write a summary in Korean.

Copy

from langchain import hub

prompt = hub.pull("teddynote/summary-stuff-documents-korean")
prompt.pretty_print()

Copy

 Please summarize the sentence according to the following REQUEST. 
REQUEST: 
One. Summarize the main points in bullet points in KOREAN. 
2. Each summarized sentence must start with an emoji that feet the meaning of the each sentence. 
3. Use various emojis to make the summary more interesteding. 
4. Translate the summary into KOREAN if it is written in ENGLISH. 
5. DO NOT translate any technical terms. 
6. DO NOT include any unnecessary information. 

CONTEXT: 
{context} 

SUMMARY:"

Copy

from langchain_openai import ChatOpenAI
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_teddynote.callbacks import StreamingCallback


llm = ChatOpenAI(
    model_name="gpt-4o-mini",
    streaming=True,
    temperature=0,
    callbacks=[StreamingCallback()],
)


stuff_chain = create_stuff_documents_chain(llm, prompt)
answer = stuff_chain.invoke({"context": docs})

Copy

 -🚀 AlanAI Institute (AI2) has released a complete open source LLM'OLMo'.  
-📊 Data collection, learning, distribution processes are transparently disclosed and commercial use is permitted.  
In-depth analysis is possible by providing ️ model code, weights, training codes, data and evaluation toolkits.  
-📈 Contains various variant models such as'Olmo 7B' and'Olmo 1B'.  
-🔍 Training data is based on AI2's'Dolma' data set, featuring three-shelled tokens.  
-📜 There is no limit to commercial use according to the Apache 2.0 license.  
-💡 AI2 said it helps researchers build a safe and reliable LLM system.  
-🏆 Olmo has the same performance as commercial products, and has performed well in some benchmark tests.  
-🌍 There are restrictions on non-English language and code generation capabilities.  
-🔄 AI2 plans to continuously improve Olmo.  
-🌐 Currently, all Olmo resources are provided free of charge on chewers and hugglingfaces.

Map-Reduce

A summary of the Map-reduce method is a technique for efficiently summarizing long documents.

This method first consists of a "map" step that divides the document into small chunk, and a "reduce" step that combines a summary of each chunk.

In the Map phase, each chunk is summarized in parallel
In the reduce phase, these summaries are incorporated into one final summary.

This approach is particularly useful when dealing with large-scale documents, allowing you to bypass the token restrictions in language models.

Load data.

Copy

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("data/SPRI_AI_Brief_2023년12월호_F.pdf")
docs = loader.load()
docs = docs[3:8]  # Here is a summary of only part of the document:
print(f"총 페이지수: {len(docs)}")

Copy

 Total pages: 5

Map

The map phase creates a summary for each Chunk.

(In fact, the rectification is a summary generation for Chunk, but I proceed by changing to extracting the core content. It doesn't matter because it's the process of putting summaries together in the reduce phase anyway.)

I thought this method was more valid, but I can proceed by changing at my own discretion whether to summarize in the map phase or extract the core content.

Copy

from langchain import hub
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(
    temperature=0,
    model_name="gpt-4o-mini",
)

# Download map prompt
map_prompt = hub.pull("teddynote/map-prompt")

# Prompt Output
map_prompt.pretty_print()

Copy

 ================================ System Message ================================ 

You are a professional main thesis extractor. 

================================ Human Message ================================= 

Your task is to extract main thesis from given documents. Answer should be in name language as given document.  

#Format:  
- thesis 1 
- thesis 2 
- thesis 3 
- ... 

Here is a given document:  
{doc} 

Write 1~5 sentences. 
#Answer:

Generates map_chain.

Copy

# Create map chain
map_chain = map_prompt | llm | StrOutputParser()

Call batch() to generate a summary for each document.

Copy

# Extract key content from documents
doc_summaries = map_chain.batch(docs)

Copy

# Output the number of summarized documents
len(doc_summaries)

Copy

# Print summary of some documents
print(doc_summaries[0])

Copy

 -President Biden of the United States has issued an administrative order to ensure safe and reliable AI development and use. 
-These mandates include AI's safety and security standards, privacy protection, equity and citizenship improvement, consumer protection, worker support, innovation and competition promotion, and international cooperation. 
-AI system development companies need to share safety test results and system information with the US government, and establish standards and best practices for AI-generated content. 
-We will expand measures to prevent discrimination and prejudice from irresponsible use of AI to improve equity and citizenship. 
-We plan to promote AI research across the United States through the National AI Researcher (NAIRR) and support technology and infrastructure for SMEs and developers.

Reduce

In the Reduce phase, the key content done in the map phase is incorporated into one final summary.

Copy

# Download reduce prompt
reduce_prompt = hub.pull("teddynote/reduce-prompt")

# Prompt Output
reduce_prompt.pretty_print()

Copy

 ================================ System Message ================================ 

You are a professional summarizer. You are given a list of summaries of documents and you are asked to create a single summary of the documents. 

================================ Human Message ================================= 

#Instructions:  
One. Extract main points from a list of summaries of documents 
2. Make final summaries in bullet points format. 
3. Answer should be written in {language}. 

#Format:  
- summary 1 
- summary 2 
- summary 3 
- ... 

Here is a list of summaries of documents:  
{doc_summaries} 

#SUMMARY:

Generate Reduce Chain.

Copy

# Create a reduce chain
reduce_chain = reduce_prompt | llm | StrOutputParser()

Below is an example of streaming output using Reduce Chain.

Copy

from langchain_teddynote.messages import stream_response

answer = reduce_chain.stream({"doc_summaries": doc_summaries, "language": "Korean"})
stream_response(answer)

Copy

 President Biden of the United States issued an administrative order to ensure the safety and reliability of AI, and demanded sharing of the safety test results of AI system development companies. 
-G7 agreed to an international action force for AI companies through the'Hiroshima AI Process' and proposed developing an authentication mechanism to increase the reliability of AI-generated content. 
-28 countries have announced the Leslie Declaration at the AI Safety Summit, the launch of the AI Safety Institute and the safety test plan. 
-The US court dismissed the copyright infringement lawsuit against AI by artists and pointed out the issue of non-registration. 
-The U.S. Federal Trade Commission (FTC) expressed the potential for consumer damage to the generating AI and concerns about Big Tech's market dominance and emphasized maintaining a fair competitive market.

Copy

from langchain_core.runnables import chain


@chain
def map_reduce_chain(docs):
    map_llm = ChatOpenAI(
        temperature=0,
        model_name="gpt-4o-mini",
    )

    # Download map prompt
    map_prompt = hub.pull("teddynote/map-prompt")

    # Create map chain
    map_chain = map_prompt | map_llm | StrOutputParser()

    # Create a chain by concatenating the first prompt, ChatOpenAI, and the string output parser..
    doc_summaries = map_chain.batch(docs)

    # Download reduce prompt
    reduce_prompt = hub.pull("teddynote/reduce-prompt")
    reduce_llm = ChatOpenAI(
        model_name="gpt-4o",
        temperature=0,
        callbacks=[StreamingCallback()],
        streaming=True,
    )

    reduce_chain = reduce_prompt | reduce_llm | StrOutputParser()

    return reduce_chain.invoke({"doc_summaries": doc_summaries, "language": "Korean"})

Copy

# Output the results
answer = map_reduce_chain.invoke(docs)

Copy

 -President Biden of the United States has issued an administrative order to ensure safe and reliable AI development and use. These orders include AI's safety and security standards, privacy protection, equity and citizenship improvement, consumer protection, worker support, innovation and competition promotion, and international cooperation. 
-G7 agreed to an international action force for AI companies through the'Hiroshima AI Process'. This Code requires risk assessment and mitigation throughout the AI life cycle, ensuring transparency and accountability, and cooperation between information and stakeholders. 
-28 countries have issued a declaration of the Leslie at the AI Safety Summit in Blasley Park, UK, with a way to ensure AI safety. The Declaration emphasizes the importance of all stakeholders' cooperation to ensure the safety of the AI system, and points out the responsibilities of advanced AI development companies. 
-The Northern California District Court of the United States has dismissed copyright infringement lawsuits filed by artists to create AI companies. For reasons of dismissal, the works presented in the complaint were not registered with the Copyright Office, and it was difficult to prove the similarity between AI-generated images and certain works. 
-The U.S. Federal Trade Commission (FTC) has expressed concerns about the potential for consumer and creator damage from generating AI and strengthening Big Tech's market dominance. The FTC warned that the development and distribution of generated AI could pose a variety of risks, including consumer privacy breaches, automation of discrimination and prejudice, and fraud.

Map-Refine

The Map-refine method is another approach for document summarization, similar to map-reduce, but with some differences.

Map steps: Divide documents into multiple small chunks, and create summaries individually for each chunk.
Refine phase: sequentially handles the generated summaries and gradually improves the final summary. At each stage, the summary is updated by combining the previous summary with the information from the new chunk.
Repeat process: Repeat the refine step until all chunk is processed.
Final summary: The summary obtained after processing up to the last chunk will be the final result.

The advantage of the map-refine method is that you can gradually improve the summary while maintaining the order of the documents. This can be especially useful when the context of the document is important. However, this method is handled sequentially compared to map-reduce, so parallelization is difficult and can take longer to process large documents.

Copy

 [{'missing_entities':'AI International Action Decree; Hiroshima AI Process;G7 Country','denser_summary': On October 30, 2023, G7 countries agreed to the AI International Action Force for AI companies through the Hiroshima AI process. This Code of Conduct recommends voluntary adoption for AI risk identification and mitigation, and requires risk assessment, transparency, accountability, information sharing and security control throughout the AI life cycle.'}, {'missing_entities':'Risk Assessment; Information Sharing;AI Governance','denser_summary':'2023 years 30 months, G This Code of Conduct recommends voluntary adoption for AI risk identification and mitigation, and requires risk assessment, information sharing, AI governance and security control.'}, {'missing_entities':'AI life cycle; Transparency; Trusted Content','denser_summary':'2023 years 30 months old, G7 This Code of Conduct requires risk assessment, transparency, reliable content authentication, and information sharing throughout the AI life cycle.'}, {'missing_entities':'Strong security controls; Social risk; Climate crisis response','denser_summary': '2023 October 30,  G7 countries have agreed to the AI International Action Force for AI companies through the Hiroshima AI process. This Code of Conduct calls for risk assessment, strong security control, social risk mitigation and climate crisis response throughout the AI life cycle.'}, {'missing_entities':'AI generated content; Risk-based approach; International Technical Standards','denser_summary':'October 30, 2023, G7 countries have AI companies through Hiroshima AI processes This Code of Conduct requires AI-generated content identification, risk-based access, development of international technical standards, and risk assessment throughout the AI life cycle.'}] 

### CoD Summary 1/5, added entity: AI International Action Force, Hiroshima AI Process, G7 Country 

On October 30, 2023, G7 countries agreed to the AI International Action Force for AI companies through the Hiroshima AI process. This Code of Conduct is AI 
We recommend voluntary adoption for risk identification and mitigation, and require risk assessment, transparency, accountability, information sharing and security control throughout the AI life cycle. 

### CoD Summary 2/5, added entity: risk assessment, information sharing, AI governance 

On October 30, 2023, G7 countries agreed to the AI International Action Force for AI companies through the Hiroshima AI process. This Code of Conduct is AI 
We recommend voluntary adoption for risk identification and mitigation, and require risk assessment, information sharing, AI governance and security control. 

### CoD Summary 3/5, added entity: AI life cycle, transparency, reliable content 

On October 30, 2023, G7 countries agreed to the AI International Action Force for AI companies through the Hiroshima AI process. This Code of Conduct is AI 
It requires risk assessment throughout the life cycle, ensuring transparency, authenticating reliable content and sharing information. 

### CoD Summary 4/5, added entity: strong security control, social risk, climate crisis response 

On October 30, 2023, G7 countries agreed to the AI International Action Force for AI companies through the Hiroshima AI process. This Code of Conduct is AI 
It requires risk assessment throughout the life cycle, strong security controls, social risk mitigation and climate crisis response. 

### CoD Summary 5/5, added entity: AI generated content, risk-based access, international technical standards 

On October 30, 2023, G7 countries agreed to the AI International Action Force for AI companies through the Hiroshima AI process. This Code of Conduct is AI 
Requires identification of generated content, risk-based access, development of international technical standards and risk assessment throughout the AI life cycle. 


============== [final summary] ================= 

On October 30, 2023, G7 countries agreed to the AI International Action Force for AI companies through the Hiroshima AI process. This Code of Conduct requires AI-generated content identification, risk-based access, international technology standards development, and risk assessment throughout the AI life cycle.

Copy

# Initialize an empty list to store the results
results: list[dict[str, str]] = []

# Run cod_chain in streaming mode and process partial JSON results
for partial_json in cod_chain.stream(
    {"content": content, "content_category": "Article"}
):
    # Update results at each iteration
    results = partial_json

    # Print current result on the same line (overwrite previous output using carriage return)
    print(results, end="\r", flush=True)

# Calculate total summary count
total_summaries = len(results)
print("\n")

# Iterate through each summary and process it.
i = 1
for cod in results:
    # Extract and format missing entities
    added_entities = ", ".join(
        [
            ent.strip()
            for ent in cod.get(
                "missing_entities", 'ERR: "missing_entiies" key not found'
            ).split(";")
        ]
    )
    # Extract more dense summaries
    summary = cod.get("denser_summary", 'ERR: missing key "denser_summary"')

    # Print summary information (number, total count, added entities)
    print(
        f"### CoD Summary {i}/{total_summaries}, 추가된 엔티티(entity): {added_entities}"
        + "\n"
    )
    # Print the summary by wrapping it into 80 characters
    print(textwrap.fill(summary, width=80) + "\n")
    i += 1

print("\n============== [최종 요약] =================\n")
print(summary)

So, rather than simply connecting, to make the next chunk seem to cover the previous chunk and repeatedly add streaming Carriage return printing is required.

Partial JSON streaming. Each streamed chunk is a list of the same JSON dicks with new suffixes added.

Copy

 SPRi AI Brief |  
December 2023 
2G7 agrees to international action decree for AI companies through Hiroshima AI process 
nG7 is voluntary for AI risk identification and mitigation for companies developing advanced AI systems  
Establish an AI International Action Force that recommends adoption 
The n-behavioral order includes risk assessment and mitigation throughout the AI life cycle, ensuring transparency and accountability, and information and  
Requires action such as stakeholder collaboration, security controls, content authentication and source verification.KEY Contents 
£G7 establishes international action force for risk management in advanced AI systems 
n Main 7 countries (G7)* Silver October 30, 2023 ‘Hiroshima AI process ’AI International for AI companies  
Agreement on International Code of Conduct for Advanced AI Systems 
∙G7 prepared an international standard for AI created at the summit in Hiroshima, Japan in May 2023  
‘Hiroshima AI process ’for informatics ** 
∙This Code of Conduct, designed for voluntary adoption by companies, is based on high-tech AI systems, including base models and generated AI.  
Includes measures necessary to identify and mitigate risk 
* The main seven countries (G7) mean the United States, Japan, Germany, the United Kingdom, France, Italy and Canada 
** Eight countries, including Korea, Australia, and Vietnam, were invited to the May Summit, but the AI International Action Force first adopted only G7 countries. 
nG7 has presented the following actions through the Code of Conduct, so that it can respond to rapidly developing technologies  
Will be revised as needed through stakeholder consultation 
∙Adopt measures to assess and mitigate risk throughout the AI life cycle during the development of advanced AI systems,  
Mitigate vulnerabilities, misuse accidents, and misuse types after the launch and deployment of advanced AI systems 
∙Transparency as a way to disclose the performance and limitations of advanced AI systems and inform appropriate or inappropriate usage  
Ensure and strengthen accountability 
∙In the event of inter-organizational information and accidents developing advanced AI systems, including industry, government, civil society, and academia  
Cooperate for reporting and include personal information protection policies and risk mitigation measures based on risk-based approaches  
Establish AI governance and risk management policies 
∙Implement strong security controls, including physical security, cybersecurity, and insider threat security throughout the AI life cycle 
∙Technically possible techniques, including watermarks, to help users identify AI-generated content  
Develop and build reliable content authentication and source verification mechanisms  
∙Priority investment in research and effective mitigation measures to mitigate social risks and safety and security issues, climate crisis  
Prioritize advanced AI systems to address global challenges such as response and global health and education 
∙Accelerate the development and adoption of international technical standards, and enter and collect data to protect personal information and knowledge property rights  
Implementing timely protection 
Source: G7, Hiroshima Process International Code of Conduct for Advanced AI Systems, 2023.10.30.

Copy

content = docs[1].page_content
print(content)

Check the data to summarize.

Copy

import textwrap
from langchain import hub
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import SimpleJsonOutputParser

# {content} Specify default values for all inputs except
cod_chain_inputs = {
    "content": lambda d: d.get("content"),
    "content_category": lambda d: d.get("content_category", "Article"),
    "entity_range": lambda d: d.get("entity_range", "1-3"),
    "max_words": lambda d: int(d.get("max_words", 80)),
    "iterations": lambda d: int(d.get("iterations", 5)),
}

# Download Chain of Density Prompt
cod_prompt = hub.pull("teddynote/chain-of-density-prompt")

# Chain of Density Create a chain
cod_chain = (
    cod_chain_inputs
    | cod_prompt
    | ChatOpenAI(temperature=0, model="gpt-4o-mini")
    | SimpleJsonOutputParser()
)

# Generate a second chain, extract only the final summary (cannot be streamed, final result is required)
cod_final_summary_chain = cod_chain | (
    lambda output: output[-1].get(
        "denser_summary", 'Error: 'denser_summary' key not found in last dictionary'
    )
)

The first chain shows intermediate results, and the second chain extracts only the final summary.

Progressive improvement: CoD initially generates a simple summary with fewer objects, then step by step adds important objects and improves the summary. As the length of the summary is maintained during this process, the density of information increases, creating a readable yet informative summary.
Balance of information density and readability: The CoD method regulates the information density in the summary to find the optimal balance between informability and readability. Studies have shown that people prefer CoD summaries that are more dense than typical GPT-4 summaries, but not as dense as man-made summaries.
Abstraction and information fusion improvement: CoD-generated summaries are more abstract, excellent information fusion, and less prone to the front of the original text (lead bias). This contributes to improving the overall quality and readability of the summary. Chain of Density Prompt Input parameter description
content_category : Content rectification (e.g. articles, video recordings, blog posts, research papers). Default: Article
content : Content to summarize
entity_range : The range of entities to select from the content and add to the summary. Default 1-3
max_words : 1 summary time, the maximum word to include in the summary. Default 80 is.
iterations : Number of entity high density rounds. Total summary Repeat count +1 is. For 80 words, 3 iterations is ideal. If the summary is longer, 4~5 rounds, and entity_range Changing 1~4 for example can also help. Default: 3. This code uses the Chain of Density prompt to construct a chain that creates a text summary.

This method initially creates a summary with fewer objects, and then goes through the process of repeatedly integrating missing important objects without increasing the length. Studies have shown that CoD-generated summaries are more abstract than regular prompts, have excellent information fusion, and have a density similar to human-written summaries.

The "Chain of Density" (CoD) prompt is a technique developed to improve the creation of summaries using GPT-4.

Thesis: https://arxiv.org/pdf/2309.04269

Chain of Density

Copy

 -The President of the United States, Biden, has issued an administrative order to develop and use a safe and reliable AI. 
-These mandates include AI's safety and security standards, privacy protection, equity and citizenship improvement, consumer protection, worker support, innovation and competition promotion, and international cooperation. 
-AI system development companies need to share safety test results and system information with the US government, and legal guidelines are in place for responsible use of AI. 
-The emphasis is on promoting AI use and training tools in the medical field for consumer protection and worker support. 
It also includes ways to promote AI research through the National AI Researchers (NAIRR) and help foreign experts study and work in the United States. 
In addition, G7 has agreed to an international action force for AI companies through the'Hiroshima AI Process', which encourages voluntary adoption to identify and mitigate risks in AI systems, and highlights risk assessment, transparency and accountability throughout the AI life cycle. Emphasis.  
-G7 plans to revise the Code of Conduct through stakeholder consultation to respond to technological advances and prioritizes the implementation of appropriate devices to mitigate social risks and protect personal information. 

----------------- 


-The President of the United States, Biden, has issued an administrative order to develop and use a safe and reliable AI. 
-These mandates include AI's safety and security standards, privacy protection, equity and citizenship improvement, consumer protection, worker support, innovation and competition promotion, and international cooperation. 
-AI system development companies need to share safety test results and system information with the US government, and legal guidelines are in place for responsible use of AI. 
-The emphasis is on promoting AI use and training tools in the medical field for consumer protection and worker support. 
It also includes ways to promote AI research through the National AI Researchers (NAIRR) and help foreign experts study and work in the United States. 
-G7 has agreed to international action mandates for AI companies through the'Hiroshima AI Process', which encourages voluntary adoption to identify and mitigate risks in AI systems, highlighting risk assessment, transparency and accountability throughout the AI life cycle. 
-G7 plans to revise the Code of Conduct through stakeholder consultation to respond to technological advances and prioritizes the implementation of appropriate devices to mitigate social risks and protect personal information. 
-In addition, 28 countries have issued a declaration of the Leslie to ensure AI safety at the AI Safety Summit in Leslie Park, UK. The declaration emphasizes the importance of all stakeholders' cooperation to ensure the safety of the AI system, especially pointing out the responsibilities of AI development companies. 
-The British Prime Minister announced a safety test plan for the advanced AI model with the launch of the AI Safety Institute, and governments agreed to share test results and work on developing joint standards. 
-Participating countries have agreed to write a report to scientifically assess the risks and possibilities of AI, and South Korea will co-host an AI mini summit with the UK. 

----------------- 


-The President of the United States, Biden, has issued an administrative order to develop and use a safe and reliable AI. 
-These mandates include AI's safety and security standards, privacy protection, equity and citizenship improvement, consumer protection, worker support, innovation and competition promotion, and international cooperation. 
-AI system development companies need to share safety test results and system information with the US government, and legal guidelines are in place for responsible use of AI. 
-The emphasis is on promoting AI use and training tools in the medical field for consumer protection and worker support. 
It also includes ways to promote AI research through the National AI Researchers (NAIRR) and help foreign experts study and work in the United States. 
-G7 has agreed to international action mandates for AI companies through the'Hiroshima AI Process', which encourages voluntary adoption to identify and mitigate risks in AI systems, highlighting risk assessment, transparency and accountability throughout the AI life cycle. 
-G7 plans to revise the Code of Conduct through stakeholder consultation to respond to technological advances and prioritizes the implementation of appropriate devices to mitigate social risks and protect personal information. 
-In addition, 28 countries have issued a declaration of the Leslie to ensure AI safety at the AI Safety Summit in Leslie Park, UK. The declaration emphasizes the importance of all stakeholders' cooperation to ensure the safety of the AI system, especially pointing out the responsibilities of AI development companies. 
-The British Prime Minister announced a safety test plan for the advanced AI model with the launch of the AI Safety Institute, and governments agreed to share test results and work on developing joint standards. 
-Participating countries have agreed to write a report to scientifically assess the risks and possibilities of AI, and South Korea will co-host an AI mini summit with the UK. 
Meanwhile, the Northern California District Court in the United States has dismissed copyright infringement lawsuits that artists have filed against Middlesey, Stabilization AI, and Divident Art. The court found that the works presented in the complaint were not registered with the Copyright Office and it was difficult to prove the similarity between AI-generated images and certain works. The judge asked the plaintiff to revise the complaint and raise it again, reducing the scope of the lawsuit around a specific image of copyright infringement. However, there will continue to be a lawsuit against the copyright infringement of Stabilization AI for 16 copyrighted works by Sarah Anderson. 

----------------- 


-The President of the United States, Biden, has issued an administrative order to develop and use a safe and reliable AI. 
-These mandates include AI's safety and security standards, privacy protection, equity and citizenship improvement, consumer protection, worker support, innovation and competition promotion, and international cooperation. 
-AI system development companies need to share safety test results and system information with the US government, and legal guidelines are in place for responsible use of AI. 
-The emphasis is on promoting AI use and training tools in the medical field for consumer protection and worker support. 
It also includes ways to promote AI research through the National AI Researchers (NAIRR) and help foreign experts study and work in the United States. 
-G7 has agreed to international action mandates for AI companies through the'Hiroshima AI Process', which encourages voluntary adoption to identify and mitigate risks in AI systems, highlighting risk assessment, transparency and accountability throughout the AI life cycle. 
-G7 plans to revise the Code of Conduct through stakeholder consultation to respond to technological advances and prioritizes the implementation of appropriate devices to mitigate social risks and protect personal information. 
-In addition, 28 countries have issued a declaration of the Leslie to ensure AI safety at the AI Safety Summit in Leslie Park, UK. The declaration emphasizes the importance of all stakeholders' cooperation to ensure the safety of the AI system, especially pointing out the responsibilities of AI development companies. 
-The British Prime Minister announced a safety test plan for the advanced AI model with the launch of the AI Safety Institute, and governments agreed to share test results and work on developing joint standards. 
-Participating countries have agreed to write a report to scientifically assess the risks and possibilities of AI, and South Korea will co-host an AI mini summit with the UK. 
Meanwhile, the Northern California District Court in the United States has dismissed copyright infringement lawsuits that artists have filed against Middlesey, Stabilization AI, and Divident Art. The court found that the works presented in the complaint were not registered with the Copyright Office and it was difficult to prove the similarity between AI-generated images and certain works. The judge asked the plaintiff to revise the complaint and raise it again, reducing the scope of the lawsuit around a specific image of copyright infringement. However, there will continue to be a lawsuit against the copyright infringement of Stabilization AI for 16 copyrighted works by Sarah Anderson. 
-The United States Federal Trade Commission (FTC) has expressed concern over the damage that AI can do to consumers and creators and strengthening Big Tech's market dominance. The FTC warns that the use of generated AI may increase the risk of personal information infringement, discrimination, fraud, etc., and in relation to copyright law, the generated AI is likely to undermine the creator's competitiveness and consumers may misunderstand AI's work. Pointed out. The FTC expresses concern that some Big Techs can leverage resources to strengthen market dominance, and is taking legal action against AI-related illegal activities. The FTC emphasized that the development of generating AI could make a difference in many industries, but under current law, it would take advantage of all powers to protect consumers and maintain fair competition.

Copy

refined_summary = map_refine_chain.invoke(docs)

Copy

from langchain_core.runnables import chain


@chain
def map_refine_chain(docs):

    # map chain 생성
    map_summary = hub.pull("teddynote/map-summary-prompt")

    map_chain = (
        map_summary
        | ChatOpenAI(
            model_name="gpt-4o-mini",
            temperature=0,
        )
        | StrOutputParser()
    )

    input_doc = [{"documents": doc.page_content, "language": "Korean"} for doc in docs]

    # Create a chain by concatenating the first prompt, ChatOpenAI, and the string output parser.
    doc_summaries = map_chain.batch(input_doc)

    refine_prompt = hub.pull("teddynote/refine-prompt")

    refine_llm = ChatOpenAI(
        model_name="gpt-4o-mini",
        temperature=0,
        callbacks=[StreamingCallback()],
        streaming=True,
    )

    refine_chain = refine_prompt | refine_llm | StrOutputParser()

    previous_summary = doc_summaries[0]

    for current_summary in doc_summaries[1:]:

        previous_summary = refine_chain.invoke(
            {
                "previous_summary": previous_summary,
                "current_summary": current_summary,
                "language": "Korean",
            }
        )
        print("\n\n-----------------\n\n")

    return previous_summary

Weave a series of processes so far into one chain.

Below is an example that creates map_reduce_chain.

Copy

# generate refine llm
refine_llm = ChatOpenAI(
    temperature=0,
    model_name="gpt-4o-mini",
)

# Generate refine chain
refine_chain = refine_prompt | refine_llm | StrOutputParser()

Copy

 ================================ System Message ================================ 

You are an expert summarizer. 

================================ Human Message ================================= 

Your job is to product a final summary 

We have proven an existing summary up to a certain point: 
{previous_summary} 

We have the exportunity to refine the existing summary (only if needle) with some more context below. 
------------ 
{current_summary} 
------------ 
Given the new context, refine the original summary in {language}. 
If the context isn't usful, return the original summary.

Copy

# Download refine prompt
refine_prompt = hub.pull("teddynote/refine-prompt")

# Prompt Output
refine_prompt.pretty_print()

The Refine phase sequentially handles the chunk created in the previous map phase and gradually improves the final summary.

Refine

Copy

 ['- President Biden of the United States has issued an administrative order to develop and use safe and reliable AI. \n- Administrative orders include AI safety and security standards, privacy protection, equity and citizenship improvement, consumer protection, worker support, innovation and competition Promotion, international cooperation.\n- AI system development companies need to share safety test results and system information with the government, and will establish standards and best practices for AI-generated content.\n-Expand measures to prevent discrimination due to irresponsible use of AI to improve equity and citizenship, and establish principles and best practices for consumer protection and worker support. It was planned to help foreign experts study and work in the United States.', "- G7, on October 30, 2023,'Hiroshima AI Process'n- This Code of Conduct recommends voluntary adoption for risk identification and mitigation of AI systems, highlighting risk assessment, transparency, and accountability throughout the AI life cycle.\n- Highlights include information sharing, collaboration, security control, content authentication, and source identification. \n- G7 will revise the Code of Conduct as needed through stakeholder consultation to respond to the rapid development of AI technology.\n- Also, Prioritize the development of AI systems to mitigate social risks and address global challenges, and plan to implement appropriate safeguards to protect personal information.”, "- 28 countries to respond jointly to AI risks at the AI Safety Summit in Blasley Park, UK." ``The Declaration of Blletzley''  Announced. The \n-declaration emphasizes the importance of all stakeholders' cooperation to ensure the safety of the AI system, especially pointing out the responsibility for the safety assessment of AI development companies. \n- The British Prime Minister, along with the launch of the AI Safety Institute, has developed a safety test plan for advanced AI models. The governments agreed to share test results and strive to develop joint standards.\n- Participating countries have agreed to write a report on scientific assessment of AI risks and possibilities, South Korea will co-host the AI Mini Summit with the United Kingdom.”,'- The Northern California District Court dismissed copyright infringement lawsuits filed by artists against Middlesey, Stabilization AI, and Dividend.\n- For reasons of dismissal, the works presented in the complaint were not registered with the Copyright Office. There was a point that it was difficult to prove the similarity between the image created by AI and the specific work.\ Judge asked the plaintiff to revise the complaint and reduce the scope of the lawsuit.n- However, the coverage AI's copyright infringement lawsuit against Sarah Anderson's 16 copyrighted works continues.','- The United States Federal Trade Commission (FTC), in a comment submitted to the Copyright Office, is concerned about the potential for consumer and creator damage from generated AI. Expressed concerns about strengthening market dominance.\n- FTC said that the use of generated AI is the consumer's infringement of personal information, automation of discrimination and prejudice, He pointed out that fraudulent crimes, etc., can lead to unfair competition or deception, which is against copyright law, which can negatively affect the creator's reputation and the value of the work.  \n- FTC raised concerns that some big-tech could prevent the departure of generated AI users and secure exclusive licenses to strengthen market dominance.\n- FTC leverages legal powers to combat AI-related illegal activities, It was emphasized that all powers would be used to protect consumers and maintain a fair competitive market.']

Copy

# Prints a summary of all documents.
print(map_chain.batch(input_doc))

Copy

# Define all documents as input.
input_doc = [{"documents": doc, "language": "Korean"} for doc in docs]

Copy

 -The President of the United States, Biden, has issued an administrative order to develop and use a safe and reliable AI. 
-Administrative orders include AI's safety and security standards, privacy protection, equity and citizenship improvement, consumer protection, worker support, innovation and competition promotion, and international cooperation. 
-AI system development companies need to share safety test results and system information with the government, and will establish standards and best practices for AI-generated content. 
-Expand measures to prevent discrimination due to irresponsible use of AI to improve equity and citizenship, and establish principles and best practices for consumer protection and worker support. 
-We plan to promote AI research through the National AI Researcher (NAIRR) and help foreign experts study and work in the United States.

Copy

# Output summary of the first document
print(map_chain.invoke({"documents": docs[0], "language": "Korean"}))

Output a summary of the first document.

Copy

# map chain 생성
map_chain = map_summary | llm | StrOutputParser()

Generates map_chain.

Copy

 ================================ System Message ================================ 

You are an expert summarizer. Your task is to summarize the following document in {language}. 

================================ Human Message ================================= 

Extract most important main thesis from the documents, then summarize in bullet points. 

#Format: 
- summary 1 
- summary 2 
- summary 3 
-... 

Here is a given document:  
{documents} 

Write 1~5 sentences. Think step by step. 
#Summary:

Copy

from langchain import hub
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

# map llm 생성
map_llm = ChatOpenAI(
    temperature=0,
    model_name="gpt-4o-mini",
)

# map chain 생성
map_summary = hub.pull("teddynote/map-summary-prompt")

# 프롬프트 출력
map_summary.pretty_print()

Copy

print(summary)

Copy

On October 30, 2023, G7 countries agreed to the AI International Action Force for AI companies through the Hiroshima AI process. This Code of Conduct requires AI-generated content identification, risk-based access, international technology standards development, and risk assessment throughout the AI life cycle.

Clustering-Map-Refine

The original author of this tutorial, gkamradt, made an interesting suggestion for a summary of long documents.

The background is as follows.

The map-reduce or map-refine method is all time consuming and expensive.
Therefore, after dividing the documents into clusters of several (N), the document closest to the most central axis is recognized as the representative document of the cluster, suggesting a way to summarize them in the map-reduce (or map-refine) way.

In fact, the cost is reasonable and the results are satisfactory, so we modify and share the code of the original author's tutorial.

Original author and source -gkamradt

Copy

from langchain_community.document_loaders import PyMuPDFLoader

loader = PyMuPDFLoader("data/SPRI_AI_Brief_2023년12월호_F.pdf")
docs = loader.load()
len(docs)

Copy

When you run the code below, the text is combined into one document. The purpose of the combination is not to be separated by page stars.

The combined number of characters is about 28K.

Copy

# Concatenate all documents into one Text.
texts = "\n\n".join([doc.page_content for doc in docs])
len(texts)

Copy

RecursiveCharacterTextSplitter Divide one Text into multiple documents using.

Copy

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
split_docs = text_splitter.split_text(texts)

Check the number of documents divided. It was divided into 79 documents here.

Copy

# Check the total number of documents
len(split_docs)

Copy

Embed documents using the Upstage Embeddings model.

Copy

from langchain_upstage import UpstageEmbeddings

embeddings = UpstageEmbeddings(model="solar-embedding-1-large-passage")

vectors = embeddings.embed_documents(split_docs)

A total of 79 documents are divided into 10 clusters. At this time KMeans Perform clustering using.

Copy

from sklearn.cluster import KMeans

# The number of clusters you choose can be adjusted depending on the content of your document.
num_clusters = 10

# Perform K-means clustering
kmeans = KMeans(n_clusters=num_clusters, random_state=123).fit(vectors)

Check the labeled results.

Copy

# Check the results
kmeans.labels_

Copy

 array [5, 3, 7, 6, 4, 4, 5, 3, 8, 8, 1, 1, 8, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 7, 7, 7, 7,       4, 8, 4, 9, 4, 4, 9, 6, 2, 5, 4, 4, 4], dtype=int32)

Copy

from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import seaborn as sns

# Remove warning
import warnings

warnings.filterwarnings("ignore")

# Perform t-SNE and reduce to 2D
tsne = TSNE(n_components=2, random_state=42)
reduced_data_tsne = tsne.fit_transform(np.array(vectors))

# seaborn style settings
sns.set_style("white")

# Reduced data plot
plt.figure(figsize=(10, 8))
sns.scatterplot(
    x=reduced_data_tsne[:, 0],
    y=reduced_data_tsne[:, 1],
    hue=kmeans.labels_,
    palette="deep",
    s=100,
)
plt.xlabel("Dimension 1", fontsize=12)
plt.ylabel("Dimension 2", fontsize=12)
plt.title("Clustered Embeddings", fontsize=16)
plt.legend(title="Cluster", title_fontsize=12)

# Set background color
plt.gcf().patch.set_facecolor("white")

plt.tight_layout()
plt.show()

Then you need to find and save the embedding closest to the center point of each cluster.

Copy

import numpy as np

# Create an empty list to store the closest points
closest_indices = []

# Repeat as many times as the number of clusters
for i in range(num_clusters):

    # Get a list of distances from the center of the cluster
    distances = np.linalg.norm(vectors - kmeans.cluster_centers_[i], axis=1)

    # Find the index of the closest point (find minimum distance using argmin)
    closest_index = np.argmin(distances)

    # Add the given index to the list of nearest indices
    closest_indices.append(closest_index)

Sort ascending to proceed with the summary of the documents in order.

Copy

# Sort in ascending order to proceed with the summary of the document in order.
selected_indices = sorted(closest_indices)
selected_indices

Copy

 [4, 16, 24, 28, 34, 37, 41, 47, 51, 55]

Output 10 selected documents. In this process Document Create documents using objects.

Copy

from langchain_core.documents import Document

selected_docs = [Document(page_content=split_docs[doc]) for doc in selected_indices]
selected_docs

Copy

 [Document (page_content='▹ Bill Gates, prospect of paradigm change in computer use due to AI agents········································ 13\n ▹ YouTube, mandatory display of AI generated content from 24 Technology / Research \n ▹ UK Science and Innovation Technology, AI Safety Institute established · · · · · · · · · · · · · · · · · · · · · · · · · · Personnel/training'), Document (page_content='1. Policy/law \n2. Enterprise/Industry \n3. Technology / Research \n 4. Personnel/Education\n 28 countries participating in the UK AI Safety Summit, Joint Response to AI Risks \nn 28 countries participating in the AI Safety Summit held in Leslie Park, UK, announced the Leslie Declaration with a way to \n AI safety assurance. Countries and companies developing have agreed on a safety test plan for the AI system, and \n UK's AI Safety Institute will work with countries around the world to testn£ Participating countries of the AI Safety Summit,  ~ 28 countries with \n participation in the AI Safety Summit held at the Wetley Park in the UK on November 2023 1‘2, 2023 International Organization for Management of AI Risks ’ ∙ Announced = Policy/law \n2. Enterprise/Industry \n3. Technology / Research \n 4. Personnel/Education\n US Federal Trade Commission submits AI comments on consumer protection and competition to the Copyright Office \nn US FTC provides consumer protection and competition \n-side comments on copyright and AI-related queries conducted by the Copyright Office \nn FTC is the creator of the generated AI And while raising concerns about the possibility of consumer damage, some \nbigtech may take advantage of massive resources to further strengthen market dominance \nKEY Contents Consumers and creators' damage from generated AI and Big Tech's market dominance concerns\nn The United States Federal Trade Commission (FTC) on October 30, 2023. Copyright Office (USCO) announces comments on copyright and AI-related questions published last September \n∙ Copyright Office is investigating copyright laws and policy issues related to generated AI, and through broad comment'), Document (page_content='SPRi AI Brief  In the final negotiations of the AI law, which the EU Executive Committee is working on, France, Italy, and \n Germany oppose regulation on the base model, and negotiations are in trouble. \nn France, Italy, Germany 3 countries introduce autonomous action mandates against the base model development companies And proposed ways to mandate \nKEY Contents\n£ AI law tripartite negotiations, some countries oppose basil n regulation While the EU board of directors is in the process of final negotiations on the ‘AI Act ’, \n some countries oppose regulation on the base model and negotiations are in trouble \n∙ at a three-party negotiation meeting held on October 24th, a greater impact on society The basic consensus on the regulation of the base model according to the hierarchical approach to \n'to reach a more strict AI model, Plans to use funds to fulfill commitments to support \n funding for AI system evaluation by external research groups \n£ AI Safety Fund plans to focus on developing model evaluation techniques for AI Red Team \nn Frontier Model Forum is committed to AI Red Team activities through AI Safety Fund support for the development of new model evaluation techniques for \npoint support \n∙ Forum improves AI Red Team safety and security standardsWith the prospect of helping to gain insights from industry, government, and civil society on risk-response measures, the forum plans to receive requests for proposals for funding in the next few \n months. \N limit,  Data transparency is not secured due to the fragility of advanced AI models or the ability to share potentially dangerous features and risk mitigation-related information \n public procedures are also being developed'), Document (page_content='∙, and the unclear source of the dataset used to train the AI model. Due to the occurrence of various \nlegal and ethical issues, the researchers at 2,00 Relicensing status, authors, and other data attributes are specified, and \n releases a platform to access this information \n∙ Data source explorer in the form of a large platform makes it easy to grasp the license status of the dataset, and \n main dataset Configuration and data genealogy are also traceable \nn researchers are key to data transparency through extensive audits of open source data sets In open source LLM trained with \ndata collected from crowdsourcing platforms such as Papers with Code, the missing percentage of data licenses is 72~83%'), Document (page_content='∙ Alibaba Cloud is a Tongi 20 In benchmark \n tests such as math (GSM8k), question answer (ARC-C), it surpasses major AI models including Lama-2-70B and GPT-3.5 \n∙ Tongyuan 2.  0 is provided to the public through Alibaba Cloud's web site and mobile apps, and developers can use it through \nAPI \nn Alibaba Cloud uses generated AI in multiple industry areas to help improve business performance \n∙ Industry area includes customer support development and legal counseling, healthcare, finance, document management, audio and video management, code development, character \n), Document (page_content='1. Policy/law \n2. Enterprise/Industry \n3. Technology / Research \n 4. Workforce/training\ngoggles create $20 billion investment in Aspics Enhances AI cooperation \nn Google has agreed to invest up to $20 billion in and over $500 billion, and Aspics has also signed a contract to use cloud services with \nggles. Google, Microsoft, and Amazon expand cooperation with \n Androvic and OpenAI, the leading companies in the next generation AI model Up to $20 billion investment agreement and cloud service delivery to Ansropic \nn Google agreed to invest up to $20 billion in Ansropic on October 27, 2023, of which 500 million \ndollar would be invested first and an additional $1.5 billion in the next month \n∙ Google is already in 2023.Disclosure of investment plans of up to $400 billion in n and topic'), Document (page_content='SPRi AI Brief | \n2023-December issue\n12\nIDC, 2027 AI software sales 2,  $500 billion breakthrough forecast \nn IDC's forecast predicts that the AI software market will reach $2,510 billion in 2027, and the creation \nAI platform and applications will generate $283 billion in sales by 2027 \nn As of 2023, AI application, the largest market occupying a third of AI software sales, \nn£ Supporting companies' increased AI investment AI software market fast-paced \nn Market Research Institute IDC expects the AI software market to grow rapidly, from $640 billion in 2022 to $2510 billion in 2027.4%. Document (page_content='1. Policy/law \n2. Enterprise/Industry \n3. Technology / Research \n 4. Manpower/training\nville Gates predicts paradigm change in computer use due to AI agents\nn Bill Gates is expected to have a full change in the way computers \n are used, with AI agents capable of handling all tasks in everyday language within 5 years. , Medical and \n education, productivity, especially as a prospect that will affect all industries beyond the computer sector.  Prospects for high-priced services in the entertainment and shopping area to become popular \nKEY Contents\n£ 5 years I expect the dissemination of AI agents just to speak in everyday language to my device \nn Bill Gates Microsoft founder on November 9, 2023 Official Blog On the basis of the AI agent's computer \n usage and the prospect that it will completely change the software industry>n

Copy

# Generate a summary using the previously created map_refine_chain
refined_summary = map_refine_chain.invoke(selected_docs)

Copy

 -Bill Gates predicted that AI agents would make a big difference in the way computers are used. 
- YouTube plans to mandate the display of AI-generated content from 2024. 
-The British Ministry of Science and Innovation announced the establishment of the AI Safety Institute. 
At the British AI Safety Summit, where 28 countries participated, the Wetzley Declaration to Ensure AI Safety was announced, highlighting the cooperation of all stakeholders to manage AI risk. 
-High-tech AI development countries and companies have agreed on a safety test plan for the AI system, and the British AI Safety Institute will lead it. 
-Google Deepmind has released a new classification system for the functionality and behavior of universal AI models. 
In Galileo's study, GPT-4 performed best in the LLM hallucinations index assessment. 

----------------- 


-Bill Gates predicted that AI agents would make a big difference in the way computers are used. 
- YouTube plans to mandate the display of AI-generated content from 2024. 
-The British Ministry of Science and Innovation announced the establishment of the AI Safety Institute. 
At the British AI Safety Summit, where 28 countries participated, the Wetzley Declaration to Ensure AI Safety was announced, highlighting the cooperation of all stakeholders to manage AI risk. 
-High-tech AI development countries and companies have agreed on a safety test plan for the AI system, and the British AI Safety Institute will lead it. 
-Google Deepmind has released a new classification system for the functionality and behavior of universal AI models. 
In Galileo's study, GPT-4 performed best in the LLM hallucinations index assessment. 
-The U.S. Federal Trade Commission (FTC) has submitted comments to the Copyright Office on consumer protection and competition related to AI, expressing concern about the potential damage to creators and consumers due to the creation AI. 
-The FTC also raised concerns that some large technology companies could strengthen market dominance through enormous resources. 
-The Copyright Office is investigating copyright laws and policy issues related to generated AI, and is working on a wide range of comments. 

----------------- 


-Bill Gates predicted that AI agents would make a big difference in the way computers are used. 
- YouTube plans to mandate the display of AI-generated content from 2024. 
-The British Ministry of Science and Innovation announced the establishment of the AI Safety Institute. 
At the British AI Safety Summit, where 28 countries participated, the Wetzley Declaration to Ensure AI Safety was announced, highlighting the cooperation of all stakeholders to manage AI risk. 

...  
(meditation) 
... 

-Google has agreed to invest up to $20 billion in Ansropic, and $500 million in priority. 
-Anglo-Pick has signed a contract to use cloud services with Google. 
Major cloud operators such as Google, Microsoft, and Amazon are strengthening cooperation with next-generation AI model companies, Ansropic and OpenAI. 
-Google also invested $550 million in Ansropic in February 2023. 
-Amazon announced an investment plan of up to $400 billion in Ansropic. 
-IDC expects AI software sales to reach $2,510 billion by 2027, and generated AI platforms and applications are expected to generate $283 billion in revenue by 2027. 
-As of 2023, AI applications account for a third of AI software sales, and are expected to grow by an average of 21.1% per year by 2027. 
-The AI software market grows from $640 billion in 2022 to $2,510 billion in 2027, with an average annual growth rate expected to reach 31.4%. The market includes AI platforms, AI applications, AI system infrastructure software, and AI application development and deployment software. 

-----------------

Copy

# Output the final result
print(refined_summary)

Copy

 -Bill Gates predicts that within five years, AI agents will be able to handle all tasks in everyday language, and these AI agents will completely change the way computers are used and have a major impact on the software industry. The introduction of AI agents will be an opportunity for expensive services to become popular in various industries, including healthcare, education, productivity, entertainment and shopping. 
- YouTube plans to mandate the display of AI-generated content from 2024. 
-The British Ministry of Science and Innovation announced the establishment of the AI Safety Institute. 
At the British AI Safety Summit, where 28 countries participated, the Wetzley Declaration to Ensure AI Safety was announced, highlighting the cooperation of all stakeholders to manage AI risk. 
-High-tech AI development countries and companies have agreed on a safety test plan for the AI system, and the British AI Safety Institute will lead it. 
-The AI Safety Fund plans to fund the evaluation of AI systems by external research groups, and will focus on developing new model evaluation techniques for the AI Red Team. 
-Funding for the AI Red Team is expected to contribute to improving the safety and security standards of the AI model, and the forum plans to receive requests for proposals for funding in the coming months. 
-In addition, an open process is being developed to share vulnerability and risk mitigation information from advanced AI models. 
-Google Deepmind has released a new classification system for the functionality and behavior of universal AI models. 
In Galileo's study, GPT-4 performed best in the LLM hallucinations index assessment. 
-The U.S. Federal Trade Commission (FTC) has submitted comments to the Copyright Office on consumer protection and competition related to AI, expressing concern about the potential damage to creators and consumers due to the creation AI. 
-The FTC also raised concerns that some large technology companies could strengthen market dominance through enormous resources. 
-The Copyright Office is investigating copyright laws and policy issues related to generated AI, and is working on a wide range of comments. 
-In the final negotiations of the EU's AI law, France, Italy and Germany are struggling to negotiate against the regulation of the base model, and these countries have proposed ways to introduce and mandate autonomous action mandates for base model development companies. 
At the October 24 meeting, a basic agreement was reached on a hierarchical approach to applying stricter rules for AI models with a large impact on society. 
-The source of the dataset used for AI model training is unclear, causing legal and ethical issues, and researchers audited over 2,000 fine-tuned datasets to tag and verify information from original data sources, relicense status, authors, etc. Developed a platform. 
-This platform provides an interactive explorer that allows you to easily track the license status and configuration of the dataset and the data lineage. 
-Audit results for open source datasets showed that the percentage of LLM's data license omissions trained with data collected from GitHub and Papers with Code reached 72~83%. 
-Alibaba Cloud announced that Tongyuan One 2.0 was released in April 2023, and performance improved over its predecessor. 
-Kunichien One 2.0 has transcended major AI models in various benchmark tests, including language understanding, math, and answering questions. 
-This model is available to the public through Alibaba Cloud's web site and mobile apps, and developers can access it through the API. 
Alibaba Cloud has also launched an industry-specific model that can leverage generated AI to improve business performance in multiple industries. Supported industries include customer support, legal counseling, medical, financial, document management, audio and video management, code development, and character creation. 
-Google has agreed to invest up to $20 billion in Ansropic, and $500 million in priority. 
-Anglo-Pick has signed a contract to use cloud services with Google. 
Major cloud operators such as Google, Microsoft, and Amazon are strengthening cooperation with next-generation AI model companies, Ansropic and OpenAI. 
-Google also invested $550 million in Ansropic in February 2023. 
-Amazon announced an investment plan of up to $400 billion in Ansropic. 
-IDC expects AI software sales to reach $2,510 billion by 2027, and generated AI platforms and applications are expected to generate $283 billion in revenue by 2027. 
-As of 2023, AI applications account for a third of AI software sales, and are expected to grow by an average of 21.1% per year by 2027. 
-The AI software market grows from $640 billion in 2022 to $2,510 billion in 2027, with an average annual growth rate expected to reach 31.4%. The market includes AI platforms, AI applications, AI system infrastructure software, and AI application development and deployment software.

PreviousCH14 chain (Chains)Next02. SQL

Last updated 5 months ago