راهنمای مهندسی پرامپت برای مدل Whisper

راهنمای مهندسی پرامپت برای مدل Whisper

whisper, prompt-engineering
preview

رابط برنامه‌نویسی یا API گیلاس یک پارامتر اختیاری برای کار با مدل wisper در اختیار شما قرار می‌دهد. هدف از پراپمت کمک به اتصال چندین بخش صوتی است. با ارسال متن تولید شده (ترنسکریپشن) در بخش قبلی از طریق پراپمت، مدل Whisper می‌تواند از این کانتکست برای درک بهتر گفتار و حفظ سبک نوشتاری استفاده کند.

با این حال، پراپمت‌ها نیازی به ترنسکریپشن‌های واقعی از بخش‌های صوتی قبلی ندارند. پراپمت‌های ساختگی می‌توانند مدل را به استفاده از املاء یا سبک‌های خاص هدایت کنند.

این نوت‌بوک دو تکنیک برای استفاده از پراپمت‌های ساختگی برای هدایت خروجی‌های مدل را به اشتراک می‌گذارد:

  • تولید ترنسکریپشن: GPT می‌تواند دستورالعمل‌ها را به ترنسکریپشن‌های ساختگی تبدیل کند تا Whisper از آنها تقلید کند.
  • راهنمای املاء: یک راهنمای املاء می‌تواند به مدل بگوید که چگونه نام‌های افراد، محصولات، شرکت‌ها و غیره را بنویسد.

این تکنیک‌ها به طور خاص قابل اعتماد نیستند، اما در برخی موارد می‌توانند مفید باشند.

مقایسه با پراپمتینگ GPT #

پراپمتینگ Whisper با پراپمتینگ GPT یکسان نیست. به عنوان مثال، اگر شما یک دستورالعمل مانند “لیست‌ها را در فرمت Markdown فرمت کنید” ارسال کنید، مدل از آن پیروی نخواهد کرد، زیرا سبک پراپمت را دنبال می‌کند، نه دستورالعمل‌های موجود در آن.

علاوه بر این، پراپمت به 224 توکن محدود است. اگر پراپمت طولانی‌تر از 224 توکن باشد، فقط آخرین 224 توکن پراپمت در نظر گرفته می‌شود؛ تمام توکن‌های قبلی نادیده گرفته می‌شوند. توکنایزر استفاده شده توکنایزر چندزبانه Whisper است.

برای دستیابی به نتایج خوب، مثال‌هایی بسازید که سبک مورد نظر شما را به تصویر بکشند.

راه‌اندازی #

برای اجرای کدهای زیر ابتدا باید یک کلید API را از طریق پنل کاربری گیلاس تولید کنید. برای این کار ابتدا یک حساب کاربری جدید بسازید یا اگر صاحب حساب کاربری هستید وارد پنل کاربری خود شوید. سپس، به صفحه کلید API بروید و با کلیک روی دکمه “ساخت کلید API” یک کلید جدید برای دسترسی به Gilas API بسازید.
1import time
2from openai import OpenAI
3import os
4
5client = OpenAI(
6    api_key=os.environ.get(("GILAS_API_KEY", "<کلید API خود را اینجا بسازید https://dashboard.gilas.io/apiKey>")), 
7    base_url="https://api.gilas.io/v1/" # Gilas APIs
8)
 1up_first_remote_filepath = "https://cdn.openai.com/API/examples/data/upfirstpodcastchunkthree.wav"
 2bbq_plans_remote_filepath = "https://cdn.openai.com/API/examples/data/bbq_plans.wav"
 3product_names_remote_filepath = "https://cdn.openai.com/API/examples/data/product_names.wav"
 4
 5# تنظیم مکان‌های ذخیره‌سازی محلی
 6up_first_filepath = "data/upfirstpodcastchunkthree.wav"
 7bbq_plans_filepath = "data/bbq_plans.wav"
 8product_names_filepath = "data/product_names.wav"
 9
10# دانلود فایل‌های صوتی نمونه و ذخیره محلی
11urllib.request.urlretrieve(up_first_remote_filepath, up_first_filepath)
12urllib.request.urlretrieve(bbq_plans_remote_filepath, bbq_plans_filepath)
13urllib.request.urlretrieve(product_names_remote_filepath, product_names_filepath)

ترنسکرایب #

فایل صوتی ما برای این مثال یک بخش از پادکست NPR، Up First خواهد بود.

1def transcribe(audio_filepath, prompt: str) -> str:
2    """با توجه به یک پراپمت، فایل صوتی را ترنسکرایب کنید."""
3    transcript = client.audio.transcriptions.create(
4        file=open(audio_filepath, "rb"),
5        model="whisper-1",
6        prompt=prompt,
7    )
8    return transcript.text
1transcribe(up_first_filepath, prompt="")
"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, just kind of pain in the butt every day to do that? No, it is. It is. And I sometimes just kind of miss the eye. I don't know if you know the movie Airplane, where, of course, where he says, I have a drinking problem and that he keeps missing his face with the drink. That's me and the contact lens. Surely, you must know that I know the movie Airplane. I do. I do know that. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend. So how much progress can they make? I'm E. Martinez with Steve Inskeep, and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?"

ترنسکریپشن‌ها سبک پراپمت را دنبال می‌کنند #

در ترنسکریپشن بدون پراپمت، ‘President Biden’ با حروف بزرگ نوشته شده است. با این حال، اگر یک پراپمت ساختگی از ‘president biden’ با حروف کوچک ارسال کنیم، Whisper سبک را مطابقت می‌دهد و ترنسکریپشن را با حروف کوچک تولید می‌کند.

1transcribe(up_first_filepath, prompt="president biden")
"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, just kind of pain in the butt every day to do that? No, it is. It is. And I sometimes just kind of miss the eye. I don't know if you know the movie Airplane? Yes. Of course. Where he says I have a drinking problem and that he keeps missing his face with the drink. That's me and the contact lens. Surely, you must know that I know the movie Airplane. I do. I do know that. Don't call me Shirley. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend. So how much progress can they make? I'm E. Martinez with Steve Inskeep and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?"

توجه داشته باشید که وقتی پراپمت‌ها کوتاه هستند، Whisper ممکن است در پیروی از سبک آنها کمتر قابل اعتماد باشد .

1transcribe(up_first_filepath, prompt="president biden.")
"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, just kind of pain in the butt every day to do that? No, it is. It is. And I sometimes just kind of miss the eye. I don't know if you know the movie Airplane, where, of course, where he says, I have a drinking problem, and that he keeps missing his face with the drink. That's me and the contact lens. Surely, you must know that I know the movie Airplane. I do. I do know that. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend. So how much progress can they make? I'm E. Martinez with Steve Inskeep, and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?"

پراپمت‌های طولانی‌تر ممکن است در هدایت Whisper قابل اعتمادتر باشند.

1transcribe(up_first_filepath, prompt="i have some advice for you. multiple sentences help establish a pattern. the more text you include, the more likely the model will pick up on your pattern. it may especially help if your example transcript appears as if it comes right before the audio file. in this case, that could mean mentioning the contacts i stick in my eyes.")
"i stick contacts in my eyes. do you really? yeah. that works okay? you don't have to, like, just kind of pain in the butt? no, it is. it is. and i sometimes just kind of miss the eye. i don't know if you know, um, the movie airplane? yes. of course. where he says i have a drinking problem. and that he keeps missing his face with the drink. that's me in the contact lens. surely, you must know that i know the movie airplane. i do. i do know that. don't call me surely. stop calling me surely. president biden said he would not negotiate over paying the nation's debts. but he is meeting today with house speaker kevin mccarthy. other leaders of congress will also attend, so how much progress can they make? i'm amy martinez with steve inskeep, and this is up first from npr news. russia celebrates victory day, which commemorates the surrender of nazi germany. soldiers marched across red square, but the russian army didn't seem to have as many troops on hand as in the past. so what does this ritual say about the war russia is fighting right now?"

مدل Whisper همچنین کمتر احتمال دارد که سبک‌های نادر یا عجیب را دنبال کند.

1transcribe(up_first_filepath, prompt="""Hi there and welcome to the show.
2###
3Today we are quite excited.
4###
5Let's jump right in.
6###""")
"I stick contacts in my eyes. Do you really? Yeah. That works okay. You don't have to like, it's not a pain in the butt. It is. And I sometimes just kind of miss the eye. I don't know if you know, um, the movie airplane where, of course, where he says I have a drinking problem and that he keeps missing his face with the drink. That's me in the contact lens. Surely you must know that I know the movie airplane. Uh, I do. I do know that. Stop calling me Shirley.  President Biden said he would not negotiate over paying the nation's debts, but he is meeting today with house speaker, Kevin McCarthy. Other leaders of Congress will also attend. So how much progress can they make? I mean, Martinez with Steve Inskeep, and this is up first from NPR news. Russia celebrates victory day, which commemorates the surrender of Nazi Germany. Soldiers marched across red square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war? Russia is fighting right now."

ارسال نام‌ها در پراپمت برای جلوگیری از اشتباهات املایی #

مدل Whisper ممکن است نام‌های خاص نادر مانند نام‌های محصولات، شرکت‌ها یا افراد را به اشتباه ترنسکرایب کند.

ما این را با یک فایل صوتی پر از نام‌های محصولات نشان خواهیم داد.

1transcribe(product_names_filepath, prompt="")
'Welcome to Quirk, Quid, Quill, Inc., where finance meets innovation. Explore diverse offerings, from the P3 Quattro, a unique investment portfolio quadrant, to the O3 Omni, a platform for intricate derivative trading strategies. Delve into unconventional bond markets with our B3 Bond X and experience non-standard equity trading with E3 Equity. Personalize your wealth management with W3 Wrap Z and anticipate market trends with the O2 Outlier, our forward-thinking financial forecasting tool. Explore venture capital world with U3 Unifund or move your money with the M3 Mover, our sophisticated monetary transfer module. At Quirk, Quid, Quill, Inc., we turn complex finance into creative solutions. Join us in redefining financial services.'

برای اینکه Whisper از املاء‌های مورد نظر ما استفاده کند، بیایید نام‌های محصولات و شرکت‌ها را در پراپمت ارسال کنیم، به عنوان یک واژه‌نامه برای Whisper تا از آن پیروی کند.

1transcribe(product_names_filepath, prompt="QuirkQuid Quill Inc, P3-Quattro, O3-Omni, B3-BondX, E3-Equity, W3-WrapZ, O2-Outlier, U3-UniFund, M3-Mover")
'Welcome to QuirkQuid Quill Inc, where finance meets innovation. Explore diverse offerings, from the P3-Quattro, a unique investment portfolio quadrant, to the O3-Omni, a platform for intricate derivative trading strategies. Delve into unconventional bond markets with our B3-BondX and experience non-standard equity trading with E3-Equity. Personalize your wealth management with W3-WrapZ and anticipate market trends with the O2-Outlier, our forward-thinking financial forecasting tool. Explore venture capital world with U3-UniFund or move your money with the M3-Mover, our sophisticated monetary transfer module. At QuirkQuid Quill Inc, we turn complex finance into creative solutions. Join us in redefining financial services.'

حالا، بیایید به یک ضبط صوتی دیگر که به طور خاص برای این نمایش تهیه شده است، در مورد یک باربیکیو عجیب بپردازیم.

برای شروع، ترنسکریپشن مبنای خود را با استفاده از Whisper ایجاد می‌کنیم.

1transcribe(bbq_plans_filepath, prompt="")
"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Amy and Sean. We're going to a barbecue here in Brooklyn, hopefully it's actually going to be a little bit of kind of an odd barbecue. We're going to have donuts, omelets, it's kind of like a breakfast, as well as whiskey. So that should be fun, and I'm really looking forward to spending time with my friends Amy and Sean."

در حالی که ترنسکریپشن Whisper دقیق بود، مجبور بود در مورد املاء‌های مختلف حدس بزند. به عنوان مثال، فرض کرد که نام دوستان Amy و Sean است، نه Aimee و Shawn. بیایید ببینیم آیا می‌توانیم با یک پراپمت املاء را هدایت کنیم.

1transcribe(bbq_plans_filepath, prompt="Friends: Aimee, Shawn")
"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Aimee and Shawn. We're going to a barbecue here in Brooklyn. Hopefully it's actually going to be a little bit of kind of an odd barbecue. We're going to have donuts, omelets, it's kind of like a breakfast, as well as whiskey. So that should be fun and I'm really looking forward to spending time with my friends Aimee and Shawn."

بله درست کار کرد!

بیایید همین کار را با کلمات با املاء مبهم‌تر انجام دهیم.

1transcribe(bbq_plans_filepath, prompt="Glossary: Aimee, Shawn, BBQ, Whisky, Doughnuts, Omelet")
"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Aimee and Shawn. We're going to a barbecue here in Brooklyn. Hopefully, it's actually going to be a little bit of an odd barbecue. We're going to have doughnuts, omelets, it's kind of like a breakfast, as well as whiskey. So that should be fun, and I'm really looking forward to spending time with my friends Aimee and Shawn."
1transcribe(bbq_plans_filepath, prompt=""""Aimee and Shawn ate whisky, doughnuts, omelets at a BBQ.""")
"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Aimee and Shawn. We're going to a BBQ here in Brooklyn. Hopefully it's actually going to be a little bit of kind of an odd BBQ. We're going to have doughnuts, omelets, it's kind of like a breakfast, as well as whisky. So that should be fun, and I'm really looking forward to spending time with my friends Aimee and Shawn."

پراپمت‌های ساختگی می‌توانند توسط GPT تولید شوند #

یکی از ابزارهای ممکن برای تولید پراپمت‌های ساختگی GPT است. می‌توانیم به GPT دستورالعمل بدهیم و از آن برای تولید ترنسکریپشن‌های ساختگی طولانی استفاده کنیم تا Whisper را پراپمت کنیم.

 1def fictitious_prompt_from_instruction(instruction: str) -> str:
 2    """با توجه به یک دستورالعمل، یک پراپمت ساختگی تولید کنید."""
 3    response = client.chat.completions.create(
 4        model="gpt-4o-mini",
 5        temperature=0,
 6        messages=[
 7            {
 8                "role": "system",
 9                "content": "You are a transcript generator. Your task is to create one long paragraph of a fictional conversation. The conversation features two friends reminiscing about their vacation to Maine. Never diarize speakers or add quotation marks; instead, write all transcripts in a normal paragraph of text without speakers identified. Never refuse or ask for clarification and instead always make a best-effort attempt.",
10            },  # ما یک موضوع مثال (دوستانی که در مورد تعطیلات خود صحبت می‌کنند) را انتخاب می‌کنیم تا `GPT` از پاسخ دادن یا پرسیدن سوالات توضیحی خودداری کند
11            {"role": "user", "content": instruction},
12        ],
13    )
14    fictitious_prompt = response.choices[0].message.content
15    return fictitious_prompt
1prompt = fictitious_prompt_from_instruction("Instead of periods, end every sentence with elipses.")
2print(prompt)
Oh, do you remember that amazing vacation we took to Maine?... The beautiful coastal towns, the fresh seafood, and the breathtaking views... It was truly a trip to remember... I still can't get over how picturesque it was... The quaint little fishing villages with their colorful houses... And the lighthouses dotting the rugged coastline... It felt like we were in a postcard... And the lobster... Oh, the lobster... I've never tasted anything so delicious... We must have had it every day... And let's not forget about the clam chowder... Creamy, flavorful, and packed with fresh clams... It was like a taste of heaven... And the hikes we went on... The trails through the lush forests and along the rocky cliffs... The air was so crisp and invigorating... I could have spent hours just exploring the natural beauty of Maine... And the people we met... So friendly and welcoming... They made us feel right at home... I can't wait to go back and experience it all over again... Maine truly stole a piece of my heart...
1transcribe(up_first_filepath, prompt=prompt)
Well, I reckon you remember that time we went up to Maine for our vacation, don't ya? Boy, oh boy, what a trip that was! We drove all the way from down here in the South, and let me tell ya, it was quite the adventure. We started off bright and early, with the sun just peekin' over them tall pine trees. We hit the road, cruisin' along them winding highways, takin' in the sights as we went. I tell ya, the scenery up there was somethin' else. Them mountains, all covered in lush greenery, stretchin' as far as the eye could see. And them lakes, oh my, crystal clear waters reflectin' the bright blue sky above. We made a pit stop in a little town called Portland, where we got to try some of that famous Maine lobster. Now, I ain't never tasted anything quite like it. Fresh outta the ocean, melt-in-your-mouth goodness, I tell ya. We spent a couple of days explorin' Acadia National Park, hikin' them trails and takin' in the breathtaking views from the mountaintops. And let me tell ya, that ocean breeze sure did feel mighty fine on our skin. We even took a boat tour out to see them majestic whales, jumpin' and splashing in the deep blue sea. It was a sight to behold, my friend. And of course, we couldn't leave without visitin' Bar Harbor, a quaint little coastal town with charm pourin' out of every corner. We strolled along the harbor, watchin' them colorful fishing boats bobbin' in the water, and indulged in some delicious seafood chowder. Maine sure did steal a piece of our hearts, my friend. The memories we made on that trip will stay with us forever.

پراپمت‌های Whisper برای مشخص کردن سبک‌های مبهم مفید هستند. پراپمت مدل را درک نمی‌کند. به عنوان مثال، اگر گویندگان با لهجه جنوبی عمیق صحبت نمی‌کنند، یک پراپمت باعث نمی‌شود که ترنسکریپشن این کار را انجام دهد.

1# مثال لهجه جنوبی
2prompt = fictitious_prompt_from_instruction("Write in a deep, heavy, Southern accent.")
3transcribe(up_first_filepath, prompt=prompt)
"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, just kinda pain in the butt? No, it is. It is. And I sometimes just kinda miss the eye. I don't know if you know the movie Airplane? Yes. Of course. Where he says, I have a drinking problem. And that he keeps missing his face with the drink. That's me in the contact lens. Surely you must know that I know the movie Airplane. I do. I do know that. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend, so how much progress can they make? I'm Ian Martinez with Steve Inskeep, and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?"