Writing unit tests with a multi-step prompt

For complex tasks such as writing unit tests for Python code, it is better to use a multi-step prompt (a chained, chain-of-thought style approach). Unlike a single prompt, a multi-step prompt takes the text GPT generates and feeds it into subsequent prompts.

This approach is useful when you want GPT to reason about a problem before answering, or to plan a task before carrying it out.

In this notebook, we use a 3-step prompt to write unit tests in Python. The steps are:

  1. Explain: given a Python function, we ask GPT to explain what the function does and why.
  2. Plan: we ask GPT to plan a set of unit tests for the function. Here, planning means listing the different tests needed to cover the function's different cases.
  3. Execute: finally, we instruct GPT to write unit tests based on that plan.
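
The three steps above can be sketched as a small chain in which every reply is appended to the conversation before the next request is sent. This is only a minimal sketch, not the notebook's implementation: `run_chain` and `ask` are hypothetical names, the prompts are shortened, and `fake_ask` returns canned text so that the example runs without an API key.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def run_chain(function_source: str, ask: Callable[[List[Message]], str]) -> List[Message]:
    """Runs explain -> plan -> execute, feeding each reply into the next prompt."""
    step_prompts = [
        f"Explain what this function does and why:\n{function_source}",
        "Plan a set of unit tests that covers the different cases.",
        "Write the unit tests according to the plan.",
    ]
    messages: List[Message] = []
    for prompt in step_prompts:
        messages.append({"role": "user", "content": prompt})
        reply = ask(messages)  # the model sees every earlier prompt and reply
        messages.append({"role": "assistant", "content": reply})
    return messages


def fake_ask(messages: List[Message]) -> str:
    """Hypothetical offline stand-in for a real model call."""
    return f"reply to message {len(messages)}"


history = run_chain("def add(a, b):\n    return a + b", fake_ask)
print([message["role"] for message in history])
```

In a real run, `ask` would wrap a chat completion request; the important part is that the conversation grows with each step, so later prompts can refer to "the function above" or "the cases above".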

Writing helper functions

To run the code below, first generate an API key from the Gilas user panel. Create a new account, or log in to your panel if you already have one. Then go to the API key page and click the "Create API key" button to create a new key for accessing the Gilas API.
# imports needed to run the code in this notebook
import ast  # used for detecting whether generated Python code is valid
import os
from openai import OpenAI

client = OpenAI(
    # read the key from the GILAS_API_KEY environment variable, or fall back to the placeholder
    api_key=os.environ.get("GILAS_API_KEY", "<create your API key at https://dashboard.gilas.io/apiKey>"),
    base_url="https://api.gilas.io/v1/",  # Gilas APIs
)

color_prefix_by_role = {
    "system": "\033[0m",  # default terminal color
    "user": "\033[0m",  # default terminal color
    "assistant": "\033[92m",  # green
}


def print_messages(messages, color_prefix_by_role=color_prefix_by_role) -> None:
    """Prints messages sent to or from GPT."""
    for message in messages:
        role = message["role"]
        color_prefix = color_prefix_by_role[role]
        content = message["content"]
        print(f"{color_prefix}\n[{role}]\n{content}")


def print_message_delta(delta, color_prefix_by_role=color_prefix_by_role) -> None:
    """Prints a chunk of messages streamed back from GPT."""
    # streamed deltas are objects, so their fields are read as attributes (either may be None)
    role = getattr(delta, "role", None)
    content = getattr(delta, "content", None)
    if role:  # normally set only on the first chunk of a reply
        color_prefix = color_prefix_by_role[role]
        print(f"{color_prefix}\n[{role}]\n", end="")
    if content:
        print(content, end="")
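
With the `openai` 1.x client, each streamed delta is an object whose `role` and `content` are attributes that may be `None`, so the reply text is typically accumulated with attribute access rather than dictionary lookups. The sketch below illustrates that accumulation. It uses `SimpleNamespace` objects as stand-ins for streamed chunks (an assumption, so that it runs without an API key), and `collect_stream` and `make_chunk` are hypothetical helper names.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Concatenates the text carried by a sequence of streamed chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        content = chunk.choices[0].delta.content
        if content:  # skips None and empty strings
            parts.append(content)
    return "".join(parts)


def make_chunk(role=None, content=None):
    """Builds a stand-in for one streamed chunk (hypothetical helper for this example)."""
    delta = SimpleNamespace(role=role, content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    make_chunk(role="assistant", content=""),  # first chunk: role only
    make_chunk(content="Hello"),
    make_chunk(content=", world"),
    make_chunk(),  # final chunk with no content
]
print(collect_stream(fake_stream))
```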

Writing a function to generate unit tests

The function below generates the unit tests. Note how it uses the output GPT produced in one step as input to the next GPT call.

To follow how the code works, pay attention to the step comments inside it.

# example of a function that uses a multi-step prompt to write unit tests

def unit_tests_from_function(
    function_to_test: str,  # Python function to test, as a string
    unit_test_package: str = "pytest",  # unit testing package; use the name as it appears in the import statement
    approx_min_cases_to_cover: int = 7,  # minimum number of test case categories to cover (approximate)
    print_text: bool = False,  # optionally prints text; helpful for understanding the function & debugging
    explain_model: str = "gpt-4o-mini",  # model used to generate the explanation in step 1
    plan_model: str = "gpt-4o-mini",  # model used to generate text plans in steps 2 and 2b
    execute_model: str = "gpt-4o-mini",  # model used to generate code in step 3
    temperature: float = 0.4,  # temperature = 0 can sometimes get stuck in repetitive loops, so we use 0.4
    reruns_if_fail: int = 1,  # if the output code cannot be parsed, this will re-run the function up to N times
) -> str:
    """Returns a unit test for a given Python function, using a 3-step GPT prompt."""

    # Step 1: examine the input function and explain how it works

    explain_system_message = {
        "role": "system",
        "content": "You are a world-class Python developer with an eagle eye for unintended bugs and edge cases. You carefully explain code with great detail and accuracy. You organize your explanations in markdown-formatted, bulleted lists.",
    }
    explain_user_message = {
        "role": "user",
        "content": f"""Please explain the following Python function. Review what each element of the function is doing precisely and what the author's intentions may have been. Organize your explanation as a markdown-formatted, bulleted list.

```python
{function_to_test}
```""",
    }

    explain_messages = [explain_system_message, explain_user_message]
    if print_text:
        print_messages(explain_messages)

    explanation_response = client.chat.completions.create(
        model=explain_model,
        messages=explain_messages,
        temperature=temperature,
        stream=True,
    )

    explanation = ""
    for chunk in explanation_response:
        delta = chunk.choices[0].delta
        if print_text:
            print_message_delta(delta)
        if delta.content:  # content is None on chunks that carry no text
            explanation += delta.content
    explain_assistant_message = {"role": "assistant", "content": explanation}

    # Step 2: plan a set of unit tests based on the function's code and the explanation from step 1

    # Asks GPT to plan out cases the unit tests should cover, formatted as a bullet list
    plan_user_message = {
        "role": "user",
        "content": f"""A good unit test suite should aim to:
- Test the function's behavior for a wide range of possible inputs
- Test edge cases that the author may not have foreseen
- Take advantage of the features of `{unit_test_package}` to make the tests easy to write and maintain
- Be easy to read and understand, with clean code and descriptive names
- Be deterministic, so that the tests always pass or fail in the same way

To help unit test the function above, list diverse scenarios that the function should be able to handle (and under each scenario, include a few examples as sub-bullets).""",
    }

    plan_messages = [
        explain_system_message,
        explain_user_message,
        explain_assistant_message,
        plan_user_message,
    ]

    if print_text:
        print_messages([plan_user_message])

    plan_response = client.chat.completions.create(
        model=plan_model,
        messages=plan_messages,
        temperature=temperature,
        stream=True,
    )

    plan = ""
    for chunk in plan_response:
        delta = chunk.choices[0].delta
        if print_text:
            print_message_delta(delta)
        if delta.content:
            plan += delta.content  # accumulate into the plan, not the explanation
    plan_assistant_message = {"role": "assistant", "content": plan}

    # Step 2b: if the plan is too short, ask the model to add more edge cases.
    # To gauge the plan's length, we count the top-level bullet points it contains.

    num_bullets = max(plan.count("\n-"), plan.count("\n*"))
    elaboration_needed = num_bullets < approx_min_cases_to_cover
    if elaboration_needed:
        elaboration_user_message = {
            "role": "user",
            "content": """In addition to those scenarios above, list a few rare or unexpected edge cases (and as before, under each edge case, include a few examples as sub-bullets).""",
        }
        elaboration_messages = [
            explain_system_message,
            explain_user_message,
            explain_assistant_message,
            plan_user_message,
            plan_assistant_message,
            elaboration_user_message,
        ]
        if print_text:
            print_messages([elaboration_user_message])

        elaboration_response = client.chat.completions.create(
            model=plan_model,
            messages=elaboration_messages,
            temperature=temperature,
            stream=True,
        )

        elaboration = ""
        for chunk in elaboration_response:
            delta = chunk.choices[0].delta
            if print_text:
                print_message_delta(delta)
            if delta.content:
                elaboration += delta.content
        elaboration_assistant_message = {"role": "assistant", "content": elaboration}

    # Step 3: generate the unit tests based on the output of the previous steps

    # create a markdown-formatted prompt that asks GPT to complete a unit test
    package_comment = ""
    if unit_test_package == "pytest":
        package_comment = "# below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator"
    execute_system_message = {
        "role": "system",
        "content": "You are a world-class Python developer with an eagle eye for unintended bugs and edge cases. You write careful, accurate unit tests. When asked to reply only with code, you write all of your code in a single block.",
    }
    execute_user_message = {
        "role": "user",
        "content": f"""Using Python and the `{unit_test_package}` package, write a suite of unit tests for the function, following the cases above. Include helpful comments to explain each line. Reply only with code, formatted as follows:

```python
# imports
import {unit_test_package}  # used for our unit tests
{{insert other imports as needed}}

# function to test
{function_to_test}

# unit tests
{package_comment}
{{insert unit test code here}}
```""",
    }

    execute_messages = [
        execute_system_message,
        explain_user_message,
        explain_assistant_message,
        plan_user_message,
        plan_assistant_message,
    ]

    if elaboration_needed:
        execute_messages += [elaboration_user_message, elaboration_assistant_message]

    execute_messages += [execute_user_message]
    if print_text:
        print_messages([execute_system_message, execute_user_message])

    execute_response = client.chat.completions.create(
        model=execute_model,
        messages=execute_messages,
        temperature=temperature,
        stream=True,
    )

    execution = ""
    for chunk in execute_response:
        delta = chunk.choices[0].delta
        if print_text:
            print_message_delta(delta)
        if delta.content:
            execution += delta.content

    # check the output for errors
    code = execution.split("```python")[1].split("```")[0].strip()
    try:
        # parse the generated code to make sure its syntax is valid
        ast.parse(code)
    except SyntaxError as e:
        print(f"Syntax error in generated code: {e}")
        if reruns_if_fail > 0:
            print("Rerunning...")
            return unit_tests_from_function(
                function_to_test=function_to_test,
                unit_test_package=unit_test_package,
                approx_min_cases_to_cover=approx_min_cases_to_cover,
                print_text=print_text,
                explain_model=explain_model,
                plan_model=plan_model,
                execute_model=execute_model,
                temperature=temperature,
                reruns_if_fail=reruns_if_fail - 1,  # decrement rerun counter when calling again
            )

    # return the unit test as a string
    return code
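
The final check in the function assumes that the reply contains a fenced Python block; if it does not, indexing the result of `split` with `[1]` raises `IndexError`. A slightly more defensive version of that check might look like the sketch below. `extract_python_block` is a hypothetical helper that is not part of the notebook, and it only verifies syntax, not whether the generated tests are correct.

```python
import ast
import re
from typing import Optional

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def extract_python_block(reply: str) -> Optional[str]:
    """Returns the first fenced Python block in `reply` if it parses, else None."""
    pattern = re.escape(FENCE) + r"python\s*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply, flags=re.DOTALL)
    if match is None:
        return None  # the model did not reply with a fenced Python block
    code = match.group(1).strip()
    try:
        ast.parse(code)  # syntax check only; the code is not executed
    except SyntaxError:
        return None
    return code


good_reply = f"Here you go:\n{FENCE}python\nassert 1 + 1 == 2\n{FENCE}"
bad_reply = f"{FENCE}python\ndef broken(:\n    pass\n{FENCE}"

print(extract_python_block(good_reply))
print(extract_python_block(bad_reply))
print(extract_python_block("no code here"))
```

Returning `None` for both failure modes lets the caller decide whether to retry, which is the role `reruns_if_fail` plays in the function above.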
example_function = """def pig_latin(text):
    def translate(word):
        vowels = 'aeiou'
        if word[0] in vowels:
            return word + 'way'
        else:
            consonants = ''
            for letter in word:
                if letter not in vowels:
                    consonants += letter
                else:
                    break
            return word[len(consonants):] + consonants + 'ay'

    words = text.lower().split()
    translated_words = [translate(word) for word in words]
    return ' '.join(translated_words)
"""

unit_tests = unit_tests_from_function(
    example_function,
    approx_min_cases_to_cover=10,
    print_text=True
)
[system]
You are a world-class Python developer with an eagle eye for unintended bugs and edge cases. You carefully explain code with great detail and accuracy. You organize your explanations in markdown-formatted, bulleted lists.

[user]
Please explain the following Python function. Review what each element of the function is doing precisely and what the author's intentions may have been. Organize your explanation as a markdown-formatted, bulleted list.

def pig_latin(text):
    def translate(word):
        vowels = 'aeiou'
        if word[0] in vowels:
            return word + 'way'
        else:
            consonants = ''
            for letter in word:
                if letter not in vowels:
                    consonants += letter
                else:
                    break
            return word[len(consonants):] + consonants + 'ay'

    words = text.lower().split()
    translated_words = [translate(word) for word in words]
    return ' '.join(translated_words)


[user]
A good unit test suite should aim to:
- Test the function's behavior for a wide range of possible inputs
- Test edge cases that the author may not have foreseen
- Take advantage of the features of `pytest` to make the tests easy to write and maintain
- Be easy to read and understand, with clean code and descriptive names
- Be deterministic, so that the tests always pass or fail in the same way

To help unit test the function above, list diverse scenarios that the function should be able to handle (and under each scenario, include a few examples as sub-bullets).

[user]
In addition to those scenarios above, list a few rare or unexpected edge cases (and as before, under each edge case, include a few examples as sub-bullets).

[system]
You are a world-class Python developer with an eagle eye for unintended bugs and edge cases. You write careful, accurate unit tests. When asked to reply only with code, you write all of your code in a single block.

[user]
Using Python and the `pytest` package, write a suite of unit tests for the function, following the cases above. Include helpful comments to explain each line. Reply only with code, formatted as follows:

# imports
import pytest  # used for our unit tests
{insert other imports as needed}

# function to test
def pig_latin(text):
    def translate(word):
        vowels = 'aeiou'
        if word[0] in vowels:
            return word + 'way'
        else:
            consonants = ''
            for letter in word:
                if letter not in vowels:
                    consonants += letter
                else:
                    break
            return word[len(consonants):] + consonants + 'ay'

    words = text.lower().split()
    translated_words = [translate(word) for word in words]
    return ' '.join(translated_words)


# unit tests
# below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator
{insert unit test code here}

print(unit_tests)
# imports
import pytest

# function to test
def pig_latin(text):
    def translate(word):
        vowels = 'aeiou'
        if word[0] in vowels:
            return word + 'way'
        else:
            consonants = ''
            for letter in word:
                if letter not in vowels:
                    consonants += letter
                else:
                    break
            return word[len(consonants):] + consonants + 'ay'

    words = text.lower().split()
    translated_words = [translate(word) for word in words]
    return ' '.join(translated_words)


# unit tests
@pytest.mark.parametrize('text, expected', [
    ('hello world', 'ellohay orldway'),  # basic test case
    ('Python is awesome', 'ythonPay isway awesomeway'),  # test case with multiple words
    ('apple', 'appleway'),  # test case with a word starting with a vowel
    ('', ''),  # test case with an empty string
    ('123', '123'),  # test case with non-alphabetic characters
    ('Hello World!', 'elloHay orldWay!'),  # test case with punctuation
    ('The quick brown fox', 'ethay ickquay ownbray oxfay'),  # test case with mixed case words
    ('a e i o u', 'away eway iway oway uway'),  # test case with all vowels
    ('bcd fgh jkl mnp', 'bcday fghay jklway mnpay'),  # test case with all consonants
])
def test_pig_latin(text, expected):
    assert pig_latin(text) == expected

Be sure to review any code before using it, because GPT can make plenty of mistakes (especially in character-based tasks like this one). For best results, we suggest using the most capable GPT model (gpt-4o at the time of writing).
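
One inexpensive way to review generated tests is to run the function on every input and compare the result with the expected value the model wrote. The sketch below does this for the test cases in the generated output above. `find_mismatches` is a hypothetical helper, and the cases are copied from that output unchanged.

```python
def pig_latin(text):
    def translate(word):
        vowels = 'aeiou'
        if word[0] in vowels:
            return word + 'way'
        else:
            consonants = ''
            for letter in word:
                if letter not in vowels:
                    consonants += letter
                else:
                    break
            return word[len(consonants):] + consonants + 'ay'

    words = text.lower().split()
    translated_words = [translate(word) for word in words]
    return ' '.join(translated_words)


# (input, expected) pairs copied from the generated test suite
generated_cases = [
    ('hello world', 'ellohay orldway'),
    ('Python is awesome', 'ythonPay isway awesomeway'),
    ('apple', 'appleway'),
    ('', ''),
    ('123', '123'),
    ('Hello World!', 'elloHay orldWay!'),
    ('The quick brown fox', 'ethay ickquay ownbray oxfay'),
    ('a e i o u', 'away eway iway oway uway'),
    ('bcd fgh jkl mnp', 'bcday fghay jklway mnpay'),
]


def find_mismatches(func, cases):
    """Returns (input, expected, actual) for every case the function does not satisfy."""
    mismatches = []
    for text, expected in cases:
        actual = func(text)
        if actual != expected:
            mismatches.append((text, expected, actual))
    return mismatches


for text, expected, actual in find_mismatches(pig_latin, generated_cases):
    print(f"{text!r}: expected {expected!r}, got {actual!r}")
```

Five of the nine generated expectations disagree with what the function actually returns. For example, the function lowercases its input and appends `ay` even to a word made only of digits. Each mismatch still has to be judged by a person: either the generated test is wrong or the function has a bug.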